Tautomerism in chemical information management systems. Dr. Wendy A. Warr, http://www.warr.com
Tautomerism in chemical information management systems. Author: Wendy A. Warr. DOI: 10.1007/s10822-010-9338-4. Perspectives issue devoted to Tautomerism in Molecular Design, edited by Yvonne Martin.
Chemical Information Aspects: registration procedures; storage of tautomers; exact and substructure search; depiction of results.
Software and Database Vendors: Accelrys, ACD/Labs, Beilstein/Reaxys, CambridgeSoft, CAS, CCDC, CCG, ChemAxon, ChemoSoft, ChemSpider, CWM Global Search, Daylight, Dialog, IDBS, InfoChem, InhibOx, John Wiley & Sons, Molecular Networks, NCI/CADD, OpenEye, Thieme, PubChem, Questel, Schrödinger, SciTouch, Symyx, Thomson Reuters, Xemistry (CACTVS).
Not Included: ABCD (J&J), BioRad (KnowItAll), CDK, eMolecules, SimBioSys, Tripos, ZINC.
Chemical Structure Representation
Morgan Algorithm: Morgan, H. L. The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5(2), 107-113.
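The idea usually described as the core of the Morgan algorithm can be sketched as follows. Each atom starts with its number of heavy-atom neighbours, and the values are repeatedly replaced by the sum of the neighbours' values until the number of distinct values stops growing. The function name `extended_connectivity` and the adjacency-dictionary input are assumptions made for this illustration, and the final atom-numbering stage of the published procedure is not shown.

```python
def extended_connectivity(adjacency):
    """Return extended-connectivity values for a heavy-atom graph.

    adjacency maps each atom index to the list of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    # Start from the plain connectivity: number of heavy-atom neighbours.
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        # Each atom's new value is the sum of its neighbours' current values.
        refined = {atom: sum(values[n] for n in nbrs)
                   for atom, nbrs in adjacency.items()}
        n_refined = len(set(refined.values()))
        # Stop as soon as refinement no longer separates more atoms.
        if n_refined <= n_classes:
            return values
        values, n_classes = refined, n_refined


# n-Pentane skeleton: 0-1-2-3-4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(extended_connectivity(pentane))
```

For the pentane chain the central atom ends up with the highest value and symmetry-equivalent atoms share a value, which is the property a canonical numbering step would then build on.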
CTfile
SMILES CC1=CC(Br)CCC1
SMILES: OpenEye canonical SMILES; Daylight canonical SMILES; SciTouch canonical SMILES; ChemAxon canonical SMILES
IUPAC International Chemical Identifier (InChI)
InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYSA-N
NCI/CADD Identifiers (CACTVS Hashcodes)
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27
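Because later slides compare structures by InChIKey, it can help to see how a key breaks down. The sketch below splits a key into its 14-letter first block (a hash of the molecular skeleton), the 8-letter hash of the remaining layers, the standard/non-standard flag, the version letter, and the final protonation letter. The names `InChIKeyParts` and `split_inchikey` are hypothetical helpers written for this example, not part of any InChI software.

```python
import re
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    connectivity_block: str  # 14 letters: hash of the molecular skeleton
    remaining_block: str     # 8 letters: hash of the remaining layers
    is_standard: bool        # flag letter "S" (standard) or "N" (non-standard)
    version: str             # version letter, "A" for InChI version 1
    protonation: str         # "N" means no net (de)protonation


_KEY = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


def split_inchikey(text):
    """Split an InChIKey (with or without the 'InChIKey=' prefix)."""
    key = text.strip()
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    match = _KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {text!r}")
    first, second, flag, version, proto = match.groups()
    return InChIKeyParts(first, second, flag == "S", version, proto)


caffeine = split_inchikey("InChIKey=RYYVLZVUVIJVGH-UHFFFAOYSA-N")
print(caffeine.connectivity_block, caffeine.is_standard, caffeine.protonation)
```

Two records whose first blocks agree share the same hashed skeleton, which makes the first block a convenient coarse grouping key before any finer comparison.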
Definition of Tautomerism: M=Q-ZH ⇌ HM-Q=Z, where Q = C, N, S, P, Sb, As, Se, Te, Br, Cl or I; M, Z = trivalent N, bivalent O, S, Se or Te [either M or Z = C]; H = H, D, T [or + or -]. Extended system, ring/chain, etc.
Straightforward: 1,3 shift; 1,5 shift; 1,7 shift
More Complex: structures 1, 2, 3, 4
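The M=Q-ZH to HM-Q=Z pattern can be made concrete with a toy graph model. The sketch below is an illustration only: the `Mol` class, the element sets and the function `one_three_shifts` are assumptions for this example, valence checks are omitted, and the bracketed carbon exception is read here as "M and Z are heteroatoms, but one of them may be carbon".

```python
from dataclasses import dataclass

# Element sets taken from the definition; valence states are not checked.
CENTRAL = {"C", "N", "S", "P", "Sb", "As", "Se", "Te", "Br", "Cl", "I"}
HETERO = {"N", "O", "S", "Se", "Te"}


@dataclass(frozen=True)
class Mol:
    elements: tuple   # element symbol per heavy atom
    hcounts: tuple    # implicit hydrogens per heavy atom
    bonds: frozenset  # {(i, j, order)} with i < j


def _bond(i, j, order):
    return (min(i, j), max(i, j), order)


def bond_order(mol, i, j):
    for a, b, order in mol.bonds:
        if (a, b) == (min(i, j), max(i, j)):
            return order
    return 0


def one_three_shifts(mol):
    """Return all products of a single M=Q-ZH -> HM-Q=Z shift."""
    products = []
    atoms = range(len(mol.elements))
    for q in atoms:
        if mol.elements[q] not in CENTRAL:
            continue
        for m in atoms:
            if bond_order(mol, m, q) != 2:
                continue
            for z in atoms:
                if z == m or bond_order(mol, q, z) != 1 or mol.hcounts[z] == 0:
                    continue
                ends = {mol.elements[m], mol.elements[z]}
                # Assumption: at least one end is a heteroatom, the other may be C.
                if not ends <= HETERO | {"C"} or not ends & HETERO:
                    continue
                # Swap the bond orders and move one hydrogen from Z to M.
                bonds = (mol.bonds - {_bond(m, q, 2), _bond(q, z, 1)}) | {
                    _bond(m, q, 1), _bond(q, z, 2)}
                hcounts = list(mol.hcounts)
                hcounts[m] += 1
                hcounts[z] -= 1
                products.append(Mol(mol.elements, tuple(hcounts), frozenset(bonds)))
    return products


# Acetone: C0-C1(=O2)-C3
keto = Mol(("C", "C", "O", "C"), (3, 0, 0, 3),
           frozenset({(0, 1, 1), (1, 2, 2), (1, 3, 1)}))
enols = one_three_shifts(keto)
```

Applying the same function to either enol returns the keto form, which shows that the shift is its own inverse in this model.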
Degree of Unsaturation
Ring Opening
Fluxional structures
Mesomers: Different NEMA Keys, Same InChIKey
NEMA Key=6P1SUP7NENNHV4V61WRZP5S2ES8NZF
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
NEMA Key=CKGEHDBX4KZPW3VV6DXTVM5BB689GB
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
InChIKey=CXKWCBBOMKCUKX-UHFFFAOYSA-M
Tautomers: Different NEMA Keys, Same InChIKey
NEMA Key=CU3YSHT7DX8KUTKGRNS5GH3B4UQBFA
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
NEMA Key=CTDBHWQW8CQJHC3S5AH6X4QJWAVMKD
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
InChIKey=KITPKMKMNZXFDK-NTSWFWBYSA-N
Unreasonable
Multiple Overlapping: structures 5, 6, 7, 8
Overlapping: structures 9, 10, 11
Registration
Registration Objectives: corporate database; stock room database; predicting spectra; reaction mechanisms; ultra-low temperature lab.
Registration Options: enumerate all tautomers, store all tautomers; calculate canonical tautomer, store canonical tautomer; enumerate all tautomers, rank [as major, minor, or conditions dependent (ACD/Labs)], allow user to choose which form to store.
Epik (Schrödinger): enumerate all energetically reasonable tautomers; enumerate all energetically reasonable ionization states; store all tautomers and ionization states; Canvas identifies duplicates by canonical SMILES.
Are A and B Tautomers? (1) If A and B are identical, accept. (2) If the total number of hydrogen atoms or charges is not identical, reject. (3) Examine the heavy-atom skeletons; reject if not identical. (4) Enumerate all tautomers for A; if any is the same as B, accept. (5) Enumerate all tautomers for B; if any is the same as A, accept. (6) Otherwise reject.
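The decision procedure above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `Structure` record, its fields, and the `enumerate_tautomers` callable are assumptions, and a real system would derive them from a cheminformatics toolkit. The cheap rejections run first so that enumeration is only attempted when it could succeed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Structure:
    canonical: str   # canonical identifier of this exact form (hypothetical)
    hydrogens: int   # total hydrogen count
    charge: int      # total charge
    skeleton: str    # canonical heavy-atom skeleton, bond orders ignored


def are_tautomers(a: Structure, b: Structure,
                  enumerate_tautomers: Callable[[Structure], Iterable[Structure]]) -> bool:
    # Identical structures are accepted immediately.
    if a.canonical == b.canonical:
        return True
    # Hydrogen count or charge mismatch: reject.
    if a.hydrogens != b.hydrogens or a.charge != b.charge:
        return False
    # Heavy-atom skeletons must be identical.
    if a.skeleton != b.skeleton:
        return False
    # Enumerate from A, looking for B.
    if any(t.canonical == b.canonical for t in enumerate_tautomers(a)):
        return True
    # Enumerate from B, looking for A.
    if any(t.canonical == a.canonical for t in enumerate_tautomers(b)):
        return True
    # Otherwise reject.
    return False


# Toy data: acetone, its enol, and propanal (same formula, different skeleton).
keto = Structure("CC(C)=O", 6, 0, "CC(C)O")
enol = Structure("C=C(C)O", 6, 0, "CC(C)O")
propanal = Structure("CCC=O", 6, 0, "CCCO")
TOY_TABLE = {keto.canonical: [enol], enol.canonical: [keto]}


def toy_enumerator(s):
    return TOY_TABLE.get(s.canonical, [])


print(are_tautomers(keto, enol, toy_enumerator))
print(are_tautomers(keto, propanal, toy_enumerator))
```

Enumerating in both directions guards against an enumerator that is not symmetric, for example one that stops after a maximum number of forms.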
Enumeration of tautomers
Sayle, R. A.; Delany, J. J. Canonicalization and enumeration of tautomers. Paper presented at EuroMUG99, Cambridge, UK, 28-29 Oct 1999.
Oellien, F.; Cramer, J.; Beyer, C.; Ihlenfeldt, W.-D.; Selzer, P. M. The impact of tautomer forms on pharmacophore-based virtual screening. J. Chem. Inf. Model. 2006, 46, 2342-2354.
Greenwood, J. R.; Calkins, D.; Sullivan, A. P.; Shelley, J. C. Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. J. Comput.-Aided Mol. Des. 2010, published online March 31, 2010.
Storage of Tautomers
Concept A: generate all tautomers; impossible to calculate lowest energy tautomer; use rules for consistent generation [of a low energy form]; store this form [as canonical SMILES].
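Concept A amounts to choosing one representative from the enumerated set in a way that is reproducible regardless of which tautomer was drawn. A minimal sketch, assuming the tautomers are already available as canonical SMILES strings: the `choose_canonical` function and the toy `prefer_carbonyl` rule are hypothetical and stand in for a real, much richer rule set.

```python
def choose_canonical(tautomers, score):
    """Pick one representative form from a set of tautomer SMILES."""
    forms = set(tautomers)
    if not forms:
        raise ValueError("no tautomers supplied")
    # Highest rule score wins; the string itself breaks ties, so the
    # result never depends on the order in which forms were generated.
    return min(forms, key=lambda smiles: (-score(smiles), smiles))


def prefer_carbonyl(smiles):
    # Toy rule (assumption): favour forms with more "=O" groups.
    return smiles.count("=O")


stored = choose_canonical(["C=C(C)O", "CC(C)=O"], prefer_carbonyl)
print(stored)
```

The deterministic tie-break matters as much as the rule itself: two chemists registering different tautomers of the same compound must arrive at the same stored form.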
Concept B: generate all tautomers; impossible to calculate lowest energy tautomer; store all tautomers [store all protomers].
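Concept B can be pictured as an index in which every enumerated form points to the same registry number. The class below is a sketch under stated assumptions: `AllTautomerRegistry` is a hypothetical name, forms are canonical SMILES strings, and the enumerator is supplied by the caller.

```python
class AllTautomerRegistry:
    """Toy registry that stores every tautomer under one shared ID."""

    def __init__(self, enumerate_tautomers):
        self._enumerate = enumerate_tautomers  # caller-supplied (assumption)
        self._id_by_form = {}
        self._next_id = 1

    def register(self, smiles):
        forms = {smiles, *self._enumerate(smiles)}
        # Reuse an existing ID if any form of this compound is already stored.
        known = {self._id_by_form[f] for f in forms if f in self._id_by_form}
        if known:
            reg_id = min(known)
        else:
            reg_id = self._next_id
            self._next_id += 1
        # Store every form that is not yet indexed.
        for form in forms:
            self._id_by_form.setdefault(form, reg_id)
        return reg_id

    def lookup(self, smiles):
        return self._id_by_form.get(smiles)


TOY_FORMS = {"CC(C)=O": ["C=C(C)O"], "C=C(C)O": ["CC(C)=O"]}
registry = AllTautomerRegistry(lambda s: TOY_FORMS.get(s, []))
first = registry.register("CC(C)=O")
second = registry.register("C=C(C)O")
other = registry.register("CCO")
```

The trade-off against Concept A is storage and enumeration cost at registration time in exchange for a plain dictionary lookup at search time.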
Structure Search
Structure Search: exact matches done by flexmatch, SMILES, hashcodes etc.; substructure search: hard to perceive all tautomers for a substructure.
Approaches to Substructure Search: address problem at registration stage (store all tautomers); address problem at search stage (enumerate database structures on the fly, or enumerate query structure, or user takes care specifying query); combine methods; ignore the problem.
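The "enumerate query structure" option can be sketched as a search that tries every tautomeric form of the query against each stored record. Both callables are supplied by the caller and are assumptions of this example. In particular, the substring test used in the toy data is only a stand-in for a real substructure matcher and is not chemically meaningful in general.

```python
def tautomer_aware_search(query, database, enumerate_tautomers, matches):
    """Return records hit by the query or by any of its tautomers."""
    query_forms = {query, *enumerate_tautomers(query)}
    return [record for record in database
            if any(matches(form, record) for form in query_forms)]


# Toy data: an amide fragment and its imidic acid tautomer.
QUERY_FORMS = {"C(=O)N": ["C(O)=N"]}
database = ["CC(=O)NC", "CC(O)=NC", "CCOC"]


def toy_enumerator(fragment):
    return QUERY_FORMS.get(fragment, [])


def toy_matcher(fragment, record):
    # Stand-in for substructure matching (assumption, illustration only).
    return fragment in record


plain_hits = tautomer_aware_search("C(=O)N", database, lambda q: [], toy_matcher)
aware_hits = tautomer_aware_search("C(=O)N", database, toy_enumerator, toy_matcher)
print(plain_hits, aware_hits)
```

Without query enumeration only the record stored in the amide form is found; with it, the record stored in the other tautomeric form is retrieved as well, at the cost of one match attempt per query form.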
Depiction of Results
Depicting Results: display input, registered structure; display matched tautomer (good approach if substructure is highlighted); display standard form; let user choose; experimental results match displayed tautomer.
ChemAxon Approaches: normalize the structure (generic tautomers); allow for tautomers at search time; choose a preferred tautomer; customize preferences in Standardizer.
ChemAxon Tautomerization Plugin: generates all, dominant and canonical tautomers; calculates canonical tautomer by empirical rules; tries to make canonical tautomer the dominant tautomer (includes pKa filter); handles dearomatization and stereochemistry.
Customization: choose dominant tautomer; set operating pH; set maximum distance (# bonds) of a single proton migration; protect structural features (aromaticity, charge, stereochemistry, stable functional groups); exclude unstable antiaromatic compounds.
ChemAxon software: stores canonical form, or all tautomers; enumerates query tautomers (as far as possible); usually displays structure originally input; optionally displays standard tautomer.
Observations
Computational chemistry companies: does the ligand match the receptor?; ligand preparation; pKa; algorithms, rules, energetics; rigorous approaches.
Informatics companies: does the compound match the patent?; building registries and inventories; graph theory; examples (structures); pragmatic approaches.
Hybrid companies
Acknowledgments: all 28 vendors, including ChemAxon; Jonathan Brecher; Geoff Skillman; Russ Hillard; Keith Taylor.