Functional Group Fingerprints CNS Chemistry Wilmington, USA

Functional Group Fingerprints
CNS Chemistry, Wilmington, USA
James R. Arnold, Charles L. Lerman, William F. Michne, James R. Damewood
American Chemical Society National Meeting, August 2004, Philadelphia, PA

Functional Group Fingerprint Objectives
- Develop a 2D fingerprint searching method that uses medicinally relevant, custom-defined functional groups.
- Validate the method on a large dataset of biological targets and chemical classes.
- Examine approaches for enhancing accuracy, and reducing false positive rates, of 2D searches.
- Deploy the method company-wide.
- Rapidly expand SAR in Hit/Lead Identification.
- Subset the corporate collection for screening.
- Factor of 2 improvement in every 2D similarity search.

Many Target Classes Approached by Ligand-Based Methods
The Impetus for Developing Functional Group Fingerprints
Biochemical classes of drug targets of current therapies (perhaps 70% of targets approached by ligand-based methods):
- Receptors, 45%
- Enzymes, 28%
- Hormones & Factors, 11%
- Unknown, 7%
- Ion Channels, 5%
- DNA, 2%
- Nuclear Receptors, 2%
Drews, J. Science 2000, 287, 1960-1964

Functional Groups are One Aspect of Medicinal Chemistry Reasoning in Ligand-Based Design
- Express one aspect of the knowledge of our most experienced people for wide use.
- Design is often partially driven by functional group features. These can be warheads, linkers, or substituents for interaction with receptors, and they influence many molecular properties.
- Functional Group Fingerprints capitalize on proven Medicinal Chemistry approaches, and two-dimensional searches are widely used.

Functional Group Fingerprint Based Classification and Similarity Searching
- Classification based on 400 medicinally relevant functional groups
- Classification translated into bit strings
Imigran (1): GSK, 1.07 billion dollar treatment for migraine in 2000.
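The step from a classification to a bit string can be sketched as follows. This is a minimal illustration, not the authors' implementation: the four-entry vocabulary is a hypothetical stand-in for the roughly 400 definitions, and the function name `to_bit_string` is invented for this example.

```python
# Hypothetical, tiny vocabulary standing in for the ~400 functional group definitions.
VOCABULARY = ["aromatic nitrogen", "2o alcohol", "aryl halide", "2o amide"]


def to_bit_string(groups_found, vocabulary=VOCABULARY):
    """Return one character per vocabulary entry: '1' if the group is present, else '0'."""
    present = set(groups_found)
    return "".join("1" if name in present else "0" for name in vocabulary)


# Illustrative input only; it does not describe any real compound.
example = to_bit_string(["2o alcohol", "aryl halide"])
print(example)
```

Keeping the vocabulary order fixed is what makes bit strings from different molecules comparable position by position.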

Functional Groups are Recognized Algorithmically Using SMARTS
The exclusions make the functional group definitions specific and make the entire set as orthogonal as possible.

Functional Groups are Defined to Minimize Overlap Between Definitions
The above imide is defined as an imide, not as two carbonyls, two amides and an amine.
Orthogonal functional group definitions allow specific functional groups to be related to activity.

Most Common Functional Group Classes in 2003 MDDR

Functional Group          Frequency
aromatic nitrogen             9,609
3° amine, not α arom.         8,493
2° alcohol                    8,104
aryl halide                   7,896
2° amide                      7,805
acyclic ether, α arom.        7,262
carboxylic acid               6,464
alkene                        6,044
carboxylic ester              5,199
cyclic ether                  4,778
aromatic alcohol              3,691
aromatic sulfur, thio.        3,485
2° amine, not α arom.         3,281
aromatic sulfur               3,185
3° amide                      3,113
acyclic ether                 3,018
3° amine, α arom              2,938
imidazole, fused, no          2,737
1° alcohol                    2,662
1° amine, α arom.             2,488
1° amine, not α arom.         2,474
acetal ketal                  2,177
beta lactam, fused            1,931
3° alcohol                    1,921
3° lactam                     1,814
aromatic -, no                1,709
α,β-unsaturated acid          1,666
2° amine, α arom.             1,634
lactone                       1,586
cyclic ether α 1 arom         1,552
cyclic thioether              1,552
imidazole, fused, w/          1,533
ketone                        1,513
ketone, α arom.               1,476
aromatic ketone               1,385
aromatic w/                   1,326
aromatic oxygen               1,283
acyclic thioether α arom      1,240
α,β-unsaturated ester         1,188
acyclic thioether             1,166
1° amide                      1,146
2° lactam                     1,136
oxime ether                   1,135
trihalide                     1,135
nitrile                       1,122
sulfonamide,                  1,075
urethane,                     1,071
urea                            864

General categories are shown; actual functional group classifications are more specific.

Classification Quality: Coverage and Overlap of Functional Group Definitions
- Coverage: all heteroatoms in a molecule are classified.
- Overlap: a heteroatom in a molecule is classified in > 1 functional group.
- Testing in medicinally relevant databases: CMC = 8,545; MDDR = 135,342; MedChem = 145,158.
- Roughly 90% coverage and 10% overlap.
[Figure: bar chart of % coverage and overlap for CMC, MDDR and MedChem, with ideal coverage and ideal overlap marked.]
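The two quality measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: each molecule is represented by a set of heteroatom indices plus a dictionary mapping a functional group name to the atom indices it matched, and all names (`is_covered`, `has_overlap`, `database_percentages`) and the toy data are hypothetical.

```python
def is_covered(heteroatoms, group_matches):
    """True if every heteroatom index appears in at least one functional group match."""
    classified = set().union(*group_matches.values())
    return set(heteroatoms) <= classified


def has_overlap(heteroatoms, group_matches):
    """True if any heteroatom index appears in more than one functional group match."""
    return any(
        sum(atom in atoms for atoms in group_matches.values()) > 1
        for atom in heteroatoms
    )


def database_percentages(molecules):
    """Return (% molecules covered, % molecules with overlap) for a list of molecules."""
    n = len(molecules)
    covered = sum(is_covered(h, m) for h, m in molecules)
    overlapped = sum(has_overlap(h, m) for h, m in molecules)
    return 100.0 * covered / n, 100.0 * overlapped / n


# Toy data: (heteroatom indices, {group name: matched atom indices}).
toy_database = [
    ({1, 4}, {"2o amide": {1}, "1o alcohol": {4}}),  # covered, no overlap
    ({2, 5}, {"ketone": {2}}),                       # atom 5 unclassified
    ({3}, {"imide": {3}, "2o amide": {3}}),          # atom 3 classified twice
    ({0}, {"nitrile": {0}}),                         # covered, no overlap
]
coverage_pct, overlap_pct = database_percentages(toy_database)
print(coverage_pct, overlap_pct)
```

With these definitions the ideal result is 100% coverage and 0% overlap, which is what orthogonal definitions aim for.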

Biological Validation: 538 Target Classes in MDDR
- Active compounds were randomly divided into test and training sets.
- Each target class had > 10 actives, or it was not included.
- On average: 473 actives in 94 clusters* (Daylight) for each class.
- Compounds in MDDR: the median is 4.5 functional groups.
[Figure: % compounds (cumulative) versus # functional groups in MDDR.]
[Figure: # compounds and # clusters in target classes; # clusters at Tanimoto 0.3 versus # compounds for 537 target classes.]
* Clusters generated with Daylight fingerprints at Tanimoto = 0.3

Tanimoto Scores From Functional Groups
Tanimoto based on presence of functional group (binary) or counts (count).
Count Tanimoto (C. Lerman):
B1 = FG in mol 1
B2 = FG in mol 2
BC = FG common to mol 1 and mol 2
T_dist = (B1 + B2 - 2 * BC) / (B1 + B2 - BC)
[Figure: four example structures, numbered 1 to 4.]
Distance Matrix
      1      2      3      4
1   ----   0.25   0.60   0.67
2          ----   0.67   0.50
3                 ----   0.20
4                        ----
One functional group difference = distance 0.2-0.25
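The distance formula on this slide can be sketched as below. Only the formula itself comes from the slide. Treating the count variant as a multiset comparison, where BC sums the smaller of the two counts for each group, is an assumption, and the function name and toy group labels are hypothetical.

```python
from collections import Counter


def tanimoto_distance(fg1, fg2, use_counts=True):
    """T_dist = (B1 + B2 - 2*BC) / (B1 + B2 - BC) for two lists of functional groups.

    use_counts=True  keeps repeated groups (count variant, assumed multiset semantics).
    use_counts=False reduces each list to presence/absence (binary variant).
    """
    c1 = Counter(fg1) if use_counts else Counter(set(fg1))
    c2 = Counter(fg2) if use_counts else Counter(set(fg2))
    b1 = sum(c1.values())
    b2 = sum(c2.values())
    bc = sum((c1 & c2).values())  # Counter & Counter keeps the minimum count per group
    denominator = b1 + b2 - bc
    if denominator == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return (b1 + b2 - 2 * bc) / denominator


# One extra functional group on a molecule with four shared groups: (4+5-8)/(4+5-4).
d_five = tanimoto_distance(["a", "b", "c", "d"], ["a", "b", "c", "d", "e"])
# One extra functional group on a molecule with three shared groups: (3+4-6)/(3+4-3).
d_four = tanimoto_distance(["a", "b", "c"], ["a", "b", "c", "d"])
# A repeated group matters only in the count variant.
d_count = tanimoto_distance(["amide", "amide", "ketone"], ["amide", "ketone"])
d_binary = tanimoto_distance(["amide", "amide", "ketone"], ["amide", "ketone"], use_counts=False)
print(d_five, d_four, d_count, d_binary)
```

The first two cases reproduce the 0.2 to 0.25 range quoted for a single functional group difference.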

Average Percentage Actives Recovered
538 Target Classes in MDDR 2003
- Actives in each target class randomly divided into test & training sets.
- Recovery of the test set using the training set is graphed.
[Figure: % actives retrieved versus ranked MDDR for Binary, Counts, Daylight, Consensus and Random.]
Recovery Rates (% actives retrieved)
            Top 100   Top 500   Top 1,000   Top 5,000
Binary         25.7      49.6        59.4        75.8
Counts         31.4      54.3        63.1        78.1
Daylight       38.2      56.4        68.3        82.2
Consensus      37.7      65.0        74.5        87.9
> 60% actives in top 1% of database
MDDR 2003 > 135,000 cpds.
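A recovery rate of this kind can be sketched as follows. The slide does not say how the database was ranked against the training set, so ranking each compound by its smallest distance to any training active is an assumption, as are all function names and the toy data. The binary distance used here is the set form of the Tanimoto distance.

```python
def binary_distance(a, b):
    """Binary Tanimoto distance between two sets of functional groups."""
    union = len(a | b)
    return 0.0 if union == 0 else (union - len(a & b)) / union


def rank_by_nearest_training(database, training):
    """Order compound ids by their smallest distance to any training fingerprint (assumed scheme)."""
    scores = {
        cid: min(binary_distance(fp, t) for t in training)
        for cid, fp in database.items()
    }
    return sorted(scores, key=lambda cid: (scores[cid], cid))


def recovery_at(ranked_ids, test_actives, top_n):
    """Percentage of test-set actives found in the first top_n ranked compounds."""
    found = sum(1 for cid in ranked_ids[:top_n] if cid in test_actives)
    return 100.0 * found / len(test_actives)


# Toy data only; compound ids and fingerprints are invented.
training = [{"urea", "nitrile"}]
database = {
    "cpd1": {"urea", "nitrile"},
    "cpd2": {"urea", "nitrile", "ketone"},
    "cpd3": {"lactone"},
    "cpd4": {"urea"},
}
test_actives = {"cpd1", "cpd4"}
ranked = rank_by_nearest_training(database, training)
print(ranked, recovery_at(ranked, test_actives, 2), recovery_at(ranked, test_actives, 3))
```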

Tanimoto Enrichment Rate Analysis
538 Target Classes in MDDR 2003
- Actives in each target class randomly divided into test & training sets.
- Recovery of the test set using the training set is graphed.
Enrichment Rate Equation
A = # actives at Tanimoto
B = # cpds total at Tanimoto
ADB = total actives in DBase
DB = total cpds in DBase
E = (A / B) / (ADB / DB)
Enrichments are normalized for the number of actives in the target class.
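The enrichment equation can be checked against the Protein Kinase C Inhibitor row of the example table on the next slide. This is a sketch under two assumptions: DB is taken as the MDDR size of 135,342 quoted on the coverage slide, and ADB is taken as the number of test-set actives, a choice made here because it reproduces the tabulated ratios. The function name is hypothetical.

```python
def enrichment(actives_retrieved, total_retrieved, actives_in_db, db_size):
    """E = (A / B) / (ADB / DB): hit rate among retrieved compounds over the background rate."""
    return (actives_retrieved / total_retrieved) / (actives_in_db / db_size)


MDDR_SIZE = 135342  # assumed DB, taken from the coverage slide

# Protein Kinase C Inhibitor row: 225 test-set actives (assumed ADB).
pkc_daylight = enrichment(199, 1619, 225, MDDR_SIZE)  # Daylight at Tanimoto 0.3
pkc_fgroup = enrichment(151, 1056, 225, MDDR_SIZE)    # functional groups at Tanimoto 0.3
print(round(pkc_daylight, 1), round(pkc_fgroup, 1))
```

An enrichment of 1 means the search did no better than picking compounds at random.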

Example Biological Categories: MDDR 2003
Columns: Test = number of compounds with the biology in the test set; Act = number of compounds with the biology retrieved; Tot = number of compounds retrieved in total; Ratio = enhance ratio. D = Daylight, FG = FGroup, C = Consensus, all at Tanimoto 0.3.

Biology                          Test    D Act    D Tot  D Ratio   FG Act   FG Tot FG Ratio    C Act    C Tot  C Ratio
Neurokinin NK2 Antagonist         147      132     3112     38.3      110     1488     66.7      106      469    203.9
Neurokinin NK3 Antagonist          25       23      220    566.0       17      517    178.0       16       81   1069.4
Protein Kinase C Inhibitor        225      199     1619     73.9      151     1056     86.0      145      503    173.4
HIV-1 Protease Inhibitor          457      411     2547     47.8      336     1575     63.2      327      899    107.7
5HT1B Agonist                      24       18      322    315.2       15      202    418.8       12      120    563.9
mglur1 Antagonist                  20       13       95    926.0        8      298    181.7        7       47   1007.9
Thrombin Inhibitor                555      493     3571     33.7      417     1698     59.9      399     1004     96.9
Factor Xa Inhibitor               379      293     2307     45.4      238     1326     64.1      215      546    140.6
GABA-B Receptor Antagonis          21       15       33   2929.5        8       33   1562.4        7       16   2819.6
Adrenergic_beta_Blocker            89       70      494    215.5       76      618    187.0       67      228    446.9
Potassium_Channel_Blocke          132      110      644    175.1       88     1512     59.7       86      345    255.6
Sodium_Channel_Blocker             97       64      419    213.1       52     1775     40.9       49      220    310.8
ACE_Inhibitor                     266      232     3185     37.1      198     1169     86.2      182      642    144.2
Estrogen_Receptor_Modulat          59       46      466    226.4       41      953     98.7       37      227    373.9
Dopamine_D2_Agonist                80       71      434    276.8       53     1323     67.8       52      187    470.4
Dopamine_D2_Antagonist            244      187     1283     80.8      158     3147     27.9      146      546    148.3
Thymidylate_Synthetase_Inh        128      120      493    257.4      106      365    307.1      103      274    397.5
Dihydrofolate_Reductase_Inh        72       61      322    356.1       61      340    337.2       58      255    427.6
Renin_Inhibitor                   599      575     3164     41.1      527     1766     67.4      516     1337     87.2
Trypsin_Inhibitor                  51       40      485    218.9       32      133    638.5       29       93    827.5
Antiviral                        1635     1377    11080     10.3     1167     7345     13.2     1116     4121     22.4
Antiinflammatory                 2363     1915    13221      8.3     1495    14285      6.0     1376     5603     14.1

Consensus Approach: Overlap of True Positives from FG Count and Daylight
The circles are drawn to scale.

Performance of the FG Count, Daylight and Consensus Approaches in Terms of True and False Positives
[Figure: number of compounds (true and false positives) for FG = FG Count, D = Daylight and C = Consensus at Tanimoto distances of 0.0, 0.1, 0.2 and 0.3.]
Number of true and false positives for the Functional Group Fingerprint counts, Daylight fingerprint and consensus (logical AND) approaches for the five hundred and thirty-eight biological target classes at Tanimoto distances of 0.0, 0.1, 0.2, and 0.3. The three methods are binned at the various Tanimoto distances and are reported in the order of counts, Daylight, consensus, and are listed as C, D and A, respectively.
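The consensus (logical AND) step and the true/false positive tally can be sketched as set operations. This is a minimal illustration: the function names and the toy hit lists are hypothetical, and only the idea of keeping compounds retrieved by both methods comes from the slides.

```python
def consensus_hits(fg_hits, daylight_hits):
    """Logical AND: keep only compounds retrieved by both methods."""
    return set(fg_hits) & set(daylight_hits)


def true_false_positives(hits, actives):
    """Return (true positives, false positives) for a set of retrieved compound ids."""
    hits = set(hits)
    true_pos = len(hits & set(actives))
    return true_pos, len(hits) - true_pos


# Toy data only; ids are invented.
actives = {"a", "b", "c", "d"}
fg_hits = {"a", "b", "c", "x", "y"}
daylight_hits = {"a", "b", "d", "z"}

fg_result = true_false_positives(fg_hits, actives)
daylight_result = true_false_positives(daylight_hits, actives)
consensus_result = true_false_positives(consensus_hits(fg_hits, daylight_hits), actives)
print(fg_result, daylight_result, consensus_result)
```

In this toy case the intersection drops every false positive at the cost of some true positives, which is the trade-off the chart describes.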

Functional Group Fingerprint Conclusions
- Developed a 2D fingerprint searching method that uses medicinally relevant, custom-defined functional groups.
- Validated the method on a large dataset of biological targets and chemical classes (538 target classes; 473 cpds, 90 clust per class).
- Factor of 2 gain in accuracy through reduction of false positives.
- Deploy the method company-wide.
- Rapidly expand SAR in Hit/Lead Identification.
- Subset the corporate collection for screening.
- Factor of 2 improvement in 2D searches.
Acknowledgement: Dave Cosgrove, AstraZeneca