Analysis of Activity Landscapes, Activity Cliffs, and Selectivity Cliffs. Jürgen Bajorath Life Science Informatics University of Bonn

Similar documents
Antonio de la Vega de León, Jürgen Bajorath

Antonio de la Vega de León, Jürgen Bajorath

Design and characterization of chemical space networks

Similarity Search. Uwe Koch

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Machine Learning Concepts in Chemoinformatics

Data Mining in the Chemical Industry. Overview of presentation

Ligand-receptor interactions

Characterizing Activity Landscapes Using an Information-Theoretic Approach

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Introduction. OntoChem

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Analysing Multitarget Activity Landscapes using Protein- Ligand Interaction Fingerprints: Interaction cliffs

Functional Group Fingerprints CNS Chemistry Wilmington, USA

Docking. GBCB 5874: Problem Solving in GBCB

Chemistry 14C Winter 2017 Final Exam Part A Page 1

How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space

Design and Synthesis of the Comprehensive Fragment Library

Solutions and Non-Covalent Binding Forces

Drug Informatics for Chemical Genomics...

Supporting Information. Strategy To Discover Diverse Optimal Molecules in the Small Molecule Universe

Statistical concepts in QSAR.

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Analysis of Random Fragment Profiles for the Detection of Structure-Activity Relationships

ICM-Chemist How-To Guide. Version 3.6-1g Last Updated 12/01/2009

UniStra activities within the BigChem project:

Bridging the Dimensions:

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Genetic Mutations. Jing Lu

* Author to whom correspondence should be addressed; Tel.: ; Fax:

Structure-Activity Modeling - QSAR. Uwe Koch

Plan. Day 2: Exercise on MHC molecules.

Molecular Similarity Searching Using Inference Network

DOCKING TUTORIAL. A. The docking Workflow

Table 8.2 Detailed Table of Characteristic Infrared Absorption Frequencies

Chemical library design

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Overview. Descriptors. Definition. Descriptors. Overview 2D-QSAR. Number Vector Function. Physicochemical property (log P) Atom

Chapter 19. Organic Chemistry. Carbonyl Compounds III. Reactions at the a-carbon. 4 th Edition Paula Yurkanis Bruice

Receptor Based Drug Design (1)

Molecular Descriptors Theory and tips for real-world applications

Tutorials on Library Design E. Lounkine and J. Bajorath (University of Bonn) C. Muller and A. Varnek (University of Strasbourg)

Data Quality Issues That Can Impact Drug Discovery

ACIDS AND BASES. Note: For most of the acid-base reactions, we will be using the Bronsted-Lowry definitions.

Marvin. Sketching, viewing and predicting properties with Marvin - features, tips and tricks. Gyorgy Pirok. Solutions for Cheminformatics

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Amino Acids and Proteins at ZnO-water Interfaces in Molecular Dynamics Simulations: Electronic Supplementary Information

Chapter 8: Introduction to QSAR

Emerging patterns mining and automated detection of contrasting chemical features

Structural biology and drug design: An overview

The Schrödinger KNIME extensions

5.1. Hardwares, Softwares and Web server used in Molecular modeling

CHEM 109A Organic Chemistry

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Comparative Network Analysis

An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

Pose and affinity prediction by ICM in D3R GC3. Max Totrov Molsoft

Experimental Design and Data Analysis for Biologists

Freeman (2005) - Graphic Techniques for Exploring Social Network Data

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Name. Chapter 4 covers acid-base chemistry. That should help you get going.

Revisiting the Limits of MAP Inference by MWSS on Perfect Graphs

Improved Machine Learning Models for Predicting Selective Compounds

The reuse of structural data for fragment binding site prediction

Biological Macromolecules

Comparing whole genomes

2. Polar Covalent Bonds: Acids and Bases

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

DISCRETE TUTORIAL. Agustí Emperador. Institute for Research in Biomedicine, Barcelona APPLICATION OF DISCRETE TO FLEXIBLE PROTEIN-PROTEIN DOCKING:

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Amino Acids and Peptides

Supplemental Materials for. Structural Diversity of Protein Segments Follows a Power-law Distribution

What is an enzyme? Lecture 12: Enzymes & Kinetics I Introduction to Enzymes and Kinetics. Margaret A. Daugherty Fall 2004 KEY FEATURES OF ENZYMES

Advances in diversity profiling and combinatorial series design

Exam III. Please read through each question carefully, and make sure you provide all of the requested information.

Mapping and Analysis for Spatial Social Science

Towards Physics-based Models for ADME/Tox. Tyler Day

Groups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA

Study guide for AP test on TOPIC 1 Matter & Measurement

arxiv: v1 [cs.ds] 25 Jan 2016

Improved Descriptors for the Quantitative Structure Activity Relationship Modeling of Peptides and Proteins

Functional Characterization and Topological Modularity of Molecular Interaction Networks

Targeting protein-protein interactions: A hot topic in drug discovery

The Nature of Geographic Data

Nonlinear QSAR and 3D QSAR

SAM Teacher s Guide Protein Partnering and Function


Virtual screening for drug discovery. Markus Lill Purdue University

Xia Ning,*, Huzefa Rangwala, and George Karypis

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

Fragment Hotspot Maps: A CSD-derived Method for Hotspot identification

Math/Stat Classification of Spatial Analysis and Spatial Statistics Operations

Module 2 Acids and Bases. Lecture 3 Acids and Bases

Medicinal Chemistry/ CHEM 458/658 Chapter 3- SAR and QSAR

Estimating Global Bank Network Connectedness

Transcription:

Analysis of Activity Landscapes, Activity Cliffs, and Selectivity Cliffs Jürgen Bajorath Life Science Informatics University of Bonn

Concept of Activity Landscapes Activity landscapes : biological activity hypersurfaces within chemical space; visualized as a 2D projection of chemical space with compound potency added as the third dimension

Idealized Activity Landscapes and SARs Continuous SAR gradual changes in structure result in moderate changes in activity Discontinuous SAR small changes in structure have dramatic effects on activity rolling hills activity cliffs

Activity Landscapes and SARs Variable Activity Landscapes Heterogeneous SARs Combination of continuous and discontinuous SAR components rugged regions smooth regions

SAR Index (SARI) Scoring Numerical SAR analysis function to support activity landscape assessment (P, potency; sim, pairwise 2D similarity) continuity score emphasizes structurally diverse compounds having similar potency balances two parts GLOBAL SCORE all possible compound pairs discontinuity score emphasizes similar compounds with large potency differences

Multi-Level Discontinuity Scoring Activity class A Cluster C cluster discontinuity scores local SAR character presence of activity cliffs SARI disc. score global SAR character of an activity class Compound i compound discontinuity scores contributions of molecules to local SAR discontinuity (A) (i) (C) (i,j) (j) A C

From Idealized to Calculated 3D Activity Landscapes Cathepsin S inhibitors potency Coordinate-free chemical space (MACCS Tc distances) 2D projection through multidimensional scaling x/y-plane: chemical space projection z-axis: interpolated potency values color code: surface elevation

Activity Landscapes 2D Projection Squalene synthase inhibitors Reference space - MACCS Tc distances - MDS on Euclidean fingerprint distances Coordinate scaling - range scaling to [0,1] - multiplication with maximum fingerprint dissimilarity

3D Surface Generation 1. Add potency as 3 rd dimension 2. Interpolate on a regular grid 3. Generate 3D surface

Activity Landscapes 3D Projection Squalene synthase 80 x 80 grid Landscapes are comparable for different data sets - unified axes range - colored according to interpolated surface elevation: 5.8 Transparent areas are not populated with data points 8.7

Landscape Model Optimization Correlation between fingerprint dissimilarity and calculated 2D coordinate distances (80+%) interpolated 3D surface elevation and measured potency values (95+%)

Continuous Landscape Data peaks Lipoxygenase MACCS Euclidean distance SARI: 0.92 Mostly weakly active molecules Densely populated Data peaks do not correspond to SAR discontinuity

Discontinuous Landscape Acetylcholine esterase 1 0.02 nm 1 2 2 4 3 0.4 nm 3 590 nm 4 MACCS Euclidean distance SARI: 0.17 Cliff regions: similar molecules with high/low potency 32 µm

Influence of Molecular Representations 1 1 2 4 3 2 3 1 2 4 3 4 MACCS TGT Molprint2D 1 3 0.02 nm 0.4 nm 2 4 590 nm 32 µm

Network-like Similarity Graph (NSG) Annotated graph representation of similarity relationships in compound data sets Nodes: represent compounds in the data set Edges: connect nodes with high pairwise similarity Clusters: Hierarchical clustering (gray background) Layout: Fruchterman-Reingold

NSG Information Layers Edges Tc > x (eg 0.65) Tc < x 1 similarity relationships 2 potency distribution 3 compound discont. scores 4 clusters 5 cluster scores

NSG Information Layers Node color 1 similarity relationships 2 potency distribution 3 compound discont. scores 4 clusters 5 cluster scores

NSG Information Layers Node size high compound score low compound score 1 similarity relationships 2 potency distribution 3 compound discont. scores 4 clusters 5 cluster scores

NSG Information Layers Cluster islands clusters obtained by Ward s hierarchical clustering are highlighted 1 similarity relationships 2 potency distribution 3 compound discont. scores 4 clusters 5 cluster scores

NSG Information Layers Annotation clusters are annotated with global discontinuity scores calculated for cluster members 1 similarity relationships 2 potency distribution 3 compound discont. scores 4 clusters 5 cluster discont. scores

SAR Phenotypes - Lipoxygenase Inhibitors: Globally Continuous SAR Many compound clusters with continuous local SARs

SAR Phenotypes - Thrombin Inhibitors: Globally Discontinuous SAR Clusters with large-magnitude activity cliffs

SAR Phenotypes - Squalene Synthase Inhib.: Heterogeneous-relaxed SAR activity cliff continuous local SAR discontinuous local SAR

Activity Cliff Index NSGs provide interactive graphical access to prominent activity cliffs Cliff Index (CI) enables systematic mining and ranking of activity cliffs 7 µm CI = 13.4 activity cliff marker CI prioritizes pairs of similar compounds having large potency differences: CI = 15.2 0.015 nm 1 µm

3D vs. 2D Activity Landscapes Squalene synthase NSG cluster discontinuity: 1.00 Activity cliff region Interpolated area Smooth region NSG cluster discontinuity: 0.01

From Activity Cliffs to Selectivity Cliffs Multi-target SARs cathepsin L cathepsin B Target-pair selectivity: difference between logarithmic potency Structure-selectivity relationships (SSRs) pic50 = 9 pic50 = 7 S L/B = 2

SAR and SSR Network Analysis cathepsin L 0.48 Potency-based NSG Potency: 10.4 3.0 Compound discontinuity score: 1 activity cliff markers 0 0.05 Cluster discontinuity score 1 rough SAR 0 smooth SAR

SAR and SSR Network Analysis cathepsin L 0.48 cathepsin B 0.10 0.05 0.27

SAR and SSR Network Analysis cathepsin L / cathepsin B 0.73 Selectivity-based NSG Selectivity: 3.2 (L) 3.2 (B) Compound discontinuity score: 1 selectivity cliff markers 0.72 1 0 Cluster discontinuity score rough SSR 0 smooth SSR

Local SSR Environments cathepsin L / cathepsin B 0.73 discontinuous SSR 0.72

Activity Cliffs vs. Selectivity Cliffs L B L/B discontinuous SAR continuous SAR discontinuous SSR activity cliff markers selectivity cliff markers L: 15 nm B: 3.5 µm L/B: 1.4 L: 10 µm B: 170 nm L/B: -1.8

Local SSR Environments cathepsin L / cathepsin B 0.73 discontinuous SSR 0.72

Activity Cliffs vs. Selectivity Cliffs L B L/B continuous SAR continuous SAR discontinuous SSR selectivity cliff markers L: 3.6 µm B: 102 µm L/B: 1.5 L: 26 µm B: 5.3 µm L/B: -0.7

Selectivity Determinants Molecules with different selectivity are often found in the neighborhood of selectivity cliff markers Selectivity rules can be formulated L/B halogens with increasing bulkiness and decreasing electronegativity shift selectivity toward cat L sel: -0.7 sel: 0.1 sel: 1.5 sel: 2.0

Narrowly Focused Landscape Views: Combinatorial Analog Graphs (CAGs) CAGs are designed to systematically analyze SARs in analog series 1 0.33 0.45 2 0.33 3 0.84 Substitutions introducing SAR discontinuity or selectivity determinants can be identified 1-2 0.22 1-3 0.65 2-3 0.58 1-2-3 0.78

R-group decomposition Activity Cliffs in Analog Series 0.53 1 0.61 2 0.98 3 0.36 4 0.35 5 0.41 6 0.78 1-2 0.70 1-3 0.64 1-4 0.43 1-5 0.53 1-6 0.69 2-3 1.00 2-4 0.52 2-5 0.98 2-6 3-4 0.32 3-5 0.57 3-6 0.33 4-5 0.29 4-6 0.46 5-6 1-2-3 0.86 1-2-4 1-2-5 0.69 1-2-6 0.93 1-3-4 0.55 1-3-5 0.44 1-3-6 0.53 1-4-5 0.40 1-4-6 0.62 1-5-6 2-3-4 0.70 2-3-5 1.00 2-3-6 0.75 2-4-5 0.56 2-4-6 0.73 2-5-6 3-4-5 0.25 3-4-6 0.34 3-5-6 0.31 4-5-6 Analog series Discontinuity scoring Graph

Combinatorial Analog Graph (CAG) discontinuity score edges compounds in connected subsets share modifications at substitution sites 0.45 complete series 1 0 1 0.33 2 0.33 3 0.84 compounds with variations at one site site numbers discontinuity score 1-2 0.22 1-3 0.65 2-3 0.58 compounds with simultaneous variations two sites 1-2-3 0.78

Local Compound Similarity Assessment Ac (-) R-group pharmacophore feature edit distances Ar Ac Ac -DA Ac 19 substituent categories: 3 structural features x 6 pharmacophore features + no substituent acyclic (Ac) donor (D) (-) aromatic (Ar) acceptor (A) aliphatic (Al) donor & acceptor (DA) positively charged (P) negatively charged (N) featureless

Similarity Assessment and Disc. Score pk i 6.5 pk i 7.8 Site Mutation Edit distance 1 Ac Ac 0 2 (-) Ac-DA 0.2 3 Ar Ac 0.2 0.4 Similarity 1 distance( i,j ) ΔpK i pk i ( i ) pk i ( j ) Disc raw sim( i,j ) * ΔpK i ( i,j ) 1.0 0.4 = 0.6 7.8 6.5 = 1.3 0.6 * 1.3 = 0.78

Discontinuity Score Compounds with variations at substitution site 2: Discontinuity score: SAR discontinuity contributions from individual sites or site combinations

SAR Hotspots and SAR Holes SAR Hotpots (red): Sites where substitutions significantly change potency 0.53 SAR Holes (white) : undersampled sites or site combinations 1 0.61 2 0.98 3 0.36 4 0.35 5 0.41 6 0.78 1-2 0.70 0.64 1-3 0.43 1-4 0.53 1-5 0.69 1-6 1.00 2-3 0.52 2-4 0.98 2-5 2-6 3-4 0.32 0.57 3-5 0.33 3-6 0.29 4-5 0.46 4-6 5-6 1-2-3 0.86 1-2-4 1-2-5 0.69 1-2-6 0.93 1-3-4 0.55 1-3-5 0.44 1-3-6 0.53 1-4-5 0.40 1-4-6 0.62 1-5-6 2-3-4 0.70 2-3-5 1.00 2-3-6 0.75 2-4-5 0.56 2-4-6 0.73 2-5-6 3-4-5 0.25 3-4-6 0.34 3-5-6 0.31 4-5-6

Local SAR Discontinuity / Activity Cliffs Thrombin inhibitors 741 nm 160 nm 2 µm 204 nm 926 nm 82 nm 7 µm 1 nm

Structural Interpretation of Activity Cliffs Factor Xa inhibitors 950 nm 18 nm

Structural Interpretation of Activity Cliffs Factor Xa inhibitors Phe 174 Trp 215 Tautomerism disrupts π stacking 950 nm Tyr 99

Structural Interpretation of Activity Cliffs Tie-2 kinase inhibitors 153 nm 1 nm

Structural Interpretation of Activity Cliffs Tie-2 kinase inhibitors Trp 215 Phe 174 Ala 905 Methylamine substituent forms strong hydrogen bond Tyr 99 1 nm

Conclusions Combining numerical and graphical analysis tools for activity landscape analysis and mining of SAR information NSGs for the systematic comparison of global and local SAR features in compound data sets From activity cliffs to selectivity cliffs through multi-target SAR analysis in NSGs Topology of calculated 3D landscapes for different data sets and molecular representations Combinatorial Analog Graphs to organize analog series and identify SAR hotspots and SAR holes

Acknowledgments Preety Iyer Qingli Liu Eugen Lounkine Lisa Peltason Anne Mai Wassermann Mathias Wawer