Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller


Chemogenomics. Contributing fields: chemistry, biology, chemical biology, medicinal chemistry, chemical genetics, chemoinformatics, bioinformatics, chemoproteomics. Definition: the study of the effects of small-molecular-weight drug candidates on gene/protein function.

Chemogenomics. In principle, chemogenomics is the screening of the chemical universe (all possible chemical compounds) against the target universe (all proteins and other potential drug targets). Mission impossible!

Chemogenomics. The solution: in practice, the method screens congeneric chemical libraries against particular target families, e.g., G protein-coupled receptors, nuclear receptors, different protease families, kinases, phosphodiesterases, ion channels, and transporters.

Chemogenomics: requirements. A compound library. A representative biological system (target library, single cell, or organism). A reliable readout (gene/protein expression, or high-throughput screening with binding or functional assays).

Chemogenomics. The goal is to complete a two-dimensional matrix representing the interactions between targets/genes and compounds, with entries given as binding affinities (Ki) or functional effects (IC50).
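The two-dimensional matrix can be pictured with a small sketch. The target names, compound names, and Ki values below are hypothetical and serve only to show the data structure: rows are targets, columns are compounds, and most cells are typically empty.

```python
import math

# Hypothetical targets, compounds, and Ki values (in nM); illustration only.
targets = ["T1", "T2", "T3"]
compounds = ["C1", "C2"]
ki_nm = {("T1", "C1"): 12.0, ("T2", "C1"): 850.0, ("T3", "C2"): 3.5}

def pki(ki_nanomolar):
    """Convert a Ki given in nM to pKi = -log10(Ki in mol/L)."""
    return -math.log10(ki_nanomolar * 1e-9)

# Rows are targets, columns are compounds; None marks an unmeasured pair.
matrix = [
    [pki(ki_nm[(t, c)]) if (t, c) in ki_nm else None for c in compounds]
    for t in targets
]

# Fraction of the matrix that is actually filled with experimental data.
filled = sum(value is not None for row in matrix for value in row)
coverage = filled / (len(targets) * len(compounds))
```

Here half of the six cells are filled; completing the remaining cells, by experiment or prediction, is the task the slide describes.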

Ligand and Target Spaces. Assumptions for any chemogenomics-based approach: (a) compounds sharing some chemical similarity should also share targets; (b) targets sharing similar ligands should share similar patterns (binding sites). Question: how do we measure the distance between two ligands or two targets?

Ligand Space. The distance between two compounds is measured by computing a similarity matrix. Compound properties are usually described using descriptors. Descriptor classification: one-dimensional, two-dimensional, three-dimensional.

Ligand Space (Descriptors): 1D descriptors. Easy and fast to compute. Describe global properties (MW, atom and bond counts). Based on the chemical formula. Used for prediction of physicochemical properties (polar surface area, solubility, rings) and for discrimination between compound sets (drugs vs. nondrugs, ligands from different target families). 1D linear representations of compounds: SMILES (Simplified Molecular Input Line Entry System).
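As a minimal sketch of 1D descriptors derived from the chemical formula, the following computes atom counts and molecular weight. The function names are made up for this example, the atomic-weight table covers only a few elements, and the parser assumes a flat formula without parentheses or charges.

```python
import re

# Rounded standard atomic weights (g/mol); only a few elements are included.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007,
                 "O": 15.999, "S": 32.06, "Cl": 35.45}

def atom_counts(formula):
    """Count atoms per element in a flat formula such as 'C9H8O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[s] * n for s, n in atom_counts(formula).items())

aspirin_counts = atom_counts("C9H8O4")   # {'C': 9, 'H': 8, 'O': 4}
aspirin_mw = molecular_weight("C9H8O4")  # about 180.16 g/mol
```

Descriptors of this kind are cheap enough to compute for whole libraries, which is why they suit coarse filters such as drug vs. nondrug discrimination.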

Ligand Space (Descriptors): 2D descriptors. The most common ligand descriptors. Describe topological properties (maximum common substructure, structural keys). Encode both atomic and bond properties (2D sketch). Used for scanning libraries for similar substructures or fragments. Graph-based methods: molecular graph (subfamily clustering); computationally slow. Fingerprint-based methods: bit strings (0 and 1 encode the absence or presence of atoms, fragments, rings, ...); fingerprints are easy to compare; also used in receptor-ligand recognition.

Ligand Space (Descriptors): 3D descriptors. Describe conformational properties (atomic coordinates, potentials, fields, shapes). Require proper alignment: comparison in the same 3D Cartesian space, and consideration of the conformational space accessible to each ligand. Bit strings vs. structure comparison: structure comparison can produce false positives; 3D information is stored in bit strings (binary representation of 2D or 3D properties) and compared with the Tanimoto coefficient (a simple similarity index).
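The Tanimoto coefficient between two bit strings is the number of bits set in both divided by the number of bits set in either. A minimal sketch follows; the 8-bit fingerprints and compound names are invented for illustration, and real fingerprints are far longer.

```python
def on_bits(bitstring):
    """Return the set of positions whose bit is '1'."""
    return {i for i, bit in enumerate(bitstring) if bit == "1"}

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: |A and B| / |A or B| over the on-bits."""
    a, b = on_bits(bits_a), on_bits(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

# Hypothetical fingerprints for three compounds.
fingerprints = {"cpd_1": "11010010", "cpd_2": "10010110", "cpd_3": "00101001"}

# Pairwise similarity matrix stored as a dictionary keyed by compound pair.
similarity = {
    (x, y): tanimoto(fingerprints[x], fingerprints[y])
    for x in fingerprints for y in fingerprints
}
```

In this toy set, cpd_1 and cpd_2 share three of five on-bits (similarity 0.6), while cpd_1 and cpd_3 share none.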


Target Space. Chemoproteomics: target = proteins.

Dimension   Classification scheme          Databases
1D          By sequence                    UniProt, Pfam
1D          By patterns                    PRINTS, PROSITE
2D          By secondary structure, fold   SCOP, CATH
3D          By atomic coordinates          PDB, MODBASE
3D          By binding site                BindingMOAD, sc-pdb

Target Space. Amino acid sequence (1D): clustering of targets into target families; large variation in sequence length even among family members, e.g., human GPCRs range from 290 to 6200 residues. Structural motifs (2D): mapping of α-helices, β-sheets, coils and random structures. 3D structure: atomic coordinates derived by X-ray diffraction or NMR; structural fold; ligand-binding site (higher similarity among related targets). Pharmacological profile: binding affinity for a panel of ligands; modifying the pharmacological profiles of drugs is widely used in drug design.

Protein-Ligand Space. Full matrices (affinity or structural information): experimental data are stored in the matrices; they give the affinity of a new compound for a known target, allow structure-activity relationships to be measured, and support prediction of a global pharmacological profile. Advantages: based on experimental data; superior to computed descriptors. Disadvantages: an enormous amount of data is necessary; very costly (not realistic in academic environments). Interaction fingerprints (IFPs): replace affinity with molecular interaction descriptors by converting the atomic coordinates of protein-ligand complexes into bit strings.
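The conversion of coordinates into a bit string can be sketched in a deliberately simplified form: one bit per binding-site residue, set when any residue atom lies within a distance cutoff of any ligand atom. Published IFPs encode several interaction types per residue; the single contact bit, the 4.0 Å cutoff, the residue labels, and the coordinates here are all assumptions made for illustration.

```python
import math

def contact_fingerprint(residues, ligand_atoms, cutoff=4.0):
    """One bit per residue: '1' if any residue atom is within `cutoff`
    (in angstroms) of any ligand atom, else '0'.
    `residues` maps a residue label to a list of (x, y, z) coordinates."""
    bits = []
    for atoms in residues.values():
        in_contact = any(
            math.dist(atom, lig) <= cutoff
            for atom in atoms for lig in ligand_atoms
        )
        bits.append("1" if in_contact else "0")
    return "".join(bits)

# Hypothetical binding-site residues and ligand atoms.
site = {
    "ASP113": [(0.0, 0.0, 0.0)],
    "PHE290": [(10.0, 0.0, 0.0)],
    "SER203": [(3.0, 3.0, 0.0)],
}
ligand = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]

ifp = contact_fingerprint(site, ligand)  # bit order follows the dict order
```

Because every complex of the same target yields a string of the same length and residue order, such strings can be compared directly, for example with the Tanimoto coefficient.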

Ligand-based Chemogenomics: annotating ligand libraries. Molecules sharing enough similarity with existing ligands for which a target profile is known have an enhanced probability of sharing the same biological profile. Ligand libraries are annotated with targets, in vitro affinity data, and ADME properties. Biologically annotated compound libraries: AurSCOPE (160,000 GPCR ligands and 77,000 kinase inhibitors); MedChem database (biological and pharmacological information on 650,000 compounds); ChemBank (50,000 compounds in 441 high-throughput screening assays). Natural product-oriented chemical libraries: evolutionary pressure; highly specific binding mechanisms.

Ligand-based: Privileged Structures. Term coined by Evans et al. (1,4-benzodiazepine scaffold). A privileged structure is defined as a substructure or scaffold exhibiting a strong preference for a particular area of the target space. Suitable for orienting the design of targeted compound libraries. Biphenyl: a protein-binding motif with no particular preference for a target family. 2-Tetrazolo-biphenyl: GPCRs. Only a few are really selective.

Ligand-based In silico Screening: target fishing. Reference compound set (known 2D or 3D descriptors). Screening procedure (QSAR, Bayesian analysis or pharmacophore). Screening collection for identification of new compounds.

Ligand-based In silico Screening. Mestres et al.: a library of molecules targeting nuclear hormone receptors (NHRs), with 2000 ligands and 25 receptors; easy distinction between selective and promiscuous scaffolds; SHannon Entropy Descriptors (SHED). Novartis: prediction of target profiles from extended-connectivity fingerprints; machine learning algorithm based on Bayesian statistics; WOMBAT database (1230 unique SMILES); Bayesian models were trained for each activity class; prediction is done by calculating the probability that each test compound is a ligand for each of the targets; improved by concatenating all target-associated probabilities into a Bayes affinity fingerprint; 2D descriptors were more predictive than 3D (not for singletons).
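The idea of training one Bayesian model per activity class and collecting the per-target probabilities into a profile can be sketched with a plain Bernoulli naive Bayes classifier with add-one smoothing. This is a generic textbook variant, not the specific algorithm used in the work cited on the slide; the 4-bit fingerprints, class names, and equal priors are hypothetical.

```python
import math

def train(fingerprints, n_bits):
    """Per-bit probability of being on for one class (add-one smoothing)."""
    n = len(fingerprints)
    return [(sum(fp[i] for fp in fingerprints) + 1) / (n + 2)
            for i in range(n_bits)]

def log_likelihood(fp, bit_probs):
    """Log-probability of a fingerprint under independent bits."""
    return sum(math.log(p if bit else 1 - p) for bit, p in zip(fp, bit_probs))

def predict_profile(fp, models, priors):
    """Normalised posterior probability of each target class."""
    scores = {t: math.log(priors[t]) + log_likelihood(fp, m)
              for t, m in models.items()}
    top = max(scores.values())
    weights = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical training ligands per target class (bit lists).
training = {
    "GPCR":   [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]],
    "kinase": [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]],
}
models = {t: train(fps, 4) for t, fps in training.items()}
priors = {t: 1 / len(training) for t in training}

# The vector of per-target probabilities plays the role of a target profile.
profile = predict_profile([1, 1, 0, 0], models, priors)
```

For the query fingerprint above, the profile assigns most of the probability to the hypothetical GPCR class, since its on-bits match that training set.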

Ligand-based In silico Screening: drawback. Training-set compounds are categorized according to their molecular target without checking: Does it really bind? Where does it bind? How does it bind? The risk is training a machine learning algorithm with incorrect data. Alternative: 3D pharmacophores from protein-ligand complexes, with experimentally determined atomic coordinates and experimentally determined pharmacological activities; however, limited chemical diversity is observed among PDB ligands.

Target-based Chemogenomics. Selectivity control: selectivity of ligands among family-related targets. Proteome-wide comparative modeling: structural data (X-ray or NMR). Sequence-based comparison. Structure-based comparison: comparing molecular fields; comparing 3D structures.

Sequence-based Comparison. Multiple alignment of all targets. Allows comparison of any kind of target family, even when high-resolution structural data are lacking. GPCRs are ideal candidates for sequence-based comparison: only bovine rhodopsin has been crystallised, and they are an important target family for drug design. Key residues are extracted and concatenated into an ungapped sequence (30 residues). Distance matrix based on sequence identity, sequence similarity, or physicochemical properties. Cavity-based clustering of 372 human GPCRs perfectly reproduced the tree based on full sequences. Target comparison across a family is possible using only a few residues. Applications: simple analysis of binding-site regions by residue conservation; target hopping, used to discover ligands for a particular receptor.
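A distance matrix based on sequence identity over concatenated key residues can be sketched as follows. The receptor names and the 10-residue strings are hypothetical (the slide mentions 30 residues); because the sequences are ungapped and of equal length, positions are compared directly without alignment.

```python
def identity_distance(seq_a, seq_b):
    """1 - fraction of identical residues at matching positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped cavity sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 1.0 - matches / len(seq_a)

def distance_matrix(cavities):
    """Return (names, square matrix) of pairwise identity distances."""
    names = list(cavities)
    matrix = [[identity_distance(cavities[a], cavities[b]) for b in names]
              for a in names]
    return names, matrix

# Hypothetical concatenated binding-cavity residues for three receptors.
cavities = {
    "R1": "DWFSYNFHLW",
    "R2": "DWFSYNFALW",
    "R3": "EYLTQGVHIR",
}
names, dist = distance_matrix(cavities)
```

R1 and R2 differ at one position (distance 0.1), whereas R1 and R3 share only one residue (distance 0.9). A matrix like this can then be passed to any clustering or tree-building method.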


Structure-based Comparison. High-resolution structural data are crucial for homology modeling; however, only the ligand-binding sites are compared. Comparing molecular fields: molecular interaction fields (MIFs). Requires structural alignment of the targets. Interaction energies are computed for probe atoms at each point of a 3D grid (binding site). MIFs are placed in a global matrix (rows: targets; columns: interaction energies). Analysis by either principal component analysis or hierarchical clustering. Highly dependent on structural alignment, grid resolution and probe atoms.
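The analysis step on the global matrix can be sketched with a small single-linkage hierarchical clustering. The target names and the three energy values per row are invented; a real MIF row would hold one energy per grid point and probe, and the choice of single linkage with Euclidean distance is an assumption made to keep the example short.

```python
import math

def single_linkage(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest (Euclidean distance between rows)."""
    clusters = [[name] for name in rows]

    def gap(c1, c2):
        return min(math.dist(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical global matrix: rows are targets, columns are interaction
# energies at aligned grid points.
mif = {
    "A": [-1.0, -0.5, 0.2],
    "B": [-1.1, -0.4, 0.3],
    "C": [0.8, 0.9, -0.7],
    "D": [0.7, 1.0, -0.6],
}
groups = single_linkage(mif, n_clusters=2)
```

With these values, targets A and B fall into one cluster and C and D into the other. The result only makes sense if the columns refer to the same grid points in every target, which is why the approach depends so strongly on the structural alignment.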

Structure-based Comparison: comparing 3D structures. Global structural alignment methods: GASH, DaliLite, CE. Alignment of predefined structural motifs: matching templates to a reference protein; however, not all proteins sharing binding sites for a particular ligand share structural template similarities. Structural alignment by physicochemical property description: surface-based comparison is relatively slow and thus incompatible with proteome-wide comparison. SuMO, Cavbase, SiteEngine, SitesBase and CPASS have emerged in recent years: the active site is represented by pseudocenters encoding physicochemical properties (H-bonding capacity, aromaticity, hydrophobicity and charge); pseudocenters are linked by edges to give a molecular graph; maximal common subgraphs are detected (clique detection); this allows detection of local similarities at ligand-binding subpockets for proteins with totally different folds and catalytic activities.
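Clique detection on pseudocenter graphs can be sketched with an association graph and the Bron-Kerbosch algorithm. This is a generic illustration, not the implementation of any of the tools named on the slide; the property labels, coordinates, and the 1.0 Å distance tolerance are assumptions.

```python
import math
from itertools import combinations

def association_graph(site_a, site_b, tol=1.0):
    """Nodes pair pseudocenters with the same property; two nodes are
    linked when the paired distances in both sites agree within `tol`."""
    nodes = [(i, j) for i, (pa, _) in enumerate(site_a)
             for j, (pb, _) in enumerate(site_b) if pa == pb]
    edges = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudocenter may be matched only once
        da = math.dist(site_a[i][1], site_a[k][1])
        db = math.dist(site_b[j][1], site_b[l][1])
        if abs(da - db) <= tol:
            edges[(i, j)].add((k, l))
            edges[(k, l)].add((i, j))
    return edges

def max_clique(edges):
    """Largest clique found by Bron-Kerbosch (without pivoting)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            expand(r + [v], p & edges[v], x & edges[v])
            p = p - {v}
            x = x | {v}

    expand([], set(edges), set())
    return best

# Hypothetical pseudocenters: (property, (x, y, z)). Site B contains the
# first three centers of site A, translated and listed in another order.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
          ("aromatic", (0, 4, 0)), ("hydrophobic", (8, 8, 8))]
site_b = [("aromatic", (10, 14, 0)), ("donor", (10, 10, 0)),
          ("acceptor", (13, 10, 0))]

match = max_clique(association_graph(site_a, site_b))
```

The clique pairs the donor, acceptor, and aromatic centers of the two sites. Because only internal distances are compared, the match is independent of how the two proteins are positioned or folded overall. Clique search is exponential in the worst case, so practical tools restrict the graph size.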

Structure-based Comparison: comparing 3D structures. Interpretation of computed similarity scores is often difficult: active sites have different dimensions, and larger sites tend to produce more matches even if the smallest is more similar. Surgand et al. projected the active site onto a dimensionless 80-triangle sphere of cavity descriptors, measuring a normalized distance in descriptor space.

Target-Ligand-based Chemogenomics: chemical annotation of target binding sites. Various chemical compound libraries exist, but binding information is crucial: the protein/binding site must be annotated by ligand chemotype. SMID (Small Molecule Interaction Database) annotates protein sequences with domain-specific ligands, making it possible to browse likely ligands for a protein of unknown 3D structure. Ligand-annotated binding sites from the PDB: BindingMOAD and sc-pdb. From a pharmacological point of view: prioritize ligands for designing targeted compound libraries.

Target-Ligand-based Chemogenomics. To browse and predict protein-ligand complexes, one needs to set up simple descriptors for both ligands and proteins from knowledge databases and concatenate them into a single protein-ligand description. Two-dimensional searches: use experimental binding affinity matrices and define appropriate QSAR models to predict the affinity of new compounds. Three-dimensional searches: dock each ligand of the compound library into each active site of the target library (molecular inverse docking approach); scoring functions cannot quantify very heterogeneous protein-ligand complexes; computation of IFP strings converts 3D information about protein-ligand interactions to 1D.

Target-Ligand-based Chemogenomics: three-dimensional searches. 3D-based, docking-independent methods: retrieve a ligand from a protein and vice versa by encoding protein and ligand properties with similar descriptors. CoLiBRI (complementary ligands based on receptor information): ligand and protein are described using the same molecular descriptors (TAE-RECON), i.e., shape and electronic properties of isolated atoms; patterns of active sites are mapped onto patterns of their complementary ligands and vice versa; test results are high when the training set is similar.

Final remarks. High-throughput data (structure, binding affinity, etc.) for ligands and targets. Data are linked by focusing on either the ligand or the target: target-based chemogenomics, ligand-based chemogenomics, target-ligand-based chemogenomics. Selectivity profiles for therapeutic usage, not more selective ligands. In silico approach.
