Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Similar documents
Similarity methods for ligandbased virtual screening

JCICS Major Research Areas

Chemoinformatics: the first half century

Data Mining in the Chemical Industry. Overview of presentation

This is a repository copy of Chemoinformatics techniques for data mining in files of two-dimensional and three-dimensional chemical molecules.

Similarity Search. Uwe Koch

Chemical Databases: Encoding, Storage and Search of Chemical Structures

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW

Introduction. OntoChem

Introduction to Chemoinformatics

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

BIOINF Drug Design 2. Jens Krüger and Philipp Thiel Summer Lecture 5: 3D Structure Comparison Part 1: Rigid Superposition, Pharmacophores

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

In silico pharmacology for drug discovery

An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

Introduction to Chemoinformatics

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

Introduction to Chemoinformatics and Drug Discovery

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Universities of Leeds, Sheffield and York

Course Plan (Syllabus): Drug Design and Discovery

An Integrated Approach to in-silico

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Solved and Unsolved Problems in Chemoinformatics

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Universities of Leeds, Sheffield and York

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

The Schrödinger KNIME extensions

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Practical QSAR and Library Design: Advanced tools for research teams

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

Course Specification Medicinal Chemistry-1

Receptor Based Drug Design (1)

Machine Learning Concepts in Chemoinformatics

Fast similarity searching making the virtual real. Stephen Pickett, GSK

The Changing Requirements for Informatics Systems During the Growth of a Collaborative Drug Discovery Service Company. Sally Rose BioFocus plc

Universities of Leeds, Sheffield and York

Biologically Relevant Molecular Comparisons. Mark Mackey

Machine learning for ligand-based virtual screening and chemogenomics!

Introducing a Bioinformatics Similarity Search Solution

Integrated Cheminformatics to Guide Drug Discovery

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004

Use of CTI Index for Perception of Duplicated Chemical Structures in Large Chemical Databases

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

Docking. GBCB 5874: Problem Solving in GBCB

Pipeline Pilot Integration

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

Structure-Activity Modeling - QSAR. Uwe Koch

Drug Informatics for Chemical Genomics...

Large scale classification of chemical reactions from patent data

The Case for Use Cases

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

In Silico Investigation of Off-Target Effects

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

MOLECULAR DIVERSITY IN DRUG DESIGN

Chemoinformatics and Drug Discovery

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

Clustering Ambiguity: An Overview

Overview. Descriptors. Definition. Descriptors. Overview 2D-QSAR. Number Vector Function. Physicochemical property (log P) Atom

Molecular Modelling. Computational Chemistry Demystified. RSC Publishing. Interprobe Chemical Services, Lenzie, Kirkintilloch, Glasgow, UK

A Framework For Genetic-Based Fusion Of Similarity Measures In Chemical Compound Retrieval

Bioisosteres in Medicinal Chemistry

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

CH MEDICINAL CHEMISTRY

The Schrödinger KNIME extensions

DISCOVERING new drugs is an expensive and challenging

Molecular Modelling for Medicinal Chemistry (F13MMM) Room A36

Medicinal Chemistry 1

1. Some examples of coping with Molecular informatics data legacy data (accuracy)

De Novo molecular design with Deep Reinforcement Learning

Virtual screening in drug discovery

Title: Medicinal Chemistry 3

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

Medicinal Chemistry/ CHEM 458/658 Chapter 4- Computer-Aided Drug Design

AUTOMATIC GENERATION OF TAUTOMERS

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

The Schrödinger KNIME extensions

Tautomerism in chemical information management systems

Web tools for Monomer selection, Library Design and Compound Acquisition. Andrew Leach GlaxoSmithKline Research and Development Stevenage

Computational chemical biology to address non-traditional drug targets. John Karanicolas

CHEMOINFORMATICS: THEORY, PRACTICE, & PRODUCTS

Statistical concepts in QSAR.

FROM MOLECULAR FORMULAS TO MARKUSH STRUCTURES

The Conformation Search Problem

FRAUNHOFER IME SCREENINGPORT

Chemoinformatics An Introduction for Computer Scientists

Chemical Reaction Databases Computer-Aided Synthesis Design Reaction Prediction Synthetic Feasibility

(S1) (S2) = =8.16

Research Article. Chemical compound classification based on improved Max-Min kernel

Fragment Hotspot Maps: A CSD-derived Method for Hotspot identification

Transcription:

Chemoinformatics and information management Peter Willett, University of Sheffield, UK

verview What is chemoinformatics and why is it necessary Managing structural information Typical facilities in chemoinformatics software Examples of current research

Drug discovery: I Drug discovery is a vastly complex, multi-disciplinary task that can extend over two decades The total cost for the discovery and development of a novel therapeutic agent is now ca. $1.5B Even so, only about 1 in 3 cover the R&D costs But when they can do the pay-offs can be massive: Lipitor in 2006 made $12.5B (cf MS Windows and Boeing 747) Patent cover is 20 years from initial announcement Time is money so need to find potential drugs (and to reject nondrugs) much faster (and similarly for agrochemicals)

Drug discovery: II Chemoinformatics is one way of increasing the cost effectiveness of drug discovery Initial work in chemoinformatics as early as the Sixties: current interest because of developments in Combinatorial chemistry High throughput screening (HTS) Change from sequential to massively parallel processing Resulting explosion in the amounts of data available in drug-discovery programmes, and an increased interest in computational methods Focus on chemical structure diagram, cf development of other types of -informatics specialisms

Definitions F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33, 375-384 The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information J. Gasteiger and T. Engels (editors) (2003). Chemoinformatics: a textbook. Wiley-VCH. Chemoinformatics is the application of informatics methods to solve chemical problems.

Representation of molecules eed for a machine-readable representation 1D computed/experimental global properties 2D the chemical structure diagram 3D atomic coordinate data 1D representations handled using conventional DBMS software eed to manipulate 2D and 3D data

Connection tables 4 3 2 5 1 6 7 9 1 C 2 2 6 1 7 1 2 C 1 2 3 1 3 C 2 1 4 2 4 C 3 2 5 1 8 5 C 4 1 6 2 6 C 1 1 5 2 7 C 1 1 8 1 9 2 8 C 7 1 9 7 2 An unambiguous representation of a 2D chemical structure diagram A connection table is a graph, the underlying data structure in chemoinformatics

Graph theory and chemistry Graph theory Branch of mathematics that describes sets of objects, called nodes and the relationships between them, called edges A 2D connection table is a graph: odes correspond to atoms Edges correspond to bonds Graph matching algorithms Search chemical databases Generation of other representations H 2 Br

Types of search Exact structure search (hashed connection table with graph isomorphism for collision handling) Substructure search (subgraph isomorphism) cf partial or boolean matching in text Similarity searching (maximal common subgraph isomorphism (or simpler)) cf best match search or web searching Graph matching algorithms are effective But time is factorial with the number of nodes eed for efficient heuristics

Fingerprints C C C C C C C C A fingerprint (or fragment bit-string) is a binary vector encoding the presence ( 1 ) or absence ( 0 ) of fragment substructures in a molecule Each bit in the fingerprint represents one molecular fragment. Typical length is ~1000 bits An approximate representation, but one that can be processed very efficiently and hence often used as a precursor to graph matching

Chemoinformatics facilities Database searching as described previously Structure and substructure searching originally Similarity searching from mid-eighties 3D substructure searching from mid-ineties (first rigid then flexible) Applications Database clustering Molecular diversity analysis Drug-likeness Virtual screening Ligand-based Structure-based

3D substructure searching Generation of pharmacophore patterns Use of MGA and hyperstructure approaches a b c a = 8.62 0.58 Angstroms b = 7.08 0.56 Angstroms c = 3.35 0.65 Angstroms + - + - + - S P P P

Similarity searching using 2D fingerprints Use of data fusion methods to enhance performance, combining information from multiple searches H H H H H 2 Query H H 2 H H H H H 2

Molecular modelling and QSAR Use of computational chemistry to obtain the structures and properties of small molecules Quantum mechanics Molecular dynamics Molecular modelling Statistical correlation of structure (however described) with physical, chemical and biological properties Initially biological activity (QSAR) ow pharmacokinetics and toxicity (ADMET)

Integration with database searching Related, but largely separate, research areas for many years Simple search operations on very large numbers of molecules Increasingly complex operations on smaller and smaller (normally homogeneous) datasets Substructural analysis as an early, notable exception The future lies in the integration of these two approaches, applying more sophisticated methods on larger datasets Docking now well established Property calculations at a database level ADMET

General references J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH, Weinheim, 2003). W.L. Chen, Chemoinformatics: past, present and future, Journal of Chemical Information and Modeling 46 (2006) 2230-2255. D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics education in drug discovery, Drug Discovery Today 11 (2006) 436-439. A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics (Kluwer, Dordrecht, 2 nd sedition, 2007). P. Willett, A bibliometric analysis of chemoinformatics, Aslib Proceedings 60 (2008) 4-17.