OntoChem Software. Chemoinformatic Solutions for Life Sciences Problems


OntoChem Software. Chemoinformatic Solutions for Life Sciences Problems
OntoChem GmbH, H.-Damerow-Str. 4, 06120 Halle

Short Company Overview
Founded in 2005: Dr. Lutz Weber (Roche, Morphochem), Prof. Ludger Wessjohann (IPB Halle, Monaco)
2006: implementing new software and algorithms
2007: first clients (Pharma, Agro, Biotech, Food & Fragrances), 8 projects, cash flow positive
2008: first financing round, expansion (new space), investments, head count from 5 to 12

Our Mission
Knowledge discovery is the non-trivial extraction of implicit, unknown, and potentially useful information from data. The knowledge discovery process takes the results of data mining (the process of extracting patterns from data) and transforms them into useful and understandable information. This information is typically not retrievable by standard techniques but is uncovered through the use of artificial intelligence (AI) techniques.

Technology Idea: Automation of Association Discovery
Morphochem/Migragen example, 2003
Therapeutic goal: spinal cord injury
Nerve cell growth is inhibited in spinal cord injury
Rho-kinase inhibits nerve cell growth
Rho-kinase inhibitors are known, e.g. Fasudil
Fasudil is in Phase II clinical development
Fasudil (patent) is claimed for cardiovascular indications
Solution/Patent: treatment with Fasudil
S Pharma buys patent
Patent: Fasudil as a treatment of spinal cord injuries
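The chain of associations on this slide can be illustrated with a small sketch. It only shows how stored facts might be chained into a proposal; it is not OntoChem's software. The relation names (`inhibits`, `involves_reduced`) and the function `propose_treatments` are hypothetical, and the facts are paraphrased from the slide.

```python
from collections import defaultdict

# Facts paraphrased from the slide, stored as (subject, relation, object).
# The relation vocabulary is an assumption made for this illustration.
FACTS = [
    ("fasudil", "inhibits", "Rho-kinase"),
    ("Rho-kinase", "inhibits", "nerve cell growth"),
    ("spinal cord injury", "involves_reduced", "nerve cell growth"),
]


def propose_treatments(facts):
    """Chain drug -> target -> process -> disease associations.

    A drug is proposed for a disease when it inhibits a target that in turn
    inhibits a process which is reduced in that disease.
    """
    inhibits = defaultdict(set)
    reduced_in = defaultdict(set)
    for subject, relation, obj in facts:
        if relation == "inhibits":
            inhibits[subject].add(obj)
        elif relation == "involves_reduced":
            reduced_in[obj].add(subject)

    proposals = []
    for drug, targets in inhibits.items():
        for target in targets:
            # .get() avoids inserting new keys while iterating
            for process in inhibits.get(target, ()):
                for disease in reduced_in.get(process, ()):
                    proposals.append((drug, target, process, disease))
    return proposals


for drug, target, process, disease in propose_treatments(FACTS):
    print(f"{drug} inhibits {target}, which inhibits {process}: candidate for {disease}")
```

A real system would extract such facts from patents and literature and would need to rank the many chains that result; the sketch covers only the chaining step.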

OntoChem Searchspace
Patent space: described targets and diseases, described molecules (39 Mio., 6. drugs)
OntoChem's virtual compound library of druglike molecules with synthesis procedures (1...)
Two search directions: known molecule, new application; new molecule, known application

Intelligent Product Generation
Priato Reaction Library: Baeyer-Villiger ketone oxidation, Baylis-Hillman vinyl alkylation, Beckmann rearrangement, Bischler-Napieralski isoquinoline synthesis, Friedel-Crafts reaction, Friedländer quinoline synthesis, Gabriel synthesis, Grignard reaction, Hell-Volhard-Zelinsky halogenation
The reaction library and the reactants go into REACTOR (ChemAxon Reactor...), which generates the products

ChemAxon Related: Large Chemical Databases (>1 billion compounds), non-combinatorial, non-Markush
Is it technically feasible? Upload and search speed... How to generate... Which software is best...
Is it useful? Chemical similarity concepts
SSS = screening with fingerprints + atom-by-atom search (ABAS)
SSS fingerprints are tuned to provide fast screening
Will they work in the case of large chemical databases?
Will they work with many similar compounds generated via reactions?

Large Databases
For fast searching we need molecules in cache (?)
Index: 1 bln compounds, approx. 1 GB memory (ChemAxon). Can the index be optimized, i.e. made smaller?
Disks are becoming competitive, i.e. NAS and solid state drives: 15,000 rpm SCSI disk arrays; solid state drives have the same speed for random access as for sequential access
Hardware comparison
PC: 2 cores AMD64, 4 GB RAM, Linux 9.2
Silicon Graphics Altix 3: 4 cores Itanium2, 12 GB RAM, Linux 9.2, InfiniteStorage S3, 2.8 TB, NUMAlink 6 GB/sec

Search Speed
Test database with 11 million compounds (PubChem); Oracle 10.2 Enterprise; JChem 3.2
Operation: jc_tanimoto( > 0.9), run on the PC and on the SGI
Table columns: Query Structure, Number of Hits, SSS Time (ms), Screened Count, Screening Time (ms)
Query structures:
c1cncc2c(cnnc12)3cc3
C1C1c2cnnc3c(cncc23)C4=CSC=C4
CC(=)CC(C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3)c(=)c
C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3
c1ncc2ncnc2n1
c1c()cccc1
=Cc1ccccc1
=C1C(1c2ccccc2)c3ccccc3
Number of Hits, SSS Time (ms): 788 1.587 88.827 131.892 2.343.464 812 788 31.945 59.829 399.85 5.943
Number of Hits, SSS Time (ms): 2.15 4.17 88.827 131.892 2.343.464 2.95 2.99 21.729 45.28 27.345 13.325
Screened Count, Screening Time (ms): 773 1.536 89.117 677.719 2.68.366
Screened Count: 767 771 846 994 1.53 5.943
Screening Time (ms): 2.79 4.125 89.117 677.719 2.68.366 2.57 2.69 2.189 2.536 3.232 13.325

Search Speed
Test database with 4 million compounds (own compounds); Oracle 10.2 Enterprise; JChem 3.1
Operation: jc_tanimoto( > 0.9), run on the PC and on the SGI
Table columns: Query Structure, Number of Hits, SSS Time (ms), Screened Count, Screening Time (ms)
Query structures:
c1cncc2c(cnnc12)3cc3
C1C1c2cnnc3c(cncc23)C4=CSC=C4
CC(=)CC(C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3)c(=)c
C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3
c1ncc2ncnc2n1
c1c()cccc1
=Cc1ccccc1
=C1C(1c2ccccc2)c3ccccc3
Number of Hits, SSS Time (ms): 7.218 7.27 19 2.269 21.339 375.389 5.932.686 9.146 9.36 45.26 283.92 1.453.639 51.92
Number of Hits, SSS Time (ms): 6.392 6.952 19 2.269 21.339 375.389 5.932.686 6.711 7.41 39.341 114.84 67.951 36.529
Screened Count, Screening Time (ms): 7.19 7.175 2 2.984 183.897 1.816.711 8.967.52
Screened Count: 7.18 8.174 8.96 8.131 1.8 51.92
Screening Time (ms): 6.385 6.941 2 2.984 183.897 1.816.711 8.967.52 6.373 6.486 7.659 7.329 8.756 36.529

Search Speed Observations
Oracle 10.2 Enterprise; JChem 3.2
Easy to set up and integrate; works out-of-the-box
Switch from Java 1.4 to 1.6: approx. 1% speed increase on the PC
Java 1.6 is not available for pure 64-bit Itanium2, but 1.5 with JRockit is similar
Loading of data: 6 days for 4 million compounds with standard Oracle and JChem, duplicates allowed...
Tuning needed: the database (Oracle) commit transaction is slow
Screening uses 1 core; ABAS uses all available cores

Tuning Hard- and Software for large DB
Sun 4600 Server, 16 cores, 128 GB RAM, Solaris 10
StorageTek 2540, 16 TB, two 4 Gb/sec Fibre Channel host ports, SATA-II, 500 GB, 7,200 rpm
ZFS (zettabyte file system)
JVM: Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM 1.6.0_05
OS: amd64 SunOS 5.10
JChem 5.2, PostgreSQL 8.1
Works out-of-the-box! (Exception: IJC, because of special Solaris libraries)

Why PostgreSQL?
pgsql-performance@postgresql.org
Tables of up to 18 quadrillion rows with up to 1 gigabyte of data per row
Up to 5 TB in one table
Can utilize up to 128 GB RAM with large-database applications
5 to 1, concurrent active connections
Up to 5, concurrent application users
Includes C and standards-compliant JDBC drivers
Drivers for ODBC, PHP, Perl, C++, Python, Ruby, .NET, and other languages are available from the PostgreSQL community
$ broad public / company support

Tuning
Test database with 2 million compounds
Tuning PostgreSQL in the postgresql.conf file: switch off synchronization for the massive database upload (danger!)
shared_buffers = 2
temp_buffers = 1
work_mem = 124
maintenance_work_mem = 16384
max_fsm_pages = 2
max_fsm_relations = 1
fsync = off
full_page_writes = off
Java 1.6: -d64 -server -Xmx4M
Programs: make sure you are using 64-bit calls, e.g. fopen64() etc.
Result: loading 2 million compounds takes 4 h instead of 24 h and needs 26 GB RAM

Tuning
Test database with 2 million compounds, Sun
Operation: jc_tanimoto( > 0.9)
Table columns: Query Structure, Number of Hits, SSS Time (ms), Screened Count, Screening Time (ms)
Query structures:
c1cncc2c(cnnc12)3cc3
C1C1c2cnnc3c(cncc23)C4=CSC=C4
CC(=)CC(C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3)c(=)c
C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3
c1ncc2ncnc2n1
c1c()cccc1
=Cc1ccccc1
=C1C(1c2ccccc2)c3ccccc3
SSS Time (ms): 16.422 36.828 14.991
Number of Hits: 1.287.459 2.96.817 34.43.442
Screened Count, Screening Time (ms): 16.394 36.793 14.961 14.91 63.364 1.44.36 84.312 11.351.9 355.138 43.854.27 69.12 14.878 19.3 18.863 2.47 69.12
Speed difference to SGI: 1x

Chemical Similarity
Basic assumption in chemistry for life sciences: similar chemical structures have similar biological activities
Empirical: taste, flavor; physicochemical properties; biological activity
Prediction based on chemical structures:
Semiempirical and ab initio calculations (quantum chemistry)
Docking into 3D structures (modelling)
Structural similarity, based on atom connectivities (chemoinformatics)

Chemical Similarity: today's method
Pre-screening for substructure and similarity
Similarity methods are based on substructure searching methods; typically a bitstring (e.g. of length 1024) is calculated.
Each bit (e.g. for a benzene ring) is set only once, even if the molecule contains several such rings.
The string is hashed, so one bit may have different meanings.
Bitstring illustration: ...1111111111.. with the "halogen, 7-bonds path" bit set and the "benzene" bit set
Software: Isis-Base & Isis-Host (MDL), Daylight, Tripos, ChemFinder, ChemAxon, InfoChem
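The two limitations named on this slide, that a feature sets its bit only once and that hashing lets one bit stand for several features, can be seen in a minimal sketch. The feature names, the CRC32 hash, and the helper names are assumptions made for illustration; real fingerprint software derives its features from the molecular graph.

```python
import zlib

N_BITS = 1024  # bitstring length used as the example on the slide


def bit_fingerprint(features, n_bits=N_BITS):
    """Map feature names to a set of bit positions (a sparse bitstring).

    A repeated feature sets the same bit again, so its multiplicity is lost.
    Different features may also hash to the same position.
    """
    return {zlib.crc32(name.encode()) % n_bits for name in features}


def tanimoto_bits(a, b):
    """Tanimoto coefficient of two bit sets: |A and B| / |A or B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


one_ring = bit_fingerprint(["benzene ring", "halogen"])
two_rings = bit_fingerprint(["benzene ring", "benzene ring", "halogen"])

# The second benzene ring leaves no trace in the bitstring.
print(one_ring == two_rings, tanimoto_bits(one_ring, two_rings))
```

Two molecules that differ only in how often a feature occurs are therefore indistinguishable to this kind of fingerprint, which is the motivation for counting features instead.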

OntoChem Topological Torsions
OntoChem has developed and validated a better similarity search method, using topological torsions (ToTo).
Topological torsions are composed of topologically connected 4-atom sequences: atom(1)-atom(2)-atom(3)-atom(4)
Properties are then added: atom type, charge, π-electrons, attached hydrogens
Subsequently the multiplicity of each ToTo is counted
ToTo_MACPH, e.g. benzene ToTo: 12 611 611 611 611; pyrazine ToTo: 6 611 71 611 71, 6 71 611 71 611
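A minimal sketch of the counting step is given below, under several assumptions: the molecule is supplied as a list of atom labels plus a bond list, each label already encodes the atom properties (here as a string such as `C:0:1:1` for type, charge, π-electrons, hydrogens), and each path is counted once regardless of reading direction. The label format, the direction handling, and the function names are illustrative and are not taken from OntoChem's implementation.

```python
from collections import Counter


def topological_torsions(atoms, bonds):
    """Count the 4-atom paths a-b-c-d of a molecular graph.

    atoms: list of atom labels; bonds: list of (i, j) index pairs, each
    bond listed once. Every path is found via its central bond b-c and
    stored under the smaller of its two reading directions.
    """
    neighbors = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    counts = Counter()
    for b, c in bonds:
        for a in neighbors[b] - {c}:
            for d in neighbors[c] - {b, a}:  # a != d excludes 3-rings
                path = (atoms[a], atoms[b], atoms[c], atoms[d])
                counts[min(path, path[::-1])] += 1
    return counts


def tanimoto_counts(x, y):
    """Tanimoto coefficient on counts: sum of minima over sum of maxima."""
    keys = set(x) | set(y)
    total = sum(max(x[k], y[k]) for k in keys)
    return sum(min(x[k], y[k]) for k in keys) / total if total else 1.0


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
benzene = topological_torsions(["C:0:1:1"] * 6, ring)
pyrazine = topological_torsions(
    ["N:0:1:0", "C:0:1:1", "C:0:1:1", "N:0:1:0", "C:0:1:1", "C:0:1:1"], ring
)
print(benzene)
print(pyrazine)
print(tanimoto_counts(benzene, pyrazine))
```

Because the descriptor keeps counts, a second occurrence of a fragment changes the result, unlike the single-bit fingerprints described on the previous slide.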

Topological Torsion example
A small molecule typically has up to 1 ToTos, calculated by the smi2 program:
4 6 11 6 11 6 117 8 6 11 6 11 6 1 6 11 8 6 11 6 1 6 11 6 11 2 6 11 6 1 7 6 3 2 6 11 6 1 7 6 1 2 6 11 6 11 6 1 7 8 6 1 6 11 6 11 6 1 4 17 6 1 6 11 6 11 1 6 1 7 6 1 8 1 1 6 1 7 6 1 6 2 2 7 6 1 6 11 6 11 1 7 6 1 6 2 7 2 6 3 7 6 1 6 11 1 6 3 7 6 1 8 1 1 6 3 7 6 1 6 2 2 6 1 7 6 1 6 11 4 6 1 6 2 7 6 2 1 8 1 6 1 7 6 1 1 8 1 6 1 7 6 3 1 8 1 6 1 6 2 7 1 6 2 6 1 7 6 1 1 6 2 6 1 7 6 3 8 6 2 7 6 2 6 2 1 7 6 2 6 1 7 1 7 6 2 6 1 8 1 4 7 6 2 6 2 7 4 6 2 7 6 2 6 1 8 6 2 6 2 7 6 2 2 7 6 2 6 1 6 11 2 6 2 6 1 6 11 6 11 2 6 11 6 1 6 2 7 2 6 11 6 11 6 1 6 2

Topological Torsions: chemical similarity validation
ToTo similarity allows better classification of compounds than other known 2D methods (see also Nilakantan 1987 to Sheridan 2004)
Compounds: 1a, 1b = Dopamine D4 antagonists; 2a, 2b = Histamine H3 receptor ligands
ToTo - Tanimoto
      1a    1b    2a    2b
1a    1.    .37   .19   .2
1b    .37   1.    .16   .15
2a    .19   .16   1.    .3
2b    .2    .15   .3    1.
JChem - Tanimoto
      1a    1b    2a    2b
1a    1.    .38   .44   .45
1b    .38   1.    .43   .33
2a    .44   .43   1.    .55
2b    .45   .33   .55   1.

Topological Torsions: search speed
Similarity searching in 2 Mio db
JChem similarity, PostgreSQL: 4 sec
OntoChem ToTo similarity, Sun disk array, file is divided into 16 parts, one for each core: 12 sec
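The partitioning described here (one part of the file per core, results merged at the end) can be sketched as follows. This is an assumption-laden illustration, not the OntoChem program: the records are held in memory as (name, feature set) pairs, the similarity is a plain set-based Tanimoto, and a thread pool stands in for the per-core workers. In CPython a process pool or separate worker programs, each reading its own file part, would be needed for real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def tanimoto(a, b):
    """Set-based Tanimoto coefficient."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def split(records, n_parts):
    """Deal the records into n_parts roughly equal parts."""
    return [records[i::n_parts] for i in range(n_parts)]


def scan_part(part, query, threshold):
    """Scan one part and return (name, score) for entries above threshold."""
    hits = []
    for name, features in part:
        score = tanimoto(query, features)
        if score >= threshold:
            hits.append((name, score))
    return hits


def parallel_search(records, query, threshold, n_parts=16):
    """Scan all parts concurrently and merge the hits, best score first."""
    parts = split(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(lambda p: scan_part(p, query, threshold), parts))
    merged = [hit for part_hits in results for hit in part_hits]
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


records = [("a", {1, 2, 3}), ("b", {1, 2}), ("c", {7, 8}), ("d", {1, 2, 3, 4})]
print(parallel_search(records, {1, 2, 3}, threshold=0.6, n_parts=4))
```

Since each part is scanned independently, the only shared step is the final merge, which is why the approach maps naturally onto one worker per core.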

Application example: MDM2-p53 inhibitors
Project task: propose new compounds that are similar to known inhibitors but have a different scaffold, and that are patentable, easy to synthesise, and water soluble
Known inhibitors: 1 (Nutlin-3a), 2 (TPD222669)

Application example
Step 1: ToTo search in vendors database
Step 2: generate 3D, align
54 compounds aligned with Nutlin

Application example
Extract protein pocket
Compare 3D similarity: M3dsml program with moloc.ch

Application example, step 2: 3D filtering
Compounds 1 (R=F), 2 (R=Br), 3, 4, 5
Result: hitlist

Application example, step 3: search in reaction database & synthesis
Reaction scheme (reflux)
Result: compounds are active & selective; filed patents, publication
Compounds: X-6, X-7, X-552, X-561

Inhibitors: NMR Binding Studies
Mdm2 protein NMR and Biacore binding studies (T Holak, Biochemistry, 21)
X522 binds reversibly to the Mdm2 p53/Nutlin binding site
Disrupts a preformed p53-Mdm2 complex
Behaves well: no protein precipitation or unfolding
Of 13 known inhibitors, only the X compounds and the Nutlins behave well