Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

Similar documents
Ákos Tarcsay CHEMAXON SOLUTIONS

FROM MOLECULAR FORMULAS TO MARKUSH STRUCTURES

Reaxys Pipeline Pilot Components Installation and User Guide

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Fast similarity searching making the virtual real. Stephen Pickett, GSK

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Pipeline Pilot Integration

Similarity Search. Uwe Koch

Overview. Everywhere. Over everything.

One platform for desktop, web and mobile

The Changing Requirements for Informatics Systems During the Growth of a Collaborative Drug Discovery Service Company. Sally Rose BioFocus plc

Biologically Relevant Molecular Comparisons. Mark Mackey

Marvin. Sketching, viewing and predicting properties with Marvin - features, tips and tricks. Gyorgy Pirok. Solutions for Cheminformatics

ArcGIS. for Server. Understanding our World

Introducing a Bioinformatics Similarity Search Solution

Practical QSAR and Library Design: Advanced tools for research teams

The Case for Use Cases

Integrated Cheminformatics to Guide Drug Discovery

Large scale classification of chemical reactions from patent data

ArcGIS Deployment Pattern. Azlina Mahad

Patent Searching using Bayesian Statistics

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

ST-Links. SpatialKit. Version 3.0.x. For ArcMap. ArcMap Extension for Directly Connecting to Spatial Databases. ST-Links Corporation.

The shortest path to chemistry data and literature

Introduction to ArcGIS Server Development

The Schrödinger KNIME extensions

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Introduction. OntoChem

ICM-Chemist How-To Guide. Version 3.6-1g Last Updated 12/01/2009

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

AMRI COMPOUND LIBRARY CONSORTIUM: A NOVEL WAY TO FILL YOUR DRUG PIPELINE

In Silico Investigation of Off-Target Effects

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

The MDL Discovery Framework: Data and Application Integration in the Life Sciences

The Schrödinger KNIME extensions

Using Web Technologies for Integrative Drug Discovery

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Comprehensive Chemoinformatics since Web-based, client/server, and toolkit approaches. Native Oracle (cartridge) and Microsoft technology.

Road Ahead: Linear Referencing and UPDM

Chemically Intelligent Experiment Data Management

A powerful site for all chemists CHOICE CRC Handbook of Chemistry and Physics

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde

Reaxys Medicinal Chemistry Fact Sheet

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

Esri Overview for Mentor Protégé Program:

Chemical Data Retrieval and Management

Chemists are from Mars, Biologists from Venus. Originally published 7th November 2006

Reaxys Managing Complexity

D2D SALES WITH SURVEY123, OP DASHBOARD, AND MICROSOFT SSAS

Methods for tautomer enumeration, -searching and -duplicate filtering

Pipeline Pilot Integration

Among various open-source GIS programs, QGIS can be the best suitable option which can be used across partners for reasons outlined below.

Tautomerism in chemical information management systems

ChemAxon Partner Session: Arxspan Overview

Performing Map Cartography. using Esri Production Mapping

MedIsolae-3D. Mediterranean Islands SDI and 3D Aerial Web Navigation. Giacomo Martirano. (Epsilon Italia)

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

Expanding the scope of literature data with document to structure tools PatentInformatics applications at Aptuit

Map Application Progression

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

An Integrated Approach to in-silico

Design and Synthesis of the Comprehensive Fragment Library

Migration to the New BI360

ArcGIS Platform For NSOs

REPORT ON INVESTMENTS

Geog 469 GIS Workshop. Managing Enterprise GIS Geodatabases

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Troubleshooting Replication and Geodata Services. Liz Parrish & Ben Lin

NOACA s DART: Web GIS Tools for Transportation Planning. Chad Harris, NOACA Bryan Baker, Tierra Plan LLC Kevin Knapp, Tierra Plan LLC

Iowa Department of Transportation Office of Transportation Data GIS / CAD Integration

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

Cheminformatics analysis and learning in a data pipelining environment

Basic Techniques in Structure and Substructure

The File Geodatabase API. Craig Gillgrass Lance Shipman

Complete Faceting. code4lib, Feb Toke Eskildsen Mikkel Kamstrup Erlandsen

Managing bathymetric data in a hydrographic survey company

Introducing New Technology for Liquid Handling Quality Assurance

OECD QSAR Toolbox v.3.4

The Schrödinger KNIME extensions

Introduction. Chemical Structure Graphs. Whitepaper

Aurora Costache, PhD. CHEMAXON PORTFOLIO WALK THROUGH From toolkits to end-user applications to deliver solutions

Handling Human Interpreted Analytical Data. Workflows for Pharmaceutical R&D. Presented by Peter Russell

Leveraging Web GIS: An Introduction to the ArcGIS portal

Bentley Map V8i (SELECTseries 3)

TRANSVERSE GIS FOR TOTAL E&P NIGERIA - GEOPS

Bentley Geospatial update

Esri Training by Microcenter Prepare to Innovate. Microcenter Course Catalog

Merck Virtual Library (MVL): Deployment, Application, and Future Enhancement

Are You Maximizing The Value Of All Your Data?

GIS Lecture 4: Data. GIS Tutorial, Third Edition GIS 1

Supplementary information

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

ArcGIS for Local Government

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build a userdefined

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Geodatabase Replication for Utilities Tom DeWitte Solution Architect ESRI Utilities Team

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann

PubChem data extraction and integration using Instant JChem. Oleg Ursu Cristian Bologa Tudor I. Oprea Division of Biocomputing

GIS = Geographic Information Systems;

IUCLID Substance Data

Transcription:

Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology Serge P. Parel, PhD ChemAxon User Group Meeting, Budapest 21 st May, 2014

Outline Exquiron Who are we? Research informatics infrastructure Why migrate? What to migrate? Who can help? Example 1: Reporting of dose-response results Example 2: Hit expansion toolbox Summary Acknowledgments 2

Exquiron who are we? Hit identification & profiling services Independent and privately held startup company 10 scientific staff and expanding >60 cumulated years experience in hit identification services Covering human health, veterinary, consumer care, nutraceutical, and agrochemical areas 850m 2 lab & office space Located in Reinach, near Basel, Switzerland 3

Research informatics infrastructure As of end of 2013 Chemistry cartridge: ChemAxon JChem on Oracle Workflow platform: Accelrys Pipeline Pilot Cheminformatics platform: Accelrys PP Chemistry Library Biological data warehouse: Exquiron NaviGator database on MS-SQL server 4

Research informatics infrastructure Problem Chemistry cartridge: ChemAxon JChem on Oracle Workflow platform: Accelrys Pipeline Pilot Cheminformatics platform: Accelrys PP Chemistry Library Biological data warehouse: Exquiron NaviGator database on MS-SQL server Renewal financial terms not suitable for small enterprises with only very few users (compared with pricing of Start-Up package) Duplicate functionalities (Accelrys PP Chemistry Library and ChemAxon PP components) 5

Research informatics infrastructure Solution? Chemistry cartridge: ChemAxon JChem on Oracle Workflow platform: Accelrys Pipeline Pilot Workflow platform: KNIME Cheminformatics platform: Accelrys PP Chemistry Library Cheminformatics platform: ChemAxon/Infocom nodes Biological data warehouse: Exquiron NaviGator database on MS-SQL server 6

Research informatics infrastructure Solution? Chemistry cartridge: ChemAxon JChem on Oracle Workflow platform: Accelrys Pipeline Pilot Workflow platform: KNIME Cheminformatics platform: Accelrys PP Chemistry Library Biological data warehouse: Exquiron NaviGator database on MS-SQL server Cheminformatics platform: ChemAxon/Infocom nodes Simple protocols can be ported in house, but what about more complex workflows? 7

Research informatics infrastructure Who can help? Workflow platform: Accelrys Pipeline Pilot Workflow platform: KNIME Cheminformatics platform: Accelrys PP Chemistry Library Cheminformatics platform: ChemAxon/Infocom nodes Ask KNIME if they can help: Yes, but they rely rather on partners for consulting services and support them where needed Ask ChemAxon if they can help: Yes, through their Consultancy Services 8

Scope of the migration Lots of short PP protocols (SD files and lists manipulation, virtual screening, ) Dose-response data reporting Input file generation Report generation Exhaustive hit expansion toolbox Hit expansion Data fusion and analysis Done in house Done over Consultancy with support from KNIME Done over Consultancy Done in house 9

Example 1 Reporting of dose-response experiments Typical workflow for a hit finding campaign: HTS Confirmation usually in duplicates Dose-response determination assay + counterassay(s) Confirmed actives Many 100K compounds Actives A few 1,000s A few 100s Hits Reporting Chemists want to see structures and biological results Biologists want to see biological results and dose-response curves 10

Reporting of dose-response experiments Workflow (I) Exquiron data warehouse JChem cartridge or SD File Select project and set of experiments Exp. data points IC50 fitting parameters Tag each data point with experiment, concentration, replicate and experiment. Tag IC50 parameters by experiment SD File Purity Additonal data (Text file) 11

Reporting of dose-response experiments Workflow (II) Compound structure SD File Generate report elements Results table Export as pdf Doseresponse curves 12

Reporting of dose-response experiments SD file generation (I) exq_id exp_id Conc Valid Activity(%) EXQ0012345 2344 90 TRUE 6.411991119 EXQ0012345 2344 30 TRUE 41.63282013 EXQ0012345 2344 10 TRUE 74.8677597 EXQ0012345 2344 3.333333254 TRUE 83.46128082 EXQ0012345 2344 1.111111164 TRUE 87.6084137 EXQ0012345 2344 0.370370358 TRUE 93.19058228 Concentration may be different for each experiment Number of replicates per concentration may vary Datapoint at a given concentration could be invalid Number of experiments to report may vary Fields for SD file cannot be defined a priori.. M END > <dtag_nr> EXQ0012345 > <Activity_0.37037_1_S1> 93.19058227539062 > <Activity_1.11111_1_S1> 87.60841369628906 > <Activity_3.33333_1_S1> 83.4612808227539 > <Activity_10.0_1_S1> 74.86775970458984 > <Activity_30.0_1_S1> 41.63282012939453 > <Activity_90.0_1_S1> 6.411991119384766.. 13

Reporting of dose-response experiments SD file generation (II) PipelinePilot Get data from database: SELECT statement Each datapoint is a separate row Crunch data: Through Group By s and a lot of hash tables in PilotScript code New columns/column names generated on the fly For each compound, one single row with each datapoint is a separate column Finalize: Merge with structures, regression parameters and purity information Write SD file KNIME Get data from database: SELECT statement Each datapoint is a separate row Crunch data: Through Group By s, pivoting and some string manipulation All new columns/column names generated at once through pivoting For each compound, one single row with each datapoint is a separate column Finalize: Merge with structures, regression parameters and purity information Write SD file 14

Reporting of dose-response experiments Report generation in PipelinePilot PipelinePilot Read and parse SD file Generate report elements (PP Reporting Collection) Compound ID Structure Results table Generate DR curves based on logistic regression parameters (done in Perl!!) Feed DR curves and experimental data points to XY- Chart component Combine into report with header, footer, logo Generate pdf file 15

Reporting of dose-response experiments and in KNIME PipelinePilot Read and parse SD file Generate report elements (PP Reporting Collection) Compound ID Structure Results table Generate DR curves based on logistic regression parameters (done in Perl!!) Feed DR curves and experimental data points to XY- Chart component Combine into report with header, footer, logo Generate pdf file KNIME Read and parse SD file Generate report information as tables Compound ID Structure Results table Generate DR curves based on logistic regression parameters (using a Math Formula node) Feed DR curves and experimental data points to R and use ggplo to generate the DR plot, return it as picture Send all elements to the KNIME Report Designer (BIRT) Combine to a report with header, footer, logo Generate pdf file 16

Reporting of dose-response experiments Result PipelinePilot KNIME 17

Example 2 Hit expansion workflow Purpose of hit expansion (aka SAR expansion, SAR by catalogue): To consolidate and expand knowledge on active regions in chemical space To find derivatives of active scaffolds to establish or broaden SAR information To expand hit series by scaffold hopping Strengthen knowledge base for direct use in H2L campaign 18

Hit Expansion workflow Workflow A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p) 19

Hit Expansion workflow Workflow Applied to bridge gaps in chemical space A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p) 20

Hit Expansion workflow Example (ChemBL VEGFR2 dataset) A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p) 21

Hit expansion workflow To do Port all functionalities of the workflow to KNIME using the ChemAxon/Infocom nodes (with the exception of the pharmacophore graphs) Improve and optimize workflow for increased execution speed (current Turbosim implementation very slow due to the large number of similarity searches performed) Assignment more than fulfilled 22

Hit expansion workflow Target database preparation Workflow combines similarity searches using FCFP, ECFC and MACCS keys Workflow also performs various substructure searches In PP, all fingerprints (FP) were pre-calculated and stored with their corresponding structure into a PP cache Still a lot of «overhead» since structures are not needed to run similarity searches (FP already calculated) and FP are not needed when running substructures searches, but everything is carried through the whole workflow 23

Hit expansion workflow Target database preparation Workflow combines similarity searches using FCFP, ECFC and MACCS keys Workflow also performs various substructure searches In PP, all fingerprints (FP) were pre-calculated and stored with their corresponding structure into a PP cache Still a lot of «overhead» since structures are not needed to run similarity searches (FP already calculated) and FP are not needed when running substructures searches, but everything is carried through the whole workflow After calculation, FPs are stored into a KNIME table file, and the structures are stored separately into a local JChem database, then load only what is needed where you need it 24

Hit expansion workflow Ring generalization Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds) Implementation: Loop through each atom, set to atom type any if it is in a ring Loop through each bond, set to bond type any if it is in a ring 25

Hit expansion workflow Ring generalization Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds) Implementation: Loop through each atom, set to atom type any if it is in a ring Loop through each bond, set to bond type any if it is in a ring Doesn t work in PP (with PilotScript and Chemistry Collection) 26

Hit expansion workflow Ring generalization Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds) Implementation: Loop through each atom, set to atom type any if it is in a ring Loop through each bond, set to bond type any if it is in a ring Works perfectly fine in KNIME by calling the ChemAxon API in a Java Snippet 27

Hit expansion workflow Bioisostere enumeration Bioisosteric transformations are applied to generate new virtual template structures This step makes use of the Wagener and Lommerse dataset (J. Chem. Inf. Model. 2006, 46, 677) as originally implemented in PP (MDL.rxn files) But there is no bioisosteric transformation node available under KNIME 28

Hit expansion workflow Bioisostere enumeration - implementation Bioisosteric transformations were implemented as a KNIME metanode using a loop iterating through each reaction file and applying it to the incoming template with a ChemAxon/Infocom UniReactor node Works well, but requires a strong atom-to-atom mapping in the.rxn file (which was not the case under PP) Still a couple of issues to solve: May be due to the way UniReactor handles 2-attachment points reactions (?): But we are on the right track! 29

Summary KNIME combined with the ChemAxon/Infocom cheminformatics nodes is an efficient and cost-effective alternative to PipelinePilot State-of-the-art reporting platform (WYSIWYG and Drag&Drop) Fingerprint-based similarity search orders of magnitude faster RAM memory requirements high (at least for the Desktop version of KNIME when dealing with large SD files, prob. less relevant for the Server version). 2 GB assigned to KNIME is probably the minimum Loops can sometimes be quite slow ( equivalent of the RunToCompletion option in PP) Currently no KNIME implementation for the Discngine Pharmacophore Graphs, but possibility to implement Screen3D instead under evaluation 30

Summary ChemAxon and KNIME have done a terrific job in helping us successfully migrate business critical protocols/workflows Required timelines could be maintained (due to expiring PP license) Workflows have been optimized and streamlined If you are planning a similar migration and you need help, ask the ChemAxon team. They definitively can help! 31

Acknowledgements Attila Szabó Mihály Medzi Medzihradszky Aaron Hart Andreas Bergner 32

Thank you! For more information contact Serge P. Parel Exquiron Biotech AG Kägenstrasse 17 4153 Reinach Switzerland E-mail: sparel@exquiron.com www.exquiron.com