CDK & Mass Spectrometry


Stephan Beisken, October 3, 2011
EBI is an outstation of the European Molecular Biology Laboratory.

Chemistry Development Kit (CDK)
An Open Source Java™ Library for Structural Chemo- and Bioinformatics
  > 90 000 lines of code
  > 900 classes
  > 9000 methods
  library generation
  virtual screening
  molecular property prediction
  visualization
  http://cdk.sourceforge.net
Steinbeck, C.; Hoppe, C.; Kuhn, S.; Guha, R.; Willighagen, E. L. Current Pharmaceutical Design 2006, 12, 2111-2120.
Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. Journal of Chemical Information and Computer Sciences 2003, 43, 493-500.

Functionality
  Input/Output: I/O (CML, MDL, PDB, InChI, ...), canonical SMILES
  Visualization: structure diagram layout (SDG), 2D rendering, 3D rendering
  Modeling: 3D model builder, atom typing, force field
  Chemical Graphs: isomorphism detection, MCS searches, SMARTS and substructure searches, ring searches (SSSR, all rings), aromaticity detection
  Structure Generation: deterministic isomer generator, stochastic structure generators
  Properties: fingerprinting, Gasteiger charges, > 30 QSAR descriptors

In Numbers I
  74 registered developers
  80 people subscribed to cdk-devel list
  152 people subscribed to cdk-user list

In Numbers II
  142 060 downloads since 2001
  moved from SVN to Git

KNIME CDK Basis
  embedded in Chemistry base
  I/O writers, converters
  read-in molecules need to be converted to CDK type molecules (Molecule to/from CDK)
  Data Type
    CDKCell (CDK- & StringValue (chemical markup language))
    stores molecule as BLOB
    Java serialization

KNIME CDK Visualization
  molecule diagrams most intuitive
  2D layout via StructureDiagramGenerator
  connectivity of the molecule
  IAtomContainer vs. IMolecule
  3D Viewer: works only with pre-calculated coordinates
  Structure sketcher for manual input

KNIME CDK Properties
  range of molecular properties
  Lipinski's rule of five
  fingerprints (MACCS, PubChem, ...)
  fingerprint similarity (Tanimoto)
  Hydrogen adder (perceives and configures atom types, checks valences)
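The two property checks named on this slide can be illustrated without any toolkit. The following is a minimal sketch in plain Python, not the CDK or KNIME implementation: fingerprints are assumed to be given as sets of on-bit indices, and the Lipinski inputs (molecular weight, logP, hydrogen-bond donors and acceptors) are assumed to have been computed elsewhere. The function names and the example values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        mol_weight > 500,   # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ])


# Hypothetical on-bit sets, purely for illustration.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 shared bits out of 5 distinct bits

# Hypothetical precomputed property values, purely for illustration.
print(lipinski_violations(mol_weight=320.4, log_p=2.1, h_donors=2, h_acceptors=5))
```

A real workflow would obtain the bit sets and the property values from the toolkit's fingerprinters and descriptors; only the arithmetic is shown here.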

KNIME CDK version 1.4.x
  Advantages (since 1.3.x)
    not so much new functionality... but:
    many patches, fixes
    more robust
    renderer classes merged back in
    most importantly: many new AtomTypes

KNIME CDK
  Challenges: serialization (CML), threading
  Wishlist: QSAR descriptors, standardization, signatures, fingerprints

KNIME CDK Threading
  nodes work row-by-row
  threading is disabled for all CDK nodes
  CDK developers try to ensure thread safety; however, no systematic analysis has been undertaken yet
  Thread safe SMSD (Syed Asad Rahman)
    query: ONC1=CC=CC=C1
    target: O1C=CC=CN1C1=CC=CC=C1
    MCS flag: ring matcher OFF
    http://chembioinfo.wordpress.com/2011/09/14/thread-safe-smsd/

Computer Assisted Structure Elucidation
  experimental data
  compound identification
  elucidation: conformation, chirality, E/Z stereochemistry

Mass Spectrometry Data
  Dimensionality: retention time, m/z ratio, signal intensity
  Signals / Peaks: fragments, adducts, isotopic peaks
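To make the relationship between a compound's mass and its adduct and isotopic peaks concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not part of the CDK or the KNIME nodes: the adduct table is a hand-picked subset with commonly tabulated mass shifts, isotopic peaks are approximated by the 13C/12C mass difference only, and all names (`ADDUCTS`, `adduct_mz`, `isotope_peaks`) are hypothetical.

```python
PROTON_MASS = 1.007276        # Da, mass of a proton
C13_C12_DIFF = 1.003355       # Da, mass difference between 13C and 12C

# Adduct name -> (mass shift in Da, absolute charge). Illustrative subset only.
ADDUCTS = {
    "[M+H]+": (PROTON_MASS, 1),
    "[M+Na]+": (22.989218, 1),
    "[M-H]-": (-PROTON_MASS, 1),
    "[M+2H]2+": (2 * PROTON_MASS, 2),
}


def adduct_mz(mono_mass, adduct):
    """Expected m/z of a neutral monoisotopic mass observed as the given adduct."""
    shift, charge = ADDUCTS[adduct]
    return (mono_mass + shift) / charge


def isotope_peaks(mz, charge, count=3):
    """Approximate m/z positions of the first isotopic peaks (13C spacing only)."""
    return [mz + k * C13_C12_DIFF / charge for k in range(count)]


glucose = 180.06339  # monoisotopic mass of C6H12O6 in Da
mz = adduct_mz(glucose, "[M+H]+")
print(round(mz, 6))
print([round(p, 6) for p in isotope_peaks(mz, charge=1)])
```

The isotopic spacing shrinks with charge (about 1.003 Da for z = 1, about 0.502 Da for z = 2), which is why the spacing between isotopic peaks is often used to infer the charge state.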

Chromatogram and Spectra Analysis

Data Analysis Workflow
  Characteristics...: highly modular, arrays of algorithms
  Requires...: manual tweaking, manual analysis
  Challenges...: compound identification, meaningful visualisation

A Modular Framework for Compound Identification

Integration
  Nodes
    mzML Reader / Writer
    2D & 3D Spectrum Viewer
    Profile to Centroid mode converter
    based on mzML data type, jmzML library (JAXB)
  Challenges
    efficient data handling
    preservation of modularity
    i.e., how to store all information in an accessible and efficient way
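The slide does not say how the Profile to Centroid converter works. As a rough illustration of what such a conversion does, the sketch below collapses each contiguous run of profile points above a threshold into one centroid at the intensity-weighted mean m/z. This is an assumed, deliberately simple scheme, not the node's actual algorithm; the function name `profile_to_centroid` and the example spectrum are hypothetical.

```python
def profile_to_centroid(mz, intensity, threshold=0.0):
    """Collapse contiguous above-threshold runs into (centroid m/z, summed intensity)."""
    peaks = []
    start = None
    # A trailing sentinel at the threshold closes a run that reaches the last point.
    for i, value in enumerate(list(intensity) + [threshold]):
        if value > threshold and start is None:
            start = i                      # a new run begins
        elif value <= threshold and start is not None:
            run_mz = mz[start:i]
            run_int = intensity[start:i]
            total = sum(run_int)
            weighted = sum(m * w for m, w in zip(run_mz, run_int))
            peaks.append((weighted / total, total))
            start = None                   # the run is closed
    return peaks


# Hypothetical profile spectrum with two signals.
mz_values = [100.0, 100.1, 100.2, 100.3, 100.4, 200.0, 200.1, 200.2]
intensities = [0.0, 10.0, 20.0, 10.0, 0.0, 0.0, 5.0, 0.0]

for centre, height in profile_to_centroid(mz_values, intensities):
    print(round(centre, 4), height)
```

Production converters usually add smoothing, noise estimation, and splitting of overlapping peaks; the sketch only shows the reduction from many profile points to one value per signal, which is also what makes centroided data cheaper to store.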

Acknowledgements
  The Chemoinformatics and Metabolism Team: Christoph Steinbeck
  The CDK Project Admins: Egon Willighagen, Miguel Rojas, Christoph Steinbeck
  The KNIME Team: Thorsten Meinl
  All CDK Developers & Contributors, Syngenta AG, The University of Cambridge
Steinbeck, C.; Hoppe, C.; Kuhn, S.; Guha, R.; Willighagen, E. L. Current Pharmaceutical Design 2006, 12, 2111-2120.
Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. Journal of Chemical Information and Computer Sciences 2003, 43, 493-500.