CDK & Mass Spectrometry October 3, 2011 1/18 Stephan Beisken October 3, 2011 EBI is an outstation of the European Molecular Biology Laboratory.
Chemistry Development Kit (CDK)
An open-source Java™ library for structural chemo- and bioinformatics.
More than 90,000 lines of code, more than 900 classes, more than 9,000 methods.
Applications: library generation, virtual screening, molecular property prediction, visualization.
http://cdk.sourceforge.net
Steinbeck, C.; Hoppe, C.; Kuhn, S.; Guha, R.; Willighagen, E. L. Current Pharmaceutical Design 2006, 12, 2111-2120.
Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. Journal of Chemical Information and Computer Sciences 2003, 43, 493-500.
Functionality
Input/Output: I/O (CML, MDL, PDB, InChI, ...), canonical SMILES
Visualization: structure diagram layout (SDG), 2D rendering, 3D rendering
Modeling: 3D model builder, atom typing, force field
Chemical graphs: isomorphism detection, MCS searches, SMARTS- and substructure searches, ring searches (SSSR, all rings), aromaticity detection
Structure generation: deterministic isomer generator, stochastic structure generators
Properties: fingerprinting, Gasteiger charges, more than 30 QSAR descriptors
In Numbers I
74 registered developers; 80 people subscribed to the cdk-devel list; 152 people subscribed to the cdk-user list.
In Numbers II
142,060 downloads since 2001; moved from SVN to Git.
KNIME CDK
Basis: embedded in the Chemistry base; I/O writers and converters. Molecules that are read in need to be converted to CDK-type molecules (Molecule to/from CDK).
Data type: CDKCell (CDKValue & StringValue, the latter as Chemical Markup Language); stores the molecule as a BLOB via Java serialization.
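Storing a molecule as a BLOB through Java serialization amounts to writing the object graph to a byte array and reading a fresh copy back when needed. A minimal sketch of that idea using only java.io, where SimpleMolecule is a hypothetical stand-in for a CDK molecule and BlobCell a hypothetical stand-in for the cell (neither is the KNIME or CDK API):

```java
import java.io.*;
import java.util.*;

/** Hypothetical stand-in for a molecule object; not a CDK class. */
class SimpleMolecule implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<String> atomSymbols;

    SimpleMolecule(String name, List<String> atomSymbols) {
        this.name = name;
        this.atomSymbols = new ArrayList<>(atomSymbols); // ArrayList is Serializable
    }
}

/** Hypothetical cell that keeps its molecule only as a serialized BLOB. */
class BlobCell {
    private final byte[] blob;

    BlobCell(Serializable molecule) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(molecule); // writes the whole object graph
        }
        this.blob = bytes.toByteArray();
    }

    int size() {
        return blob.length;
    }

    /** Deserializes a fresh, independent copy of the stored molecule on every call. */
    Object restore() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        }
    }
}

public class BlobCellDemo {
    public static void main(String[] args) throws Exception {
        SimpleMolecule phenol = new SimpleMolecule("phenol",
                Arrays.asList("C", "C", "C", "C", "C", "C", "O"));
        BlobCell cell = new BlobCell(phenol);
        SimpleMolecule copy = (SimpleMolecule) cell.restore();
        System.out.println(copy.name + ": " + copy.atomSymbols.size()
                + " heavy atoms, BLOB of " + cell.size() + " bytes");
    }
}
```

A known drawback of Java serialization is that the stored bytes are tied to the class layout and its serialVersionUID, so BLOBs written with one library version may fail to load with another; a text format such as CML avoids that coupling at the cost of parsing.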
KNIME CDK Visualization
Molecule diagrams are the most intuitive representation.
2D layout via StructureDiagramGenerator, computed from the connectivity of the molecule (IAtomContainer vs. IMolecule).
3D Viewer: works only with pre-calculated coordinates.
Structure sketcher for manual input.
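Since the 3D Viewer works only with pre-calculated coordinates, it is sensible to check for them before rendering. A minimal sketch of such a guard, assuming a hypothetical Atom3D type whose coordinates are null when they have not been computed (in CDK, IAtom.getPoint3d() likewise returns null when no 3D coordinates are set); ViewerGuard and has3DCoordinates are illustrative names only:

```java
import java.util.*;

/** Hypothetical atom with optional 3D coordinates (null when not computed). */
final class Atom3D {
    final String symbol;
    final double[] point3d; // {x, y, z} or null

    Atom3D(String symbol, double[] point3d) {
        this.symbol = symbol;
        this.point3d = point3d;
    }
}

public class ViewerGuard {
    /** True only if the molecule is non-empty and every atom carries x, y and z. */
    static boolean has3DCoordinates(List<Atom3D> atoms) {
        if (atoms.isEmpty()) {
            return false;
        }
        for (Atom3D a : atoms) {
            if (a.point3d == null || a.point3d.length != 3) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Atom3D> water = Arrays.asList(
                new Atom3D("O", new double[] {0.0, 0.0, 0.0}),
                new Atom3D("H", new double[] {0.96, 0.0, 0.0}),
                new Atom3D("H", null)); // this hydrogen was never placed in 3D
        System.out.println(has3DCoordinates(water)
                ? "show in 3D viewer"
                : "compute 3D coordinates first");
    }
}
```

Rejecting partially placed molecules, rather than drawing the atoms that do have coordinates, avoids a misleading picture in which unplaced atoms silently collapse onto the origin.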
KNIME CDK Properties
A range of molecular properties, e.g. Lipinski's rule of five.
Fingerprints (MACCS, PubChem, ...) and fingerprint similarity (Tanimoto).
Hydrogen adder (perceives and configures atom types, checks valences).
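Tanimoto similarity on binary fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint, c / (a + b - c). Lipinski's rule of five flags a molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors and more than 10 hydrogen-bond acceptors. A minimal sketch of both in plain Java using java.util.BitSet, assuming the descriptor values were computed elsewhere; MolProperties and its methods are hypothetical names, not CDK classes:

```java
import java.util.BitSet;

public class MolProperties {
    /** Tanimoto coefficient c / (a + b - c) for two binary fingerprints. */
    static double tanimoto(BitSet fp1, BitSet fp2) {
        BitSet common = (BitSet) fp1.clone();
        common.and(fp2);
        int c = common.cardinality();
        int union = fp1.cardinality() + fp2.cardinality() - c;
        // Convention chosen here: two empty fingerprints count as identical.
        return union == 0 ? 1.0 : (double) c / union;
    }

    /** Counts Lipinski rule-of-five violations from precomputed descriptor values. */
    static int lipinskiViolations(double molWeight, double logP,
                                  int hBondDonors, int hBondAcceptors) {
        int violations = 0;
        if (molWeight > 500) violations++;
        if (logP > 5) violations++;
        if (hBondDonors > 5) violations++;
        if (hBondAcceptors > 10) violations++;
        return violations;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(4); a.set(7);
        BitSet b = new BitSet();
        b.set(1); b.set(7); b.set(9); b.set(12);
        System.out.println(tanimoto(a, b)); // 2 shared bits / 5 bits in the union = 0.4
        System.out.println(lipinskiViolations(180.16, 1.2, 1, 4)); // hypothetical values
    }
}
```

Returning a violation count rather than a pass/fail flag leaves the cut-off to the user, since the rule is often applied as "at most one violation".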
KNIME CDK version 1.4.x
Advantages (over 1.3.x): not much new functionality, but many patches and fixes, more robust renderer classes merged back in and, most importantly, many new atom types.
KNIME CDK
Challenges: serialization (CML), threading.
Wishlist: QSAR descriptors, standardization, signatures, fingerprints.
KNIME CDK Threading
Nodes work row by row; threading is disabled for all CDK nodes.
CDK developers try to ensure thread safety; however, no systematic analysis has been undertaken yet.
Thread-safe SMSD (Syed Asad Rahman): query ONC1=CC=CC=C1, target O1C=CC=CN1C1=CC=CC=C1, MCS flag: ring matcher OFF.
http://chembioinfo.wordpress.com/2011/09/14/thread-safe-smsd/
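Because thread safety of CDK classes is not guaranteed, the nodes process rows sequentially. A common general-purpose workaround, shown here only as a hedged sketch and not as what the KNIME CDK nodes do, is thread confinement: each worker thread gets its own instance of any stateful helper through ThreadLocal, while the results are still collected in row order. RowProcessor is a hypothetical stand-in for a non-thread-safe object such as a descriptor calculator:

```java
import java.util.*;
import java.util.concurrent.*;

public class PerThreadRows {
    /** Hypothetical stateful helper that must not be shared between threads. */
    static final class RowProcessor {
        private final StringBuilder scratch = new StringBuilder(); // mutable state

        String process(String smiles) {
            scratch.setLength(0);
            scratch.append(smiles).append(" -> ").append(smiles.length());
            return scratch.toString();
        }
    }

    // Each worker thread lazily creates, and then reuses, its own RowProcessor.
    private static final ThreadLocal<RowProcessor> PROCESSOR =
            ThreadLocal.withInitial(RowProcessor::new);

    /** Processes rows in parallel but returns results in the original row order. */
    static List<String> processAll(List<String> rows, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String row : rows) {
                futures.add(pool.submit(() -> PROCESSOR.get().process(row)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for each row in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = Arrays.asList("ONC1=CC=CC=C1", "O1C=CC=CN1C1=CC=CC=C1", "CCO");
        for (String r : processAll(rows, 2)) {
            System.out.println(r);
        }
    }
}
```

This only helps if the helpers share no hidden static state; classes with mutable static fields (for example, caches) would still need the kind of systematic thread-safety analysis that has not yet been done.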
Computer-Assisted Structure Elucidation
Experimental data; compound identification; elucidation of conformation, chirality and E/Z stereochemistry.
Mass Spectrometry Data
Dimensionality: retention time, m/z ratio, signal intensity.
Signals/peaks: fragments, adducts, isotopic peaks.
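The three data dimensions and the peak types listed above can be made concrete with a little arithmetic. A minimal sketch, assuming a hypothetical Peak class and standard constants (proton 1.007276 Da, Na+ 22.989218 Da, 13C-12C mass difference 1.003355 Da); the isotope check is a deliberately simple heuristic, not the method used by the tools in this talk:

```java
public class MsPeaks {
    /** One LC-MS signal: retention time (s), m/z and intensity (arbitrary units). */
    static final class Peak {
        final double retentionTime;
        final double mz;
        final double intensity;

        Peak(double retentionTime, double mz, double intensity) {
            this.retentionTime = retentionTime;
            this.mz = mz;
            this.intensity = intensity;
        }
    }

    static final double PROTON = 1.007276;      // proton mass, Da
    static final double SODIUM_ION = 22.989218; // Na+ (sodium atom minus one electron), Da
    static final double C13_SHIFT = 1.003355;   // mass difference 13C - 12C, Da

    /** m/z of [M+nH]n+ for a neutral monoisotopic mass M and charge n. */
    static double protonatedMz(double neutralMass, int charge) {
        return (neutralMass + charge * PROTON) / charge;
    }

    /** m/z of the singly charged sodium adduct [M+Na]+. */
    static double sodiatedMz(double neutralMass) {
        return neutralMass + SODIUM_ION;
    }

    /** m/z of the k-th 13C isotopic peak above a monoisotopic ion of the given charge. */
    static double isotopeMz(double monoisotopicMz, int k, int charge) {
        return monoisotopicMz + k * C13_SHIFT / charge;
    }

    /** Heuristic: co-eluting peak spaced by about 1.003355/z in m/z; ignores intensity ratios. */
    static boolean looksLikeIsotope(Peak mono, Peak candidate, int charge,
                                    double mzTol, double rtTol) {
        boolean coElutes = Math.abs(candidate.retentionTime - mono.retentionTime) <= rtTol;
        double spacingError = Math.abs((candidate.mz - mono.mz) - C13_SHIFT / charge);
        return coElutes && spacingError <= mzTol;
    }

    public static void main(String[] args) {
        double glucose = 180.063388; // monoisotopic mass of C6H12O6, Da
        double mh = protonatedMz(glucose, 1);
        System.out.printf("[M+H]+  %.4f%n", mh);
        System.out.printf("[M+Na]+ %.4f%n", sodiatedMz(glucose));
        System.out.printf("M+1     %.4f%n", isotopeMz(mh, 1, 1));
        Peak mono = new Peak(120.0, mh, 1.0e6);
        Peak next = new Peak(120.4, isotopeMz(mh, 1, 1), 6.5e4);
        System.out.println(looksLikeIsotope(mono, next, 1, 0.005, 2.0));
    }
}
```

Dividing the 13C spacing by the charge is what lets isotopic peaks reveal the charge state: a spacing near 0.5 in m/z points to a doubly charged ion.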
Chromatogram and Spectra Analysis
Data Analysis Workflow
Characteristics: highly modular; arrays of algorithms.
Requires: manual tweaking, manual analysis.
Challenges: compound identification, meaningful visualization.
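The modular character of such workflows can be illustrated by treating each algorithm as an interchangeable step that maps one peak list to another. A minimal sketch with hypothetical names (Peak, Step, intensityThreshold, mzWindow, normalizeToBasePeak); it illustrates the idea of composable steps and is not the framework presented in this talk:

```java
import java.util.*;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class ModularPipeline {
    /** Minimal hypothetical peak: m/z and intensity only. */
    static final class Peak {
        final double mz;
        final double intensity;

        Peak(double mz, double intensity) {
            this.mz = mz;
            this.intensity = intensity;
        }

        @Override
        public String toString() {
            return mz + " @ " + intensity;
        }
    }

    /** A processing step maps a peak list to a new peak list. */
    interface Step extends UnaryOperator<List<Peak>> {}

    static Step intensityThreshold(double min) {
        return peaks -> peaks.stream()
                .filter(p -> p.intensity >= min)
                .collect(Collectors.toList());
    }

    static Step mzWindow(double lo, double hi) {
        return peaks -> peaks.stream()
                .filter(p -> p.mz >= lo && p.mz <= hi)
                .collect(Collectors.toList());
    }

    /** Rescales intensities so the most intense peak becomes 100 (assumes positive intensities). */
    static Step normalizeToBasePeak() {
        return peaks -> {
            double max = peaks.stream().mapToDouble(p -> p.intensity).max().orElse(1.0);
            return peaks.stream()
                    .map(p -> new Peak(p.mz, 100.0 * p.intensity / max))
                    .collect(Collectors.toList());
        };
    }

    /** Applies the steps in order; adding, removing or reordering steps needs no other changes. */
    static List<Peak> run(List<Peak> peaks, List<Step> steps) {
        List<Peak> current = peaks;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Peak> spectrum = Arrays.asList(
                new Peak(100.0, 50.0), new Peak(150.0, 5.0),
                new Peak(250.0, 200.0), new Peak(180.0, 25.0));
        List<Step> steps = Arrays.asList(mzWindow(90, 200), intensityThreshold(10), normalizeToBasePeak());
        System.out.println(run(spectrum, steps));
    }
}
```

Keeping every step behind the same interface is also where the manual tweaking shows up in practice: each factory method's parameters (thresholds, windows) are exactly the knobs a user adjusts between runs.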
A Modular Framework for Compound Identification
Integration
Nodes: mzML Reader/Writer, 2D & 3D Spectrum Viewer, profile-to-centroid mode converter.
Based on the mzmldatatype, which uses the jmzML library (JAXB).
Challenges: efficient data handling and preservation of modularity, i.e., how to store all information in an accessible and efficient way.
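A profile-to-centroid conversion reduces each profile-mode peak (many closely spaced m/z and intensity points) to a single centroid. A minimal sketch of one simple approach, assuming m/z values sorted in ascending order: detect local maxima above a threshold, extend each one down both flanks while the signal keeps falling, and take the intensity-weighted mean m/z over that region. The class, method and parameter names are hypothetical, and this is not necessarily the algorithm used by the converter node:

```java
import java.util.*;

public class Centroider {
    /** Returns centroids as {mz, apexIntensity} pairs from profile-mode data. */
    static List<double[]> centroid(double[] mz, double[] intensity, double minIntensity) {
        if (mz.length != intensity.length) {
            throw new IllegalArgumentException("mz and intensity arrays differ in length");
        }
        List<double[]> centroids = new ArrayList<>();
        for (int i = 1; i < mz.length - 1; i++) {
            // Apex: strictly higher than the left neighbour, not lower than the right one.
            boolean apex = intensity[i] > intensity[i - 1] && intensity[i] >= intensity[i + 1];
            if (!apex || intensity[i] < minIntensity) {
                continue;
            }
            // Extend left and right while the signal keeps falling.
            int lo = i;
            int hi = i;
            while (lo > 0 && intensity[lo - 1] < intensity[lo]) lo--;
            while (hi < mz.length - 1 && intensity[hi + 1] < intensity[hi]) hi++;
            // Intensity-weighted mean m/z over the peak region.
            double weighted = 0.0;
            double total = 0.0;
            for (int j = lo; j <= hi; j++) {
                weighted += mz[j] * intensity[j];
                total += intensity[j];
            }
            centroids.add(new double[] {weighted / total, intensity[i]});
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] mz = {100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06, 100.07};
        double[] intensity = {0, 10, 20, 10, 5, 30, 5, 0};
        for (double[] c : centroid(mz, intensity, 1.0)) {
            System.out.printf("m/z %.4f  intensity %.1f%n", c[0], c[1]);
        }
    }
}
```

Valley points between two overlapping peaks are assigned to both flanks in this sketch; production centroiders usually also fit a peak model or correct for overlap.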
Acknowledgements
The Chemoinformatics and Metabolism Team: Christoph Steinbeck
The CDK Project Admins: Egon Willighagen, Miguel Rojas, Christoph Steinbeck
The KNIME Team: Thorsten Meinl
All CDK developers and contributors, Syngenta AG, The University of Cambridge