

2013 First International Conference on Artificial Intelligence, Modelling & Simulation

Comparison of Similarity Coefficients for Chemical Database Retrieval

Mukhsin Syuib
School of Information Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia

Shereena M. Arif
School of Information Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia

Nurul Malim
Pusat Pengajian Sains Komputer, Universiti Sains Malaysia, Pulau Pinang, Malaysia

Abstract-Similarity-based virtual screening is used in drug discovery, where a computational model rapidly evaluates a large number of chemical molecules. Similarity searches use 2D or 3D fingerprints and a similarity coefficient to calculate the structural resemblance between each molecule in a chemical database and a target structure. The objective of this work is to determine the best coefficient to use in similarity searching. This paper describes an experiment that performs molecular similarity searching with different similarity coefficients, using 2D UNITY and ECFP4 fingerprints on five activity classes. We also highlight the different similarity values and the best results obtained with each similarity measure. These results may depend on the type of fingerprint. In conclusion, we found that every combination of fingerprint and similarity measure has its own advantages, and that the nature of the molecular activity class can also play an important role in obtaining the best possible results.

Keywords: Chemoinformatics, virtual screening, 2D fingerprints, similarity measure.

I. INTRODUCTION

Chemoinformatics is a new discipline that emerged from several older disciplines such as computational chemistry, computer chemistry and chemical information (Xu and Hagler, 2002). It involves the use of computer technology to process chemical data.
What differentiates chemical data processing from other data processing is the need to work with chemical structures. This requirement necessitated special approaches to represent, store and retrieve structures in a computer system. According to Xu and Hagler (2002), chemoinformatics has two roles in drug discovery. First, it should be able to extract knowledge from large-scale raw high-throughput screening databases in less time; second, it should provide efficient computational tools to predict ADMET properties (i.e. a set of tests in drug discovery that determine whether a lead can be a potential drug for human consumption).

Searching a chemical library in silico is called virtual screening. Virtual screening is a method for boosting the efficiency of lead-discovery programs in the pharmaceutical and agrochemical industries (Werner et al, 2003). Similarity searching is one of the virtual screening methods; it finds chemical structures that resemble a known bioactive molecule, such as a hit from a high-throughput screening experiment. This molecule, hereafter referred to as the target structure, is compared with each of the molecules in a database of 2D or 3D chemical structures by calculating a measure of the degree of structural resemblance between the target structure and the database structure. Virtual screening is used to remove unwanted molecules from the library, so that the cost and time of drug discovery can be managed efficiently. Performing virtual screening at this early stage reduces the number of compounds that will be investigated further in drug discovery.

II. SIMILARITY COEFFICIENTS

To measure the similarity between two molecular fingerprints X and Y, we have to specify the following attributes:

a = number of bits where X is on and Y is off
b = number of bits where X is off and Y is on
c = number of bits where both X and Y are on
d = number of bits where both X and Y are off
n = total number of bits for a molecule

Example: for a pair of fingerprints X and Y, we obtain a = 2, b = 4, c = 3, d = 0 and n = 9.

Once each attribute has been determined, we can use a similarity coefficient to find the pairwise similarity values. Next, we filter out the inactive compounds among the top-r ranked compounds that we specify. Every similarity coefficient has its own advantages and drawbacks, depending on the question facing the drug discovery project.

III. BINARY SIMILARITY COEFFICIENTS

A molecular descriptor can be binary (1, 0), numeric or categorical. In chemoinformatics, these descriptor strings are called fingerprints. Binary descriptors are especially useful because there are highly efficient computer algorithms that work with binary strings. Figure 1 shows an example of hashing in a binary descriptor, where one element can be represented by many bits and vice versa.

Figure 1. Binary fingerprint

UNITY fingerprints provide a richer description than the classic fingerprint known as MACCS keys, which simply represent the absence or presence of a small library of functional groups. UNITY fingerprints incorporate a much broader range of features, including connected bond path fragments up to seven bonds long. The ECFP series of fingerprints used in Pipeline Pilot, on the other hand, uses a different algorithm that encodes circular atom neighbourhoods up to a diameter of four bonds (ECFP4), six bonds (ECFP6) or more.

We use the commercial MDDR 2007 database, obtained by subscription from Accelrys Inc. ECFP4 fingerprints are generated with the Pipeline Pilot software, which is the authoring tool for the Accelrys Enterprise Platform, while UNITY fingerprints are generated following the description from Tripos Inc. These molecular information databases provide molecular structures, molecular weights, and other physical and chemical data (Zhang, 2007). MDDR 2007 contains 102,514 different molecules.

A similarity coefficient is used to calculate the similarity between the reference and target fingerprints. Many similarity coefficients derived from the text retrieval field are also used in chemoinformatics. In this paper, we describe only seven coefficients that have been widely used in this field (Table I). Only three of these are used here: Tanimoto, Russell-Rao and Euclidean Distance. According to Werner et al (2003), the Russell-Rao, Kulczynski and Forbes coefficients have been found to be effective for similarity searching in their laboratory, and they appear to have a straightforward extension to continuous form. Thus, we use Russell-Rao in binary (dichotomous) form to test the hypothesis.

TABLE I. LIST OF SIMILARITY COEFFICIENTS FOR 2D FINGERPRINTS

No | Coefficient | Formula | Other name
1 | Tanimoto | For binary | Jaccard coefficient
2 | Cosine | For binary | Ochiai coefficient
3 | Forbes | For binary | None
4 | Euclidean Distance | For binary | None
5 | Dice | For binary | Czekanowski coefficient or Sorensen coefficient
6 | Russell-Rao | For binary | None
7 | Soergel Distance | For binary | None

IV. EXPERIMENT

This work applies the similarity measures described above to five activity classes. Table 2 shows the classes of molecules used in this virtual screening experiment.
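Before turning to the data set, the attribute counts of Section II and the three coefficients used in this work can be illustrated with a short Python sketch. Several things here are assumptions rather than the authors' code: the bit strings `X` and `Y` are hypothetical values chosen only so that they reproduce the counts a = 2, b = 4, c = 3, d = 0, n = 9 from the example; the formulas are the commonly quoted binary forms written in this paper's a, b, c, d notation, since the formula column of Table I is not reproduced here; and the Euclidean measure is taken in its distance form, so smaller values mean more similar fingerprints.

```python
from math import sqrt


def attribute_counts(x: str, y: str) -> dict:
    """Count a, b, c, d and n for two equal-length strings of '0'/'1'."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    if set(x + y) - {"0", "1"}:
        raise ValueError("fingerprints must contain only '0' and '1'")
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for bit_x, bit_y in zip(x, y):
        if bit_x == "1" and bit_y == "0":
            counts["a"] += 1  # X on, Y off
        elif bit_x == "0" and bit_y == "1":
            counts["b"] += 1  # X off, Y on
        elif bit_x == "1" and bit_y == "1":
            counts["c"] += 1  # both on
        else:
            counts["d"] += 1  # both off
    counts["n"] = len(x)
    return counts


def tanimoto(a: int, b: int, c: int, d: int) -> float:
    """Tanimoto (Jaccard) similarity: c / (a + b + c)."""
    total = a + b + c
    return c / total if total else 0.0


def russell_rao(a: int, b: int, c: int, d: int) -> float:
    """Russell-Rao similarity: c / n, with n = a + b + c + d."""
    n = a + b + c + d
    return c / n if n else 0.0


def euclidean_distance(a: int, b: int, c: int, d: int) -> float:
    """Binary Euclidean distance (assumed form): sqrt(a + b)."""
    return sqrt(a + b)


# Hypothetical fingerprints, not the ones printed in the paper.
X = "111110000"
Y = "111001111"
counts = attribute_counts(X, Y)
abcd = (counts["a"], counts["b"], counts["c"], counts["d"])
print(counts)
print(tanimoto(*abcd), russell_rao(*abcd), euclidean_distance(*abcd))
```

Because d = 0 in this example, n equals a + b + c, so Tanimoto and Russell-Rao coincide at 3/9. The two coefficients diverge as soon as some bits are off in both fingerprints, which is the usual situation with sparse fingerprints of around a thousand bits.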

TABLE II. LIST OF MDDR ACTIVITY CLASSES USED

No | Activity class name | No of active molecules
1 | 5HT3 antagonists |
2 | Angiotensin II AT1 antagonists |
3 | Thrombin inhibitors |
4 | Substance P antagonist |
5 | 5HT reuptake inhibitors | 359

Figure 2. Steps of the experiment: similarity search of an activity class against the whole database

Figure 2 illustrates the experimental flow, in which the 5HT activity class is used as the set of target structures against the whole database. The algorithm compares the binary fingerprints of the activity class with those of the database structures by applying the similarity coefficient.

STEP 1:

TABLE III. THE PSEUDOCODE TO RUN THE MAIN PROGRAM

For i := 1 to X
    Copy element from database and paste to a temporary query array.
    Run C program.
End For

X is the number of members of a particular activity class.

The algorithm in Table III shows the procedure for running the C code from a shell script. First, the algorithm loops over the members of a particular activity class. For each member, the code searches the database for the same compound id, then copies the information (compound id and binary descriptor) into a temporary query file. Finally, it applies the main similarity search algorithm.

STEP 2:

TABLE IV. THE BASIC PROCEDURE FOR SIMILARITY SEARCHING

For i := 1 to n
    For j := 1 to N
        Calculate the similarity coefficient, using i as the query and j as the reference database fingerprint.
    End For
    Read the next query fingerprint
End For
Sort the results in descending order of similarity value.
Take the top 1% of the ranking.

Here n is the total number of molecules in a particular activity class and N is the total number of molecules in the whole database.

Table IV shows the algorithm that calculates the similarity values for every molecule in an activity class against the whole database. Once all the values have been obtained, the algorithm sorts the results in decreasing order. We then take the top 1% of the ranking and extract it into a flat file.
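The procedure of Table IV can be sketched for a single query as follows. This is a minimal Python illustration and not the authors' C program: the function names, the representation of a fingerprint as a set of on-bit positions, and the toy database are all assumptions introduced here, and the sketch always keeps at least one molecule even when 1% of the database rounds down to zero.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are on (assumed encoding)


def tanimoto_sets(query: Fingerprint, reference: Fingerprint) -> float:
    """Tanimoto on on-bit sets: |X and Y| / |X or Y|, i.e. c / (a + b + c)."""
    union = len(query | reference)
    return len(query & reference) / union if union else 0.0


def search_top_fraction(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    coefficient: Callable[[Fingerprint, Fingerprint], float] = tanimoto_sets,
    fraction: float = 0.01,
) -> List[Tuple[str, float]]:
    """Score every database molecule against the query, sort in
    descending similarity, and keep the top `fraction` of the ranking."""
    scored = [(coefficient(query, fp), cid) for cid, fp in database.items()]
    # Stable sort: molecules with equal scores keep their database order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = max(1, int(len(scored) * fraction))
    return [(cid, score) for score, cid in scored[:keep]]


# Hypothetical toy database keyed by compound id.
toy_database = {
    "M1": frozenset({1, 2, 3}),
    "M2": frozenset({1, 2, 3, 4}),
    "M3": frozenset({7, 8}),
    "M4": frozenset({2, 3}),
}
toy_query = frozenset({1, 2, 3})
print(search_top_fraction(toy_query, toy_database, fraction=0.5))
```

Running the outer loop of Table IV then amounts to calling `search_top_fraction` once per active molecule of the class and writing each result list to its own file. For a distance measure such as Euclidean distance the sort direction would have to be reversed, or the distance converted to a similarity first.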
The next step is to compare the flat file with the activity class file.

STEP 3:

TABLE V. THE BASIC PROCEDURE TO DETERMINE TRUE POSITIVES

For i := 1 to P
    Open the file name in the directory.
    Check each compound_id from the target file against the activity class file.
    Calculate the total number of matching compound ids.
End For

P is the number of files in the directory.

The algorithm in Table V shows how to match the compounds from each file in the directory with the activity class file. From that comparison, we can classify the true and false positives, and then calculate the mean number of true positives retrieved for every activity class.

V. RESULTS AND DISCUSSION

To carry out similarity searching, we calculate similarity values between each active in a particular activity class and each molecule in the MDDR database. Next, we rank the results in decreasing order of similarity. For each active, we take the compound ids in the top 1% and compute the mean number of true positives, which forms the basis of the comparison. Below we discuss our findings.
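The true-positive counting of Table V and the mean described above can be sketched in a few lines of Python. The function names and the compound ids are hypothetical, and the sketch works on in-memory lists rather than on the flat files used in the experiment. Whether a query molecule that retrieves itself should count as a true positive is not stated in the text; this sketch counts every retrieved id that belongs to the activity class.

```python
from statistics import mean
from typing import Iterable, List, Set


def count_true_positives(retrieved_ids: Iterable[str], active_ids: Set[str]) -> int:
    """Number of retrieved compound ids that belong to the activity class."""
    return len(set(retrieved_ids) & set(active_ids))


def mean_true_positives(top_lists: List[List[str]], active_ids: Set[str]) -> float:
    """Mean number of true positives over all queries of one activity class.

    `top_lists` holds one list of retrieved compound ids per query,
    i.e. the top 1% of each ranking.
    """
    return mean([count_true_positives(ids, active_ids) for ids in top_lists])


# Hypothetical activity class and two hypothetical top-1% result lists.
active_ids = {"A1", "A2", "A3"}
top_lists = [
    ["A1", "A2", "X9"],  # two actives retrieved
    ["A3", "X1", "X2"],  # one active retrieved
]
print(mean_true_positives(top_lists, active_ids))
```

Repeating this calculation for each coefficient and each activity class yields one mean per table cell, which is the quantity compared in the results below.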

A. UNITY Fingerprint

The UNITY fingerprint has 993 bits. From the results shown in Figure 3, we can see that the Tanimoto similarity coefficient is the best coefficient across the activity classes used as samples. The exception is the Angiotensin class, where Tanimoto is the least effective and Euclidean Distance gives the highest mean value. Table 6 gives the mean value for each class of molecule: Tanimoto gives the highest mean value compared with the others, except for the Angiotensin activity class, where Euclidean (Eu) gives the highest mean value and Tanimoto is rated last.

Figure 3. Mean number of true positives for every class with different similarity coefficients in UNITY

TABLE VI. MEAN FOR EVERY CLASS (UNITY)
Columns: Tan, RR, Eu. Rows: 5HT, Angiotensin, Thrombin, HT, Substance P.

B. ECFP4 Fingerprint

The ECFP4 fingerprint uses 1024 bits. With this fingerprint, the Tanimoto similarity coefficient remains the best overall. From Table 7, however, we get a rather different result from UNITY: Tanimoto is still the best coefficient for the 5HT, Thrombin, 5HT3 and Substance P activity classes, but for Angiotensin the best coefficient is Russell-Rao (RR), and Euclidean is rated last. The strength of Tanimoto agrees with Todeschini et al (2012), who evaluated similarity coefficients using simulated and real data and found considerable merit in the well-established Jaccard-Tanimoto coefficient.

Figure 4. Mean number of true positives for every class with different similarity coefficients in ECFP4

VI. CONCLUSIONS

In conclusion, the type of molecular fingerprint and the activity class both play a part in the molecular similarity calculation.
Nevertheless, this experiment showed that the Tanimoto similarity coefficient gave the best results for most activity classes, so it should be used in virtual screening to get the optimal result. For future work, we plan to experiment with different similarity coefficients and different chemical databases to investigate further the parameters that can affect similarity searching performance.

TABLE VII. MEAN FOR EVERY CLASS (ECFP4)
Columns: Tan, RR, Eu. Rows: 5HT, Angiotensin, Thrombin, HT, Substance P.

ACKNOWLEDGEMENTS

We would like to thank Dr Nurul Malim for comments and feedback on the manuscript. This work is jointly supported by the UKM Grant GGPM and USM Short Term Grant 304/PKOMP/

REFERENCES

[1] Arif, S. M., Holliday, J. D., & Willett, P. (2013). Comparison of chemical similarity measures using different numbers of query structures. Journal of Information Science, 39(1).
[2] Faver, J. C., Ucisik, M. N., Yang, W., & Merz, K. M. (2013). Computer-aided drug design: Using numbers to your advantage. ACS Medicinal Chemistry Letters, 4(9).
[3] Green, D. V. S. (2008). Virtual screening of chemical libraries for drug discovery. Expert Opinion on Drug Discovery, 3(9).
[4] Lavecchia, A., & Giovanni, C. D. (2013). Virtual screening strategies in drug discovery: A critical review. Current Medicinal Chemistry, 20(23).
[5] Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., & Willett, P. (2012). Similarity coefficients for binary chemoinformatics data: Overview and extended comparison using simulated and real data sets. Journal of Chemical Information and Modeling, 52(11).
[6] Willett, P. (2011). Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3).
[7] Willett, P., Barnard, J. M., & Downs, G. M. (1998). Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 38(6).
[8] Xu, J., & Hagler, A. (2002). Chemoinformatics and drug discovery. Molecules, 7(8).
[9] Figure binary fingerprint is available from : [last
[10] ECFP4 fingerprint is available from: [last
[11] UNITY fingerprint is available from: [last

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA http://www.ftsm.ukm.my/apjitm Asia-Pacific Journal of Information Technology and Multimedia Jurnal Teknologi Maklumat dan Multimedia Asia-Pasifik Vol. 7 No. 1, June 2018: 91-98 e-issn: 2289-2192 COMPARISON

More information

Similarity methods for ligandbased virtual screening

Similarity methods for ligandbased virtual screening Similarity methods for ligandbased virtual screening Peter Willett, University of Sheffield Computers in Scientific Discovery 5, 22 nd July 2010 Overview Molecular similarity and its use in virtual screening

More information

An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection American Journal of Applied Sciences 8 (4): 368-373, 2011 ISSN 1546-9239 2010 Science Publications An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

More information

Universities of Leeds, Sheffield and York

Universities of Leeds, Sheffield and York promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ This is an author produced version of a paper published in Organic & Biomolecular

More information

Introduction. OntoChem

Introduction. OntoChem Introduction ntochem Providing drug discovery knowledge & small molecules... Supporting the task of medicinal chemistry Allows selecting best possible small molecule starting point From target to leads

More information

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW Mubarak Himmat 1, Naomie Salim 1, Ali Ahmed 1, 2, and Mohammed Mumtaz Al-Dabbagh 1 1 Faculty of Computing, University

More information

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK Chemoinformatics and information management Peter Willett, University of Sheffield, UK verview What is chemoinformatics and why is it necessary Managing structural information Typical facilities in chemoinformatics

More information

A Framework For Genetic-Based Fusion Of Similarity Measures In Chemical Compound Retrieval

A Framework For Genetic-Based Fusion Of Similarity Measures In Chemical Compound Retrieval A Framework For Genetic-Based Fusion Of Similarity Measures In Chemical Compound Retrieval Naomie Salim Faculty of Computer Science and Information Systems Universiti Teknologi Malaysia naomie@fsksm.utm.my

More information

Introducing a Bioinformatics Similarity Search Solution

Introducing a Bioinformatics Similarity Search Solution Introducing a Bioinformatics Similarity Search Solution 1 Page About the APU 3 The APU as a Driver of Similarity Search 3 Similarity Search in Bioinformatics 3 POC: GSI Joins Forces with the Weizmann Institute

More information

Cheminformatics analysis and learning in a data pipelining environment

Cheminformatics analysis and learning in a data pipelining environment Molecular Diversity (2006) 10: 283 299 DOI: 10.1007/s11030-006-9041-5 c Springer 2006 Review Cheminformatics analysis and learning in a data pipelining environment Moises Hassan 1,, Robert D. Brown 1,

More information

Reaxys Medicinal Chemistry Fact Sheet

Reaxys Medicinal Chemistry Fact Sheet R&D SOLUTIONS FOR PHARMA & LIFE SCIENCES Reaxys Medicinal Chemistry Fact Sheet Essential data for lead identification and optimization Reaxys Medicinal Chemistry empowers early discovery in drug development

More information

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies Dissertation zur Erlangung des Doktorgrades (Dr. rer. nat.) der Mathematisch-aturwissenschaftlichen Fakultät der Rheinischen

More information

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics... 1 1.1 Chemoinformatics... 2 1.1.1 Open-Source Tools... 2 1.1.2 Introduction to Programming Languages... 3 1.2 Chemical Structure

More information

DivCalc: A Utility for Diversity Analysis and Compound Sampling

DivCalc: A Utility for Diversity Analysis and Compound Sampling Molecules 2002, 7, 657-661 molecules ISSN 1420-3049 http://www.mdpi.org DivCalc: A Utility for Diversity Analysis and Compound Sampling Rajeev Gangal* SciNova Informatics, 161 Madhumanjiri Apartments,

More information

Molecular Similarity Searching Using Inference Network

Molecular Similarity Searching Using Inference Network Molecular Similarity Searching Using Inference Network Ammar Abdo, Naomie Salim* Faculty of Computer Science & Information Systems Universiti Teknologi Malaysia Molecular Similarity Searching Search for

More information

Universities of Leeds, Sheffield and York

Universities of Leeds, Sheffield and York promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ This is an author produced version of a paper published in Statistical Analysis

More information

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Computational chemical biology to address non-traditional drug targets. John Karanicolas Computational chemical biology to address non-traditional drug targets John Karanicolas Our computational toolbox Structure-based approaches Ligand-based approaches Detailed MD simulations 2D fingerprints

More information

Universities of Leeds, Sheffield and York

Universities of Leeds, Sheffield and York promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ This is an author produced version of a paper published in Quantitative structure

More information

Patent Searching using Bayesian Statistics

Patent Searching using Bayesian Statistics Patent Searching using Bayesian Statistics Willem van Hoorn, Exscientia Ltd Biovia European Forum, London, June 2017 Contents Who are we? Searching molecules in patents What can Pipeline Pilot do for you?

More information

An Integrated Approach to in-silico

An Integrated Approach to in-silico An Integrated Approach to in-silico Screening Joseph L. Durant Jr., Douglas. R. Henry, Maurizio Bronzetti, and David. A. Evans MDL Information Systems, Inc. 14600 Catalina St., San Leandro, CA 94577 Goals

More information

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems. Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems. Roberto Todeschini Milano Chemometrics and QSAR Research Group - Dept. of

More information

This is a repository copy of Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.

This is a repository copy of Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. This is a repository copy of Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/74399/

More information

Similarity Search. Uwe Koch

Similarity Search. Uwe Koch Similarity Search Uwe Koch Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance

More information

Handling Human Interpreted Analytical Data. Workflows for Pharmaceutical R&D. Presented by Peter Russell

Handling Human Interpreted Analytical Data. Workflows for Pharmaceutical R&D. Presented by Peter Russell Handling Human Interpreted Analytical Data Workflows for Pharmaceutical R&D Presented by Peter Russell 2011 Survey 88% of R&D organizations lack adequate systems to automatically collect data for reporting,

More information

The shortest path to chemistry data and literature

The shortest path to chemistry data and literature R&D SOLUTIONS Reaxys Fact Sheet The shortest path to chemistry data and literature Designed to support the full range of chemistry research, including pharmaceutical development, environmental health &

More information

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data 2 Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data Peter Willett Abstract This chapter reviews the techniques available for quantifying the effectiveness

More information

Universities of Leeds, Sheffield and York

Universities of Leeds, Sheffield and York promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ This is an author produced version of a paper published in Journal of Molecular

More information

1. Some examples of coping with Molecular informatics data legacy data (accuracy)

1. Some examples of coping with Molecular informatics data legacy data (accuracy) Molecular Informatics Tools for Data Analysis and Discovery 1. Some examples of coping with Molecular informatics data legacy data (accuracy) 2. Database searching using a similarity approach fingerprints

More information

Data Mining in the Chemical Industry. Overview of presentation

Data Mining in the Chemical Industry. Overview of presentation Data Mining in the Chemical Industry Glenn J. Myatt, Ph.D. Partner, Myatt & Johnson, Inc. glenn.myatt@gmail.com verview of presentation verview of the chemical industry Example of the pharmaceutical industry

More information

Reaxys The Highlights

Reaxys The Highlights Reaxys The Highlights What is Reaxys? A brand new workflow solution for research chemists and scientists from related disciplines An extensive repository of reaction and substance property data A resource

More information

Chemical Similarity Searching

Chemical Similarity Searching J. Chem. Inf. Comput. Sci. 1998, 38, 983-996 983 Chemical Similarity Searching Peter Willett* Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Sheffield

More information

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a Retrieving hits through in silico screening and expert assessment M.. Drwal a,b and R. Griffith a a: School of Medical Sciences/Pharmacology, USW, Sydney, Australia b: Charité Berlin, Germany Abstract:

More information

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015 Ignasi Belda, PhD CEO HPC Advisory Council Spain Conference 2015 Business lines Molecular Modeling Services We carry out computational chemistry projects using our selfdeveloped and third party technologies

More information

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Iván Solt Solutions for Cheminformatics Drug Discovery Strategies for known targets High-Throughput Screening (HTS) Cells

More information

Clustering Ambiguity: An Overview

Clustering Ambiguity: An Overview Clustering Ambiguity: An Overview John D. MacCuish Norah E. MacCuish 3 rd Joint Sheffield Conference on Chemoinformatics April 23, 2004 Outline The Problem: Clustering Ambiguity and Chemoinformatics Preliminaries:

More information

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre Dr. Sander B. Nabuurs Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre The road to new drugs. How to find new hits? High Throughput

More information

Practical QSAR and Library Design: Advanced tools for research teams

Practical QSAR and Library Design: Advanced tools for research teams DS QSAR and Library Design Webinar Practical QSAR and Library Design: Advanced tools for research teams Reservationless-Plus Dial-In Number (US): (866) 519-8942 Reservationless-Plus International Dial-In

More information

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery Building innovative drug discovery alliances Just in KIME: Successful Process Driven Drug Discovery Berlin KIME Spring Summit, Feb 2016 Research Informatics @ Evotec Evotec s worldwide operations 2 Pharmaceuticals

More information

Introduction to Chemoinformatics and Drug Discovery

Introduction to Chemoinformatics and Drug Discovery Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013 The Chemical Space There are atoms and space. Everything else is opinion. Democritus (ca.

More information

Chemical Databases: Encoding, Storage and Search of Chemical Structures

Chemical Databases: Encoding, Storage and Search of Chemical Structures Chemical Databases: Encoding, Storage and Search of Chemical Structures Dr. Timur I. Madzhidov Kazan Federal University, Department of Organic Chemistry * Ray, L.C. and R.A. Kirsch, Finding Chemical Records

More information

Interactive Feature Selection with

Interactive Feature Selection with Chapter 6 Interactive Feature Selection with TotalBoost g ν We saw in the experimental section that the generalization performance of the corrective and totally corrective boosting algorithms is comparable.

More information

Fast similarity searching making the virtual real. Stephen Pickett, GSK

Fast similarity searching making the virtual real. Stephen Pickett, GSK Fast similarity searching making the virtual real Stephen Pickett, GSK Introduction Introduction to similarity searching Use cases Why is speed so crucial? Why MadFast? Some performance stats Implementation

More information

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller Chemogenomic: Approaches to Rational Drug Design Jonas Skjødt Møller Chemogenomic Chemistry Biology Chemical biology Medical chemistry Chemical genetics Chemoinformatics Bioinformatics Chemoproteomics

More information
