2013 First International Conference on Artificial Intelligence, Modelling & Simulation

Comparison of Similarity Coefficients for Chemical Database Retrieval

Mukhsin Syuib, School of Information Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Shereena M. Arif, School of Information Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Nurul Malim, Pusat Pengajian Sains Komputer, Universiti Sains Malaysia, Pulau Pinang, Malaysia

Abstract - Similarity-based virtual screening is used in drug discovery, where a computational model rapidly evaluates large numbers of chemical molecules. Similarity searches use 2D or 3D fingerprints and a similarity coefficient to calculate the structural resemblance between each molecule in a chemical database and a target structure. The objective of this work is to determine the best coefficient to use in similarity searching. This paper describes an experiment that performs molecular similarity searching with different similarity coefficients, using 2D UNITY and ECFP4 fingerprints on five activity classes. We also highlight the different similarity values and the best results obtained with each similarity measure, which can depend on the type of fingerprint used. We conclude that every combination of coefficient and fingerprint has its own advantages, and that the nature of the molecular activity class can also play an important role in obtaining the best possible results.

Keywords: Chemoinformatics, virtual screening, 2D fingerprints, similarity measure.

I. INTRODUCTION

Chemoinformatics is a new discipline that emerged from several older disciplines such as computational chemistry, computer chemistry and chemical information (Xu et al, 2002). It involves the use of computer technology to process chemical data.
What differentiates chemical data processing from other data processing is the requirement to work with chemical structures. This requirement led to special approaches for representing, storing and retrieving structures in a computer system. According to Xu et al (2002), chemoinformatics has two roles in drug discovery. First, it should extract knowledge from large-scale raw high-throughput screening databases in less time. Second, it should provide efficient computational tools to predict ADMET properties (i.e. a set of tests in drug discovery that determine whether a lead can become a potential drug for human consumption).

Searching a chemical library in silico is called virtual screening. Virtual screening is a method for boosting the efficiency of lead-discovery programs in the pharmaceutical and agrochemical industries (Werner et al, 2003). Similarity searching is one of the virtual screening methods; it finds chemical structures that resemble a known bioactive molecule, such as a hit from a high-throughput screening experiment. This molecule, hereafter referred to as the target structure, is compared with each molecule in a database of 2D or 3D chemical structures by calculating a measure of the degree of structural resemblance between the target structure and the database structure.

Virtual screening is commonly used to remove unwanted molecules from the library, which helps to manage the cost and time of drug discovery. Performing virtual screening at this early stage reduces the number of compounds that are investigated further.

II.
SIMILARITY COEFFICIENTS

To measure the similarity between two molecular fingerprints X and Y, we first specify the following attributes:

a = number of bits where X is on and Y is off
b = number of bits where X is off and Y is on
c = number of bits where X and Y are both on
d = number of bits where X and Y are both off
n = total number of bits for a molecule

Example:

X:
Y:

From the above, we obtain the attributes a = 2, b = 4, c = 3, d = 0 and n = 9.

Once each attribute has been determined, a similarity coefficient gives the pairwise similarity value. We then rank the database and filter out the inactive compounds among the top r ranked molecules, where r is a cut-off that we specify. Every similarity coefficient has its own advantages and drawbacks, depending on the question being asked in the drug discovery project.

III. BINARY SIMILARITY COEFFICIENTS

A molecular descriptor can be binary (1, 0), numeric or categorical. In chemoinformatics, these descriptor strings are called fingerprints. Binary descriptors are especially useful because highly efficient computer algorithms exist for working with binary strings. Figure 1 shows one example of hashing in a binary descriptor, where one element can be represented by many bits and vice versa.

Figure 1. Binary fingerprint (source: [9])

A similarity coefficient is used to calculate the similarity between the reference and target fingerprints. Many similarity coefficients derived from the text retrieval field are also used in chemoinformatics. In this paper, we describe only seven coefficients that have been widely used in this field (Table I). Only three of these are used in the experiment: Tanimoto, Russell-Rao and Euclidean Distance. According to Werner et al (2003), the Russell-Rao, Kulczynski and Forbes coefficients have been found to be effective for similarity searching in their laboratory, and they appear to have a straightforward extension to continuous form. We therefore use Russell-Rao in binary (dichotomous) form to test this hypothesis.

TABLE I. LIST OF SIMILARITY COEFFICIENTS FOR 2D FINGERPRINTS

No  Coefficient         Formula        Other name
1   Tanimoto            (binary form)  Jaccard coefficient
2   Cosine              (binary form)  Ochiai coefficient
3   Forbes              (binary form)  None
4   Euclidean Distance  (binary form)  None
5   Dice                (binary form)  Czekanowski coefficient or Sorensen coefficient
6   Russell-Rao         (binary form)  None
7   Soergel Distance    (binary form)  None

UNITY fingerprints provide a richer description than the classic fingerprint known as MACCS keys, which simply represents the absence or presence of a small library of functional groups. UNITY fingerprints incorporate a much broader range of features, including connected bond path fragments up to seven bonds long. The ECFP series of fingerprints used in Pipeline Pilot, on the other hand, uses a different algorithm that encodes circular atom neighbourhoods with a diameter of four bonds (ECFP4), six bonds (ECFP6) or more.

We use the commercial MDDR 2007 database, licensed from Accelrys Inc. ECFP4 fingerprints are generated with the Pipeline Pilot software, which is the authoring tool for the Accelrys Enterprise Platform [10], while UNITY fingerprints are generated following the description from Tripos Inc [11]. These molecular information databases provide molecular structures, molecular weights, and other physical and chemical data (Zhang, 2007). MDDR 2007 contains 102,514 different molecules.

IV. EXPERIMENT

This work applies the similarity measures described above to five activity classes. Table II shows the classes of molecules used in this virtual screening experiment.
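As an illustration of the attribute counts defined in Section II, the sketch below computes a, b, c, d and n from two bit strings and evaluates the seven coefficients named in Table I. The formulas are the standard textbook binary forms written in this paper's notation; they are not recovered from Table I itself. Writing the Euclidean measure as a similarity, sqrt((c + d) / n), is an assumption made so that it can be ranked in descending order like the others. The bit strings `X_BITS` and `Y_BITS` are hypothetical and were chosen only because they reproduce the attribute values of the worked example (a = 2, b = 4, c = 3, d = 0, n = 9).

```python
import math

# Hypothetical bit strings chosen to reproduce the worked example's counts.
X_BITS = "111110000"
Y_BITS = "001111111"


def counts(x, y):
    """Return (a, b, c, d, n) for two equal-length bit strings."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for p, q in zip(x, y) if p == "1" and q == "0")  # X on, Y off
    b = sum(1 for p, q in zip(x, y) if p == "0" and q == "1")  # X off, Y on
    c = sum(1 for p, q in zip(x, y) if p == "1" and q == "1")  # both on
    d = sum(1 for p, q in zip(x, y) if p == "0" and q == "0")  # both off
    return a, b, c, d, len(x)


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def tanimoto(a, b, c, d):
    return _ratio(c, a + b + c)


def cosine(a, b, c, d):
    return _ratio(c, math.sqrt((a + c) * (b + c)))


def forbes(a, b, c, d):
    return _ratio(c * (a + b + c + d), (a + c) * (b + c))


def euclidean_similarity(a, b, c, d):
    # Assumption: similarity form, so larger values mean more alike.
    return math.sqrt(_ratio(c + d, a + b + c + d))


def dice(a, b, c, d):
    return _ratio(2 * c, a + b + 2 * c)


def russell_rao(a, b, c, d):
    return _ratio(c, a + b + c + d)


def soergel_distance(a, b, c, d):
    # A distance: for binary data it equals 1 - Tanimoto.
    return _ratio(a + b, a + b + c)


if __name__ == "__main__":
    a, b, c, d, n = counts(X_BITS, Y_BITS)
    print("a, b, c, d, n =", a, b, c, d, n)
    for f in (tanimoto, cosine, forbes, euclidean_similarity,
              dice, russell_rao, soergel_distance):
        print(f"{f.__name__:22s} {f(a, b, c, d):.4f}")
```

With d = 0, as in this example, Tanimoto and Russell-Rao give the same value (3/9), which shows that the two coefficients only differ when the fingerprints share off bits.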
TABLE II. LIST OF MDDR ACTIVITY CLASSES USED

No  Activity class name              No of active molecules
1   5HT3 antagonists
2   Angiotensin II AT1 antagonists
3   Thrombin inhibitors
4   Substance P antagonists
5   5HT reuptake inhibitors          359

Figure 2. Steps of the experiment: searching one activity class against the whole database

Figure 2 illustrates the experiment flow in which the 5HT activity class supplies the target structures that are searched against the whole database. The algorithm compares the binary fingerprint of each member of the activity class with the database structures by applying the similarity coefficient.

STEP 1:

TABLE III. THE PSEUDOCODE TO RUN THE MAIN PROGRAM

    For i := 1 to X
        Copy element from database and paste to a temporary query array.
        Run C program.
    End For

    X is the number of members of a particular activity class.

The algorithm in Table III shows the procedure that runs the C code from a shell script. First, it loops over the members of a particular activity class. For each member, it searches the database for the same compound id, then copies the information (compound id and binary descriptor) into a temporary query file. Finally, it applies the main algorithm for the similarity search.

STEP 2:

TABLE IV. THE BASIC PROCEDURE FOR SIMILARITY SEARCHING

    For i := 1 to n
        For j := 1 to N
            Calculate the similarity coefficient, using i as the query and j as the reference database fingerprint.
        End For
        Read the next query fingerprint
    End For
    Sort the results in descending order of similarity value.
    Take the top 1% of the ranking.

    n is the total number of molecules in a particular activity class and N is the total number of molecules in the whole database.

Table IV shows the algorithm that calculates the similarity values for every molecule in an activity class against the whole database. After all the values have been calculated, the algorithm sorts the results in decreasing order. We then take the top 1% of the ranking and write it to a flat file.
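The procedure in Tables III and IV can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the experiment described here was run as a C program driven by a shell script, and the names `tanimoto_bits`, `top_fraction` and the toy `database` are hypothetical. The sketch also leaves the query itself in the database, which the text does not specify.

```python
def tanimoto_bits(x, y):
    """Tanimoto coefficient c / (a + b + c) for two equal-length bit strings."""
    both_on = sum(1 for p, q in zip(x, y) if p == "1" and q == "1")
    either_on = sum(1 for p, q in zip(x, y) if p == "1" or q == "1")
    return both_on / either_on if either_on else 0.0


def top_fraction(query_fp, database, coefficient=tanimoto_bits, fraction=0.01):
    """Rank every database molecule against one query and keep the top fraction.

    database maps compound id -> fingerprint bit string.
    Returns compound ids ordered by decreasing similarity.
    """
    scored = [(coefficient(query_fp, fp), cid) for cid, fp in database.items()]
    # Descending similarity; ties are broken by compound id for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    keep = max(1, int(len(scored) * fraction))  # always keep at least one hit
    return [cid for _, cid in scored[:keep]]


if __name__ == "__main__":
    # Toy data only; a real run would load the full set of fingerprints.
    database = {"m1": "1100", "m2": "1110", "m3": "0011", "m4": "1000"}
    actives = ["m1", "m2"]  # members of one hypothetical activity class
    for query_id in actives:  # outer loop of Table IV
        hits = top_fraction(database[query_id], database, fraction=0.5)
        print(query_id, "->", hits)
```

Passing the coefficient as a function argument keeps the ranking loop unchanged when Tanimoto is swapped for another similarity measure.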
The next step is to compare the flat file with the activity class file.

STEP 3:

TABLE V. THE BASIC PROCEDURE TO DETERMINE TRUE POSITIVES

    For i := 1 to P
        Open the file named in the directory.
        Check each compound_id in the target file against the activity class file.
        Calculate the total number of matching compound ids.
    End For

    P is the number of files in the directory.

The algorithm in Table V shows how the compounds in each file in the directory are matched against the activity class file. From that comparison, we can classify the true and false positives, and then calculate the mean number of true positives retrieved for every activity class.

V. RESULTS AND DISCUSSION

To carry out similarity searching, we calculate similarity values between each active in a particular activity class and each molecule in the MDDR database. Next, we rank the results in decreasing order of similarity. For each active, we take the compound ids in the top 1% and compute the mean number of true positives, which is the measure we report. Below we discuss our findings.
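The evaluation described in Table V and in the paragraph above can be sketched as follows. The function names and the toy compound ids are hypothetical, and the sketch assumes that each query's top 1% list is already available in memory as a list of compound ids rather than as a flat file.

```python
def true_positives(retrieved_ids, active_ids):
    """Count the retrieved compound ids that belong to the activity class."""
    return len(set(retrieved_ids) & set(active_ids))


def mean_true_positives(hit_lists, active_ids):
    """Average the true-positive count over all queries of one activity class.

    hit_lists holds one list of retrieved compound ids per query molecule.
    """
    if not hit_lists:
        return 0.0
    total = sum(true_positives(hits, active_ids) for hits in hit_lists)
    return total / len(hit_lists)


if __name__ == "__main__":
    # Toy data only: three actives and the top-ranked hits of two queries.
    actives = {"m1", "m2", "m7"}
    hit_lists = [["m1", "m2", "m3"], ["m4", "m7", "m9"]]
    print(mean_true_positives(hit_lists, actives))
```

Retrieved ids that are not in the activity class are the false positives, so their count for one query is the length of the hit list minus the value returned by `true_positives`.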
A. UNITY Fingerprint

The UNITY fingerprint is a 993-bit binary string. From the results shown in Figure 3, Tanimoto is the best coefficient across the activity classes sampled. The exception is the Angiotensin class, where Tanimoto is the least effective and Euclidean Distance gives the highest mean. Table VI lists the mean value for each class of molecule: Tanimoto gives the highest mean value in every class except Angiotensin, where Euclidean (Eu) gives the highest mean value and Tanimoto is ranked last.

Figure 3. Mean number of true positives for each activity class with different similarity coefficients (UNITY)

TABLE VI. MEAN FOR EVERY CLASS (UNITY)
Columns: Activity class; UNITY fingerprint: Tan, RR, Eu
Rows: 5HT, Angiotensin, Thrombin, HT, Substance P

B. ECFP4 Fingerprint

The ECFP4 fingerprint is a 1024-bit binary string. With this fingerprint, Tanimoto remains the best coefficient overall. For the Angiotensin class, however, Russell-Rao shows the most promising results. A similar result was reported by Todeschini et al (2012), who evaluated similarity coefficients using simulated and real data and found considerable merits in the well-established Jaccard-Tanimoto coefficient.

Figure 4. Mean number of true positives for each activity class with different similarity coefficients (ECFP4)

Table VII gives a rather different result from the UNITY fingerprint: Tanimoto is still the best coefficient for the 5HT, Thrombin, 5HT3 and Substance P activity classes, but for Angiotensin the best coefficient is Russell-Rao (RR) and Euclidean is ranked last.

VI. CONCLUSIONS

In conclusion, the type of molecular fingerprint and the activity class both influence the outcome of molecular similarity calculations.
Nevertheless, in this experiment the Tanimoto coefficient gave the best results for most activity classes with both fingerprints, so we recommend it for virtual screening. For future work, we plan to experiment with different similarity coefficients and different chemical databases to investigate further the parameters that can affect similarity searching performance.
TABLE VII. MEAN FOR EVERY CLASS (ECFP4)
Columns: Activity class; ECFP4 fingerprint: Tan, RR, Eu
Rows: 5HT, Angiotensin, Thrombin, HT, Substance P

ACKNOWLEDGEMENTS

We would like to thank Dr Nurul Malim for comments and feedback on the manuscript. This work is jointly supported by the UKM Grant GGPM and USM Short Term Grant 304/PKOMP/

REFERENCES

[1] Arif, S. M., Holliday, J. D., & Willett, P. (2013). Comparison of chemical similarity measures using different numbers of query structures. Journal of Information Science, 39(1).
[2] Faver, J. C., Ucisik, M. N., Yang, W., & Merz, K. M. (2013). Computer-aided drug design: Using numbers to your advantage. ACS Medicinal Chemistry Letters, 4(9).
[3] Green, D. V. S. (2008). Virtual screening of chemical libraries for drug discovery. Expert Opinion on Drug Discovery, 3(9).
[4] Lavecchia, A., & Giovanni, C. D. (2013). Virtual screening strategies in drug discovery: A critical review. Current Medicinal Chemistry, 20(23).
[5] Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., & Willett, P. (2012). Similarity coefficients for binary chemoinformatics data: Overview and extended comparison using simulated and real data sets. Journal of Chemical Information and Modeling, 52(11).
[6] Willett, P. (2011). Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3).
[7] Willett, P., Barnard, J. M., & Downs, G. M. (1998). Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 38(6).
[8] Xu, J., & Hagler, A. (2002). Chemoinformatics and drug discovery. Molecules, 7(8).
[9] Figure binary fingerprint is available from : [last
[10] ECFP4 fingerprint is available from: [last
[11] UNITY fingerprint is available from: [last
More informationUsing AutoDock for Virtual Screening
Using AutoDock for Virtual Screening CUHK Croucher ASI Workshop 2011 Stefano Forli, PhD Prof. Arthur J. Olson, Ph.D Molecular Graphics Lab Screening and Virtual Screening The ultimate tool for identifying
More informationIntegrated Cheminformatics to Guide Drug Discovery
Integrated Cheminformatics to Guide Drug Discovery Matthew Segall, Ed Champness, Peter Hunt, Tamsin Mansley CINF Drug Discovery Cheminformatics Approaches August 23 rd 2017 Optibrium, StarDrop, Auto-Modeller,
More informationEMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS
EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS PETER GUND Pharmacopeia Inc., CN 5350 Princeton, NJ 08543, USA pgund@pharmacop.com Empirical and theoretical approaches to drug discovery have often
More informationPipeline Pilot Integration
Scientific & technical Presentation Pipeline Pilot Integration Szilárd Dóránt July 2009 The Component Collection: Quick facts Provides access to ChemAxon tools from Pipeline Pilot Free of charge Open source
More informationKinome-wide Activity Models from Diverse High-Quality Datasets
Kinome-wide Activity Models from Diverse High-Quality Datasets Stephan C. Schürer*,1 and Steven M. Muskal 2 1 Department of Molecular and Cellular Pharmacology, Miller School of Medicine and Center for
More informationAnalysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing
Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning and Simulated Annealing Student: Ke Zhang MBMA Committee: Dr. Charles E. Smith (Chair) Dr. Jacqueline M. Hughes-Oliver
More informationUsing Self-Organizing maps to accelerate similarity search
YOU LOGO Using Self-Organizing maps to accelerate similarity search Fanny Bonachera, Gilles Marcou, Natalia Kireeva, Alexandre Varnek, Dragos Horvath Laboratoire d Infochimie, UM 7177. 1, rue Blaise Pascal,
More informationQSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression
APPLICATION NOTE QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression GAINING EFFICIENCY IN QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS ErbB1 kinase is the cell-surface receptor
More informationImproving structural similarity based virtual screening using background knowledge
Girschick et al. Journal of Cheminformatics 2013, 5:50 RESEARCH ARTICLE Open Access Improving structural similarity based virtual screening using background knowledge Tobias Girschick 1, Lucia Puchbauer
More informationIn silico pharmacology for drug discovery
In silico pharmacology for drug discovery In silico drug design In silico methods can contribute to drug targets identification through application of bionformatics tools. Currently, the application of
More informationTARGET-ORIENTED GENERIC FINGERPRINT-BASED MOLECULAR REPRESENTATION
TARGET-ORIENTED GENERIC FINGERPRINT-BASED MOLECULAR REPRESENTATION Petr Skoda and David Hoksza Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic skoda@ksi.mff.cuni.cz
More informationFunctional Group Fingerprints CNS Chemistry Wilmington, USA
Functional Group Fingerprints CS Chemistry Wilmington, USA James R. Arnold Charles L. Lerman William F. Michne James R. Damewood American Chemical Society ational Meeting August, 2004 Philadelphia, PA
More informationDesign and characterization of chemical space networks
Design and characterization of chemical space networks Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-University Bonn 16 August 2015 Network representations of chemical spaces
More informationAuthor Index Volume
Perspectives in Drug Discovery and Design, 20: 289, 2000. KLUWER/ESCOM Author Index Volume 20 2000 Bradshaw,J., 1 Knegtel,R.M.A., 191 Rose,P.W., 209 Briem, H., 231 Kostka, T., 245 Kuhn, L.A., 171 Sadowski,
More informationExpanding the scope of literature data with document to structure tools PatentInformatics applications at Aptuit
Expanding the scope of literature data with document to structure tools PatentInformatics applications at Aptuit Alfonso Pozzan Computational and Analytical Chemistry Drug Design and Discovery Department
More informationBioisosteres in Medicinal Chemistry
Edited by Nathan Brown Bioisosteres in Medicinal Chemistry VCH Verlag GmbH & Co. KGaA Contents List of Contributors Preface XV A Personal Foreword XI XVII Part One Principles 1 Bioisosterism in Medicinal
More informationCSD. CSD-Enterprise. Access the CSD and ALL CCDC application software
CSD CSD-Enterprise Access the CSD and ALL CCDC application software CSD-Enterprise brings it all: access to the Cambridge Structural Database (CSD), the world s comprehensive and up-to-date database of
More informationThis is a repository copy of Chemoinformatics techniques for data mining in files of two-dimensional and three-dimensional chemical molecules.
This is a repository copy of Chemoinformatics techniques for data mining in files of two-dimensional and three-dimensional chemical molecules. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/8425/
More informationCSD. Unlock value from crystal structure information in the CSD
CSD CSD-System Unlock value from crystal structure information in the CSD The Cambridge Structural Database (CSD) is the world s most comprehensive and up-todate knowledge base of crystal structure data,
More informationMolecular Clustering via Knowledge Mining from Biomedical Scientific Corpora
FI Molecular Clustering via Knowledge Mining from Biomedical Scientific Corpora Panagiotis Hasapis, Dimitrios Ntalaperas, Christos C. Kannas, Aristos Aristodimou, Dimitrios Alexandrou, Thanassis Bouras,
More informationIntroduction to Chemoinformatics
Introduction to Chemoinformatics www.dq.fct.unl.pt/cadeiras/qc Prof. João Aires-de-Sousa Email: jas@fct.unl.pt Recommended reading Chemoinformatics - A Textbook, Johann Gasteiger and Thomas Engel, Wiley-VCH
More informationDevelopment of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining
Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining Samer Haidar 1, Zouhair Bouaziz 2, Christelle Marminon 2, Tiomo Laitinen 3, Anti Poso
More informationMixture of metrics optimization for machine learning problems
machine learning and Marek mieja Faculty of Mathematics and Computer Science, Jagiellonian University TFML 2015 B dlewo, February 16-21 How to select data representation and metric for a given data set?
More informationMachine Learning Concepts in Chemoinformatics
Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics
More informationRapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value
Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value Anthony Arvanites Daylight User Group Meeting March 10, 2005 Outline 1. Company Introduction
More informationPhysical Chemistry Final Take Home Fall 2003
Physical Chemistry Final Take Home Fall 2003 Do one of the following questions. These projects are worth 30 points (i.e. equivalent to about two problems on the final). Each of the computational problems
More informationOpen PHACTS Explorer: Compound by Name
Open PHACTS Explorer: Compound by Name This document is a tutorial for obtaining compound information in Open PHACTS Explorer (explorer.openphacts.org). Features: One-click access to integrated compound
More informationHow IJC is Adding Value to a Molecular Design Business
How IJC is Adding Value to a Molecular Design Business James Mills Sexis LLP ChemAxon TechTalk Stevenage, ov 2012 james.mills@sexis.co.uk Overview Introduction to Sexis Sexis IJC use cases Data visualisation
More informationNext Generation Computational Chemistry Tools to Predict Toxicity of CWAs
Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs William (Bill) Welsh welshwj@umdnj.edu Prospective Funding by DTRA/JSTO-CBD CBIS Conference 1 A State-wide, Regional and National
More informationCharacterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors. Robert D. Clark 2004 Tripos, Inc.
Characterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors Robert D. Clark Tripos, Inc. bclark@tripos.com 2004 Tripos, Inc. Outline Background o history o mechanics Finding appropriate
More informationDe Novo molecular design with Deep Reinforcement Learning
De Novo molecular design with Deep Reinforcement Learning @olexandr Olexandr Isayev, Ph.D. University of North Carolina at Chapel Hill olexandr@unc.edu http://olexandrisayev.com About me Ph.D. in Chemistry
More informationAcyclic Subgraph based Descriptor Spaces for Chemical Compound Retrieval and Classification. Technical Report
Acyclic Subgraph based Descriptor Spaces for Chemical Compound Retrieval and Classification Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200
More informationarxiv: v1 [cs.ds] 25 Jan 2016
A Novel Graph-based Approach for Determining Molecular Similarity Maritza Hernandez 1, Arman Zaribafiyan 1,2, Maliheh Aramon 1, and Mohammad Naghibi 3 1 1QB Information Technologies (1QBit), Vancouver,
More information