Large scale classification of chemical reactions from patent data
1 Large scale classification of chemical reactions from patent data. Gregory Landrum, NIBR Informatics, Basel, Novartis Institutes for BioMedical Research. 10th International Conference on Chemical Structures / 10th German Conference on Chemoinformatics
2 Outline: Public data sources and reactions; Fingerprints for reactions; Validation: machine learning, clustering; Application: models for predicting yield
3 Public data sources in cheminformatics (an aside at the beginning). Publicly available data sources for small molecules and their biological activities/interactions: PDB, PubChem, ChEMBL, etc. Publicly available data sources for the chemistry behind how those molecules were actually made (i.e. reactions): pretty much nothing until recently. Plenty of data is locked up in large commercial databases and pharmaceutical companies' ELNs; very little is in the open. The public/open point is important for collaboration and reproducibility.
4 A large, public source of chemical reactions. Not just what we made, but how we made it. Text-mining applied to open patent data to extract chemical reactions: 1.12 million reactions [1]. Reactions classified using NameRxn, when possible, into 318 standard types: > classified reactions [2]. [1] Lowe DM: Extraction of chemical structures and reactions from the literature. PhD thesis. University of Cambridge: Cambridge, UK; [2] Reaction classification from Roger Sayle and Daniel Lowe (NextMove Software).
5 More about the classes. Frequency of reaction classes, 20 most common classes: Carboxylic acid + amine reaction; Williamson ether synthesis; Amide Schotten-Baumann; Chloro N-arylation; Bromo N-alkylation; Nitro to amino; Chloro N-alkylation; CO2H-Me deprotection; N-Boc deprotection; CO2H-Et deprotection; Aldehyde reductive amination; Sulfonamide Schotten-Baumann; Separation; Bromo Suzuki-type coupling; Mitsunobu aryl ether synthesis; Methoxy to hydroxy; Sonogashira coupling; Bromo Suzuki coupling; Thioether synthesis; Hydroxy to chloro
6 Got the reactions, what about reaction fingerprints? Criteria for them to be useful. Question 1: do they contain bits that are helpful in distinguishing reactions from one another? Test: can we use them with a machine-learning approach to build a reaction classifier? Question 2: are similar reactions similar with the fingerprints? Test: do related reactions cluster together?
7 Our toolbox: the RDKit. Open-source C++ toolkit for cheminformatics. Wrappers for Python (2.x), Java, C#. Functionality: 2D and 3D molecular operations; descriptor generation for machine learning; PostgreSQL database cartridge for substructure and similarity searching; KNIME nodes; IPython integration; Lucene integration (experimental). Supports Mac/Windows/Linux. Releases every 6 months. Business-friendly BSD license. Code:
8 Similarity and reactions: what are we talking about? These two reactions are both of type Ketone reductive amination. It's obvious that these are the same, right?
10 Got the reactions, what about reaction fingerprints? Start simple, use difference fingerprints: $FP_{\mathrm{Reacts}} = \sum_{i \in \mathrm{Reactants}} FP_i$, $FP_{\mathrm{Products}} = \sum_{i \in \mathrm{Products}} FP_i$, $FP_{\mathrm{Rxn}} = FP_{\mathrm{Products}} - FP_{\mathrm{Reacts}}$. Similar idea here: 1) Ridder, L. & Wagener, M. SyGMa: Combining Expert Knowledge and Empirical Scoring in the Prediction of Metabolites. ChemMedChem 3, (2008). 2) Patel, H., Bodkin, M. J., Chen, B. & Gillet, V. J. Knowledge-Based Approach to de Novo Design Using Reaction Vectors. J. Chem. Inf. Model. 49, (2009).
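The difference-fingerprint idea can be sketched in a few lines of plain Python. This is only an illustration, not the RDKit implementation used in the talk: fingerprints are modelled as feature-to-count dictionaries, and the feature names in the toy example are made up.

```python
from collections import Counter


def sum_fingerprints(fps):
    """Sum count-based fingerprints (dicts mapping feature -> count)."""
    total = Counter()
    for fp in fps:
        total.update(fp)  # Counter.update adds counts key by key
    return total


def difference_fingerprint(reactant_fps, product_fps):
    """FP_Rxn = sum of product fingerprints - sum of reactant fingerprints."""
    reacts = sum_fingerprints(reactant_fps)
    prods = sum_fingerprints(product_fps)
    diff = {}
    for key in set(reacts) | set(prods):
        delta = prods[key] - reacts[key]  # missing keys count as 0
        if delta != 0:
            diff[key] = delta  # keep only features that change
    return diff


# Toy, hypothetical feature counts for a reductive-amination-like reaction.
ketone = {"C=O": 1, "C-C": 2}
amine = {"C-N": 1, "N-H": 2}
product = {"C-N": 2, "C-C": 2, "N-H": 1}

rxn_fp = difference_fingerprint([ketone, amine], [product])
```

Features that appear unchanged on both sides (here the C-C count) cancel out, so what remains describes what the reaction changed rather than which molecules took part.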
11 Refine the fingerprints a bit. Text-mined reactions often include catalysts, reagents, or solvents in the reactants. Explore two options for handling this: 1. Decrease the weight of reactant molecules where too many of the bits are not present in the product fingerprint. 2. Decrease the weight of reactant molecules where too many atoms are unmapped.
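A rough sketch of the two down-weighting options, under stated assumptions: the slide gives no thresholds or weights, so `min_overlap`, `max_unmapped`, and `low_weight` below are hypothetical parameters, and an atom-map number of 0 is assumed to mean "unmapped".

```python
def product_overlap(reactant_fp, product_fp):
    """Fraction of the reactant's features that also occur in the product."""
    if not reactant_fp:
        return 0.0
    shared = sum(1 for key in reactant_fp if key in product_fp)
    return shared / len(reactant_fp)


def weight_from_overlap(reactant_fp, product_fp, min_overlap=0.2, low_weight=0.1):
    """Option 1: down-weight reactants sharing few features with the product."""
    if product_overlap(reactant_fp, product_fp) >= min_overlap:
        return 1.0
    return low_weight


def weight_from_mapping(atom_map_numbers, max_unmapped=0.2, low_weight=0.1):
    """Option 2: down-weight reactants with too many unmapped atoms.

    Assumes a map number of 0 marks an unmapped atom.
    """
    if not atom_map_numbers:
        return low_weight
    unmapped = sum(1 for n in atom_map_numbers if n == 0)
    if unmapped / len(atom_map_numbers) <= max_unmapped:
        return 1.0
    return low_weight


# Toy example: a substrate that ends up in the product, and a solvent that does not.
product_fp = {"a": 1, "b": 1, "d": 1}
substrate_fp = {"a": 1, "b": 1, "c": 1}
solvent_fp = {"x": 2, "y": 1}

weights = [weight_from_overlap(fp, product_fp) for fp in (substrate_fp, solvent_fp)]
mapping_weights = [weight_from_mapping([1, 2, 3, 0, 4]), weight_from_mapping([0, 0, 0])]
```

The weights would then scale each reactant's contribution when the reactant fingerprints are summed.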
12 Are the fingerprints useful? Question 1: do they contain bits that are helpful in distinguishing reactions from one another? Test: can we use them with a machine-learning approach to build a reaction classifier? Question 2: are similar reactions similar with the fingerprints? Test: do related reactions cluster together?
13 Machine learning and chemical reactions. Validation set: the 68 reaction types with at least 2000 instances from the patent data set. Resolution reaction types removed (e.g. Separation and 11.1 Chiral separation). Final: 66 reaction types. Process: training set is 200 random instances of each reaction type; test set is 800 random instances of each reaction type. Learning: random forest (scikit-learn).
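The per-class sampling described on this slide could look roughly like the following. It is a sketch, not the original workflow: the function name and the fixed seed are illustrative, and the random-forest training step itself (done with scikit-learn in the talk) is left out.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train=200, n_test=800, seed=0):
    """Draw disjoint random train/test index sets of fixed size for each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < n_train + n_test:
            continue  # skip classes that are too small to sample from
        picked = rng.sample(members, n_train + n_test)  # sampling without replacement
        train.extend(picked[:n_train])
        test.extend(picked[n_train:])
    return train, test


# Toy example with small sizes; class "C" is too small and is skipped.
toy_labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 3
train_idx, test_idx = split_per_class(toy_labels, n_train=2, n_test=5)
```

Sampling the same number of instances from every class keeps the training set balanced even though the class frequencies in the patent data differ widely.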
14 Learning reaction classes Results for test data Overall: Recall: 0.94 Precision: 0.94 Accuracy: For a 66-class classifier, this looks pretty good!
15 Learning reaction classes. Confusion matrix for test data: ~94% accuracy; much of the confusion is between related types. Matrix labels: Bromo Suzuki coupling; Bromo N-arylation; Bromo Suzuki-type coupling.
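For reference, this is how recall, precision, and accuracy follow from a confusion matrix. It is a minimal stdlib sketch with toy labels "A" and "B"; the numbers are not from the talk.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs: a sparse confusion matrix."""
    return Counter(zip(true_labels, predicted_labels))


def per_class_metrics(true_labels, predicted_labels):
    """Return {class: (recall, precision)} computed from the confusion counts."""
    counts = confusion_counts(true_labels, predicted_labels)
    classes = sorted(set(true_labels) | set(predicted_labels))
    metrics = {}
    for c in classes:
        tp = counts[(c, c)]
        actual = sum(n for (t, _), n in counts.items() if t == c)
        predicted = sum(n for (_, p), n in counts.items() if p == c)
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        metrics[c] = (recall, precision)
    return metrics


def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


true_y = ["A", "A", "A", "B", "B"]
pred_y = ["A", "A", "B", "B", "B"]
metrics = per_class_metrics(true_y, pred_y)
acc = accuracy(true_y, pred_y)
```

The off-diagonal entries of `confusion_counts` are what show which pairs of classes get mixed up.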
16 Are the fingerprints useful? Question 1: do they contain bits that are helpful in distinguishing reactions from one another? Test: can we use them with a machine-learning approach to build a reaction classifier? Question 2: are similar reactions similar with the fingerprints? Test: do related reactions cluster together?
17 Clustering reactions. Reaction similarity validation set: the 66 most common reaction types from the patent data set. Look at the homogeneity of clusters with at least 10 members. (Figure labels: Ketone reductive amination; Integration.) Interpretation: <30% of clusters are <90% homogeneous. Interpretation: <40% of clusters are <80% homogeneous.
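One way to compute such a homogeneity statistic is sketched below. The slide does not define "homogeneous", so this assumes that a cluster's homogeneity is the fraction of its members belonging to its most common reaction class; the function names are illustrative.

```python
from collections import Counter


def cluster_homogeneity(labels):
    """Fraction of cluster members that carry the cluster's most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def fraction_below(clusters, threshold, min_size=10):
    """Fraction of sufficiently large clusters whose homogeneity is below threshold."""
    big = [c for c in clusters if len(c) >= min_size]
    if not big:
        return 0.0
    below = sum(1 for c in big if cluster_homogeneity(c) < threshold)
    return below / len(big)


# Toy clusters, given as lists of reaction-class labels.
toy_clusters = [
    ["A"] * 10,              # fully homogeneous
    ["A"] * 8 + ["B"] * 2,   # 80% homogeneous
    ["A"] * 3,               # ignored: fewer than 10 members
]
share_below_90 = fraction_below(toy_clusters, 0.9)
```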
18 Using the fingerprints: can we help classify the remaining 600K reactions? Apply the 66-class random forest to generate class predictions for the unclassified reactions in order to find reactions we missed. Cluster the unclassified reactions, look for big clusters, and (manually) assign classes to them. Both of these approaches have been successful.
19 Predicting yields. The data set includes text-mined yield information as well as calculated yields. For modeling: prefer the text-mined value, but take the calculated one if that's the only thing available. Look at stats for the 93 reaction classes that have at least 500 members with yields, a min yield > 0, and a max yield < 110%:
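The yield-selection rule is simple enough to write down directly. In this sketch `None` stands for a missing value, and the 0 to 110% bounds are applied per reaction, which is one possible reading of the slide rather than a confirmed detail.

```python
def pick_yield(text_mined, calculated):
    """Prefer the text-mined yield; fall back to the calculated one."""
    if text_mined is not None:
        return text_mined
    return calculated


def is_usable(yield_percent, lower=0.0, upper=110.0):
    """Keep yields strictly between the bounds (assumed per-reaction filter)."""
    return yield_percent is not None and lower < yield_percent < upper


# Toy records as (text-mined yield, calculated yield) pairs, in percent.
records = [(85.0, 90.0), (None, 72.5), (None, None), (0.0, 40.0), (None, 130.0)]
chosen = [pick_yield(t, c) for t, c in records]
usable = [y for y in chosen if is_usable(y)]
```

Note that the fourth record keeps its text-mined 0.0 and is then filtered out, because the text-mined value takes precedence whenever it is present.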
20 Predicting yields. Look at the most populated classes:
21 Try building models for yield. Start with the class Nitro to amino. Break into low-yield (<50%) and high-yield (>70%) classes. 14% are low-yield.
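A small sketch of the low/high split. It assumes that reactions with yields between the two cutoffs are left out of the two-class problem, which the slide implies but does not state.

```python
def yield_class(yield_percent, low_cutoff=50.0, high_cutoff=70.0):
    """Label a yield as 'low' (<50%) or 'high' (>70%); None for the gap in between."""
    if yield_percent < low_cutoff:
        return "low"
    if yield_percent > high_cutoff:
        return "high"
    return None  # assumed: intermediate yields are not used for modeling


def low_yield_share(yields):
    """Fraction of labelled reactions that fall into the low-yield class."""
    labels = [yield_class(y) for y in yields]
    labelled = [lab for lab in labels if lab is not None]
    if not labelled:
        return 0.0
    return labelled.count("low") / len(labelled)


toy_yields = [30.0, 60.0, 75.0, 90.0, 95.0]
toy_labels = [yield_class(y) for y in toy_yields]
toy_share = low_yield_share(toy_yields)
```

Leaving a gap between the cutoffs separates the two classes more cleanly, at the cost of discarding the intermediate data.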
22 Try building models for yield: things that don't work. Try building a random forest using the atom-pair based reaction fingerprints. That's performance on the training set.
23 Try building models for yield: things that don't work. Try building a random forest using the atom-pair based reactant fingerprints. That's performance on the training set.
24 Try building models for yield: things that don't work? Look at the ROC curve for the training-set data. (Plot annotations: first wrong low-yield prediction; nine wrong low-yield predictions.) The model is doing a great job of ordering reactions, but a bad job of classifying them.
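The difference between ordering and classifying can be made concrete with a toy example. The scores below are invented: they rank every low-yield example above every high-yield one (ROC AUC of 1.0), yet none reaches a 0.5 vote fraction, so a majority-vote rule labels nothing as low-yield.

```python
def roc_auc(scores, is_positive):
    """ROC AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical fraction of trees voting "low yield" for five reactions.
low_vote_fraction = [0.45, 0.40, 0.30, 0.10, 0.05]
is_low_yield = [True, True, False, False, False]

auc = roc_auc(low_vote_fraction, is_low_yield)
predicted_low = [s >= 0.5 for s in low_vote_fraction]  # majority-vote rule
```

Here `auc` is perfect while `predicted_low` contains no positives at all: the ranking is fine, the threshold is the problem.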
25 Unbalanced data and ensemble classifiers (an aside). Usual decision rule for a two-class ensemble classifier: take the result that the majority of the models (decision trees for random forests) vote for. That's a decision boundary of 0.5. If the data set is unbalanced, why should we expect balanced behavior from the classifier? Idea: use the composition of the training set to decide what the decision boundary should be. For example: if the data set is ~20% low yield, then assign low yield to any example where at least 20% of the trees say low yield.
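The shifted decision rule can be sketched without any particular library. Tree votes are modelled as a list of booleans (True meaning "this tree says low yield"); the function names are illustrative, and a vote fraction exactly at the boundary is assumed to count as low yield.

```python
def low_vote_fraction(tree_votes):
    """Fraction of trees in the ensemble voting for the low-yield class."""
    return sum(tree_votes) / len(tree_votes)


def boundary_from_training(train_labels, minority="low"):
    """Use the minority-class share of the training set as the decision boundary."""
    return train_labels.count(minority) / len(train_labels)


def predict(tree_votes, boundary=0.5):
    """Assign 'low' when at least `boundary` of the trees vote low (ties go to low)."""
    return "low" if low_vote_fraction(tree_votes) >= boundary else "high"


# Toy training set with 20% low-yield examples.
train_labels = ["low"] * 2 + ["high"] * 8
boundary = boundary_from_training(train_labels)

# A reaction for which 3 of 10 trees vote "low yield".
votes = [True] * 3 + [False] * 7
default_call = predict(votes)            # usual majority rule
shifted_call = predict(votes, boundary)  # boundary taken from class balance
```

With the default rule this reaction is called high-yield; with the boundary moved to the training-set composition it is called low-yield. In scikit-learn the vote fraction would typically come from a classifier's predicted class probabilities, but that wiring is not shown here.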
26 Try building models for yield: getting close to working. Try building a random forest using the atom-pair based reactant fingerprints. That's performance on the training set. What about moving the decision boundary to 0.2 to reflect the unbalanced data set? Starting to look OK. What about the test set?
27 Try building models for yield: getting close to working. Results from a random forest using the atom-pair based reactant fingerprints with the shifted decision boundary (test set). Not too terrible.
28 Try building models for yield: some more models. Aldehyde reductive amination (no shift): test set. Williamson ether synthesis (boundary 0.3): test set.
29 Try building models for yield: some more models. Chloro N-alkylation (no shift): test set. Chloro N-alkylation (0.4 shift): test set.
30 Wrapping up. Dataset: 1+ million reactions text-mined from patents (publicly available) with reaction classes assigned. Fingerprints: weighted atom-pair delta and functional-group delta fingerprints implemented using the RDKit. Fingerprint validation: multiclass random-forest classifier ~94% accurate; similarity measure works: similar reactions cluster together. Combination of clustering + functional group analysis allows identification of new reaction classes. We're also able to use the fingerprints to build reasonable models for yield.
31 Acknowledgements. NextMove Software: Roger Sayle, Daniel Lowe. NIBR: Anna Pelliccioli, Sereina Riniker, Mike Tarselli.
32 Advertising: 3rd RDKit User Group Meeting, October 2014, Merck KGaA, Darmstadt, Germany. Talks, talktorials, lightning talks, social activities, and a hackathon on the 24th. Registration: Full announcement: We're looking for speakers. Please contact greg.landrum@gmail.com
Ertl and Rohde Journal of Cheminformatics 2012, 4:12 METHODOLOGY Open Access The Molecule Cloud - compact visualization of large collections of molecules Peter Ertl * and Bernhard Rohde Abstract Background:
More informationWiley ChemPlanner predicts experimentally verified synthesis routes in medicinal chemistry
Wiley ChemPlanner predicts experimentally verified synthesis routes in medicinal chemistry Simone-Alexandra Stark, University of Regensburg Reinhard Neudert, Wiley Richard Threlfall, Wiley Wiley ChemPlanner
More information10. Amines (text )
2009, Department of Chemistry, The University of Western Ontario 10.1 10. Amines (text 10.1 10.6) A. Structure and omenclature Amines are derivatives of ammonia (H 3 ), where one or more H atoms has been
More informationDivCalc: A Utility for Diversity Analysis and Compound Sampling
Molecules 2002, 7, 657-661 molecules ISSN 1420-3049 http://www.mdpi.org DivCalc: A Utility for Diversity Analysis and Compound Sampling Rajeev Gangal* SciNova Informatics, 161 Madhumanjiri Apartments,
More informationSimilarity Search. Uwe Koch
Similarity Search Uwe Koch Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationHow to evaluate credit scorecards - and why using the Gini coefficient has cost you money
How to evaluate credit scorecards - and why using the Gini coefficient has cost you money David J. Hand Imperial College London Quantitative Financial Risk Management Centre August 2009 QFRMC - Imperial
More informationSolved and Unsolved Problems in Chemoinformatics
Solved and Unsolved Problems in Chemoinformatics Johann Gasteiger Computer-Chemie-Centrum University of Erlangen-Nürnberg D-91052 Erlangen, Germany Johann.Gasteiger@fau.de Overview objectives of lecture
More informationChemical Reaction Databases Computer-Aided Synthesis Design Reaction Prediction Synthetic Feasibility
Chemical Reaction Databases Computer-Aided Synthesis Design Reaction Prediction Synthetic Feasibility Dr. Wendy A. Warr http://www.warr.com Warr, W. A. A Short Review of Chemical Reaction Database Systems,
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationCPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017
CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code
More informationIdentification of functional groups in the unknown Will take in lab today
Qualitative Analysis of Unknown Compounds 1. Infrared Spectroscopy Identification of functional groups in the unknown Will take in lab today 2. Elemental Analysis Determination of the Empirical Formula
More informationLoudon Chapter 23 Review: Amines Jacquie Richardson, CU Boulder Last updated 4/22/2018
This chapter is about the chemistry of nitrogen. We ve seen it before in several places, but now we can look at several reactions that are specific to nitrogen. Amines can be subdivided based on how many
More information