The Fragment Network: A Chemistry Recommendation Engine Built Using a Graph Database


Supporting Information

Richard J. Hall, Christopher W. Murray and Marcel L. Verdonk

Contents

This supporting information contains a detailed description of the algorithm that generates the nodes and edges of the Fragment Network. It also describes how to insert nodes and edges into a Neo4j graph database and gives some example network queries. The additional supporting information contains data that can be inserted into a Neo4j graph database to create an illustrative Fragment Network based around the 4-hydroxy-biphenyl example presented in Figure 2 of the manuscript.

Generating Nodes and Edges for the Fragment Network

To generate a set of nodes and edges for a molecule, we first generate a node that represents the molecule itself. The molecule is represented as a nonisomeric SMILES string. As well as the SMILES string, the heavy atom count (HAC) and ring atom count (RAC) are stored as node attributes, as is a simplified representation of the molecular graph. This simplified graph representation uses the Daylight function dt_molgraph to set all bond orders to one, remove aromaticity, set the hydrogen count, remove charges and set masses to zero. Additionally, we set the element type of every ring atom to carbon. For 4-hydroxy-biphenyl, this first node has the following data:

    {SMILES: "Oc1ccc(cc1)c2ccccc2", HAC: 13, RAC: 12, RINGSMILES: "OC1CCC(CC1)C2CCCCC2"}    (1)

Next, a set of molecular components is generated. The SMARTS pattern "[*;R]-;!@[*]" is used to find single acyclic bonds to a ring atom. For each matching bond b, the start and end atoms are determined. The bond b is deleted and isotopically labelled xenon atoms are bonded to the start and end atoms. The isotopic label is incremented for each matching bond to provide unique labelling. The components for 4-hydroxy-biphenyl are:

    ['O[100Xe]', '[101Xe]c1ccccc1', '[100Xe]c1ccc([101Xe])cc1']    (2)

The bond between the hydroxyl and the phenyl ring is broken and the oxygen and carbon atoms are attached to xenon atoms with the isotopic label 100. The bond between the phenyl rings is broken and the carbon atoms are attached to xenon atoms with the isotopic label 101.

Next, this set of molecular components is combined to generate child nodes. Each child node is created by leaving out one of the components. For example, leaving out the O[100Xe] component from (2), we create a child node by considering the two phenyl ring components. These components are combined by locating pairs of matching xenon atom labels.
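The leave-one-out step and the search for matching xenon labels can be sketched in plain Python. This is only an illustration under stated assumptions: the function name leave_one_out and the regular expression are hypothetical, the components are treated as text rather than as molecules, and the bond rebuilding itself (which needs a cheminformatics toolkit) is not shown.

```python
import re
from collections import Counter

# Matches isotopically labelled xenon atoms such as [100Xe] and captures the label.
XE_LABEL = re.compile(r"\[(\d+)Xe\]")


def leave_one_out(components):
    """Yield (excluded, remaining, paired_labels, unmatched_labels).

    A label seen twice in the remaining components marks a bond to rebuild;
    a label seen once marks an attachment point of the excluded component.
    """
    for i, excluded in enumerate(components):
        remaining = components[:i] + components[i + 1:]
        counts = Counter(
            label for comp in remaining for label in XE_LABEL.findall(comp)
        )
        paired = sorted(label for label, n in counts.items() if n == 2)
        unmatched = sorted(label for label, n in counts.items() if n == 1)
        yield excluded, remaining, paired, unmatched


# Components (2) for 4-hydroxy-biphenyl, as given in the text.
components = ["O[100Xe]", "[101Xe]c1ccccc1", "[100Xe]c1ccc([101Xe])cc1"]
results = list(leave_one_out(components))

for excluded, remaining, paired, unmatched in results:
    print(excluded, remaining, paired, unmatched)
```

For the excluded O[100Xe] component the sketch reports one paired label (101) and one unmatched label (100).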
The two phenyl ring components contain one pair of 101Xe labels, as well as an unmatched 100Xe label. A bond is added between the atoms attached to the paired xenon atoms. The bonds to the paired xenon atoms and the xenon atoms themselves are then deleted. The rebuilt molecule will contain the unmatched xenon atoms that indicate the attachment point(s) of the excluded component. In the example there is a single xenon atom that shows the location of the excluded hydroxyl component. If the excluded component is a ring or linker, the rebuilt molecule will contain multiple xenon atoms and will be disconnected. We generate a SMILES string for the child node by replacing the remaining xenon atoms with hydrogen. For the components in (2), the combinations are given in Table S1.

    Exclude                    Leaving                                            Rebuilt As               Child Node
    [100Xe]c1ccc([101Xe])cc1   ['O[100Xe]', '[101Xe]c1ccccc1']                    O[Xe].[Xe]c1ccccc1       O.c1ccccc1
    [101Xe]c1ccccc1            ['O[100Xe]', '[100Xe]c1ccc([101Xe])cc1']           Oc1ccc([Xe])cc1          Oc1ccccc1
    O[100Xe]                   ['[101Xe]c1ccccc1', '[100Xe]c1ccc([101Xe])cc1']    [Xe]c1ccc(cc1)c2ccccc2   c1ccc(cc1)c2ccccc2

Table S1. The set of child nodes created from components of 4-hydroxy-biphenyl

An edge is created to join each child node to the parent. Each edge is labelled with a set of attributes relating to the parent and child nodes. For example, the edge that links 4-hydroxy-biphenyl to biphenyl has the following attributes:

- the type of the excluded component: 'FG' (we used to refer to substituents as functional groups)
- the type of the rebuilt combination: 'RING' (unused)
- the nonisomeric SMILES of the excluded component: O[Xe]
- the nonisomeric SMILES of the rebuilt molecule: [Xe]c1ccc(cc1)c2ccccc2
- the simplified graph of the excluded component: O[Xe]
- the simplified graph of the rebuilt molecule: [Xe]C1CCC(CC1)C2CCCCC2

As with the node attribute, the simplified graph representation for the edge uses the Daylight function dt_molgraph to set all bond orders to one, remove aromaticity, set the hydrogen count, remove charges and set masses to zero. Additionally, we set every ring atom element to carbon. This means that our simplified graph will be the same for all rings of the same size, irrespective of the aromaticity or the heteroatomic composition of the original component.

We can also define equivalencies between rings of different type using our simplified graph representation. This allows us to treat, for example, a 1,4-substituted 6-membered ring as equivalent to a 1,3-substituted 5-membered ring. These equivalencies are derived from a set of common ring scaffolds in the Astex registry and ChEMBL. When we search the graph, if the excluded component is marked as type RING, we can use our ring equivalence dictionary to match rings of different sizes.

For compounds with more than two substituents, the order of the substituents in the ring equivalence dictionary is relevant. For this reason, we add canonicalized isotopic labels to the simplified graph. The canonicalization uses the lexicographical ordering of the SMILES pattern for the excluded components. Figure S1 illustrates this. All compounds have common substituents: methyl, fluoro and chloro. Removing the ring system from compounds a-d means that all compounds will have an edge that joins to the node C.Cl.F. Compounds a and b share the same simplified graph for the excluded ring component and can be grouped as replace ring, equivalent vectors. Compound c has a different arrangement of substituents and will have a different simplified graph. This means compound c will be placed in the replace ring, different vectors group.
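A ring equivalence lookup of the kind described above might be sketched as follows. The dictionary contents, the class name and the function names are all hypothetical; the real dictionary is derived from ring scaffolds in the Astex registry and ChEMBL and is not reproduced in this document. The two entries only encode the one example the text gives (a 1,4-substituted 6-membered ring treated as equivalent to a 1,3-substituted 5-membered ring).

```python
# Hypothetical equivalence table: simplified graph -> equivalence class name.
RING_EQUIVALENCE = {
    "[100Xe]C1CCC([101Xe])CC1": "two_vector_class_a",  # 1,4-substituted 6-membered ring
    "[100Xe]C1CC([101Xe])CC1": "two_vector_class_a",   # 1,3-substituted 5-membered ring
}


def equivalence_class(simplified_graph):
    """Return the class of a simplified graph; unknown graphs form their own class."""
    return RING_EQUIVALENCE.get(simplified_graph, simplified_graph)


def classify_ring_replacement(graph_a, graph_b):
    """Label a ring replacement by comparing the classes of the two excluded rings."""
    if equivalence_class(graph_a) == equivalence_class(graph_b):
        return "replace ring, equivalent vectors"
    return "replace ring, different vectors"


six_para = "[100Xe]C1CCC([101Xe])CC1"
five_13 = "[100Xe]C1CC([101Xe])CC1"
six_meta = "[100Xe]C1CCCC([101Xe])C1"  # 1,3-substituted 6-membered ring, not in the table

print(classify_ring_replacement(six_para, five_13))
print(classify_ring_replacement(six_para, six_meta))
```

Falling back to the graph itself for unknown rings means that two identical simplified graphs are always classed as equivalent, which matches the behaviour described for compounds a and b.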
Compound d has a different ring system but the same arrangement of substituents. We can therefore add an equivalency rule between the simplified graph of ring d and the simplified graph of rings a and b that will allow compound d to also be classified as replace ring, equivalent vectors.

Figure S1. Related compounds (a)-(d) with equivalent and non-equivalent simplified graphs

Once the edge data has been recorded, each new node is itself used as input for the algorithm to recursively create additional nodes and edges. Note that the algorithm keeps track of the set of nodes that have been generated. If a node is already in the set, the algorithm returns immediately without recomputing child nodes and edges (since these will have been generated in a prior iteration). This provides a significant reduction in the amount of work required to build nodes and edges for a large database.

We apply additional rules to deal with the special case of ring-ring bonds and spiro systems. For each ring-ring bond in the compound, we add xenon to the bond atoms and delete the bond. For 4-hydroxy-biphenyl, we generate Oc1ccc([Xe])cc1.[Xe]c1ccccc1
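The recursion with a set of already generated nodes, described above, can be sketched like this. It is a minimal illustration, not the authors' code: build_network and toy_children are hypothetical names, and the CHILDREN table simply hard-codes the parent to child relationships of the 4-hydroxy-biphenyl example instead of computing them from structures.

```python
# Parent -> children for the 4-hydroxy-biphenyl example (unique pairs only).
CHILDREN = {
    "Oc1ccc(cc1)c2ccccc2": [
        "O.c1ccccc1", "Oc1ccccc1", "c1ccc(cc1)c2ccccc2", "Oc1ccccc1.c1ccccc1",
    ],
    "O.c1ccccc1": ["O", "c1ccccc1"],
    "Oc1ccccc1": ["O", "c1ccccc1"],
    "c1ccc(cc1)c2ccccc2": ["c1ccccc1", "c1ccccc1.c1ccccc1"],
    "c1ccccc1.c1ccccc1": ["c1ccccc1"],
    "Oc1ccccc1.c1ccccc1": ["Oc1ccccc1", "O.c1ccccc1", "c1ccccc1.c1ccccc1"],
}

calls = []  # records every node whose children were actually computed


def toy_children(smiles):
    """Stand-in for the fragmentation step; looks children up in CHILDREN."""
    calls.append(smiles)
    return CHILDREN.get(smiles, [])


def build_network(smiles, get_children, nodes=None, edges=None):
    """Recursively collect nodes and edges, skipping nodes already generated."""
    if nodes is None:
        nodes, edges = set(), []
    if smiles in nodes:
        return nodes, edges  # children and edges were generated earlier
    nodes.add(smiles)
    for child in get_children(smiles):
        edges.append((smiles, child))
        build_network(child, get_children, nodes, edges)
    return nodes, edges


nodes, edges = build_network("Oc1ccc(cc1)c2ccccc2", toy_children)
print(len(nodes), len(edges), len(calls))
```

Each node is expanded only once even though several nodes, such as c1ccccc1, are reached from more than one parent.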

The xenon atoms are replaced with hydrogen to give the child node Oc1ccccc1.c1ccccc1

An edge between this child node and the parent is added with the following attributes:

- the type of the excluded component: 'FG'
- the type of the rebuilt combination: 'RING' (unused)
- the nonisomeric SMILES of the excluded component: [Xe] (indicates a zero length linker)
- the nonisomeric SMILES of the rebuilt molecule: Oc1ccccc1.c1ccccc1
- the simplified graph of the excluded component: [Xe]
- the simplified graph of the rebuilt molecule: OC1CCCCC1.C1CCCCC1

The child node is then passed into the node and edge generating algorithm. A similar approach is used to break spiro ring systems. One can envisage how the algorithm might be extended to deconstruct fused ring systems, although we have not chosen to do so in this implementation. Another suggested enhancement is to treat cyclopropyl rings as both a ring and a substituent. By modifying the SMARTS pattern that finds ring bonds, we could exclude cuts at a cyclopropyl ring.

An attribute label is added to the Oc1ccc(cc1)c2ccccc2 node to indicate that this compound has an identifier in the EM (eMolecules) database. Additional attributes are added to this node, for example a label to mark this node as a ChEMBL record:

    ATTR Oc1ccc(cc1)c2ccccc2 EM
    ATTR Oc1ccc(cc1)c2ccccc2 CHEMBL

A complete set of nodes, edges and attributes for 4-hydroxy-biphenyl is listed below.
    NODE Oc1ccc(cc1)c2ccccc2 13 12 OC1CCC(CC1)C2CCCCC2 0
    NODE O.c1ccccc1 7 6 O.C1CCCCC1 1
    NODE O 1 0 O 2
    EDGE O.c1ccccc1 O RING c1ccccc1 C1CCCCC1 FG O O
    NODE c1ccccc1 6 6 C1CCCCC1 2
    EDGE O.c1ccccc1 c1ccccc1 FG O O RING c1ccccc1 C1CCCCC1
    EDGE Oc1ccc(cc1)c2ccccc2 O.c1ccccc1 RING [Xe]c1ccc([Xe])cc1 [100Xe]C1CCC([101Xe])CC1 RING O[Xe].[Xe]c1ccccc1 O[100Xe].[Xe]C1CCCCC1
    NODE Oc1ccccc1 7 6 OC1CCCCC1 1
    EDGE Oc1ccccc1 O RING [Xe]c1ccccc1 [100Xe]C1CCCCC1 FG O[Xe] O[Xe]
    EDGE Oc1ccccc1 c1ccccc1 FG O[Xe] O[Xe] RING [Xe]c1ccccc1 [100Xe]C1CCCCC1
    EDGE Oc1ccc(cc1)c2ccccc2 Oc1ccccc1 RING [Xe]c1ccccc1 [100Xe]C1CCCCC1 RING Oc1ccc([Xe])cc1 OC1CCC([Xe])CC1
    NODE c1ccc(cc1)c2ccccc2 12 12 C1CCC(CC1)C2CCCCC2 1
    EDGE c1ccc(cc1)c2ccccc2 c1ccccc1 FG [Xe] [Xe] RING [Xe]c1ccccc1 [100Xe]C1CCCCC1
    EDGE c1ccc(cc1)c2ccccc2 c1ccccc1 FG [Xe] [Xe] RING [Xe]c1ccccc1 [100Xe]C1CCCCC1
    NODE c1ccccc1.c1ccccc1 12 12 C1CCCCC1.C1CCCCC1 2
    EDGE c1ccccc1.c1ccccc1 c1ccccc1 FG [Xe] [Xe] RING c1ccccc1 C1CCCCC1
    EDGE c1ccccc1.c1ccccc1 c1ccccc1 FG [Xe] [Xe] RING c1ccccc1 C1CCCCC1
    EDGE c1ccc(cc1)c2ccccc2 c1ccccc1.c1ccccc1 FG [Xe] [Xe] RING c1ccccc1.c1ccccc1 C1CCCCC1.C1CCCCC1
    EDGE Oc1ccc(cc1)c2ccccc2 c1ccc(cc1)c2ccccc2 FG O[Xe] O[Xe] RING [Xe]c1ccc(cc1)c2ccccc2 [100Xe]C1CCC(CC1)C2CCCCC2
    NODE Oc1ccccc1.c1ccccc1 13 12 OC1CCCCC1.C1CCCCC1 1
    EDGE Oc1ccccc1.c1ccccc1 Oc1ccccc1 RING c1ccccc1 C1CCCCC1 RING Oc1ccccc1 OC1CCCCC1
    EDGE Oc1ccccc1.c1ccccc1 O.c1ccccc1 RING [Xe]c1ccccc1 [100Xe]C1CCCCC1 RING O[Xe].c1ccccc1 O[100Xe].C1CCCCC1
    EDGE Oc1ccccc1.c1ccccc1 c1ccccc1.c1ccccc1 FG O[Xe] O[Xe] RING [Xe]c1ccccc1.c1ccccc1 [100Xe]C1CCCCC1.C1CCCCC1

    EDGE Oc1ccc(cc1)c2ccccc2 Oc1ccccc1.c1ccccc1 FG [Xe] [Xe] RING Oc1ccccc1.c1ccccc1 OC1CCCCC1.C1CCCCC1
    ATTR Oc1ccc(cc1)c2ccccc2 EM

Loading Data into Neo4j

A set of nodes, edges and attributes for the compounds that are listed as results in Figure 2 can be found in the supporting information files jm7b00809_si_002.txt (nodes), jm7b00809_si_003.txt (edges) and jm7b00809_si_004.txt (attributes). These data files can be loaded into a Neo4j graph database (our implementation uses Neo4j version 2.1.6). Neo4j has a SQL-like query language called Cypher. The following Cypher commands can be used to load the nodes, edges and attributes into the graph database.

    USING PERIODIC COMMIT
    LOAD CSV FROM 'file:///jm7b00809_si_002.txt' AS line FIELDTERMINATOR ' '
    MERGE (:F2 {smiles: line[1], hac: toInt(line[2]), chac: toInt(line[3]), osmiles: line[4]});

    USING PERIODIC COMMIT
    LOAD CSV FROM 'file:///jm7b00809_si_003.txt' AS line FIELDTERMINATOR ' '
    MATCH (n1:F2 {smiles: line[1]}), (n2:F2 {smiles: line[2]})
    MERGE (n1)-[:f2edge {label: line[3]}]->(n2);

    USING PERIODIC COMMIT
    LOAD CSV FROM 'file:///jm7b00809_si_004.txt' AS line FIELDTERMINATOR ' '
    MATCH (n:F2 {smiles: line[1]})
    SET n:mol, n:em, n.em = toInt(line[3]);

Querying Neo4j

The Cypher query used to find and categorise 'medium' paths of length one from 4-hydroxy-biphenyl to a commercially available compound is

    MATCH p = (n:F2 {smiles: 'Oc1ccc(cc1)c2ccccc2'})-[nm]-(m:em)
    WHERE abs(n.hac - m.hac) <= 3 AND abs(n.chac - m.chac) <= 1
    RETURN split(nm.label, '|')[4], split(nm.label, '|')[1], nm.label, m.hac - n.hac, m.smiles, m.em
    ORDER BY split(nm.label, '|')[4];

This query is somewhat complicated by the decision to store edge metadata as a single attribute. The metadata is pipe separated and needs to be split to construct the query. Anyone wishing to implement a similar network might choose to store each piece of edge metadata as a separate attribute.
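As an illustration of handling each piece of edge metadata separately, the pipe-separated label can be split into named fields. This is a sketch under assumptions: the class EdgeLabel and the function parse_edge_label are hypothetical, and the field order (type, SMILES and simplified graph of the excluded component, followed by the same three values for the rebuilt molecule) is inferred from the EDGE records listed above and from the indices used in the query.

```python
from typing import NamedTuple


class EdgeLabel(NamedTuple):
    """Named view of the six pipe-separated fields (field order is an assumption)."""
    excluded_type: str
    excluded_smiles: str
    excluded_graph: str
    rebuilt_type: str
    rebuilt_smiles: str
    rebuilt_graph: str


def parse_edge_label(label: str) -> EdgeLabel:
    """Split a pipe-separated edge label into named fields."""
    fields = label.split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {label!r}")
    return EdgeLabel(*fields)


# Label for the edge from 4-hydroxy-biphenyl to biphenyl, written with pipes.
example = "FG|O[Xe]|O[Xe]|RING|[Xe]c1ccc(cc1)c2ccccc2|[100Xe]C1CCC(CC1)C2CCCCC2"
parsed = parse_edge_label(example)
print(parsed.excluded_smiles, parsed.rebuilt_smiles)
```

With named fields, index expressions such as element 4 of the split label can be read as rebuilt_smiles, and element 1 as excluded_smiles.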
Our current implementation splits and groups the paths using Python code after the paths are returned from Neo4j; the Cypher query is for information only. By grouping the matches on the change in atom count and selected attributes of the edge label, we can classify the results as deletions or additions at a specific position. To sort these groups, we use the values from a lookup dictionary that is keyed on the substituent attribute of the edge label. Any substituent that is not in the dictionary will be given a weight of zero. Any ties are broken using the heavy atom count and the lexicographical sort order of the SMILES string.

A similar Cypher query can be used to find and categorise medium paths of length two between 4-hydroxy-biphenyl and a commercially available compound:

    MATCH (sta:F2 {smiles: "Oc1ccc(cc1)c2ccccc2"})-[n4:f2edge]-(n3:F2)-[n2:f2edge]-(end:em)
    WHERE abs(sta.hac - end.hac) <= 3 AND abs(sta.chac - end.chac) <= 1 AND sta.smiles <> end.smiles
    RETURN split(n4.label, '|')[4], split(n4.label, '|')[2], split(n2.label, '|')[2], split(n2.label, '|')[1], end.em, end.smiles
    ORDER BY split(n4.label, '|')[4], split(n2.label, '|')[2];

The first column in the output can be used to group similar transformations. The second and third columns can be used to compare equivalencies between simplified graphs. The fourth column lists the replacement. The fifth and sixth columns are the identifier and the SMILES string of the related compound. Again, the query is for illustration purposes; our implementation splits the edge metadata after the query results are returned.
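The grouping and sorting performed after the query returns might look roughly like the sketch below. It is not the authors' implementation. The row layout, the function names, the weight values and the example rows are all hypothetical, and sorting by descending weight is an assumption, since the text does not state the direction.

```python
from collections import defaultdict

# Hypothetical lookup dictionary keyed on the substituent; values are made up.
SUBSTITUENT_WEIGHT = {"C[Xe]": 1.0}


def group_rows(rows):
    """Group rows by (kind of change, position), where position is the rebuilt SMILES."""
    groups = defaultdict(list)
    for row in rows:
        if row["delta_hac"] > 0:
            kind = "addition"
        elif row["delta_hac"] < 0:
            kind = "deletion"
        else:
            kind = "replacement"
        groups[(kind, row["position"])].append(row)
    return dict(groups)


def sort_rows(rows):
    """Sort by weight (missing substituents weigh zero), then HAC, then SMILES."""
    return sorted(
        rows,
        key=lambda r: (-SUBSTITUENT_WEIGHT.get(r["substituent"], 0.0), r["hac"], r["smiles"]),
    )


# Made-up rows standing in for query output; delta_hac is m.hac - n.hac.
POS = "[Xe]c1ccc(cc1)c2ccc(O)cc2"
rows = [
    {"delta_hac": 1, "position": POS, "substituent": "F[Xe]", "hac": 14, "smiles": "Fc1ccc(cc1)c2ccc(O)cc2"},
    {"delta_hac": 1, "position": POS, "substituent": "C[Xe]", "hac": 14, "smiles": "Cc1ccc(cc1)c2ccc(O)cc2"},
    {"delta_hac": 1, "position": POS, "substituent": "Cl[Xe]", "hac": 14, "smiles": "Clc1ccc(cc1)c2ccc(O)cc2"},
    {"delta_hac": -1, "position": "[Xe]c1ccc(cc1)c2ccccc2", "substituent": "O[Xe]", "hac": 12, "smiles": "c1ccc(cc1)c2ccccc2"},
]

groups = group_rows(rows)
ordered = [r["substituent"] for r in sort_rows(groups[("addition", POS)])]
print(ordered)
```

In this toy data the methyl row sorts first because it is the only substituent with a non-zero weight; the chloro and fluoro rows tie on weight and heavy atom count and are ordered by their SMILES strings.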

ICM-Chemist How-To Guide. Version 3.6-1g Last Updated 12/01/2009

ICM-Chemist How-To Guide. Version 3.6-1g Last Updated 12/01/2009 ICM-Chemist How-To Guide Version 3.6-1g Last Updated 12/01/2009 ICM-Chemist HOW TO IMPORT, SKETCH AND EDIT CHEMICALS How to access the ICM Molecular Editor. 1. Click here 2. Start sketching How to sketch

More information

DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING

DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING 1 NOTES ON REAXYS R201 THIS PRESENTATION COMMENTS AND SUMMARY Outlines how to: a. Perform Substructure and Similarity searches b. Use the functions

More information

Table of Contents. Scope of the Database 3 Searching by Structure 3. Searching by Substructure 4. Searching by Text 11

Table of Contents. Scope of the Database 3 Searching by Structure 3. Searching by Substructure 4. Searching by Text 11 Searrcchiing fforr Subssttanccess and Reaccttiionss iin Beiillsstteiin and Gmelliin 1 Table of Contents Scope of the Database 3 Searching by Structure 3 Introduction to the Structure Editor 3 Searching

More information

Aliphatic Hydrocarbons Anthracite alkanes arene alkenes aromatic compounds alkyl group asymmetric carbon Alkynes benzene 1a

Aliphatic Hydrocarbons Anthracite alkanes arene alkenes aromatic compounds alkyl group asymmetric carbon Alkynes benzene 1a Aliphatic Hydrocarbons Anthracite alkanes arene alkenes aromatic compounds alkyl group asymmetric carbon Alkynes benzene 1a Hard coal, which is high in carbon content any straight-chain or branched-chain

More information

Chuck Cartledge, PhD. 21 January 2018

Chuck Cartledge, PhD. 21 January 2018 Big Data: Data Analysis Boot Camp Non-SQL and R Chuck Cartledge, PhD 21 January 2018 1/19 Table of contents (1 of 1) 1 Intro. 2 Non-SQL DBMS Classic Non-SQL databases 3 Hands-on Airport connections as

More information

ORGANIC CHEMISTRY. Classification of organic compounds

ORGANIC CHEMISTRY. Classification of organic compounds ORGANIC CHEMISTRY Organic chemistry is very important branch of chemistry and it study the compounds which contain carbon (C) and hydrogen (H), in general, and may contains other atoms such as oxygen (O),

More information

file:///biology Exploring Life/BiologyExploringLife04/

file:///biology Exploring Life/BiologyExploringLife04/ Objectives Identify carbon skeletons and functional groups in organic molecules. Relate monomers and polymers. Describe the processes of building and breaking polymers. Key Terms organic molecule inorganic

More information

Assignment 1: Molecular Mechanics (PART 1 25 points)

Assignment 1: Molecular Mechanics (PART 1 25 points) Chemistry 380.37 Fall 2015 Dr. Jean M. Standard August 19, 2015 Assignment 1: Molecular Mechanics (PART 1 25 points) In this assignment, you will perform some molecular mechanics calculations using the

More information

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal Representation of molecular structures Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal A hierarchy of structure representations Name (S)-Tryptophan 2D Structure 3D Structure Molecular

More information

Dictionary of ligands

Dictionary of ligands Dictionary of ligands Some of the web and other resources Small molecules DrugBank: http://www.drugbank.ca/ ZINC: http://zinc.docking.org/index.shtml PRODRUG: http://www.compbio.dundee.ac.uk/web_servers/prodrg_down.html

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

POC via CHEMnetBASE for Identifying Unknowns

POC via CHEMnetBASE for Identifying Unknowns Table of Contents A red arrow is used to identify where buttons and functions are located in CHEMnetBASE. Figure Description Page Entering the Properties of Organic Compounds (POC) Database 1 CHEMnetBASE

More information

Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples

Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples Greeshma Neglur 1,RobertL.Grossman 2, and Bing Liu 3 1 Laboratory for Advanced Computing, University

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

POC via CHEMnetBASE for Identifying Unknowns

POC via CHEMnetBASE for Identifying Unknowns Table of Contents A red arrow was used to identify where buttons and functions are located in CHEMnetBASE. Figure Description Page Entering the Properties of Organic Compounds (POC) Database 1 Swain Home

More information

Canonical Line Notations

Canonical Line Notations Canonical Line otations InChI vs SMILES Krisztina Boda verview Compound naming InChI SMILES Molecular equivalency Isomorphism Kekule Tautomers Finding duplicates What s Your ame? 1. Unique numbers CAS

More information

Naming Organic Compounds: Alkanes

Naming Organic Compounds: Alkanes Naming Organic Compounds: Alkanes Chemical nomenclature assigns compounds a unique name that allows them to be easily identified and structurally understood. The International Union of Pure and Applied

More information

The Schrödinger KNIME extensions

The Schrödinger KNIME extensions The Schrödinger KNIME extensions Computational Chemistry and Cheminformatics in a workflow environment Jean-Christophe Mozziconacci Volker Eyrich Topics What are the Schrödinger extensions? Workflow application

More information

Understanding ATP Activity

Understanding ATP Activity Name: Period: Understanding ATP Activity Background & Objectives: Energy within a cell exists in the form of chemical energy. A source of this chemical energy is a compound called adenosine triphosphate

More information

Geodatabase Programming with Python John Yaist

Geodatabase Programming with Python John Yaist Geodatabase Programming with Python John Yaist DevSummit DC February 26, 2016 Washington, DC Target Audience: Assumptions Basic knowledge of Python Basic knowledge of Enterprise Geodatabase and workflows

More information

Naming and Drawing Carboxylic Acids

Naming and Drawing Carboxylic Acids Assignment 4 Task 5 Due: 11:59pm on Friday, October 5, 2018 You will receive no credit for items you complete after the assignment is due. Grading Policy Naming and Drawing Carboxylic Acids Aromatic carboxylic

More information

ChemAxon. Content. By György Pirok. D Standardization D Virtual Reactions. D Fragmentation. ChemAxon European UGM Visegrad 2008

ChemAxon. Content. By György Pirok. D Standardization D Virtual Reactions. D Fragmentation. ChemAxon European UGM Visegrad 2008 Transformers f off ChemAxon By György Pirok Content Standardization Virtual Reactions Metabolism M b li P Prediction di i Fragmentation 2 1 Standardization http://www.chemaxon.com/jchem/doc/user/standardizer.html

More information

Chapter 20. Mass Spectroscopy

Chapter 20. Mass Spectroscopy Chapter 20 Mass Spectroscopy Mass Spectrometry (MS) Mass spectrometry is a technique used for measuring the molecular weight and determining the molecular formula of an organic compound. Mass Spectrometry

More information

Geodatabase Programming with Python

Geodatabase Programming with Python DevSummit DC February 11, 2015 Washington, DC Geodatabase Programming with Python Craig Gillgrass Assumptions Basic knowledge of python Basic knowledge enterprise geodatabases and workflows Please turn

More information

Command-line tools of ChemAxon: tips and tricks

Command-line tools of ChemAxon: tips and tricks Command-line tools of ChemAxon: tips and tricks György Pirok Solutions for Cheminformatics Command-line interface A command-line interface (CLI) is a mechanism for interacting with a computer operating

More information

Searching Substances in Reaxys

Searching Substances in Reaxys Searching Substances in Reaxys Learning Objectives Understand that substances in Reaxys have different sources (e.g., Reaxys, PubChem) and can be found in Document, Reaction and Substance Records Recognize

More information

Pipeline Pilot Integration

Pipeline Pilot Integration Scientific & technical Presentation Pipeline Pilot Integration Szilárd Dóránt July 2009 The Component Collection: Quick facts Provides access to ChemAxon tools from Pipeline Pilot Free of charge Open source

More information

Basic Techniques in Structure and Substructure

Basic Techniques in Structure and Substructure Truncating Molecules Basic Techniques in Structure and Substructure Searching for Information Professionals Judith Currano Head, Chemistry Library University of Pennsylvania currano@pobox.upenn.edu Acknowledgements

More information

FAMILIES of ORGANIC COMPOUNDS

FAMILIES of ORGANIC COMPOUNDS 1 SCH4U October 2016 Organic Chemistry Chemistry of compounds that contain carbon (except: CO, CO 2, HCN, CO 3 - ) Carbon is covalently bonded to another carbon, hydrogen and possibly to oxygen, a halogen

More information

Introduction to Spark

Introduction to Spark 1 As you become familiar or continue to explore the Cresset technology and software applications, we encourage you to look through the user manual. This is accessible from the Help menu. However, don t

More information

Reaxys Pipeline Pilot Components Installation and User Guide

Reaxys Pipeline Pilot Components Installation and User Guide 1 1 Reaxys Pipeline Pilot components for Pipeline Pilot 9.5 Reaxys Pipeline Pilot Components Installation and User Guide Version 1.0 2 Introduction The Reaxys and Reaxys Medicinal Chemistry Application

More information

OAT Organic Chemistry - Problem Drill 19: NMR Spectroscopy and Mass Spectrometry

OAT Organic Chemistry - Problem Drill 19: NMR Spectroscopy and Mass Spectrometry OAT Organic Chemistry - Problem Drill 19: NMR Spectroscopy and Mass Spectrometry Question No. 1 of 10 Question 1. Which statement concerning NMR spectroscopy is incorrect? Question #01 (A) Only nuclei

More information

Administering your Enterprise Geodatabase using Python. Jill Penney

Administering your Enterprise Geodatabase using Python. Jill Penney Administering your Enterprise Geodatabase using Python Jill Penney Assumptions Basic knowledge of python Basic knowledge enterprise geodatabases and workflows You want code Please turn off or silence cell

More information

Name Date Class HYDROCARBONS

Name Date Class HYDROCARBONS 22.1 HYDROCARBONS Section Review Objectives Describe the relationship between number of valence electrons and bonding in carbon Define and describe alkanes Relate the polarity of hydrocarbons to their

More information

Similarity Search. Uwe Koch

Similarity Search. Uwe Koch Similarity Search Uwe Koch Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance

More information

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Advanced Implementations of Tables: Balanced Search Trees and Hashing Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the

More information

BIOLOGY 101. CHAPTER 4: Carbon and the Molecular Diversity of Life: Carbon: the Backbone of Life

BIOLOGY 101. CHAPTER 4: Carbon and the Molecular Diversity of Life: Carbon: the Backbone of Life BIOLOGY 101 CHAPTER 4: Carbon and the Molecular Diversity of Life: CONCEPTS: 4.1 Organic chemistry is the study of carbon compounds 4.2 Carbon atoms can form diverse molecules by bonding to four other

More information

C. Correct! The abbreviation Ar stands for an aromatic ring, sometimes called an aryl ring.

C. Correct! The abbreviation Ar stands for an aromatic ring, sometimes called an aryl ring. Organic Chemistry - Problem Drill 05: Drawing Organic Structures No. 1 of 10 1. What does the abbreviation Ar stand for? (A) Acetyl group (B) Benzyl group (C) Aromatic or Aryl group (D) Benzoyl group (E)

More information

Organometallics & InChI. August 2017

Organometallics & InChI. August 2017 Organometallics & InChI August 2017 The Cambridge Structural Database 900,000+ small-molecule crystal structures Over 60,000 datasets deposited annually Enriched and annotated by experts Structures available

More information

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity OECD QSAR Toolbox v.4.1 Tutorial illustrating new options of the structure similarity Outlook Background Aims PubChem features The exercise Workflow 2 Background This presentation is designed to familiarize

More information

4. NMR spectra. Interpreting NMR spectra. Low-resolution NMR spectra. There are two kinds: Low-resolution NMR spectra. High-resolution NMR spectra

4. NMR spectra. Interpreting NMR spectra. Low-resolution NMR spectra. There are two kinds: Low-resolution NMR spectra. High-resolution NMR spectra 1 Interpreting NMR spectra There are two kinds: Low-resolution NMR spectra High-resolution NMR spectra In both cases the horizontal scale is labelled in terms of chemical shift, δ, and increases from right

More information

SmallWorld: Efficient Maximum Common Subgraph Searching of Large Chemical Databases

SmallWorld: Efficient Maximum Common Subgraph Searching of Large Chemical Databases SmallWorld: Efficient Maximum Common Subgraph Searching of Large Chemical Databases Roger Sayle, Jose Batista and Andrew Grant NextMove Software, Cambridge, UK AstraZeneca R&D, Alderley Park, UK 2d chemical

More information

Answers to Problem Set #2

Answers to Problem Set #2 hem 242 Spring 2008 Answers to Problem Set #2 1. For this question we have been given the molecular formula, 3 5 l. Looking at the IR, the strong signal at 1720 cm 1 tells us that we have a carbonyl (we

More information

Frequent Pattern Mining: Exercises

Frequent Pattern Mining: Exercises Frequent Pattern Mining: Exercises Christian Borgelt School of Computer Science tto-von-guericke-university of Magdeburg Universitätsplatz 2, 39106 Magdeburg, Germany christian@borgelt.net http://www.borgelt.net/

More information

Organic Chemistry II KEY March 25, a) I only b) II only c) II & III d) III & IV e) I, II, III & IV

Organic Chemistry II KEY March 25, a) I only b) II only c) II & III d) III & IV e) I, II, III & IV rganic Chemistry II KEY March 25, 2015 Exam 2: VERSIN A 1. Which of the following compounds will give rise to an aromatic conjugate base? E a) I only b) II only c) II & III d) III & IV e) I, II, III &

More information

MEDICINAL CHEMISTRY I EXAM #1

MEDICINAL CHEMISTRY I EXAM #1 MEDICIAL CEMISTRY I EXAM #1 1 September 30, 2005 ame SECTI A. Answer each question in this section by writing the letter corresponding to the best answer on the line provided (2 points each; 50 points

More information

What are the building blocks of life?

What are the building blocks of life? Why? What are the building blocks of life? From the smallest single-celled organism to the tallest tree, all life depends on the properties and reactions of four classes of organic (carbon-based) compounds

More information

Unsaturated hydrocarbons. Chapter 13
