Metrabase The Metabolism and Transport Database. user manual v


Contents

1 INTRODUCTION
2 METRABASE CONTENT
  2.1 ACTIVITIES (INTERACTIONS)
  2.2 PROTEINS
  2.3 COMPOUNDS
  2.4 DATA SOURCES
3 THE TRANSPORTER SUBSTRATE DATASET
4 USAGE

1 Introduction

Metrabase is a cheminformatics and bioinformatics resource that contains manually curated structural, physicochemical and biological data related to small molecule transport and metabolism. It offers structured and easily accessible data on interactions between proteins and chemical compounds, providing not only actions and measured activities, but also chemical structural information, tissue expression data and negative action types, which are essential in modelling activity.

"Easily accessible" refers primarily to computational processing. Even when data is made available, many freely available resources lack an easy way to process it computationally (e.g. an online search and browse facility is offered but no download, or permission barriers are imposed).

We aim to construct a comprehensive, thoroughly annotated and easy to use resource of high quality small molecule metabolism and transport information. In particular, by covering the areas of biochemistry, pharmacology and toxicology, we hope that diverse research communities will find Metrabase useful and valuable.

2 Metrabase content

Metrabase version 1.0 contains curated data related to human transport and metabolism of chemical compounds. Its primary content includes nearly 3500 small molecule substrates and modulators of transport proteins and, to a smaller extent, cytochrome P450 enzymes (CYPs).

[Summary table: Proteins (20 transporters and 13 CYPs), Compounds, Interactions and References, each broken down into transporters and CYPs]

The major focus of Metrabase v1.0 is on transport proteins: specifically, on their interactions with small molecules that were experimentally found to be (or not to be) substrates.


[Figure: Metrabase 1.0 schema]

2.1 Activities (interactions)

The key information held in the activities table of the database covers the interactions between proteins and chemical compounds, indicating the compound action type as either substrate, non-substrate, inducer, non-inducer, repressor, inhibitor, non-inhibitor, stimulator or binder (the action_type field).

Action types

  Protein activity
    substrate (transport or catalysis)
    non-substrate

  Compound activity (affecting protein activity/expression)
    inhibitor/repressor (negative modulators)
    stimulator/inducer (positive modulators)
    non-inhibitor/non-inducer (inactive compounds)

Action type was set to binder where it did not fall into any of these categories, but the molecule was found to bind to the protein.

Key fields:

  cmpd_id
  protein_id
  ref_id
  action_type
  species (in version 1.0, species = human for all the records, so this field can be omitted)
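As a rough illustration of how the key fields might be queried, the sketch below builds a tiny in-memory SQLite table as a stand-in for the MySQL activities table and selects the compounds recorded with given action types for one protein. Only the five field names come from this manual; the column types, the identifiers (cmpd_A, prot_1, ref_1) and the helper compounds_by_action are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL "activities" table. Only the key
# fields named in the manual are modelled; column types and all rows are
# invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities ("
    " cmpd_id TEXT, protein_id TEXT, ref_id TEXT,"
    " action_type TEXT, species TEXT)"
)
rows = [
    ("cmpd_A", "prot_1", "ref_1", "substrate", "human"),
    ("cmpd_B", "prot_1", "ref_1", "non-substrate", "human"),
    ("cmpd_C", "prot_1", "ref_2", "inhibitor", "human"),
    ("cmpd_A", "prot_2", "ref_3", "substrate", "human"),
]
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?, ?)", rows)


def compounds_by_action(connection, protein_id, action_types):
    """Return sorted distinct (cmpd_id, action_type) pairs for one protein."""
    placeholders = ", ".join("?" for _ in action_types)
    query = (
        "SELECT DISTINCT cmpd_id, action_type FROM activities"
        f" WHERE protein_id = ? AND action_type IN ({placeholders})"
        " ORDER BY cmpd_id, action_type"
    )
    return connection.execute(query, [protein_id, *action_types]).fetchall()


result = compounds_by_action(conn, "prot_1", ["substrate", "non-substrate"])
print(result)
```

Passing the action types as a list keeps it easy to include every type relevant to a search, which matters because the recorded types may overlap.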

Compounds were categorised as substrates or non-substrates according to the results presented in the publication providing the data point, and no further evaluation was carried out on our side.

Care must be taken with the current status of the inhibition records: depending on the measurement threshold (e.g. percentage inhibition), some of the compounds annotated as inhibitors can be regarded as non-inhibitors and vice versa. A proper classification of compounds as either inhibitors or non-inhibitors is planned for subsequent releases of the database.

Other activities fields hold additional extracted data and annotations, such as assay descriptions, relevant experimental measurements, cell systems, compound concentrations and the substrates used in inhibition assays. These fields may be only partially completed in this release, in part because most of the reviews do not include assay information.

The published_label field contains the chemical names, abbreviations or designations used in publications to label compounds. This field has been completed for all records except those linked to the external datasets, and can be used to identify compounds easily in their respective publications.

Activity types were mostly accepted as found in the publications and may therefore overlap. Consequently, selecting all the activity types relevant to one's search is recommended.

2.2 Proteins

The proteins contained in Metrabase are categorised as either transporters or enzymes (the protein_type field) and are provided with the HUGO Gene Nomenclature Committee (HGNC) approved symbols and names, as well as UniProt IDs. Protein sequences for the indicated isoforms were included from UniProt. Other fields include additional information, such as Gene, RefSeq and Ensembl IDs and TC (Transporter Classification) or EC (Enzyme Commission) numbers.

Metrabase also contains information about protein expression levels across healthy human tissues. Part of this data is based on immunohistochemistry using tissue microarrays (gene, tissue, cell type, level, expression type and reliability) and comes from the normal_tissue.csv file of the Human Protein Atlas (HPA) v9.0. All other expression records contain data that was extracted from the literature.

For non-HPA records (i.e. where ref_id is not null), the level of expression (mRNA and/or protein level) is recorded as one of: expressed (if the level had not been specified), none, none-low, low, low-medium, medium, medium-high and high.
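The literature-derived expression levels form an ordered scale, which can be convenient for filtering. The following is a minimal sketch of one way to rank them. Treating "expressed" as unranked (because the level was not specified) is a choice made here, not something the manual prescribes, and the tissue records and helper names are invented for the example.

```python
# Ordered expression levels for literature-derived (non-HPA) records, as
# listed in the manual. "expressed" means the level was not specified, so this
# sketch gives it no rank (an assumption, not a Metrabase rule).
LEVELS = ["none", "none-low", "low", "low-medium", "medium", "medium-high", "high"]
RANK = {level: index for index, level in enumerate(LEVELS)}


def level_rank(level):
    """Return the ordinal rank of a level, or None for 'expressed'."""
    if level == "expressed":
        return None
    return RANK[level]


def at_least(level, threshold):
    """True if a specified level is at or above the threshold level."""
    rank = level_rank(level)
    return rank is not None and rank >= RANK[threshold]


# Invented (tissue, level) records for illustration only.
records = [("liver", "high"), ("kidney", "low-medium"), ("brain", "expressed")]
selected = [tissue for tissue, level in records if at_least(level, "medium")]
print(selected)
```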

2.3 Compounds

The total number of records in the compounds table is 3562, but not all of them have recorded interaction data for transporters or enzymes. The remaining compounds are used in other tables, such as cmpd_variants, which holds stereoisomers, multi-component structures and different forms of a compound.

Molecular structures are available in MDL molfile format and as absolute (unique and isomeric) SMILES strings (in Kekulé form). They were mostly verified using the ChemSpider and/or SciFinder databases. The standard InChI and InChIKey strings were generated using v1.04 of the InChI software.

The great majority of the compounds are small organic molecules (containing only the following atoms: C, H, O, P, S, N, F, Cl, Br and I). All the other types (coordination complexes, inorganic compounds, metalloid-containing compounds, selenium-containing compounds and polymers) are listed in the compound_types table. This table also contains the DrugBank types of drugs (approved, experimental, illicit, investigational, nutraceutical and withdrawn) taken from DrugBank v3.0, and can easily be extended by annotating compounds further, for example as natural products including their subtypes (e.g. natural product: terpene: sesquiterpene).

The properties table contains selected molecular properties that were calculated/predicted using ChemAxon's Calculator (cxcalc) v6.1.3. Molecular mass was calculated for all structures. The remaining properties were calculated only for the small organic single-component structures: constitutional descriptors (atom and bond counts, hydrogen bond donor and acceptor counts, ring count and rotatable bond count), log P and log D. Experimental properties are not currently provided, i.e. properties.type = 'c' for all records (where 'c' stands for 'calculated'). The multi-component structures can easily be identified using the compounds.fragment_count field, and their single-component counterparts using the cmpd_variants table.

The synonyms table contains chemical names of Metrabase compounds (systematic, semi-systematic, common, trade names, abbreviations, codes). One of the synonyms was selected as the main name (the compounds.cmpd_name field) for each compound. Chemical names were obtained mostly from DrugBank (these might refer to compound variants as well) and SciFinder. The systematic (IUPAC) names were computer generated using ChemAxon's IUPAC Naming Plugin v6.1.3 (the compounds.iupac_name field).

The cmpd_ids table contains external compound IDs. Most of the compounds have ChemSpider IDs (CSIDs); a CAS Registry Number (CASRN; CAS Registry Number is a Registered Trademark of the American Chemical Society) was provided only if a CSID had not been found. DrugBank IDs are also included where identified (especially for the approved drugs).

MBCD number is the compound identifier in Metrabase, e.g. mbcd (MBID for compounds).

  cmpd_id: mbcd (CSID:14034)
  cmpd_name: Ethidium bromide
  smiles: [Br-].CC[N+]1=C(C2=CC=CC=C2)C2=CC(N)=CC=C2C2=CC=C(N)C=C12
  std_inchi: 1S/C21H19N3.BrH/c (23) (20) (22)12-19(17)21(24) ;/h3-13,23H,2,22H2,1H3;1H
  std_inchikey: ZMMJGEGLRURXTF-UHFFFAOYSA-N
  iupac_name: 3,8-diamino-5-ethyl-6-phenylphenanthridin-5-ium bromide
  formula_dot: C21H20N3.Br
  fragment_count: 2
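Two simple sanity checks can be run on fields like those in the record above. The sketch below is illustrative only and is not how Metrabase derives its fields. It tests whether a string has the layout of a standard InChIKey and counts the dot-separated components of a SMILES string. The dot count is only an approximation of fragment_count, since SMILES also allows ring-closure bonds to join parts written either side of a dot. Both helper names are hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, a hyphen, 8 letters followed by two
# flag letters (the first flag is "S" for a standard InChI), a hyphen, and one
# final letter.
STD_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}S[A-Z]-[A-Z]$")


def is_standard_inchikey(key):
    """True if the string has the layout of a standard InChIKey."""
    return bool(STD_INCHIKEY.match(key))


def smiles_fragment_count(smiles):
    """Count dot-separated components in a SMILES string (approximation)."""
    return len(smiles.split("."))


# Values copied from the ethidium bromide record shown in the manual.
key = "ZMMJGEGLRURXTF-UHFFFAOYSA-N"
smiles = "[Br-].CC[N+]1=C(C2=CC=CC=C2)C2=CC(N)=CC=C2C2=CC=C(N)C=C12"
print(is_standard_inchikey(key), smiles_fragment_count(smiles))
```

For this record the component count agrees with the fragment_count value of 2 given in the manual.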

2.4 Data sources

The datasources table contains the sources of data in the database, including information about the software that was used to calculate molecular properties. The datasource_id and datasource_version fields indicate the source of all Metrabase records where applicable.

The refs table contains the publications' citation information (bibliographic fields) and links. Most of the references (91%) are original peer-reviewed research articles and 7% are reviews; the aim remains to link all Metrabase records to primary literature sources. PubMed IDs are provided where available, as well as DOIs (if a DOI was not available, a URL is given instead in the doi_url field). To resolve a DOI, prefix it with the address of a DOI resolver.
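A small helper can turn a doi_url value into a clickable link. This sketch assumes the field holds either a bare DOI or a complete URL, and it uses https://doi.org/ as the resolver address. The resolver prefix printed in the original manual did not survive in this copy, so that choice is an assumption, as are the function name and the placeholder DOI.

```python
def reference_link(doi_url, resolver="https://doi.org/"):
    """Return a browsable link for a doi_url value.

    Assumes the value is either a full URL (returned unchanged) or a bare DOI
    (prefixed with the resolver address).
    """
    value = doi_url.strip()
    if value.lower().startswith(("http://", "https://")):
        return value
    return resolver + value


# "10.1000/example" is a made-up DOI used only to show the behaviour.
print(reference_link("10.1000/example"))
print(reference_link("http://example.org/paper"))
```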

3 The transporter substrate dataset

We aim to provide a version of the transporter substrate dataset (MBTPsubDS) as a supplement to each Metrabase release. Each MBTPsubDS version contains interactions between small molecules and transporters, and includes all the unique substrate and non-substrate records obtained from Metrabase and processed to facilitate human transporter data analysis and predictive modelling (by 'unique' we mean the unique (cmpd_id, protein_id, action_type) tuples).

  MBTPsubDS1_0
    Based on Metrabase v1.0. All the interactions involving conflicting action types (where a compound was found to be both a substrate and a non-substrate of a single transporter) were excluded.

  MBTPsubDS1_0a
    Some of the conflicting action types were resolved upon our evaluation of such records, and the corresponding compound-transporter pairs were added to MBTPsubDS1_0 where we thought we could consider the compound as either a substrate or a non-substrate.
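The construction described above (unique tuples, then exclusion of conflicting pairs) can be sketched as follows. This is an illustration of the stated rules under simple assumptions, not the procedure actually used to build MBTPsubDS. The function name and all record values are hypothetical.

```python
from collections import defaultdict

SUBSTRATE_ACTIONS = {"substrate", "non-substrate"}


def build_substrate_dataset(records):
    """Deduplicate (cmpd_id, protein_id, action_type) tuples and drop conflicts.

    A compound-transporter pair is conflicting when it is recorded as both
    substrate and non-substrate. Returns (kept_tuples, conflicting_pairs),
    both sorted.
    """
    unique = {r for r in records if r[2] in SUBSTRATE_ACTIONS}
    actions = defaultdict(set)
    for cmpd_id, protein_id, action_type in unique:
        actions[(cmpd_id, protein_id)].add(action_type)
    conflicting = {pair for pair, seen in actions.items() if len(seen) > 1}
    kept = sorted(r for r in unique if (r[0], r[1]) not in conflicting)
    return kept, sorted(conflicting)


# Invented records: one duplicate, one conflicting pair, one inhibitor record.
records = [
    ("cmpd_A", "tp_1", "substrate"),
    ("cmpd_A", "tp_1", "substrate"),
    ("cmpd_B", "tp_1", "substrate"),
    ("cmpd_B", "tp_1", "non-substrate"),
    ("cmpd_C", "tp_2", "non-substrate"),
    ("cmpd_D", "tp_2", "inhibitor"),
]
kept, conflicting = build_substrate_dataset(records)
print(kept)
print(conflicting)
```

Returning the conflicting pairs separately leaves room for the manual review step that distinguishes MBTPsubDS1_0a from MBTPsubDS1_0.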


4 Usage

Metrabase can be used through the web interface or as a local MySQL database.

Web interface pages: Search by protein, Search by compound, Expression data, Protein list, Download.

To load Metrabase from a dump file (metrabase1_0.sql), first create a database on your system and then load the dump file, for example like this:

    # tar -xzvf metrabase1_0.tar.gz
    # mysql -u username -p
    mysql> CREATE DATABASE metrabase;
    mysql> quit
    # mysql -u username -p metrabase < metrabase1_0.sql

MySQL Workbench can be used as an interface for MySQL.


Credits

Metrabase was developed by Lora Mak in collaboration with David Marcus, Andreas Bender and Robert C. Glen at the Centre for Molecular Informatics and Galina Yarova, Guus Duchateau and Werner Klaffke at Unilever, with the much appreciated help from the following (at the time) 2nd and 3rd year undergraduate students of the University of Cambridge: Claire Dickson, Joseph Dixon, Ivan Lam, Richard Lewis, Callum Picken, Claudia Pop, Heyao Shi, Emma Stirk, Yasmin Surani, Paddy Szeto, Nathaniel Wand, Julian Willis and Jing Xiangyi.

Metrabase's web interface was developed by Andrew Howlett at the Centre for Molecular Informatics. Andrew also designed the Metrabase logo.

Metrabase was realised and is being maintained in the Glen group.

Licensing

Metrabase is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. However, for the integrated data, such as the TP-Search, ChEMBL and Human Protein Atlas records that are distributed as part of Metrabase, the user is referred to each external data source for its respective licensing. This means that the integrated data retains the licensing of the original data sources. The TP-Search and ChEMBL records may have been modified and augmented, while the Human Protein Atlas records were included unmodified.

Attribution

We hope you find our database and the associated datasets useful. If you use them, please cite us:

Mak L, Marcus D, Howlett A, Yarova G, Duchateau G, Klaffke W, Bender A, Glen RC: Metrabase: a cheminformatics and bioinformatics database for small molecule transporter data analysis and (Q)SAR modeling. Journal of Cheminformatics 2015, 7:31.

Metrabase v1.0, University of Cambridge.

Metrabase - Contact: metrabase@ch.cam.ac.uk

The Centre for Molecular Informatics
Department of Chemistry, University of Cambridge
Lensfield Road, Cambridge, CB2 1EW, UK

This document was prepared by Dr Lora Mak and reviewed by Prof Robert C. Glen.

Metrabase Development Team, University of Cambridge. All rights reserved.

Open PHACTS Explorer: Compound by Name

Open PHACTS Explorer: Compound by Name Open PHACTS Explorer: Compound by Name This document is a tutorial for obtaining compound information in Open PHACTS Explorer (explorer.openphacts.org). Features: One-click access to integrated compound

More information

ATLAS of Biochemistry

ATLAS of Biochemistry ATLAS of Biochemistry USER GUIDE http://lcsb-databases.epfl.ch/atlas/ CONTENT 1 2 3 GET STARTED Create your user account NAVIGATE Curated KEGG reactions ATLAS reactions Pathways Maps USE IT! Fill a gap

More information

Chemical Data Retrieval and Management

Chemical Data Retrieval and Management Chemical Data Retrieval and Management ChEMBL, ChEBI, and the Chemistry Development Kit Stephan A. Beisken What is EMBL-EBI? Part of the European Molecular Biology Laboratory International, non-profit

More information

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann Information Extraction from Chemical Images Discovery Knowledge & Informatics April 24 th, 2006 Dr. Available Chemical Information Textbooks Reports Patents Databases Scientific journals and publications

More information

SABIO-RK Integration and Curation of Reaction Kinetics Data Ulrike Wittig

SABIO-RK Integration and Curation of Reaction Kinetics Data  Ulrike Wittig SABIO-RK Integration and Curation of Reaction Kinetics Data http://sabio.villa-bosch.de/sabiork Ulrike Wittig Overview Introduction /Motivation Database content /User interface Data integration Curation

More information

A Journey from Data to Knowledge

A Journey from Data to Knowledge A Journey from Data to Knowledge Ian Bruno Cambridge Crystallographic Data Centre @ijbruno @ccdc_cambridge Experimental Data C 10 H 16 N +,Cl - Radspunk, CC-BY-SA CC-BY-SA Jeff Dahl, CC-BY-SA Experimentally

More information

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database OECD QSAR Toolbox v.3.3 Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database Outlook Background The exercise Workflow Save prediction 23.02.2015

More information

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression APPLICATION NOTE QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression GAINING EFFICIENCY IN QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS ErbB1 kinase is the cell-surface receptor

More information

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance OECD QSAR Toolbox v.4.1 Tutorial on how to predict Skin sensitization potential taking into account alert performance Outlook Background Objectives Specific Aims Read across and analogue approach The exercise

More information

In silico pharmacology for drug discovery

In silico pharmacology for drug discovery In silico pharmacology for drug discovery In silico drug design In silico methods can contribute to drug targets identification through application of bionformatics tools. Currently, the application of

More information

How to Create a Substance Answer Set

How to Create a Substance Answer Set How to Create a Substance Answer Set Select among five search techniques to find substances Since substances can be described by multiple names or other characteristics, SciFinder gives you the flexibility

More information

Research Article HomoKinase: A Curated Database of Human Protein Kinases

Research Article HomoKinase: A Curated Database of Human Protein Kinases ISRN Computational Biology Volume 2013, Article ID 417634, 5 pages http://dx.doi.org/10.1155/2013/417634 Research Article HomoKinase: A Curated Database of Human Protein Kinases Suresh Subramani, Saranya

More information

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Networks & pathways. Hedi Peterson MTAT Bioinformatics Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes

More information

InChI keys as standard global identifiers in chemistry web services. Russ Hillard ACS, Salt Lake City March 2009

InChI keys as standard global identifiers in chemistry web services. Russ Hillard ACS, Salt Lake City March 2009 InChI keys as standard global identifiers in chemistry web services Russ Hillard ACS, Salt Lake City March 2009 Context of this talk We have created a web service That aggregates sources built independently

More information

Reaxys Pipeline Pilot Components Installation and User Guide

Reaxys Pipeline Pilot Components Installation and User Guide 1 1 Reaxys Pipeline Pilot components for Pipeline Pilot 9.5 Reaxys Pipeline Pilot Components Installation and User Guide Version 1.0 2 Introduction The Reaxys and Reaxys Medicinal Chemistry Application

More information

Tautomerism in chemical information management systems

Tautomerism in chemical information management systems Tautomerism in chemical information management systems Dr. Wendy A. Warr http://www.warr.com Tautomerism in chemical information management systems Author: Wendy A. Warr DOI: 10.1007/s10822-010-9338-4

More information

Organometallics & InChI. August 2017

Organometallics & InChI. August 2017 Organometallics & InChI August 2017 The Cambridge Structural Database 900,000+ small-molecule crystal structures Over 60,000 datasets deposited annually Enriched and annotated by experts Structures available

More information

A powerful site for all chemists CHOICE CRC Handbook of Chemistry and Physics

A powerful site for all chemists CHOICE CRC Handbook of Chemistry and Physics Chemical Databases Online A powerful site for all chemists CHOICE CRC Handbook of Chemistry and Physics Combined Chemical Dictionary Dictionary of Natural Products Dictionary of Organic Dictionary of Drugs

More information

ISO INTERNATIONAL STANDARD. Geographic information Spatial referencing by coordinates

ISO INTERNATIONAL STANDARD. Geographic information Spatial referencing by coordinates INTERNATIONAL STANDARD ISO 19111 Second edition 2007-07-01 Geographic information Spatial referencing by coordinates Information géographique Système de références spatiales par coordonnées Reference number

More information

InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison

InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison InChI/InChIKey vs. CI/CADD Structure Identifiers: A comparison Markus Sitzmann Computer-Aided Drug Design Group (CI/CADD), Laboratory of Medicinal Chemistry, CI-Frederick, I, DS Comparison Standard InChI/InChIKeys

More information

Introduction to Chemoinformatics and Drug Discovery

Introduction to Chemoinformatics and Drug Discovery Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013 The Chemical Space There are atoms and space. Everything else is opinion. Democritus (ca.

More information

In Silico Investigation of Off-Target Effects

In Silico Investigation of Off-Target Effects PHARMA & LIFE SCIENCES WHITEPAPER In Silico Investigation of Off-Target Effects STREAMLINING IN SILICO PROFILING In silico techniques require exhaustive data and sophisticated, well-structured informatics

More information

NMR Predictor. Introduction

NMR Predictor. Introduction NMR Predictor This manual gives a walk-through on how to use the NMR Predictor: Introduction NMR Predictor QuickHelp NMR Predictor Overview Chemical features GUI features Usage Menu system File menu Edit

More information

CHEMISTRY (CHE) CHE 104 General Descriptive Chemistry II 3

CHEMISTRY (CHE) CHE 104 General Descriptive Chemistry II 3 Chemistry (CHE) 1 CHEMISTRY (CHE) CHE 101 Introductory Chemistry 3 Survey of fundamentals of measurement, molecular structure, reactivity, and organic chemistry; applications to textiles, environmental,

More information

The Case for Use Cases

The Case for Use Cases The Case for Use Cases The integration of internal and external chemical information is a vital and complex activity for the pharmaceutical industry. David Walsh, Grail Entropix Ltd Costs of Integrating

More information

ISO/TR TECHNICAL REPORT. Nanotechnologies Methodology for the classification and categorization of nanomaterials

ISO/TR TECHNICAL REPORT. Nanotechnologies Methodology for the classification and categorization of nanomaterials TECHNICAL REPORT ISO/TR 11360 First edition 2010-07-15 Nanotechnologies Methodology for the classification and categorization of nanomaterials Nanotechnologies Méthodologie de classification et catégorisation

More information

Drug Informatics for Chemical Genomics...

Drug Informatics for Chemical Genomics... Drug Informatics for Chemical Genomics... An Overview First Annual ChemGen IGERT Retreat Sept 2005 Drug Informatics for Chemical Genomics... p. Topics ChemGen Informatics The ChemMine Project Library Comparison

More information

OECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance

OECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance OECD QSAR Toolbox v.4.0 Tutorial on how to predict Skin sensitization potential taking into account alert performance Outlook Background Objectives Specific Aims Read across and analogue approach The exercise

More information

Dictionary of ligands

Dictionary of ligands Dictionary of ligands Some of the web and other resources Small molecules DrugBank: http://www.drugbank.ca/ ZINC: http://zinc.docking.org/index.shtml PRODRUG: http://www.compbio.dundee.ac.uk/web_servers/prodrg_down.html

More information

Dongyue Cao,, Junmei Wang,, Rui Zhou, Youyong Li, Huidong Yu, and Tingjun Hou*,, INTRODUCTION

Dongyue Cao,, Junmei Wang,, Rui Zhou, Youyong Li, Huidong Yu, and Tingjun Hou*,, INTRODUCTION pubs.acs.org/jcim ADMET Evaluation in Drug Discovery. 11. PharmacoKinetics Knowledge Base (PKKB): A Comprehensive Database of Pharmacokinetic and Toxic Properties for Drugs Dongyue Cao,, Junmei Wang,,

More information

Machine Learning Concepts in Chemoinformatics

Machine Learning Concepts in Chemoinformatics Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics

More information

Dock Ligands from a 2D Molecule Sketch

Dock Ligands from a 2D Molecule Sketch Dock Ligands from a 2D Molecule Sketch March 31, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com

More information

OECD QSAR Toolbox v.4.1

OECD QSAR Toolbox v.4.1 OECD QSAR Toolbox v.4.1 Step-by-step example on how to predict the skin sensitisation potential approach of a chemical by read-across based on an analogue approach Outlook Background Objectives Specific

More information

Chemical Space: Modeling Exploration & Understanding

Chemical Space: Modeling Exploration & Understanding verview Chemical Space: Modeling Exploration & Understanding Rajarshi Guha School of Informatics Indiana University 16 th August, 2006 utline verview 1 verview 2 3 CDK R utline verview 1 verview 2 3 CDK

More information

Reaxys Medicinal Chemistry Fact Sheet

Reaxys Medicinal Chemistry Fact Sheet R&D SOLUTIONS FOR PHARMA & LIFE SCIENCES Reaxys Medicinal Chemistry Fact Sheet Essential data for lead identification and optimization Reaxys Medicinal Chemistry empowers early discovery in drug development

More information

ISO INTERNATIONAL STANDARD

ISO INTERNATIONAL STANDARD INTERNATIONAL STANDARD ISO 9276-6 First edition 2008-09-15 Representation of results of particle size analysis Part 6: Descriptive and quantitative representation of particle shape and morphology Représentation

More information

How Do Metabolites Differ from Their Parent Molecules and How Are They Excreted?

How Do Metabolites Differ from Their Parent Molecules and How Are They Excreted? How Do Metabolites Differ from Their Parent Molecules and How Are They Excreted? Johannes Kirchmair 1, Andrew Howlett 1, Julio E. Peironcely 2,3,4, Daniel S. Murrell 1, Mark J. Williamson 1, Samuel E.

More information

OECD QSAR Toolbox v.3.3

OECD QSAR Toolbox v.3.3 OECD QSAR Toolbox v.3.3 Step-by-step example on how to predict the skin sensitisation potential of a chemical by read-across based on an analogue approach Outlook Background Objectives Specific Aims Read

More information

ISO INTERNATIONAL STANDARD. Geographic information Spatial referencing by coordinates Part 2: Extension for parametric values

ISO INTERNATIONAL STANDARD. Geographic information Spatial referencing by coordinates Part 2: Extension for parametric values INTERNATIONAL STANDARD ISO 19111-2 First edition 2009-08-15 Geographic information Spatial referencing by coordinates Part 2: Extension for parametric values Information géographique Système de références

More information

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012 bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012 Outline Machine Learning Cheminformatics Framework QSPR logp QSAR mglur 5 CYP

More information

OECD QSAR Toolbox v.3.4

OECD QSAR Toolbox v.3.4 OECD QSAR Toolbox v.3.4 Step-by-step example on how to predict the skin sensitisation potential approach of a chemical by read-across based on an analogue approach Outlook Background Objectives Specific

More information

Database Speaks. Ling-Kang Liu ( 劉陵崗 ) Institute of Chemistry, Academia Sinica Nangang, Taipei 115, Taiwan

Database Speaks. Ling-Kang Liu ( 劉陵崗 ) Institute of Chemistry, Academia Sinica Nangang, Taipei 115, Taiwan Database Speaks Ling-Kang Liu ( 劉陵崗 ) Institute of Chemistry, Academia Sinica Nangang, Taipei 115, Taiwan Email: liuu@chem.sinica.edu.tw 1 OUTLINES -- Personal experiences Publication types Secondary publication

More information

Chapter 6- An Introduction to Metabolism*

Chapter 6- An Introduction to Metabolism* Chapter 6- An Introduction to Metabolism* *Lecture notes are to be used as a study guide only and do not represent the comprehensive information you will need to know for the exams. The Energy of Life

More information

CIM Report May 8, :01pm

CIM Report May 8, :01pm ATTACHMENT C CIM Report May 8, 2017 3:01pm Course Changes Pending Approval from Graduate Committee Code Field Old Value New Value ARTS 4623 ARTS 4963 4123 4213 Deleted code ARTS 4613 ARTS 4963 Course Catalog

More information

On InChI and evaluating the quality of cross-reference links

On InChI and evaluating the quality of cross-reference links Galgonek and Vondrášek Journal of Cheminformatics 2014, 6:15 RESEARCH ARTICLE Open Access On InChI and evaluating the quality of cross-reference links Jakub Galgonek * and Jiří Vondrášek * Abstract Background:

More information

CSD. Unlock value from crystal structure information in the CSD

CSD. Unlock value from crystal structure information in the CSD CSD CSD-System Unlock value from crystal structure information in the CSD The Cambridge Structural Database (CSD) is the world s most comprehensive and up-todate knowledge base of crystal structure data,

More information

KATE2017 on NET beta version https://kate2.nies.go.jp/nies/ Operating manual

KATE2017 on NET beta version  https://kate2.nies.go.jp/nies/ Operating manual KATE2017 on NET beta version http://kate.nies.go.jp https://kate2.nies.go.jp/nies/ Operating manual 2018.03.29 KATE2017 on NET was developed to predict the following ecotoxicity values: 50% effective concentration

More information

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal Representation of molecular structures Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal A hierarchy of structure representations Name (S)-Tryptophan 2D Structure 3D Structure Molecular

More information

Marvin. Sketching, viewing and predicting properties with Marvin - features, tips and tricks. Gyorgy Pirok. Solutions for Cheminformatics
