PubChem data extraction and integration using Instant JChem
Oleg Ursu, Cristian Bologa, Tudor I. Oprea
Division of Biocomputing


PubChem - why not?
- Custom SQL queries
- Pipelining with custom in-house or commercial tools
- Structure search not complete
- Integration with in-house databases
- Speed of access/queries

PubChem structure search July 24, 2008 (and earlier)

DrugBank structure search tool July 24, 2008 (and earlier)

Integration with other databases
- Quickly identify the activity of HTS hits on other target(s)/assays
- Is there any relationship between my target and other targets in a cell-based assay?
- Profile compound activity on data other than PubChem assay data
- WOMBAT records overlapping with MLSMR: ~9000
- WOMBAT unique compounds overlapping with MLSMR: ~1700

Small Molecules Repository
- Large subset of PubChem libraries tested in multiple assays
- The same supplier, multiple centers including NMMLSC
- Need to pipeline/automate post-HTS analysis for multiple targets
- Integration with WOMBAT in-house database

Building IJC database
- Download the MLSMR library from PubChem
- Download assay data and descriptions from PubChem FTP
- Extract and prepare data from assay files
- Design and create database tables, relationships and forms

Structures import
- PubChem download limit: 250,000 structures
- 2 download batches
- Clean up using ChemAxon Standardizer with the following configuration

Database creation
- Import WOMBAT RDF file
- Assign PUBCHEM_SUBSTANCE_ID to structures in WOMBAT that are present in the MLSMR library
- Import MLSMR library, checking for duplicate structures already present in SUBSTANCES table
- Import PubChem assay data
- Import PubChem assay description data

PubChem structures import

Assays import
- CSV file: assay test data
- XML file: assay description data
- Shell script to process and extract assay data
- XMLStarlet Command Line XML Toolkit to query and extract assay description data (http://xmlstar.sourceforge.net/)
Examples:
$ xml sel -t -m //PC-AssayDescription -v PC-AssayDescription_name 761.descr.xml
> HTS to identify specific small molecule inhibitors of Ras and Ras-related GTPases specifically Cdc42 wildtype
$ xml sel -t -m //PC-AssayTargetInfo -v PC-AssayTargetInfo_name -n 761.descr.xml
> cell division cycle 42 (GTP binding protein, 25kDa) [Homo sapiens]
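The same two lookups can be sketched without XMLStarlet, using only the Python standard library. This is an illustration, not the authors' shell script: the sample document is a made-up stand-in for 761.descr.xml (only the two element names are taken from the commands above), and local_name and values_of are hypothetical helper names. Matching on the local element name is an assumption made so the sketch works whether or not the file declares an XML namespace.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a PubChem assay description file. Only the element
# names PC-AssayDescription_name and PC-AssayTargetInfo_name come from the
# XMLStarlet commands on the slide; the nesting and values are assumptions.
SAMPLE = """<PC-AssayDescription>
  <PC-AssayDescription_name>Example assay name</PC-AssayDescription_name>
  <PC-AssayDescription_target>
    <PC-AssayTargetInfo>
      <PC-AssayTargetInfo_name>example target A</PC-AssayTargetInfo_name>
    </PC-AssayTargetInfo>
    <PC-AssayTargetInfo>
      <PC-AssayTargetInfo_name>example target B</PC-AssayTargetInfo_name>
    </PC-AssayTargetInfo>
  </PC-AssayDescription_target>
</PC-AssayDescription>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def values_of(root, name):
    """Return the stripped text of every element whose local name is `name`."""
    return [
        (el.text or "").strip()
        for el in root.iter()
        if local_name(el.tag) == name
    ]


root = ET.fromstring(SAMPLE)
assay_names = values_of(root, "PC-AssayDescription_name")
target_names = values_of(root, "PC-AssayTargetInfo_name")

print(assay_names[0])
print("\n".join(target_names))
```

For a real file, ET.parse("761.descr.xml").getroot() would replace ET.fromstring(SAMPLE).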

Entity relationships
- Link ACTIVITY table data with SUBSTANCES table on PUBCHEM_SUBSTANCE_ID
- Link WOMBAT.ACT.LIST/WOMBAT.MOL.KW with WOMBAT SUBSTANCES table on SMDL.ID
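The first link can be sketched as a small relational schema. This uses SQLite from the Python standard library rather than Instant JChem, so it only illustrates the relationship. The table names and the columns cd_id, pubchem_substance_id, aid and activity_outcome appear in the slides; the smiles column, the key constraints and all rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE substances (
    cd_id                INTEGER PRIMARY KEY,
    pubchem_substance_id INTEGER UNIQUE,
    smiles               TEXT      -- assumed stand-in for the structure column
);
CREATE TABLE activity (
    aid                  INTEGER NOT NULL,
    pubchem_substance_id INTEGER NOT NULL
        REFERENCES substances (pubchem_substance_id),
    activity_outcome     INTEGER NOT NULL,
    PRIMARY KEY (aid, pubchem_substance_id)
);
""")

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO substances VALUES (?, ?, ?)",
    [(1, 1001, "CCO"), (2, 1002, "c1ccccc1")],
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(761, 1001, 2), (761, 1002, 1), (757, 1002, 2)],
)

# Follow the link: every activity row joined to its substance.
rows = conn.execute("""
    SELECT s.pubchem_substance_id, a.aid, a.activity_outcome
    FROM activity AS a
    JOIN substances AS s
      ON a.pubchem_substance_id = s.pubchem_substance_id
    ORDER BY s.pubchem_substance_id, a.aid
""").fetchall()

# An activity row pointing at an unknown substance violates the link.
try:
    conn.execute("INSERT INTO activity VALUES (761, 9999, 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```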

PubChem (MLSMR) view

Wombat view

Analyzing HTS hits
- Cluster HTS hits, SAR relationships
- Profile HTS hits on PubChem assays and WOMBAT targets
- Virtual screening of commercial libraries using ROCS, FP, and docking

Integration with other tools
- GUI is nice, but doesn't play well with other tools
- jcsearch, jcman: ChemAxon command-line tools; they don't allow join select SQL statements
- Designed custom search application based on JChemSearch object and JChem API

Using JChemSearch object

Pipelining db search with other tools
- Select active compounds in GTPases screening assays
  SQL filter:
  select distinct cd_id from substances, activity where activity.pubchem_substance_id = substances.pubchem_substance_id and activity.aid in (757,758,759,760,761,764) and activity.activity_outcome = 2
- Pipeline search results to the MCES-based clustering tool
  $ db_search -s get_actives.sql -t substances pubchem_substance_id pubchem_ext_datasource_regid | mcs -i - -mt 0.4 -o gtp.actives.meas --out-type m -s 0.3
- Apply the MESA Analytics grouping module to the resulting measures matrix
  $ Clustering gtp.actives.meas -T 0.28 637 > cluster.28.out
  $ ClusterOutput cluster.28.out gtp.actives.smiles 2 T N > cluster.28.out.smiles
- Generate Omega conformations needed for ROCS screening
  $ db_search -s get_actives.sql -t substances pubchem_substance_id | omega2 -in - -out gtp.actives.confs.oeb.gz -maxconfs 100
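The idea behind the custom search tool, running a SQL filter and streaming one structure per line to the next program, can be sketched as follows. The slides do not show how db_search is implemented, so this is a stand-in under stated assumptions: it uses SQLite instead of the JChem API, stream_hits is a hypothetical function, the smiles column and all rows are made up, and only the SQL filter itself is taken from the slide.

```python
import io
import sqlite3
import sys

# SQL filter from the slide: compounds active (outcome 2) in the GTPase assays.
SQL_FILTER = """
select distinct cd_id from substances, activity
where activity.pubchem_substance_id = substances.pubchem_substance_id
  and activity.aid in (757,758,759,760,761,764)
  and activity.activity_outcome = 2
"""


def stream_hits(conn, sql_filter, fields, out=sys.stdout):
    """Write one tab-separated 'SMILES, field, ...' line per matching cd_id.

    `sql_filter` and `fields` are pasted into the SQL text, so they must come
    from trusted code or files, never from untrusted input.
    """
    columns = ", ".join(["smiles"] + list(fields))
    query = (
        f"select {columns} from substances "
        f"where cd_id in ({sql_filter}) order by cd_id"
    )
    count = 0
    for row in conn.execute(query):
        out.write("\t".join(str(value) for value in row) + "\n")
        count += 1
    return count


# Made-up database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table substances (cd_id integer primary key,
                         pubchem_substance_id integer, smiles text);
create table activity (aid integer, pubchem_substance_id integer,
                       activity_outcome integer);
insert into substances values (1, 1001, 'CCO'), (2, 1002, 'c1ccccc1'),
                              (3, 1003, 'CC(=O)O');
insert into activity values (761, 1001, 2), (757, 1001, 2),
                            (761, 1002, 1), (999, 1003, 2);
""")

buffer = io.StringIO()
n_hits = stream_hits(conn, SQL_FILTER, ["pubchem_substance_id"], out=buffer)
print(buffer.getvalue(), end="")
```

Writing to standard output by default is what lets such a tool sit on the left side of a shell pipe, as in the commands above.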

Selected Cluster

Similar compounds in MLSMR - Active in other assays

WOMBAT compounds

BIRT reporting framework
- Use the list of SIDs to create a report on PubChem assay profiling
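The kind of query that could feed such a report is sketched below. It is not BIRT and not the authors' report definition: it is plain SQLite from the Python standard library, assay_profile is a hypothetical function, and the rows are made up. It assumes the ACTIVITY columns used in the SQL filter elsewhere in the slides (aid, pubchem_substance_id, activity_outcome, with outcome 2 meaning active).

```python
import sqlite3


def assay_profile(conn, sids):
    """Return (aid, tested, active) per assay for the given substance IDs."""
    sids = list(sids)
    if not sids:
        return []
    placeholders = ", ".join("?" for _ in sids)  # one bound parameter per SID
    query = f"""
        SELECT aid,
               COUNT(DISTINCT pubchem_substance_id) AS tested,
               COUNT(DISTINCT CASE WHEN activity_outcome = 2
                                   THEN pubchem_substance_id END) AS active
        FROM activity
        WHERE pubchem_substance_id IN ({placeholders})
        GROUP BY aid
        ORDER BY aid
    """
    return conn.execute(query, sids).fetchall()


# Made-up activity rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (aid INTEGER, pubchem_substance_id INTEGER, "
    "activity_outcome INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(761, 1001, 2), (761, 1002, 1), (757, 1001, 2),
     (757, 1003, 2), (764, 1002, 1)],
)

profile = assay_profile(conn, [1001, 1002])
for aid, tested, active in profile:
    print(f"AID {aid}: tested {tested}, active {active}")
```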

Cluster compounds profile in PubChem and WOMBAT
(values per row: # of compounds / tested / active, followed by target(s) name)
- Active in GTPases assays: 4 / 69 / 6
    Rac1 protein
    GTP-binding protein (rab7)
    ras protein
    Ras-related protein Rab-2A
    cell division cycle 42 (GTP binding protein, 25kDa)
    Rac1 protein
- Active in other PubChem assays: 4 / 250 / 5
    qHTS Assay for Disrupters of an Hsp90 Co-Chaperone Interaction
    Catalytic epsilon subunit of the translation initiation factor eIF2B, the guanine nucleotide exchange factor for eIF2; activity&
    cytochrome P450, family 2, subfamily C, polypeptide 9
    cytochrome P450, family 2, subfamily C, polypeptide 19
    thyroid stimulating hormone receptor
- WOMBAT: 5 / 6 / 6
    DP; prostaglandin D2 receptor
    EP1; prostaglandin E2 receptor, EP1 subtype
    EP2; prostaglandin E2 receptor, EP2 subtype
    EP3; prostaglandin E2 receptor, EP3 subtype
    EP4; prostaglandin E2 receptor, EP4 subtype
    cPLA2; cytosolic phospholipase A2; phospholipase A2 group IVA
- Total: 13 / 317 / 17

ROCS screening
[Figure: dose-response curve for Plate1A02_000A-0214, series Rac_act; y-axis RawMCF, x-axis Log Compound Conc [M]]
Fit parameters (Rac_act): BOTTOM 240.1, TOP 191.3, LOGEC50 -6.990, HILLSLOPE -0.9937, EC50 1.0237e-007
ROCS hit from ChemDiv library, dose response EC50 = 0.102 μM
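The reported EC50 follows directly from the fitted LOGEC50. The sketch below checks that arithmetic and evaluates the curve at its midpoint. It assumes the fit used the common four-parameter logistic form; the slide names the parameters but not the equation, and four_param_logistic is a hypothetical helper.

```python
def four_param_logistic(log_conc, bottom, top, log_ec50, hill_slope):
    """Response at a given log10 concentration (assumed 4-parameter form)."""
    return bottom + (top - bottom) / (
        1.0 + 10.0 ** ((log_ec50 - log_conc) * hill_slope)
    )


# Fit parameters as printed on the slide (series Rac_act).
BOTTOM, TOP, LOG_EC50, HILL_SLOPE = 240.1, 191.3, -6.990, -0.9937

ec50_molar = 10.0 ** LOG_EC50          # LOGEC50 is log10 of a molar value
ec50_micromolar = ec50_molar * 1e6     # convert M to micromolar

# At log_conc == LOG_EC50 the response is halfway between BOTTOM and TOP.
midpoint = four_param_logistic(LOG_EC50, BOTTOM, TOP, LOG_EC50, HILL_SLOPE)

print(f"EC50 = {ec50_micromolar:.3f} uM")  # EC50 = 0.102 uM
print(f"midpoint response = {midpoint:.1f}")
```

The value computed from the rounded LOGEC50 agrees with the slide's EC50 of 1.0237e-007 M to three significant figures.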

Future plans
- Automatic synchronization with PubChem
- Integration with other databases: DrugBank, Protein Ligand Databases, EMBL-EBI, in-house assay data, etc.

Acknowledgments
- ChemAxon
- OpenEye
- Eclipse project
- Division of Biocomputing at UNM

Division of Biocomputing at UNM
Tudor Oprea, Cristian Bologa, Steve Mathias, Jerome Abear, Andrei Leitao, Ramona Curpan, Liliana Halip, Jeremy Yang, Niranjan Kumar, Oleg Ursu