A Journey from Data to Knowledge

Ian Bruno, Cambridge Crystallographic Data Centre (@ijbruno, @ccdc_cambridge)

Experimental Data
  C10H16N+, Cl-
  Experimentally determined 3D coordinates of atoms
  Arrangement of molecules in the crystal
  Captured in a CIF file
  MIYCUT
  Image credits: Radspunk, CC-BY-SA; CC-BY-SA; Jeff Dahl, CC-BY-SA

Crystallographic Information Framework (CIF)
  Standard format for capturing data about structure and experiment
  Maintained by the International Union of Crystallography (IUCr)
  Widely adopted by crystallographic and publishing communities
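To make the idea of a CIF file concrete, here is a minimal sketch of reading simple tag-value items from CIF text. It is an illustration only: the sample block and its numeric values are hypothetical, the function name `parse_simple_cif` is made up for this example, and the parser deliberately skips `loop_` tables and multi-line text fields, which any real CIF reader has to support.

```python
# Hypothetical CIF fragment; the data names are standard core items,
# the values are invented for illustration.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C10 H16 Cl N'
_cell_length_a 7.512
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1234
N1 0.5678
"""


def parse_simple_cif(text):
    """Collect single-line `_tag value` items for each data block.

    Loop headers (a tag with no value) and loop rows are ignored.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            # Start of a new data block; the name follows "data_".
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            tag, *rest = line.split(None, 1)
            if rest:
                value = rest[0].strip()
                # Strip one pair of matching quotes around the value.
                if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                    value = value[1:-1]
                current[tag] = value
    return blocks


blocks = parse_simple_cif(SAMPLE_CIF)
print(blocks["example"]["_chemical_formula_sum"])
```

A reader this simple would also be a natural place for the kind of basic syntax check a deposition service might run, but for production use an established CIF library is the safer choice.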

Crystal Structure Deposition
  Researchers deposit CIFs with CCDC
  Deposition services provide basic validation checks
  Depositor receives a unique accession ID
  Pre-publication access for trusted reviewers

Data Deposition Challenges
  Syntax errors
  Inaccuracies
  Revisions
  Republications
  Timely release
  Over 60,000 structures deposited annually

Crystal Structure Dissemination
  On publication, deposited data sets are freely available for anyone to download
  Individual structures are accessible via the CCDC Summary Page*
  Web services enable structure lookup and retrieval
  * New Summary Page scheduled for launch December 2014

Linking articles and data
  Links are in place for ACS, RSC, Elsevier and IUCr journals

DOIs and Data Citation
  Implemented 2013; deployed 2014
  Unambiguous and persistent identification of datasets
  Over 500,000 DOIs registered since April 2014
  Foundation for formalising data citation and interoperability
  DOI: 10.5517/CCPHZ37
  Data should be considered legitimate, citable products of research (https://www.force11.org/datacitation)
  Dataset publication: CCDC 892348: Experimental Crystal Structure Determination. A. Crystallographer, Cambridge Crystallographic Data Centre (2013) http://dx.doi.org/10.5517/ccyykfv
  Potential for linking between derived data at CCDC and raw data stored at STFC based on DOIs.
  Coverage of CCDC data by the Thomson Reuters Data Citation Index to be achieved via the DataCite metadata store.
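As a small illustration of how a dataset DOI supports citation, the sketch below normalises a DOI, builds a resolver link and assembles a citation in the same order as the example on the slide. The helper names (`normalise_doi`, `doi_url`, `format_dataset_citation`) are hypothetical and are not part of any CCDC or DataCite API; the sketch assumes the public `https://doi.org/` resolver and relies on DOI names being case-insensitive.

```python
DOI_RESOLVER = "https://doi.org/"

# Prefixes that people commonly paste in front of a bare DOI.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Return the bare DOI in lower case, without resolver or 'doi:' prefix."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def doi_url(doi):
    """Build a resolvable link for a DOI."""
    return DOI_RESOLVER + normalise_doi(doi)


def format_dataset_citation(title, authors, publisher, year, doi):
    """Assemble a citation string in the order used on the slide."""
    return f"{title}. {authors}, {publisher} ({year}) {doi_url(doi)}"


citation = format_dataset_citation(
    title="CCDC 892348: Experimental Crystal Structure Determination",
    authors="A. Crystallographer",
    publisher="Cambridge Crystallographic Data Centre",
    year=2013,
    doi="http://dx.doi.org/10.5517/ccyykfv",
)
print(citation)
```

Normalising before comparison means that `10.5517/CCPHZ37` and `10.5517/ccphz37` are treated as the same dataset, which matters when matching DOIs harvested from different sources.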

Assigning Chemistry
  From data deposited with CCDC to an entry in the Cambridge Structural Database:
    Unambiguous identification of datasets
    Correction of syntax errors
    Provenance and attribution
    Assignment of chemistry
    Additional scientific data and metadata
    Review by editorial staff
  Assignment of chemistry is required to make data findable, interoperable and reusable

Bulk identification of links between ChemSpider molecules and CCDC crystal structures will be facilitated by InChIs.
  Other opportunities: PubChem, UniChem (EBI), Wikipedia, Chemical Abstracts
  Standard InChI: InChI=1S/C33H36O6/c1-7-10-37-31-19-25-13-23-17-29(35-5)33(39-12-9-3)21-27(23)15-24-18-30(36-6)32(38-11-8-2)20-26(24)14-22(25)16-28(31)34-4/h7-9,16-21H,1-3,10-15H2,4-6H3
  Standard InChIKey: IZHKSTHBLQRIOW-UHFFFAOYSA-N
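The bulk linking described here can be pictured as a join on the InChIKey. The following is a minimal sketch under stated assumptions: the record identifiers (`molecule-1`, `structure-A` and so on) and the second key are placeholders, not real ChemSpider or CSD entries, and the function names are invented for the example. Only the InChIKey taken from the slide is real.

```python
import re

# A standard InChIKey has three hyphen-separated blocks of 14, 10 and 1
# upper-case letters.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(value):
    """Check only the shape of the key, not whether it is a valid hash."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


def link_by_inchikey(left, right):
    """Pair identifiers from two {identifier: InChIKey} maps sharing a key."""
    index = {}
    for ident, key in right.items():
        if is_inchikey(key):
            index.setdefault(key, []).append(ident)
    links = []
    for ident, key in left.items():
        for other in index.get(key, []):
            links.append((ident, other))
    return links


# Placeholder records: identifiers and the second key are hypothetical.
molecules = {
    "molecule-1": "IZHKSTHBLQRIOW-UHFFFAOYSA-N",
    "molecule-2": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
}
structures = {
    "structure-A": "IZHKSTHBLQRIOW-UHFFFAOYSA-N",
    "structure-B": "not-a-key",
}

links = link_by_inchikey(molecules, structures)
print(links)
```

Because an InChIKey is a hash of the InChI, a match is strong evidence that two records describe the same structure rather than a proof, so a careful pipeline would confirm matches against the full InChI strings.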

Knowledge-based Solutions
  Provide insights into:
    molecular dimensions and shape
    molecular interactions
  Applicable to:
    drug design and development
    design of new materials
    crystal engineering
    structure validation

Crystal Structure Knowledge Helps Drug Design
  PDB 1BKF complexed with PDB Ligand FK5
  CSD FINWEE10 overlaid on PDB Ligand FK5
  Tacrolimus: an immunosuppressive drug, also used in the treatment of skin conditions.
  Understanding factors that influence the shape of molecules helps identify better drug candidates.
  Molecular Recognition of Protein-Ligand Complexes: Applications to Drug Design. R. Babine and S. Bender. Chem. Rev. (1997) doi:10.1021/cr960370z

Crystal Structure Knowledge Mitigates Risk
  Different crystal forms, different interactions, different solubility, different stability.
  Knowing the likelihood of specific molecular interactions occurring helps assess the risk of undesirable crystal formation.
  Knowledge-Based H-Bond Prediction to aid experimental polymorph screening. P. Galek et al. CrystEngComm (2009) doi:10.1039/b910882c

Experimental Data
  C10H16N+, Cl-
Structural Knowledge

Crystal Structure Growth
  [Figure: growth of the CSD and growth of the PDB]
  Cambridge Structural Database (CSD): organic and metal-organic compounds; established 1965; 750,422 structures
  Inorganic Crystal Structure Database (ICSD): inorganic compounds; 173,473 structures
  Protein Data Bank (PDB): biological macromolecules; 105,025 structures
  Counts as of: CSD 18 Nov 2014; ICSD 27 Oct 2014; PDB 25 Nov 2014
  Overall: 1,028,920 structures
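The overall figure is simply the sum of the three database counts quoted on the slide. A short check, using only those numbers, also gives each database's share of the total:

```python
# Structure counts quoted on the slide (late 2014).
counts = {"CSD": 750_422, "ICSD": 173_473, "PDB": 105_025}

total = sum(counts.values())
print(f"Overall: {total:,}")

# Share of the combined total held by each database.
shares = {name: n / total for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```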

A Collaborative Journey
  Academic research community
  Instrument manufacturers
  Publishers
  Data repositories
  Industrial partners
  Scientific software vendors
  Image: CC-BY-SA

We've come a long way
  Essential foundations:
    standard file formats
    rich metadata
    standard identifiers
    interoperable services
    sustainability
  RDA-WDS Data Publishing Activities:
    Workflows
    Publishing Data Services
    Data Bibliometrics
    Cost Recovery

...there are new roads ahead
  Future opportunities:
    unpublished data
    enhanced deposition services
    new publication pathways
    data metrics and impact

Time's up!
  About your speaker:
    Name: Ian Bruno
    Company: Cambridge Crystallographic Data Centre
    Email: bruno@ccdc.cam.ac.uk
    Social media: @ijbruno, @ccdc_cambridge
  DOIs: 10.5517/cc5jk3b, 10.5517/cc4zfp6, 10.5517/cc790j1, 10.5517/ccnk62g, 10.5517/ccr3zh9, 10.5517/ccrvc86