Scientific Integrity: A crystallographic perspective

Ian Bruno - Director, Strategic Partnerships
The Cambridge Crystallographic Data Centre
@ijbruno @ccdc_cambridge
Scientific Integrity: Can We Rely on the Published Scientific Literature?
250th ACS National Meeting & Exposition, August 16-20 2015, Boston, MA
www.ccdc.cam.ac.uk

http://blogs.scientificamerican.com/absolutely-maybe/generation-open-sneak-peek-into-science-8217-s-future-at-opencon-2014/

http://researchdata.ox.ac.uk/2014/09/30/report-from-the-research-data-alliance-plenary-meeting-no-4/
"The research paper should be considered supplementary to the data" (Barend Mons, Leiden)
"Publications are not simply containers for data but rather arguments that are supported by data" (Christine Borgman, UCLA)
supplementary (ˌsʌp ləˈmɛn tə ri): 1. Forming or acting as a supplement.
supplement (sŭp lə-mənt): 1. Something added to complete a thing, make up for a deficiency, or extend or strengthen the whole.
http://www.thefreedictionary.com/supplementary
Ergo: A scientific article without supplementary data is incomplete, deficient or weak!

Can we rely on the published scientific literature?
Can we rely on the published scientific data?

Crystal Structure Databases
Cambridge Structural Database: organic and metal-organic compounds, 790,040 structures
[Figure: Growth of the CSD]

Data Deposition and Access
[Figure: CIF file; CCDC Structure Summary Page]
Many journals require derived data to be deposited with the CCDC prior to publication
Data files are available to reviewers pre-publication and to everyone post-publication
http://www.ccdc.cam.ac.uk/getstructures

Scientific Validation: checkCIF
http://checkcif.iucr.org/
Checks consistency and integrity of the data
Generates alerts indicating issues that should be corrected or explained
Can be run interactively via a web form
A checkCIF API is now also available

Publisher Policies: checkCIF
Most publishers require checkCIF to be run prior to submission
Some require the report to be uploaded, others require it to be retained
Some specifically request a PDF of the checkCIF report
Stringency varies depending on the journal
Some require certain alerts to be justified
checkCIF reports go to the publisher, data files go to the CCDC
Based on a review of Author Guidelines, June 2014

checkCIF Validation Responses
checkCIF alerts and researcher responses can be embedded in CIFs
Examples of responses: voids due to exclusion of unknown solvent; disorder in counter-ion
CCDC 813412

checkCIF comments in the CCDC CIF Repository
Look for data items beginning _vrf (Validation Response Form)
Subset of around 480,000 deposited CIFs
Around 8,000 CIFs contain validation responses (~1.5%)
Indicates the number of CIFs where checkCIF comments have been added at the point they are deposited with the CCDC. Not necessarily a reflection of how often checkCIF is run.
Frequently observed explanations for common alerts:
  Disorder
  Quality of sample
  Weak diffraction
  Limited beam time
  Water hydrogens hard to locate
  Modelling of solvent molecules
  Restraint strategy used to refine model
[Figure: Twinned pyrite crystal. "Pyrite 60608" by Vassil. Licensed under Public Domain via Wikimedia Commons]
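
The search described on this slide, looking for data items beginning _vrf, could be sketched in Python roughly as follows. This is a minimal sketch and not the CCDC's actual tooling: the names find_vrf_items and has_validation_response and the sample CIF fragment are hypothetical, and the scan is a simple line-based match rather than a full CIF parser.

```python
import re

# Matches a data name at the start of a line that begins with "_vrf".
VRF_NAME = re.compile(r"^\s*(_vrf\S*)", re.IGNORECASE)


def find_vrf_items(cif_text):
    """Return the data names beginning with _vrf, in order of appearance."""
    names = []
    for line in cif_text.splitlines():
        match = VRF_NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_validation_response(cif_text):
    """True if the CIF text contains at least one _vrf data item."""
    return bool(find_vrf_items(cif_text))


# Invented fragment for illustration only; not taken from a deposited CIF.
SAMPLE_CIF = """\
data_example
_cell_length_a 10.000
_vrf_PLAT601_example
;
PROBLEM: Structure contains solvent accessible voids
RESPONSE: Unknown solvent was excluded from the model.
;
"""

print(find_vrf_items(SAMPLE_CIF))  # ['_vrf_PLAT601_example']
```

Because the match is line-based, text inside a multi-line response that happens to start with _vrf would also be counted; a real survey would likely use a proper CIF parser.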

Opportunities
checkCIF provides useful information about deposited datasets, but:
  separate steps are required to deposit and to run checkCIF
  it is often not obvious whether checkCIF has been run
  responses to checkCIF alerts can be revealing but are largely hidden
Can we:
  make it easier for authors to satisfy journal requirements?
  make it easier for referees to access checkCIF reports?
  remove uncertainty over whether checkCIF has been used?
  make value added through responses more visible?

Recently Released
http://www.ccdc.cam.ac.uk/deposit
Uses new checkCIF API

Level A: Most likely a serious problem - resolve or explain
Level B: A potentially serious problem, consider carefully
Level C: Check. Ensure it is not caused by an omission or oversight
Level G: General information/check it is not something unexpected
http://www.ccdc.cam.ac.uk/deposit
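
The four alert levels lend themselves to a small lookup table. The sketch below is illustrative only: the level letters and descriptions come from the slide, while the function summarise_alerts and its input format (a plain list of level letters) are assumptions and do not reflect how checkCIF or the CCDC deposition service represent alerts.

```python
from collections import Counter

# Level descriptions as given on the slide, most severe first.
ALERT_LEVELS = {
    "A": "Most likely a serious problem - resolve or explain",
    "B": "A potentially serious problem, consider carefully",
    "C": "Check. Ensure it is not caused by an omission or oversight",
    "G": "General information/check it is not something unexpected",
}


def summarise_alerts(levels):
    """Count alerts per level and list them with the most severe level first."""
    counts = Counter(levels)
    unknown = set(counts) - set(ALERT_LEVELS)
    if unknown:
        raise ValueError(f"unknown alert level(s): {sorted(unknown)}")
    # Dict insertion order (A, B, C, G) gives the severity ordering.
    return [
        (level, counts[level], ALERT_LEVELS[level])
        for level in ALERT_LEVELS
        if counts[level]
    ]


for level, count, meaning in summarise_alerts(["C", "G", "A", "C"]):
    print(f"Level {level}: {count} alert(s). {meaning}")
```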

Recently Released
http://www.ccdc.cam.ac.uk/deposit
Responses are included in the CIF being deposited.
Download of checkCIF reports to be added soon
Later hope to enable reviewers to run checkCIF if the depositor did not

Possible CSD-based Checks for Small Molecules
Could extend/complement checkCIF with:
  geometry check
  void analysis
  interaction analysis
  commonality of space groups
Could also feed back to the depositor about:
  other determinations of the same compound
  related structures (e.g. similarity search)

Can we rely on the published scientific literature?
Can we rely on the published scientific data?
Can we rely on knowledge-based analysis?

The CSD: Crystallography and Chemistry
Provides understanding of molecular geometry and molecular interactions
Enables structural knowledge to be applied to scientific problems
Assignment of chemistry is required to make data findable, interoperable and reusable

Geometry Analysis
ConQuest Search, CSD 5.36: 3,435 hits. Filters: None. Mean angle: 121.9(18)°
CSD 5.36: BEXYIO. R-factor: 14.5%. Angle: 109.29°
Atomic Displacement Parameters (ADPs) indicate uncertainty in the position of an atom - typically represented as ellipsoids.
Ellipsoids of significantly different sizes may reflect problems in the structure.
ADPs available in deposited CIFs but not yet in the CSD.
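
How far the BEXYIO angle sits from the rest of the search results can be made explicit with a simple z-score. This is a worked illustration rather than part of the talk: it assumes the "(18)" in 121.9(18) is a standard deviation of 1.8 degrees on the distribution of hits, which the slide does not state, and the function name z_score is hypothetical.

```python
def z_score(observed, mean, std_dev):
    """Number of standard deviations between an observed value and the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (observed - mean) / std_dev


# Values from the slide; 1.8 is the assumed reading of "(18)".
z = z_score(observed=109.29, mean=121.9, std_dev=1.8)
print(round(z, 1))  # -7.0
```

Under that assumption the angle lies about seven standard deviations below the mean, which fits the slide's pairing of this entry with a high R-factor.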

Search Filters
CSD 5.36: MAMSUQ. R-factor: 4.96%. Angle: 128.6°
[Figure: Without filters; With filters]

Automated Geometry Analysis
Mogul compares the geometry of a 3D molecule against the CSD
Aims to strike a balance between being too general and too specific
Volume and severity of alerts important in drawing conclusions

CSD-based Validation of Protein Ligands
CSD-based geometry checks included in PDB validation pipeline
"Bond lengths, bond angles, acyclic torsions and isolated rings are assessed by comparison with preferred molecular geometries derived from high-quality, small-molecule structures in the Cambridge Structural Database (CSD)."
http://www.wwpdb.org/validation-reports.html

Can we rely on the published scientific literature?
Can we rely on the published scientific data?
Can we rely on knowledge-based analysis?
Can we rely on research data repositories?

Trusted Repositories
"Researchers will expect guidance on how to select an appropriate repository for their data"
"Standards and guidelines for repositories exist and include the Data Seal of Approval, the repository selection process for Thompson-Reuters (sic) Data Citation Index, Digital Curation Centre, Trusted Repositories Audit and Certification (TRAC) program"
"Potential starting points for a community standards discussion"
http://dx.doi.org/10.1371/journal.pbio.1001975

Repository Certification
Various stamps of approval available:
  The Data Seal of Approval (DSA)
  ICSU World Data System (WDS)
  nestor Seal (derived from DIN 31644)
  ISO 16363 (based on TRAC)
From light-weight to heavy-duty:
  DSA and WDS: self-certifying against ~16 criteria
  nestor Seal: 34 criteria
  ISO 16363: 70 pages, formal audit

Self Certification Criteria
Criteria of DSA and WDS variously cover:
  organizational framework - governance, sustainability
  data management - authenticity, integrity, accessibility
  technical infrastructure - support, security
Evaluation procedures:
  submissions are peer-reviewed
  emphasis on public documentation of procedures
RDA DSA/WDS Audit and Certification Working Group:
  explore and develop a DSA-WDS partnership with the objectives of realizing efficiencies, simplifying assessment options, stimulating more certifications, and increasing impact on the community

Global Initiatives in Research Data
Bring together researchers in the domain of chemistry for a discussion about the formation of an RDA Interest Group (IG) on Chemical Data.
251st ACS National Meeting & Exposition, San Diego, CA, March 13-17, 2016
Session: Global initiatives in research data management and discovery
How might the chemistry community best engage with and learn from broader activities in research data management and discovery?

Can we rely on the published scientific literature?
Can we rely on the published scientific data?
Can we rely on knowledge-based analysis?
Can we rely on research data repositories?

Concluding Thoughts
checkCIF greatly aids in the assessment of the quality of the data:
  recent developments reduce barriers to running checkCIF
  supplementary insights more likely to be visible in data files
Knowledge-based analysis can provide additional insights:
  potential for supplementing existing validation processes
  important to think about how alerts are used and presented
The role of scientific data repositories is important:
  deposition and access services that ensure supporting data is available
  enrichment of data to enable reuse in validation and other contexts

The Cambridge Crystallographic Data Centre
International Data Repository:
  Archive of crystal structure data
  High quality scientific database
Scientific Software Provider:
  Search/analysis/visualisation tools
  Scientific applications
Collaborative Research Organisation:
  New methodologies
  Fundamental research
@ccdc_cambridge ccdc.cambridge http://www.ccdc.cam.ac.uk/
Ian Bruno, Director, Strategic Partnerships, bruno@ccdc.cam.ac.uk
Thanks to Mike Hoyland and others at the IUCr for the checkCIF API, advice and support.