The Case for Use Cases

Similar documents
The shortest path to chemistry data and literature

Reaxys Medicinal Chemistry Fact Sheet

Handling Human Interpreted Analytical Data. Workflows for Pharmaceutical R&D. Presented by Peter Russell

FROM MOLECULAR FORMULAS TO MARKUSH STRUCTURES

Reaxys Pipeline Pilot Components Installation and User Guide

PIOTR GOLKIEWICZ LIFE SCIENCES SOLUTIONS CONSULTANT CENTRAL-EASTERN EUROPE

CSD. Unlock value from crystal structure information in the CSD

SciFinder Premier CAS solutions to explore all chemistry MethodsNow, PatentPak, ChemZent, SciFinder n

Chemists are from Mars, Biologists from Venus. Originally published 7th November 2006

AMRI COMPOUND LIBRARY CONSORTIUM: A NOVEL WAY TO FILL YOUR DRUG PIPELINE

Reaxys The Highlights

Expanding the scope of literature data with document to structure tools PatentInformatics applications at Aptuit

TRAINING REAXYS MEDICINAL CHEMISTRY

How to Create a Substance Answer Set

Patent Searching using Bayesian Statistics

Chemically Intelligent Experiment Data Management

Data Mining in the Chemical Industry. Overview of presentation

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Chemical Data Retrieval and Management

Structure and Reaction querying in Reaxys

Practical QSAR and Library Design: Advanced tools for research teams

In Silico Investigation of Off-Target Effects

Demonstration: Searching patents based on chemical structure using SciFinder

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

Introducing a Bioinformatics Similarity Search Solution

JCICS Major Research Areas

Navigating between patents, papers, abstracts and databases using public sources and tools

PROVIDING CHEMINFORMATICS SOLUTIONS TO SUPPORT DRUG DISCOVERY DECISIONS

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

Design Drugs Collaboratively Using Spotfire Visualization and Analysis

Integrated Cheminformatics to Guide Drug Discovery

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Product Guide. Thermo Scientific Cellomics HCS Solution

Introduction to IsoMAP Isoscapes Modeling, Analysis, and Prediction

INTRODUCTION TO GEOGRAPHIC INFORMATION SYSTEM By Reshma H. Patil

Stephen McDonald, Mark D. Wrona, Jeff Goshawk Waters Corporation, Milford, MA, USA INTRODUCTION

DRUG DISCOVERY TODAY ELN ELN. Chemistry. Biology. Known ligands. DBs. Generate chemistry ideas. Check chemical feasibility In-house.

How IJC is Adding Value to a Molecular Design Business

Status of implementation of the INSPIRE Directive 2016 Country Fiches. COUNTRY FICHE Ireland

Unlocking the potential of your drug discovery programme

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

The Changing Requirements for Informatics Systems During the Growth of a Collaborative Drug Discovery Service Company. Sally Rose BioFocus plc

Searching Substances in Reaxys

Analytical data, the web, and standards for unified laboratory informatics databases

Introduction to GIS I

Ákos Tarcsay CHEMAXON SOLUTIONS

Using Web Technologies for Integrative Drug Discovery

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

Course Syllabus. offered by Department of Chemistry with effect from Semester B 2017/18

Data Quality Issues That Can Impact Drug Discovery

Reaxys Improved synthetic planning with Reaxys

Geodatabase An Introduction

SciFinder The choice for chemistry research

The MDL Discovery Framework: Data and Application Integration in the Life Sciences

DP Project Development Pvt. Ltd.

Overview. Database Overview Chart Databases. And now, a Few Words About Searching. How Database Content is Delivered

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

CHAPTER 22 GEOGRAPHIC INFORMATION SYSTEMS

Status of implementation of the INSPIRE Directive 2016 Country Fiches. COUNTRY FICHE Malta

Compounding insights Thermo Scientific Compound Discoverer Software

Spatial Data Management of Bio Regional Assessments Phase 1 for Coal Seam Gas Challenges and Opportunities

Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

Kalexsyn Overview Kalexsyn, Inc Campus Drive Kalamazoo, MI Phone: (269) Fax: (269)

DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING

Perseverance. Experimentation. Knowledge.

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

IUCLID Substance Data

Technical Specifications. Form of the standard

COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE

Introduction. OntoChem

AUTOMATED METERED WATER CONSUMPTION ANALYSIS

My Career in the Pharmaceutical Industry

SCULPT 3.0. Using SCULPT to Gain Competitive Insights. Brings 3D Visualization to the Lab Bench SPECIAL REPORT. 4 Molecular Connection Fall 1999

DiscoveryGate. One place for the answers you need. Chulalongkorn University, Thailand

Discovery and Access of Geospatial Resources using the Geoportal Extension. Marten Hogeweg Geoportal Extension Product Manager

Land-Line Technical information leaflet

NEW CONCEPTS - SOIL SURVEY OF THE FUTURE

a system for input, storage, manipulation, and output of geographic information. GIS combines software with hardware,

Developing CAS Products for Substructure Searching by Chemists. Linda Toler

Bob Stembridge Patent analysis and graphene 2020 programme

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database

ACD/Labs Software Impurity Resolution Management. Presented by Peter Russell

Exascale I/O challenges for Numerical Weather Prediction

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Pipeline Pilot Integration

ArcGIS. for Server. Understanding our World

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python

Drillworks. DecisionSpace Geomechanics DATA SHEET

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

Orbital Insight Energy: Oil Storage v5.1 Methodologies & Data Documentation

MultiscaleMaterialsDesignUsingInformatics. S. R. Kalidindi, A. Agrawal, A. Choudhary, V. Sundararaghavan AFOSR-FA

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

Evaluating Physical, Chemical, and Biological Impacts from the Savannah Harbor Expansion Project Cooperative Agreement Number W912HZ

Biologically Relevant Molecular Comparisons. Mark Mackey

Extracting Knowledge from Reaction Databases: Developments from InfoChem

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

Transcription:

The Case for Use Cases The integration of internal and external chemical information is a vital and complex activity for the pharmaceutical industry. David Walsh, Grail Entropix Ltd

Costs of Integrating Internal and It is expensive! External Data Generating quality, comprehensive external data is expensive. Finding data that can be integrated is expensive External chemical data needs to be accurate, comprehensive and contain metadata (e.g. ownership and bioactivity (qualitative/ quantitative)

It s difficult, so Why? Deriving knowledge on competitor company s molecular approaches is valuable competitor intelligence Determination of the creative processes and medicinal chemistry rules that drive lead optimisation is important to assist internal drug discovery processes

Why? Describes what has already formed part of prior art Allows decisions to be made on freedom to operate Describes areas of chemical space that is congested, affecting issues of lead optimisation.

Why? Allows modelling of molecular properties in a comparable way to internal molecules Determination of comparable physicochemical properties between internal and external molecules

How? It is important to provide externally derived molecules in formats which are capable of being integrated with internally derived molecules It may also be important to recognise that internal and external molecules are essentially different.

Pfizer Use Case Study I will make some reference to Pfizer s use cases, which were derived up to 3 years ago, and may have been modified and amended over time... Requirements were for a large molecule collection of exemplified molecules from patents, supplemented by data on pharmaceutically relevant exemplified molecules An investigation of available approaches to investigate Markush space.

Status Value Exemplified Compounds from Patents IBM Database Chemical structures and meta data extracted from the document text of WO, EP, and USPTO documents (1976 to present, 4.5 million unique patents) Automated updating of internal DB as new documents are published (2-4 week lag) Consortium with 9 pharmaceutical and chemical companies - reduces costs and drives technical improvements Image to structure conversions for all patents back to 2000 Pfizer owns the data (8 million unique structures) that can be used and stored as project teams see fit Data can be easily integrated with internal and external datasets and manipulated by internal tools Downsides Automated process: errors from poor name to structure conversions and junk can creep into data stream Only structures that are named in the text are indexed for patents pre 2000

Exemplified Compounds from Patents Status Value GVK Biosciences GoStar Database Chemical structures and bioassay data hand curated from patents and journals Database is updated quarterly Pfizer licenses the database, but does not own the contents GVK GoStar database integrated into CCT protocols for patent analysis High quality structures and bioassay data Little overlap with structures in IBM patent database Data can be easily integrated with internal datasets and manipulated by internal tools Downsides Limited coverage of the patent literature (about 1/10 th that of IBM database)

Cross-disciplinary Teams It is also important to develop cross disciplinary teams to develop use cases across a range of discovery chemists, informatics and patent practitioners, as their combined requirements are important to maximise the use of information and their individual usage or data handling skills and knowledge are vastly different. The cross disciplinary teams are capable of extending the variety of use cases, as well as extracting the meta knowledge of data sources and perspectives for using chemical information, and also allows ownership to be distributed between silos in the organisation.

Cross Disciplinary teams Medicinal chemists Information Scientists Patent Attorneys Cheminformatics People who have experience of Chemical and Patent databases People who have skills in patents, chemical nomenclature People who have experience of the structure and nature of documents

Cross Disciplinary Teams Advocates within various Drug Discovery teams, who can assist colleagues with the development and acceptance of new tools.

Use Cases 1 Integrate third-party and in-house sources of data from chemical patents Chemical structures, text, and numeric data (biological data, where available; clinical data) Incorporate new and delete existing data sources (anticipate change) Operate within third-party guidelines for acceptable use of data Easily updatable and adaptable systems Concordance between Pfizer patents and bioassay on those compounds

Patent chemistry landscape: Use Cases 3 1) Identify structures (exemplified and Markush) in the patent literature (and the RIF) that are relevant to a TA project series. Include external biological data, if available. 2) Inform potential IP overlap of a project s compounds/series against the entire patent landscape (exemplified and Markush). Identify and visualize top ranked patents and molecular feature overlap (patent ranking and feature analysis toolset). 3) Given a list of PF numbers, identify patents which claim or disclose these Pfizer compounds. Target chemistry landscape: Identify biologically relevant molecules (exemplified and Markush) that have activity against a target of interest, or class (family) of targets. Support a workflow to identify key compounds for follow-up and the most similar matches within the file. Provide visualization of property and chemical space landscape including data and assignee information (with option to include internal compounds) to facilitate analysis of competitor s activity in the target space. Patent alerts: Identify patents by similarity (or substructure) to a TA project series, or target, indication, or research area. Implement a push deliver system to automatically alert user as relevant info is published

Patent chemistry landscape Scope: Identify structures in the RIF and chemical patents that are similar to a TA project series. Include biological data, if available. Inputs: Hit structures for a project Outputs: Related structures in the RIF and patent literature with related biological data Process: Similarity and/or substructure searching against internal and external structure databases. Join the structures retrieved from the internal and external databases with other non-structure data (activity data, patent number, patent assignee, filing date, data source, etc.). Need to consider tautomers, stereochemistry, differences in fingerprints applied to similarity searching in internal versus external data sources. Presentation of the output in a format compatible with the desktop applications (e.g., PCAT, Spotfire, MoViT).

Current tools and proposed integration with Markush Get patent structures from IDs Chemistry Landscape Substructure search Similarity search Exact-match search **2. Markush patent chemistry space Patent Ranking Rank order a series of patents by feature similarity to a query structure Target Landscape Patents related to a specified target and/or query structure Pfeda *1. Exemplified IBM GVK *IBM database of 8 million structures from patents *GVK database of 3 million structures from patents **Thomson-Reuters database of 1.2 million Markush from patents Feature Analysis Detailed analysis and visualization of substructural features within patent structures compared with those for a set of query structures

Information Overload Information Overload More than 70,000 chemical patents are filed every year around the world Is there an easier way to identify those patents that have pharmaceutical relevance and eliminate the redundancy between equivalent patents?

Search Pfizer patents: Concordance between Structure IDs and the patents where they are published Scope: Given a list of PF numbers, identify patents which cover these Pfizer compounds. Inputs: A list of PF compound Ids, or PF structures Outputs: The PF compounds and patents which cover these compounds Process: Search for the presence of the PF compounds within a database of curated patent structures (e.g., IBM patent database). This is an exact structure match. Need to account for tautomers, stereoisomers when performing the search. Important in companies that have acquired other companies!!

Patent Alerts Scope: Identify patents by similarity to a TA project series, or target, indication, or research area. Inputs: A series of compounds from a TA project, or a target, indication, or research area Outputs: A formatted summary of patents relevant to the search input.

Project Applications: Chemistry Landscape A Pan-Trk project: Analysis of IP in evaluating a second series. Num of Unique Patents and Assignees Num Similarity in Bins: 90, 80, 70, 60 B Substructure and Similarity Search C Feature Analysis on Identified Patent IDs Currently investigating alternatives to substructure and similarity searching (Bayesian models) to improve identification of patents of interest.

clogp Company X Property Landscaping How do the properties of our chemical matter relate to the external environment? Pfizer 3 Company X patents have very few compounds in desired property space 75 Pfizer strategy: reduce compound lipophilicity and TPSA PSA 75 cmdr BAAB <3 >3 Filing Date

Substructure Search, Similarity and Feature Analysis Substructure and Similarity are well known concepts. Is there a need for other methods of relatedness based on connectivity? One such is Feature Analysis driven by the need to formalise the markush definitions within patents by regression to superatoms and why they cover molecules of relevance.