Comprehensive DWPI SM structure searching using DCR and DWPIM on STN


Brian Larner, IP & Science, Thomson Reuters
Robert Austin, Senior STN Trainer, FIZ Karlsruhe
19 May 2016

AGENDA
- Introduction to DWPI chemical structure indexing
- DCR: the Derwent Chemistry Resource
  - What is it?
  - Search example
- The Derwent Markush Resource (DWPIM)
  - What is it?
  - Structures indexed
  - Search example
- Advanced topics
  - Substance descriptors
  - Roles
  - Polymers, inorganics, phthalocyanines and metallocenes

WHAT IS CHEMICAL INDEXING?
- Markush structural indexing for generic structures
  - Created for all Markush structures that meet the criteria for being indexed
  - Also created to cover generic disclosures described only in words (e.g. a cleaning solution containing a 2-8C alcohol)
- DCR indexing for specific compounds
  - Created for any specific compounds mentioned in the patent
  - Some compounds may be covered in a Markush structure if system limits are exceeded
- Fragmentation coding is auto-generated from the above

DWPI structure databases on STN
- DWPIM: > 1.9 M structures
- DCR: > 2.5 M structures
- DWPI: > 3.2 M patents
- Crossover between the structure files and DWPI uses SUBX and REFX
- Each structure has a unique Markush Compound Number (MCN) or DCR number (DCR), which is used as the basis of the cross-file search.
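The cross-file idea can be pictured as a join on the structure identifier. The Python sketch below is only an illustration: the record layout, the identifier values and the function name `crossover_to_patents` are invented for the example, and it does not reproduce how STN implements REFX or SUBX.

```python
from typing import Dict, List, Set


def crossover_to_patents(hit_numbers: Set[str],
                         patent_index: Dict[str, List[str]]) -> Set[str]:
    """Return the patents whose indexing cites any of the hit structure numbers.

    hit_numbers  -- MCN or DCR numbers retrieved by a structure search
    patent_index -- hypothetical mapping: patent id -> structure numbers it cites
    """
    return {patent for patent, numbers in patent_index.items()
            if hit_numbers.intersection(numbers)}


# Invented identifiers, for illustration only.
patent_index = {
    "PATENT-A": ["MCN-0001", "DCR-0042"],
    "PATENT-B": ["DCR-0007"],
    "PATENT-C": ["MCN-0001"],
}
structure_hits = {"MCN-0001"}
print(sorted(crossover_to_patents(structure_hits, patent_index)))
# ['PATENT-A', 'PATENT-C']
```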

CHEMICAL STRUCTURE INDEXING IN DWPI - CRITERIA
To receive structural indexing, a DWPI record must meet the following criteria:
- Classified in Sections B, C and/or E
- From a major country*
* See list on next slide

DWPI COUNTRY COVERAGE FOR CHEMICAL INDEXING

WHAT IS INDEXED
- Compounds claimed to be new
- Compounds produced by a new process
- Compounds having a new use
- Components of compositions
- Novel catalysts and known specific catalysts
- Specific reagents and starting materials in production processes (DCR only)
- Materials detected, detecting agents, detection media
- Materials recovered or purified in new ways
- Materials removed and removing agents

DERWENT CHEMISTRY RESOURCE (DCR)
- A database of specific chemical substances mentioned in patents
- Substances are also organised into families of closely related compounds: the basic compound together with its salts, isotopes, mixtures and isomers
- Substance records include structure diagrams and substance data, e.g. IUPAC name, synonyms, molecular formula, molecular weight

DERWENT CHEMISTRY RESOURCE (DCR)
- DCR numbers are associated with the relevant fragmentation codes for the substance, so they can be searched in conjunction with non-structural fragmentation codes if desired
- DCR numbers also have roles associated with them (e.g. produced, detected), so you can limit your answers by the role of the compound

BENEFITS OF DWPI INDEXING - REAL EXAMPLE
- Search on Diclofenac or its most common synonyms (Voltarol or Voltaren) using keywords in the DWPI title and abstract: 3530 documents found
- Search on Diclofenac via its DCR record: 3105 records found
  - 447 of these were not found by the keyword search
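The three figures on this slide determine the overlap between the two searches. A short Python calculation makes that explicit; it assumes that both searches count the same kind of DWPI record, which the slide does not state outright.

```python
keyword_hits = 3530   # documents found by title/abstract keywords
dcr_hits = 3105       # records found via the DCR record
dcr_only = 447        # DCR hits that the keyword search missed

overlap = dcr_hits - dcr_only           # found by both searches
keyword_only = keyword_hits - overlap   # found only by keywords
combined = keyword_hits + dcr_only      # union of the two answer sets

print(overlap, keyword_only, combined)  # 2658 872 3977
```

On these numbers, running both searches yields 3977 records, about 13% more than the keyword search alone.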

SOME INVENTIONS FOUND ONLY BY THE KEYWORD SEARCH ARE LESS RELEVANT

BUT THE ONES FOUND ONLY BY DCR ARE HIGHLY RELEVANT

DCR COVERAGE
- DCR records are only created for patents classified in at least one of the following CPI sections:
  - B (Pharmaceuticals)
  - C (Agrochemicals)
  - E (General Chemistry)
- In addition, existing DCR records are cited when the substances they relate to are mentioned in the DWPI abstracts of patents classified in Sections D, F, G, J and K
- DCR numbers are auto-generated from the specific compound codes in polymer indexing and added to the indexing
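The section rules above amount to a small decision procedure. The following Python sketch encodes them; the function name `dcr_treatment` and its return labels are hypothetical, chosen only for this illustration.

```python
CREATES_DCR = {"B", "C", "E"}                    # new DCR records are created
CITES_EXISTING_DCR = {"D", "F", "G", "J", "K"}   # existing records may be cited


def dcr_treatment(sections):
    """Classify a patent by its CPI sections (hypothetical helper).

    Returns "create" if DCR records are created for the patent,
    "cite-existing" if existing DCR records can be cited (only when the
    substances are mentioned in the DWPI abstract), otherwise "none".
    """
    sections = {s.upper() for s in sections}
    if sections & CREATES_DCR:
        return "create"
    if sections & CITES_EXISTING_DCR:
        return "cite-existing"
    return "none"


print(dcr_treatment(["D", "E"]))  # create
print(dcr_treatment(["F"]))       # cite-existing
```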

DCR COVERAGE BY COMPOUND TYPE
- Ordinary organic compounds (e.g. ethanol, ibuprofen)
- Inorganic compounds (e.g. sodium chloride, ammonia)
- Complexes and organometallics (e.g. ferrocene, copper phthalocyanine, diethyl magnesium bromide)
- Peptides with 10 or fewer amino acids
- Proteins and other natural polymers with well-defined names*
- Synthetic polymers from a standard list of around 340 commonly occurring ones
- Plant, animal and microbial extracts*
* These records do not contain structures

WHAT IS NOT COVERED
- Generic classes of compounds
  - These are covered by other forms of chemical structure indexing in DWPI, e.g. fragmentation coding
- Synthetic polymers other than the ones in the predefined list of around 340
  - These are covered by polymer indexing
- Any compound of ambiguous structure
  - This could be those with ill-defined ratios of ions or components
  - Or ones with ambiguous names where we cannot be sure of the correct structure

New STN workflow is oriented around projects
To create a project, click the icon. Projects allow you to:
- Easily return to previous work
- Reuse common queries
- Update searches with the most current information

The new STN interface puts query, history and results at your fingertips
- Structure Editor
- Query Builder panel
- History panel
- Results panel

Prepare structure queries using the structure editor
- Click OK to add the query to the structures tab of the history panel.

Search the structure query and review structures
- Automatic Cross File Search is set ON.
- Click on any structure to enlarge (zoom).

Crossover with REFX and review hit structures in DWPI
- Use the REFX operator to retrieve corresponding DWPI references (L2).
- Hit structures with hit highlighting are included in DWPI full view.
- The structure search (L1) is combined with technology terms in DWPI (L2).

DERWENT MARKUSH RESOURCE ON STN
- Approximately 1.9 million structures from around 780,000 patents
- Covers 33 patent issuing authorities (as basic patent country)
- Can be searched in conjunction with DCR, MARPAT and CAS REGISTRY on STN
  - In most cases using the same structure query
  - Gives the most comprehensive chemical structure search possible

TYPES OF STRUCTURES INDEXED
- Non-polymeric organic molecules
- Organometallic compounds
- Inorganic structures
  - Simple inorganic molecules
  - Extended structures such as clays, zeolites and heteropolyacids
  - Partially defined structures
- Polymeric structures
  - Only for pharmaceutical and agrochemical patents
  - Includes peptides as well as synthetic polymers

MARKUSH COVERAGE IN OLDER PATENTS
- Prior to the introduction of DCR in 1999/2000 the policy was different
- Both specific and generic structures were covered by Markush structures, often as part of the same structure
- Some commonly occurring compounds were indexed using Derwent Compound Numbers
  - It was the analyst's choice whether to use these or combine them into a Markush
  - These have now been converted to DCR records and can be found by a DCR search
  - But they are still included in the Derwent Markush Resource

ORGANIC MOLECULES IN THE DERWENT MARKUSH RESOURCE
- Generally speaking, they are indexed as shown in the patent
- Counter ions are sometimes ignored (but not in the example below)
[Figure: the structure as drawn in the patent alongside the Derwent Markush Resource version]

WHY THE CORE STRUCTURE CAN DIFFER FROM THE ONE DRAWN IN THE PATENT
- Indexing conventions
  - Keto-enol tautomerism (keto form is the preferred one)
  - Amidine normalisation (amidine/guanidine groups have normalised bonds, not single and double bonds)
- Use of DWPI Markush terminology and shortcuts
  - Use of Superatom terms (CHK, ARY etc.) and shortcuts (CO2, SO3 etc.)
- Allowing for variable attachments
  - Replace all the parts of the structure where the attachment can be made by a variable group
- Allowing for exceptions mentioned in the patent
  - For example, where at least one of R1 and R2 is not H
- Allowing for system limits
  - This means that sometimes one structure is split into 2 or more

SUPERATOMS AND THEIR MEANING (ORGANIC)
Superatom | STN query node | Definition
CHK | Ak | Fully saturated alkyl chain
CHE | Ak | Carbon chain containing at least one double bond (no triple bonds)
CHY | Ak | Carbon chain containing at least one triple bond (optionally with double bonds)
CYC | Cb | Non-aromatic carbocyclic ring
ARY | Cb | Carbocyclic ring system containing at least one benzene ring or quinoid variant
HEA | Hy | 5 membered ring with 2 double bonds or 6 membered ring with 3 double bonds
HET | Hy | Any mononuclear heterocyclic ring other than HEA
HEF | Hy | Fused heterocyclic ring system
See also: DWPIM Reference Manual, Table 3, Page 18.

SUPERATOMS AND THEIR MEANING (INORGANIC OR NON-SPECIFIC)
Superatom | STN query node | Definition
HAL | X | Halogen excluding At
AMX | M | Alkali(ne earth) metal
A35 | M | Group 3 to 5 metal
TRM | M | Transition metal
LAN | M | Lanthanide (excluding Lanthanum)
ACT | M | Actinide or other trans-uranic metal
MX | M | Unspecified metal
XX | (none listed) | Unspecified group but not hydrogen, mostly used for unspecified substituent groups
UNK | (none listed) | Unspecified group (no longer used but may be present in some older structures)
See also: DWPIM Reference Manual, Tables 4-6, Pages 19-20.

SUPERATOMS USED FOR DISPLAY ONLY
Superatom | Definition
ACY | Acyl group (derived from any organic acid, not just carboxylic)
DYE | Undefined dye chromophore
PRT | Protecting group
PEG | Polymer end group
POL | Polymer group
Please note: Derwent Superatoms will become directly searchable in a subsequent release of the Derwent Markush Resource on New STN.
See also: DWPIM Reference Manual, Table 6, Page 20.

ATTRIBUTES
Attributes can be applied to Superatoms to restrict the scope of the group they describe.
- For carbon chain Superatoms (CHK, CHE, CHY):
  - Chain length: LOW (1-6C), MID (7-10C) and HI (>10C)
  - Chain structure: STR (straight) and BRA (branched)
- For ring Superatoms:
  - Type of ring system: MON (monocyclic) and FU (fused)
  - Degree of saturation: SAT (saturated) and UNS (unsaturated)
  - MON and FU are not applied to HEA
  - SAT and UNS are not applied to HEA and ARY
See also: DWPIM Reference Manual, Table 18, Page 62.
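The attribute rules can be written down as a small lookup. This Python sketch is an illustration only: the function names are hypothetical, and treating CYC, ARY, HEA, HET and HEF as the full set of ring Superatoms is an assumption taken from the organic Superatom table rather than from this slide.

```python
CHAIN_SUPERATOMS = {"CHK", "CHE", "CHY"}
RING_SUPERATOMS = {"CYC", "ARY", "HEA", "HET", "HEF"}  # assumed from the organic table


def chain_length_attribute(carbons: int) -> str:
    """Map a carbon count to the chain length attribute (LOW, MID or HI)."""
    if carbons < 1:
        raise ValueError("a carbon chain needs at least one carbon")
    if carbons <= 6:
        return "LOW"
    if carbons <= 10:
        return "MID"
    return "HI"


def allowed_attributes(superatom: str) -> set:
    """Return the attributes that may be applied to a Superatom."""
    if superatom in CHAIN_SUPERATOMS:
        return {"LOW", "MID", "HI", "STR", "BRA"}
    if superatom in RING_SUPERATOMS:
        allowed = {"MON", "FU", "SAT", "UNS"}
        if superatom == "HEA":
            allowed -= {"MON", "FU", "SAT", "UNS"}  # neither pair is applied to HEA
        elif superatom == "ARY":
            allowed -= {"SAT", "UNS"}               # saturation is not applied to ARY
        return allowed
    return set()


print(chain_length_attribute(8))           # MID
print(sorted(allowed_attributes("ARY")))   # ['FU', 'MON']
```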

Search example
[Figure: search query with four numbered nodes (1 to 4)]
- Thiophene: ML = Atom
- Carbocycle (Cb): ML = Atom Class
- Alkyl (Ak): ML = Class
- Heterocycle (Hy): ML = Atom Class
- "=" marks no further substitution on Ak (locked)
- Default settings

STN variable query nodes retrieve DWPIM generic nodes
STN variable query node (ML = Class) | DWPIM generic nodes retrieved
Ak | CHK, CHE, CHY
Cb | ARY, CYC
Hy | HEA, HET, HEF
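The correspondence on this slide is a plain one-to-many mapping, sketched here in Python. The dictionary and function names are hypothetical; only the node and Superatom codes come from the slides.

```python
DWPIM_GENERIC_NODES = {
    "Ak": ("CHK", "CHE", "CHY"),
    "Cb": ("ARY", "CYC"),
    "Hy": ("HEA", "HET", "HEF"),
}


def expand_query_node(node: str) -> tuple:
    """DWPIM generic nodes retrieved by an STN variable node at Match Level Class."""
    return DWPIM_GENERIC_NODES.get(node, ())


def query_node_for(superatom: str):
    """Inverse lookup: the STN variable node that retrieves a given Superatom."""
    for node, superatoms in DWPIM_GENERIC_NODES.items():
        if superatom in superatoms:
            return node
    return None


print(expand_query_node("Hy"))   # ('HEA', 'HET', 'HEF')
print(query_node_for("CYC"))     # Cb
```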

STN node attributes retrieve DWPIM indexed attributes
STN node attributes, e.g. Ak (ML = Class) | DWPIM retrieved attributes
alkyl (no limitation) | CHK, CHE, CHY
alkyl (low) | CHK, CHK LOW
alkyl (straight) | CHK, CHK STR

STN variable query nodes retrieve DWPIM generic nodes
[Figure: STN search query alongside typical DWPIM assembled hits, with nodes numbered 1 to 4]
- STN query nodes with Match Level Class retrieve corresponding generic and specific nodes in DWPIM.
- DWPIM attributes are also accessible, e.g. MON = monocyclic, FUS = Fused.

Prepare structure queries using the structure editor
- Cb and Hy nodes have been set to Class match. Changes from defaults are indicated with an asterisk. This has no effect on DCR.
- Right click on a node to change Attributes, e.g. Match Level.
- Block substitution with the lock atoms tool.
- Click OK to add the query to the structures tab of the history panel.

Search the structure query and review structures
- Click on a Markush compound number of interest for detailed display views (next).
- Assembled structures with hit highlighting.
- Automatic Cross File Search is set ON.
- Click on any structure to enlarge (zoom).

DWPIM detailed display: Brief view
- Unassembled DWPIM Markush base structure.
- Hit fragments are combined to form the assembled structure.
- Query relevant G-groups (G2, etc.).
- Hit fragments are highlighted.

Detailed display allows you to choose a preferred view
- Brief: unassembled hit Markush base structure with complete hit G-groups related to the query
  - Hit fragments within hit G-groups are highlighted
- Full: unassembled hit Markush base structure with all G-groups, including those not related to the query
  - Hit fragments within hit G-groups are highlighted

Crossover with REFX and review hit structures in DWPI
- Use the REFX operator to retrieve corresponding DWPI references (L2).
- The structure search (L1) is combined with terms for antiviral in DWPI (L2).

SUBSTANCE DESCRIPTORS (FILE SEGMENTS IN MMS TERMINOLOGY)
- These are assigned to all Markush structures
- You can use them to filter your results
- There are three types of substance descriptor:
  - Technology related: define the technology area the structure relates to
  - Structure related: define the type of structure the Markush describes
  - Miscellaneous: identifies a Markush that contains structures which are components of a composition
- At least one technology-related Substance Descriptor and at least one structure-related Substance Descriptor are applied to each Markush

SUBSTANCE DESCRIPTORS RELATING TO STRUCTURE
Substance descriptor | Definition
C | Co-ordination complex (includes metallocenes)
F | Any polymer not covered by P or N
L | Oligomer (precise definition depends on structure type)
M | Alloy (Section B/C patents only)
N | Natural polymer (Section B/C patents only)
P | Polypeptide (3-10 amino acids only)
V | Ordinary organic compound (not a salt)
W | Extended inorganic structures (e.g. zeolites)
Z | Organic salt (at least one ion is organic)
1 | Record derived from DCN database
7 | Simple inorganic compound
See also: DWPIM Reference Manual, Table 17, Page 56

OTHER SUBSTANCE DESCRIPTORS
Substance descriptor | Definition
A | Patent is classified in CPI Section A*
B | Patent is classified in CPI Section B and/or C
E | Patent is classified in CPI Section E
Y | Substances indexed form part of a mixture
* Patent must also have a B, C and/or E class to receive Markush indexing
See also: DWPIM Reference Manual, Table 17, Page 56
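The assignment rules for substance descriptors can be checked mechanically. The Python sketch below is a hypothetical validator, not part of STN: it assumes that A, B and E are the technology-related codes, that every code in the structure table (including 1 and 7) counts as structure related, and that Y is the miscellaneous code.

```python
TECHNOLOGY = {"A", "B", "E"}
STRUCTURE = {"C", "F", "L", "M", "N", "P", "V", "W", "Z", "1", "7"}
MISCELLANEOUS = {"Y"}


def check_descriptors(codes):
    """Return a list of problems found in a set of substance descriptor codes."""
    codes = set(codes)
    problems = []
    unknown = codes - TECHNOLOGY - STRUCTURE - MISCELLANEOUS
    if unknown:
        problems.append("unknown codes: " + ", ".join(sorted(unknown)))
    if not codes & TECHNOLOGY:
        problems.append("no technology-related descriptor")
    if not codes & STRUCTURE:
        problems.append("no structure-related descriptor")
    # Footnote to descriptor A: the patent must also have a B, C and/or E class.
    if "A" in codes and not codes & {"B", "E"}:
        problems.append("A requires B or E as well")
    return problems


print(check_descriptors("BV"))   # []
print(check_descriptors("AV"))   # ['A requires B or E as well']
```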

POLYMER OR OLIGOMER
Substance | Substance descriptors | BC definition | E definition
Oligopeptide | VP | 3 amino acids | 3 amino acids
Polypeptide | P | >=4 amino acids | >=4 amino acids
Oligosaccharide | L | 3-6 sugar units | 3-9 sugar units
Polysaccharide | N | >=7 sugar units | >=10 sugar units*
Other oligomer | L | 3-8 repeat units | 3-9 repeat units
Other polymer | F | >=9 repeat units | >=10 repeat units*
- BC definition refers to the definition used when indexing pharmaceutical and agrochemical patents (Sections B and/or C)
- E definition refers to the definition used when indexing general chemistry patents (Section E)
- If a patent is classified in Section E as well as Section B and/or Section C, the BC definitions are used
* Not indexed unless part of a dye molecule
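The thresholds in this table depend on both the substance type and the CPI sections of the patent. A minimal Python sketch of that decision follows; the function name `polymer_descriptor` and the `kind` labels are hypothetical, and the function returns only the descriptor, leaving aside the footnote that the starred E cases are indexed only as part of a dye molecule.

```python
def polymer_descriptor(kind, units, sections):
    """Return the substance descriptor for an oligomer or polymer, or None.

    kind     -- "peptide", "saccharide" or "other" (labels invented for this sketch)
    units    -- number of amino acids, sugar units or repeat units
    sections -- CPI sections of the patent, e.g. {"B", "E"}
    """
    if units < 3:
        return None  # below the smallest range given in the table
    if kind == "peptide":
        return "VP" if units == 3 else "P"  # same under BC and E definitions
    # BC definitions take precedence whenever Section B and/or C is present.
    use_bc = bool({"B", "C"} & set(sections))
    if kind == "saccharide":
        oligomer_max = 6 if use_bc else 9
        return "L" if units <= oligomer_max else "N"
    if kind == "other":
        oligomer_max = 8 if use_bc else 9
        return "L" if units <= oligomer_max else "F"
    raise ValueError("unknown kind: " + repr(kind))


print(polymer_descriptor("saccharide", 7, {"B"}))       # N
print(polymer_descriptor("saccharide", 7, {"E"}))       # L
print(polymer_descriptor("saccharide", 7, {"B", "E"}))  # N
```

The last call shows the precedence rule: a patent in both Section B and Section E is treated under the BC definitions.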

FILTER BY SUBSTANCE DESCRIPTOR

ROLES OF MARKUSH RECORDS
Role | Definition
A | Compound is analyzed or detected
C | Catalyst
D | Detecting agent
M | Component of a mixture (at least 2 components have been indexed)
N | New compound
P | Compound is produced or purified
Q | Compound defined in terms of starting materials
R | Removing or purifying agent
U | New use of compound
X | Compound is removed
See also: DWPIM Reference Manual, Table 15, Page 55.

POLYMERS
- Only for pharmaceutical (B) and agrochemical (C) patents
- Addition polymers are typically indexed based on the monomers, with Role Q assigned
- Condensation polymers are typically indexed based on the Structural Repeat Unit (SRU), with Substance Descriptor F assigned (polysiloxane example)...

INORGANIC STRUCTURES
- Salts are drawn as discrete ions, with charges added whenever they are shown or can be easily deduced
- More complex structures are indexed by listing each element present as a separate entity with zero valency
- Compounds formed entirely of non-metallic elements are mostly shown with covalent bonds, in much the same way as for organics

PHTHALOCYANINES
- These are drawn fully normalized
- The central metal atom used to be bonded to all 4 N atoms, but now (since 2000) it is disconnected

METALLOCENES
- Metallocenes are indexed with the cyclopentadienyl or other π bonded ligands shown disconnected from the metal atom
- The valency on the metal is reduced by 1 for each bond to a cyclopentadienyl ring
- For example, Ti in titanocene dichloride is shown as 2 valent (a +2 charge would be placed on the Ti atom)
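The valency rule is simple arithmetic, shown here as a Python sketch. The function name `indexed_metal_valency` is hypothetical, and the formal valencies passed in (4 for Ti in titanocene dichloride, 2 for Fe in ferrocene) are standard chemistry supplied for the example, not values taken from DWPIM.

```python
def indexed_metal_valency(formal_valency: int, cp_bonds: int) -> int:
    """Valency shown on the metal after disconnecting cyclopentadienyl rings.

    formal_valency -- valency of the metal in the intact compound
    cp_bonds       -- number of bonds from the metal to cyclopentadienyl rings
    """
    valency = formal_valency - cp_bonds
    if valency < 0:
        raise ValueError("more ring bonds than the metal's formal valency")
    return valency


print(indexed_metal_valency(4, 2))  # titanocene dichloride: 2
print(indexed_metal_valency(2, 2))  # ferrocene: 0
```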

THANK YOU!
- Customer Service: for subscriptions, pricing and renewals. http://ip-science.thomsonreuters.com/support/
- Technical Support: for access, content, searching, troubleshooting and technical issues. http://ip-science.thomsonreuters.com/techsupport
- Training: for Thomson Innovation training options. http://ip.thomsonreuters.com/training/ti/
Contact Us
- US, Canada & Latin America. Phone: +1 800 336 4474, ts.info.us@thomsonreuters.com
- Europe, Middle East and Africa. Tel: +44 (0)20 7433 4000, ts.info.emea@thomsonreuters.com
- Japan. Phone: +81 3 5218 6500, ts.info.jp@thomsonreuters.com
- Asia Pacific (Singapore office). Phone: +65 6411 6888, ts.support.asia@thomsonreuters.com

Metallocene search example
- Hint: bond values are adjusted to normalized, because cyclopentadienyl rings are indexed with normalized bonds in DWPIM.
- Click OK to add the query to the structures tab of the history panel.

Metallocene search example
- Automatic Cross File Search is set ON.

Resources
- DWPIM Reference Manual (new STN Sign In required): https://www.stn.org/help/stn/en/dwpim_manual.pdf
- Recorded Events: http://www.stn-international.com/recorded_events.html
  - Derwent Markush Resource (DWPIM) on STN
  - DWPIM vs. MMS
  - Unified Markush Search on new STN
  - Structure Searching on new STN

For more information
- CAS: help@cas.org. Support and Training: www.cas.org
- FIZ Karlsruhe: helpdesk@fiz-karlsruhe.de. Support and Training: www.stn-international.de