Truncating Molecules: Basic Techniques in Structure and Substructure Searching for Information Professionals
Judith Currano
Head, Chemistry Library, University of Pennsylvania
currano@pobox.upenn.edu
Acknowledgements and Thanks
Screen shots: Reaxys screen shots are used with permission from Elsevier and are copyright 2012 Elsevier Properties SA. All rights reserved. Reaxys® is a trademark owned and protected by Elsevier Properties SA and used under license. SciFinder and STN screen shots are used with permission from CAS and are copyright © 2012, American Chemical Society (ACS). All rights reserved.
Introduction to Structure Searching
How Does Structure Searching Work?
1. Screening: automatically generated criteria are used to create a list of potential matches (i.e., eliminating undesirable substances). FAST STEP
2. Iteration: an atom-by-atom and bond-by-bond comparison of the query structure with each structure in the list of potential matches (i.e., identifying true hits). RATE-DETERMINING STEP
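The two stages can be sketched in a few lines of Python. This is a minimal illustration, not how SciFinder, STN, or Reaxys actually implement searching: the screen here is simply an element count (a candidate must contain at least as many atoms of each element as the query), and the atom-by-atom step is a plain backtracking match that must preserve elements and bond orders. The molecule format (`make_mol`) and the three-entry database are hypothetical, with hydrogens left implicit.

```python
from collections import Counter


def make_mol(atoms, bonds):
    """Hypothetical minimal molecule: element symbols plus an adjacency map.

    atoms: list of element symbols (hydrogens implicit)
    bonds: list of (i, j, order) tuples, order 1 = single, 2 = double
    """
    adj = {i: {} for i in range(len(atoms))}
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return {"atoms": atoms, "adj": adj}


def passes_screen(query, target):
    """Stage 1 (fast): the target must contain at least as many atoms of
    each element as the query, or it cannot possibly be a hit."""
    need = Counter(query["atoms"])
    have = Counter(target["atoms"])
    return all(have[el] >= n for el, n in need.items())


def atom_by_atom(query, target):
    """Stage 2 (slow): backtracking search for a one-to-one mapping of query
    atoms onto target atoms that preserves elements and bond orders.
    Extra target atoms and bonds are allowed: the query is a template."""
    q_atoms, t_atoms = query["atoms"], target["atoms"]
    mapping = {}

    def extend(qi):
        if qi == len(q_atoms):
            return True
        used = set(mapping.values())
        for ti, element in enumerate(t_atoms):
            if ti in used or element != q_atoms[qi]:
                continue
            # Each bond to an already-mapped query atom must exist in the
            # target with the same order.
            if all(target["adj"][ti].get(mapping[qn]) == order
                   for qn, order in query["adj"][qi].items() if qn in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def substructure_search(query, database):
    candidates = [name for name, mol in database.items() if passes_screen(query, mol)]
    hits = [name for name in candidates if atom_by_atom(query, database[name])]
    return candidates, hits


QUERY = make_mol(["C", "O"], [(0, 1, 1)])  # a C-O single bond
DATABASE = {
    "ethanol": make_mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "acetaldehyde": make_mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)]),
    "propane": make_mol(["C", "C", "C"], [(0, 1, 1), (1, 2, 1)]),
}

candidates, hits = substructure_search(QUERY, DATABASE)
print("after screening:", candidates)  # ['ethanol', 'acetaldehyde']
print("true hits:", hits)              # ['ethanol']
```

Acetaldehyde survives the cheap screen because it has the right atoms, but fails the expensive comparison because its C-O bond is double; that per-structure comparison is why the second step sets the pace.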
Connection Tables
The system converts the drawn structure into a series of connected nodes and builds a table describing their connections.
[Figure: the drawn structure, with its nodes numbered 1 to 10]
Connection Tables: Redundant Connection Table
NODE  IDENTITY  BOND / OTHER ATOM
1     C         5, 6, =2
2     C         =1, 3, 7
3     C         2, =4, 8
5     O         1, 4
6     H         1
(Some segments are omitted in the interest of space. "=" denotes a double bond.)
Connection Tables: Nonredundant Connection Table
NODE  IDENTITY  BOND / OTHER ATOM
1     C         =2, 5, 6
2     C         3, 7
3     C         =4, 8
4     C         5, 10
8     O         9
This is the entire connection table. Each connection is described only once.
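The difference between the two tables can be reproduced from a plain bond list. The node numbers and bonds below are read off the two tables above; the identities of nodes 7, 9, and 10 are not given, so they are assumed to be hydrogens purely for illustration. The rule used for the nonredundant table (list each bond only under its lower-numbered atom) is an assumption that happens to reproduce the slide's table, not necessarily what any particular system does.

```python
# Element of each node; nodes 7, 9, and 10 are ASSUMED to be H here.
ATOMS = {1: "C", 2: "C", 3: "C", 4: "C", 5: "O", 6: "H",
         7: "H", 8: "O", 9: "H", 10: "H"}
# Bonds as (node, node, order); order 2 = double bond.
BONDS = [(1, 2, 2), (1, 5, 1), (1, 6, 1), (2, 3, 1), (2, 7, 1),
         (3, 4, 2), (3, 8, 1), (4, 5, 1), (4, 10, 1), (8, 9, 1)]


def label(other, order):
    """Write a connection the way the slide does: '=' marks a double bond."""
    return ("=" if order == 2 else "") + str(other)


def redundant_table(bonds):
    """Every bond is listed twice, once under each of its two atoms."""
    table = {}
    for a, b, order in bonds:
        table.setdefault(a, []).append((b, order))
        table.setdefault(b, []).append((a, order))
    return {node: [label(o, k) for o, k in sorted(conns)]
            for node, conns in sorted(table.items())}


def nonredundant_table(bonds):
    """Every bond is listed once, under its lower-numbered atom."""
    table = {}
    for a, b, order in bonds:
        low, high = min(a, b), max(a, b)
        table.setdefault(low, []).append((high, order))
    return {node: [label(o, k) for o, k in sorted(conns)]
            for node, conns in sorted(table.items())}


for node, conns in nonredundant_table(BONDS).items():
    print(node, ATOMS[node], ", ".join(conns))
```

The redundant form makes it cheap to look up every neighbor of an atom; the nonredundant form stores each of the ten bonds exactly once, which is why the whole table fits on one slide.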
Substructures in Theory: Basic Substructure Techniques
A Truncation Search for Molecules?
Truncation: entering a word stem to search for all words that begin with that stem.
The stem CHEMI yields: CHEMIST, CHEMISTS, CHEMISTRY, CHEMICAL, CHEMICALS, CHEMINFORMATICS
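Right truncation of the CHEMI* kind amounts to a prefix test on each word. A minimal sketch; the two extra words in the vocabulary (ALCHEMIST, CHEMOTHERAPY) are hypothetical additions, included only to show what the stem does not retrieve.

```python
def right_truncate(stem, vocabulary):
    """Right truncation (STEM*): keep every word that begins with the stem."""
    stem = stem.upper()
    return [word for word in vocabulary if word.upper().startswith(stem)]


VOCABULARY = ["CHEMIST", "CHEMISTS", "CHEMISTRY", "CHEMICAL", "CHEMICALS",
              "CHEMINFORMATICS", "ALCHEMIST", "CHEMOTHERAPY"]

print(right_truncate("CHEMI", VOCABULARY))
```

ALCHEMIST contains the stem but does not begin with it, and CHEMOTHERAPY diverges at the fifth letter, so neither is retrieved.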
A Truncation Search for Molecules?
Substructure: entering a structural fragment to search for all structures that contain that fragment.
A substructure search for this core yields:
[Figure: the query core and several example structures that contain it]
When Are Substructure Searches Helpful?
- You know how to make a particular core, and you want to find interesting molecules that have that core.
- You are looking for molecules with similar function or chemical behavior to yours.
- You want to find methods of effecting a transformation.
- You wish to find information about several similar substances in a single search.
Selected Resources that Offer Substructure Searching
(Resource name, type of resource, and substructure requirements)

CAS Registry/CASREACT via SciFinder
  Type of resource: Databases of chemical substances with properties and reactions, with conditions. Links seamlessly to Chemical Abstracts, which indexes the literature of 10K+ journals and other documents.
  Substructure requirements: Available through the substructure module. Allows substructure substance and reaction searches, with variables and limited R-group capabilities.

CAS Registry/CASREACT via STN
  Type of resource: Databases of chemical substances with properties and reactions, with conditions. Links to Chemical Abstracts, which indexes the literature of 10K+ journals and other documents.
  Substructure requirements: Requires substructure plug-in, which is freely available on the Web. Allows variables, user-defined G-groups, screens, text-based substructure tools, and other advanced features.

Reaxys
  Type of resource: Handbook. Includes properties, reactions, and references to the literature coming from the Beilstein organic, Gmelin inorganic, and Elsevier Patent Chemistry databases.
  Substructure requirements: Allows variables, user-defined R-groups, and other advanced features.

Science of Synthesis
  Type of resource: Overview/summary of the organic chemistry literature. Includes text and references to the literature.
  Substructure requirements: Choose from Java applet or free ChemDraw or ISISDraw plug-ins. Basic substructure features and limited R-group capabilities.

e-EROS
  Type of resource: Encyclopedia of reagents. Includes information about the reagents and a few references to the literature.
  Substructure requirements: Choose from Java applet or free ChemDraw or ISISDraw plug-ins. Basic substructure features.

Cambridge Structural Database
  Type of resource: Database of crystal structures of small organic and organometallic substances.
  Substructure requirements: Basic substructure features; substructure is the default search. Allows user to specify total connectivity of an atom without drawing bonds.

NIST Chemistry WebBook
  Type of resource: Online handbook. Includes thermodynamic and physical properties, as well as some spectra.
  Substructure requirements: Requires free plug-in. Basic substructure features.

Aldrich/Sigma/Fluka Combined Catalog
  Type of resource: Online catalog. Includes a few physical properties, pricing information, spectra, and material safety data sheets.
  Substructure requirements: Requires free plug-in. Basic substructure features.
The Art of Truncation
Your goal: To get EVERYTHING you want and NOTHING you don't want!
You are looking for all information available on the subject of catalysts. Where should you truncate?
cata*: catastrophe! You name it, you get it.
catal*: catalog, Catalan, etc.
cataly*: catalyst(s), catalysis, catalytic, catalyze(d), etc.
catalys*: catalyst(s), catalysis
catalyst*: catalyst, catalysts
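The trade-off between stems can be made concrete by counting wanted and unwanted retrievals against a small word list. The vocabulary and the set of "relevant" words are hypothetical, and Python's `fnmatch.fnmatchcase` stands in for a database's truncation operator; a real index would hold far more words, but the pattern of results is the one described above.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary, and the words a searcher interested in catalysts wants.
VOCABULARY = ["catastrophe", "cataract", "catalog", "catalan", "catalyst",
              "catalysts", "catalysis", "catalytic", "catalyze", "catalyzed"]
RELEVANT = {"catalyst", "catalysts", "catalysis", "catalytic", "catalyze", "catalyzed"}


def score(pattern):
    """Return (wanted words retrieved, unwanted words retrieved) for a pattern."""
    retrieved = {w for w in VOCABULARY if fnmatchcase(w, pattern)}
    return len(retrieved & RELEVANT), len(retrieved - RELEVANT)


for pattern in ["cata*", "catal*", "cataly*", "catalys*", "catalyst*"]:
    wanted, unwanted = score(pattern)
    print(f"{pattern:<10} wanted={wanted}/{len(RELEVANT)} unwanted={unwanted}")
```

Only cataly* retrieves every relevant word with no noise: shorter stems add unwanted words, and longer stems start dropping wanted ones.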
The Art of Substructure Design
Your goal: To get EVERYTHING you want and VERY LITTLE that you don't want!
A very specific query: only about 50 commercially available hits in SciFinder! (like catalyst*)
A very general query: even limiting to commercially available, organic, single-component systems with references, you get just under 18,000 hits in SciFinder! (like cata*)
THE CHALLENGE IS TO FIND A HAPPY MEDIUM!
The Art of Substructure Design
Things to remember:
- TAKE NOTHING FOR GRANTED.
- A substructure is a TEMPLATE: anything goes unless you forbid it.
- Gauge the scope of your search carefully. Too general, and you'll be wading through hits until Kingdom Come; too specific, and you'll only get the substance with which you started.
4 Steps to a Good Substructure
1. Draw the section of interest.
2. Determine the amount and type of substitution that can occur at each open site.
3. Determine the bond order and stereochemistry of each bond.
4. Determine the topology of each bond and each possible attachment.
Drawing the Section of Interest
[Figure: structure of the natural product curcumol]
You are interested in finding substances that resemble the natural product curcumol, particularly its core ring system. A search on this core will yield good results, but you want to make sure you aren't missing anything, so you cast your net further afield: you are interested in all systems containing the 5-6 ring system highlighted in red, in which the heteroatom can be O, S, or Se.
Drawing the Section of Interest
You now have the following framework, but would like to vary the atom at the position marked "?". We need to set an R group (sometimes called a G group)! R groups let you perform a Boolean OR search for multiple cores simultaneously by searching for one of several designated substituents at a given position.
R Group Basics
Suppose you wish to find a benzene ring that has been substituted by one of these four halogens.
Method 1: Use a predefined R group or variable.
Common Predefined R Groups
A   Any atom except hydrogen
Q   Any atom except carbon or hydrogen
M   Any metal
X   Any halogen
Other predefined R groups can help you search for hydrocarbon chains, cycles, etc.
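Predefined R groups behave like tests applied to whatever atom sits at the variable position. A sketch under simplifying assumptions: the metal set is deliberately abbreviated (a real editor covers the whole periodic table), and each hypothetical "molecule" is reduced to the element of the atom attached to the benzene ring.

```python
HALOGENS = {"F", "Cl", "Br", "I", "At"}
# Abbreviated, illustrative list only; not a complete definition of "metal".
METALS = {"Li", "Na", "K", "Mg", "Ca", "Al", "Fe", "Co", "Ni", "Cu",
          "Zn", "Pd", "Ag", "Sn", "Pt", "Au", "Hg"}

PREDEFINED = {
    "A": lambda el: el != "H",             # any atom except hydrogen
    "Q": lambda el: el not in {"C", "H"},  # any atom except carbon or hydrogen
    "M": lambda el: el in METALS,          # any metal
    "X": lambda el: el in HALOGENS,        # any halogen
}


def matches(variable, element):
    """True if an atom of this element satisfies the predefined variable."""
    return PREDEFINED[variable](element)


# Hypothetical substituted benzenes, keyed by the atom bonded to the ring.
SUBSTITUENT_ATOM = {"fluorobenzene": "F", "chlorobenzene": "Cl",
                    "bromobenzene": "Br", "iodobenzene": "I",
                    "toluene": "C", "phenol": "O"}

halobenzenes = [name for name, el in SUBSTITUENT_ATOM.items() if matches("X", el)]
print(halobenzenes)
```

One query drawn with X stands in for four separate searches, which is the Boolean OR behavior described above.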
Designing R Groups
Method 2: Design Your Own R Group
You would like to have O, S, or Se at the point marked "?". (The order presented here may differ from the order required by your structure editor.)
1. Insert a placeholder into the molecule (parent) at the site at which the R group should appear.
2. Define the atoms, fragments, or variables that may be part of the R group.
3. For fragment-based groups, indicate each atom in each fragment that attaches to an atom in the parent.
Designing R Groups
Insert a placeholder into the molecule (parent) at the site at which the R group should appear. You would like to have O, S, or Se at the point marked "?".
Define the atoms, fragments, or variables that may be part of the R group.
Method 1: Use an atom list, where L is defined as O, S, Se.
Method 2: Use a parent and fragments.
Judith's First Law: R groups should always have two or more substituents; otherwise, just draw the desired substituent as part of the parent!
Designing R Groups
Indicate each atom in each fragment that bonds to an atom in the parent. You would like to have O, S, or Se at the point marked "?".
[Figure: the O, S, and Se fragments, each with attachment points labeled 1 and 2]
The method of setting attachment points will vary from structure editor to structure editor. In some cases, an example such as this will not require attachment points, since it is obvious which atoms attach to the parent.
Judith's Second Law: The number of attachment points in a fragment must equal the number of bonds between the placeholder and the rest of the molecule.
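The three steps for a custom R group, together with both of Judith's Laws, can be expressed as checks that run before a query is expanded into the concrete queries it stands for (the Boolean OR). The `Fragment` class, the five-membered parent ring, and `expand_r_group` are hypothetical; real structure editors store R groups in their own formats and may enforce different rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    symbol: str             # atom or fragment that can replace the placeholder
    attachment_points: int  # bonds the fragment makes to the parent


def expand_r_group(parent_atoms, parent_bonds, fragments, placeholder="R"):
    """Check an R-group definition, then expand the query into the concrete
    queries it stands for (one per member, i.e. a Boolean OR)."""
    r_index = parent_atoms.index(placeholder)
    r_bond_count = sum(1 for a, b in parent_bonds if r_index in (a, b))

    # Judith's First Law: a one-member R group should just be drawn in.
    if len(fragments) < 2:
        raise ValueError("an R group needs two or more members")

    # Judith's Second Law: attachment points must equal the placeholder's bonds.
    for frag in fragments:
        if frag.attachment_points != r_bond_count:
            raise ValueError(
                f"{frag.symbol} has {frag.attachment_points} attachment point(s); "
                f"the placeholder has {r_bond_count} bond(s)")

    expanded = []
    for frag in fragments:
        atoms = list(parent_atoms)
        atoms[r_index] = frag.symbol
        expanded.append(atoms)
    return expanded


# Hypothetical parent: a five-membered ring with the placeholder at position 0.
PARENT_ATOMS = ["R", "C", "C", "C", "C"]
PARENT_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
R_GROUP = [Fragment("O", 2), Fragment("S", 2), Fragment("Se", 2)]

for query in expand_r_group(PARENT_ATOMS, PARENT_BONDS, R_GROUP):
    print(query)
```

Because the placeholder sits in a ring, it has two bonds to the parent, so each single-atom fragment needs two attachment points; a fragment defined with only one would be rejected.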
Determining the Amount and Type of Substitution at Each Open Site
A substructure is a template, so the system assumes substitution is allowed wherever valence is not filled.
What kind of substitution can be at each free site?
1. Hydrogen only: draw it in or use a lock-out tool.
2. Anything: do nothing; it's already a template.
3. Specific substituents or types of substituent: use a variable or R group.
In our structure, we want only H at this position. We will allow any kind of substitution everywhere else.
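The template rule can be sketched as a comparison of neighbor counts. In this simplified model, `open_sites` computes unfilled valence from a standard-valence table (assumed values; charges, radicals, and expanded valences such as S(IV) or S(VI) are ignored), and a locked atom must have exactly as many non-hydrogen neighbors in the hit as in the query. The function names are hypothetical.

```python
# Standard valences assumed for this sketch only.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}


def open_sites(element, drawn_bond_orders):
    """Unfilled valence: where the system assumes substitution is allowed."""
    return VALENCE[element] - sum(drawn_bond_orders)


def atom_matches(query_heavy_neighbors, locked, target_heavy_neighbors):
    """Compare counts of non-hydrogen neighbors.

    locked=False is the default template: extra substituents are allowed.
    locked=True means hydrogen only at the open sites, so the hit atom may
    have no neighbors beyond those drawn in the query.
    """
    if locked:
        return target_heavy_neighbors == query_heavy_neighbors
    return target_heavy_neighbors >= query_heavy_neighbors


# A ring carbon drawn with two single ring bonds has two open sites.
print(open_sites("C", [1, 1]))                               # 2
# Locked: a CH2 in the hit matches; a CH bearing a methyl does not.
print(atom_matches(2, True, 2), atom_matches(2, True, 3))    # True False
# Open (the default): both match.
print(atom_matches(2, False, 2), atom_matches(2, False, 3))  # True True
```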
Determining the Bond Order and Stereochemistry of Each Bond
Bond order is usually fixed unless specified otherwise. Unless wedge, hash, and double steric bonds are used, stereochemistry and geometry are usually also variable. Always ask whether the bond order and stereochemistry are fixed or flexible.
In our structure:
- This bond can be either single or double. Other bonds are as drawn.
- Either stereoconfiguration is acceptable here. Other bonds can have any stereochemistry or geometry, as well.
Warning: In many systems, stereochemistry may not be used with R groups.
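A query bond can be modeled as a set of allowed orders plus an optional stereo requirement, where leaving stereo unspecified means any configuration matches. The class names and the stereo labels (`"wedge"`, `"hash"`) are hypothetical stand-ins for however a given structure editor records these settings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class QueryBond:
    orders: FrozenSet[int]        # allowed bond orders; {1, 2} = single or double
    stereo: Optional[str] = None  # None: any stereochemistry or geometry


@dataclass(frozen=True)
class TargetBond:
    order: int
    stereo: Optional[str] = None


def bond_matches(query, target):
    if target.order not in query.orders:
        return False
    # An unspecified stereo requirement accepts every configuration.
    return query.stereo is None or query.stereo == target.stereo


AS_DRAWN_SINGLE = QueryBond(frozenset({1}))
SINGLE_OR_DOUBLE = QueryBond(frozenset({1, 2}))
WEDGED_SINGLE = QueryBond(frozenset({1}), stereo="wedge")

print(bond_matches(SINGLE_OR_DOUBLE, TargetBond(2)))         # True
print(bond_matches(AS_DRAWN_SINGLE, TargetBond(2)))          # False
print(bond_matches(AS_DRAWN_SINGLE, TargetBond(1, "hash")))  # True
print(bond_matches(WEDGED_SINGLE, TargetBond(1, "hash")))    # False
```

The third line shows the default described above: a plain single bond in the query places no constraint on stereochemistry, so a hashed bond in the hit still matches.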
Determining Topology
Setting topology varies greatly from resource to resource. You can be sure of only one thing: if it looks like a ring, it's a ring!
Topology Questions to Ask
1. Rings: Are they isolated, or can they be part of larger systems?
2. Chains: Must they be chains, or could they be part of rings?
3. Attachments: Are there any topological restrictions on specific attachments that are not drawn?
In this case, we don't have any topological restrictions in our structure.
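Whether a bond is a ring bond or a chain bond is a property of the whole structure: a bond lies in a ring exactly when its two atoms stay connected after that bond is removed. A minimal sketch of that test and of a ring/chain/any topology constraint, using methylcyclopentane (hydrogens omitted) as a hypothetical example; the function names are illustrative only.

```python
from collections import deque


def connected_without(adj, start, goal, removed):
    """Breadth-first search from start to goal that never crosses the removed bond."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node]:
            if {node, nxt} == removed or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False


def bond_topology(bonds):
    """Label each bond 'ring' if its atoms stay connected without it, else 'chain'."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {(a, b): ("ring" if connected_without(adj, a, b, {a, b}) else "chain")
            for a, b in bonds}


def topology_ok(required, actual):
    """required is 'ring', 'chain', or 'any'."""
    return required == "any" or required == actual


# Methylcyclopentane: atoms 0-4 form the ring, atom 5 is the methyl carbon.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 5)]
TOPOLOGY = bond_topology(BONDS)
print(TOPOLOGY)
```

An attachment set to "chain only" accepts the methyl bond but rejects any ring bond, which is how such a setting keeps fused systems out of a result set.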
The Finished Substructure
- Set an R group where R = O, S, Se.
- This site may not be further substituted, but all other sites may.
- This bond can be either single or double.
- Any stereoconfiguration is possible for these bonds.
- There are no restrictions on topology.
Substructures in Practice: Queries Requiring Substructure Searches
A Sample Substructure Reference Interview
I am looking for thermodynamic properties of molecules that look like this. The carbon between the heteroatoms may not be substituted, and the other carbons may only have alkyl attachments or hydrogen. The N has to be there, but I guess it would be OK if we had O instead of S. There can be either a single or double bond connecting the N and the adjacent C, but I'd prefer if it were double. I think that it needs to be an isolated ring. A fused ring system will be too different from the substances that interest me.
Building the Substructure
I am looking for thermodynamic properties of molecules that look like this.
- The carbon between the heteroatoms may not be substituted.
- Those carbons can have only alkyl attachments or hydrogen: R1 = ALK, H.
- It would be OK if we had O instead of S at this position: R2 = S, O.
- This bond could be single or double, but I'd prefer double.
- A fused ring system would be too different from the substances that interest me: set the topology of attachments to chain only.
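The finished query can be read as a conjunction of the requester's constraints, with the stated preference for a double bond handled as a ranking step after retrieval rather than as a restriction. Everything here is a hypothetical, simplified encoding: each candidate is described only by the features the requester mentioned, the candidate names are placeholders, and `ALKYL` is merely a stand-in for the ALK variable.

```python
from dataclasses import dataclass
from typing import List

ALKYL = {"CH3", "C2H5", "C3H7"}  # hypothetical stand-in for the ALK variable


@dataclass
class Candidate:
    name: str
    between_carbon_subs: List[str]  # substituents on the carbon between the heteroatoms
    other_carbon_subs: List[str]    # substituents on the other ring carbons (H omitted)
    heteroatom: str                 # the atom drawn as S
    n_c_bond_order: int             # order of the N-C bond the requester asked about
    fused: bool                     # True if the ring is part of a fused system


def satisfies_query(c):
    return (not c.between_carbon_subs                         # locked: H only
            and all(s in ALKYL for s in c.other_carbon_subs)  # R1 = ALK or H
            and c.heteroatom in {"S", "O"}                    # R2 = S or O
            and c.n_c_bond_order in {1, 2}                    # single or double
            and not c.fused)                                  # chain-only attachments


def rank(hits):
    """Retrieve both bond orders, then list the preferred double bonds first."""
    return sorted(hits, key=lambda c: c.n_c_bond_order != 2)


CANDIDATES = [
    Candidate("D", [], [], "O", 1, False),
    Candidate("B", ["CH3"], [], "S", 2, False),
    Candidate("A", [], ["CH3"], "S", 2, False),
    Candidate("C", [], ["OH"], "S", 2, False),
    Candidate("E", [], [], "Se", 2, False),
    Candidate("F", [], [], "S", 2, True),
]

hits = rank([c for c in CANDIDATES if satisfies_query(c)])
print([c.name for c in hits])  # ['A', 'D']
```

Treating "I'd prefer double" as a sort key keeps the single-bonded hit D in the answer set, as the requester allowed, while still presenting the preferred structure first.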
Thanks for Your Attention! Any Questions?