Cerius 2. Hypothesis & Receptor Models. Release 4.0 April 1999


Cerius2 Hypothesis & Receptor Models
Release 4.0, April 1999
Molecular Simulations Inc
Scranton Road San Diego, CA / Fax: 619/

Copyright (U.S. version of the copyright page)

This document is copyright 1999, Molecular Simulations Inc., a subsidiary of Pharmacopeia, Inc. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means or stored in a database retrieval system without the prior written permission of Molecular Simulations Inc. The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.

Restricted Rights Legend

Use, duplication, or disclosure by the Government is subject to restrictions as in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFAR or subparagraphs (c)(1) and (2) of the Commercial Computer Software Restricted Rights clause at FAR, as applicable, and any successor rules and regulations.

Trademark Acknowledgments

Catalyst, Cerius2, Discover, Insight II, and QUANTA are registered trademarks of Molecular Simulations Inc. Biograf, Biosym, Cerius, CHARMm, Open Force Field, NMRgraf, Polygraf, QMW, Quantum Mechanics Workbench, WebLab, and the Biosym, MSI, and Molecular Simulations marks are trademarks of Molecular Simulations Inc. IRIS, IRIX, and Silicon Graphics are trademarks of Silicon Graphics, Inc. AIX, Risc System/6000, and IBM are registered trademarks of International Business Machines, Inc. UNIX is a registered trademark, licensed exclusively by X/Open Company, Ltd. PostScript is a trademark of Adobe Systems, Inc. The X-Window system is a trademark of the Massachusetts Institute of Technology. NFS is a trademark of Sun Microsystems, Inc. FLEXlm is a trademark of Highland Software, Inc.

Permission to Reprint, Acknowledgments, and References

Molecular Simulations usually grants permission to republish or reprint material copyrighted by Molecular Simulations, provided that requests are first received in writing and that the required copyright credit line is used. For information published in documentation, the format is "Reprinted with permission from Document-name, Month Year, Molecular Simulations Inc., San Diego." For example:

Reprinted with permission from Cerius2 Hypothesis & Receptor Models, April 1999, Molecular Simulations Inc., San Diego.

Requests should be submitted to MSI Scientific Support, either through electronic mail to support@msi.com or in writing to:

MSI Scientific Support and Customer Service
9685 Scranton Road
San Diego, CA

To print photographs or files of computational results (figures and/or data) obtained using Molecular Simulations software, acknowledge the source in the format:

Computational results obtained using software programs from Molecular Simulations Inc.; dynamics calculations were done with the Discover program, using the CFF91 forcefield; ab initio calculations were done with the DMol program; and graphical displays were printed out from the Cerius2 molecular modeling system.

To reference a Molecular Simulations publication in another publication, no author should be specified and Molecular Simulations Inc. should be considered the publisher. For example:

Cerius2 Hypothesis and Receptor Models, April 1999. San Diego: Molecular Simulations Inc., 1999.

Contents

How to Use This Book
    Preparing to work
    How to find information
    Using other Cerius2 books
    Typographical conventions

1. Introduction to Cerius2 Hypothesis and Receptor Models
    Understanding the modules
    Before you begin
    Accessing C2 modules

PART 1: ALIGNMENT

2. Aligning Models
    Aligning models using field alignment
    Aligning models using Moments alignment
    Alignment of non-rigid models
    Multiple model alignment strategies
    RMS alignment

3. Incorporating Align Data in QSAR
    Preparing models for generating alignment similarity descriptors
    Adding alignment similarity descriptors to the QSAR Study Table
    Using the study table to analyze a consensus alignment
    Using alignment similarity descriptors for QSAR analysis
    Recalculating alignment similarity descriptors

4. Theory: Alignment
    Model Alignment Strategies and Methods
    RMS Superposition
    Electrostatic Potential Similarity
    Steric Shape Similarity
    Field Alignment (Electrostatic & Steric Similarity)
    Steric Clash (Bump) Checking
    Implementation: Gaussian Approximation of Overlap Functions

PART 2: RECEPTOR

5. Introduction to Receptor
    Accessing Receptor
    Using Receptor

6. Receptor QuickStart

7. QuickStart Tutorial: Receptor Surface Analysis (RSA)
    Before you begin
    Load Models and add them to the Study Table
    Generate a Receptor Surface Model (RSM)
    Add Receptor Surface energies to the Study Table
    Calculate a QSAR

8. Generating a Receptor Model
    Generating a receptor surface model
    Setting preferences
    Saving and restoring a receptor model
    Listing a receptor model
    Mapping property values

9. Evaluating Structures Against a Receptor Model
    Evaluating structures
    Setting preferences
    Viewing the results
    Incorporating receptor data in QSARs
    Building a QSAR using receptor data
    Using a QSAR with receptor data to estimate activity of a structure

10. Theory: Receptor Models
    Receptor surface models
    Mapping properties
    Molecule-Receptor model interactions
    RSA (Receptor Surface Analysis)

PART 3: PHARMACOPHORE

11. Introduction to Generating a Pharmacophoric Hypothesis
    Using ConFirm and HipHop

12. Generating Conformations
    Running ConFirm

13. Generating and Using Alignment Hypotheses
    Aligning common molecular features
    Setting preferences using the Preferences control panel
    Incorporating activity data into a hypothesis
    Using aligned structures to generate receptor models

PART 4: DATABASE QUERY

14. Performing a Database Query
    Available database searches
    Preparing to work with Database Query
    Methodology
    Constructing a query
    Constructing an atom query
    Constructing a bond query
    Constructing a centroid query
    Constructing a distance query
    Deleting all query features
    Submitting a query
    Submitting a Catalyst query
    Submitting a SHAPE query
    Submitting an ISIS query
    Defining a Database Domain from Prior Query Hit List
    Retrieving and browsing database hits
    Manipulating a hit list

APPENDICES

A. References

How to Use This Book

What is in this book

Cerius2 Hypothesis and Receptor Models is a guide to the Cerius2™ modules C2 Alignment, C2 Receptor, C2 DBAccess, C2 Active Site Viewer, and the Cerius2 interfaces to Catalyst ConFirm™, HipHop™, and Hypo™.

Topics in this book describe module tasks and their associated procedures, combining conceptual, procedural, and reference material in one book. Because these modules are part of the highly modular Cerius2 product line, you perform some tasks using functions located in other Cerius2 modules. Detailed descriptions of these tasks are in other Cerius2 documentation and are cross-referenced throughout this book. (See Using other Cerius2 books.)

This book is divided into four parts:

Part One, Alignment, describes tools used to superimpose molecules to satisfy various alignment conditions. These tools permit alignment of multiple molecules using least-squares fitting of specified atoms, or using field fitting, which employs the overlap similarity of molecular forcefields. This section will be of interest to users who want to superimpose molecules that they are studying to satisfy various alignment conditions.

Part Two, Receptor, describes the 3D visual environment that Receptor provides for receptor hypothesis exploration. It includes information on how to generate receptor surface models from the overlay of active compounds and how to use pseudoreceptors to evaluate new, potentially active compounds. This section will be of interest to users who want to use a 3D visual environment for receptor hypothesis exploration and for new compound evaluation.

Part Three, Pharmacophore, describes the pharmacophoric hypotheses that are generated through the interfaces to Catalyst ConFirm, HipHop, and Hypo. It includes information on how to use the interfaces to generate conformers, align structures by common chemical features, generate hypotheses, and use aligned structures to generate receptor surfaces in Receptor. This section will be of interest to users who want to generate pharmacophoric hypotheses using the common chemical features of a set of molecules.

Part Four, Database Query, describes how to use Catalyst/Info and MDL ISIS to construct queries to search databases and to retrieve, examine, and save structures from the databases that fit your criteria. This section will be of interest to users who want to construct queries to search databases and to retrieve, examine, and save the structures from the databases that meet certain criteria.

Preparing to work

You should already be familiar with

To use the software as described in this book, you should be familiar with:

- The Cerius2 Visualizer and graphical user interface
- Basic Cerius2 facilities for model manipulation

Your workstation should have

You need the following resources to use the Drug Design Workbench (DDW) software described in this book:

- A licensed copy of Cerius2
- Licensed copies of some or all of the following modules: Alignment, Receptor, DBAccess, Catalyst/Info, ConFirm, HipHop, and Hypo, along with the appropriate Cerius2 interfaces
- A home directory in which you can create subdirectories

How to find information

If you want to know about: The C2 modules available for activity prediction and their related tasks
Read: Chapter 1, Introduction to Cerius2 Hypothesis and Receptor Models

If you want to know about: Performing alignment operations
Read: Chapter 2, Aligning Models; Chapter 3, Incorporating Align Data in QSAR; Chapter 4, Theory: Alignment

If you want to know about: Performing receptor operations
Read: Chapter 5, Introduction to Receptor; Chapter 6, Receptor QuickStart; Chapter 8, Generating a Receptor Model; Chapter 9, Evaluating Structures Against a Receptor Model; Chapter 10, Theory: Receptor Models

If you want to know about: Performing pharmacophoric hypothesis generation
Read: Chapter 11, Introduction to Generating a Pharmacophoric Hypothesis; Chapter 12, Generating Conformations; Chapter 13, Generating and Using Alignment Hypotheses

If you want to know about: Performing database searches
Read: Chapter 14, Performing a Database Query

If you want to know about: Background for the topics in this book
Read: Chapter A, References

Using other Cerius2 books

You can find additional information about Cerius2 in several other books.

Cerius2 QSAR+: Describes the C2 QSAR+ module and its regression and analysis technologies integrated in a chemically aware molecular spreadsheet.

Cerius2 Diversity: Describes the C2 Diversity module, which analyzes chemical diversity to design and evaluate compound libraries and reagent sets for combinatorial chemistry.

Cerius2 Modeling Environment: Describes the integrated set of tools for session management and atomistic modeling that form the core of the Cerius2 modeling environment.

Cerius2 Builders: Discusses the specialized builder modules (that is, the Analog Builder, Crystal Builder, Surface Builder, Interface Builder, Polymer Builder, and Amorphous Builder modules) that can be added to supplement the basic model sketching capabilities provided by the Cerius2 modeling environment.

Cerius2 Simulation Tools: Discusses the Open Force Field, Force Field Editor, Charges, Minimizer, and Dynamics Simulation and Analysis modules.

Cerius2 Conformational Search and Analysis: Discusses Conformer Search, Conformer Analysis, and Field Calculation.

Cerius2 Command Scripts Guide: Shows how to capture and replay a script of Cerius2 commands, and how to enhance your command scripts with the features of the Tool Command Language (Tcl).

Cerius2 Installation and Administration Guide: Provides step-by-step instructions for installing and administering Cerius2 in your operating environment.

Typographical conventions

Unless otherwise noted in the text, this book uses the typographical conventions described below:

- Words in italic represent variables. For example:

    Pred_dependent_variable

  In this example, the name of an appropriate dependent variable replaces the value dependent_variable. (For example, if the name of the dependent variable were Activity, QSAR+ would display a value of Pred_Activity.)

- Names of Cerius2 objects and elements that appear in the user interface are presented in bold type. For example:

    Access the QSAR card deck, then choose View Equations from the Equation Viewer menu card.

- Items that you type are presented in bold type. For example:

    To name your file, enter testset.qsar in the text entry box.

- Messages that appear on your screen and excerpts of input/output files appear in courier font:

    When written to the textport, the listed data forms columns:
    Surface Coordinates
    point X Y Z mol atom charge esp hbond hydrophob

1 Introduction to Cerius2 Hypothesis and Receptor Models

This book describes a set of tools that are useful in some phases of drug design and in the search for novel active compounds.

With the Cerius2 Align tools described in this book you can perform spatial alignments of entire sets of molecules to look for common elements that may predict their usefulness as new drugs.

Alternatively, you can use the Cerius2 interface to Catalyst ConFirm and HipHop to generate pharmacophoric hypotheses. You create these by first producing conformations for a set of molecules, then using these to find and align functional groups common to all the molecules in the set. The tools available allow you to select hypotheses that do the best job of predicting activity for the set of study molecules, then use these aligned structures to generate a model of an active site in Cerius2 Receptor.

The Cerius2 Receptor tool can use models generated by ConFirm and HipHop, or initial models focusing on the active site of an enzyme or antibody can be generated by Cerius2 Receptor itself. Whatever the source of these initial models, you can then use them to predict the 3D structure of active sites, which can aid the medicinal chemist in designing novel compounds.

The Database tools described in this book allow you to store these models, and to retrieve and examine them at a later time for various structural and activity criteria that you define using the Cerius2 Database Query software. With DBAccess, you can also search ISIS/MDL, CatShape, and Catalyst databases, giving you added flexibility in how you store your hypothesis and receptor model data.

Understanding the modules

All Cerius2 modules discussed here are based on the common control and presentation platform of the Cerius2 Visualizer. The Cerius2 Visualizer is an integrated collection of tools for session management and atomistic modeling that is the core of the Cerius2 modeling environment. These tools perform functions associated with model management, 3D graphical representation, session logging, environment customization including tables and graphs, and visualization of molecular models. See the Cerius2 Modeling Environment for more detailed information on the Cerius2 Visualizer.

These are the Cerius2 software modules documented in this manual:

- Alignment (Cerius2 Align), which provides tools to superimpose molecules to satisfy various alignment conditions. These tools permit alignment of molecules using least-squares fitting with atom equivalencies specified either by automatic atom matching algorithms or by manual atom matching. In addition to rigid body superpositioning, the module provides tools for flexibly aligning one molecule over another using a fit optimizer algorithm.

- Field Fitting (Cerius2 Field Fit), a module within Alignment, which employs the overlap similarity of molecular forcefields.

- Receptor (Cerius2 Receptor), which provides a 3D visual environment for receptor hypothesis exploration. The module creates receptor surface models using information generated from the overlay of active compounds. The receptor models can be used to evaluate new compounds and to evaluate conformations and constraints on compounds in the receptor site.

- Database Query (Cerius2 DBAccess), which provides access to the search facilities of Catalyst and ISIS to search molecular databases and to retrieve, examine, and save structures that fit your search criteria. Catalyst is an MSI product that provides hypothesis generation, database searching, and activity information. The database search functions, Cat/Info and CatShape, are accessed through Cerius2 interfaces. See the Catalyst online help for more detailed information about Catalyst.

- Interfaces for Catalyst ConFirm and Catalyst HipHop, which access Catalyst applications that provide tools to generate pharmacophoric hypotheses. The hypotheses are generated by first generating conformations for a set of study molecules and then using the conformations to find and align chemically important functional groups common to the molecules in the study set. The DRUG DISCOVERY card deck includes interface menu cards to both of these applications.

Before you begin

Before you use Cerius2 for activity prediction, you must have installed on your system a properly licensed copy of the Cerius2 software containing the modules you have purchased. If you have any questions about your system setup or your software license, talk to your system administrator.

You should become familiar with the Cerius2 environment. The Cerius2 Modeling Environment introduces you to the Cerius2 user interface and to the Visualizer tools. The book also provides detailed information about standard elements of the modeling environment.

To start Cerius2

Cerius2 software runs on workstations using the UNIX operating system. If you installed Cerius2 with the default startup command, then start the software using the following command in a UNIX shell window:

    > cerius2

Note: You must be in a directory where you have write access. See your system administrator if this default command does not start the software.

Accessing C2 modules

The tools described in this book are found in the following card decks:


- DRUG DISCOVERY
- HYPOTHESIS MODELS

This section describes the card decks for the modules covered in this book. Each module is accessed by making the appropriate selection from a menu card. (Note that depending on which modules are installed, you may notice a difference between the menu cards and card decks described here and those you see displayed on your screen.) If the full set of Life Sciences modules is installed, you can see the card decks described below on the Main panel by selecting them from the popup menu in the Deck Selector.

DRUG DISCOVERY deck

The DRUG DISCOVERY card deck includes the QSAR, MODEL RECEPTOR, ALIGN MOLECULES, and QUERY DATABASE cards. The menu cards for this deck are illustrated below:

[Figure: menu cards of the DRUG DISCOVERY deck]

This deck includes tools to generate and apply QSARs, to align molecules, to generate and evaluate receptor models, and to construct molecular queries for searching databases and retrieving structures.

The QSAR menu selections are mainly described in the Cerius2 QSAR+ book. However, the Study Table is used to incorporate alignment data into QSAR. This procedure is described in Incorporating Align Data in QSAR.

The ALIGN MOLECULES selections are described in three chapters in this book: Aligning Models, Incorporating Align Data in QSAR, and Theory: Alignment.

The MODEL RECEPTOR card contains items that are discussed in: Introduction to Receptor, Receptor QuickStart, Generating a Receptor Model, Evaluating Structures Against a Receptor Model, and Theory: Receptor Models.

The QUERY DATABASE selections are described in: Performing a Database Query.

DATABASES deck

The DATABASES card deck includes the CATALYST, CATSHAPE, and MDL INTERFACE menu cards, as illustrated:

[Figure: menu cards of the DATABASES deck]

This deck includes tools to construct molecular queries for database searches and structure retrieval using either Catalyst or MDL ISIS and the databases associated with their search facilities. These tools are described in more detail in Performing a Database Query.

HYPOTHESIS MODELS deck

The HYPOTHESIS MODELS card deck includes the CONFIRM INTERFACE, HIPHOP INTERFACE, and MODEL RECEPTOR menu cards. The menu cards for this deck are illustrated below:

[Figure: menu cards of the HYPOTHESIS MODELS deck]

These cards provide interfaces to the Catalyst tools ConFirm and HipHop through the Cerius2 interface. The MODEL RECEPTOR card is included here to allow you to work easily with Receptor models within the ConFirm and HipHop software environments.

Part 1: Alignment

- Aligning Models
- Incorporating Align Data in QSAR
- Theory: Alignment

2 Aligning Models

You can perform an alignment as soon as you have two or more models loaded or built and imported into the Align Models table. With just two models you can perform an alignment immediately using either the Field Fitting or Moments alignment methods. You may also use RMS alignment by specifying atom matches, either directly or using the automated subgraph mapping tool in the Align Models panel. In addition, you may choose to align the models rigidly or to use a specified set of flexible torsions. With more than two models imported into the Align Models table you may choose between Consensus and Target Model alignment strategies to define how the models should be aligned to each other.

Sections in this chapter

The topics covered in this chapter are:

- Aligning models using field alignment
- Aligning models using Moments alignment
- Performing a target model alignment to a single model
- RMS alignment

Aligning models using field alignment

The Field Align method aligns models by calculating the steric and electrostatic forcefields about two models (using a potential probe), then orienting the models to achieve a maximum similarity overlap of those forcefields. This means that the models are aligned so that regions that are similar in shape and electrostatic charge distribution are overlapped. Hence, one immediate advantage that Field Align has over RMS Align is that you are not required to specify how atoms from one model are matched to those in another.
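For contrast with Field Align, the role that explicit atom matches play in RMS alignment can be pictured with a short sketch. The Python below is illustrative only and is not Cerius2 code: the function names and the data layout (coordinate tuples plus a list of index pairs) are assumptions made for this example. It evaluates the quantity an RMS alignment seeks to minimize, the root-mean-square distance over matched atom pairs, after superimposing the centroids of the matched atoms. The search for the best rotation is deliberately left out.

```python
import math


def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def rms_deviation(model_a, model_b, matches):
    """RMS distance over matched atom pairs.

    model_a, model_b: lists of (x, y, z) coordinates.
    matches: list of (index_in_a, index_in_b) pairs, i.e. the atom
    matches a user would have to supply for an RMS-style alignment.
    Only the translation that superimposes the centroids of the matched
    atoms is applied; no rotation is optimized in this sketch.
    """
    a = [model_a[i] for i, _ in matches]
    b = [model_b[j] for _, j in matches]
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(matches))
```

Without the matches list the function has nothing to compare, which is exactly the input that a field-based method does not need.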

Importing models for alignment

To perform an alignment you must first import your models. Open the ALIGN MOLECULES card by going to the DRUG DISCOVERY deck of cards and clicking ALIGN MOLECULES. Click Align Models to open the Align Models panel. At the top of this panel is the ADD button, which imports models from the model manager into the Align Models table. Next to the ADD button is a popup that specifies which models to add: the Current model, the Selected models, or All models. Click the ADD button to append your currently selected model to the table. Select a second model and click the ADD button again to add the second model to the table.

The table now contains a row for each model you have added. In addition to the model name, each row of the table contains four columns. The #Tor column reports the number of movable torsions that are currently set for the model. The Use column specifies whether this model is to be included in a subsequent alignment. This property is also useful for temporarily excluding a model from the set of models to align without deleting it from the Align Models table. The columns M1 and M2 refer to the corresponding rows of the table and report how many RMS atom matches are defined between pairs of models. In this case there are zero matches specified between your two models. (See the RMS alignment section below for more information on atom matches.)

Performing a Field alignment

Now that you have imported your models, you can perform a Field alignment. Go to the ALIGN MOLECULES card and click Align to open the Align panel. This panel is used to specify and perform a new alignment on the models listed in the Align Models table. On the control panel, set the Align Method value to Field and the Align Type value to Rigid. (This specifies a Field alignment that treats the align models as rigid.) Click the ALIGN button on the panel to perform the alignment. Your model display window should now show the two models superimposed on each other, although only the second model is actually reoriented.

Align UNDO and REDO

Notice that after the alignment has finished the UNDO button is no longer grayed out. Click this button to return your models to their original orientations. (Note that you may not be able to see both of them, since one model may be translated out of view.) The UNDO button now becomes a REDO button. Click this button again to return the models to the aligned orientations.

The UNDO/REDO tool is very useful for examining the effect of alignment, for reverting after a mistake, or for adjusting the alignment parameters. These buttons only become gray again if you edit an alignment model or remove it from the Align Models table. You may move or reorient the models, but the UNDO/REDO buttons will always return the orientations to those present immediately before or after you performed the last alignment.

Customizing Field alignment

You can customize Field alignment by either clicking the Prefs... button at the bottom of the Align panel, or clicking Align Preferences... on the ALIGN MOLECULES card. Both of these actions open the Align Preferences panel. The middle section of this panel deals with parameters used in the Field alignment process.

By default, the Pre-align Moments check box is checked, indicating that the models should first be pre-aligned using the Moments align method before executing Field alignment. This is mainly so that the models are moved into a reasonable relative orientation and position before performing the field similarity calculations. You will probably want this option checked most of the time.

The Steric Field value specifies the ratio of steric to electrostatic components in the overall field calculation. At 100%, models are aligned purely on the similarity of their shapes/volumes, while at 0% models are aligned purely on the distribution of their electrostatic charges.

The Potential Probe popup specifies the neutral probe atom with respect to which the individual molecular forcefields are calculated before their similarity is determined. The probe atom only affects the steric field component of the field fitting, and usually has little effect on the overall alignment of a set of models. (If you think of steric field fitting as aligning models based on their respective shapes, then choosing a smaller probe atom, for example, hydrogen, will result in the surface of these calculated shapes being bumpier.)
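To make the effect of the Steric Field percentage concrete, here is a minimal Python sketch of a blended similarity score. It is not the Cerius2 implementation. Several things are assumptions chosen only for illustration: each field is modeled as a sum of atom-centered Gaussians with a single width parameter alpha, the similarity measure is a Carbo-style normalized overlap, the two components are mixed linearly, and the probe atom is not modeled at all. All function names are hypothetical.

```python
import math


def gaussian_overlap(centers_a, weights_a, centers_b, weights_b, alpha):
    """Overlap integral of two fields built from atom-centred Gaussians.

    Each field is sum_i w_i * exp(-alpha * |r - R_i|^2). The integral of
    the product of two such Gaussians (same alpha) has the closed form
    (pi / (2 alpha))^(3/2) * exp(-alpha * d^2 / 2).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for ra, wa in zip(centers_a, weights_a):
        for rb, wb in zip(centers_b, weights_b):
            d2 = sum((ra[k] - rb[k]) ** 2 for k in range(3))
            total += wa * wb * prefactor * math.exp(-0.5 * alpha * d2)
    return total


def carbo_index(centers_a, weights_a, centers_b, weights_b, alpha):
    """Normalised overlap in [-1, 1]; 1 means the two fields coincide."""
    s_ab = gaussian_overlap(centers_a, weights_a, centers_b, weights_b, alpha)
    s_aa = gaussian_overlap(centers_a, weights_a, centers_a, weights_a, alpha)
    s_bb = gaussian_overlap(centers_b, weights_b, centers_b, weights_b, alpha)
    if s_aa <= 0.0 or s_bb <= 0.0:
        return 0.0  # e.g. an uncharged model has no electrostatic field
    return s_ab / math.sqrt(s_aa * s_bb)


def field_similarity(model_a, model_b, steric_percent, alpha=0.5):
    """Blend steric and electrostatic similarity.

    model_a, model_b: lists of (x, y, z, partial_charge) tuples.
    steric_percent: 100 scores shape only, 0 scores charges only.
    alpha: assumed Gaussian width parameter (arbitrary default).
    """
    xyz_a = [atom[:3] for atom in model_a]
    xyz_b = [atom[:3] for atom in model_b]
    q_a = [atom[3] for atom in model_a]
    q_b = [atom[3] for atom in model_b]
    # Steric field: every atom contributes with unit weight.
    steric = carbo_index(xyz_a, [1.0] * len(xyz_a), xyz_b, [1.0] * len(xyz_b), alpha)
    # Electrostatic field: atoms contribute with their partial charges.
    electrostatic = carbo_index(xyz_a, q_a, xyz_b, q_b, alpha)
    fraction = steric_percent / 100.0
    return fraction * steric + (1.0 - fraction) * electrostatic
```

In this toy model, two copies of a molecule with opposite charges score as identical at 100% steric and as opposites at 0%, which mirrors the two extremes described above. An alignment routine would vary the position and orientation of one model to maximize such a score.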

Note: Field alignment employs parameters that are specific to the forcefield currently loaded into Cerius2. Changing this forcefield may have a significant effect on the overall alignment of your models.

Aligning models using Moments alignment

The Moments alignment method is the fastest, but least accurate, method for aligning models. It is typically employed for aligning a large number of models quickly (for example, to place models in the active site of a protein relative to the position and orientation of a ligand already docked in that site).

Performing a Moments alignment

To perform a Moments alignment, you must first import two models into the Align Models table and open the Align panel as described in the previous section, Importing models for alignment. Then set the Align Method option to Moments and the Align Type option to Rigid. Move and reorient your models relative to each other if you have just performed an alignment, then click the ALIGN button on the Align panel to perform a Moments alignment. Your models should now be approximately aligned in terms of centers of mass and relative orientation.

Customizing Moments Align

If you wish to customize the Moments alignment, open the Align Preferences panel (by either clicking the Prefs... button at the bottom of the Align panel, or clicking Align Preferences... on the ALIGN MOLECULES card). This is the same panel used in the previous section to customize the Field alignment. The top section of this panel contains options used in the Moments alignment process.

There are two Moments alignment methods you can choose from: Inertia and Electrostatic. Inertia refers to principal moments of inertia. Electrostatic refers to dipole and quadrupole electrostatic moments. These are calculated for each model you align and are used directly to translate and orient models in the alignment. Typically, you would employ moments of inertia since these are well defined for all models. However, if your models are highly polar or have peculiar charge distributions you may find electrostatic alignment gives a better result.

Moments alignment also employs field calculations to choose the relative direction in which the moments of two models are aligned. Hence, changing the Steric Field factor could affect the resulting aligned moments orientations.

Alignment of non-rigid models

So far only rigid alignment of models has been discussed; that is, the models have been aligned purely by translating and reorienting one model relative to another. However, a better alignment is usually achieved when one model is allowed to be flexible so that it may adopt a conformation that is closer to that of the second model.

Align models may be flexible with respect to torsion angles (that is, bond dihedral angles). Torsions are not defined automatically since you may wish to have some models remain flexible (with torsions specified) while others are rigid (with no torsions specified) during the same (flexible) alignment. If you try to perform a flexible alignment on a set of models that have no torsions defined, Align will warn you that it is proceeding with a Rigid alignment. If you perform a Rigid alignment of models for which torsions are defined, however, these torsions are ignored.

Align will always choose one model to treat as rigid when performing a flexible alignment. This will usually be the first model in the Align Models table or the first model it finds for which no torsions are currently defined. (For the target model Align strategy the first N models are always considered to be rigid; see Multiple model alignment strategies for more information on this topic.)

Selecting torsion angles

To quickly specify which torsion angles to vary during flexible alignment, click the Find Torsions button. This defines all the movable torsions using the defaults in the Search Torsions panel.
The setting affects all models in the Align Models table that do not already have torsions defined.

To manually specify individual torsions for a model, use the Search Torsions control panel. To open this control panel, go to the ALIGN MOLECULES menu card and select Torsions. This control panel is also used by the Conformer Search module (see the chapter on conformer search in Cerius 2 Conformational Search and Analysis). Click the FIND button to specify all the movable torsions on the current model. When you look back at the Align Models table, you will notice that the value in the #Tor column has been updated to reflect the number of torsions currently defined on your align models.

Flexible alignment

Once you have two models in the Align Models table and have defined torsions for at least one of them, you are ready to perform a Flexible alignment. Choose the Align Method you wish to use (Field or Moments) in the Align panel, select Flexible as the Align Type, and click the ALIGN button.

Customizing flexible alignment

To customize the flexible alignment, open the Align Preferences panel (by either clicking the Prefs... button at the bottom of the Align panel, or clicking Align Preferences... on the ALIGN MOLECULES card). The bottom section of this panel contains the one option you can adjust for the Flexible alignment process: Bump Check. By default this option is unchecked, indicating that there is no bump checking during the part of the alignment that optimizes the movable torsions. If you select Bump Check, another option, Bump Factor, appears.

Bump checking ensures that the conformations resulting from torsional optimization are reasonable (that is, that the conformation is not invalid due to large van der Waals overlaps of atoms). The Bump Factor indicates how important these overlaps are: higher values mean you are less likely to produce aligned conformations with atom overlaps. Since at least one model is always treated as a rigid reference for the alignment, models that start with valid conformations usually have valid conformations after the alignment. Hence, since bump checking can significantly reduce the performance of an alignment, the default setting is no bump checking.
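To make concrete what varying a torsion means geometrically, the following is a minimal sketch in Python of rotating one atom about a bond axis. It is an illustration only and is not taken from Cerius 2: the function name `rotate_about_bond` and the tuple-based coordinates are assumptions made for this example, and the rotation uses the standard Rodrigues formula.

```python
import math

def rotate_about_bond(point, bond_start, bond_end, angle_deg):
    """Rotate `point` about the axis bond_start -> bond_end by angle_deg.

    Hypothetical helper for illustration; uses Rodrigues' rotation formula.
    """
    axis = [bond_end[i] - bond_start[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                       # unit vector along the bond
    v = [point[i] - bond_start[i] for i in range(3)]   # point relative to the bond
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [rotated[i] + bond_start[i] for i in range(3)]

# Rotate an atom at (1, 0, 0) by 90 degrees about a bond lying on the z axis.
moved = rotate_about_bond((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90.0)
```

In a flexible alignment every atom on one side of the rotatable bond would be moved this way, and the optimizer would adjust the angle together with the rigid-body translation and rotation of the model.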
Multiple model alignment strategies

When aligning more than two models, you have additional options for how to perform the alignment. In version 3.5 of Cerius 2 Align, you would specify just one of your models as the target model to which all others would be aligned. However, the default alignment strategy for Align is now Consensus alignment.

The consensus strategy is to align all models to each other, such that the overall alignment value (the similarity or RMS) is optimized. This does not necessarily give the best alignment for any given pair of models in a set of consensus-aligned models, but it can be thought of as an alignment of all models to a virtual model that is the average conformation of the aligned models.

The other alignment strategy available to you is Target Model. In this strategy all models are aligned separately to one or more specified target models.

So far you have been using consensus alignment by default. This was not mentioned previously because, with only two models, the results are the same for consensus and target model alignments.

Performing a Consensus alignment of four models

To perform a Consensus alignment of four models, import two more models into the Align Models table using the method described in Importing models for alignment, so that you now have four align models. On the Align panel select Consensus as the Align Strategy, Field as the Align Method, and Rigid as the Align Type. (You may use Moments and/or Flexible align if you wish.) Click the ALIGN button to perform the Consensus alignment. Your view should be updated to show all four models aligned and superimposed.

Performing a target model alignment to a single model

To perform a target model alignment, open the Align Models panel if it is not still in view. Notice that there is an option called Target Models, which currently has the value 1. This specifies that one model is currently designated as a target model; in fact, it is always the first model in the table. Change the Align Strategy option on the Align panel to Target Model and then click the ALIGN button. Your models should now be aligned to the first model in your table, and results for the individual alignments should appear in your text window.
Next, change the target model in the Align Models table. To do this, select a table row (for example, row 3) and click the Target Model icon. This moves the row to the top of the table and designates the model in this row as the new target model. Click the ALIGN button in the Align panel again to verify that your models now align to this new target model. (You may wish to move your models around before you align them to see the effect more clearly.)

Performing a Consensus alignment on a subset of Align models

To perform a Consensus alignment on a subset of Align models, go to the Align Models table and click the cell in the Use column for the last model in the table. This should toggle the value from Yes to No. Next, refresh the Model View window so that it presents only the models marked as used in the Align Models table. To do this, click the Update Overlay Display button. (You may also select the last row and click the Remove Model button. However, this removes the model from the Align Models table entirely, and you must then re-import it to follow the subsequent steps.) On the Align panel set the Align Strategy back to Consensus and click the ALIGN button. This time just the three models are consensus aligned.

Performing a target model alignment to multiple target models

Using the target model Align strategy, you can now take your fourth model and align it to the three models that were consensus aligned in the last step. First you must specify the first three models as target models. There are two ways to do this:

1. Set the Target Models value in the Align Models table to 3.

or

2. Select three table rows and click the Target Model button. (This also moves these models to the top of the table if they are not already there.)

Now include the last model in the align models set by clicking the Use table cell for that model to set it back to Yes. On the Align panel set the Align Strategy option to Target Model and click the ALIGN button. Your fourth model should now be aligned relative to the other three, which do not move since they are specified as align target models. To see this more clearly, you may wish to move the target models apart and click the ALIGN button again. Your fourth model should now be aligned in the center of your target models. If you had more than four models in your Align Models table, each non-target model would be aligned with the set of target models individually.
Since target models do not move in the alignment, any torsions defined on those models are ignored. You are also unable to change the value in the Use column for target models from Yes while Target Model is specified as the Align Strategy.
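The difference between the two strategies can be sketched with a deliberately simplified toy in Python. This is not the Align algorithm: it uses translation only (no rotation, no torsions, no field or RMS scoring), and the function names `align_consensus` and `align_to_targets` are hypothetical. It only illustrates which models move and what they move toward.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def translate(points, shift):
    """Return a copy of `points` moved by `shift`."""
    return [tuple(p[i] + shift[i] for i in range(3)) for p in points]

def align_consensus(models):
    """Toy consensus: every model moves onto the average of all model centroids."""
    goal = centroid([centroid(m) for m in models])
    aligned = []
    for model in models:
        c = centroid(model)
        aligned.append(translate(model, tuple(goal[i] - c[i] for i in range(3))))
    return aligned

def align_to_targets(models, n_targets):
    """Toy target alignment: the first n_targets models stay fixed; each
    remaining model moves onto the average centroid of the target set."""
    targets = models[:n_targets]
    goal = centroid([centroid(m) for m in targets])
    aligned = list(targets)  # target models are left untouched
    for model in models[n_targets:]:
        c = centroid(model)
        aligned.append(translate(model, tuple(goal[i] - c[i] for i in range(3))))
    return aligned

models = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
    [(4.0, 6.0, 0.0), (6.0, 6.0, 0.0)],
]
consensus = align_consensus(models)
targeted = align_to_targets(models, n_targets=1)
```

In the consensus case all three models move; in the targeted case the first model keeps its coordinates and the others are brought to it, which mirrors the observation above that target models do not move.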

RMS alignment

Preparing models for RMS alignment: atom matching

RMS alignment aligns models by producing the best superposition of two or more models using the collective RMS of their atomic coordinates (please see Theory: Alignment for more information on this topic). The RMS method is generally faster and more accurate than forcefield-based alignment but requires that, for each pair of models, the corresponding atoms be identified. This is referred to as RMS atom matching.

To match atoms between pairs of models, use the Align Models panel (open it by clicking Align Models on the ALIGN MOLECULES card). There are two ways to perform atom matching between models: automatically and manually. Usually you should use a combination of both methods.

Click the Common Subgraph button in the Align Models table. When the dialog box appears, read what it says and then click OK. After a moment you will see that the table columns under Number of Align Atom Matches, M1, and M2 fill up with integer values. These values represent the number of current atom matches specified between pairs of models (for example, for the first model in row 1, the value in the M3 column is the number of atom matches to the third model in the table, in row 3).

To see all the models in the Align Models table with green dashed lines between atom pairs, showing the current atom matches specified, click Update Overlay Display. (You may wish to drag one of the models around to see this more clearly.) You can click the Display Atom Matches check box to hide or redisplay these align atom matching monitors.

The Common Subgraph operation above creates atom matches between each pair of models in the Align Models table. This is often necessary when you are preparing to generate QSAR Alignment Similarity descriptors based on RMS alignments (refer to the next chapter). However, you usually do not need to specify atom matches between each pair of models.
For example, to specify that four atoms named "A" on four models are to be superimposed during RMS alignment, you could simply select three atom match pairs: (Model1:A,Model2:A), (Model2:A,Model3:A), and (Model3:A,Model4:A). This implicitly defines the atom matches (Model1:A,Model3:A), (Model1:A,Model4:A), and (Model2:A,Model4:A), which are necessary for the alignment calculation. Alternatively, the same set of atom matches can be defined by specifying atom matches between the atoms of one model and each of the other align models, for example: (Model1:A,Model2:A), (Model1:A,Model3:A), and (Model1:A,Model4:A). This is the recommended procedure when setting up atom matches between multiple models from scratch, and it simply involves selecting one model (row) in the Align Models table and clicking the Common Subgraph button. (When setting up for targeted alignment you should choose one of the target models as your selection.)

Note: If you explicitly specify an atom match (for example, Model2:A,Model3:B) using an atom that is already implicitly matched to another atom in the same model (for example, Model2:A,Model3:A), then your overall alignment may be affected. Here you are effectively specifying that the atom Model2:A is to be superimposed, for RMS alignment, on both atoms Model3:A and Model3:B; in fact, Model2:A will be superimposed on the geometric center of atoms Model3:A and Model3:B. Because of the implicit atom matches, this means that all atoms matched to Model3:A are also matched to Model3:B. This situation can occur quite easily when using common subgraph matching to generate atom matches between multiple align models, as a result of the ambiguity in choosing the best common subgraph matches (when equivalent topological mappings of atoms exist due to local symmetry in models).

Performing an RMS alignment

To perform an RMS alignment, go to the Align panel and set the Align Strategy option to Consensus and the Align Method to RMS. Click the ALIGN button. The models should be realigned by RMS value, and the consensus RMS value should be reported in the text window. As with the field and moments alignment methods, you may also use RMS alignment with the target model strategy and with flexible or rigid alignment options.

Editing atom matches

The Common Subgraph tool produces atom matches based on the common topologies of the atoms in each pair of align models.
It then chooses between equivalent best topology matches by the RMS of superposition of these atom sets using the current align model conformations. This tool works in combination with selected rows in the table. With one row selected, atom matches are made between this align model and all others in the table. With two or more rows selected, atom matches are made between all these align models. Above you used this tool with no rows selected, which has the same effect as selecting all rows.

The Clear Atom Matches tool works with selected rows in exactly the same way as the Common Subgraph tool, except that it clears the atom matches between align models.

When your Align models have similar chemical features, the Common Subgraph atom matching often gives you exactly the atom matching you want. More often, however, you will want to adjust the resulting matches by adding or removing individual atom matches, or even specify all the atom matches by hand.

To manually edit the atom matches, you must first have the Align models in the same view (use Overlay Viewing Mode) and then click the Pick Atom Matches icon to set the global atom picking mode to atom matching. To add a new atom match, simply pick one atom in one Align model, then pick its match in a second Align model. A new atom match monitor appears and the corresponding number of atom matches in the Align Models table is updated. To remove an existing atom match, pick a pair of atoms on two Align models for which an atom match monitor already exists. (You may wish to employ the Use column of the Align Models table and the Update Overlay Display button to temporarily hide other Align models from the view to make the atom picking easier.)

You can also pick atom matches between displayed models that are not already in the Align Models table. When you do this, these models are added to the table with the atom match you picked. (Note that this is an alternative way to import models into the Align Models table.)
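The implicit atom matches discussed earlier in this chapter behave like equivalence classes: if Model1:A is matched to Model2:A and Model2:A to Model3:A, then Model1:A and Model3:A end up in the same group. The following Python sketch shows one way to derive such groups with a union-find structure, and to flag the situation described in the Note above, where one group contains two atoms from the same model. The data layout (atoms as `(model, atom)` tuples) and the function names are assumptions made for this illustration, not part of Cerius 2.

```python
def match_groups(explicit_matches):
    """Group atoms connected by explicit matches (union-find with path halving).

    explicit_matches: iterable of ((model, atom), (model, atom)) pairs.
    Returns a list of sets; each set holds atoms that are matched to each
    other either explicitly or implicitly.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in explicit_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for atom in list(parent):
        groups.setdefault(find(atom), set()).add(atom)
    return list(groups.values())

def ambiguous_groups(groups):
    """Return the groups that contain more than one atom from the same model."""
    flagged = []
    for group in groups:
        models = [model for model, _atom in group]
        if len(models) != len(set(models)):
            flagged.append(group)
    return flagged

chain = [
    (("Model1", "A"), ("Model2", "A")),
    (("Model2", "A"), ("Model3", "A")),
    (("Model3", "A"), ("Model4", "A")),
]
groups = match_groups(chain)

# Adding Model2:A -> Model3:B pulls Model3:B into the same group as Model3:A.
with_extra = match_groups(chain + [(("Model2", "A"), ("Model3", "B"))])
problems = ambiguous_groups(with_extra)
```

The chain of three explicit matches and the "one model to all others" pattern recommended above both produce a single group of four atoms, which is why either is sufficient for the alignment calculation.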


3 Incorporating Align Data in QSAR

The Align module functionality is designed to physically align sets of models in space. Align returns only a small amount of numeric data for analysis to the text window: either the overall consensus alignment similarity/RMS or the similarity/RMS values for each model aligned to the target model(s). You can also perform alignment similarity/RMS calculations using the alignment similarity descriptors through the QSAR Study Table. In this case more statistics are available for an alignment calculation, but no model's position or configuration is actually altered.

The topics covered in this chapter are:

Preparing models for generating alignment similarity descriptors
Adding alignment similarity descriptors to the QSAR Study Table
Using the study table to analyze a consensus alignment
Using alignment similarity descriptors for QSAR analysis
Recalculating alignment similarity descriptors

Preparing models for generating alignment similarity descriptors

Alignment Similarity descriptors are calculated for models in the Study Table with respect to a specific model or alignment of models. These target models are the same ones specified in the Align Models table. Thus, before you can add Alignment Similarity descriptors to the Study Table, you must first import models into the Align Models table and denote them as target models (please refer to Chapter 2, Aligning Models). If you are using more than one target model, you should also perform a consensus alignment on these models.

Adding alignment similarity descriptors to the QSAR Study Table

With one or more target models loaded into the Align Models table, you can add alignment descriptors to the QSAR Study Table. To add these descriptors, go to the Study Table pulldown menu commands and select Descriptors/Select... to open the Descriptors control panel. Use the Descriptors in Family popup to select the Align family. Now click the Preferences button to open the Alignment Similarity control panel.

This panel allows you to choose descriptors to add to the Study Table. You may add up to six descriptors: three types of qualitative calculations based on alignments using two different alignment strategies. A descriptor (column) of each of the types specified here is added to the Study Table for each target model. When a descriptor is actually calculated for a Study Table model, this model is first aligned to the target models using the specified method (although the model's coordinates are not actually changed as they would be if an Align command were used). Electrostatic Field Similarity, Steric Field Similarity, and Matched Atom RMS values are then calculated, based on this alignment, for each Align target model.

Note: Matched Atom RMS values can only be calculated if atom matches between a QSAR model and the Align target model(s) are specified in the Align Models table. You normally need to calculate these descriptor values only if you are also using RMS as your Align Method. In this case, all align models must have atom matches defined for each of the alignment target models, since each alignment descriptor value (table cell entry) represents an RMS value for a pair of models. (This is also the case if you wish to create pairwise field similarity descriptors when the Align Method in the Align control panel is set to RMS.)

The method of aligning QSAR (Study Table) models to target models when calculating descriptor values is largely determined by current settings in the Align panel. For example, the Align Method, Align Preferences, and Align Type settings all apply, and so affect whether or not the models are treated as flexible during alignment. However, the choice of Align Strategy is controlled by the type of descriptor calculated:

Consensus target alignment specifies descriptors calculated by aligning a model with the consensus alignment of the target models. This may be thought of as aligning a model with a virtual model that represents the average (consensus) alignment of all the target models, although the descriptor values are still calculated with respect to each target model individually.

Pairwise targets alignment specifies descriptors calculated by aligning a model with each individual target model. Hence the current alignment of the target models with respect to each other is not relevant when calculating pairwise targets alignment descriptors.

Once you have selected which descriptors to add to the Study Table, select the Alignment Similarity row in the Descriptors table and click the ADD button. After a few moments the new descriptor columns appear in the Study Table.
Since these descriptors are relative to specific target models, the names of these target models are incorporated into the descriptor names.

Using the study table to analyze a consensus alignment

When you perform a consensus alignment on a set of models using the Align module, only an overall alignment similarity or RMS is reported to the text window. If you wish to examine this overall alignment in more detail (that is, to look at each pair of models involved in the overall alignment), you must calculate alignment similarity descriptors in the Study Table.

To calculate alignment similarity descriptors:

1) Change all consensus aligned models in the Align Models table to target models.
2) Choose and add the alignment similarity descriptors you wish to examine (as described above).
3) Add the aligned models into the Study Table. The descriptor values are calculated for the models as they are added to the table.

Note: Before adding the models to the Study Table, you should turn the energy minimization option OFF within the Molecule Preferences control panel of the Study Table. Minimizing the models is not desirable because it affects the current alignment of the models.

Using alignment similarity descriptors for QSAR analysis

Alignment similarity descriptors can be useful for drug discovery when combined with activity data for your target models. Typically you first align a small number of active compounds and use these as your target models to generate alignment similarity descriptors.

Alignment similarity descriptor values calculated for (non-target) models in the Study Table represent a qualitative measurement of the structural and forcefield similarities between these models and the active/lead/target molecules. To generate these values you need only add these models to the Study Table; the descriptor values are then calculated automatically.

Note: Creating these descriptors for non-target models actually requires that these models also be present in the Align Models table. However, these models are automatically added to this table if they are not already present when you check the option ON in the Alignment Similarity control panel.

Recalculating alignment similarity descriptors

Alignment similarity descriptors are specific to a set of (aligned) target models, as specified in the Align Models table. Like other complex descriptors, the calculated values are saved with models as user data, which is also used to indicate whether the displayed values are up-to-date with respect to all variables used to calculate them. (This user data is written for these models when the models are saved to an MSI-formatted file and is loaded back into the Study Table when the models are added and these descriptors are present.)

This user data is marked as invalid for alignment similarity descriptors when:

- Align models are changed (for example, atoms edited or moved)
- Align models are moved with respect to relative orientations (for example, because of re-alignment)
- Alignment options and/or preferences are changed in the Align module

(When you make such a change for an align target model, all alignment similarity descriptors are marked as not current; if you change a non-target model, only the descriptors for this model are marked as not being current.)

Alignment Similarity descriptors may be recalculated by issuing the Recalc Descriptors command from the Study Table's Descriptors command menu. Recalculation does not happen automatically whenever the current values for the descriptors are outdated, since you may still want to view the old values, or you may wish to make several changes to your alignment setup before performing a costly recalculation of all your QSAR descriptors. On the other hand, issuing the Recalc Descriptors command when your descriptors are already up to date takes only a moment.

You may remove alignment similarity descriptors from the Study Table at any time by simply selecting the descriptor column and deleting it using the Cut button. You can add these descriptors back, or add new ones, by re-adding the alignment similarity descriptors via the Descriptors control panel. If you change the target models in the Align Models table and re-add the descriptors to the Study Table, you may end up with alignment similarity descriptors (columns) that are no longer valid. Invalid descriptor values simply show up as empty table cells.

Please refer to the Cerius 2 QSAR+ documentation for more general information on manipulating descriptors and the Study Table.

4 Theory: Alignment

Model Alignment Strategies and Methods

Align offers two strategies for aligning a set of models:

1. Consensus alignment aligns a set of specified models to a virtual model which represents the best average alignment between each pair of models. In this iterative alignment strategy all models are initially aligned to the first model to move the models into a reasonable starting alignment. Each model is then fit to the rest of the models as if these represented one large model, but with the fitting parameter (that is, RMS (Root Mean Square Superposition) or field similarity) averaged over the number of models. This process is repeated for each model in turn (at least once per model) until the overall fitting parameter converges (for example, until the change in RMS over successive iterations is small enough).

2. Targeted alignment aligns a set of specified models to one or more specified target models. With just a single target model the resulting alignment is a set of pair-wise alignments of the specified models to the target model (which does not move). This alignment strategy is equivalent to that employed in previous versions of Cerius 2 Align. When you specify multiple models as the target for an alignment, these models are not re-aligned, but instead represent a consensus target (virtual model) for other models to be aligned against. For example, you may wish to perform a consensus alignment on a set of active analog models to find an active configuration. Using this active configuration as your alignment target, you may align other models to investigate their ability to adopt a similar configuration.

For either of the above two multiple-model alignment strategies you may employ one of three Align Methods:

1. RMS Atoms alignment performs pair-wise model alignments based on the superposition of a set of matching atoms. You must define these for each pair of models in the Align Models panel. This method is relatively fast and accurate for models where a strong correspondence between specific sets of atoms (for example, functional groups and backbone) can be readily identified.

2. Moments alignment aligns models using either electrostatic moments or principal moments of inertia. A field similarity calculation is made to determine the best alternative orientation for these moment vectors. This is the fastest but least accurate of the alignment methods. It is useful for tasks such as placing models in protein cavities.

3. Field alignment aligns models by maximizing the overlap between the steric and electrostatic fields calculated about the models using a probe potential (for example, a point positive charge). This method may be slower and less accurate than RMS but has the advantage of not requiring definition of any matching atoms. Non-optimal alignment may occur when the fitting finds a local minimum while optimizing the field overlap. This effect can usually be reduced by decreasing the number of flexible torsions and/or choosing a more appropriate probe potential for your models.

Non-target models may be aligned in two ways: rigidly, using their current configurations; or flexibly, using a set of specified torsion bonds which are rotated about during the alignment procedure. Generally, using flexible models produces a better alignment but takes a little longer.

RMS Superposition

With RMS Atoms selected as the Align Method, models are aligned using superposition of the matching atoms between each pair of models. The function to be minimized is the sum of squares of the distances between all atoms to be superimposed, as in Eq. 1.

$$F = \sum_{p=1}^{M-1} \sum_{q=p+1}^{M} \sum_{i=1}^{N(p,q)} \left( r_{ip} - r_{iq} \right)^2 \qquad \text{(Eq. 1)}$$

where:
$M$ = the number of molecules;
$N(p,q)$ = the number of atoms between the $p$th and the $q$th molecules to be aligned; and
$r_{ip}$ = the transformed atomic coordinate of the $i$th atom in the $p$th molecule. This is described as:

$$r = f(r_0, \theta, t, \phi) \qquad \text{(Eq. 2)}$$

where:
$r_0$ = the original atomic coordinate;
$\theta$ = the rotation angles of the molecule;
$t$ = the translation vector of the molecule; and
$\phi$ = the angles of all torsions in the molecule to be optimized.

Eq. 1 can be minimized by conventional nonlinear least-squares methods. The RMS value reported by the alignment calculation is:

$$\mathrm{RMS} = \sqrt{\frac{F}{N_{pair}}} \qquad \text{(Eq. 3)}$$

where $F$ is calculated using Eq. 1, and

$$N_{pair} = \sum_{p=1}^{M-1} \sum_{q=p+1}^{M} N(p,q) \qquad \text{(Eq. 4)}$$

Electrostatic Potential Similarity

A number of different techniques have been proposed and applied to electrostatic potential similarity calculation, which is becoming a well-established modeling technique (Good, 1992). Different formulas for similarity determination have been proposed. Here, we use the Hodgkin index as discussed by Good for electrostatic potential similarity calculation:

$$SF(a,b) = \frac{2 \int P_a P_b \, dv}{\int P_a^2 \, dv + \int P_b^2 \, dv} \qquad \text{(Eq. 5)}$$

In Eq. 5, $P_a$ and $P_b$ are the electrostatic potentials for molecules $a$ and $b$, which are dependent on the atomic charges and distance according to Eq. 6:

$$P_r = \sum_{i=1}^{n} \frac{Q_i}{\left| R_i - r \right|} \qquad \text{(Eq. 6)}$$

where:
$n$ = the number of atoms in the molecule;
$r$ = the coordinate where electrostatic potential is to be evaluated;
$R_i$ = the coordinate position of atom $i$; and

$Q_i$ = the charge assigned to atom $i$.

The value of the function, SF, ranges from -1 (maximum dissimilarity) to 1 (identical potentials). A value of 0 corresponds to two molecules with zero electrostatic potential overlap, either because the molecules are far apart or because the value of the positive overlap equals the value of the negative overlap.

For multiple molecules, Eq. 7 is used for the similarity calculation and optimization:

$$SF = \frac{2}{M(M-1)} \sum_{a=1}^{M-1} \sum_{b=a+1}^{M} SF(a,b) \qquad \text{(Eq. 7)}$$

where $M$ = the number of molecules. Here, the SF function again ranges from -1 (inverse) to 1 (identical).

Steric Shape Similarity

The molecular steric similarity of two molecules is calculated with Eq. 5, where $P_a$ and $P_b$ are the steric functions for molecules $a$ and $b$, which are Lennard-Jones potentials as in Eq. 8 ("9-6" potential) or Eq. 9 ("12-6" potential), depending on forcefield choice.

$$P_r = \sum_{i=1}^{n} \varepsilon \left[ k \left( \frac{\sigma}{r} \right)^{9} - \left( \frac{\sigma}{r} \right)^{6} \right] \qquad \text{(Eq. 8)}$$

$$P_r = \sum_{i=1}^{n} \varepsilon \left[ k \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \qquad \text{(Eq. 9)}$$

where:
$n$ = the number of atoms in the molecule;

$r$ = the coordinate where steric potential is to be evaluated;
$k$ = a constant;
$\varepsilon$ = epsilon for the atom type; and
$\sigma$ = sigma for the atom type.

The function SF ranges from 0, meaning zero steric overlap (molecules are too far apart), to 1, indicating identity.

Field Alignment (Electrostatic & Steric Similarity)

The combined similarity is calculated for field (and moments) alignment using Eq. 10.

$$SF = w \, SF(\text{steric}) + (1 - w) \, SF(\text{electrostatic}) \qquad \text{(Eq. 10)}$$

where $w$ = user-specified weighting factor, ranging from 0 to 1.

The combined similarity function SF ranges from -1, meaning maximum dissimilarity, to 1, indicating identity.

Steric Clash (Bump) Checking

To avoid van der Waals clashes in flexible fitting, a penalty function is added to the similarity function or the RMS function during the optimization process, as in Eq. 11.

$$F(\text{Optimize}) = SF + w \, F(\text{penalty}) \qquad \text{(Eq. 11)}$$

where SF = the similarity function or RMS function.

The following penalty function is used:

$$F(\text{penalty}) = \sum_{i=1}^{n} \left( r_o - r \right)^2 \qquad \text{(Eq. 12)}$$

$$F(\text{penalty}) = 0 \quad \text{when } r^2 \ge r_o^2 \qquad \text{(Eq. 13)}$$

where:
$n$ = the number of atom pairs between rotatable segments in the molecule;
$r_o$ = the sum of the van der Waals radii; and
$r$ = the distance between the two atoms.

To provide a degree of "softness" in the penalty function, scaling factors are applied to the van der Waals radii of atoms. The scaling factors are:

1-4 interactions: 0.85
H-bond candidates: 0.65
Others: 0.95

These scaling factors may not be altered, but the overall scaling of the contribution of the van der Waals clash penalty function can be altered using the Align Preferences panel.

Implementation: Gaussian Approximation of Overlap Functions

In the electrostatic calculation, $P_r$ in Eq. 6 is replaced by a Gaussian function approximation of two terms, as in Eq. 14. The integrals in Eq. 5 then have a simple form based on exponent values and the distance between atom centers (Good et al., 1992).

$$\frac{1}{r} \approx a_1 e^{-b_1 r^2} + a_2 e^{-b_2 r^2} \qquad \text{(Eq. 14)}$$

Here $a_1$, $a_2$, $b_1$, and $b_2$ stand for the fitted coefficients and exponents of the two Gaussian terms.
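The bump-check penalty described above (Eqs. 11 to 13, with the softening scale factors 0.85, 0.65, and 0.95) can be sketched in Python as follows. Several things here are assumptions made for illustration and are not confirmed by this manual: the penalty is taken to be a simple quadratic in the overlap depth, the weight `w` of Eq. 11 is identified with the Bump Factor preference, the objective shown is the RMS case (a quantity to be minimized), and the function names and tuple layout are hypothetical.

```python
# Scale factors applied to the summed van der Waals radii (values from the text).
SCALE = {"1-4": 0.85, "hbond": 0.65, "other": 0.95}

def bump_penalty(pairs):
    """Sum a quadratic overlap penalty over atom pairs.

    pairs: iterable of (distance, radius_a, radius_b, kind), where kind is
    "1-4", "hbond", or "other". ASSUMED form: (r_o - r)^2 when r < r_o,
    and zero otherwise, with r_o the scaled sum of the two radii.
    """
    total = 0.0
    for distance, radius_a, radius_b, kind in pairs:
        r_o = SCALE[kind] * (radius_a + radius_b)
        if distance < r_o:
            total += (r_o - distance) ** 2
    return total

def penalized_rms(rms, pairs, bump_factor):
    """Eq. 11 for the RMS case: objective = RMS + w * F(penalty).

    ASSUMPTION: bump_factor plays the role of w.
    """
    return rms + bump_factor * bump_penalty(pairs)

# Two carbon-like atoms (radius 1.7 each) at 3.0 apart: penalized as a generic
# pair, but not when the pair is treated as hydrogen-bond candidates.
generic = bump_penalty([(3.0, 1.7, 1.7, "other")])
hbond = bump_penalty([(3.0, 1.7, 1.7, "hbond")])
objective = penalized_rms(0.5, [(3.0, 1.7, 1.7, "other")], bump_factor=2.0)
```

The smaller scale factor for hydrogen-bond candidates lets such atoms approach each other more closely before any penalty is applied, which is the "softness" the text refers to.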

The Lennard-Jones potential in Eq. 8 or Eq. 9 is approximated by the following Gaussians over a range of r covering just inside the repulsive part of the potential:

$$LJ_{9\text{-}6}(\varepsilon, \sigma, r) \approx \ldots \; \varepsilon \exp\!\left( -6.81 \, (r/\sigma)^2 \right)$$

$$LJ_{12\text{-}6}(\varepsilon, \sigma, r) \approx \ldots \; \varepsilon \exp\!\left( -\ldots \, (r/\sigma)^2 \right) \qquad \text{(Eq. 15)}$$
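The weighting in Eq. 10, the penalised objective of Eq. 11, and the fixed radius scaling factors can be sketched together as follows. This is a minimal illustration under stated assumptions: the function names and the pair-type keys are hypothetical, the similarity values are assumed to be computed elsewhere, and the penalty is passed in as a number rather than derived from Eq. 12.

```python
# Scaling factors quoted in the text for softening van der Waals radii.
# The dictionary keys are illustrative labels, not Cerius 2 identifiers.
VDW_SCALE = {"1-4": 0.85, "hbond": 0.65, "other": 0.95}


def combined_similarity(sf_steric, sf_electrostatic, w):
    """Eq. 10: weighted mix of steric and electrostatic similarity, 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return w * sf_steric + (1.0 - w) * sf_electrostatic


def scaled_contact_distance(radius_a, radius_b, pair_type="other"):
    """Sum of two van der Waals radii after applying the pair's scaling factor."""
    return VDW_SCALE[pair_type] * (radius_a + radius_b)


def optimization_target(sf, penalty, penalty_weight):
    """Eq. 11: similarity (or RMS) function plus the weighted clash penalty."""
    return sf + penalty_weight * penalty


if __name__ == "__main__":
    sf = combined_similarity(0.8, 0.4, w=0.25)
    print(sf, scaled_contact_distance(1.7, 1.2, "hbond"))
```

With w = 1 the score is purely steric and with w = 0 purely electrostatic, which is why the combined function inherits the -1 to 1 range of the electrostatic term.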



Part 2: Receptor

    Introduction to Receptor
    Receptor QuickStart
    Generating a Receptor Model
    Evaluating Structures Against a Receptor Model
    Theory: Receptor Models

5. Introduction to Receptor

The Receptor application creates hypothetical models, called receptor surface models, that characterize the active site of an enzyme or antibody. The models are based on the construction of surfaces that represent the spatial and electrostatic properties of a receptor active site. Molecular models are minimized within the receptor surface model and interaction energies are calculated, which allows the evaluation of new candidate compounds.

Receptor site models are inferred from the activity data of a set of compounds considered to act upon the active site of an enzyme or antibody whose 3D structure is unknown. We assume an underlying complementarity between the shape and properties (for example, hydrophobicity and hydrogen bonding proclivity) of the receptor and the set of compounds that bind to it.

This chapter provides an overview of Receptor. The chapter discusses:

    Accessing Receptor (next section)
    Using Receptor, which describes a typical flow of activity in the module. The discussion directs you to appropriate chapters in Cerius 2 books where more detailed information about each activity can be found.

Where to learn about Receptor

You can get a feel for Receptor's capabilities and interface by working through the Receptor QuickStart. See the other Receptor chapters for details of:

    Generating a Receptor Model, Chapter 8
    Evaluating Structures Against a Receptor Model, Chapter 9
    Theory: Receptor Models, Chapter 10

Accessing Receptor

This section describes the procedure for starting Receptor and the things you need to work with the software.

Before you begin

You must have one or more aligned structures displayed in the model window (see Part One, Alignment or Part Three, Pharmacophore in this book, or see Molecular Shape Analysis in Cerius 2 QSAR+).

To start Receptor

1. Select Drug Discovery from the Deck Selector popup. The Drug Discovery card deck appears.

2. Choose the Model Receptor menu card from the card deck.

Using Receptor

The typical flow of tasks in Receptor is outlined in the following steps:

1. Begin with a single molecule or a set of structures that have been aligned using Alignment, ConFirm, or the alignment capabilities of Molecular Shape Analysis. If you have a lot of data, it is better to use a subset of the most active compounds. Note that you can generate structures for alignment in many ways, including these:

    Build one or more structures using the Cerius 2 3D Sketcher (see the Cerius 2 Modeling Environment and the Cerius 2 Builders)
    Generate a series of analogous structures using the Analog Builder (see the Cerius 2 Modeling Environment)
    Import previously built structures using the Load Model control panel accessed from the File pulldown menu on the menu bar (see the Cerius 2 Modeling Environment)

2. Generate a receptor model after you have identified the solvent region (if any) of the molecule and the structures you want to use as templates. Alternatively, load a receptor model saved from previous work (see Saving and restoring a receptor model). Before you generate a model, you can adjust generation preferences by accessing the Preferences control panel (see Setting preferences).

3. After the receptor model is generated, you may map properties such as charge or hydrophobicity onto the surface of the model (see Mapping property values).

4. Orient new candidate structures within the receptor model to evaluate the quality of the fit between the structure and the receptor surface. Evaluating structures inside a receptor model provides quantitative information that you can use to rate the fit. Before you evaluate a model, you can adjust evaluation preferences by accessing the Preferences control panel (see Setting preferences). Note that you can automatically orient candidates to a model.

5. Map favorable and unfavorable interactions between structures and the surface of the receptor model (see Viewing the results).

6. Incorporate the receptor data into a QSAR to further evaluate and to estimate the biological activities of your structures (see Incorporating receptor data in QSARs).

7. At this point, you can choose either of two iterative processes that use the steps described above:

    Modify (optimize) candidate structures within the receptor model (see the Cerius 2 Modeling Environment and the Cerius 2 Builders) and re-evaluate them (see Evaluating structures).
    Generate refined receptor models using new alignments of the template structures you have been using, or use new template structures. Alternatively, you can load saved receptor models from previous work. New receptor models are available for evaluating new or previously evaluated structures (see Generating a receptor surface model).


6. Receptor QuickStart

This chapter allows you to work in Receptor to solve a simple problem.

QuickStart

You begin by reading in a set of compounds, the dopamine beta-hydroxylase inhibitors.

1. Load beta hydroxylase inhibitor model files from the Cerius2-Resources directory

Starting from a new Cerius 2 session, load the beta hydroxylase inhibitor files dbh49.msi through dbh52.msi from the directory Cerius2-Resources/EXAMPLES/DBH. Make all the loaded structures visible and set Cerius 2 into Overlay mode. Note that the structures are pre-aligned.

2. Add the models to the Study Table

Open the Study Table and select Molecules from the Preferences pulldown. When the Molecule Preferences panel appears, click the Add Hydrogens, Minimize Energy, and Calculate Charges buttons off, then select Add All from the Molecules pulldown menu.

All four models are added to the Study Table without being minimized, because minimization might disturb the alignment.

3. Enter biological activity data

Enter the activity value for each of the four models (dbh49 through dbh52) into the Activity column.

4. Create a receptor surface model

Select DRUG DISCOVERY from the card deck, then click MODEL RECEPTOR and then Generate Receptor Model to start the Receptor module. Click Preferences to bring up the Preferences control panel and then click the Use QSAR Activity Data button. This tells Receptor to weight the contribution of each model by its biological activity. By default, compounds that are highly active contribute more to the receptor surface model than less active ones (although this can be reversed).

Next click the GENERATE button of the Generate Receptor Model control panel. This creates a receptor surface model. Note the Activity weight information messages in the textport.

The shape of the receptor surface model visibly follows the shapes of the contributing models, but the receptor model also has an electrostatic component. The potential exerted by each molecule (atomic point charge model) is calculated at each point on the receptor model surface. These potentials are summed and may be displayed by clicking the ELECTROSTATIC Map Property button. You can make the colors more vivid by resetting the Min and Max values.

5. Construct a new lead compound

Next you will construct a hypothetical hydroxylase inhibitor. Load the model Cerius2-Resources/EXAMPLES/DBH/dbh33.msi. Make all the models invisible except this new model, dbh33. Using the 3D-Sketcher, change both of the hydrogen atoms that are ortho to the fluorine atom into fluorine atoms, creating a 3,4,5-trifluoro derivative. Clean up the structure with the Clean button on the Sketcher control panel.

6. Evaluate the new compound in the receptor model

Now you can evaluate the new molecule in your receptor model. Click Evaluate Molecules. When the Evaluate Receptor Model control panel appears, click TOTAL ENERGY. The energies of interaction (E interact) between each molecule and the receptor surface model are displayed in the table of the Evaluate Receptor Model panel. Your new lead interacts with an energy of approximately -9 kcal, which suggests that it is a little less active than the two most active compounds in the series.

Make the receptor surface model visible once more by clicking the Visible buttons of the Cerius 2 Model Manager. The interaction between the new lead and the surface is mostly favorable (purple areas), with only one region of unfavorable interaction: the part of the surface colored green complements a phenolic proton in most of the contributing models (including three of the four most active compounds). The fluorine atom is even more electronegative than the phenolic oxygen atom, so it interacts favorably with the part of the surface nearest to it, causing an intensely purple area.

We can show that these interactions are primarily electrostatic by coloring the surface by VDW energy. Click the VDW ENERGY Map Property button. Note that there are no green or purple areas near the para substituent, since F is about the same size as O. Sterically, our new lead is a good fit, but electrostatically the absence of a positively charged hydrogen atom causes unfavorable interactions that reduce biological activity.

7. Finish up

The QuickStart lesson is now over. To end the Cerius 2 session, close all open panels and select File/Exit from the Visualizer menu bar. If you want to go on to another tutorial, or use Cerius 2 to run an experiment, first close all panels and select File/New Session from the Visualizer menu bar.
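The electrostatic component described in step 4, where each molecule's atomic point charges exert a potential at every surface point and the contributions are summed, can be sketched as below. This is only an illustration of a plain Coulomb sum: the function names, the data layout, the conversion constant, and the absence of any dielectric or activity weighting are all assumptions, not details taken from Receptor.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); the choice of units is an assumption.
COULOMB_KCAL = 332.0637


def potential_at_point(point, atoms):
    """Potential at one surface point from a molecule's point charges.

    point: (x, y, z) in Angstrom.
    atoms: iterable of (x, y, z, charge) tuples, charge in electron units.
    """
    total = 0.0
    for x, y, z, charge in atoms:
        r = math.dist(point, (x, y, z))
        if r == 0.0:
            raise ValueError("surface point coincides with an atom")
        total += COULOMB_KCAL * charge / r
    return total


def summed_surface_potential(surface_points, molecules):
    """Sum the potentials of all template molecules at every surface point."""
    return [
        sum(potential_at_point(p, atoms) for atoms in molecules)
        for p in surface_points
    ]


if __name__ == "__main__":
    # Two made-up single-atom "molecules" with opposite charges.
    mols = [[(1.0, 0.0, 0.0, 0.4)], [(-1.0, 0.0, 0.0, -0.4)]]
    print(summed_surface_potential([(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)], mols))
```

In this toy case the two contributions cancel at points equidistant from both charges, which is the kind of cancellation that produces neutral regions on a summed map.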

7. Tutorial: Receptor Surface Analysis (RSA)

This chapter shows how to use the new Receptor Surface Analysis functionality in QSAR.

    Construct a Receptor Surface Model provides specific instructions on how to construct a Receptor Surface Model from a set of aligned biologically active models.
    Use the new QSAR descriptor Receptor_RSA provides specific instructions on how to add energies evaluated at each vertex of the Receptor Surface Model to the Study Table.
    Construct a QSAR relationship tells the user how to select a subset of the added surface points on the basis of highest variance and use these points to derive a QSAR equation that identifies the most significant points on the surface.

Before you begin

To complete this tutorial, you need a licensed copy of Cerius 2 that includes these modules:

    QSAR+
    Receptor

Cerius 2 Tutorials Life Science/April 1997

Load Models and add them to the Study Table

In this lesson you will add molecules to the QSAR Study Table and enter activity data.

Load the beta hydroxylase inhibitor model files from the Cerius2-Resources directory. Starting from a new Cerius 2 session, load all 47 beta hydroxylase inhibitors from the directory Cerius2-Resources/EXAMPLES/DBH/, files dbh02.msi through dbh52.msi.

To clean up the models with the Receptor clean method, select DRUG DISCOVERY from the application pulldown menu, then pick MODEL RECEPTOR. Click on Evaluate Molecules to bring up the Evaluate Receptor Model control panel. Click on the EVALUATE button. This will optimize the models with the Receptor clean utility.

To add the models to the Study Table, bring up the Study Table and select Molecules... from the Preferences pulldown. When the Molecule Preferences panel appears, deselect the Add Hydrogens, Minimize Energy, and Calculate Charges buttons, then select Add All from the Molecules pulldown menu.

All 47 models will be added to the Study Table.

To import biological activity data, select Import... from the File pulldown menu. When the Import Table panel appears, select File Contains Column Labels. Then read in the dbhactivity.dat file from the Cerius2-Resources/EXAMPLES/DBH directory.

Select the column of activity data by clicking on the column heading -log(ic50). Copy this column of data by clicking on the COPY button in the Table Manager. To transfer this column to the Study Table, click on the column heading Activity, then select Paste from the Edit pulldown of the Study Table. Insert the -log(ic50) column before the blank Activity column, then select Delete from the Edit menu to delete the blank.

Generate a Receptor Surface Model (RSM)

Now we will create a Receptor Surface Model using a subset of the most active models. All the models except the last one loaded (dbh52) should be invisible at the moment. Make models dbh51 through dbh49 visible by clicking on the Visible buttons in the Model Manager.
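The -log(ic50) activity scale and the idea of building the model from only the most active compounds can be sketched as follows. The helper names are hypothetical, the IC50 values are assumed to be molar concentrations, and the activity numbers in the example are invented placeholders rather than values from dbhactivity.dat.

```python
import math


def neg_log_ic50(ic50_molar):
    """Convert an IC50 (assumed to be in mol/L) to -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def most_active(activities, count):
    """Return the names of the `count` compounds with the highest -log(IC50).

    activities: dict mapping compound name -> -log(IC50) value.
    """
    ranked = sorted(activities, key=activities.get, reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    # Placeholder compounds and activities, for illustration only.
    table = {"mol_a": 4.2, "mol_b": 6.1, "mol_c": 5.0, "mol_d": 5.7}
    print(most_active(table, 2))
```

On this scale a larger number means a more potent compound, so sorting in descending order puts the candidates for the template subset first.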

Next, click on Generate Receptor Model in the deck of cards menu for MODEL RECEPTOR. When the Generate Receptor Model control panel appears, click on PREFERENCES... and select the Use QSAR Activity Data button.

Now click on the -log(ic50) column header in the Study Table, then select Set Y under the Variables pulldown. This identifies the -log(ic50) column as the dependent (Y) variable. Receptor will use this activity data to weight the contribution of each molecule to the Receptor Surface Model.

We want to build the RSM from only the most active compounds, so set the Apply To Molecule(s) popup menu to All Visible. Finally, click on the GENERATE button of the Generate Receptor Model panel to create the Receptor Surface Model (RSM).

Add Receptor Surface energies to the Study Table

The energy of interaction between the active models and each point on the surface will now be added to the Study Table. This allows us to identify places on the surface that are related to biological activity, and to use these to calculate a QSAR.

Choose Select... from the Descriptors pulldown. When the Descriptors panel appears, select Receptor from the family popup menu, then click on the preferences button Receptor...

Two panels will appear, which control the addition to the Study Table of energies of interaction between the receptor surface model and the set of molecules. The second panel deals with the surface-molecule interactions at each point on the surface, and the first panel deals with the sum of these interactions. For example, if selected, the button Nonbonded TOT Energy between Molecule and Receptor adds to the Study Table the sum of the VDW and electrostatic energies of interaction over all points on the surface, i.e., one number for each molecule.

If the TOT button in the RSA Preferences is selected, then the VDW and electrostatic energies at each point will be added. This may be thousands of columns of data, one for each point on the surface. Unless we filter this data, a great many columns will be added, probably too many for the statistical methods we may want to apply later. Therefore the Filter Surface Points popup menu in the RSA Preferences panel lets us reduce input to the Study Table by adding only every nth point, or by selecting points on the basis of variance or correlation.

The default method for filtering Study Table input is to enter only ten percent of surface points, choosing those with highest variance. It is hard to decide before running the calculation what filtering method should be used, so a good place to start is to select the Add Every Nth surface point option.

Next, click on the button under the ADD button in the Descriptors panel, and then click on row 23 Receptor_RSA to select the RSA descriptor. Finally, click on the ADD button. The Receptor Surface energies will be added to the Study Table and you will see columns added on the right (TOT/1, TOT/11, etc.).

We can see which points on the surface have been used by selecting Equation Viewer... and, when the panel appears, clicking on the Show selected points More... button.
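The two filtering ideas described above, keeping every nth surface point or keeping the ten percent of points with the highest variance across molecules, can be sketched as follows. The function names and the data layout (one list of energies per surface-point column) are assumptions made for illustration; the actual filtering in the RSA Preferences panel may differ in detail.

```python
from statistics import pvariance


def every_nth_point(labels, n):
    """Keep every nth column label, starting with the first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return labels[::n]


def highest_variance_points(columns, fraction=0.10):
    """Keep the given fraction of columns, ranked by variance across molecules.

    columns: dict mapping column label -> list of energies, one per molecule.
    At least one column is always kept.
    """
    keep = max(1, int(len(columns) * fraction))
    ranked = sorted(columns, key=lambda label: pvariance(columns[label]), reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    labels = [f"TOT/{i}" for i in range(1, 31)]
    print(every_nth_point(labels, 10))  # TOT/1, TOT/11, TOT/21
```

A column whose energy is nearly the same for every molecule cannot explain differences in activity, which is the rationale for ranking by variance before running a statistical method.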

Change the 3D-QSAR Labels popup menu from MFA to RSA, then click on Label Independent Variables. If you have not already done so, put the Cerius 2 Visualizer into Overlay mode so that all the objects are superimposed in the graphics window. The chosen independent X variables show up as crosses. Since we chose the Every Nth surface point option, the crosses should be evenly distributed over the surface.

Calculate a QSAR

Select the GFA statistical method, and increase the number of cross-overs by selecting Statistical Method from the Preferences pulldown of the Study Table and increasing the Generations parameter.

Next we must include non-linear terms in the expression, so click on the Configure GFA... button and, when the Configure GFA panel appears, select the Spline, Quadratic, Offset Quad, and Quad Spline buttons. Click on RUN to start the calculation. After a few moments the Equation Viewer will appear.

Use the 3D-QSAR Labels menu to show the surface points selected by the GFA method as crucial in the QSAR by clicking on Label Current equation. The most important point will have the largest Loading Coefficient (the number in square brackets).

This concludes the tutorial.

8. Generating a Receptor Model

To evaluate new structures for potential biological activity, you must first generate a receptor surface model from the overlay of known active compounds.

This chapter explains

    Generating a receptor surface model
    Saving and restoring a receptor model
    Listing a receptor model
    Mapping property values

Before you begin

You must have one or more aligned structures displayed in the model window (see Part One, Alignment, Part Three, Pharmacophore, or the Cerius 2 QSAR+ discussions of Molecular Shape Analysis). If you have a large set of active compounds, it is probably best to use a subset of the most active compounds to build your receptor surface model.

You can now display the properties stored at each vertex (electrostatic potential, partial charge, hydrophobicity, and hydrogen bonding propensity).

You can generate structures to align and use in creating a receptor surface model through one of several methods:

    Build one or more structures using the Cerius 2 3D Sketcher or a Cerius 2 Builder (see the Cerius 2 Modeling Environment and the Cerius 2 Builders)
    Use the Analog Builder to generate a series of analogous structures (see the Cerius 2 Builders)
    Import previously built structures using the Load Model control panel accessed from the File pulldown on the menu bar (see the Cerius 2 Modeling Environment)


To begin the receptor generation process

Choose Generate Receptor Model from the Model Receptor menu card in the Drug Discovery card deck. The Generate Receptor Model control panel appears.

Generating a receptor surface model

When you generate a receptor surface model, the receptor surface that you see in the model window is a semi-transparent surface that surrounds the template structures. The molecules within the receptor are visible and can be edited, transformed, or otherwise manipulated using standard Cerius 2 tools. The model and the structures within it also can be manipulated together.

To generate a receptor surface model

1. Select the structures that you want to use as templates for generating a receptor surface model. To make this selection, use the Apply To Molecule(s) popup on the Generate Receptor Model control panel. You can select all the structures listed in the model table (All), the models that are visible in the viewing window (All Visible), or only the current structure (Current). All is the default selection.

2. If you want to adjust generation preferences, click Preferences. The Preferences control panel is displayed. For more information, see Setting preferences.

3. Use the Solvent Atoms popup to specify any structures or parts of structures that are exposed to solvent. You can choose no exposure (None), exposure of selected atoms (Selected), or exposure of a group (Group) of atoms that you have defined. None is the default selection. For information on making atom selections, see the Cerius 2 Modeling Environment. If parts of a structure are accessible to solvent, no surface is generated in that region and the receptor model is open.

4. Make adjustments to the model display. Use the Attributes of Receptor Model popup to specify the model that you want to affect.

5. If appropriate, increase or decrease the tightness of fit of the model with respect to the template structures. Make adjustments using the Surface Fit slider bar. Surface fit is defined by the distance in angstroms of the model from the van der Waals surface of the template molecules. The default distance is 0.10 Å. The slider can be adjusted from negative values (closer fit) up to 1.00 Å (looser fit). The fit can be adjusted at any time after the receptor model is generated by moving the slider.

When you change the fit of the receptor model, you affect the outcome of new structure evaluation. Both the geometry of a new structure in the receptor surface model and its interaction score can be affected. For example, if you generate a receptor model using the boat geometry of cyclohexane, the chair geometry can sit within the receptor if the surface fit is greater than or equal to zero. However, the interaction score will reflect the strain of this fit.

If you lower the surface fit to less than zero, the molecule changes from the chair geometry to the boat geometry in order to fit in the receptor with the tolerance you have specified. The interaction score reflects the change to the more favorable geometry.

6. Adjust the transparency of the receptor surface model by moving the Transparency slider bar. The percentage transparency is indicated in the field to the right of the slider bar. The default value is 50.0 percent.

7. Specify the property you want to map to the receptor surface by selecting one of the Map Property choices (see Mapping property values).

8. Click Generate. The receptor surface model is generated and displayed in the model window as a smooth semi-transparent surface contoured around the template molecules. If any property is mapped, the surface is colored according to the mapped property rather than the default color. The receptor model name is listed in the Model table and in the Using Receptor Model popup on the Evaluate Receptor Model control panel.

Molecules are visible within the receptor when Model Receptor is in overlay mode and transparency is greater than zero. If you have no atoms accessible to solvent, the receptor surface completely encloses the molecules. If some atoms of the molecules are solvent-accessible, the receptor surface is open in the area of these atoms.

Setting preferences

This section describes additional preferences that you can set for generating receptor surface models.

To set preferences

1. Click Preferences on the Generate Receptor Model control panel. The Preferences control panel is displayed.

2. In the Generate Preferences section, check the Use QSAR Activity Data check box if you want Receptor to use activity data from the Study Table for the molecules that you have specified as templates to generate a receptor model. The activity data from QSAR is used to weight the relative importance of each template structure in generating the receptor model. If the box is not checked, all template molecules are assumed to have equal activity when the receptor model is generated. This is the default. Note that the Set Independent column (with Activity Data) must be set before you check this box.

3. Check the Include Solvation Correction check box if you want Receptor to add a penalty function when polar atoms are placed in hydrophobic regions of the receptor surface model. This box is not checked by default.

4. Choose the type of receptor surface you want Receptor to generate by making a selection in the Receptor Surface Type subpanel. The choices are:

    van der Waals: This is the default selection. When you make this selection, the receptor surface characterizes the van der Waals shape of the molecules. Atom positions of the aligned molecules are clearly defined.
    Soft: When you make this selection, the receptor surface is a more abstract representation of shape. The surface is much smoother and hides individual atom details. Generation of this receptor surface is based on the Wyvill field function for soft objects.

Saving and restoring a receptor model

Receptor models can be saved and reloaded at a later time for further work with the same or different structures.

To save a receptor model

With the model you want to save set in the Model Manager as current and visible, select Save Receptor Model from the Model Receptor menu card. Use the File Selector tools to select and save the model. When you select a model from the browser box, the name of that model is entered in the filename field. Because receptor surface model files can be quite large, it may take some time for the file to be saved.

To restore a receptor model

Select Load Receptor Model from the Model Receptor menu card. A File Selector appears. Use the File Selector tools to select and load the model you want to display. Because receptor surface model files can be quite large, it may take some time for the model to be restored and displayed.

Listing a receptor model

The property values stored at each point on the receptor surface model may be listed to the textport, to a table, or to a file.

To list a receptor model

Choose the receptor surface model you wish to list with the List receptor model parameter. After choosing the destination (TEXTPORT, TABLE, or FILE), click LIST. When written to the textport, the listed data forms columns.

    Surface Coordinates
    point   X   Y   Z   mol   atom   charge   esp   hbond   hydrophob

Column 1 is the number of the surface point, followed by its Cartesian coordinates in columns 2, 3, and 4. The number of the molecule (in the bundle of aligned molecular models used to construct the receptor surface) and the number of the atom in the molecule follow in columns 5 and 6. The next four columns hold the properties partial charge, electrostatic potential, hydrogen bonding strength, and hydrophobicity.

Mapping property values

This section describes the properties of the receptor surface model that can be mapped, and the procedure for generating a map.

Property values of the receptor are mapped onto the displayed receptor surface. One property can be mapped at a time. When a different property is selected, the map is automatically updated. Property maps are displayed as color regions of the receptor surface. These properties reflect the anticipated characteristics of the receptor that is being modeled. The intensity of the color reflects the magnitude of the mapped property at a particular location. Properties that can be mapped include the following:

    Nothing: No property is mapped. The receptor surface is transparent or translucent white.
    Electrostatic potential: When this property is mapped, each surface vertex is colored according to the potential value at the vertex position. Red areas have negative electrostatic potential, blue areas have positive potential, and white areas have neutral potential.
    Charge: When this property is mapped, the surface color is based on the average of the charges of the template atoms closest to the receptor surface. Red areas are positively charged, blue areas are negatively charged, and white areas are neutral.

    Hydrogen Bonding: When this property is mapped, the color indicates the tendency for specific areas of the surface to act as hydrogen bond donors (purple) or acceptors (light blue). Areas of the model with no hydrogen bonding activity are colored white.
    Hydrophobicity: When this property is mapped, the surface is colored brown to map the hydrophobic areas of the model. Areas that are not hydrophobic relative to the scale you have set on the panel are white.

To map a property

1. Generate a receptor model (see the previous section).

2. Specify the property you want to map to the surface of the receptor by selecting one of the display choices from the Map Property box on the Generate Receptor Model control panel (that is, Nothing, Electrostatic, Charge, H Bonding, or Hydrophobic). The model automatically displays the map. There is no need to click the Generate button again.

3. If you want to adjust the value range of the property map, enter values in the entry boxes of the color legend located to the right of the property list in the control panel, then click on the radio button for that property. The box on the left side of the legend displays the minimum value for the property. The box to the right displays the maximum value. The range you set is relative, and simply adjusts the color display relative to that range. For hydrogen bonding, the acceptor box is on the left and the donor box is on the right.

If you want to map another property, select another display choice. A new mapping is automatically displayed using the current minimum and maximum values.
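How a legend range turns a property value into a color can be sketched for the electrostatic potential map, where red is negative, blue is positive, and white is neutral. The linear ramp, the clamping at the legend limits, and the function name are assumptions made for illustration; the text only states the color convention and that intensity follows magnitude.

```python
def esp_color(value, vmin, vmax):
    """Map a potential value to an (r, g, b) triple with components in 0..1.

    White at zero, shading to full red at vmin (negative limit) and to
    full blue at vmax (positive limit). Values beyond the limits are clamped.
    """
    if not vmin < 0 < vmax:
        raise ValueError("expected vmin < 0 < vmax")
    if value < 0:
        t = min(value / vmin, 1.0)  # 0 at neutral, 1 at or beyond vmin
        return (1.0, 1.0 - t, 1.0 - t)
    t = min(value / vmax, 1.0)      # 0 at neutral, 1 at or beyond vmax
    return (1.0 - t, 1.0 - t, 1.0)


if __name__ == "__main__":
    # The same value looks more saturated when the legend range is narrower.
    print(esp_color(-0.5, -1.0, 1.0))
    print(esp_color(-0.5, -0.5, 0.5))
```

Under this kind of ramp, narrowing the minimum and maximum values makes a given potential map to a more saturated color, which is consistent with the range being relative rather than absolute.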

9. Evaluating Structures Against a Receptor Model

Receptor models are used to evaluate the biological activity of new molecular models.

This chapter explains

    Evaluating structures
    Viewing the results
    Incorporating receptor data in QSARs

Before you begin

You should have a receptor model generated for evaluating structures. However, you can evaluate structures without a receptor model by running the energy evaluation algorithm alone. You also must have structures created, aligned, and loaded into the model table. You can generate structures using several methods, for example:

    Build one or more structures using the Cerius 2 3D Sketcher or a Cerius 2 Builder (see the Cerius 2 Modeling Environment and the Cerius 2 Builders)
    Use the Analog Builder to generate a series of analogous structures (see the Cerius 2 Builders)
    Import previously built structures using the Load Model control panel accessed from the File pulldown on the menu bar (see the Cerius 2 Modeling Environment book)

The evaluation process attempts to fit structures to the receptor model. It adjusts the geometry of structures to fit the model in the orientation in which they are placed within the model.

When the evaluation process is complete, you can examine several aspects of the results. Visual examination gives information on structural changes that may improve the fit into the receptor site. Property interaction maps give you additional visual information.


You also can evaluate the quality of the fit for each structure in the receptor site by analyzing the numerical data that is generated.

Since the evaluation process is a fitting algorithm and does not search for the best orientation of the molecule, you must correctly set the gross orientation of a new structure relative to the receptor model. The PreAlign checkbox in Preferences automatically aligns molecules to a surface prior to evaluation, or you can use alignment procedures to orient your structures properly.

To start the evaluation

Choose Evaluate Molecules from the Model Receptor menu card in the DRUG DISCOVERY card deck. The Evaluate Receptor Model control panel is displayed.

[Figure: Evaluate Receptor Model control panel. Callouts: "Energy data is displayed here" and "Interaction property map choices".]

Evaluating structures

Evaluate aligned structures with respect to a receptor model you have generated and selected. If the Minimize Molecules During Evaluation check box is checked, the evaluation procedure also adjusts the geometry of the structures into best-fit configurations based on the constraints imposed by the receptor model. If the Pre-Align Molecules to Surface box is checked, the molecules are first oriented into the surface.

To evaluate a structure

1. Select one or more structures that you want to evaluate. To make this selection, use the Apply To Molecule(s) popup on the Evaluate Receptor Model control panel. You can select all the structures listed in the model table (All), the structures that are visible in the model window (All Visible), or only the current structure (Current).

2. Select the receptor model that you want to use in the evaluation process. To make this selection, use the Using Receptor Model popup on the Evaluate Receptor Model control panel. If you want to run the energy evaluation algorithm without the influence of a receptor model, select None.

3. Indicate if you want model constraints imposed on each structure. To do this, make sure that Minimize Molecules During Evaluation is checked. This is the default setting. If it is not checked, the static conformation of the structure is evaluated. E inside is the energy of the static structure.

4. If you want to adjust additional evaluation preferences, click Preferences. The Preferences control panel is displayed. For more information, see Setting preferences.

5. Specify your choice of structure/receptor interactions to map on the surface of the receptor model.

6. Interaction values of the receptor with the structure are mapped in color onto the displayed receptor surface. The map indicates favorable (magenta) and unfavorable (green) interactions. One property is mapped at a time. When you select a new property from the control panel, the map is not automatically updated. To update the model, click Evaluate. Properties that can be mapped include: electrostatic energy, van der Waals energy, and total energy (electrostatic plus VDW). You can also display no interaction map (Nothing).

Additionally, you can adjust the range of mapped values by entering minimum and maximum values in the color legend to the right of the interactions list. Use this feature to highlight small differences in energy.

7. Click Evaluate. If you have checked Minimize Molecules During Evaluation, the geometry of the structure is adjusted to fit the constraints of the model. If the checkbox is not checked, no adjustment is made. The property interaction you have chosen is mapped as colored regions on the receptor surface. Also, on the Evaluate Receptor Model control panel, values for Energy, E strain, and E interact are listed along with the molecule name in the table.

Setting preferences

This section describes additional preferences that you can set for generating receptor models.

To set preferences

1. Click Preferences on the Generate Receptor Model control panel. The Preferences control panel is displayed. The Preferences panel is used to change generation and evaluation defaults. (See Hahn and Rogers (1995b) for a description of van der Waals and soft surfaces.)

2. Check Include Solvation Correction if you want Receptor to add a penalty function when polar atoms are placed in hydrophobic regions of the receptor surface model. This box is not checked by default. When the box is checked, if the fraction of receptor surface hydrophobic points to total points in proximity to a polar atom is greater than 0.90, then a correction energy of 0.3 kcal/mol Å² is added. This energy is proportional to the exposed surface area of the polar atom. The term is added to the total energy after minimization is complete. Use of a solvation correction term is based on the assumption that any electrostatic interactions between a molecule and surrounding water molecules will be replaced by similar electrostatic interactions when the molecule is bound.

3. Choose the type of receptor surface you want Receptor to generate by making a selection in the Receptor Surface Type subpanel. The choices are:

van der Waals
This is the default selection. When you make this selection, the receptor surface characterizes the van der Waals shape of the molecules. Atom positions of the aligned molecules are clearly defined.

Soft
When you make this selection, the receptor surface is a more abstract representation of shape. The surface is much smoother and hides individual atom details. Generation of this receptor surface is based on the Wyvill field function for soft objects.

4. Adjust the parameters used to evaluate energy by making a selection in the Evaluate Energy Using subpanel. The choices are:

Partial Charge Complementarity
Computes electrostatic energy assuming that the receptor surface has a charge that is equal in magnitude and opposite in sign to the partial charges of the atoms used in the template molecules.

Electrostatic Charge Complementarity
Computes electrostatic energy by giving surface charges corresponding to the electrostatic potential (ESP) expected on the surface, assuming that the ESP value will be equal and opposite to the ESP of the template molecules. This is the default selection.
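The solvation correction in step 2 reduces to a threshold test followed by a multiplication. The sketch below is one way to write that rule down, using the 0.90 fraction and the 0.3 kcal/mol Å² constant quoted above. The function name, its arguments, and the handling of an atom with no nearby surface points are assumptions; how Receptor counts points "in proximity" or measures exposed area is not specified in the text.

```python
HYDROPHOBIC_FRACTION_THRESHOLD = 0.90  # from the text
DESOLVATION_CONSTANT = 0.3             # kcal/mol per square angstrom, from the text


def solvation_correction(n_hydrophobic_points, n_total_points, exposed_area):
    """Penalty (kcal/mol) for one polar atom near the receptor surface.

    Hypothetical helper. `exposed_area` is the atom's exposed surface area
    in square angstroms; the point counts refer to surface points near the atom.
    """
    if n_total_points <= 0:
        # Assumption: no nearby surface points means no correction.
        return 0.0
    fraction = n_hydrophobic_points / n_total_points
    if fraction > HYDROPHOBIC_FRACTION_THRESHOLD:
        return DESOLVATION_CONSTANT * exposed_area
    return 0.0
```

Because the manual states that the term is added after minimization is complete, a function like this would be applied to the final geometry only and would not influence the fit itself.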

You have completed this activity.

Viewing the results

When the evaluation process is complete, you can examine several aspects of the results. The energy evaluation algorithm attempts to alter the geometry of the structure into a conformation consistent with the surface model. Visual examination of the structures gives information on the structural changes that improve the fit into the receptor site. Property interaction maps generated for each property give you additional visual information about where favorable and unfavorable contacts are located. You also can evaluate the quality of the fit for each structure in the receptor site by analyzing the numerical data that is generated and displayed in the Evaluate Receptor Model control panel.

Energy evaluation algorithm

The energy evaluation algorithm uses a fast, approximate force field to produce reasonable geometries and energies quickly. The force field is general and does not rely on specific force field atom types. Only element type, hybridization, and bond type are used to describe the energy of the system. For more information about the energy evaluation algorithm, see Hahn and Rogers (1995b).

The Evaluate Receptor Model control panel includes a data table which lists all the structures that have been evaluated and the following data points for each molecule:

Energy
This term reports the internal strain energy of a molecule as it sits in and is constrained by the receptor site.

E strain
This term reports the internal strain energy of a molecule as it sits in the receptor site without being subject to receptor model constraints. This value should always be less than or equal to the Energy term.

E interact
This term reports the interaction energy of the molecule with the receptor. The more negative the value, the greater the interaction between the molecule and the receptor. E interact is the sum of van der Waals and electrostatic interactions.
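If the table values are copied out of the panel for further analysis, the relationships among them can be captured in a small record type. The sketch below assumes a hypothetical `EvaluationRow` container (it is not a Cerius 2 data structure) and computes the strain attributable to the receptor constraints as Energy minus E strain, along with a check that E strain does not exceed Energy.

```python
from dataclasses import dataclass


@dataclass
class EvaluationRow:
    """One row of the results table (hypothetical container for illustration)."""
    name: str
    energy: float      # "Energy": strain energy with receptor constraints applied
    e_strain: float    # "E strain": strain energy without receptor constraints
    e_interact: float  # "E interact": van der Waals plus electrostatic interaction

    def imposed_strain(self) -> float:
        """Strain attributable to the receptor constraints."""
        return self.energy - self.e_strain

    def is_consistent(self, tol: float = 1e-6) -> bool:
        """E strain should be less than or equal to Energy."""
        return self.e_strain <= self.energy + tol
```

A small imposed strain together with a strongly negative E interact would suggest a good fit, but what counts as small or strongly negative depends on the surface fit used to build the model.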

The difference between Energy and E strain indicates the amount of strain imposed on the candidate structure by the receptor constraints. The value of this difference and the value of E interact together can be used to judge the quality of the fit of each structure in the receptor site.

Note that reported values reflect the surface fit that you chose when you generated your receptor model. If the fit that you chose was loose (greater than 0.1 Å), a candidate structure may appear less strained and show less interaction. If the fit was tight (less than 0.1 Å), strain and interaction of a candidate structure both may appear higher. You can obtain reasonable energies by running an evaluation with the Surface Fit parameter set to 0.10 Å. This is the default setting.

Next steps

From this point, there are several options open to you to generate additional information on potentially useful structures. For example, you can:

- Change the interaction property that is mapped on the receptor surface to get more information for analyzing the interaction between the structure and the receptor.
- Use the 3D Sketcher to modify an evaluated candidate structure outside the receptor model (unconstrained), then run an evaluation procedure on the new structure.
- Use other Cerius 2 tools to modify a candidate structure (change orientation, add atoms or functional groups) as it sits in the receptor site (constrained) and then reevaluate the structure.
- Use the receptor model to build new, reasonable, optimized structures.
- Generate another receptor model using new or modified structures. You can then evaluate the group of candidate structures against the new model.

Incorporating receptor data in QSARs

The interaction energy calculated by Receptor between a molecule and a receptor surface model may be used in the QSAR+ application to develop a QSAR; that is, the goodness of fit between a candidate structure and a receptor surface model may be correlated with activity data.

Before you begin

Align all the models that you intend to use in receptor surface modeling. You must know the biological activity of all the molecules you intend to use. Use the most active molecules as templates for the receptor model. Use the model with highest activity as the target model for alignment.

Start the process by generating a receptor surface model. For information on this procedure, see Generating a Receptor Model. When you have generated a receptor surface model, save it. For information on saving, see Saving and restoring a receptor model.

Building a QSAR using receptor data

When you have generated a receptor surface model and have aligned the models you want to study, you can proceed to build a QSAR using data from the receptor-structure interactions. This section describes the procedure.

To build a QSAR using Receptor data

1. If it is not displayed in the Cerius 2 Models window, load and display the receptor model that you have calculated and the most active model that you used for a template. Make sure that the molecule orientation is the same as that used to construct the receptor surface model.

2. Evaluate all models relative to the receptor surface model. For information on this procedure, see Evaluating structures.

3. Create a Study Table and add the structures you are studying in Receptor. For information on this procedure, see Cerius 2 QSAR+.

4. Add receptor data to the Study Table by selecting the descriptor Receptor_energies. For information on this procedure, see Cerius 2 QSAR+.

5. Calculate a QSAR using the Genetic statistical method (this procedure generates the most reliable QSAR data). For information on generating a QSAR, see Cerius 2 QSAR+. Assuming the receptor model is predictive, the significant QSAR equations that are generated contain one or more of the receptor energy descriptors.

6. Save the QSAR equation. For information on this procedure, see Cerius 2 QSAR+.

Using a QSAR with receptor data to estimate activity of a structure

You can use a QSAR in which you have incorporated receptor data to estimate the biological activity of a structure relative to the receptor surface model you have built. This section describes the procedure. It assumes that you have generated a receptor surface model, that you have used evaluation data from the model to generate a QSAR equation, and that the QSAR has been saved. Both the receptor surface model and the QSAR equation are used in the procedure.

To estimate biological activity

1. If it is not displayed in the Model window, load and display the receptor surface model that you have calculated and the most active model that you used for a template. Make sure that the model orientation is the same as that used to construct the receptor model.

2. Create or load the model you want to use to predict activity.

3. Align the new model using the most active analog as the target for the alignment. The orientation of the target model should be the same as the orientation used to construct the original receptor model.

4. Evaluate the new model in relation to the receptor surface model. Make sure that the new model is the current model, then set the Apply To Molecule(s) field of the Evaluate Receptor Model control panel to Current. Make sure the Minimize Molecules During Evaluation checkbox is toggled on, then click the EVALUATE button so that the interaction energy is added to the Energy table of the Evaluate Receptor Model control panel.

5. Add the new model to the Study Table. Make your new structure the current model in the Model Manager and select Add Current from the Molecules pulldown of the Study Table. To determine the predicted activity of the new model, click the ADD button of the Descriptors panel. The energies and predicted activity of the new model are added to the Study Table.
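The underlying idea of the two procedures above, correlating a receptor energy with measured activity and then applying the resulting equation to a new structure, can be illustrated outside Cerius 2 with a one-descriptor least-squares line. This is a deliberately simplified stand-in: it is not the Genetic statistical method used by QSAR+, and the function names are hypothetical.

```python
def fit_line(energies, activities):
    """Ordinary least-squares fit of activity = slope * energy + intercept.

    Simplified stand-in for a QSAR equation with a single receptor-energy
    descriptor; it does not reproduce the QSAR+ Genetic method.
    """
    n = len(energies)
    if n != len(activities) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(energies) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in energies)
    if sxx == 0.0:
        raise ValueError("energies must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(energies, activities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(slope, intercept, energy):
    """Apply the fitted equation to the receptor energy of a new structure."""
    return slope * energy + intercept
```

As in the manual's procedure, a prediction is only meaningful if the new structure was aligned and evaluated against the same receptor model, in the same orientation, as the structures used for the fit.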


10 Theory: Receptor Models

It is common to have biological activity data for a set of compounds that bind to an enzyme or antibody but not to have any firm knowledge of the 3D structure of the active site. One can deduce a hypothetical model of the binding site known as a receptor site model, and this model ought to be predictive and sufficiently reliable to guide the medicinal chemist in the design of novel compounds.

Receptor surface models

Receptor site models are different from pharmacophore models, which postulate a 3D arrangement of atoms recognizable by the active site in terms of the similarity of functional groups common to the set of binding molecules. In contrast, receptor site models do not contain atoms, but try to directly represent the essential features of an active site by assuming complementarity between the shape and properties of the receptor site and the set of binding compounds.

The Receptor application uses 3D surfaces that define the shape of the receptor site by enclosing the most active members (after appropriate alignment) of a series of compounds. Note that errors in alignment can lead to incorrect, poorly predictive receptor surface models.

The surface is generated from a Shape Field. The atomic coordinates of the contributing models are used to compute field values on each point of a 3D grid. The two functions used to calculate the field values are a van der Waals function (Eq. 16) and a Wyvill soft object function (Eq. 17).

$$V(r) = r - \mathrm{VDW}_r \qquad \text{(Eq. 16)}$$

where r = distance from the atomic coordinate to the grid point and VDWr = van der Waals radius of the atom. This allows a surface to be constructed at any arbitrary isovalue of density on the grid. For example, at the van der Waals surface, V(r) = 0.

$$V(r) = -\frac{4 r^6}{9 R^6} + \frac{17 r^4}{9 R^4} - \frac{22 r^2}{9 R^2} + 1 \qquad \text{(Eq. 17)}$$

The Wyvill function (Eq. 17) is also a function of r but it is bounded and decays completely within the distance R, that is, V(R) = 0. R is set to be twice the van der Waals radius of each atom type. The Wyvill shape function gives smoother surfaces, whereas the van der Waals function produces a molecule-shaped surface.

The isovalue chosen is called the surface fit, and the Marching Cubes algorithm is used to create a triangulated surface with an average surface density of about six points per square angstrom and an average distance between points of about 0.5 angstroms.

Mapping properties

Properties are stored with each surface point of the receptor model. The first property stored is the partial charge, which would be desirable in that position in the receptor so as to maximize electrostatic attraction between the model and surface, based on the assumption that the charge at any point on the surface is complementary to the partial atomic charge of any atom in contact with the surface. If the receptor model is constructed from a single molecule, each surface point is given a charge that is equal but opposite in sign to the closest atom in the molecule. If the receptor model is constructed from an aligned bundle of models, each surface point is given a charge that is equal and opposite in sign to the average partial charge of the closest atoms in each molecule. This also assumes that each model contributes equally.

The second property stored at each vertex is an electrostatic potential. Again the assumption of complementarity is made.
If the receptor model is calculated from a single molecule, each surface point is given a potential that is equal but opposite to the potential calculated from summing:

$$\frac{Q_a Q_b}{r^2} \qquad \text{(Eq. 18)}$$

for all atoms, where r = distance between point and atom, Q a = charge on atom, and Q b = charge on point. As before, if a surface is constructed over a bundle of molecules, the complementary potential is calculated as the average of the potentials exerted by each molecule in the bundle.

Third, a hydrogen bonding value is stored at each surface point, found by projecting a cone away from each hydrogen bond donor or acceptor atom. There is no manipulation of proton positions to take into account O-H rotameric conformations. Acceptors are defined at either O or N atoms with a free pair of electrons, and donors are any hydrogens attached to oxygen or nitrogen.

Fourth, each point is classified as being either hydrophobic or hydrophilic, based on the other properties. A hydrophobic point is defined as a point with low partial charge (absolute value less than 0.15), a low electrostatic potential (absolute value less than 0.01), and a low hydrogen bond-donating or accepting propensity (absolute value less than 0.1).

In the Cerius 2 Model Manager, these four properties can be mapped onto the surface and colored to show regions of potential, partial charge, hydrophobicity, and hydrogen bonding potential.

Molecule-Receptor model interactions

Molecular models can be minimized inside the receptor surface model: not just the models that were used to build the surface, but also new models. The receptor surface and the model are optimized by the fast, approximate forcefield Clean. You can use Clean on models without a receptor surface model.

The properties discussed above are used to calculate non-bonded interaction energies between the atoms of a molecular model and all the points of the receptor surface (with a 6-Å cutoff), by means of a van der Waals term (1) and (2), an electrostatic term (3), and a desolvation energy correction term.

$$E_{vdw} = K \left[ \left( \frac{R_A}{r} \right)^{12} - 2D \left( \frac{R_A}{r} \right)^{6} \right] \qquad \text{(Eq. 19)}$$

where

$$R_A = \mathrm{VDW}_r \, C_h \qquad \text{(Eq. 20)}$$

where:

- RA is the hybridization-corrected van der Waals radius of the atom
- VDWr is the van der Waals radius
- C h is the hybridization factor: for sp², C h = 0.95; for sp³, C h = 0.90
- r is the distance between the atom and the surface point
- K is the well-depth constant (set to 0.1 for all atom-point interactions)
- D is an empirically derived point density scaling factor

D scales the vdw energy and forces so that ideal atom/surface interactions yield a typical value of kcal/Å² of surface contact.

The electrostatic term is a monopole-monopole Coulombic function:

$$E_{ele} = \frac{Q_a Q_p}{r} \, D \, S(r) \qquad \text{(Eq. 21)}$$

where r is the separation between the atom and the surface point, Q a is the charge on the atom, and Q p is either the potential or the charge at the point. D is the same density correction factor used in the van der Waals term, and S(r) is an atom-based switching function employed in CHARMM:

$$S(r) = \frac{\left( r_{off}^2 - r^2 \right)^2 \left( r_{off}^2 + 2 r^2 - 3 r_{on}^2 \right)}{\left( r_{off}^2 - r_{on}^2 \right)^3} \qquad \text{for } r_{on} < r < r_{off} \qquad \text{(Eq. 22)}$$
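A minimal numerical sketch of Eqs. 21 and 22 follows. It uses the r on = 7 Å and r off = 8 Å settings quoted in the text that follows. Setting S(r) to 1 below r on and to 0 above r off is the usual convention for this kind of switching function and is an assumption here, since Eq. 22 is stated only for the interval between the two distances. The function names are hypothetical, and D is passed in as a parameter because its value is not given.

```python
R_ON = 7.0   # angstroms, setting quoted in the text
R_OFF = 8.0  # angstroms, setting quoted in the text


def switching(r, r_on=R_ON, r_off=R_OFF):
    """Switching function S(r) of Eq. 22.

    Assumption: S = 1 for r <= r_on and S = 0 for r >= r_off, so that the
    function is continuous at both ends of the switching interval.
    """
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    numerator = (r_off**2 - r**2) ** 2 * (r_off**2 + 2.0 * r**2 - 3.0 * r_on**2)
    denominator = (r_off**2 - r_on**2) ** 3
    return numerator / denominator


def electrostatic_energy(q_atom, q_point, r, density_factor):
    """Electrostatic term of Eq. 21 for one atom/surface-point pair.

    `density_factor` is the empirical scaling factor D; no value is assumed.
    """
    if r <= 0.0:
        raise ValueError("separation must be positive")
    return q_atom * q_point / r * density_factor * switching(r)
```

Substituting r = r on into Eq. 22 gives exactly 1 and r = r off gives exactly 0, so the energy is tapered smoothly to zero across the interval instead of being cut off abruptly.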

This gives a continuous potential energy and force. The current settings are as follows: r on = 7 Å, r off = 8 Å. All pairs of atoms with a separation greater than this cutoff are neglected in the nonbonded calculation.

The desolvation energy correction term is a penalty function that attempts to account for the loss of solvation when polar atoms are forced into hydrophobic regions of the receptor surface. If the fraction of hydrophobic points to total points in proximity to a polar atom is greater than 90%, then the desolvation correction energy is added. The magnitude of the term is proportional to the exposed surface area of the atom by the constant 0.3 kcal/mol Å². We make the assumption that any electrostatic interaction between a molecule and the surrounding solvent will be replaced by similar electrostatic interactions when the molecule is bound. The desolvation term is added to the total energy after the minimization of the model in the receptor model is complete.

Evaluation of molecule-receptor model energy interactions allows us to construct a receptor surface model and then calculate the energy of interaction between the receptor model and a molecular model. Medicinal chemists can evaluate compounds whose biological activity has not yet been measured, or entirely new hypothetical structures. Moreover, the energies can be incorporated into the Study Table (descriptor Receptor_energies) as independent Y variables to construct a QSAR relationship.

RSA (Receptor Surface Analysis)

The CoMFA formalism calculates probe interaction energies on a rectangular grid around a bundle of active molecules. Receptor calculates molecule-receptor model interaction energies on a receptor surface, and, like CoMFA, these energies can serve as input for the calculation of a QSAR relationship. Just as each energy associated with an MFA grid point can be used, so can each point on the surface of a receptor model.
Hopefully, the receptor surface is better able to sample the environment of the molecule than a rectangular grid, leading to better results.
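To make the two shape-field functions from earlier in this chapter (Eqs. 16 and 17) concrete, the sketch below evaluates each for a single atom. It assumes the standard Wyvill soft-object polynomial, which meets the conditions stated in the text (bounded, V(R) = 0, R equal to twice the van der Waals radius). How contributions from several atoms are combined at a grid point is not described in this chapter and is not attempted here. The function names are hypothetical.

```python
def vdw_field(r, vdw_radius):
    """Eq. 16: field value is distance minus van der Waals radius.

    The value is zero on the van der Waals surface of the atom.
    """
    return r - vdw_radius


def wyvill_field(r, vdw_radius):
    """Eq. 17, assuming the standard Wyvill soft-object polynomial.

    R is taken as twice the van der Waals radius, as stated in the text.
    The field is 1 at the atom center and 0 at and beyond r = R.
    """
    big_r = 2.0 * vdw_radius
    if r >= big_r:
        return 0.0
    x2 = (r / big_r) ** 2  # (r/R)^2
    return (-4.0 / 9.0) * x2**3 + (17.0 / 9.0) * x2**2 - (22.0 / 9.0) * x2 + 1.0
```

With R chosen as twice the van der Waals radius, this polynomial evaluates to 0.5 at the van der Waals radius itself, which is one way to see why that choice of R is convenient.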

Part 3 Pharmacophore

- Introduction to Generating a Pharmacophoric Hypothesis
- Generating Conformations
- Generating and Using Alignment Hypotheses

11 Introduction to Generating a Pharmacophoric Hypothesis

Catalyst's ConFirm, Hypo, and HipHop and C 2 Receptor are applications that provide tools to generate pharmacophoric hypotheses. The hypotheses are created by generating conformations for a set of study molecules, then using the conformations to find and align chemically important functional groups common to the molecules in the study set. Each hypothesis can also incorporate data on the biological activities of the study molecules.

This chapter provides an overview of the Cerius 2 interfaces to the Catalyst applications ConFirm, HipHop, and Hypo, and the activities that can be performed using them. For detailed information about using these interfaces, see Chapters 12 through 14. For detailed information about ConFirm and HipHop, see the online documents Catalyst ConFirm User's Reference and the Catalyst HipHop User's Reference, supplied on your MSI CD when you buy these products.

Using ConFirm and HipHop

The HYPOTHESIS MODELS card deck in Cerius 2 provides menu cards that contain interface functions for running ConFirm, HipHop, and Hypo.

Before you begin

To use ConFirm or HipHop to generate hypotheses, you must have one or more structures loaded into the Model Manager. Molecule input files for this interface are in the standard .msi file format. ConFirm typically handles molecules up to pentapeptide size. Output files are generated in Catalyst CPD format.

You can create or import structures for conformation generation using one of several methods:


- Build one or more structures using the Cerius 2 3D Sketcher or a Cerius 2 builder (see the Cerius 2 Modeling Environment book or the Cerius 2 Builders book)
- Use the Analog Builder to generate a series of analogous structures (see the Cerius 2 Builders book)
- Import previously built structures using the Load Model control panel accessed from the File pulldown on the menu bar (see the Cerius 2 Modeling Environment)

To access ConFirm or HipHop

Choose the HYPOTHESIS MODELS card deck from the deck selector. The deck appears:

[Figure: HYPOTHESIS MODELS card deck.]

The four activities that you can perform using the Cerius 2 ConFirm, HipHop, and Hypo interfaces and C 2 Receptor are described briefly in the following paragraphs.

Generate conformations

The Cerius 2 interface to ConFirm is used to generate conformations for a single molecule or a set of molecules. The number of conformations needed to produce a good representation of a compound's conformational space depends on the molecule. Both conformation-generating algorithms available in ConFirm (Best and Fast) are adjusted to produce a diverse set of conformations, avoiding repetitious groups of conformations all representing local minima. For detailed information on generating conformations, see Chapter 12, Generating Conformations.

The conformations generated by ConFirm can be used as input into HipHop and Hypo to align common molecular features and generate pharmacophoric hypotheses. Currently the conformations cannot be viewed in Cerius 2.

Align common molecular features to generate a hypothesis

HipHop and Hypo use conformations generated in ConFirm to align chemically important functional groups common to the molecules in the study set. A pharmacophoric hypothesis can then be generated from these aligned structures. For detailed information on this alignment activity, see Aligning common molecular features.

Incorporate biological activity data into a hypothesis

The Cerius 2 interface to HipHop and Hypo is also used to incorporate biological activity data into the hypothesis-generating process. Each hypothesis is tested by regression techniques to compare estimated activity with actual activity data. The software uses the data from these tests to select the hypotheses that do the best job predicting activity for the set of study molecules. This capability is provided by Catalyst/Hypo. For detailed information on this alignment activity, see Incorporating activity data into a hypothesis.

Generate receptor models using HipHop-aligned structures

The MODEL RECEPTOR menu card is included in the HYPOTHESIS MODELS card deck so that you can use structures that have been aligned in HipHop to generate a receptor surface model. Since structures used in HipHop are aligned by common chemical features, the receptor surface model that is generated for them can be significantly different from a receptor surface model generated from template-aligned structures. For detailed information on performing this activity, please see Using aligned structures to generate receptor models.


12 Generating Conformations

The Cerius 2 interface to ConFirm provides a way for you to generate conformations for a single molecule or a set of molecules. The number of conformations needed to produce a good representation of a compound's conformational space depends on the molecule. Both conformation-generating algorithms available in ConFirm (Best and Fast) are adjusted to produce a diverse set of conformations, avoiding repetitious groups of conformations all representing local minima. For information on conformations generated using these algorithms, see Smellie, Kahn, and Teig.

You can use the conformations generated by ConFirm as input into HipHop to align common molecular features and generate pharmacophoric hypotheses.

This chapter describes how to use the Cerius 2 ConFirm interface to generate conformations for a set of study molecules. These conformations can then be used as input into HipHop to generate pharmacophoric hypotheses. Output files for HipHop and Catalyst/Hypo are generated in Catalyst CPD format.

Before you begin

You must have one or more structures loaded into the Model Manager. You can generate or import structures for conformation generation through one of several methods:

- Build one or more structures using the Cerius 2 3D Sketcher or a Cerius 2 builder (see the Cerius 2 Modeling Environment or the Cerius 2 Builders)
- Use the Analog Builder to generate a series of analogous structures (see the Cerius 2 Builders)
- Import previously built structures using the Load Model control panel accessed from the File pulldown on the menu bar (see the Cerius 2 Modeling Environment)

98 12. Generating Conformations Running ConFirm To access ConFirm To generate conformations Display the CONFIRM menu card in the HYPOTHESIS MOD- ELS card deck and select Conformer Generation to display the Conformer Generation panel. You can generate a set of conformations automatically for the current model in the Model Manager by accepting the default settings on the control panel and clicking Generate. ConFirm will generate 255 or fewer conformations and deposit them in files in your current directory using the name format compoundname.cpd where compoundname is the name of the input molecule. To change default settings You can change a number of settings for the conformation generation process by changing one or more entries and popup choices in the Conformer Generation control panel: Models used for conformer generation Specify the models you want to use. The popup choices are Current, Selected and All. The default selection is Current. Conformer search method Choose the method you want to use. The popup contains the following selections: Fast Takes a quick sampling of the conformational space of a molecule. It quickly produces a set of conformations that do not cover conformational space as completely as the Best method. This is the default selection Best Generates the most comprehensive set of conformations. The Best method incorporates a technique in its process for promoting conformational variation. This method can be used to refine and expand a set of conformations found using the Fast method. It is the method you should use for generating conformations to be used in HipHop. Stereochemical treatment of asymmetric centers Choose from the following popup options: Absolute Generates conformations that are based on the known absolute stereochemistry of each model. This choice can also be used for molecules with relative stereochemistry. ConFirm treats any relative stereocenter in a starting molecule as absolute. This is the default selection. 

- Relative: Generates conformations that maintain the relative stereochemistry of the starting molecule. If the molecule contains only relative stereocenters, ConFirm generates conformations of enantiomers (that is, the starting molecule and its mirror image).
- Undetermined: Attempts to generate all possible stereoisomers and conformations for each molecule when no stereochemical information is available. Because the number of stereoisomers can exceed the maximum number of allowable conformations, ConFirm may not generate a complete set of conformations for molecules with multiple stereocenters. Therefore, use this choice judiciously.

Energy range: Specifies the acceptable energy range within which conformations should be generated. The default value is 15 kcal.

Maximum conformers: Specifies the maximum number of conformations you want to generate. Enter a number by typing in the data entry box or by clicking the up- and down-arrows associated with the box. The default number, 255, is the maximum number of conformations for each model that can be handled by HipHop.

Save results in: Enter the name of the file where you want results saved. If no file is specified, conformations are saved to the file compoundname.cpd, where compoundname is the name of the molecule for which conformations have been generated. You can also specify the directory where you want the file stored. The default directory is the current directory (that is, ./).

Ignore Existing Conformers: Indicates whether conformations that were previously generated for the starting molecule should be discarded. When the box is checked, all existing conformations are discarded. When the box is unchecked, new conformations are added to existing ones. The default condition is checked.
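The panel entries and their defaults can be summarized as a small record. The following Python sketch is only an illustration of the settings described above; the class name, field names, and methods are hypothetical and are not part of Cerius 2 or ConFirm.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ConformerSettings:
    """Hypothetical record of the Conformer Generation panel entries."""
    models: str = "Current"           # Current, Selected, or All
    method: str = "Fast"              # Fast or Best
    stereo: str = "Absolute"          # Absolute, Relative, or Undetermined
    energy_range_kcal: float = 15.0   # default energy range given in the text
    max_conformers: int = 255         # default maximum number of conformers
    ignore_existing: bool = True      # box is checked by default
    results_dir: str = "./"           # default is the current directory

    def validate(self) -> None:
        """Raise ValueError if an entry is not one of the documented choices."""
        if self.models not in ("Current", "Selected", "All"):
            raise ValueError(f"unknown model choice: {self.models}")
        if self.method not in ("Fast", "Best"):
            raise ValueError(f"unknown search method: {self.method}")
        if self.stereo not in ("Absolute", "Relative", "Undetermined"):
            raise ValueError(f"unknown stereo treatment: {self.stereo}")
        if self.max_conformers < 1:
            raise ValueError("at least one conformer must be requested")

    def hiphop_compatible(self) -> bool:
        """HipHop handles at most 255 conformations per model."""
        return self.max_conformers <= 255

    def output_path(self, compound_name: str) -> Path:
        """Default output file: compoundname.cpd in the results directory."""
        return Path(self.results_dir) / f"{compound_name}.cpd"
```

For example, `ConformerSettings(method="Best").output_path("aspirin")` yields the path `aspirin.cpd` in the current directory, where `aspirin` is a made-up compound name.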


13 Generating and Using Alignment Hypotheses

HipHop and Hypo use conformations generated in ConFirm to align chemically important functional groups common to the molecules in a study set. Biological activity data can be incorporated into these hypotheses so that the best hypotheses for predicting activity are generated and selected. Additionally, you can use structures that have been aligned in these programs to generate a receptor surface model in Cerius 2 Receptor.

This chapter discusses the following tasks:

- Aligning common molecular features
- Setting preferences using the Preferences control panel
- Incorporating activity data into a hypothesis
- Using aligned structures to generate receptor models

Before you begin

Before you can generate a pharmacophoric hypothesis, you must have conformation files generated by ConFirm or by Catalyst in Catalyst CPD format. Each input file must contain a conformational model that spans the energetically accessible conformational space. The maximum number of molecules allowed in the input set is 32. To avoid consuming large amounts of time and memory, use molecules with fewer than 30 features per conformation.

To access HipHop

Display the HIPHOP menu card in the HYPOTHESIS MODELS card deck and select Generate Alignments. The Generate Hypothesis control panel is displayed.
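The input limits stated above can be checked before a run. The following Python sketch assumes you already know, for each molecule, the number of features in each of its conformations; the function name and the input layout are hypothetical, and the 255-conformation limit is taken from the ConFirm chapter.

```python
MAX_MOLECULES = 32        # maximum number of molecules in the input set
MAX_CONFORMERS = 255      # maximum conformations per model handled by HipHop
FEATURE_ADVICE = 30       # advised: fewer than 30 features per conformation


def check_input_set(molecules):
    """Return a list of problems for a HipHop input set.

    molecules maps a molecule name to a list holding the feature count
    of each of its conformations (a hypothetical layout for illustration).
    """
    problems = []
    if len(molecules) > MAX_MOLECULES:
        problems.append(
            f"{len(molecules)} molecules exceed the limit of {MAX_MOLECULES}")
    for name, feature_counts in molecules.items():
        if not feature_counts:
            problems.append(f"{name}: no conformations")
        if len(feature_counts) > MAX_CONFORMERS:
            problems.append(
                f"{name}: more than {MAX_CONFORMERS} conformations")
        if any(count >= FEATURE_ADVICE for count in feature_counts):
            problems.append(
                f"{name}: {FEATURE_ADVICE} or more features in a conformation")
    return problems
```

An empty list means the set is within the stated limits; the feature-count message is advice about time and memory rather than a hard limit.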

Aligning common molecular features

If you want to use the default settings for the alignment process, set the Alignments/Hypotheses popup to Alignments and click Generate. HipHop aligns common chemical features, generating 10 alignments for the complete set of molecules listed in the Model Manager. The alignment information is output to MDL structure-data (SD) files stored in your current directory.

You can change the default settings of the Generate Hypothesis control panel as described in the following paragraphs:

Alignment: You can focus on generating alignment files by selecting Alignments or on generating hypotheses by selecting Hypotheses from the popup. If you select Hypotheses, the Create Output Alignment Files field is not displayed.

Models: Generate alignments for all or selected models listed in the Model Manager, as listed in the popup.

Generate Alignments: Enter a number up to 999 in the data entry field to set the number of alignments you want HipHop to generate.

Output Alignment Files: Choose the type of output file you want for generated alignments. The choices listed in the popup are MDL structure-data (SD) or MOL2 files. HipHop produces files containing the coordinates of concatenated compounds according to the file type you specify. These files can be used as input for other modeling modules.

Results directory: Enter the name of the directory where you want to save the alignment files. If you do not put a name in the data entry box, the files are saved to your current directory (./).

Generate input files: If you want template input files created, check this box. Template files are the input files used in the alignment process. If you have used ConFirm or Catalyst to generate conformations, the template files are generated automatically. Template files can be edited.

Setting preferences using the Preferences control panel

This section describes the parameters for which you can set values in HipHop and Hypo. These parameters regulate the program's execution and its output. The software tries to satisfy all of the constraints you impose when you set preferences. Each preference contributes to the results. If it is not possible for HipHop or Hypo to meet all of the criteria you specify, the software produces no generated hypotheses and no aligned molecules. If your specifications conflict, error messages are issued and processing halts.

To set preferences

Display the HIPHOP menu card in the HYPOTHESIS MODELS card deck and select Preferences to display the HipHop Preferences control panel. Select the options and adjust parameter values on this control panel to specify the constraints you want to use in generating alignments and hypotheses. The options included on this control panel are briefly described in the following paragraphs.

Total Hypothesis Features: Specifies the minimum and maximum number of features allowed per generated hypothesis. The default minimum is 1 and the default maximum is 10.

Pharmacophore Size: Specifies the minimum number of points in a pharmacophore. The default value is 3. Note that some chemical features (for example, a hydrogen bond) contain more than one point.

Candidate Pharmacophore Size: Specifies the minimum number of points in a pharmacophore to be considered as a candidate in the construction of a larger hypothesis. The default value is 3. Note that some chemical features (for example, a hydrogen bond) contain more than one point.

Unmapped Molecules: Specifies the maximum number of molecules that miss more than one feature of a hypothesis. Such molecules are termed complete misses. The default value is 0.

Partially Mapped Molecules: Specifies the maximum number of partially mapped (one feature missing) and unmapped (two or more missing features) molecules in the input set of molecules. The default value is 0.

Molecules Missing Some Feature: Specifies the maximum number of molecules that can miss the same feature of a hypothesis. The default value is 0. A small value tends to distribute missed features evenly throughout the input set of molecules.

Superposition Check: Specifies the RMS fit of molecules to generated hypotheses. This parameter is more precise than a distance check. The default condition for this checkbox is checked.

RMS Error: Sets the allowed deviation in RMS fit of mapped features in each mapped molecule. The default value is [...]. Reducing this value from its default tightens the fit.

Location Constraint Tolerance Scaling: Specifies the factor applied to each location tolerance, constraining each of the features in a hypothesis. The default value is [...]. The unscaled tolerances are 1.6 Å for all ligand features and 2.2 Å for all projected points.

Inter-feature Distance: Specifies the minimum distance between feature locations in input molecules. Only configurations of features in the molecules with at least this distance between the actual feature locations are considered when identifying common features for a hypothesis. The default value is 2.97 Å. Small, feature-poor molecules may require a value of about 1.5 Å.

Maximum Memory (fast algorithm): Specifies the maximum memory to be used by the Fast alignment algorithm. The default value is 60 MB. Total accessible memory is twice the set value.

Ideal Hydrogen Bond Geometries Only: Specifies that only projected points at positions that reflect ideal hydrogen bond geometries be considered. HipHop considers only staggered hydrogen donor/acceptor interactions for a hydroxyl and two lone-pair positions for a carbonyl oxygen. The default condition for this box is checked.
Generate Hypothesis (.chm) Files: Generates a sequential hypothesis file for each alignment hypothesis with the name HypoOutput.n.chm, where n is the sequential number of the file. These files can be used by Catalyst/Info. The default condition for this checkbox is checked.

Incorporating activity data into a hypothesis

The Cerius 2 interface to HipHop and Hypo is also used to incorporate biological activity data into the hypothesis-generating process. Each hypothesis is tested by regression techniques to compare estimated activity with actual activity data. The software uses the data from these tests to select the hypotheses that do the best job of predicting activity for the set of study molecules. This capability is provided by Catalyst/Hypo.

To generate hypotheses for predicting activity

Select Generate Alignments from the HIPHOP card to open the Generate Hypothesis control panel. Select Hypotheses from the first popup on the control panel, then click Generate. The software identifies common features among the study molecules, aligns them into a set of alignment hypotheses, compares estimated activity (from the hypotheses) with actual activity, then reports the lowest cost/best fit hypotheses with respect to prediction of activity for the set of study molecules.

Using aligned structures to generate receptor models

The MODEL RECEPTOR menu card is included in the HYPOTHESIS MODELS card deck so that you can use structures that have been aligned in HipHop to generate a receptor surface model. Since structures used in HipHop are aligned by common chemical features, the receptor surface model that is generated for them can contain significant data that differs from data for a receptor surface model generated from template-aligned structures.

To display HipHop aligned structures

You must import the .sd or .mol file that has been generated by HipHop or Hypo into the Model Manager using the following procedure:

1. Select Load Models from the Visualizer File menu. The Load Models control panel is displayed.
2. Select MACCS from the File Format popup.
3. Change the extension listed in the file data entry box from .m[do]l to .sd or .mol to correspond to the type of files you want to import. A list of files with the appropriate extension appears in the file browser.
4. Highlight the files you want to load and click the Load button. The files are loaded into the Model Manager.
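The three mapping preferences described earlier in this chapter (Unmapped Molecules, Partially Mapped Molecules, and Molecules Missing Some Feature) can be read as limits on counts. The Python sketch below shows one possible reading, assuming you know which hypothesis features each molecule misses; it is not HipHop's implementation, and all names are hypothetical. In particular, counting partially mapped and unmapped molecules together against the Partially Mapped Molecules limit is an interpretation of the parameter description.

```python
from collections import Counter


def classify(missed_features):
    """Label a molecule by how many hypothesis features it misses."""
    if len(missed_features) == 0:
        return "mapped"
    if len(missed_features) == 1:
        return "partial"      # one feature missing
    return "unmapped"         # two or more missing: a complete miss


def hypothesis_allowed(missed_by_molecule, max_unmapped=0,
                       max_partial=0, max_same_feature=0):
    """Check a hypothesis against the three limits (defaults are all 0).

    missed_by_molecule maps a molecule name to the set of feature names
    that the molecule fails to map (a hypothetical layout).
    """
    kinds = Counter(classify(m) for m in missed_by_molecule.values())
    unmapped = kinds["unmapped"]
    # Assumption: this limit counts partial and unmapped molecules together.
    partial_or_unmapped = kinds["partial"] + kinds["unmapped"]
    per_feature = Counter(f for m in missed_by_molecule.values() for f in m)
    most_missed = max(per_feature.values(), default=0)
    return (unmapped <= max_unmapped
            and partial_or_unmapped <= max_partial
            and most_missed <= max_same_feature)
```

With the default limits of 0, a hypothesis passes only if every molecule maps every feature.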

Part 4 Database Query


14 Performing a Database Query

C 2 DBAccess enables you to construct molecular queries to search specified databases and to retrieve, examine, and save structures that fit your criteria.

Cerius 2 DBAccess is an interface to various third-party databases and database search engines. It allows you to search your existing in-house databases from within Cerius 2 and retrieve the resulting hit list directly into Cerius 2. You can therefore maintain your corporate and/or project databases in your choice of format and with your choice of chemical information management system (in this release, these choices are MDL's ISIS and MSI's Catalyst and catshape systems) and still retrieve information from these databases directly into Cerius 2.

This chapter explains:

- Available database searches
- Preparing to work with Database Query
- Methodology
- Constructing a query
- Submitting a query
- Retrieving and browsing database hits

DBAccess benefits

The most apparent benefits to you:

1. Cerius 2 DBAccess adds value to your existing investment in activity prediction software tools since it now has a more seamless integration with ISIS and Catalyst.
2. You are not required to convert your corporate database into another format since you can access the database in its native format with Cerius 2 DBAccess.
3. With Cerius 2 DBAccess, it is now easy for you to incorporate corporate structural information into your modeling and analysis environment since DBAccess integrates your corporate structural databases with the Cerius 2 visualization and analysis tools.

Cerius 2 DBAccess does not itself perform any database searches; instead, it uses the searching capabilities of Catalyst and ISIS to retrieve compounds. All types of structure-based searches are available with Cerius 2 DBAccess: 2D substructure searches, 3D pharmacophoric searches (FAST and BEST algorithms), and 3D shape searches with Catalyst; and 2D substructure and similarity searches and 3D pharmacophoric searches (rigid and flexible search algorithms) with ISIS. The ISIS searches are performed through the ISIS/Host API. Once a search is complete, the hit list is written out as an SD file and automatically brought into the Hit List Browser.

Available database searches

ISIS

ISIS has the ability to perform searches on 2D and 3D structural databases based on a structural query. The structural query can be developed to mimic a Catalyst Hypothesis, or it can be based on a pharmacophore model generated with C 2 ASearch. Catalyst hypotheses can also be exported as ISIS search queries. With a query in hand, ISIS performs searches on databases and retrieves only those compounds that satisfy the geometric and structural constraints. If the constraints are 3D, then the search can optionally be performed taking the conformational flexibility of the molecules into consideration. The hit list can then be passed to various analysis capabilities within Cerius 2. For example, you can import a hit list (SD file) into a Study Table and calculate various descriptor properties; then the hit list can be clustered with respect to these properties.

Catalyst/SHAPE

Catalyst/SHAPE (catshape) performs 3D shape searching of a flexible 3D database to retrieve compounds that have similar shapes. It also calculates and reports multi-conformational shape descriptors for use in combinatorial library design, analysis, and comparison.
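Because the hit list is written as an SD file, it can also be inspected outside Cerius 2. The Python sketch below reads the compound name and the atom and bond counts from each record, relying on two properties of the MDL SD format: records end with a `$$$$` line, and the fourth line of each record is the counts line, with the atom count in columns 1 to 3 and the bond count in columns 4 to 6. The function names are hypothetical and the sketch is not part of DBAccess.

```python
def _row(record_lines):
    """Build (name, atom_count, bond_count) from one SD record."""
    name = record_lines[0].strip()
    counts_line = record_lines[3]
    atoms = int(counts_line[0:3])   # columns 1-3: number of atoms
    bonds = int(counts_line[3:6])   # columns 4-6: number of bonds
    return (name, atoms, bonds)


def read_hit_table(sd_text):
    """Return one (name, atoms, bonds) row per complete SD record.

    A trailing record that is not terminated by '$$$$' is ignored.
    """
    rows, record = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            if len(record) >= 4:    # header block plus counts line present
                rows.append(_row(record))
            record = []
        else:
            record.append(line)
    return rows
```

The resulting rows can be sorted or filtered like any list of tuples, for example `sorted(rows, key=lambda r: r[1])` to order hits by atom count.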

Catalyst

Catalyst is a Molecular Simulations product that performs hypothesis generation, database searches, and activity prediction. The database search function, Catalyst/Info, is accessed through the Cerius 2 interface and allows 2D or 3D queries to be generated for searching any Catalyst database. Databases available from MSI include MDL's Available Chemical Database (ACD), Derwent's World Drug Index (WDI), BioByte Master File, National Cancer Institute, and Maybridge databases.

Preparing to work with Database Query

Before you begin

Before you start Database Query, you must:

1. Have licensed and installed at least one of the following: ISIS 2.0 or Catalyst 2.2 or higher. You also must have the appropriate Cerius 2 modules. Please note that both Catalyst and ISIS/Host must run on an SGI platform either locally or networked to the machine where you are running Cerius 2. For ISIS searches, you also need to have a copy of your ISIS Hview files in your home directory. Creation of the databases in query-accessible form should be completed as part of installation and administration of each database. Additionally, your workstation environment must be set up for access to these databases. If you cannot access the databases you want, see your database system administrator.
2. Be familiar with the Cerius 2 user interface and tools. For an introduction to the user interface and the Visualizer tools, see the book, Cerius 2 Modeling Environment, and the Cerius 2 Tutorials Basics.
3. Display the target structure for constructing your query as the current model in the model window.

Methodology

To open C 2 DBAccess, select the DATABASES card deck from the main Cerius 2 menu. It displays three cards: MDL INTERFACE, CATALYST INTERFACE, and CATSHAPE INTERFACE. Each interface has at least the following three options: Submit Query, Browse Hits, and Reset. The CATSHAPE INTERFACE card has two additional options: Show Study Table and Receptor Model. The Receptor Model option provides a Generate Model and Evaluate Molecules pulldown menu. The Catalyst and ISIS interfaces have an Edit Query option that allows you to develop a search query from within Cerius 2.

The Submit Query panel allows you to select the database that you want to search by selecting *.bdb files for Catalyst searches and *.hvd files for ISIS searches. Other options in Submit Query include an option to save the results in a specified SD file. The Query source can be defined as:

- for ISIS: Current Model, MDL Clipboard, and Mol File
- for Catalyst: Current Model or Mol File
- for CatShape: Current and Selected Model

Each of the three interfaces allows you to limit the number of hits to a specific maximum. As for the search options, the Catalyst interface provides options for Fastest or Best fit; the ISIS interface has options for Substructure, Similarity, and 3D searches. The similarity search requires a similarity value to be entered (between 0 and 100), and the 3D search requires you to select from Rigid or Flexible search options. The CatShape interface has the following options: Generate, which generates the shape descriptors for the database (this option must be run once); Statistics, which provides statistics on shape descriptors for the database; Descriptors, which writes out the descriptor values into a user-specified file and displays them in a table; and Search, which performs the shape-based search on the specified database. Both the Catalyst and CatShape interfaces also allow you to select a host machine on your network on which to perform the search. Finally, the CatShape interface allows you to set the Tolerances for your Shape search.
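The combinations of interface, database file type, query source, and search option described above can be collected in lookup tables. The Python sketch below checks a proposed query against them; the table contents come from the text, but the function, its parameters, and the option spellings are hypothetical and do not correspond to any Cerius 2 command.

```python
SEARCH_OPTIONS = {
    "Catalyst": {"Fastest", "Best"},
    "ISIS": {"Substructure", "Similarity", "3D"},
    "CatShape": {"Generate", "Statistics", "Descriptors", "Search"},
}
QUERY_SOURCES = {
    "ISIS": {"Current Model", "MDL Clipboard", "Mol File"},
    "Catalyst": {"Current Model", "Mol File"},
    "CatShape": {"Current Model", "Selected Model"},
}
# Only the two file types named in the Submit Query description are checked.
DATABASE_SUFFIX = {"Catalyst": ".bdb", "ISIS": ".hvd"}


def check_query(interface, database, source, option,
                similarity=None, mode_3d=None):
    """Return a list of problems; an empty list means the choices agree."""
    if interface not in SEARCH_OPTIONS:
        return [f"unknown interface: {interface}"]
    problems = []
    suffix = DATABASE_SUFFIX.get(interface)
    if suffix and not database.endswith(suffix):
        problems.append(f"{interface} searches expect a *{suffix} database")
    if source not in QUERY_SOURCES[interface]:
        problems.append(f"query source not offered: {source}")
    if option not in SEARCH_OPTIONS[interface]:
        problems.append(f"search option not offered: {option}")
    if interface == "ISIS" and option == "Similarity":
        if similarity is None or not 0 <= similarity <= 100:
            problems.append("similarity value must be between 0 and 100")
    if interface == "ISIS" and option == "3D":
        if mode_3d not in ("Rigid", "Flexible"):
            problems.append("3D search needs Rigid or Flexible")
    return problems
```

For instance, an ISIS similarity search without a similarity value produces one problem message, while the same search with a value of 80 produces none.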
You can develop your 3D search queries using the Edit Query panel for both Catalyst and ISIS searches. This interface allows you to develop ISIS queries that include distances between pairs of atoms and ring centroids. In addition, all of the ISIS options for bond and atom definitions are available.

Bond types: single, double, triple, up, down, up/down, cis/trans, aromatic, any, single/double, double/aromatic, single/aromatic

Bond topology: ring, chain

Atom types (include or exclude): none, any element, any element except H, any element except C & H, any halogen, user specified list of elements, common organic elements

The queries are generated in ISIS molfile format and both Catalyst and ISIS searches can be performed using the same queries.

The Hit Browser is activated automatically upon completion of a Catalyst, CatShape, or ISIS search and is loaded with the structures in the search's output SD file. The browser allows you to rapidly browse compounds. A table lists the structures, the compound names, and the number of atoms and bonds. You can perform standard table operations with this hit list table such as sorting, selecting, exporting, etc. Finally, you can select the Display In Model Window option to display the hits in the Model Window using your choice of display options. Browsing in the Hit List Browser automatically updates the image in the Model Window.

To start Database Query

1. If the DRUG DISCOVERY card deck is not already displayed, select DRUG DISCOVERY from the Deck Selector popup.
2. Choose the QUERY DATABASE menu card from the deck. The card is displayed.


or

Select DATABASES from the Deck Selector popup. The CATALYST, CATSHAPE, and MDL (ISIS) INTERFACE menu cards appear.

Constructing a query

Queries are constructed using the Query Editor. Four types of queries can be defined: atom, bond, distance, and centroid. Defining each query type involves specifying parameters specific to the query. The appearance of the Query Editor control panel changes depending on the query type you select.

The simplest type of query is a molecular fragment as represented by a Cerius 2 model. The current model, which you have loaded into Cerius 2 or created via the C 2 Builder, is the usual starting point for a query.
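To make the idea of an atom query concrete, the Python sketch below tests whether an element satisfies one of the atom types listed in the Methodology section, with an optional exclude flag. It is an illustration only: the function and the type labels are hypothetical, the halogen set (F, Cl, Br, I) is an assumption, and the "common organic elements" type is left out because the text does not define it.

```python
HALOGENS = {"F", "Cl", "Br", "I"}   # assumption: the text does not list them


def atom_matches(element, atom_type="User Specified",
                 elements=(), exclude=False):
    """Return True if element satisfies the atom query.

    atom_type is one of the labels below (hypothetical spellings);
    elements is the user-specified list, used only for "User Specified".
    With exclude=True the sense of the match is inverted.
    """
    if atom_type == "Any":
        hit = True
    elif atom_type == "Any Except H":
        hit = element != "H"
    elif atom_type == "Any Except C & H":
        hit = element not in ("C", "H")
    elif atom_type == "Any Halogen":
        hit = element in HALOGENS
    elif atom_type == "User Specified":
        hit = element in elements
    else:
        raise ValueError(f"unknown atom type: {atom_type}")
    return not hit if exclude else hit
```

A query atom that may be carbon, nitrogen, or sulfur would be written `atom_matches(e, "User Specified", ("C", "N", "S"))`, and "anything but carbon" as `atom_matches(e, "User Specified", ("C",), exclude=True)`.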

When the current model is used as the query, the default behavior is to build the query using the exact atoms, bonds, etc., as they currently exist in the model. If a 3D query is submitted, the relative coordinates of the atoms in your model are used.

More commonly, you will want to edit the query to make it more general or more specific to your search. For example, you may wish to specify that an atom currently represented in the model by a carbon atom could also be a nitrogen or sulfur atom for the purpose of the search, or that it could be any atom except carbon. You can specify any combination of specified atom, bond, and centroid types for a Catalyst or ISIS database query. For 3D queries, you can specify desired interatomic distances.

When you submit a query via the Submit Query command, the query you have defined on the current model is first written to an MDL-format MOL file, which is then used by Catalyst or ISIS for the search. You may also submit a previously created query (MOL) file directly. You can also export Catalyst hypotheses as ISIS search queries.

The following four sections describe how to edit each of the four types of query definitions: atom, bond, centroid, and distance.

Constructing an atom query

An atom query can be constructed for any atom in your target structure. The atom you choose in the target is identified by element and number.

To construct an atom query

1. Click the Atom icon in the Query Editor control panel.
2. Pick an atom in your target model. The subpanel displays the pertinent information for the atom you selected. To specify the same query for multiple atoms, hold down the <Shift> key while you pick additional atoms. Note that only the query for the first atom you selected is displayed.
3. Specify the criteria for the query either by selecting a predefined set of elements (for example, Any Halogen) from the popup, or by using the entry field to type the elements you want to specify. If you type the data, your action overrides the predefined information and the popup displays the term User Specified.
4. If the element list is to be used to exclude structures, check the Exclude checkbox. Note that if hydrogens are included in the structures, they are included in the query.
5. If you want to remove data from the data entry box, click Delete This Feature. The current query information is cleared from the panel. If you want to remove all atom queries, click Delete All Atom Queries.
6. To save the query, click Define. You can now select another atom and repeat the process of constructing another atom query.

Constructing a bond query

A bond query can be constructed for any bond in your target structure. The bond you choose in the target is identified by the atoms in the bond. This means you cannot click a bond to select it.

To construct a bond query

1. Click the Bond icon in the Query Editor control panel.
2. Use the mouse to pick two atoms in your target model to define a bond. A rubber band indicator stretches from the first atom to the second. When you pick the second atom, the bond is highlighted. A Query Editor panel associated with this bond appears.

3. The Bond Type popup displays only the currently selected bond query type, for example, Single. Use the popup to select the bond type you wish to use in the search, for example, Double Aromatic. You may <Shift>-select further atoms connected to this bond to specify the same type over multiple bonds.

An alternative way to perform this selection, which is very useful when, for example, you wish to define all the bonds in part of the model to be aromatic for the query, is to first select Bond from the Feature Type popup. With standard picking mode selected (the arrow icon highlighted), use standard selection techniques (for example, rubber-band selection) to select the atoms defining the bonds you wish to modify for the search. Click the Bond icon; note that the Bond Type displayed is the last one used (since the bond type cannot generally be displayed when multiple bonds are selected). However, if you then select a bond type, for example, Double Aromatic, and then click the Define button, all these bonds take the specified query type.

Note: This methodology also works for the other query features: Atom, Distance, and Centroid.

You can use Bond Topology to further specify your bond query. This further qualifies which bonds will match your bond query during a search and allows you to specify whether or not the bonds are allowed to be in ring systems.

4. When you are satisfied with your selections, click Define to define the query.
5. You can remove the current query by clicking Delete This Feature. The current query information is cleared from the panel. If you want to remove all bond queries, click Delete All Bond Queries.

Constructing a centroid query

With respect to database queries, a centroid is defined as the geometric average of a set of atoms that are used in a substructure search. Hence the centroid itself does not represent a query but rather a property of a query. For example, you may wish to find all models from a 3D database search with a certain distance between a given metal atom and the center (centroid) of a phenyl group. Your query would be composed of a six-membered aromatic carbon ring, a metal atom, the centroid of the ring, and a distance query defined between the metal atom and the centroid.

To construct a centroid query:

1. Click the Centroid icon in the Query Editor panel.
2. Pick an atom you want included in the centroid definition. The Query Editor panel changes to display the number of atoms used in the centroid's current definition. To add more atoms to the definition, simply use the mouse to pick the desired atoms in the Cerius 2 Models window.

3. When you have specified all the atoms in your centroid definition, click Define. The centroid is now defined and appears as a + at the geometric center of the atoms you specified. The + centroid atom may later be selected for the purpose of deleting or using this centroid in a Distance query.
4. To delete the currently selected centroid, click Delete This Feature. Clicking Clear Selection deletes all the atoms in the current centroid definition so that you can edit an existing centroid.
5. Click Delete All Centroid Features to remove all centroid features from the current query.
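Since a centroid is the geometric average of a set of atom positions, it is straightforward to compute by hand. The Python sketch below illustrates the metal-to-ring example from this section; the coordinates are made up for the illustration (an idealized planar ring of radius 1.39 Å with a metal atom 2.0 Å above its center), and the function names are hypothetical.

```python
import math


def centroid(points):
    """Geometric average of a non-empty list of (x, y, z) positions."""
    if not points:
        raise ValueError("a centroid needs at least one atom")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance(a, b):
    """Euclidean distance between two (x, y, z) positions."""
    return math.dist(a, b)


# Idealized six-membered ring in the xy-plane (illustrative coordinates).
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)),
         0.0) for k in range(6)]
metal = (0.0, 0.0, 2.0)

ring_center = centroid(ring)
metal_to_ring = distance(metal, ring_center)   # close to 2.0 for this geometry
```

A distance query between the metal atom and the centroid would then be expressed as an allowed range around a value such as `metal_to_ring`.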

Constructing a distance query

A distance query feature acts as a query filter on the set of models that would normally be returned for your model query. It does this by specifying a distance inclusion range between a pair of atoms in your query. Hence a distance query is only employed during 3D database queries. Multiple distance queries may be defined for your model query (Atoms and Bonds), in which case all the distances must be satisfied when a database model is matched to your query.

To construct a distance query:

1. Click the Distance icon in the Query Editor panel.
2. Pick the first atom of your distance query specification. A rubber line now trails your cursor to show that you need to pick another atom. Pick the second atom of your specification. The Query Editor panel now shows distance feature information, including the distance range for your current selection. If the range has not already been defined, the panel shows values based on the distance between these atoms (± 0.5 Å).
3. Type in the distance range minimum and maximum values you wish to use with this distance feature. These represent the minimum and maximum distances allowed between these atoms in any model retrieved from a 3D database query. Click the Define button to define this distance feature query.
4. To delete an individual Distance Feature query, use the Delete This Feature button after selecting the distance query you wish to delete. To delete all currently defined distance queries, use the Delete All Distance Features button.

Deleting all query features

To reset/delete all query features defined on the current model, use the Clear All Features button on the Query Editor panel. It is a good idea to do this whenever you wish to start defining a new query. This deletes all Distance and Centroid features and resets all Atom and Bond features to their defaults (that is, the atoms/bonds of the current model as displayed).

Submitting a query

Cerius 2 does not act on query information until a query is submitted. Queries are submitted using the Submit Query control panel, which displays different subpanels depending on the search facility and database type you want to use. The default selection for database is None.

Accessing the Submit Query control panel

To access the Submit Query control panel, select Submit Query from the QUERY DATABASE menu card.
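The distance query described earlier in this chapter amounts to a range test on each atom pair, with every range required to hold. The Python sketch below shows the default range (current distance ± 0.5 Å) and the all-must-match rule; it is an illustration with hypothetical function names and atom labels, not the search engines' own matching code.

```python
import math


def default_range(a, b, margin=0.5):
    """Initial (minimum, maximum) range: current distance +/- margin, in Å."""
    d = math.dist(a, b)
    return (d - margin, d + margin)


def satisfies_all(coords, distance_queries):
    """True if every distance query holds for one candidate conformation.

    coords maps an atom label to its (x, y, z) position; each query is a
    tuple (label_a, label_b, minimum, maximum). Labels are hypothetical.
    """
    return all(
        low <= math.dist(coords[a], coords[b]) <= high
        for a, b, low, high in distance_queries
    )
```

For two atoms 5.0 Å apart, `default_range` returns `(4.5, 5.5)`; a candidate whose corresponding atoms are 5.2 Å apart satisfies that query, while one at 5.8 Å does not.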

Select the database type you want and continue as described in the following database-specific procedures.

Submitting a Catalyst query

To submit a Catalyst query:

1. Select Catalyst from the popup in the Database Type field or select Submit Query from the CATALYST INTERFACE menu card to open the Submit Query control panel.
2. Select the database (.bdb) file that you want to search from the Database to Search browser on the left of the control panel, then click Set. The name of the database appears below the data entry field.
3. Name the file to be used to save results either by selecting it from the Save Results In browser on the right of the control panel or by entering a name in the data entry field; click Set. The name of the file appears below the data entry field.

4. Use the Use... Fit popup to choose the way that you want Catalyst to perform flexible fitting. FASTEST is the default selection, but you may also choose BEST.

5. If you want to put an upper limit on the number of structures that can be retrieved in the search, enter the number in the Limit Hits To data entry box (the default value is 999) and check the Limit Hits check box (unchecked by default).

6. Select the model you wish to use as the query using the Use Query popup. You may specify either the current model or a previously defined query stored in a .mol or .chm file.

7. Select the host on which the search is to run by using the Search Database on Host popup. The default host is installation-specific.

8. Click Submit Query to start the search. Information during the search is reported to the text window. When the search is complete, the number of hits is also reported in the text window.

Submitting a SHAPE query

To submit a SHAPE query:

1. Select SHAPE from the popup in the Database Type field or select Submit Query from the CATSHAPE INTERFACE menu card to open the Submit Query control panel.

2. Select the database (.bdb) file that you wish to search from the Database to Search browser on the left of the control panel, then click Set. The name of the database appears below the data entry field.

3. Choose a name for the output of your database search (.sd) using the Save Results In browser. The name of the output file appears below the data entry field.

Note: If a file by this name already exists, it will be overwritten by the search.

4. Choose the CatShape search type by clicking the CatShape Run popup. There are four search methods available for CatShape:

a. Generate: Searching the database in this mode does not actually submit a query. When SEARCH is selected, CatShape produces a shape index database from the database you selected, which is required for subsequent queries on this database. A file db_name.bdb.id.4bdb is produced in your local directory. If this file does not already exist, other CatShape Run type searches automatically generate it at the start of the search.

b. Statistics: Searching the database in this mode does not actually submit a query. When SEARCH is selected, CatShape writes statistics from the shape index database to the text output window.

c. Descriptors: Searching the database in this mode does not actually submit a query. When SEARCH is selected, CatShape produces, from the shape index database, a table of statistics (descriptors) for each molecule in the database. These are also recorded to an output file that you must specify in the Descriptors Output File parameter in the Submit Query panel. This file can also be loaded into the QSAR+ module. The descriptors produced are based on properties, such as moments of inertia, averaged over all conformations for a molecule in the database.

d. Search: This is the value required to actually run a CatShape search. The query in this case is generalized shape data calculated from your query model, such as molecular volume, principal axes of inertia, etc. The model to use as the query is set by the Construct Query From popup, where you specify either the current or selected model. Clicking the Search Tolerances button brings up the Catalyst Shape Search Preferences panel, where you can adjust the limits to the accuracy of your shape query. For example, the Max and Min values for percentage tolerance on principal axes specify the percentage difference that is allowed between a database conformer's axes and the query molecule's axes for that molecule to be considered a possible match. You can also set a limit on the maximum number of hits returned by a search by checking the Limit Hits check button and specifying the maximum number of hits allowed.

5. Choose a Host machine to run your search on. Usually you only need to change the value from localhost if your Catalyst server is on another machine.

6. When you have all the search parameters set, click the SEARCH button to initiate the database query search. When a query search completes, the results are loaded into the Browse Hits browser.

Submitting an ISIS query

To submit an ISIS query:

1. Select ISIS from the popup in the Database Type field or select Submit Query from the MDL INTERFACE menu card to open the Submit Query control panel.

2. Select the database (.hvd) file that you wish to search from the Database to Search browser on the left of the control panel, then click Set. The name of the database appears below the data entry field.

Note: This may take a couple of minutes because the interface looks at the database to find what properties (database fields) exist.

3. Select the properties you wish to retrieve from the database for your query matches. To do this, click the Retrieve Fields button. This brings up the Retrieve Fields control panel (yours may look different from experiment to experiment, depending on the database you specify). To select fields, click the desired field's name in the list. Any field name that is highlighted (inverse text colors) will be retrieved when the query is issued. The Select All and Unselect All buttons on this panel allow you to select or deselect all fields for retrieval.

4. Choose a name for the output of your database search (.sd) using the Save Results In browser. The name of the output file appears below the data entry field.

Note: If a file by this name already exists, it will be overwritten by the search.

5. Choose the source of the query for the search using the Use Query popup. The default value, Current Model, means that the search query will be formed from the current model with all the query features you added using the Edit Query command. A value of MDL Clipboard specifies that the query be taken from the clipboard of an MDL query editor that is running in the background. This is only recommended for experienced ISIS users. A value of MOL File brings up a browser from which you can specify the name of an existing MOL file to use as your query source.

6. If you want to put an upper limit on the number of structures that can be retrieved by the search, check the Limit Hits check box and enter the maximum number of hits you wish to be retrieved. If you just want to see how many structures would be retrieved from your search, check the Count only check box.

7. Choose which type of ISIS database query you wish to perform from the Type popup. Substructure and Similarity type searches perform 2D searches. Choose 3D search if your query contains distance features or if you want to retrieve 3D data fields from the database for your query. Substructure search is an exact search, meaning that the query must be exactly present in the hit. Similarity search is a fuzzy search, and the hits are retrieved based on a score of how close they are topologically to the query. The Value parameter for Similarity search represents the threshold value for including a match in the search output. For example, if your query is piperidine, the substructure search will retrieve any compound that contains a piperidine; however, similarity search will also retrieve pyridines, etc. Similarity search retrieves the hits based on a Tanimoto calculation using topological keys (fingerprints).

8. When you have all the search parameters set, click the SEARCH button to initiate the database query search. When a query search completes, the results are loaded into the Browse Hits browser.
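The Similarity search in step 7 can be illustrated with a minimal Python sketch of a Tanimoto comparison with a threshold. It uses the standard Tanimoto coefficient, c / (a + b - c), where a and b are the numbers of keys set in each fingerprint and c is the number they share. The function names, the representation of fingerprints as sets of integer key indices, and the 0-to-1 scale of the threshold are assumptions made for illustration; the text does not specify how ISIS defines its keys or scales the Value parameter.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of key indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def similarity_hits(query_fp, database, threshold):
    """Return (name, score) pairs scoring at or above threshold, best first.

    database maps a structure name to its fingerprint (a set of key indices).
    """
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in database.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy fingerprints, not real topological keys.
toy_db = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {7, 8}}
hits = similarity_hits({1, 2, 3}, toy_db, threshold=0.5)
```

With these toy values, structure "a" matches the query exactly, "b" shares three of four keys, and "c" shares none, so only "a" and "b" pass a threshold of 0.5. Lowering the threshold admits more distantly related structures, which is why a similarity search returns compounds that a substructure search would miss.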
Defining a Database Domain from Prior Query Hit List

ISIS database searches produced by a Cerius 2 query automatically produce a hitlist (.hlist) file in addition to the MACCS SD file. The SD file contains the corresponding models and selected database field data. A hitlist file contains only the references to hits found in a prior query and is specific to the hview (.hvd) file employed. A hitlist file always contains all the hits that matched a query, regardless of the maximum number of hits that were specified for retrieval (which governs how many models are retrieved and loaded into the hitlist browser).

There are three primary uses for hitlist files:

Retrieving structures and data for the hits of a prior query. You may wish to retrieve more or fewer models than originally specified for viewing in the Hit List Browser and/or retrieve a different set of properties (field data) without having to repeat the same database query.

Performing a subset (domain) search. You may wish to refine the results of your first query using a second query.

Defining a database domain using combined hitlist files. You may wish to retrieve models which were, for example, hits common to two or more independent database queries.

To define a Database Domain for a query:

1. Ensure that the hview file selected from the Database to Search browser is one previously employed to produce a hitlist (that is, a query search performed using Cerius 2 version 4.0). If not, you must redo your original query to produce a hitlist.

2. Click the Define Domain button to bring up the Define Database Subset panel. This panel contains a list of hitlists you have referenced during the current session of Cerius 2. To add to this list, click the Add Hit List button to open the Add Prior Hit Lists panel. This panel is a file browser where you can locate a previously generated hitlist. A hitlist file name has the form <hview>@<date>-<#hits>.hlist, unless it was generated using a name you specified in the Save Hit list In value on the Submit Query panel.

3. After adding hitlists to the Define Database Subset listbox, click the prior hitlists you want to use. The Hit list combination logic value gives you three ways to define the search domain from more than one hit list:

UNION means that hits in all the selected hitlists are used.

INTERSECTION means that only hits common to all the selected hitlists define the database domain.

SUBTRACT means that only hits in the first hitlist selected, which are not in any of the other selected hitlists, define the database domain.

4. With the database domain defined using prior hitlists, you can now perform the query as if you were querying the whole database. You may also choose not to perform a query and instead retrieve only the models and database field data from this domain. To do this, set the Query: selector on the Submit Query panel to Retrieve Domain.

Notes:

1. The database query will fail unless all of the prior hitlists selected originated from queries using the same (or equivalent) hview (.hvd) file.

2. The first selected hitlist when using the SUBTRACT option refers to the top-most selection. You can change the order of the hitlists in the Define Database Subset panel by adding and removing hitlists using the Add Hit List and Remove Selected buttons.

3. You specify a database domain search by selecting one or more hitlists in the Define Database Subset panel. To revert to querying the entire database, simply deselect all hitlists. You may use the Unselect all button to do this.

Database queries in Cerius 2, by default, output both a hitlist and an SD file containing retrieved model data. Generating the SD file can take a considerable time unless you limit the number of results. To generate only a hitlist from a query, you may set the output file name for Save Results In to. If you do not wish to generate a hitlist because, for example, you are only retrieving data from a prior hitlist, uncheck the Save Hit list In option.

Retrieving and browsing database hits

When a search is complete, you can retrieve and examine the hits (that is, structures) that satisfy your criteria by using the Hit Browser control panel.
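Returning briefly to the hit list combination logic used when defining a database domain, the three options (UNION, INTERSECTION, SUBTRACT) correspond to ordinary set operations. The following minimal Python sketch shows the idea. The function name and the use of integer hit identifiers are assumptions for illustration only; the text does not describe the internal format of a .hlist file.

```python
def combine_hitlists(hitlists, logic):
    """Combine hitlists (iterables of hit identifiers) into one search domain.

    hitlists[0] plays the role of the first (top-most) selected hitlist,
    which matters only for SUBTRACT.
    """
    if not hitlists:
        return set()
    first = set(hitlists[0])
    rest = [set(h) for h in hitlists[1:]]
    if logic == "UNION":
        return first.union(*rest)
    if logic == "INTERSECTION":
        return first.intersection(*rest)
    if logic == "SUBTRACT":
        return first.difference(*rest)
    raise ValueError(f"unknown combination logic: {logic}")

# Hypothetical hit identifiers from three prior queries.
q1, q2, q3 = [1, 2, 3, 4], [3, 4, 5], [4, 6]
union = combine_hitlists([q1, q2, q3], "UNION")
common = combine_hitlists([q1, q2, q3], "INTERSECTION")
only_first = combine_hitlists([q1, q2, q3], "SUBTRACT")
```

In this toy example the union contains all six identifiers, only hit 4 is common to all three lists, and hits 1 and 2 remain after subtracting the second and third lists from the first. Reordering the lists changes only the SUBTRACT result, which is why the order of selection matters for that option.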

Accessing the Hit Browser control panel

To access the Hit Browser control panel, select Browse Hits from the QUERY DATABASE menu card or from the MDL (ISIS), CATSHAPE, or CATALYST INTERFACE menu cards. This opens the Hit Browser control panel.

The Hit Browser consists of a table to display the list of hits and a pictograph area to display a stick model of the current hit. The name of the current structure is listed as a label in the pictograph area. In addition, tools are supplied to manipulate the hit list.

If hits are found, they are sorted according to the classification scheme you have specified. Each row of the hit list includes the following information:

Hit name: The name of the molecule that has been identified as a hit.

Atoms: The number of atoms in the structure.

Bonds: The number of bonds in the structure.

In addition to the stick model in the pictograph area, the Hit Browser creates a model of the currently displayed hit. The model is listed in the Model Manager and behaves as a standard model.

If your search generates no hits, the Hit Browser displays no pictograph and no listing. The message No Hits Found is displayed in the middle of the pictograph area.

Manipulating a hit list

To browse the hit list

If you want to examine structures in the current hit list or a hit list from a previous search, use the Hit Browser control panel to complete the following procedures:

Use the buttons above the pictograph area to move sequentially through the hit list or to go to the first or last structure on the list. When you click a button, the structure that is displayed in the pictograph area depends on the button you have clicked, as indicated in the illustration:


[Illustration: scroll buttons above the pictograph area, labeled First hit, One lower than current hit, Current structure number, One higher than current hit, and Last hit.]

The current structure number is displayed in the data box in the middle of the scroll buttons field. The Total Hits field above the pictograph table tallies the total number of structures retrieved from a search.

To display a specific structure

Highlight a row in the hit list table, enter a structure number in the data entry box above the pictograph, or scroll through the list using the tools above the pictograph until you reach the desired structure number. The structure you have specified is displayed as a pictograph.

To change structures listed in the table

Make a selection from the View popup. You can choose to list All Hits or only leaders (that is, structures with the highest score in each group). Leaders is the default selection.

To display a structure in the Models window

If you want the structure currently selected and displayed in the pictograph area to also be displayed in the Models window, check the Display in Model Window check box.

To display a different hit list

1. Click Select Hit List. The Select Hit List control panel is displayed.

2. Specify the type of list to load from the Hit Results popup. The options are Hitlist (created from a Cerius 2 session) and MDL (MACCS SD file created by Catalyst, CatShape, or ISIS).

a. If you select Hitlist, highlight the list you want to use in the browser and click View. The list is loaded into the Hit Browser control panel.

b. If you select MDL, the Select Hit List control panel changes slightly to display a standard Cerius 2 file browser. Locate the file you want to use and click Load. The MDL file is loaded in the Hit Browser control panel.

To delete hits

Click Delete All Hits to delete the hits from the current hit list.

In addition to the tools described in the preceding paragraphs, you can use the standard buttons in the table tool bar of the Hit Browser control panel to manipulate the data in the hit list table. For more information on these tools, see the discussion of tables in the Cerius 2 Modeling Environment.
