Introduction to Chemoinformatics

Similar documents
Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Dictionary of ligands

Introduction to Chemoinformatics and Drug Discovery

Data Mining in the Chemical Industry. Overview of presentation

Bioinformatics Workshop - NM-AIST

Chemistry 123: Physical and Organic Chemistry Topic 1: Organic Chemistry

Organic Chemistry. Chemical Bonding and Structure (2)

Organometallics & InChI. August 2017

Analyzing Small Molecule Data in R

Chapter 5 Stereochemistry

Chapter 5 Stereochemistry

On InChI and evaluating the quality of cross-reference links

CSUS - CH6B Fischer projection and R/S configurations Instructor: J.T., P: 1. a) Fischer Projection can be rotated by 180 only!

CHAPTER 5. Stereoisomers

Chapter 5 Stereochemistry. Stereoisomers

Imago: open-source toolkit for 2D chemical structure image recognition

10/4/2010. Sequence Rules for Specifying Configuration. Sequence Rules for Specifying Configuration. 5.5 Sequence Rules for Specifying.

Web Search of New Linearized Medical Drug Leads

Dock Ligands from a 2D Molecule Sketch

Marvin. Sketching, viewing and predicting properties with Marvin - features, tips and tricks. Gyorgy Pirok. Solutions for Cheminformatics

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

InChI keys as standard global identifiers in chemistry web services. Russ Hillard ACS, Salt Lake City March 2009

Enantiomers 2:22 PM 1

Similarity Search. Uwe Koch

Pipeline Pilot Integration

Canonical Line Notations

Alkenes and Alkynes 10/27/2010. Chapter 7. Alkenes and Alkynes. Alkenes and Alkynes

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemical Databases: Encoding, Storage and Search of Chemical Structures

Stereochemistry. Based on McMurry s Organic Chemistry, 6 th edition

FROM MOLECULAR FORMULAS TO MARKUSH STRUCTURES

Molecular Modelling. Computational Chemistry Demystified. RSC Publishing. Interprobe Chemical Services, Lenzie, Kirkintilloch, Glasgow, UK

CHM1321 Stereochemistry and Molecular Models Assignment. Introduction

JCICS Major Research Areas

Chapter 2. Alkanes and Cycloalkanes; Conformational and Geometrical Isomerism

Basic Stereochemical Considerations

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

9. Stereochemistry. Stereochemistry

Why am I learning this, Dr. P?

Exercises for Windows

Calculate a rate given a species concentration change.

(2/94)(6,7,9/95)(8,9/97)(12/99)(1/00) Neuman Chapter 4

STEREOISOMERISM - GEOMETRICAL ISOMERISM

240 Chem. Stereochemistry. Chapter 5

Molecular Modelling for Medicinal Chemistry (F13MMM) Room A36

Solved and Unsolved Problems in Chemoinformatics

Organic Chemistry. M. R. Naimi-Jamal. Faculty of Chemistry Iran University of Science & Technology

ORGANIC - BRUICE 8E CH.3 - AN INTRODUCTION TO ORGANIC COMPOUNDS

Tautomerism in chemical information management systems

Docking. GBCB 5874: Problem Solving in GBCB

OpenDiscovery: Automated Docking of Ligands to Proteins and Molecular Simulation

Comprehensive Chemoinformatics since Web-based, client/server, and toolkit approaches. Native Oracle (cartridge) and Microsoft technology.

BIOINF Drug Design 2. Jens Krüger and Philipp Thiel Summer Lecture 5: 3D Structure Comparison Part 1: Rigid Superposition, Pharmacophores

ChemAxon. Content. By György Pirok. D Standardization D Virtual Reactions. D Fragmentation. ChemAxon European UGM Visegrad 2008

If you like us, please share us on social media. The latest UCD Hyperlibrary newsletter is now complete, check it out.

Fast similarity searching making the virtual real. Stephen Pickett, GSK

CH 3 C 2 H 5. Tetrahedral Stereochemistry

ORGANIC - BROWN 8E CH.3 - STEREOISOMERISM AND CHIRALITY.

C h a p t e r S e v e n : Haloalkanes: Nucleophilc Substitution and Elimination Reactions S N 2

Pipeline Pilot Integration

Loudon Chapter 7 Review: Cyclic Compounds Jacquie Richardson, CU Boulder Last updated 8/24/2017

Learning Organic Chemistry

Chapter 25 Organic and Biological Chemistry

Three-Dimensional Structures of Drugs

Chem 341 Jasperse Ch. 9 Handouts 1

Chapter 25: The Chemistry of Life: Organic and Biological Chemistry

Organic Chemistry 321 Workshop: Spectroscopy NMR-IR Problem Set

CHE1502. Tutorial letter 201/1/2016. General Chemistry 1B. Semester 1. Department of Chemistry CHE1502/201/1/2016

Lecture 8: September 13, 2012

Similarity methods for ligandbased virtual screening

Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction

DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING

International Chemical Identifier for Reactions (RInChI)

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

To learn how to use molecular modeling software, a commonly used tool in the chemical and pharmaceutical industry.

Chemistry 11 Hydrocarbon Alkane Notes. In this unit, we will be primarily focusing on the chemistry of carbon compounds, also known as.

Use of CTI Index for Perception of Duplicated Chemical Structures in Large Chemical Databases

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Command-line tools of ChemAxon: tips and tricks

ORGANIC - EGE 5E CH. 5 - ALKANES AND CYCLOALKANES.

An introduction to Molecular Dynamics. EMBO, June 2016

STEREOCHEMISTRY AND STEREOELECTRONICS NOTES

Ligand Scout Tutorials

The IUPAC Chemical Identifier

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

RInChI. International Chemical Identifier for Chemical Reactions (RInChI) Guenter Grethe, Jonathan Goodman, Chad Allen

BENZENE & AROMATIC COMPOUNDS

Learning Guide for Chapter 13 - Alkynes

Chemistry 102 Organic Chemistry: Introduction to Isomers Workshop

Carbon Compounds. Chemical Bonding Part 2

Clustering Ambiguity: An Overview

Enumeration and generation of all constitutional alkane isomers of methane to icosane using recursive generation and a modified Morgan s algorithm

Molecular Graphics. Molecular Graphics Expt. 1 1

CDK & Mass Spectrometry

Chapter 4: Amino Acids

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Transcription:

Introduction to Chemoinformatics www.dq.fct.unl.pt/cadeiras/qc Prof. João Aires-de-Sousa Email: jas@fct.unl.pt

Recommended reading Chemoinformatics - A Textbook, Johann Gasteiger and Thomas Engel, Wiley-VCH 23. Handbook of Chemoinformatics, Johann Gasteiger, Wiley-VCH 23. An Introduction to Chemoinformatics, Andrew R. Leach, Valerie J. Gillet, Springer 27. 2

CHEMOINFORMATICS Definition (wikipedia) Cheminformatics (also known as chemoinformatics and chemical informatics) is the use of computer and informational techniques, applied to a range of problems in the field of chemistry. These in silico techniques are used in pharmaceutical companies in the process of drug discovery. In the U.S., recent NIH emphasis has been placed on developing public domain Cheminformatics research by creating six Exploratory Centers for Cheminformatics Research (ECCRs) as part of the NIH Molecular Libraries Initiative. 3

Size of the domain 4

Types of information Molecular structures (compounds) Properties (physical, chemical, biological) m.p., viscosity, solubility, spectra, electrophilicity, stability, toxicity, pharmacological activity, Reactions 5

Types of learning Deductive learning (quantum methods, molecular mechanics) Inductive learning (model building from experimental data): artificial intelligence, machine learning, statistics, structure-property relationships 6

Introduction to Chemoinformatics. Representation of molecular structures 7

A hierarchy of structure representations Name (S)-Tryptophan 2D Structure 3D Structure Molecular surface 8

Storing molecular structures in a computer 9

Storing molecular structures in a computer Information must be coded into interconvertible formats that can be read by software applications. Applications: visualization, communication, database searching / management, establishment of structure-property relationships, estimation of properties,

Coding molecular structures A non-ambiguous representation identifies a single possible structure, e.g. the name o-xylene represents one and only one possible structure. A representation is unique if any structure has only one possible representation (some nomenclature isn t, e.g.,2-dimethylbenzene and o-xylene represent the same structure).

IUPAC Nomenclature IUPAC name : N-[(2R,4R,5S)-5-[[(2S,4R,5S)-3-acetamido-5-[[(2S,4S,5S)-3- acetamido-4,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]methoxymethyl]-4- hydroxy-6-(hydroxymethyl)oxan-2-yl]methoxymethyl]-2,4-dihydroxy-6- (hydroxymethyl)oxan-3-yl]acetamide 2

IUPAC Nomenclature Advantages: standardized systematic classification stereochemistry is included widespread unambiguous allows reconstruction from the name Disadvantages: extensive rules alternative names are allowed (non-unique) long complicated names IUPAC name : N-[(2R,4R,5S)-5- [[(2S,4R,5S)-3-acetamido-5-[[(2S,4S,5S)- 3-acetamido-4,5-dihydroxy-6- (hydroxymethyl)oxan-2-yl]methoxymethyl]- 4-hydroxy-6-(hydroxymethyl)oxan-2- yl]methoxymethyl]-2,4-dihydroxy-6- (hydroxymethyl)oxan-3-yl]acetamide 3

Linear notations Represent structures by linear sequences of letters and numbers, e.g. IUPAC nomenclature. Linear notations can be extremely compact, which is an advantage for the storage of structures in a computer (particularly when disk space is limited). Linear notations allow for an easy transmission of structures, e.g. in a Google-type search, or in an email. 4

The SMILES notation Example: SMILES representation : CCCO. Atoms are represented by their atomic symbols. 2. Hydrogen atoms are omitted (are implicit). 3. Neighboring atoms are represented next to each other. 4. Double bonds are represented by =, triple bonds by #. 5. Branches are represented by parentheses. 6. Rings are represented by allocating digits to the two connecting ring atoms. Example : SMILES: CCC(Cl)C=C 5

The SMILES notation. Atoms are represented by their atomic symbols. 2. Hydrogen atoms are omitted (are implicit). 3. Neighboring atoms are represented next to each other. 4. Double bonds are represented by =, triple bonds by #. 5. Branches are represented by parentheses. 6. Rings are represented by allocating digits to the two connecting ring atoms. b e a c f a b c d e f SMILES: CCC(Cl)C=C d 6

The SMILES notation. Atoms are represented by their atomic symbols. 2. Hydrogen atoms are omitted (are implicit). 3. Neighboring atoms are represented next to each other. 4. Double bonds are represented by =, triple bonds by #. 5. Branches are represented by parentheses. 6. Rings are represented by allocating digits to the two connecting ring atoms. SMILES: CCCCCC 7

The SMILES notation. Atoms are represented by their atomic symbols. 2. Hydrogen atoms are omitted (are implicit). 3. Neighboring atoms are represented next to each other. 4. Double bonds are represented by =, triple bonds by #. 5. Branches are represented by parentheses. 6. Rings are represented by allocating digits to the two connecting ring atoms. 7. Aromatic rings are indicated by lower-case letters. SMILES: Ncccccc 8

The SMILES notation Is unambiguous (a SMILES string unequivocally represents a single structure). Is it unique?? SMILES: Ncccccc but also ccccccn or ccc(n)ccc Solution: algorithm that guarantees a canonical representation (each structure is always represented by the same SMILES string) More at: http://www.daylight.com/dayhtml_tutorials/index.html 9

SMILES notation in MarvinSketch Paste 2

SMILES notation in MarvinSketch 2

The InChI notation (IUPAC International Chemical Identifier) Example: A digital equivalent to the IUPAC name for a compound. Five layers of information: connectivity, tautomerism, isotopes, stereochemistry, and charge. An algorithm generates an unambiguous unique notation. Official web site : http://www.iupac.org/inchi/ 22

The InChI notation (IUPAC International Chemical Identifier) Example: Each layer in an InChI string contains a specific class of structural information. This format is designed for compactness, not readability, but can be interpreted manually. The length of an identifier is roughly proportional to the number of atoms in the substance. Numbers inside a layer usually represent the canonical numbering of the atoms from the first layer (chemical formula) except H. 23

Graph theory A molecular structure can be interpreted as a mathematical graph where each atom is a node, and each bond is an edge. Such a representation allows for the mathematical processing of molecular structures using the graph theory. H 3 C H 3 C CH 3 24

25 Matrix representations A molecular structure with n atoms may be represented by an n n matrix (H-atoms are often omitted). Adjacency matrix : indicates which atoms are bonded. 6 5 4 3 2 6 5 4 3 2 2 3 4 5 6

26 Matrix representations A molecular structure with n atoms may be represented by an n n matrix (H-atoms are often omitted). Adjacency matrix : indicates which atoms are bonded. 6 5 4 3 2 6 5 4 3 2 2 3 4 5 6

Matrix representations A molecular structure with n atoms may be represented by an n n matrix (H-atoms are often omitted). Adjacency matrix : indicates which atoms are bonded. 2 3 4 5 6 2 3 5 6 2 3 4 4 5 6 27

28 Matrix representations Distance matrix : encodes the distances between atoms. The distance is defined as the number of bonds between atoms on the shortest possible path. 3 2 3 4 6 2 2 3 5 3 2 2 3 4 2 2 3 3 2 2 2 4 3 3 2 6 5 4 3 2 2 3 4 5 6 Distance may also be defined as the 3D distance between atoms.

29 Matrix representations Bond matrix : indicates which atoms are bonded, and the corresponding bond orders. 2 6 2 5 4 3 2 6 5 4 3 2 2 3 4 5 6

Connection table A disadvantage of matrix representations is that the matrix size increases with the square of the number of atoms. A connection table lists the atoms of a molecule, and the bonds between them (may include or not H-atoms). 2 3 4 5 6 List of atoms C 2 C 3 C 4 Cl 5 C 6 C List of bonds st 2 nd order 2 2 3 3 4 3 5 5 6 2 3

The MDL Molfile format ( http://www.mdli.com/downloads/public/ctfile/ctfile.jsp ) Nr of bonds Description of an atom Nr of atoms Description of a bond 2 3 5 6 4 3

The MDL Molfile format 32

The atom block 33

The atom block 34

The atom block 35

The atom block 36

The atom block 37

The MDL Molfile format 38

The bond block 39

The bond block 4

The bond block 4

The bond block 42

The MDL Molfile format 43

The properties block 2 charged atoms 44

The properties block 2 charged atoms atom 4: charge + atom 6: charge - 45

The properties block entry for an isotope 46

The properties block entry for an isotope atom 3: mass=3 47

The SDFile (.SDF) format Includes structural information in the Molfile format and associated data items for one or more compounds. Molfile Associated data $$$$ Molfile2 Associated data $$$$ 48

The SDFile (.SDF) format Molfile Associated data $$$$ Molfile2 Associated data $$$$ Example Associated data (molecular) 49

The SDFile (.SDF) format Molfile Associated data $$$$ Molfile2 Associated data $$$$ Example Associated data (atomic) 5

The SDFile (.SDF) format Molfile Associated data $$$$ Molfile2 Associated data $$$$ Example Associated data (molecular) 5

The SDFile (.SDF) format Molfile Associated data $$$$ Molfile2 Associated data $$$$ Example Delimiter Beginning of Molfile2 52

The SDFile (.SDF) format Molfile Associated data $$$$ Molfile2 Associated data $$$$ Example 53

The ChemAxon Standardize program Conversion of file formats Generation of unique SMILES strings Standardization of structures Addition of H-atoms, removal of H-atoms, assignment of aromatic systems, cleaning of stereochemistry, 54

The ChemAxon Standardize program 55

Markush structures A Markush structures diagram is a type of representation specific for a SERIES of chemical compounds. The diagram can describe not only a specific molecule, but several families of compounds. It includes a core and substituents, which are listed as text separately from the diagram. R= H, halogen, OH, COOH R2= H, CH 3 X= Cl, Br, CH3 These are mostly used in databases of patents. 56

Representation of molecular fragments Just like a text document may be indexed on the basis of specified keywords, a chemical structure may be indexed on the basis of specific chemical characteristics, usually fragments. Fragments may be, e.g., small groups of atoms, functional groups, rings. These are defined beforehand. It is an ambiguous representation: different structures may have common fragments. Fragments: -OH -COOH >C=O -NH2-3-indole 57

Fingerprints Fingerprints encode the presence or absence of certain features in a compound, e.g., fragments. If 2 fragments are defined, the fingerprint has a length of 2. It is an ambiguous representation. Allows for similarity searches. 58

Hashed Fingerprints Encode the presence of sub-structures. These are not previously defined. All patterns are listed consisting of atom 2 bonded atoms and their bond Sequences of 3 atoms and their bonds Sequences of 4 atoms and their bonds Patterns up to 3 atoms C, N, O C-C, C-N, C=O, C-O C-C-C, C-C-N, C-C=O, C-C-O, O=C-O 59

Hashed Fingerprints Each pattern activates a certain number of positions (bits) in the fingerprint, in the following example two bits / pattern: C-N C-C-C C-C=O An algorithm determines which bits are activated by a pattern. The same pattern always activates the same bits. The algorithm is designed in such a way that it is always possible to assign bits to a pattern. There may be collisions. Pre-definition of fragments is not required. But it is not possible to interpret fingerprints. 6

Hashed Fingerprints C-N C-C-C C-C=O H-atoms are omitted. Stereochemistry is not considered. Parameters to define: fingerprint length, size of patterns, and number of bits activated by each pattern. Main application: similarity search in large databases. 6

Hashed Fingerprints Influence of parameters Length of fingerprint: too short almost all bits=, poor discrimination of molecules. too large too many bits=, too much disk space required. Maximum size of patterns: too short poor discrimination of molecules. too large ability to discriminate molecules, but many bits=. Nr of bits a pattern activates: too few poor ability to discriminate between patterns. too many ability to discriminate between patterns, but many bits=. More at: http://www.daylight.com/dayhtml/doc/theory/theory.finger.html 62

Hashed Fingerprints or Daylight fingerprints Can be calculated with several software packages, e.g. the generfp command of the program JCHEM (Chemaxon). Length (in bytes) Nr of bits activated by a pattern Maximum size of patterns Output file Input file 63

Hashed Fingerprints or Daylight fingerprints Can be calculated with the generfp command of the program JCHEM (Chemaxon). 64

Similarity measures based on fingerprints Similarity between compounds X and Y can be calculated from the similarity between their fingerprints. a = nr of bits on in X but not in Y. b = nr of bits on in Y but not in X. c = nr of bits on both in X and in Y. d = nr of bits off both in X and in Y. n = ( a + b + c + d ) is the total number of bits Euclidean coefficient : ( c + d ) / n (common bits in X and Y) Tanimoto coefficient : c / (a + b + c) 65

Hash codes Hash codes result from an algorithm that transforms a molecular structure into a sequence of characters or numbers encoding the presence of fragments in the molecule. They have a fixed length. Hash codes are not interpretable. They re used as unique identifiers of structures, e.g. in large databases of compounds hash codes allow for the fast perception of an exact match between two molecules. Hash codes can also be defined for atoms, or bonds. 66

Representation of stereochemistry The Cahn-Ingold-Prelog (CIP) rules Useful for nomenclature but difficult to implement: assignment of priorities. 2 3 3 2 But in a Molfile Atoms are ranked. Priorities can easily be assigned corresponding to the atoms ranks in the Molfile. CIP priorities : OH > CO 2 H > CH 3 > H 67

Representation of stereochemistry Parity in Molfiles. Number the atoms surrounding the stereo center with, 2, 3, and 4 in order of increasing atom number (position in the atom block) (a hydrogen atom should be considered atom 4). 2. View the center from a position such that the bond connecting the highest-numbered atom (4) projects behind the plane formed by atoms, 2, and 3. 3. Parity if atoms -3 are arranged in clockwise direction in ascending numerical order, or parity 2 if counterclockwise. 68

Representation of stereochemistry Molfile Chiral center: atom. Ligands: atoms 2, 3, 4 and H. H is the last. Looking at the chiral center with the H-atom pointing away (as in the figure) atoms 2, 3, and 4 are arranged counterclockwise. Therefore parity = 2. 69

Representation of stereochemistry Molfile. Number the atoms surrounding the stereo center with, 2, 3, and 4 in order of increasing atom number (position in the atom block) (a hydrogen atom should be considered atom 4). 2. View the center from a position such that the bond connecting the highestnumbered atom (4) projects behind the plane formed by atoms, 2, and 3. 3. Parity if atoms -3 are arranged in clockwise direction in ascending numerical order, or parity 2 if counterclockwise. Chiral center: atom 4. Ligands: atoms, 3, 5, and H. H is the last. Looking at the chiral center with the H-atom pointing away (as in the figure) atoms, 3, and 5 are arranged clockwise. Therefore parity =. 7

Representation of stereochemistry Molfile - bond block 7

Representation of stereochemistry in SMILES notation Chirality in a tetrahedral center is specified by @ (anticlockwise direction) or @@ (clockwise direction). Looking to the chiral center from the ligand appearing first in the SMILES string, the other three ligands are arranged clockwise or counterclockwise in the order of appearance in the SMILES string. >( st H 3 C 2nd NH 2 3rd O 4th OH 2nd 3rd 4th Chiral center C[C@H](N)C(O)=O @ st 2nd 3rd 4th 72

Representation of cis-trans stereochemistry in double bonds Stereochemistry around a double bond (cis/trans) is specified with characters \ and /. Example: trans-,2-dichloroethene - Cl/C=C/Cl (starting at the st Cl, a bond goes up (/) to C=C, and from here goes up (/) to the 2nd Cl). Cl Cl cis-,2-dicloroeteno - Cl/C=C\Cl (starting at the st Cl, a bond goes up (/) to C=C, and from here goes down (\) to the 2nd Cl). Cl Cl 73

Representation of cis-trans stereochemistry in double bonds Stereochemistry around a double bond (cis/trans) is specified with characters \ and /. H 3 C CH 3 Two cis substituents F Cl C\C(F)=C(/C)Cl Bond goes down Bond goes up 74

Representation of the 3D structure The most obvious (and common) representation consists of a Cartesian system, i.e. the x, y, and z coordinates of each atom. For a given conformation the coordinates depend on the orientation of the structure relative to the reference axes. In a Molfile, 3D coordinates can be listed. 75

Representation of the 3D structure in a Molfile 76

Representation of the 3D structure It is also possible to represent only coordinates, with no specification of bonds. Bonds may be inferred with reasonable confidence from the 3D interatomic distances. But demands some kind of computer processing. 77

Representation of the 3D structure Another representation of the 3D structure is the Z matrix, in which internal coordinates are specified (bond lengths, bond angles and dihedral angles). It is mostly used for the input to quantum chemistry software. Example for cyclopropane: dist. to at. dist. to at. 2 ang -2-3 C... C.35.. C.35 6.. 2 H.. 2. 3 2 H.. 24. 3 2 H.. 2. 2 3 H.. 24. 2 3 H.. 2. 2 3 H.. 24. 2 3 ang 9--2-3 78

Generation of a 3D structure Theoretical methods : ab initio (e.g. Gaussian) semi-empirical (e.g. Mopac, ArgusLab) molecular mechanics (e.g. Chem3D, ArgusLab) Empirical methods (e.g. CONCORD, CORINA) : use fragments with predefined geometries use rules use databases of geometries use simple optimizations 79

Generation of the 3D structure Chemaxon s Marvin 8

Generation of the 3D structure - CORINA http://www.mol-net.com/online_demos/corina_demo.html 8

Representation of molecular surfaces The 3D structure presented up to here is just the skeleton of the molecule, but a molecule also has a skin the molecular surface. The molecular surface divides the 3D space in an internal volume and an external volume. This is just an analogy with macroscopic objects, since molecules cannot rigorously be approached with classical mechanics. The electronic density is continuous, and there are probabilities of finding electrons at certain locations (it tends to zero at infinite distance from nuclei). The electronic distribution at the surface determines the interactions a molecule can establish with others (e.g. docking to a protein). 82

Representation of molecular surfaces A molecular surface can express different properties, such as charge, electrostatic potential, or hydrophobicity, by means of colors. Such properties may be experimentally determined (2D NMR, x-ray crystallography and electronic cryomicroscopy give indications about 3D molecular properties), or theoretically calculated. There are several ways of defining a surface. The most used are: van der Waals surface, surface accessible to a solvent, and Connolly surface. 83

van der Waals surface It is the simplest surface. It can be determined from the van der Waals radius of all atoms. Each atom is represented by a sphere. The spheres of all atoms are fused the total volume is the van der Waals volume, and the envelop defines the van der Waals surface. It is fast to be calculated. 84

Connolly surface It is generated by simulating a sphere rolling over the van der Waals surface. The sphere represents the solvent. The radius of the sphere may be chosen (typically it is set at.4 Å, the effective radius of water). The Connolly surface has two regions: the convex contact surface (it is a segment of the van der Waals surface) and the concave surface (where the sphere touches two or more atoms). 85

Surface accessible to the solvent The path of the center of the sphere that generates the Connolly surface defines the surface accessible to the solvent. 86

Molecular surfaces with ChemAxon MarvinSpace 87

Molecular surfaces with ChemAxon MarvinSpace 88

Molecular surfaces with ChemAxon MarvinSpace 89

Molecular surfaces with ChemAxon MarvinSpace 9