Chemogenomic Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemogenomics
- Related fields: chemistry, biology, chemical biology, medicinal chemistry, chemical genetics, chemoinformatics, bioinformatics, chemoproteomics
- The study of the effects of small-molecular-weight drug candidates on gene/protein function

Chemogenomics
- In principle, chemogenomics is the screening of the chemical universe, i.e., all possible chemical compounds, against the target universe, i.e., all proteins and other potential drug targets.
- Mission impossible!

Chemogenomics
- The solution: in practice, the method screens congeneric chemical libraries against specific target families, e.g., G protein-coupled receptors, nuclear receptors, protease families, kinases, phosphodiesterases, ion channels, transporters, etc.

Chemogenomics: Requirements
- A compound library
- A representative biological system
  - Target library
  - Single cell
  - Organism
- A reliable readout
  - Gene/protein expression
  - High-throughput screening (binding or functional assays)

Chemogenomics
- The goal is to complete a two-dimensional matrix in which each cell describes the interaction of one target/gene with one compound, expressed as a binding affinity (Ki) or a functional effect (IC50).
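
As a minimal sketch of such a matrix, the snippet below stores a few measurements in a target-by-compound grid and leaves untested pairs as NaN. The target names, compound names and Ki values are invented for illustration, and `build_matrix` and `coverage` are hypothetical helpers, not part of any cited tool.

```python
import math

# Hypothetical measurements: (target, compound, Ki in nM). All values are invented.
measurements = [
    ("target_A", "cpd_1", 12.0),
    ("target_A", "cpd_2", 850.0),
    ("target_B", "cpd_1", 4300.0),
]

def build_matrix(records):
    """Return (targets, compounds, matrix); NaN marks untested pairs."""
    targets = sorted({t for t, _, _ in records})
    compounds = sorted({c for _, c, _ in records})
    row = {t: i for i, t in enumerate(targets)}
    col = {c: j for j, c in enumerate(compounds)}
    matrix = [[math.nan] * len(compounds) for _ in targets]
    for t, c, ki in records:
        matrix[row[t]][col[c]] = ki
    return targets, compounds, matrix

def coverage(matrix):
    """Fraction of target-compound cells that hold a measured value."""
    cells = [v for r in matrix for v in r]
    return sum(not math.isnan(v) for v in cells) / len(cells)

targets, compounds, matrix = build_matrix(measurements)
print(targets, compounds, coverage(matrix))
```

The coverage figure makes the practical problem visible: real matrices are mostly empty, which is why the remaining slides focus on predicting the missing cells.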

Ligand and Target Spaces
- Assumptions for any chemogenomics-based approach
  - (a) Compounds sharing some chemical similarity should also share targets.
  - (b) Targets sharing similar ligands should share similar patterns (binding sites).
- Question: how do we measure the distance between two ligands or two targets?

Ligand Space
- The distance between two compounds is measured by computing a similarity matrix
- Compound properties are usually described using descriptors
- Descriptor classification
  - One-dimensional
  - Two-dimensional
  - Three-dimensional

Ligand Space (Descriptors): 1D descriptors
- Easy and fast to compute
- Describe global properties (MW, atom and bond counts)
- Based on the chemical formula
- Prediction of physicochemical properties
  - Polar surface area
  - Solubility
  - Rings
- Discrimination between compound sets
  - Drugs vs. nondrugs
  - Ligands from target families
- 1D linear representations of compounds
  - SMILES (Simplified Molecular Input Line Entry System)
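
To illustrate how cheap 1D descriptors are, the sketch below derives atom counts and molecular weight directly from a chemical formula. It assumes simple formulae without parentheses or charges, and the small `ATOMIC_MASS` table and both function names are illustrative choices rather than a reference implementation.

```python
import re

# Average atomic masses (g/mol) for a few common elements; extend as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45}

def atom_counts(formula):
    """Count atoms per element in a simple formula such as 'C9H8O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum of atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[s] * n for s, n in atom_counts(formula).items())

aspirin = "C9H8O4"
print(atom_counts(aspirin), round(molecular_weight(aspirin), 3))
```

Descriptors of this kind ignore connectivity entirely, so isomers get identical values; that limitation is what motivates the 2D and 3D descriptors.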

Ligand Space (Descriptors): 2D descriptors
- Most common ligand descriptors
- Describe topological properties (maximum common substructure, structural keys)
- Encode both atomic and bond properties
  - 2D sketch figure
- Scanning libraries for similar substructures or fragments
- Graph-based method
  - Molecular graph (subfamily clustering)
  - Computationally slow
- Fingerprint-based method
  - Bit strings (0/1 flags for atoms, fragments, rings, ...)
  - Fingerprints are easy to compare
  - Also used in receptor-ligand recognition
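
The fingerprint idea can be sketched with structural keys: each bit answers whether one predefined fragment is present. The key list below is a made-up, six-entry example, and the code assumes that substructure perception has already been done elsewhere, so a molecule arrives as a set of fragment names.

```python
# Hypothetical structural keys; real key sets contain hundreds of entries.
KEYS = ["benzene ring", "carboxylic acid", "ester", "amide", "tetrazole", "halogen"]

def structural_key_fingerprint(features):
    """Return a bit string with one position per key: '1' if present, else '0'."""
    return "".join("1" if key in features else "0" for key in KEYS)

# Fragment sets are supplied by hand here (assumption: perceived upstream).
aspirin = {"benzene ring", "carboxylic acid", "ester"}
paracetamol = {"benzene ring", "amide"}

print(structural_key_fingerprint(aspirin))      # 111000
print(structural_key_fingerprint(paracetamol))  # 100100
```

Because every molecule maps to a string of the same length, comparison reduces to counting shared bits, which is why fingerprints are so much faster than graph matching.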

Ligand Space (Descriptors): 3D descriptors
- Describe conformational properties (atomic coordinates, potentials, fields, shapes)
- Requirements for proper alignment
  - Comparison in the same 3D Cartesian space
  - Conformational space accessible to each ligand
- Bit strings vs. structure comparison
  - Structure comparison can produce false positives
  - 3D information is stored in bit strings
  - Binary representation of 2D or 3D properties
  - Tanimoto coefficient (a simple similarity index)
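
For two bit strings with a and b bits set and c bits set in both, the Tanimoto coefficient is c / (a + b - c). A minimal sketch follows; returning 0.0 when neither string has any bit set is a convention chosen here, not something the slides specify.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length bit strings of '0' and '1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(x == "1" and y == "1" for x, y in zip(fp_a, fp_b))
    union = a + b - c
    # Assumed convention: two all-zero fingerprints score 0.0.
    return c / union if union else 0.0

print(tanimoto("110100", "100110"))  # 0.5
```

Here both strings have three bits set and share two of them, giving 2 / (3 + 3 - 2) = 0.5.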

Target Space
Chemoproteomics: targets = proteins

Dimension   Classification scheme           Databases
1D          By sequence                     UniProt, Pfam
1D          By patterns                     PRINTS, PROSITE
2D          By secondary structure, fold    SCOP, CATH
3D          By atomic coordinates           PDB, MODBASE
3D          By binding site                 BindingMOAD, sc-pdb

Target Space
- Amino acid sequence (1D)
  - Clustering of targets into target families
  - Large variation in sequence length even among family members, e.g., human GPCRs range from 290 to 6200 residues
- Structural motifs (2D)
  - Mapping of α-helices, β-sheets, coils and random structures
- 3D structure
  - Atomic coordinates derived by X-ray diffraction or NMR
  - Structural fold
  - Ligand-binding site: higher similarity among related targets
- Pharmacological profile
  - Binding affinity for a panel of ligands
  - Modifying the pharmacological profiles of drugs is widely used in drug design

Protein-Ligand Space
- Full matrices (affinity or structural information)
  - Experimental data are stored in the matrices
  - Affinity of a new compound for a known target
  - Measuring structure-activity relationships
  - Prediction of a global pharmacological profile
  - Advantages
    - Based on experimental data
    - Superior to computed descriptors
  - Disadvantages
    - An enormous amount of data is necessary
    - Very costly (not realistic in academic environments)
- Interaction fingerprints (IFPs)
  - Replacement of affinity with molecular interaction descriptors
  - Conversion of atomic coordinates of protein-ligand complexes into bit strings
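
An interaction fingerprint can be sketched as a fixed block of bits per binding-site residue, one bit per interaction type. The interaction types, residue labels and contacts below are illustrative assumptions, and the code takes the detected contacts as input rather than deriving them from atomic coordinates.

```python
# Assumed interaction types, in a fixed order; published IFP schemes differ in detail.
INTERACTION_TYPES = ("hydrophobic", "aromatic", "hbond_donor", "hbond_acceptor", "ionic")

def interaction_fingerprint(site_residues, contacts):
    """Encode contacts as a bit string.

    site_residues: ordered list of residue labels defining the binding site.
    contacts: set of (residue, interaction_type) pairs detected in a complex.
    """
    bits = []
    for residue in site_residues:
        for itype in INTERACTION_TYPES:
            bits.append("1" if (residue, itype) in contacts else "0")
    return "".join(bits)

# Hypothetical binding site and contacts for one protein-ligand complex.
site = ["ASP113", "PHE290"]
contacts = {("ASP113", "ionic"), ("PHE290", "aromatic")}
print(interaction_fingerprint(site, contacts))  # 0000101000
```

Two complexes encoded over the same residue list can then be compared with any bit-string similarity measure, which is how binding modes become searchable.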

Ligand-based Chemogenomics
- Annotating ligand libraries
  - Molecules sharing enough similarity with existing ligands for which a target profile is known have an enhanced probability of sharing the same biological profile.
- Ligand libraries
  - Targets
  - In vitro affinity data
  - ADME properties
- Biologically annotated compound libraries
  - AurSCOPE (160,000 GPCR ligands and 77,000 kinase inhibitors)
  - MedChem database (biological and pharmacological information on 650,000 compounds)
  - ChemBank (50,000 compounds in 441 high-throughput screening assays)
- Natural product-oriented chemical libraries
  - Evolutionary pressure
  - Highly specific binding mechanisms

Ligand-based: Privileged Structures
- Term coined by Evans et al. (1,4-benzodiazepine scaffold)
- A privileged structure is defined as a substructure or scaffold exhibiting a strong preference for a particular area of the target space.
- Suitable for orienting the design of targeted compound libraries
- Biphenyl: protein-binding motif
  - No particular preference for a target family
- 2-tetrazolo-biphenyl: GPCRs
- Only a few are really selective

Ligand-based In silico Screening
- Target fishing
  - Reference compound set (known 2D or 3D descriptors)
  - Screening procedure (QSAR, Bayesian analysis or pharmacophore)
  - Screening collection for identification of new compounds
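
As a rough illustration of target fishing, the sketch below ranks targets by the highest Tanimoto similarity between a query compound and each target's reference ligands. This nearest-neighbour scoring is a deliberately simple stand-in for the QSAR, Bayesian or pharmacophore procedures named on the slide. Fingerprints are represented as sets of on-bit indices, and all names and data are invented.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def fish_targets(query, references):
    """Rank targets by the best similarity of the query to any reference ligand.

    references: dict mapping a target name to a list of reference fingerprints.
    Returns (target, score) pairs, best first.
    """
    scores = {target: max(tanimoto(query, fp) for fp in fps)
              for target, fps in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical reference set and query compound.
references = {
    "GPCR_X": [{1, 2, 3, 4}, {2, 3, 9}],
    "kinase_Y": [{7, 8, 9}],
}
query = {1, 2, 3, 5}
ranking = fish_targets(query, references)
print(ranking)
```

Taking the maximum over reference ligands means one close analogue is enough to flag a target; averaging instead would favour targets with uniformly similar ligand sets.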

Ligand-based In silico Screening
- Mestres et al.
  - Library of molecules targeting nuclear hormone receptors (NHRs)
  - 2000 ligands, 25 receptors
  - Easy distinction between selective and promiscuous scaffolds
  - SHannon Entropy Descriptors (SHED)
- Novartis
  - Prediction of target profiles from extended-connectivity fingerprints
  - Machine learning algorithm based on Bayesian statistics
  - WOMBAT database (1230 unique SMILES)
  - A Bayesian model was trained for each activity class
  - Prediction is done by calculating the probability that each test compound is a ligand for each of the targets
  - Improvement by concatenating all target-associated probabilities into a Bayes affinity fingerprint
  - 2D descriptors were more predictive than 3D (not for singletons)
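
The per-target probability idea can be sketched with a plain Bernoulli naive Bayes classifier with Laplace smoothing. This is a generic textbook model used for illustration, not the specific Bayesian method of the work cited on the slide, and the five-bit fingerprints and target labels are invented.

```python
import math
from collections import defaultdict

def train(training_set, n_features):
    """training_set: list of (set of on-bit indices, target label)."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for bits, label in training_set:
        class_counts[label] += 1
        for i in bits:
            feature_counts[label][i] += 1
    return class_counts, feature_counts

def predict_profile(bits, model, n_features):
    """Return the posterior probability of each target for one compound."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    log_post = {}
    for label, n in class_counts.items():
        lp = math.log(n / total)  # class prior
        for i in range(n_features):
            p_on = (feature_counts[label][i] + 1) / (n + 2)  # Laplace smoothing
            lp += math.log(p_on if i in bits else 1.0 - p_on)
        log_post[label] = lp
    # Normalise in log space to avoid underflow.
    top = max(log_post.values())
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

# Hypothetical training data: two ligands per target.
training = [({0, 1}, "target_A"), ({0, 2}, "target_A"),
            ({3}, "target_B"), ({3, 4}, "target_B")]
model = train(training, n_features=5)
profile = predict_profile({0, 1}, model, n_features=5)
print(profile)
```

The returned dictionary plays the role of a target profile: one probability per target, which could in turn be concatenated into a vector in the spirit of the Bayes affinity fingerprint.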

Ligand-based In silico Screening
- Drawback
  - Training set compounds are categorized by molecular target without checking:
    - Does it really bind?
    - Where does it bind?
    - How does it bind?
  - Risk of training a machine learning algorithm with incorrect data
- Alternatives
  - 3D pharmacophores from protein-ligand complexes
  - Experimentally determined atomic coordinates
  - Experimentally determined pharmacological activities
  - Limited chemical diversity observed among PDB ligands

Target-based Chemogenomics
- Selectivity control
  - Selectivity of ligands among family-related targets
- Proteome-wide comparative modeling
  - Structural data (X-ray or NMR)
- Sequence-based comparison
- Structure-based comparison
  - Comparing molecular fields
  - Comparing 3D structures

Sequence-based Comparison
- Multiple alignment of all targets
  - Comparison of any kind of target family
  - Lack of high-resolution structural data
- GPCRs are ideal candidates for sequence-based comparison
  - Only bovine rhodopsin has been crystallised
  - Important target family for drug design
- Key residues are extracted and concatenated
  - Ungapped sequence (30 residues)
  - Distance matrix based on:
    - Sequence identity
    - Sequence similarity
    - Physicochemical properties
- Cavity-based clustering of 372 human GPCRs
  - Reproduced the full-sequence-based tree perfectly
  - Target comparison across a family is possible using only a few residues
- Applications
  - Simple analysis of binding site regions by residue conservation
  - Target hopping, used to discover ligands for a particular receptor
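
A minimal sketch of the identity-based distance matrix: each receptor is reduced to an ungapped string of cavity-lining residues, and the distance between two receptors is one minus their fractional identity. The receptor names and the 10-residue strings are invented (the slide uses 30 residues), and similarity- or property-based distances would need a substitution matrix that is not shown.

```python
def identity_distance(seq_a, seq_b):
    """1 - fraction of identical positions between two ungapped residue strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("cavity strings must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 1.0 - matches / len(seq_a)

def distance_matrix(cavities):
    """Return (names, matrix) for a dict mapping receptor name to cavity string."""
    names = sorted(cavities)
    matrix = [[identity_distance(cavities[a], cavities[b]) for b in names]
              for a in names]
    return names, matrix

# Hypothetical cavity strings (one-letter amino acid codes).
cavities = {
    "receptor_1": "DWFYSTNHLV",
    "receptor_2": "DWFYSTNHIV",
    "receptor_3": "EWLYATNQLV",
}
names, matrix = distance_matrix(cavities)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

Because the strings are ungapped and of fixed length, no alignment step is needed at comparison time; the alignment effort is spent once, when the key positions are chosen.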

Structure-based Comparison
- High-resolution structural data are crucial for homology modeling; however, only the ligand-binding sites are compared
- Comparing molecular fields
  - Molecular interaction fields (MIFs)
    - Structural alignment of targets
    - Interaction energies of probe atoms at each point of a 3D grid (binding site)
  - MIFs placed in a global matrix
    - Rows: targets
    - Columns: interaction energies
  - Analysis by either:
    - Principal component analysis
    - Hierarchical clustering
  - Highly dependent on structural alignment, grid resolution and probe atoms
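
The hierarchical clustering step can be sketched on a toy version of the global matrix, where each row holds one target's interaction energies at the same grid points. Single linkage with Euclidean distance is an arbitrary choice made here for brevity, and the three-point energy vectors are invented; real MIFs have thousands of columns.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length energy vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage(rows):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in sorted(rows)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical MIF rows: interaction energies (kcal/mol) at three grid points.
mif = {
    "kinase_1": [-1.0, -0.5, 0.0],
    "kinase_2": [-0.9, -0.6, 0.1],
    "protease_1": [0.5, -2.0, -1.5],
}
merges = single_linkage(mif)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge order is only meaningful if all targets were superimposed and sampled on the same grid, which is the dependence on alignment and grid resolution noted on the slide.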

Structure-based Comparison: Comparing 3D Structures
- Global structural alignment methods
  - GASH
  - DaliLite
  - CE
- Alignment of predefined structural motifs
  - Matching templates to a reference protein
  - Not all proteins sharing binding sites for a particular ligand share structural template similarities
- Structural alignment by physicochemical property description
  - Surface-based comparison
    - Relatively slow and thus incompatible with proteome-wide comparison
  - SuMO, Cavbase, SiteEngine, SitesBase and CPASS
    - Emerged in recent years
    - Represent the active site by pseudocenters encoding physicochemical properties (H-bonding capacity, aromaticity, hydrophobicity and charge)
    - Pseudocenters are linked by edges, giving a molecular graph
    - Detection of maximal common subgraphs (clique detection)
    - Detection of local similarities at ligand-binding subpockets for proteins with totally different folds and catalytic activities
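
Clique detection on pseudocenters can be sketched as follows: build a product graph whose nodes pair up pseudocenters of the same type from two sites, connect two nodes when the corresponding inter-center distances agree within a tolerance, and search for the largest clique with the basic Bron-Kerbosch recursion. This is a generic illustration and not the algorithm of any of the tools named above; the property labels, coordinates and the 1.0 Å tolerance are assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def product_graph(site_a, site_b, tol=1.0):
    """Nodes: (i, j) pairs of same-type pseudocenters. Edges: distance-compatible pairs."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudocenter may be matched only once
        d_a = math.dist(site_a[i][1], site_a[k][1])
        d_b = math.dist(site_b[j][1], site_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Largest clique found by Bron-Kerbosch without pivoting."""
    best = []
    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand([], set(adj), set())
    return best

# Hypothetical sites: (property type, (x, y, z) in Å).
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
site_b = [("donor", (10, 10, 10)), ("acceptor", (13, 10, 10)),
          ("aromatic", (10, 14, 10)), ("donor", (20, 20, 20))]
match = sorted(max_clique(product_graph(site_a, site_b)))
print(match)  # [(0, 0), (1, 1), (2, 2)]
```

Only distances within each site are compared, so the match is independent of how the two proteins are oriented, which is what lets this approach find shared subpockets in unrelated folds.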

Structure-based Comparison: Comparing 3D Structures
- Interpretation of computed similarity scores is often difficult
  - Active sites of different dimensions
  - Larger sites tend to present more matches even if the smallest is more similar
- Surgand et al. projected an active site onto a dimensionless 80-triangle sphere of cavity descriptors
  - Measuring normalized distance in descriptor space

Target-Ligand-based Chemogenomics
- Chemical annotation of target binding sites
  - Various chemical compound libraries exist
  - Binding information is crucial
  - The protein/binding site must be annotated by ligand chemotype
- SMID (Small Molecule Interaction Database) annotates protein sequences with domain-specific ligands
  - Browse likely ligands for a protein of unknown 3D structure
- Ligand-annotated binding sites from the PDB
  - BindingMOAD and sc-pdb
  - Pharmacological point of view
  - Prioritize ligands for designing targeted compound libraries

Target-Ligand-based Chemogenomics
- To browse and predict protein-ligand complexes, one needs to set up simple descriptors for both ligands and proteins from knowledge databases and concatenate them into a single protein-ligand description.
- Two-dimensional searches
  - Use experimental binding affinity matrices and define appropriate QSAR models to predict the affinity of new compounds
- Three-dimensional searches
  - Dock each ligand of a compound library into each active site of a target library
  - Molecular inverse docking approach
  - Scoring functions cannot quantify very heterogeneous protein-ligand complexes
  - Computation of IFP strings
    - Converts 3D information about protein-ligand interactions to 1D

Target-Ligand-based Chemogenomics
- Three-dimensional searches
  - 3D-based docking-independent methods
    - Retrieving a ligand from a protein and vice versa
    - Encode protein and ligand properties with similar descriptors
  - CoLiBRI (complementary ligands based on receptor information)
    - Ligand and protein described using the same molecular descriptors (TAE-RECON)
    - Shape and electronic properties of isolated atoms
    - Mapping patterns of active sites onto patterns of their complementary ligands and vice versa
    - Performs well in tests when the training set is similar

Final remarks
- High-throughput data (structure, binding affinity, etc.)
  - Ligand
  - Target
- Linking data by focusing on either the ligand or the target
  - Target-based chemogenomics
  - Ligand-based chemogenomics
  - Target-ligand-based chemogenomics
- Selectivity profiles for therapeutic usage
  - Not more selective ligands
- In silico approach
