Plan. Day 2: Exercise on MHC molecules.

Plan Day 1: What is Chemoinformatics and Drug Design? Methods and Algorithms used in Chemoinformatics including SVM. Cross validation and sequence encoding Example and exercise with herg potassium channel: Use of SVM in WEKA program Day 2: Exercise on MHC molecules.

Material Lectures and compendium are available on the course program site at CBS. Data set for the herg exercise and MHC exercises will have to be downloaded on your directory or your machine. WEKA program can be installed in your own machine or can be run from life server at CBS.

Drug discovery

Drug and drug design A drug is a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology. Action of activation (agonist) or inhibition (antagonist) to a biological target (protein, receptor, enzymes, cells ). Drug design is the approach of finding drugs by design, based on their biological targets. An important part of drug design is the prediction of small molecules binding to a target protein (pharmacophore, docking, QSAR, )

Chemoinformatics- definition F.K. Brown (1998): Annual Reports in Medicinal Chemistry, 33, 375-384 The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.

Virtual screening- Chemoinformatics Exploit any knowledge of target(s) and/or active ligand(s) and/or gene family. It involves computational technique for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates. Structure based Docking Ligand based QSAR, similarity search VS Pharmacophore based knowledge based Chemical-Biology network

Drug-likeness with a simple counting method, rule of five Octanol-water partition coefficient (logp) 5 Molecular weight 500 No. hydrogen bond acceptors (HBA) 10 No. hydrogen bond donors (HBD) 5 If two or more of these rules are violated, the compound might have problems with oral bioavailability. (Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.) Rules have always exception. (antibiotics, antibacterials and antimicrobials, )

What the chemist sees. Aspirin Sildenafil

1D structure, line notation SMILES CC(=O)OC1=CC=CC=C1C(=O)O CCCC1=NN(C2=C1NC(=NC2=O)C3=C (C=CC(=C3)S(=O)(=O)N4CCN(CC4)C) OCC)C SMILES Simplified Molecular Line Entry System http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

Aspirin What a protein sees. Sildenafil Electrostatic fields (red are negative and blue positive)

Aspirin What a protein sees. Sildenafil Green are hydrophilic area and red are hydrophobic areas

Ligand based- QSAR Based on a set of experimental data (biological activity, solubility, toxicity, permeability, ) one tries to correlate these data with some descriptors. But what can be these descriptors?

Descriptors 1D descriptors: MW, number of features, sequence based MACCS key 2D descriptors: Topological, physichochemical, BCUT,

Descriptors QSAR based on 3 D interaction energies (GRID, CoMFA...) GRID: Determines a total interaction energy. Etot = Evdw + Eelec + Ehb Structural model Molecular interaction field (GRID) PCA and PLS model 004 003 002 001 000 DRY O N1 H2O

Blue areas represent the favorable electronegative region and red the unfavorable electronegative regions (based on CoMFA) Green areas represent the favorable steric region and yellow the unfavorable steric regions (based on CoMFA).

The most common method used to correlate data are usually, PLS, SVM, ANN, K-means.

Example with Antimicrobial Peptides (AMPs) AMPs are small (10-40 amino acids), cationic and amphipatic molecules, which are encoded directly from DNA. They are ubiquitous in nature, appearing in such diverse organisms as fungi, bacteria, plants, amphibians, insects, and mammals, where they establish a first-line defence mechanism against invading pathogens or competing organisms. CAP1 8 Leucocin A Tritrpticin Defensin

Novispirin variant1 63 variant2 82 Activity S001 S002 S003 S131 Activity = y + as001 + bs002 + + zs131

Classification from a sequence based analysis Amino Acid z-scales Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables by Principal Component Analysis (PCA) Hellberg et al., J.Med.Chem. 30 (1987) 1126 Extended to non-natural amino acids Sandberg et al., J. Med. Chem. 41 (1998) 2481 Classification of GPCRs Lapinsh et al., Prot. Sci. 11 (2002) 795

Amino Acid z-scales Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables by Principal Component Analysis (PCA) z1 (lipophilicity) z2 (steric properties) z3 (electrostatic properties) Ala Phe Lys Principal Property Translation 0.07-1.73 0.09-4.92 1.30 0.45 2.84 1.41-3.14 Hellberg et al., J.Med.Chem. 30 (1987) 1126

Novispirin: From sequences to data Principal property translation (z-scales) & multivariate data analysis 52 x 20 (52+n) x 60 z-scales PCA scores and QSAR models

training set of 52 Novispirin variants with 60 variables. q 2 loo =0.40 ; r2 =0.72 sdep =0.14

Novispirin: From structures to data 131 descriptors: - CPSA descriptors (charged partial surface area) computed with the Tripos force field, dipole moment, energies,surface, weight (33). - VolSurf descriptors defined interaction of molecules with biological membranes which is mediated by surface properties such as shape, electrostatic, hydrogen-bonding and hydrophobicity (94). - theoretical descriptors according of the hydrophobicity of each amino acid defined by Engleman-Steitz (4).

48 variables selected with a fractional factorial design (FFD). q 2 loo =0.64 ; r2 =0.79 sdep =0.11

Single mutation Novispirin prediction with a QSAR model. Each position of the Novispirin sequence is mutated with the 20 naturally amino acids. The residual activity of peptide variants is defined according of the wild type activity. Then, all point superior to 0 represent a peptide mutant with a better predict activity compared to the original novispirin.

Taboureau Methods Mol. Biol. 2010 Taboureau et al. Chem.Biol.Drug.Des. 2006. Raventos et al. CCHTS. 2005 Mygind et al. Nature. 2005 position 4

Pharmacophore Definition A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.

Pharmacophore Pharmacophore: chemical features The chemical features can be hydrogen bonds acceptors, hydrogen bond donors, charge interactions, hydrophobic areas, aromatic rings, positive or negative ionizable group.) The shape or volume is also considered. Hyd Acc Acc Aro Acc & Don Start to be complex!!! Pharmacophores represent chemical functions, valid not only for the curretly bound, but also unknown molecules. The steric hindrance may explain lack of activity.

Pharmacophore Example 1 Atom is acceptor if it s a nitrogen, oxygen or sulfur and not an amide nitrogen, aniline nitrogen and sulfonyl sulfur and nitro group nitrogen Acceptor Donor Acceptor Aro ring center

Pharmacophore Example 2 with 3 inhibitors Agonist at D2 receptor Dopamine (2 rotations and 2 OH groups) Apomorphine (no rotations) 5-OH DPAT (one OH group and many rotation)

Pharmacophore Example 2 Active agonists define important groups: -Aromatic ring -meta OH group -N atom, righ distance from aromatic ring -other molecular scaffolding does NOT get in the way at the receptor

Pharmacophore

Are features of the site unique to herg?

Structural based design: Docking

Docking

Structural based design: Docking Comparison of some of the docking tools. Kellenberger E. et al. Proteins 2004, 57(2): 225-242

Structural based design: Docking Induced fit docking Substrate (ligand) + Enzyme Substrate (receptor) + (ligand) Enzyme (receptor) Lock and Key Induced Fit

Structural based design: an example with antidepressant Zhou, Sciences 2007

Chemogenomics and Pharmacogenomics Chemogenomics: Studied the biological effect of a wide array of small molecules on a wide array of macromolecular targets. Pharmacogenomics: Design drugs according of the genetic variation.

Chemical network: example with neurotransmitter transporters

Chemogenomics and Pharmacogenomics: an example with citalopram A441G CIT: 2.5x gain of function CIT with ears : 2x gain of function A441G V343S V343N V343S CIT: 4.5x gain of function Des-CN: 4.9x gain of function [Des-F: 7.4x gain of function] V343N CIT: 3.5x gain of function Des-CN: 35.5x gain of function [Des-F: 3.8x gain of function] A173S A173S CIT: 4.5x gain of function Des-F: 16.5x gain of function S438T S438T CIT: 175x loss of function Monomethyl: 3.5x loss of function

Conclusion According of the information you have, different strategies can be used. If you can develop different strategies which come to the same conclusion, that will reinforce your hypothesis. The most information (experimental) you have, the better your validation will be.

Time for a break!!!