Dr. Sander B. Nabuurs, Computational Drug Discovery group, Center for Molecular and Biomolecular Informatics, Radboud University Medical Centre


The road to new drugs.
How to find new hits?
High Throughput Screening (HTS)
Virtual Screening (VS)
Integration of HTS and VS
Molecular docking.
Considering protein flexibility.
Structure-based drug design in practice: Influenza case study.

WHAT DO WE WANT?
The goal of a drug is to modulate the function of its target receptor, which results in a pharmacological effect in the human body.
HOW DO WE GET THERE?
Developing a new drug:
is extremely difficult.
takes a lot of time (> 10 years).
is very expensive (~ 1 billion $).
[Figure labels: target, compound, pharmacological effect]

[Pipeline figure, from RESEARCH to DEVELOPMENT] Target Discovery → Lead Discovery → Lead Optimization → Pre-Clinical Development → Clinical Development → Registration → Marketing & Sales

[Figure labels] Bioinformatics; Computer Aided Drug Design; Target Discovery; Lead Discovery; Lead Optimization; Target Validation; In Vitro; In Vivo; Target Identification; Assay Development; Hit / Lead; Candidate; Development Candidate

Computer-Aided Drug Design (CADD) refers to the application of informatics methods within rational drug design, to discover, design and optimize biologically active compounds.

TWO SCENARIOS (Target Discovery: target gene, ligands unknown):
Ligands unknown, target protein structure unknown: high-throughput screening.
Ligands unknown, target protein structure known.

HTS is the most important source of new hits.
Pharmaceutical companies have screening libraries of up to a few million compounds.
The chemical space of drug-like molecules is > 10^80.
Building a good screening collection is crucial!
[Figure: Compound collection → Active compounds]

Screening set?
Sources: in-house collection; available to buy.
Options: everything; diverse selection; focused selection.

DIVERSE SELECTION vs. FOCUSED SELECTION
Diverse selection: identify dissimilar compounds.
Focused selection: identify similar compounds.
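The two selection strategies can be sketched in a few lines. This is only an illustration, assuming each compound is described by two numeric chemical descriptors and that Euclidean distance in that descriptor space stands in for dissimilarity. The greedy MaxMin picker and the function names `diverse_selection` and `focused_selection` are choices made for this example, not something the slides prescribe.

```python
import math

def diverse_selection(points, k):
    """Greedy MaxMin pick: repeatedly add the compound that lies farthest
    from everything selected so far (i.e. the most dissimilar one)."""
    if k <= 0 or not points:
        return []
    selected = [0]  # seed with the first compound (arbitrary assumption)
    while len(selected) < min(k, len(points)):
        candidates = (i for i in range(len(points)) if i not in selected)
        best = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
        )
        selected.append(best)
    return selected

def focused_selection(points, reference, k):
    """Pick the k compounds closest to a known active (the reference)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], reference))
    return order[:k]

# Hypothetical descriptor values (descriptor 1, descriptor 2) for five compounds.
library = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.2)]

print(diverse_selection(library, 3))               # indices spread over the space
print(focused_selection(library, (0.0, 0.0), 2))   # indices near the reference
```

The diverse pick spreads over the descriptor space, while the focused pick stays in the neighbourhood of the reference compound.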

[Plot: DIVERSE SELECTION; axes: Chemical Descriptor 1, Chemical Descriptor 2]
HOW CAN WE MEASURE SIMILARITY?

In the selection of screening compounds the Rule of Five is often used. It summarizes typical properties of known drugs. These rules are often used as a first filtering step.
In 1997 Chris Lipinski observed for many drugs:
molecular weight ≤ 500
lipophilicity (LogP) ≤ 5
H-bond donors ≤ 5
H-bond acceptors ≤ 10
rotatable bonds ≤ 10 (a later extension, not one of Lipinski's original four criteria)
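As a rough illustration of how such a first filtering step might look, the sketch below checks precomputed descriptor values against the thresholds listed on the slide. The `Descriptors` class, the example compounds and their values are hypothetical. Treating the thresholds as inclusive upper bounds and allowing zero violations by default are assumptions of this example.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed properties of one compound (supplied by the caller)."""
    name: str
    mol_weight: float
    logp: float
    hbond_donors: int
    hbond_acceptors: int
    rotatable_bonds: int

def violations(d: Descriptors) -> list:
    """Return the criteria from the slide that the compound fails."""
    checks = {
        "molecular weight <= 500": d.mol_weight <= 500,
        "LogP <= 5": d.logp <= 5,
        "H-bond donors <= 5": d.hbond_donors <= 5,
        "H-bond acceptors <= 10": d.hbond_acceptors <= 10,
        "rotatable bonds <= 10": d.rotatable_bonds <= 10,
    }
    return [rule for rule, ok in checks.items() if not ok]

def passes_filter(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a compound if it fails at most max_violations criteria."""
    return len(violations(d)) <= max_violations

# Hypothetical compounds with made-up descriptor values.
cmpd_a = Descriptors("cmpd_A", 320.4, 2.1, 2, 5, 4)
cmpd_b = Descriptors("cmpd_B", 612.7, 5.8, 3, 9, 12)

for cmpd in (cmpd_a, cmpd_b):
    print(cmpd.name, passes_filter(cmpd), violations(cmpd))
```

In practice the descriptor values would come from a cheminformatics toolkit; here they are passed in directly so the filter logic stays visible.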

[Plots: DIVERSE SELECTION and FOCUSED SELECTION; axes: Chemical Descriptor 1, Chemical Descriptor 2]

Sampling around known active sub-structures or structural fragments can improve the quality of the library.
[Figure: dopamine derivative]

The use of chemical (or molecular) descriptors is based on the similar property principle: molecules with similar structures and similar properties should also exhibit similar activity.
[Plot axes: Chemical Descriptor 1, Chemical Descriptor 2]

Fingerprints consist of various descriptors encoded into bit strings. These descriptors can be fragments or the presence or absence of other properties.
[Figure: dopamine derivative]
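A toy version of such an encoding is sketched below. The fragment dictionary `FRAGMENT_KEYS` is a hypothetical six-entry key set chosen for this example. Real fingerprints use far larger key sets or hashed substructures, and the fragments would be detected from the structure rather than listed by hand.

```python
# Hypothetical fragment dictionary: one bit per fragment, in this order.
FRAGMENT_KEYS = [
    "aromatic ring",
    "hydroxyl",
    "primary amine",
    "carbonyl",
    "halogen",
    "nitro",
]

def fingerprint(fragments_present):
    """Encode presence (1) or absence (0) of each key fragment as a bit string."""
    return "".join("1" if key in fragments_present else "0" for key in FRAGMENT_KEYS)

# Dopamine contains an aromatic ring, hydroxyl groups and a primary amine.
dopamine_fragments = {"aromatic ring", "hydroxyl", "primary amine"}
print(fingerprint(dopamine_fragments))  # 111000
```

Two compounds can then be compared bit by bit, which is what similarity measures on fingerprints do.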

HOW CAN WE MEASURE SIMILARITY?

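Tanimoto coefficient:
$S_{AB} = \frac{x_{AB}}{x_A + x_B - x_{AB}}$
where $x_A$ and $x_B$ are the numbers of bits set in fingerprints A and B, and $x_{AB}$ is the number of bits set in both.
Example: $x_A = 8$, $x_B = 6$, $x_{AB} = 5$ gives
$S_{AB} = \frac{5}{8 + 6 - 5} = \frac{5}{9} \approx 0.56$
Note: this is just one of many different similarity measures!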
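The worked example on this slide (8 bits set in A, 6 in B, 5 in common) can be reproduced with a short function. This is a minimal sketch that assumes fingerprints are given as equal-length strings of "0" and "1". The two example bit strings are constructed for this illustration so that they match those counts.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    x_a = fp_a.count("1")
    x_b = fp_b.count("1")
    x_ab = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    union = x_a + x_b - x_ab
    return x_ab / union if union else 0.0  # two empty fingerprints: define as 0

# Constructed so that x_A = 8, x_B = 6 and x_AB = 5, as on the slide.
fp_a = "111111110"
fp_b = "000111111"
print(round(tanimoto(fp_a, fp_b), 2))  # 0.56
```

Returning 0.0 for two empty fingerprints is a convention chosen here to avoid a division by zero.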

Tanimoto similarity of example ligands (VAC has no value on the slide and appears to be the reference compound):
VAC  HIV protease inhibitor
BEB  HIV protease inhibitor      Tanimoto similarity 0.60
BEH  HIV protease inhibitor      Tanimoto similarity 0.60
MK1  HIV protease inhibitor      Tanimoto similarity 0.63
XN1  HIV protease inhibitor      Tanimoto similarity 0.63
1IN  HIV protease inhibitor      Tanimoto similarity 0.47
PZQ  NOT HIV protease inhibitor  Tanimoto similarity 0.49
TI3  NOT HIV protease inhibitor  Tanimoto similarity 0.48
A high Tanimoto similarity can be useful for prioritization. However, no guarantees!

[Structure drawings: flutamide and retro-flutamide]
receptor                  flutamide   retro-flutamide
progesterone receptor     4 nM        6 nM
glucocorticoid receptor   25 nM       38 nM
androgen receptor         0.5 nM      55 nM

Despite being the major source of new hits, HTS has its drawbacks:
It's expensive. In practice only accessible to industry.
Logistical errors, e.g. frequent hitters.
Measurement errors, e.g. suboptimal readout.
Strategic errors, e.g. assay variability.
[Figure: Compound collection → Active compounds]

TWO SCENARIOS (Target Discovery: target gene, ligand(s) unknown):
Ligand(s) unknown, target protein structure unknown: high-throughput screening.
Ligand(s) unknown, target protein structure known: virtual screening.

In Virtual Screening (VS) compounds are selected using computer programs to predict receptor binding.
VS is much cheaper and is able to process many more compounds in less time.
Experimental validation is however always required!
[Figure: Compound database → Active compounds]

A few success stories from virtual screening

STRUCTURE-BASED VS
Predict the orientation (and affinity) of a small molecule binding to a protein target.
Requires the availability of a 3D target structure!
The compound database can contain:
Compounds to purchase.
Compounds from in-house library.
Virtual compounds.
[Figure: Compound database → Structure-based virtual screening → Active compounds]

[Workflow figure: Target protein + Compound database → Docking program → Target-Compound complexes → Active compounds]

Despite its advantages VS also has its drawbacks:
Experimental validation is always required.
Protein structure errors, e.g. induced fit.
Sampling errors, e.g. faulty poses due to solvent.
Scoring errors, e.g. false positives / negatives.
[Figure: Compound database → Active compounds]

Focused and sequential screening
[Workflow figure labels: Screening library, Virtual Screening, Focused library, High Throughput Screening, Hits, Hypothesis generation]

Parallel and independent screening
[Workflow figure labels: Screening library, Virtual Screening, VS hits, High Throughput Screening, HTS hits, Analysis, Hits]

The docking problem involves many degrees of freedom:
Translational.
Rotational.
Configurational (Ligand + Receptor!)
Since the early eighties several docking algorithms have been devised. These can be characterized by the number of degrees of freedom that they ignore.
[Figure: Target protein + Compound → Docking program → Target-Compound complex]

[Figure] Degrees of freedom: ligand rotations; ligand translations; ligand flexibility; receptor flexibility.
Docking classes: rigid body docking; flexible ligand docking; induced fit docking; fully flexible docking.

A number of flexible ligand docking programs:
Dock [Kuntz et al, J Mol Biol, 161:269-288, 1982]
Autodock [Morris et al, J Comput Chem, 19:1639-1662, 1998]
FlexX [Rarey et al, J Mol Biol, 261:470-489, 1996]
Gold [Jones et al, J Mol Biol, 267:727-748, 1997]
Glide [Friesner et al, J Med Chem, 47:1739-1749, 2004]

Molecular docking typically consists of two separate stages:
1. Sampling: exploration of conformational and configurational space.
2. Scoring: evaluation of the strength of the receptor-ligand interaction.
[Figure: Target protein + Compound → Docking program → Target-Compound complex]
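The sample-then-score loop can be illustrated with a deliberately tiny toy model. Everything below is an assumption made for this example: the system is two-dimensional, the ligand is rigid, sampling is plain random search over rotations and translations, and the score simply rewards contacts and penalizes clashes. None of the docking programs mentioned in these slides works this simply.

```python
import math
import random

def transform(ligand, angle, dx, dy):
    """Rigid-body move: rotate the ligand about the origin, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in ligand]

def score(receptor, pose):
    """Toy score (lower is better): -1 per contact at distance 1 to 2,
    plus a penalty that grows as atoms come closer than 1."""
    total = 0.0
    for lx, ly in pose:
        for rx, ry in receptor:
            d = math.hypot(lx - rx, ly - ry)
            if d < 1.0:
                total += 10.0 * (1.0 - d)  # steric clash
            elif d < 2.0:
                total -= 1.0               # favourable contact
    return total

def dock(receptor, ligand, n_samples=2000, seed=0):
    """Stage 1 samples random poses; stage 2 scores each; the best is kept."""
    rng = random.Random(seed)
    best_pose, best_score = None, math.inf
    for _ in range(n_samples):
        pose = transform(
            ligand,
            rng.uniform(0.0, 2.0 * math.pi),
            rng.uniform(-5.0, 5.0),
            rng.uniform(-5.0, 5.0),
        )
        pose_score = score(receptor, pose)
        if pose_score < best_score:
            best_pose, best_score = pose, pose_score
    return best_pose, best_score

# Hypothetical 2D "pocket" of four receptor atoms and a two-atom ligand.
receptor = [(0, 0), (3, 0), (0, 3), (3, 3)]
ligand = [(0, 0), (1, 0)]

best_pose, best_score = dock(receptor, ligand)
print(best_pose, best_score)
```

Keeping `transform`, `score` and `dock` separate mirrors the split between the sampling stage and the scoring stage: either one can be replaced without touching the other.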

Prior to ligand placement, most docking programs will create a simplified description of the target binding site. This is typically done using simple geometry descriptors, like spheres or triangles. These geometrical descriptors are usually combined with chemical and electrostatic descriptors to guide ligand placement.
[Figure labels: Receptor, Ligand]


Docking programs generate a large number of different docking poses. In general one can distinguish two different scenarios:
1. Many different poses of the same ligand need to be ranked for accuracy.
2. Different poses of different ligands need to be ranked based on their receptor affinity.
The ideal scoring function works well in both cases...
[Figure: poses ranked 1 to 5 for each scenario]

First principles scoring functions generally use a Molecular Mechanics force field.
Such force fields typically contain intra-molecular terms:
Bond lengths
Bond angles
Dihedral terms
And inter-molecular terms:
Van der Waals contacts (non-polar)
Electrostatic interactions (polar)
$E_{\text{bind}} = E_{\text{intra}} + E_{\text{nonpolar}} + E_{\text{polar}}$
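To make the sum of terms concrete, here is a minimal sketch of the two inter-molecular contributions. The slide does not give functional forms, so the 12-6 Lennard-Jones potential for the non-polar term and Coulomb's law for the polar term are assumptions, as are all parameter values (a single `epsilon` and `r_min` for every atom pair, a constant dielectric). The intra-molecular term is passed in as a plain number rather than computed.

```python
import math

COULOMB_KCAL = 332.06  # converts e^2/Angstrom to kcal/mol

def lennard_jones(r, epsilon=0.1, r_min=3.8):
    """Non-polar pair energy: minimum of -epsilon at distance r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def coulomb(r, q1, q2, dielectric=4.0):
    """Polar pair energy between two point charges at distance r."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)

def binding_energy(receptor, ligand, e_intra=0.0):
    """E_bind = E_intra + E_nonpolar + E_polar over all receptor-ligand pairs.
    Atoms are (x, y, z, partial_charge) tuples; overlapping atoms (r = 0)
    are not handled in this sketch."""
    e_nonpolar = 0.0
    e_polar = 0.0
    for rx, ry, rz, rq in receptor:
        for lx, ly, lz, lq in ligand:
            r = math.dist((rx, ry, rz), (lx, ly, lz))
            e_nonpolar += lennard_jones(r)
            e_polar += coulomb(r, rq, lq)
    return e_intra + e_nonpolar + e_polar

# Hypothetical one-atom receptor and one-atom ligand with opposite charges.
receptor_atoms = [(0.0, 0.0, 0.0, 0.5)]
ligand_atoms = [(3.8, 0.0, 0.0, -0.5)]
print(binding_energy(receptor_atoms, ligand_atoms))
```

A real force field assigns per-atom-type parameters and adds the bonded terms listed above; the structure of the sum stays the same.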

Empirical scoring functions have been developed to score ligands very rapidly.
$\Delta G_{\text{bind}} = \Delta G_0 + \Delta G_{\text{polar}} \sum f(\text{complex}) + \Delta G_{\text{non-polar}} \sum f(\text{complex}) + \Delta G_{\text{rot}} N_{\text{rotatable-bonds}}$
$\Delta G_0$, $\Delta G_{\text{polar}}$, $\Delta G_{\text{non-polar}}$, and $\Delta G_{\text{rot}}$ are empirically parameterized weights.
f(complex) is a penalty function aimed at penalizing any unfavorable interaction geometries.
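The weighted sum above is cheap to evaluate, which the following sketch tries to show. The weight values in `Weights` are placeholders invented for this example, not fitted parameters. The piecewise-linear shape of the geometry function and its `tolerance` and `cutoff` values are likewise assumptions. Each interaction is represented only by its deviation from an ideal geometry.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    """Placeholder weights; real ones are fitted to experimental affinities."""
    dg_0: float = 5.0
    dg_polar: float = -4.0
    dg_nonpolar: float = -1.0
    dg_rot: float = 1.0

def geometry_factor(deviation, tolerance=0.2, cutoff=0.6):
    """f(complex): 1.0 for near-ideal geometry, falling linearly to 0.0
    at the cutoff, so distorted interactions contribute less."""
    if deviation <= tolerance:
        return 1.0
    if deviation >= cutoff:
        return 0.0
    return (cutoff - deviation) / (cutoff - tolerance)

def empirical_score(polar_devs, nonpolar_devs, n_rotatable_bonds, w=Weights()):
    """Weighted sum of geometry-scaled interactions plus a rotor penalty."""
    polar = sum(geometry_factor(d) for d in polar_devs)
    nonpolar = sum(geometry_factor(d) for d in nonpolar_devs)
    return (w.dg_0
            + w.dg_polar * polar
            + w.dg_nonpolar * nonpolar
            + w.dg_rot * n_rotatable_bonds)

# Hypothetical complex: two polar and three non-polar interactions, three rotors.
print(empirical_score([0.0, 0.4], [0.1, 0.1, 0.7], 3))
```

Because the score is a handful of multiplications and additions per interaction, it can be applied to very large numbers of poses.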

In practice molecular docking is generally used to answer two different types of questions:
1. Which compounds in my compound collection could be active on receptor A? (Receptor A + compound collection → docking → actives?)
2. What does the complex formed by receptor A and compound B look like? (Receptor A + compound B → docking → complex?)

Drug targets are flexible biomolecules and their dynamics play an important role in ligand binding. Insight into receptor flexibility can be valuable when interpreting structure activity relationships (SAR) and optimizing lead compounds.
[Figure: 1 ligand, 1 receptor conformation vs. 10 ligands, 10 receptor conformations]
Predicting ligand binding in flexible binding sites is however problematic!

LOCK AND KEY vs. INDUCED FIT
[Figure: Receptor A + X → Complex A-X; Receptor A + Y → Complex A-Y]

[Workflow figure: Compound + Target → Introduce flexibility → Generate complexes → Optimize complexes → Induced fit complex]

[Workflow figure: Single receptor structure → flexible residue selection → binding site rotamer sampling → binding site His/Gln/Asn sampling → flexible ensemble]
Selection is based on structural analyses of:
apo structures
other holo structures
temperature factors

The Fleksy approach docks into an ensemble of receptor structures. The approach is based on a united protein description generated from an ensemble of protein structures. In our case the ensemble contains the generated set of side chain rotamers and sampled Asn/Gln/His side chains.

[Figure: crystal structure holo form → crystal structure apo form? Steps: identify flexible residues; sample Asn/His. Ensemble: 2 Asn states, 8 His states, 15 side chain rotamers]

RMSD 0.6 Å

TWO SCENARIOS (Target Discovery: target gene, ligand(s) unknown → known!):
Ligand(s) unknown, target protein structure unknown: high-throughput screening.
Ligand(s) unknown, target protein structure known: virtual screening by docking.

Structure-based drug design (SBDD) relies on structural knowledge of the target protein to design and optimize lead compounds. This knowledge is obtained from either experimental structures or computational predictions.
A requirement is the availability of receptor structures:
NMR spectroscopy
X-ray crystallography
In practice protein X-ray crystallography is the major source of structural information.

Protein Expression and Purification → Crystallisation → Data Collection → Structure Building → Refinement → Analysis / Design

A receptor structure can often explain: Binding Specificity Inhibition Flexibility Reaction mechanism And it allows predictions to be made!

1918 Influenza Epidemic Influenza Virus

NEURAMINIDASE POCKET
SIALIC ACID

RELENZA SIALIC ACID

RELENZA
5,000,000 doses in NL (the Netherlands)
TAMIFLU

Trouw, 3 March 2009

[Figure panels: SIALIC ACID, RELENZA, TAMIFLU, each with the H274Y mutation]
RELENZA: WT $K_i$ = 1.0, H274Y $K_i$ = 1.9
TAMIFLU: WT $K_i$ = 1.0, H274Y $K_i$ = 265
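One way to read the K_i values on this slide is as a change in binding free energy, using the standard relation ΔΔG = RT ln(K_i,mutant / K_i,WT). The sketch below applies it to the two H274Y values (1.9 and 265, each against a wild-type value of 1.0). The temperature of 298.15 K and the function name `ddg_binding` are assumptions of this example, and the relation is added here for illustration rather than taken from the slides.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def ddg_binding(ki_mutant, ki_wild_type, temperature_k=298.15):
    """Loss in binding free energy (kcal/mol) implied by a K_i ratio.
    Positive values mean the mutant binds the inhibitor more weakly."""
    return GAS_CONSTANT * temperature_k * math.log(ki_mutant / ki_wild_type)

for ki_h274y in (1.9, 265.0):
    print(ki_h274y, round(ddg_binding(ki_h274y, 1.0), 2))
```

The 1.9-fold change corresponds to well under 1 kcal/mol, while the 265-fold change corresponds to roughly 3.3 kcal/mol of lost binding free energy.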

UMC St. Radboud: Jacob de Vlieg, Gijs Schaftenaar
Software: BiosolveIT, Accelrys, Molecular Networks
Schering-Plough: Scott Lusher, Markus Wagener, Ross McGuire, Hans Raaijmakers
Funding: Dutch Organization for Scientific Research (NWO)