FlexPepDock In a nutshell

All tutorial files are located in http://bit.ly/mxtakv

FlexPepDock refinement
  Step 1
  Step 3 - Refinement
  Step 4 - Selection of models
  Measure of fit
FlexPepDock Ab-initio
  Step 1 - Preparing the structure for modeling
  Step 3 - Ab-initio run
  Step 4 - Selection of models
FlexPepBind
  Step 1 - Preparation of the model
  Step 3 - Threading the peptide sequence
  Step 4 - FlexPepDock runs
  Step 5 - Specificity prediction & analyzing results
Glossary
  FlexPepBind
  Fit measures
Options (aka flags)
  Refinement
  Ab-initio

FlexPepDock refinement

Raveh B, London N, Schueler-Furman O. Sub-angstrom modeling of complexes between flexible peptides and globular proteins. Proteins. 2010;78(9):2029-40.

The Refinement protocol is intended for cases where an approximate, coarse-grained model of the interaction is available. The protocol iteratively optimizes the peptide backbone and its rigid-body orientation relative to the receptor protein, in addition to on-the-fly side-chain optimization. The final result is a high-resolution model of the peptide-receptor complex, with the side chains of binding motifs modeled at nearly atomic accuracy. The protocol can account for a considerable diversity of peptide conformations within a given binding site. However, it is not intended for cases where the peptide backbone is unknown (the ab-initio protocol is best optimized for these cases). Here we show a successful example starting from an extended conformation, but the protocol is intended for cases where the peptide's backbone is more or less in the correct position and the protocol brings it to high-resolution accuracy.

Let's start with a refinement run to model a high-resolution structure of the Serine-Threonine kinase Chk1 (bound complex PDB 1nvr, free receptor PDB 2x8d). Our starting structure will be the bound receptor with the extended peptide located in its binding site. We want to use the receptor's side chains in our run: they are an important part of the input starting structure, since many receptors undergo only minor changes upon peptide docking. Usually, if a native structure is not supplied, the starting structure is used as the reference instead.

Step 1

The first thing you should do is modify a file called paths to set the paths needed to run Rosetta:

ROSETTA_DB - the path to the Rosetta database
ROSETTA_BIN - the path to the bin directory in the Rosetta installation

A first preliminary step in our protocol involves packing the side chains in each monomer to remove internal clashes that are not related to inter-molecular interactions. The prepacking stage guarantees a uniform conformational background in non-interface regions prior to molecular docking.

The main flags in the prepacking step are:

-unboundrot - takes extra rotamers from the unbound structure.
It is worth noting that although our example uses a file named native.pdb as the input for -unboundrot, that file is actually the native unbound structure and not the bound receptor.

-flexpep_prepack - the prepacking flag.

To prepack the starting structure, simply run prepack_example. For further details on the flags used in this run, take a look at the prepack_flags file. The resulting structure is start.ppk.pdb. You may want to open it in PyMOL and compare it with start.pdb to see the changes that have occurred. This will be the starting structure for the next step.

Step 3 - Refinement

This is the main part of the protocol. In this step, the peptide backbone and its rigid-body orientation are optimized relative to the receptor protein using the Monte-Carlo with Minimization approach, in addition to on-the-fly side-chain optimization. An optional low-resolution (centroid) pre-optimization may improve performance further.

When using the FlexPepDock server, the results sent to the user are based on a combination of 100 decoys created with the low-resolution pre-optimization and 100 created without it. For this reason, we provide two flag files for this run that let you create results similar to those provided by the server. For this step use these two commands:

    run_example_nolowresolution
    run_example_withlowresolution

Notice that the only difference between the provided flag files is the flag -flexpepdocking:lowres_preoptimize. Also notice the unique flags such as -flexpep_score_only (rescores the input PDB structures and outputs elaborate statistics about them in the score file). For a production run it is recommended to produce 200 decoys. In this demonstration you will create only 4, but we provide a folder with 200 decoys resulting from the runs described here.

Step 4 - Selection of models

A typical analysis of a Rosetta simulation usually involves plotting RMSD against total score. Attached are a plot of total_score vs. all-atom RMSD and a plot of reweighted_score vs. all-atom RMSD. Here are some additional ways to analyze the results:

1. Interface score (I_sc) - the score calculated over interface residues only.
2. Peptide score (pep_sc) - the score calculated over the peptide and the interface residues.
3. Reweighted score (reweighted_sc) - experience has shown that a calculated reweighted score (= I_sc + pep_sc + total_score) works better than the general score12 function in the case of flexible peptides docking onto their receptors.

Measure of fit

1. rms(ca,bb,all)_if - RMSD between the output model and the native structure, over all peptide interface (C-alpha/backbone/heavy) atoms.
2. startrms(all,ca,bb) - RMSD between the start and native structures, over all peptide (heavy/C-alpha/backbone) atoms.

FlexPepDock Ab-initio

Raveh B, London N, Zimmerman L, Schueler-Furman O. Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS ONE. 2011;6(4):e18934.
The Ab-initio protocol extends FlexPepDock by incorporating fragment-based sampling during the sampling phase. For this protocol, no initial information about the peptide backbone is required, only the placement of an arbitrary peptide starting structure within the approximate binding pocket. In the preliminary phase of the protocol, a library of trimer, pentamer and nonamer (where available) backbone fragments is generated using the general fragment picker protocol. At the end of the fragment picking process, the library should contain 500 fragments from each category of secondary structure type, i.e., helix, extended beta strand and coil/loop (a total of 1,500 fragments for a given query peptide).

As in the refinement protocol, the input to the protocol is an initial model of the peptide-protein complex. It is assumed that the receptor backbone is approximately correct and that the peptide is initially positioned close to the correct binding site, albeit with an arbitrary backbone conformation. Most of our studies, as well as this example, are based on results starting from extended peptide backbone conformations superimposed on a randomly selected anchor residue, but the protocol is designed to work for any arbitrary peptide starting conformation.

Step 1 - Preparing the structure for modeling

First, extract the file abinitio.tar.gz to obtain the files needed for the tutorial. Since the ab-initio protocol doesn't require the approximate structure of the peptide, only the approximate binding site and the sequence, we start with a peptide-receptor complex in which the peptide is extended. Specifically, in this example we use the unbound structure of 1NVR (pdb-id 2x8d) with its original substrate peptide extended (the 5th residue serves as an anchor point).

The first thing you should do is modify a file called paths to set the paths needed to run Rosetta:

ROSETTA_DB - the path to the Rosetta database
ROSETTA_BIN - the path to the bin directory in the Rosetta installation

The first step in the Ab-initio protocol, as mentioned previously, is the generation of fragment libraries. Since this step requires several more applications (such as blastpgp, psipred, etc.) that aren't usually installed on conventional laptops, we provide these libraries for you. If you do want to generate the fragment libraries yourself, run scripts/prep_abinitio.sh from the root directory of the tutorial to automate the fragment creation process.
Notice that the pre-generated fragment libraries are located in the frag directory.

Prepacking is the same as in the refinement protocol: to prepack the starting structure, simply run prepack_example. The output of this prepacking step serves as the input to the main Ab-initio protocol.

Step 3 - Ab-initio run

A typical ab-initio run includes:

1. Fast, low-resolution modeling - the peptide is folded and docked over the surface of the receptor protein using a low-resolution representation of the complex. This results in a coarse-grained model of the peptide-protein complex.
2. High-resolution optimization of the coarse-grained models with the Rosetta FlexPepDock refinement protocol you've already seen.

The recommended amount of sampling for an ab-initio protocol is ~50,000 decoys. This protocol is computationally intensive and could take several days on a typical cluster. If you still wish to experience the joy of running the simulation (even for a smaller number of decoys), you can run the run_example script (in the flags file, change the number of structures you would like to generate).

The main flags we're using in the ab-initio protocol are:

-flexpepdocking:lowres_abinitio - the ab-initio protocol flag.
-flexpepdocking:pep_refine - include this flag if you want a refinement step after the low-resolution modeling.
-flexpepdocking:flexpep_score_only - include additional scoring terms unique to FlexPepDock.

Step 4 - Selection of models

A typical analysis of a Rosetta simulation usually involves plotting RMSD against total score. In addition to the scoring terms mentioned above, FlexPepDock ab-initio includes several more scoring terms; one of them is bestrms_5mer_all, which indicates the RMSD of the 5-mer fragment that was closest to the native.

FlexPepBind

London N, Gulla SV, Keating AE, Schueler-Furman O. In-silico and in-vitro elucidation of BH3 binding specificity towards Bcl-2. Biochemistry. 2012;

Rosetta FlexPepBind is a protocol for the prediction of peptide binding specificity. It evaluates the binding potential of different peptides based on structural models of the corresponding peptide-receptor complexes. For a given peptide, we thread the desired sequence onto the peptide in the native structure while keeping the side chains of the receptor fixed. We then use Rosetta FlexPepDock in full-atom mode (i.e. without centroid pre-optimization; see above) to refine the structure of the complex with the threaded target peptide (all of the peptide's side chains, as well as the receptor's interface side chains, are flexible). We create 100-1000 models and score the sequence (see below) to evaluate the binding ability of the specific peptide sequence.

As an example, we will use the interaction between a Bcl-2 protein and a BH3-only peptide (see London N, Gulla S, Keating A, Schueler-Furman O (2012). In silico and in vitro elucidation of BH3 binding specificity towards Bcl-2. Biochemistry, doi:10.1021/bi3003567). The Bcl-2 protein family contains key regulators of programmed cell death, tumorigenesis and cellular responses to anti-cancer therapy.

Note that this protocol might be simplified for specific systems. For the calculation of Farnesyltransferase specificity, for example, minimization of the peptide backbone alone, instead of a full FlexPepDock run, was also able to distinguish peptide substrates (see London et al. (2011) PLoS CB 7:e1002170 for more details).

Step 1 - Preparation of the model

The first input of the protocol is a peptide-protein complex. We will use the supplied template for Mcl-1 (pdb: 2pqk) as our example.

Check that the following paths are set (as described in the ab-initio section):

ROSETTA_DB - the path to the Rosetta database
ROSETTA_BIN - the path to the bin directory in the Rosetta installation

As described before, the second preliminary step in the protocol involves packing the side chains in each monomer to remove internal clashes that are not related to inter-molecular interactions. The prepacking stage guarantees a uniform conformational background in non-interface regions prior to molecular docking.

The main flags in the prepacking step are:

-unboundrot - takes extra rotamers from the receptor structure and biases towards these side-chain conformations (note: in this case the unbound structure, the template for side-chain conformations, is the receptor structure).
-flexpep_prepack - the prepacking flag.

To prepack the starting structure, simply run prepack_example, also supplied in scripts:

    $ROSETTA_BIN/FlexPepDocking.linuxgccrelease -s 2pqk.pdb -native 2pqk.pdb -database $ROSETTA_DB -ex1 -ex2aro -unboundrot 2pqk.pdb -flexpep_prepack -nstruct 1

The output of this step will be used as input for step 4 of the protocol. An example output can be found in the input files: 2pqk.prepacked.pdb.

Step 3 - Threading the peptide sequence

In this step, the different peptide sequences whose binding affinity we wish to examine are threaded onto the model. The sequence to use for the peptide can be supplied in a text file and is threaded onto the structure using design. The supplied script makeresfile_example can be used to create a resfile for any desired sequence. However, for the purpose of this example we will use the supplied resfile resfile.mcl1, which designs a specific sequence: the peptide NOXA. For this step use design_example, also supplied in scripts:

    $ROSETTA_BIN/fixbb.linuxgccrelease -database $ROSETTA_DB -s 2pqk.pdb -resfile resfile.mcl1 -nstruct 1

The output of this step will also be used for the next step in the protocol.
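As a rough illustration of the threading input, a resfile that forces a given sequence onto the peptide could be generated along the following lines. This is a minimal sketch and not the supplied makeresfile_example script: the function name build_threading_resfile, the peptide chain ID "B", the starting residue number and the example sequence are all assumptions made for illustration. It relies only on the standard Rosetta resfile layout (a default command, a start line, then one `<residue> <chain> PIKAA <amino acid>` line per position).

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def build_threading_resfile(sequence, chain="B", first_resnum=1):
    """Return resfile text that fixes each peptide position to one amino acid.

    chain and first_resnum are hypothetical defaults; they must match the
    peptide numbering in the PDB file that is passed to fixbb.
    """
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty peptide sequence")
    bad = sorted(set(sequence) - VALID_AA)
    if bad:
        raise ValueError(f"non-standard residue codes: {bad}")

    # NATRO as the default command leaves every unlisted residue
    # (i.e. the receptor) with its input rotamer.
    lines = ["NATRO", "start"]
    for offset, aa in enumerate(sequence):
        lines.append(f"{first_resnum + offset} {chain} PIKAA {aa}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Arbitrary placeholder sequence, not one of the tutorial peptides.
    print(build_threading_resfile("ACDEFG", chain="B", first_resnum=1), end="")
```

The text returned by the function would be written to a file and passed to fixbb through -resfile, in the same way as resfile.mcl1 above.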
An example output of this step can be found in the input files: 2pqk.designed.pdb. However, the designed peptide should be cut and pasted into the prepacked receptor, which is obtained by extracting the receptor from the output of step 2. To save time, the prepacked receptor and the designed peptide have already been extracted and combined into the starting structure for the next step. It can be found in the input files under the name 2pqk.start.pdb and will be used as input for step 4, running FlexPepDock.

Step 4 - FlexPepDock runs

In this step, FlexPepDock is run on the different threaded peptide sequences. For this step use FlexPepDock_example, also supplied in scripts:

    $ROSETTA_BIN/FlexPepDocking.linuxgccrelease -database $ROSETTA_DB -rbmcm -torsionsmcm -ex1 -ex2aro -s 2pqk.start.pdb -native 2pqk.designed.pdb -unboundrot 2pqk.designed.pdb -nstruct 1000

The recommended amount of sampling for a FlexPepBind protocol is anywhere between 100 and 1,000 decoys, depending on the case at hand. In this case, 1,000 decoys were used for thorough sampling. Of course, you can change the number of structures you would like to generate. Since this run takes a while, the top 5 decoys of this run are supplied in the output files, together with a .pse file showing them. Note that the top 5 decoys supplied here are the output of the runs made in London et al. (2012). Biochemistry, doi:10.1021/bi3003567.

Step 5 - Specificity prediction & analyzing results

FlexPepBind can be used for the straightforward purpose of assessing the binding affinities of a set of peptides to a receptor. An interesting extension of this protocol is assessing binding specificity. For example, the NOXA BH3 peptide binds to Mcl-1 but not Bcl-xL, while the BAD peptide binds to Bcl-xL but not Mcl-1 (see Table 3 in London et al., Biochemistry...). Therefore, the protocol described can be applied to the different receptors with the different peptides to determine binding specificity in addition to binding affinity. We've performed it with Mcl-1 and NOXA, and it can now be performed in the same way for Bcl-xL and BAD. The template used for Bcl-xL is supplied and named 3io8. The template used for Mcl-1 is, as mentioned, 2pqk. The resfiles used to thread the peptides NOXA and BAD are supplied and named resfile.mcl1 and resfile.bclxl, respectively.

It is worth mentioning that every receptor-peptide complex has its own range of energetic scores. Therefore, when comparing two different receptor-peptide complexes, the score threshold that distinguishes binders from non-binders is very likely to differ between the complexes. In other words, a specific peptide might get a score x in complex with receptor A, which in the context of that complex indicates a binder, and get a better score y in complex with receptor B yet still be considered a non-binder there, depending on the score ranges of complexes A and B. For the relevant thresholds of Bcl-xL and Mcl-1, see Table 2 in London et al. (2012). Biochemistry, doi:10.1021/bi3003567.
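The point about receptor-specific thresholds can be made concrete with a small sketch. The threshold values and peptide scores below are placeholders invented for illustration, not values from London et al. (2012), and the function name classify_binders is likewise hypothetical. The convention assumed is that lower (more negative) Rosetta scores are better, so a peptide is called a binder when its score is at or below the cutoff of the receptor it was modeled with.

```python
def classify_binders(scores, thresholds):
    """Label each (receptor, peptide) score using that receptor's own cutoff.

    scores:     {(receptor, peptide): score}
    thresholds: {receptor: cutoff}; a score <= cutoff is called a binder
                (lower Rosetta scores are better).
    """
    calls = {}
    for (receptor, peptide), score in scores.items():
        if receptor not in thresholds:
            raise KeyError(f"no threshold defined for receptor {receptor!r}")
        calls[(receptor, peptide)] = score <= thresholds[receptor]
    return calls


# Placeholder numbers only: receptor B has a stricter cutoff than receptor A.
thresholds = {"A": -10.0, "B": -15.0}
scores = {
    ("A", "pep1"): -11.0,  # score x: a binder in the context of receptor A
    ("B", "pep1"): -13.0,  # better score y, yet a non-binder for receptor B
}

calls = classify_binders(scores, thresholds)
for key in sorted(calls):
    print(key, "binder" if calls[key] else "non-binder")
```

Keeping one cutoff per receptor, rather than a single global one, is what prevents the better raw score against receptor B from being misread as stronger evidence of binding.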
Below are details about the scores that can be used to analyze the results.

To evaluate the binding ability of different peptides to a given receptor, the resulting scores can be compared with the scores obtained for a range of peptides with known binding ability (see Figure 2 in London et al. (2012). Biochemistry, doi:10.1021/bi3003567). The main scoring functions used are detailed in the Glossary below. The scoring function was slightly adjusted for each system.

Bcl-2 receptor-BH3 peptide interactions:

Two minor adjustments to score12 were made (to obtain results as in London et al. (2012). Biochemistry, doi:10.1021/bi3003567, these adjustments need to be included):

1. The penalty for the burial of the carboxyl oxygen atom of the aspartate side chain is increased (the Gfree parameter of the Lazaridis-Karplus solvation potential was modified from -10 to -13.5 for this atom type). This was added to better account for a group of sequences that received poor experimental binding values in the TRAIN set but good peptide scores. Inspection of the structural models detected a buried Asp that explains the discrepancy.
2. A weak short-range Coulombic electrostatic energy term was added with weight 0.5 (added option []).

In addition, we assessed each model by its reweighted score (see Glossary below).

Farnesylation peptide substrates:

Peptide_score_NoRef: the same as the peptide score (see Glossary below), less a constant reference energy for each amino acid. This scoring function was found to perform well in our studies of Farnesyltransferase peptide substrate specificity (London et al. 2011. Identification of a novel class of Farnesylation targets by structure-based modeling of binding specificity. PLoS CB 7:e1002170).

Glossary

Interface score (I_sc) - the score calculated over interface residues only, i.e. the energy of pairwise interactions across the peptide-protein interface.
Peptide score (pep_sc) - the score of the peptide (including internal peptide energy as well as interactions with interface residues).
Reweighted score (reweighted_sc) - experience has shown that a calculated reweighted score (= I_sc + pep_sc + total_score) works better than the general score12 function in the case of flexible peptides docking onto their receptors.

Fit measures

rms(ca,bb,all)_if - RMSD between the output model and the native structure, over all peptide interface (C-alpha/backbone/heavy) atoms.
startrms(all,ca,bb) - RMSD between the start and native structures, over all peptide (heavy/C-alpha/backbone) atoms.

Options (aka flags)

Refinement

-flexpepdocking:lowres_preoptimize - low-resolution pre-optimization.
-flexpep_score_only - rescores the input PDB structures and outputs elaborate statistics about them in the score file.

Ab-initio

-flexpepdocking:lowres_abinitio - the ab-initio protocol flag.
-flexpepdocking:pep_refine - include this flag if you want a refinement step after the low-resolution modeling.
-flexpepdocking:flexpep_score_only - include additional scoring terms unique to FlexPepDock.
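The score terms in the glossary are what model selection works with, so a short sketch of ranking decoys by reweighted_sc may help. It assumes the usual Rosetta score file layout, in which the first line starting with SCORE: carries the column names and each later SCORE: line describes one decoy, with the decoy name in the final description column. The function names parse_score_file and top_models, the decoy names and all numbers in the sample are invented for illustration; a real score file contains many more columns.

```python
def parse_score_file(text):
    """Parse Rosetta-style SCORE: lines into a list of dicts.

    Assumption: the first SCORE: line is the header and 'description'
    is the only non-numeric column.
    """
    header, rows = None, []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # skip truncated or malformed lines
        row = {}
        for name, value in zip(header, fields):
            row[name] = value if name == "description" else float(value)
        rows.append(row)
    return rows


def top_models(rows, column="reweighted_sc", n=5):
    """Return the n decoys with the lowest (best) value in the given column."""
    return sorted(rows, key=lambda r: r[column])[:n]


# Invented sample; reweighted_sc = I_sc + pep_sc + total_score in each row.
SAMPLE = """\
SCORE: total_score I_sc pep_sc reweighted_sc description
SCORE: -250.0 -20.0 -12.0 -282.0 decoy_0001
SCORE: -255.0 -15.0 -10.0 -280.0 decoy_0002
SCORE: -240.0 -25.0 -18.0 -283.0 decoy_0003
"""

rows = parse_score_file(SAMPLE)
best = top_models(rows, n=2)
print([r["description"] for r in best])  # ['decoy_0003', 'decoy_0001']
```

In this made-up sample the decoy with the best total_score (decoy_0002) ranks last by reweighted_sc, which illustrates why the choice of ranking column matters when selecting models.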