Building 3D models of proteins

Similar documents
Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Protein Structure Prediction

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Protein Structure Prediction, Engineering & Design CHEM 430

Template Free Protein Structure Modeling Jianlin Cheng, PhD

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Week 10: Homology Modelling (II) - HHpred

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology modeling. Dinesh Gupta ICGEB, New Delhi 1/27/2010 5:59 PM

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

CAP 5510 Lecture 3 Protein Structures

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

CS612 - Algorithms in Bioinformatics

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Protein Structures. 11/19/2002 Lecture 24 1

Modeling for 3D structure prediction

Steps in protein modelling. Structure prediction, fold recognition and homology modelling. Basic principles of protein structure

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

ALL LECTURES IN SB Introduction

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Template-Based Modeling of Protein Structure

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Basics of protein structure

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics

Bioinformatics. Macromolecular structure

Protein Structure Determination

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

FlexPepDock In a nutshell

Course Notes: Topics in Computational. Structural Biology.

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Protein Structure Prediction I

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

The Structure and Functions of Proteins

Physiochemical Properties of Residues

Protein Structure Prediction and Display

Molecular Mechanics, Dynamics & Docking

Supporting Online Material for

Protein Structure Bioinformatics Introduction

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

7.91 Amy Keating. Solving structures using X-ray crystallography & NMR spectroscopy

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Protein Structure Prediction 11/11/05

Packing of Secondary Structures

Type II Kinase Inhibitors Show an Unexpected Inhibition Mode against Parkinson s Disease-Linked LRRK2 Mutant G2019S.

Orientational degeneracy in the presence of one alignment tensor.

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Syllabus BINF Computational Biology Core Course

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Bioinformatics: Secondary Structure Prediction

Study of Mining Protein Structural Properties and its Application

3D Structure. Prediction & Assessment Pt. 2. David Wishart 3-41 Athabasca Hall

Structural Alignment of Proteins

Protein Structure Prediction

Template-Based 3D Structure Prediction

Mass Spectrometry Coupled Experiments and Protein Structure Modeling Methods

DOCKING TUTORIAL. A. The docking Workflow

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Computational Molecular Biology. Protein Structure and Homology Modeling

Scoring functions. Talk Overview. Eran Eyal. Scoring functions what and why

Docking. GBCB 5874: Problem Solving in GBCB

Getting To Know Your Protein

Why Proteins Fold. How Proteins Fold? e - ΔG/kT. Protein Folding, Nonbonding Forces, and Free Energy

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Protein Structure Prediction and Protein-Ligand Docking

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Large-Scale Genomic Surveys

Sequence analysis and comparison

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Ab-initio protein structure prediction

Bio nformatics. Lecture 23. Saad Mneimneh

Structure of proteins modeling and drug design

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were

CSCE555 Bioinformatics. Protein Function Annotation

Computational protein design

Improving De novo Protein Structure Prediction using Contact Maps Information

Protein structure alignments

Bioinformatics: Secondary Structure Prediction

Protein Structure Prediction

Transcription:

Building 3D models of proteins

Why make a structural model for your protein? The structure can provide clues to the function through structural similarity with other proteins With a structure it is easier to guess the location of active sites With a structure we can plan more precise experiments in the lab We can apply docking algorithms to the structures (both with other proteins and with small molecules)

Basic princples for structural modeling Use any piece of information the you have from the existing data bases regarding the protein you wish to model and its family Choose the most suitable algorithm according to the amount if information that you have

Building by homology (Homology modelling) - - - G - - - Y - - - M A A A A K S T A A G G G Y F F Y L E D A V V V V L V I L S E D S Alignment with proteins of known structure structural model

Fold recognition (Threading) The sequence: M A A G Y A V L S + Known protein folds structural model

Ab initio The sequence M A A G Y A V L S structural model

Building by homology There are hundreds of thousands of protein sequences but only several thousand protein folds For every second protein that we randomly pick from the structural data base there is close homolog (identity > 30%). This homolog almost always has the same fold. In the current projects for experimental determination of protein structures, priority is given to determine structures of protein without homologs in the structural databases (proteomics) We believe that in several years we will have almost all the basic folds

Steps of buiding by homology Look for homology with the given sequence against the structural databases. Known algorithms such as Blast and FastA can be used. If no hits were obtained, it is possible to use multiple alignment of the family for the search. This might be more sensitive. Construct accurate alignment between the query seqence and all the hits that will be serve as the template during the building Correct alignment is crucial for this step. Any mistake can lead to significant errors in the final model

A possible algorithm is for example ClustalW. Manual intervention is often required, especially for weak homology. More sequences within the protein family increase the chance for correct alignment. Therefore, it is sometimes recommended to also search sequence databases (such as swissprot) and not only structural databases.

Find proteins with known structure which are similar to your sequence build alignment Build structural model Check the model Finish

Building the model itself Determine the secondary structures according to the alignment Determine structural reference according to the coordinates of the known structures. For every conserved region we can take the coordinates of the most similar homolog in that particular region.

The main methods to determine structural reference during homology modeling are: Fragment based homology Build the conserved regions (usually secondary structures) and then build the loops Distance constraints Building in one step the entire model based on distance constraints derived from the known structures and from the chemical nature of the molecule.

Construction of loops might be done by: Using database of loops which appear in known structures. The loops could be catagorised by their length or sequence Ab initio methods - without any prior knowledge. This is done by empirical scoring functions that check large number of conformations and evaluates each of them.

Several web pages for homology modeling COMPOSER felix.bioccam.ac.uksoft-base.html MODELLER guitar.rockefeller.edu/modeller/modeller.html WHAT IF www.sander.embl-heidelberg.de/whatif/ SWISS-MODEL www.expasy.ch/swiss-model.html

Swiss-Model http://www.expasy.ch/swissmod/swiss-model.html

Modeller http://guitar.rockefeller.edu/modeller/about_modeller.shtml Advanced program for homology modeling Based on distance constraints Implemented in several popular modelling packages such as InsightII The source is available for unix platforms at the above URL

Threading (fold recognition) The input sequence is threaded on different folds from library of known folds Using scoring functions we get a score for the compatability between the sequence and the structure Statisticaly significant score tells that the input protein adopts similar 3D structure to that of the examined fold

This method is less accurate but could be applied for more cases When the real fold of the input sequence is not represented in the structural database we can not get correct solution by this method The most important part is the accuracy of the scoring function. The scoring function is the major difference between different programs for fold recognition

Input: sequence H bond donor H bond acceptor Glycin Hydrophobic Library of folds of known proteins

H bond donor H bond acceptor Glycin Hydrophobic S=-2 Z= -1 S=5 Z=1.5 S=20 Z=5

Scoring functions for fold recognition There are 2 basic methods to evaluate sequence-structure (1D-3D) compatibility In methods based on structural profile, for every fold a profile is built based on structural features of the fold and compatibility of every amino acid to the features. The structural features of each position are determined based on the combination of secondary structure, solvent accessibility and the property of the local environment (hydrophobic/hydrophlic) The profile is a defined mathematical structure, adjusted for pair-wise comparisons and dynamic programming

Amino acid type A C D Y G op G ext Position on sequence 1 2 : 10-24 : -50 87 : 101-99 : : -80 167 : 100 100 : 10 10 : N 100 10

Contact potentials This method is based on predefined tables which include pseudo-energetic scores to each pair-wise interaction of two amino acids. This method makes use of distance matrix for representation of different folds For each pair of amino acids which are close in space the interaction energy is summed. The total sum is the indication for the fitness of the sequence into that structure

Amino acid index 1 N Amino acid index 1 N

Web sites for fold recognition Profiles: 3D-PSSM - http://www.bmm.icnet.uk/~3dpssm Libra I - http://www.ddbj.nig.ac.jp/htmls/e-mail/libra/libra_i.html UCLA DOE - http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html Contact potentials 123D - http://www-immb.ncifcrf.gov/~nicka/123d.html Profit - http://lore.came.sbg.ac.at/home.html

Ab initio methods for modelling This field is of great theoretical interest but, so far, of very little practical applications. Here there is no use of sequence alignments and no direct use of known structures The basic idea is to build empirical function that simulates real physical forces and potentials of chemical contacts If we will have perfect function and we will be able to scan all the possible conformations, then we will be able to detect the correct fold

Algorithms for Ab initio prediction include: A. Searching procedure that scans many possible structures (conformations) B. Scoring function to evaluate and rank the structures Due to the large search space, heuristic methods are usually applied The parameters in the searching procedure are the dihedral angles which specify the exact fold of the polypeptide chain

A B C D E

A B C E D

Methods to evaluate structures are based on Force fields- collection of terms that simulate the forces act between atoms Terms based on probabilities to find pairs of amino acids or atoms within specific distances Terms based on surface area and overlapping volume of spheres representing atoms

Side chain construction In homology modeling, construction of the side chains is done using the template structures when there is high similarity between the built protein and the templates Without such similarity the construction can be done using rotamer libraries A compromise between the probability of the rotamer and its fitness in specific position determines the score. Comparing the scores of all the rotamer for a given amino acid determines the preferred rotamer. In spite of the huge size of the problem (because each side chain influences its neighbors) there are quite successful algorithms to this problem.

p Conformation - a given set of dihedral angle which defines a structure. Asn Rotamer - energetically favourable conformation. Phe

Example to library of rotamers SER 59.6 41.0 SER -62.5 26.4 SER 179.6 32.6 TYR 63.6 90.5 21.0 TYR 68.5-89.6 16.4 TYR 170.7 97.8 13.3 TYR -175.0-100.7 20.0 TYR -60.1 96.6 10.0 TYR -63.0-101.6 19.3

Model evaluation After the model is built we can check it by various methods. If the model turns out to be bad, it is necessary to repeat several stages of the model building The main approaches for model evaluation are: A. Use of internal information (such as the one that used for the model construction) B. Use of external information derived from the databases

Usually algorithms are checked by building models for proteins which have already solved structure and comparison between the model and the native structure It is always possible that information from the native structure will be used in direct or indirect ways for model building A more objective test is prediction of structures before they are publicly distributed (this is the idea of the CASP competitions)