Introduction to Comparative Protein Modeling. Chapter 4 Part I

Information on Proteins
Every modeling study depends on the quality of the available experimental data, which form the basis of the model. A search of the literature may show that a 3D structure of the protein of interest already exists. When only a sequence is available, it can be compared with other sequences to find similarities or differences. Several algorithms have been developed for this purpose, and a comparison takes only a few minutes; it can, for example, reveal differences between the proteins of a healthy and a diseased individual.

Databases
Nucleotide and protein sequences: the EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database and the Universal Protein Resource (UniProt) database.
3D structural information: the Protein Data Bank (PDB). Entries can be searched by author name, journal name, or part of a sequence.

PDB file
The formats of the various structure data files are similar. PDB files are widely used, so the standard format of a protein file is described here.
Header: general information about the protein, including its official name, references, the resolution of the crystal structure, and other useful remarks.
Atomic coordinates: atoms belonging to standard amino acids are labeled ATOM. Individual peptide chains are separated by a TER record. Atoms of non-standard amino acids (and of other hetero groups such as ligands and water) are labeled HETATM.
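The record types above can be illustrated with a minimal Python sketch that groups ATOM records into chains at each TER record and collects HETATM records separately. It assumes the fixed-column layout of PDB files (record name in columns 1-6, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). The `Atom` class and `parse_pdb` function are hypothetical names for illustration; real files (alternate locations, insertion codes, several models) call for a complete parser such as the one in Biopython.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str        # "ATOM" (standard residue) or "HETATM" (hetero group)
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: tuple


def parse_pdb(lines):
    """Return (chains, hetatms): ATOM records grouped into chains split at TER,
    plus a flat list of HETATM records."""
    chains, current, hetatms = [], [], []
    for line in lines:
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            atom = Atom(
                record=record,
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )
            if record == "ATOM":
                current.append(atom)
            else:
                hetatms.append(atom)
        elif record == "TER" and current:
            chains.append(current)   # TER closes the current peptide chain
            current = []
    if current:                      # tolerate a missing final TER
        chains.append(current)
    return chains, hetatms
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, e.g. `parse_pdb(open("model.pdb"))` for a hypothetical file `model.pdb`.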

PDB file
When the file is read into a modeling program, bonds are built between ATOM records but not between HETATM records; for these, an additional connectivity table (CONECT records) is given at the end of the data file. The atom types of HETATM atoms are often assigned incorrectly when the file is read into a modeling program, so all atom types must be checked. PDB files of crystal structures usually do not include hydrogen atoms.
Keep in mind: the resolution of the crystal structure should be at least 2.5 Å, preferably 1.5 Å or better (a smaller value means a higher resolution). Because NMR measurements are performed in solution, the results depend strongly on the solvent.
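The resolution check can be automated. The sketch below assumes the standard `REMARK   2 RESOLUTION.` line found in PDB-format files and uses the 2.5 Å limit from the text as its default threshold; the function names are hypothetical.

```python
import re

# REMARK 2 holds the resolution in PDB-format files, e.g.
# "REMARK   2 RESOLUTION.    1.74 ANGSTROMS."
RESOLUTION_RE = re.compile(r"^REMARK   2 RESOLUTION\.\s+([0-9]+\.[0-9]+)\s+ANGSTROMS")


def crystal_resolution(lines):
    """Return the resolution in Å, or None if absent (e.g. NMR entries)."""
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def resolution_acceptable(lines, max_resolution=2.5):
    """True if the structure is resolved at max_resolution Å or better (smaller)."""
    resolution = crystal_resolution(lines)
    return resolution is not None and resolution <= max_resolution
```

NMR entries typically state "NOT APPLICABLE" in this remark, so they return `None` rather than passing the check.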

Protein Structure
The 3D structure of proteins is described by four classical levels of structural organization, plus one intermediate level:
The primary structure is the linear arrangement of the amino acids in the protein sequence.
The secondary structure describes the local architecture of linear fragments of the chain (α-helix, β-sheet).
The supersecondary structure is an additional level that describes how secondary structure elements associate through side-chain interactions; such an arrangement is also called a motif (e.g., hairpins, Greek key).
The tertiary structure is the overall topology of the folded peptide chain.
The quaternary structure describes the arrangement of separate subunits in the functional protein.

Conformational Properties of Proteins
Twenty different amino acids are found in natural proteins. The physicochemical properties of their side chains (size, shape, hydrophobicity, charge, and hydrogen-bonding capacity) span a considerable range, yet protein chains have strongly restricted conformational degrees of freedom. The dominant influences on protein conformation are:
hydrogen-bonding capabilities;
chirality of the amino acids: all 19 chiral amino acids (glycine is not chiral) have the L-configuration;
linear connectivity;
steric volume.

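Conformational Properties of Proteins
Each residue has a central carbon, Cα. The backbone consists of the amino N, Cα, and the carbonyl C; residues are numbered i, starting from the amino end of the chain.
Main-chain torsion angles:
Φ: rotation about the N-Cα bond
Ψ: rotation about the Cα-C bond
ω: rotation about the C-N bond (the peptide bond)
Side-chain torsion angles are denoted χ.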
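Each of these torsion angles is a dihedral angle defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i); ψ by N(i), Cα(i), C(i), N(i+1); and ω by Cα(i), C(i), N(i+1), Cα(i+1). The plain-Python sketch below computes a dihedral with the common atan2 formulation and returns degrees between -180 and 180; the helper names are illustrative only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)      # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    return dihedral(n, ca, c, n_next)


def omega(ca, c, n_next, ca_next):
    return dihedral(ca, c, n_next, ca_next)
```

With atoms in an eclipsed (cis) arrangement the function returns 0°, and with a trans arrangement ±180°, matching the ω values used for the peptide bond.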

Conformational Properties of Proteins
The peptide bond is planar. It is nearly always in the trans configuration (ω = 180°), which is energetically more favorable than cis (ω = 0°). The peptide bond preceding proline, an imino acid, is sometimes found in cis.
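A minimal sketch that labels a peptide bond from its ω torsion angle. The ±30° tolerance is an assumption chosen for illustration, not a value from the text.

```python
def peptide_bond_state(omega_deg, tolerance=30.0):
    """Classify a peptide bond as 'trans' (ω ≈ 180°), 'cis' (ω ≈ 0°) or 'distorted'."""
    w = (omega_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if abs(abs(w) - 180.0) <= tolerance:
        return "trans"
    if abs(w) <= tolerance:
        return "cis"
    return "distorted"
```

Since cis bonds are rare outside proline, a cis bond found before a non-proline residue in a model is worth checking.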

Conformational Properties of Proteins
Rotation about ϕ and ψ makes the peptide chain flexible, but it is geometrically constrained by steric hindrance between neighboring atoms. The possible conformations of ϕ and ψ are shown in a Ramachandran plot: white marks the disallowed region, red the favored region, and yellow the allowed region. The plot contains the α and β sub-regions.

Types of Secondary Structural Elements: α-helix
The α-helix is the best known and most easily recognized secondary structure. It is a repetitive structure: the Cα atoms occupy identical relative positions, so the ϕ and ψ angles are the same for every residue in the helix. The helix repeats itself every 5.4 Å, with 3.6 amino acids per turn (a rise of 1.5 Å per residue). Hydrogen bonds form between the carbonyl group of residue n and the NH group of residue n+4, which makes the helix a regular and favored state. The helix is right-handed because it is built from L-amino acids, and the side chains point outwards. Typical length: 10-15 residues.
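The helix parameters lend themselves to a small worked calculation: with 3.6 residues per turn and a 5.4 Å pitch, each residue advances 1.5 Å along the axis, and the n → n+4 rule gives the backbone hydrogen-bond pairs. A minimal sketch, assuming an ideal straight helix and residues numbered from 1:

```python
RESIDUES_PER_TURN = 3.6
PITCH = 5.4                               # Å per turn
RISE = PITCH / RESIDUES_PER_TURN          # 1.5 Å per residue


def ideal_helix(n_residues):
    """Turns, approximate axial length (Å) and C=O(n) -> N-H(n+4) H-bond pairs."""
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE
    hbonds = [(n, n + 4) for n in range(1, n_residues - 3)]
    return turns, length, hbonds
```

For a 12-residue helix this gives about 3.3 turns, roughly 18 Å, and 8 backbone hydrogen bonds, from (1, 5) to (8, 12).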

Types of Secondary Structural Elements: β-sheet
The β-sheet is the second most regular and recognizable structure. It is a periodic element formed from β-strands. Its hydrogen bonds form between strands, which may lie far apart in the sequence, rather than within one local segment; β-sheets are therefore less favorable.
Parallel: all strands run in the same direction.
Antiparallel: neighboring strands run in opposite directions. This is the most common type.
The side chains are perpendicular to the plane of the hydrogen bonds. Strand length: 3-10 residues.

Types of Secondary Structural Elements: Turns
About one third of all residues in globular proteins are involved in turns. Their general function is to reverse the direction of the peptide chain. Turns are located on the protein surface and therefore contain many charged and polar amino acids. Turns often connect antiparallel β-strands; these are called β-turns or hairpin bends, and the turn itself often contains only two residues.
Schematic representations are a very helpful tool for seeing and understanding the overall structure of a protein; side chains are usually omitted to give a clearer picture.

Homologous Proteins
Proteins that have evolved from a common ancestor are said to be homologous. The 3D structures of homologous proteins are better preserved than their sequence identity, because the structure is crucial for the function. For example, trypsin and α-chymotrypsin both belong to the serine protease family: they share only 44% sequence identity, yet they are clearly homologous. Their differences lie in the loop regions, while the core is better preserved. This high conservation of homologous protein structures during evolution is the basis for comparative protein modeling.

Comparative Protein Modeling
Known sequences far outnumber known 3D structures, because determining a protein sequence is much faster than determining its 3D structure by X-ray crystallography or NMR. Theoretical procedures for predicting the 3D structure from the sequence are therefore needed. Since there is no general rule for how a protein folds, structural predictions are based on the conformations of available homologous reference proteins.
Use the comparative protein modeling approach when a sequence is found to be homologous to another sequence with a known 3D structure; that structure is then used to predict the structure of the unknown protein. This is also called the homology modeling approach.

Process: Comparative Protein Modeling
1. Determination of proteins related to the protein being studied (sequence alignment)
2. Identification of structurally conserved regions (SCRs) and structurally variable regions (SVRs)
3. Alignment of the sequence of the unknown protein with those of the reference protein(s) within the SCRs
4. Construction of the SCRs of the target protein using coordinates from the template structure(s)
5. Construction of the SVRs
6. Side-chain modeling
7. Structural refinement using energy minimization and molecular dynamics

Sequence Alignment
Sequence alignment by database search: the major methods are FASTA and BLAST. Alignment methods are built into many available programs, such as HOMOLOGY and MODELLER.
Sequence alignment is important because it is used to:
find related sequences;
identify conserved regions;
find the amino acids of the known reference protein that correspond to those of the protein to be modeled, which is the basis for transferring the coordinates of the reference protein to the new protein.
This last task requires more sensitive and selective alignment procedures, such as the Needleman and Wunsch algorithm for aligning two sequences.
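A compact sketch of Needleman and Wunsch global alignment by dynamic programming. For brevity it uses a simple match/mismatch score and a linear gap penalty; these parameter values are assumptions for illustration, and a real protein alignment would use a substitution matrix instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                 # leading gaps in b
    for j in range(1, m + 1):
        F[0][j] = j * gap                 # leading gaps in a

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,             # gap in b
                          F[i][j - 1] + gap)             # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: three matches minus one gap.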

Sequence Alignment
Optimal local alignment finds the region of best local identity between two sequences, considering only relatively conserved subsequences. It is an important tool for comparing sequences.
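Optimal local alignment is usually computed with the Smith-Waterman algorithm (not named in the text). It differs from global alignment by letting cell scores restart at zero and by tracing back from the best-scoring cell. A minimal sketch with assumed match, mismatch, and gap values:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # the 0 term lets a new local alignment start anywhere
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Only the conserved stretch is reported: `smith_waterman("PPPWKVLHQQ", "GGWKVLHG")` returns the shared segment `WKVLH` with score 10.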

Sequence Alignment
Scoring scheme: a scoring matrix gives the weight for substituting one amino acid with another. A high value means the substitution is likely; a low value means it is unlikely. Different kinds of matrices exist:
Identity matrix: the simplest; it gives 1 to identical pairs and 0 to all others.
Codon substitution matrix: the scores are derived from codons. Identical amino acids get 9; if one nucleotide mutation is required to convert one amino acid into the other, the score is 3, and if two are required, the score is 1.
Mutation or Dayhoff matrix: obtained by counting the substitutions of one amino acid by others observed in related proteins across species. The largest scores are given to substitutions that occur frequently, and low scores to substitutions that are not observed.
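The identity and codon substitution schemes can be reproduced from the standard genetic code: for each pair of amino acids, find the minimum number of nucleotide changes between any of their codons and map 0, 1, and 2 changes to 9, 3, and 1. The score of 0 for pairs that need three changes is an assumption (the text does not cover that case), and stop codons are ignored.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

SCORE_BY_CHANGES = {0: 9, 1: 3, 2: 1, 3: 0}   # 3 -> 0 is an assumption


def min_nucleotide_changes(aa1, aa2):
    """Fewest point mutations turning some codon of aa1 into some codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def codon_score(aa1, aa2):
    return SCORE_BY_CHANGES[min_nucleotide_changes(aa1, aa2)]


def identity_score(aa1, aa2):
    return 1 if aa1 == aa2 else 0
```

For example, leucine and isoleucine differ by a single mutation (CTT to ATT) and score 3, while tryptophan (TGG) and lysine (AAA, AAG) need two changes and score 1.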

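Sequence Alignment
Dayhoff matrix: the scores are derived statistically from observed substitution frequencies rather than from identity alone. As a result, some non-identical substitutions receive larger scores than some identities.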
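The statistical idea can be illustrated with a simplified log-odds calculation: a pair's score compares how often it is observed aligned in related proteins with how often it would occur by chance. This is a sketch in the spirit of log-odds substitution matrices, not Dayhoff's exact PAM procedure (which extrapolates mutation probabilities over evolutionary distance); the half-bit scale and the toy counts are assumptions.

```python
import math
from collections import Counter


def log_odds_scores(pair_counts, scale=2.0):
    """pair_counts: {(aa1, aa2): count} of residue pairs seen aligned in related
    proteins, each unordered pair stored once. Returns rounded log-odds scores
    in units of 1/scale bits."""
    total_pairs = sum(pair_counts.values())
    residue_counts = Counter()
    for (x, y), n in pair_counts.items():
        residue_counts[x] += n
        residue_counts[y] += n
    total_residues = sum(residue_counts.values())
    freq = {aa: c / total_residues for aa, c in residue_counts.items()}
    scores = {}
    for (x, y), n in pair_counts.items():
        observed = n / total_pairs
        expected = freq[x] * freq[y] * (1 if x == y else 2)   # chance of the pair
        scores[(x, y)] = round(scale * math.log2(observed / expected))
    return scores
```

With toy counts in which alanine is common and phenylalanine/tyrosine are rare but often exchanged, the F-Y substitution scores higher than the A-A identity, which is the effect described above.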

Sequence Alignment: Gaps
Differences in sequence length, or variations in the location of conserved regions, complicate the alignment, so gaps are introduced. Alignment algorithms account for them with an additional factor, the gap penalty function. The balance between the number of aligned amino acids and the smallest number of required gaps leads to an optimal alignment.
The combination of an alignment algorithm, a scoring matrix, and a gap function gives the optimal alignment of two or more sequences. The quality of the alignment is described by an alignment score. The derived alignment can only be used as the basis for a protein model if it agrees with all known structural data.
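One common form of gap penalty function is affine: opening a gap costs much more than extending it, so one long gap is preferred over several short ones. The sketch below scores an existing alignment this way; the opening and extension values are assumptions for illustration, and `pair_score` can be any substitution function.

```python
from itertools import groupby


def gap_penalty(length, open_penalty=10.0, extend_penalty=0.5):
    """Affine gap cost: opening a gap is expensive, extending it is cheap."""
    return 0.0 if length <= 0 else open_penalty + extend_penalty * (length - 1)


def alignment_score(aligned_a, aligned_b, pair_score, **gap_kw):
    """Sum substitution scores over aligned columns and subtract one affine
    penalty per contiguous run of gaps in either sequence."""
    def column_kind(col):
        x, y = col
        return "gap_a" if x == "-" else "gap_b" if y == "-" else "pair"

    total = 0.0
    for kind, run in groupby(zip(aligned_a, aligned_b), key=column_kind):
        run = list(run)
        if kind == "pair":
            total += sum(pair_score(x, y) for x, y in run)
        else:
            total -= gap_penalty(len(run), **gap_kw)
    return total
```

With an identity score, `AC--GT` against `ACTTGT` pays for one two-residue gap (-10.5, total -6.5), whereas `A-C-` against `AGCT` pays for two separate gaps (total -18.0).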

Sequence Alignment: Problems
Sequence similarity is lost more quickly during evolution than structural similarity, which makes it difficult to formulate simple rules. Investigations addressing this problem:
Doolittle et al. proposed rules of thumb: sequences longer than 100 residues that are more than 25% identical are very likely to be related; at 15-25% identity they may still be related; below 15% identity they are probably not related.
Chothia and Lesk: success in modeling the structure of a protein from its sequence, using the 3D structure of a homologous protein as a template, depends strongly on the sequence identity, which should be above 50%.
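The Doolittle rules of thumb can be applied to a finished pairwise alignment. The sketch computes percent identity over columns where neither sequence has a gap (one of several definitions in use) and then applies the bands from the text; treating short sequences above 25% identity as "not covered" is an assumption, since the text states that band only for sequences longer than 100 residues.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def doolittle_rule(identity, length):
    """Rule of thumb; the >25 % band is only stated for sequences >100 residues."""
    if identity > 25:
        return "very likely related" if length > 100 else "not covered by the rule"
    if identity >= 15:
        return "may be related"
    return "probably not related"
```

For instance, `percent_identity("ACDE-G", "ACDFHG")` gives 80.0, since four of the five gap-free columns are identical.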

Determination and Generation of Structurally Conserved Regions (SCRs)
Building a model protein by the homology approach relies on the fact that all proteins belonging to the same family contain regions that are nearly identical in their 3D structures. These regions tend to be located in the inner core of the protein.
SCRs: in strongly related proteins, these regions have the same relative orientation of their secondary structure units (α-helices and β-sheets) in space throughout the whole family. They are used as a natural framework for the atomic coordinates of another protein in the family.

Determination and Generation of Structurally Conserved Regions (SCRs)
How SCRs are found within a family depends on the number of available crystal structures of related proteins. If more than one crystal structure is available, they are superimposed on each other by a least-squares fitting method. The difficulty is selecting the fitting atoms: a common choice is the Cα atoms, and the fit can be optimized by using only matching points located in secondary structure elements. The superimposed 3D structures usually show that large parts of the two proteins are very similar; these appear to be the SCRs, while other regions differ considerably.
Keep in mind: by definition, an SCR must end at the end of a secondary structure unit, so the secondary structure elements of the protein must first be assigned from the crystal structure files, using programs such as DSSP or STRIDE (based on the hydrogen-bonding pattern or on the main-chain dihedral angles).
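The least-squares superposition is commonly done with the Kabsch algorithm, a standard method that the text does not name. The sketch below uses NumPy, assumes both arrays hold the Cα coordinates of already-matched residues in the same order, and flags residues whose deviation after fitting is within an assumed 1.0 Å cutoff as SCR candidates; the function names are hypothetical.

```python
import numpy as np


def superpose(P, Q):
    """Kabsch least-squares fit of coordinates P onto Q (both N x 3, matched rows).
    Returns per-atom deviations after fitting and the overall RMSD."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                   # remove translation
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)       # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation of P onto Q
    deviations = np.linalg.norm(Pc @ R.T - Qc, axis=1)
    return deviations, float(np.sqrt(np.mean(deviations ** 2)))


def scr_candidates(deviations, cutoff=1.0):
    """Residue indices whose Cα lies within `cutoff` Å after superposition."""
    return [i for i, dev in enumerate(deviations) if dev <= cutoff]
```

In practice, contiguous runs of candidate residues would still be trimmed to secondary structure boundaries, as required by the SCR definition above.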

Determination and Generation of Structurally Conserved Regions (SCRs)
If only one crystal structure is available, the SCRs must be detected manually using both the sequence and the structural information of the proteins:
residues in the core are more conserved than residues on the surface;
amino acids involved in hydrogen bonds and disulfide bridges are most likely to be conserved within the protein family;
residues in the active site also tend to be conserved.
Once the SCRs of the reference protein are known, the corresponding regions of the model protein are found by aligning the target sequence with the sequences of the SCRs. No gaps are allowed within the SCRs, so dedicated programs are needed. After the alignment, the coordinates of the SCRs are assigned using the coordinates of the reference protein as a basis:
segments with identical side chains: all coordinates are used;
segments with non-identical side chains: only the backbone coordinates are used.
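The coordinate-transfer rule can be sketched directly. The data layout (a list of residue name plus an atom-name-to-coordinate dictionary) and the function name are assumptions for illustration; the rule itself follows the text: copy all atoms for identical residues, only the backbone for the others.

```python
BACKBONE = {"N", "CA", "C", "O"}


def transfer_scr(template_residues, target_residues):
    """Build model coordinates for one SCR.

    template_residues: list of (residue_name, {atom_name: xyz}) from the template.
    target_residues: residue names of the target, aligned to the template
    without gaps (as required inside an SCR)."""
    if len(template_residues) != len(target_residues):
        raise ValueError("no gaps are allowed inside an SCR")
    model = []
    for (tmpl_name, atoms), target_name in zip(template_residues, target_residues):
        if tmpl_name == target_name:
            kept = dict(atoms)                    # identical: copy all coordinates
        else:
            kept = {name: xyz for name, xyz in atoms.items() if name in BACKBONE}
        model.append((target_name, kept))         # missing side chains are modeled later
    return model
```

Residues that keep only backbone atoms are exactly the ones handed on to the side-chain modeling step.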

Construction of Structurally Variable Regions (SVRs)
SVRs normally occur in loop regions, and they are more difficult to construct. Insertions and deletions complicate the modeling procedure because the number of amino acids varies. A good guide for modeling a missing region is a segment of similar length in a homologous protein: studies have shown that when loops have the same length and amino acid character, their conformations are the same, so their coordinates can be transferred to the model protein.

Construction of Structurally Variable Regions (SVRs)
If no comparable loops exist in the protein family, the coordinates of the SVRs can be obtained by:
Loop search method: a peptide segment found in other proteins that fits into the spatial environment of the model is used.
De novo generation: a loop segment is generated de novo by building a peptide chain between two conserved segments, using randomly generated values for all backbone dihedral angles. This is rather complex, so it can only be used for loops shorter than seven residues.
All loops should be refined by energy minimization to remove steric hindrance and relax the loop conformations.
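The first filter of a loop search can be sketched as a geometric screen: keep fragments with the required number of residues whose end-to-end distance matches the gap between the two anchor atoms of the conserved segments. The fragment tuple format, the fragment names, and the 0.5 Å tolerance are assumptions; a real loop search would also superimpose the fragment ends onto the anchors and check for clashes before the energy minimization described above.

```python
import math


def select_loops(candidates, n_residues, anchor_start, anchor_end, tolerance=0.5):
    """Return names of candidate loops that have the required length and whose
    end-to-end distance matches the gap between the two anchor atoms.

    candidates: list of (name, length, start_xyz, end_xyz) tuples, e.g. taken
    from a (hypothetical) database of fragments cut from known structures."""
    target = math.dist(anchor_start, anchor_end)
    hits = []
    for name, length, start, end in candidates:
        if length != n_residues:
            continue
        mismatch = abs(math.dist(start, end) - target)
        if mismatch <= tolerance:
            hits.append((mismatch, name))
    return [name for _, name in sorted(hits)]   # best geometric fit first
```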

Side-Chain Modeling
Once the backbone is constructed, the next step is to add the side chains. Predicting side-chain conformations is a much more complex process: many side chains have one or more degrees of freedom and can adopt many energetically allowed conformations.
It has generally been assumed that identical residues in homologous proteins adopt similar conformations. Side chains of highly similar amino acids (e.g., isoleucine and leucine or valine) are also assumed to adopt the same orientation in the protein. Prediction is difficult when the substituted amino acids are not related.
Studies have shown that side chains usually adopt only a small number of the many possible conformations; these are collected in statistical rotamer libraries. Prediction remains difficult because the conformations depend on the local environment.
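Choosing among rotamers can be sketched as picking the library entry with the fewest steric clashes against atoms already placed in the model, breaking ties by the rotamer's library probability. The library format, the labels, and the 3.0 Å clash cutoff are assumptions; practical tools score rotamers with energy functions rather than a simple clash count.

```python
import math


def pick_rotamer(rotamers, environment, clash_cutoff=3.0):
    """Choose the rotamer with the fewest steric clashes, breaking ties by
    library probability.

    rotamers: list of (label, probability, side_chain_atom_xyz_list)
    environment: xyz coordinates of atoms already placed in the model."""
    def clashes(atoms):
        return sum(
            1 for a in atoms for e in environment if math.dist(a, e) < clash_cutoff
        )

    best = min(rotamers, key=lambda r: (clashes(r[2]), -r[1]))
    return best[0]
```

The tie-break reflects the observation above: with no clashes, the most frequently observed rotamer is preferred, while a crowded local environment can override it.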

Final Model
Refinement of the final model is often desirable because:
regions where SCRs and SVRs are joined often contain considerable steric strain and need to be minimized;
several side chains adopt positions with bad van der Waals contacts.
The refinement must be stepwise, because minimizing all residues at once can destroy important internal hydrogen bonds.
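Stepwise refinement can be illustrated as energy minimization in which only a chosen subset of atoms may move while the rest stay fixed. This toy sketch uses a single Lennard-Jones term with made-up parameters and plain steepest descent with numerical gradients; real refinement uses a full force field in a modeling package, so it shows only the idea.

```python
import math


def lj_energy(coords, epsilon=0.2, sigma=3.4):
    """Sum of Lennard-Jones 12-6 pair energies (arbitrary illustrative parameters)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sr6 = (sigma / math.dist(coords[i], coords[j])) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def steepest_descent(coords, movable, steps=200, step_size=0.01, h=1e-5):
    """Move only the atoms listed in `movable`; all others stay fixed.
    Gradients are estimated by central finite differences."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        for i in movable:
            grad = []
            for k in range(3):
                coords[i][k] += h
                e_plus = lj_energy(coords)
                coords[i][k] -= 2 * h
                e_minus = lj_energy(coords)
                coords[i][k] += h                     # restore
                grad.append((e_plus - e_minus) / (2 * h))
            for k in range(3):
                coords[i][k] -= step_size * grad[k]   # step downhill
    return [tuple(c) for c in coords]
```

Starting from two atoms 3.0 Å apart (a clash for these parameters), moving only the second atom lowers the energy and pushes it outward while the first atom stays in place.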

Secondary Structure Prediction
For cases where no homologous protein exists, methods for predicting the secondary structure have been developed. 90% of residues are in α-helices, β-sheets, or reverse turns, so if these can be predicted, it seems possible to combine the segments into a complete protein structure. This is not as reliable as homology modeling. There are three kinds of methods:
statistical;
stereochemical;
neural-network-based.

Secondary Structure Prediction
Statistical methods were the first to be developed. The idea is that many of the 20 amino acids have preferred secondary structures:
α-helix: Ala, Arg, Gln, Glu, Met, Leu, and Lys;
β-sheet: Cys, Ile, Phe, Thr, Tyr, and Val.
The simplest method was proposed by Chou and Fasman. It estimates the probability that an amino acid is in a given secondary structure from its frequency in the different structures found in the PDB.
Limitations: the accuracy in predicting helices, sheets, and loops is below 56%.
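The frequency idea behind the Chou and Fasman method can be sketched as follows: a propensity is the fraction of an amino acid's occurrences found in a given state divided by the overall fraction of residues in that state, and each residue is assigned the state with the highest mean propensity over a window. This is a simplified illustration, not the full Chou-Fasman method (which adds nucleation and extension rules and published parameter tables); the state letters H/E/C, the window of 5, and the toy training data are assumptions.

```python
from collections import Counter


def propensities(training):
    """training: list of (sequence, states) with one state letter per residue
    (H = helix, E = strand, C = coil). Returns {(aa, state): propensity}."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, states in training:
        for aa, state in zip(seq, states):
            aa_counts[aa] += 1
            state_counts[state] += 1
            pair_counts[(aa, state)] += 1
    total = sum(aa_counts.values())
    # P(state | aa) / P(state): values above 1 mean the residue favors the state
    return {
        (aa, state): (n / aa_counts[aa]) / (state_counts[state] / total)
        for (aa, state), n in pair_counts.items()
    }


def predict(seq, prop, window=5, states="HEC"):
    """Assign each residue the state with the highest mean propensity in a
    window centred on it (truncated at the chain ends)."""
    half = window // 2
    result = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        scores = {s: sum(prop.get((aa, s), 0.0) for aa in segment) / len(segment)
                  for s in states}
        result.append(max(scores, key=scores.get))
    return "".join(result)
```

Trained on a single toy example in which Ala occurs only in a helix and Val only in a strand, the sketch predicts `HHHHHEEEEE` for `AAAAAVVVVV`.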

Secondary Structure Prediction
Stereochemical methods are based on the hydrophobic, hydrophilic, and electrostatic properties of the side chains. The method of Lim takes into account interactions between side chains separated by up to three residues in the sequence, in view of their packing behavior. For example, a sequence with alternating hydrophobic and hydrophilic side chains is likely to be found in a β-sheet, with the hydrophilic residues exposed to the solvent and the hydrophobic residues buried in the interior of the protein.
Neural-network-based methods use neural networks that can be trained: rules are not needed in advance but are formed by the network itself from known data. They reach more than 70% accuracy in predicting three classes of secondary structure on the basis of just one known homologous sequence.
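The alternation pattern used in Lim's reasoning can be illustrated by scanning a sequence for stretches where hydrophobic and hydrophilic residues strictly alternate. The binary hydrophobic set and the minimum length of 5 are assumptions, and this single pattern is only one ingredient of Lim's method, not the method itself.

```python
HYDROPHOBIC = set("AVILMFWC")   # assumed binary classification


def alternating_segments(seq, min_len=5):
    """Return half-open (start, end) index ranges where hydrophobic and
    hydrophilic residues strictly alternate, a pattern typical of β-strands."""
    pattern = [aa in HYDROPHOBIC for aa in seq]
    segments, start = [], 0
    for i in range(1, len(seq) + 1):
        # a run ends at the sequence end or where two neighbours share a class
        if i == len(seq) or pattern[i] == pattern[i - 1]:
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    return segments
```

In `VTVKVEGG`, the first six residues alternate (V, T, V, K, V, E), so the sketch reports the segment (0, 6).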

Fold Recognition/Threading Methods
Use these methods when the structural similarity is limited to the part of the structure that shares a common structural motif, while the rest is completely different. The first methods recognized folds in the absence of sequence similarity; today, comparative modeling and threading approaches are applied simultaneously. Threading is closely related to ab initio methods but limits the search to conformations of known structures; thus, threading methods fail for any protein that adopts a new fold.