Modeling for 3D structure prediction

Similar documents
Introduction to Comparative Protein Modeling. Chapter 4 Part I

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Introduction to" Protein Structure

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Week 10: Homology Modelling (II) - HHpred

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

CAP 5510 Lecture 3 Protein Structures

Homology modeling of Ferredoxin-nitrite reductase from Arabidopsis thaliana

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Bioinformatics. Macromolecular structure

Molecular Modeling lecture 2

Protein Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Sequence analysis and comparison

Protein Structure Prediction, Engineering & Design CHEM 430

Biomolecules: lecture 10

CS612 - Algorithms in Bioinformatics

Basics of protein structure

Bulk behaviour. Alanine. FIG. 1. Chemical structure of the RKLPDA peptide. Numbers on the left mark alpha carbons.

Homology modeling. Dinesh Gupta ICGEB, New Delhi 1/27/2010 5:59 PM

Supporting Online Material for

ALL LECTURES IN SB Introduction

NMR, X-ray Diffraction, Protein Structure, and RasMol

7.91 Amy Keating. Solving structures using X-ray crystallography & NMR spectroscopy

The protein folding problem consists of two parts:

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein Structure Basics

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Building 3D models of proteins

Useful background reading

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein structure analysis. Risto Laakso 10th January 2005

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans

Orientational degeneracy in the presence of one alignment tensor.

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Visualization of Macromolecular Structures

Secondary and sidechain structures

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Protein Structure Determination

Bi 8 Midterm Review. TAs: Sarah Cohen, Doo Young Lee, Erin Isaza, and Courtney Chen

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Conformational Geometry of Peptides and Proteins:

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics

Structural Bioinformatics (C3210) Molecular Docking

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Physiochemical Properties of Residues

SUPPLEMENTARY INFORMATION

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Protein Structures. 11/19/2002 Lecture 24 1

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Preparing a PDB File

Table S1. Primers used for the constructions of recombinant GAL1 and λ5 mutants. GAL1-E74A ccgagcagcgggcggctgtctttcc ggaaagacagccgcccgctgctcgg

Better Bond Angles in the Protein Data Bank

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Gerd Krause, Structural bioinformatics and protein design Leibniz-Institute of molecular Pharmacology

Central Dogma. modifications genome transcriptome proteome

Prediction and refinement of NMR structures from sparse experimental data

Figure 1. Molecules geometries of 5021 and Each neutral group in CHARMM topology was grouped in dash circle.

RNA and Protein Structure Prediction

Summary of Experimental Protein Structure Determination. Key Elements

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

NMR Assay of Purity and Folding

We used the PSI-BLAST program ( to search the

Details of Protein Structure

Protein Structure Prediction and Protein-Ligand Docking

1) NMR is a method of chemical analysis. (Who uses NMR in this way?) 2) NMR is used as a method for medical imaging. (called MRI )

Homology models of the tetramerization domain of six eukaryotic voltage-gated potassium channels Kv1.1-Kv1.6

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)

Protein Structure & Motifs

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Supplementary Information

Docking. GBCB 5874: Problem Solving in GBCB

Nature Structural and Molecular Biology: doi: /nsmb.2938

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

BCMP 201 Protein biochemistry

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Template-Based Modeling of Protein Structure

Student Questions and Answers October 8, 2002

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

Theory and Applications of Residual Dipolar Couplings in Biomolecular NMR

Transcription:

Modeling for 3D structure prediction

What is a predicted structure? A structure that is constructed using as the sole source of information data obtained from computer based data-mining. However, mixing of preexisting information (data mining/prior knowledge) and newly obtained experimental data is also common in obtaining experimental structures. It is mostly the relative weight of experimental versus computer generated information that determines whether the structure is considered «experimental» or «model»

Example: protonation in X-RAY JBC 2011 286 3587

Predicted structures Input needed: amino acid sequence only Which data are going to enter the modelling? What is the expected quality of the structure? How can I identify problems in the structure? How can I improve the structure? What do I need the structure for?

100% Predicted structures

Data entering the modelling Experimental structures of proteins that differ in sequence from the protein of interest Close sequence homology: homology modelling More distant sequence relationship/shared fold with no obvious sequence relationship : threading. Aim at recognizing a related fold. No fold assigned: Folding from sequence alone.

modelling by homology or knowledge-based modelling

Why in silico modelling of protein structure? it is the only way to obtain structural information if experimental techniques fail. insoluble proteins like membrane receptors proteins too large for NMR analysis proteins that could not be crystallized for X-ray diffraction slow increase of number of solved 3D-structures, yet ca. 28000 structures available in the PDB (templates) fast increase of number of protein sequences with no structural informations available

Why it is usefull to know 3D structure of a protein, not only its sequence? biochemical function = interaction with other molecules biological function = consequence of these interactions 3D structure is more informative than the sequence because interactions are determined by amino acids that are close in space yet frequently distant in sequence. Function depends more directly on structure than sequence.

Examples of 3D structure uses identification of active and binding sites (substrates, co-factors, inhibitors, modulators, proteins, nucleic acids..); designing and improving ligands of a given binding site (e.g., structure-based drug design); understanding functional /physical aspects of multimeric assemblies; testing function hypothesis (e.g. site-directed mutagenesis); rational design of proteins with increased stability or novel functions; characterizing new sequences (e.g., detecting remote relationships between proteins); antigenic behavior (prediction of epitopes); characterization of biological networks

Why not all structures experimentally? Many more sequences than structures New structures often «look like» old ones redundancy of structural information in the PDB : less than 3000 unique folds 2 peptidic chains share the same fold whether RMSD < 3.0 Å number of aligned positions 70% total chain lenght

The purpose of the Protein Data Bank is to collect and organize 3D structures of proteins, nucleic acids, protein-nucleic acid complexes and complexes with drug molecules and inhibitors. The PDB is the recognized worldwide repository for 3D structures. After validation of deposited structures, the coordinates are made available to all researchers worldwide and free-of-charge. http://www.pdb.org

PDB : fold number new (blue) & old (red)

Homology modelling is based on 2 observations: 1. The structure of a protein is determined by its amino acid sequence. Knowing the sequence should, at least in theory, suffice to obtain the structure. 2. During evolution, the structure is more stable and changes much slower than the associated sequence: similar sequences adopt practically identical structures distantly related sequences may still fold into similar structures

SSAP score: measure of structural similarity Protein stucture Alignment, WR Taylor and C Orengo, J. Mol. Biol. 208 1-22 (1989)

When the sequence identity is 30% between 2 sequences, and experimental 3D information is available for one of the sequences, it is possible to build in silico a 3D model for the sequence of unknown structure LIMITS OF IN SILICO MODELS accuracy of a structure predicted from its sequence is much lower than the results achieved experimentally: no homology modelling structures in the PDB! theoretical protein 3D structures represent "low-resolution" models, and hence should be used ACCORDINGLY

high sequence similarity results in structure similarity: rmsd of the Cα co-ordinates for protein cores sharing 50% residue identity is expected to be around 1Å two sequences of identical lenght which share more than 30% sequence identity virtually adopt a similar structure. Comparative model building consists of the extrapolation of the structure for a new (target) sequence from the known 3Dstructure of related family members (templates).

homology modelling workflow 1. Template recognition and initial alignment 2. Alignment correction 3. Conserved regions modelling 4. Loops modelling 5. Side-chain modelling MODEL CONSTRUCT ION 6. Model optimization 7. Model validation

Identification of modelling templates query using FastA and BLAST to retrieve from the PDB sequence of target homologous proteins A template must share at least 30% residue identity with the target. whether many template are available best template structure (highest sequence similarity to the target) = reference all structures used to generate structurally correct multiple alignment of the sequences Choose the best structure: the quality of the model could not be better than that of the template

Check the template structure The PDBREPORT database (C. Sander, EMBL) indicates problems in structures for every PDB entry. http://www.cmbi.kun.nl/gv/pdbreport/ Generally X-ray structures better than NMR structures X-ray structure quality attested by the resolution factor Yet also pay attention to temperature (conformation freezed at low temperature, see local flexibility indicated by B-factor) External conditions: ph, ions, ligand,...

Aligning the target sequence with the template sequence Some residues should not be used for model building, for example those located in non-conserved loops Analyzing the alignement Identification of conserved regions no insertion / deletion in conserved regions The quality of alignment determine the quality of the modelled structure!

Muliple Steps in constructing sequence a Multiple alignment Alignment

BAliBASE reference 3: aldehyde dehydrogenase-like * * * * * * * GSVPTG E C GSTKVG GETRTG E C GSTEVG GSVSAG GSRDVG GSRDVG GSRDVG GSTNVF GSTNVF GSTAVF E E C C Uniprot Annotation NAD binding Active site Active site

identity substitution (mutation) PDB template Model target Model insertion deletion PDB PDB Model

Lessons from evolution... Upon evolution mutation are observed rather than insertion or deletion (indel) shorter indels are easier to make than longer ones active site residues are conserved Cysteines involved in disulfide bridges are conserved Core residues are better conserved than those exposed at the protein surface

Lessons from evolution... Upon evolution residues tend to mutate into similar residues (e.g. V <-> I; S <-> T; see scoring matrix like PAM and BLOSUM ) residues mutate more easily to residues encoded by similar codons Some residues cannot be easily mutated because of their unique structural features (e.g., accessible phi-psi values for Glycine, Proline)

Additional informations that can be used to correct sequence alignment... Knowledge about proteolytic cleavage sites metal binding residues ligand binding residues Structure: - structure of homologous protein - predicted secondary structure In any cases, use multiple alignment rather than pairwise alignment

building the model " 1. Conserved regions Conserved regions are built using template structural informations, All methods keep the structure of the template in conserved regions with few adjustments (adjustements are necessary when there are deletions in target with respect to template) 2. Non conserved regions 2a. Backbone of inserted sequences (mostly loops), need to be constructed or refined 2b. Side chain of mutated residues need to be reconstructed/refined. This may involve local changes around the mutation in conserved regions (repacking)

Example : modelling by spatial restraints Spatial restraints molecular mechanics using distance restraints extracted from the structures of one or several targets PDB template MODEL target

Modeller (Andrej Sali)

The 3D structure of ECD1 CRF-R2β. First extracellular domain of a type B1 G protein-coupled receptor MODEL (A) A ribbon diagram of the lowest energy conformer highlighting the β- sheets in cyan and the disulfide bonds in yellow. EXPERIMENTAL MODEL (B) Superposition of 20 conformers representing the 3D NMR structure. Only amino acid residues 44 119 are shown. The bundle is obtained by superimposing the backbone Cα carbons of residues 58 83 and 99 113 EXPERIMENTAL Grace C R R et al. PNAS 2004;101:12836-12841

Molecular dynamics as a tools for structure determination Molecular dynamics as a means for finding structures that satisfy restraints. MD explores many structures, and finds stable ones Energy Minimum Minimum Conformations

Energy surface: classical MM force field E Total (X N ) = E bonded (X N ) + E nonbonded (X N ) Energy surface with restraints X N : atomic positions! E Total (X N ) = E bonded (X N ) + E nonbonded (X N ) + E RESTRAINTS (X N ) X N : atomic positions! Example of restraint: spring between two atoms that are known to be close in space, from some independant observation (artificial bond) i j i and j are NOT chemically linked, but the artificial bond will pull them close together: advantage of computer simulation: the potential surface can be «manipulated».

Example : modelling by spatial restraints «Artificial bonds» will pull atoms close in space, even if there is no real chemical bond between them PDB template MODEL target

Incorporate restraints into potential energy function E Total (X N ) = E bonded (X N ) + E nonbonded (X N ) + " i i E RESTRAINTS (X N ) Then return to the formalism developed for unbiased molecular dynamics simulations F i = m i! a i = m i! dv i dt = m! d 2 r i dt 2 F i =!" i E

MD with restraints To satisfy experimental observations Most noted example is NMR structure determination: Many experimentally derived restraints: the structure is MOSTLY determined by this data Few experimental restraints: the structure is a «mix» of computational and experimental information Restraints from other source than experiment, expl, from a sequence alignment : HOMOLOGY MODELLING Protein A should «look» like protein B, because the sequence of A and B are similar. If 3D structure of B is known, then A can be modelled «by homology» to B.

Restraints obtained from an Nuclear Overhauser effect (NOE) Chemical shifts Hydrogen bonds 3 J scalar couplings... NMR experiment

The 3D structure of ECD1 CRF-R2β. First extracellular domain of a type B1 G protein-coupled receptor MODEL (A) A ribbon diagram of the lowest energy conformer highlighting the β- sheets in cyan and the disulfide bonds in yellow. EXPERIMENTAL MODEL (B) Superposition of 20 conformers representing the 3D NMR structure. Only amino acid residues 44 119 are shown. The bundle is obtained by superimposing the backbone Cα carbons of residues 58 83 and 99 113 EXPERIMENTAL Grace C R R et al. PNAS 2004;101:12836-12841

spatial restraints from sequence alignment template PDKNI SRA target PDKNALTRA restraint = harmonic energy P#A E C"#C" [ ] 2 (t arget) = 1 2 k r P#A C"#C" (t arget) # r P#A C"#C" (template) PDB: template MODEL : target P A

Non conserved parts In modelling by spatial restraints, the missing loops and mutated side chain are incorporated automatically in the structure. Yet, since there is no information on these parts in the PDB template, their position is modelled in a less reliable way. Specialized loop/side chain reconstruction algorithms can be applied afterwards, to improve the model.

Programmes segment matching / rigid body Software SYBYL (tripos) MOE (CCG) Freeware SwissPDBViewer Serveurs WWW Swiss-model(ExpasY) Spatial constraints Freeware MODELLER (accelrys) WWW tools for structure analysis and validation WHATIF PROCHECK

structure evaluation" = validation step geometry analysis: angles length, bonds & torsion values, planarity of aromatic rings, steric clashes pseudo energetical analysis experimental validation : choice of experiments based on structural assumption (e.g., site directed mutagenesis)

General Criteria for protein structure validation local Geometry : bond lenght, angle values,chirality, planarity general quality : Ramachandran plot rotamers (χ angle) bumps h-bonds and salt bridges network buried hyrophobic aromatic stacking

planarity backbone Torsion Angle Omega Trans-& Cis-conformation side chains Ex: ARG good bad

Ramachandran Plot good bad

Reminder Ramanchandran Two degrees of freedom (dihedral angles phi et psi) along the protein backbone)

Phi! Backbone conformations

Psi! Backbone conformations

Ramachandran plot In experimental structures, not all values of phi/psi! are observed.! Some combinations are never seen because they correspond to unstable structures (steric clash - high energy)!

Ramachandran Plot Each point correspond to ONE AA in the structure Good: most of the AA in Allowed (low energy) regions. Bad: many AA in disallowed (high energy) regions

rotamers Bad Good

bumps Steric clashes between 2 atoms of the backbone

H-bond network Ex: Crambine

salt bridge network bad good

quality of the packing bad good

backbone Conformation normal strange!!

pseudo-energy analysis protein folding ΔG < 0 The best model has the lowest free energy ΔG unfolded What is difficult: native native To compute the free energy (NOT THE SAME AS ENERGY) Absolute free energy : very large, yet variation of free energy between 2 folded conformations very small (5-10 kcal / mol )

Empirical energy functions e.g. Bowie, Luthy, Eisenberg Science 253 164 (1991) Definition of 18 classes that describe amino acid properties: Nature 356 83 (1992) accessibility & secondary structure E: solvant exposed α helix P1, P2: partly exposed * β sheet B1, B2, B3 : buried other For the 20 amino acid types: Score s assigned to the18 classes values from statistical analysis of known 3D structures High score good environment e.g., W: B1. α = 1.00, E α = -1.35 K: B1. α = -1.82, E α = 0.13

Empirical energy functions Application to 3D model evaluation For every amino acid i: Side chain buried surface Percentage of the buried surface facing polar atoms (including water) Local secondary structure score: S(model) = Σ s i N N: number of amino acids S1 > S2 s i Problems! Best model 1 N Sequence (i)

Empirical energy functions NOTE that the scale is arbitrary:!!in the e.g. in Bowie, Luthy, Eisenberg energy function, a high score is good!!!in the Modeller DOPE score, which is also an empirical function, although with different parameters, a LOW score is good.!

In practice Databases of already built models ModBase, A. Sali

In practice Databases of already built models Swiss-Model Repository, T. Schwede