Can protein model accuracy be identified? NO! Morten Nielsen, CBS, BioCentrum, DTU


Identification of protein model accuracy
- Why is it important?
- What is accuracy? RMSD, fraction correct
- Protein model correctness/quality: Procheck, Whatif, ProsaII, Verify3D
- Prediction of protein model accuracy: ProQ server

Why is it so important?
- Reliable fold recognition
  - P-value, E-value, Z-score
  - Tells you if you should believe in the fold!!
- Alignment (model construction)
  - No obvious method to estimate reliability of alignment
  - Number of gaps, length of gaps
  - Amino acids in protein core and loops
  - % id is too conservative: many low homology models are accurate, and some high homology models are wrong
- Correct fold, wrong alignment => terrible model
- How to gain confidence in a protein model?

Model accuracy. Swiss-model. 1200 models sharing 25-95% sequence identity with the submitted sequences (www.expasy.ch/swissmod)

What is protein model accuracy?
- Model quality (correctness)
  - Does the model look like a protein?
  - Hydrophobic residues in core, hydrophilic on surface
  - Backbone geometry (phi/psi angles, bond lengths)
  - Amino acid environment
  - A "correct" (protein-like) model can still be completely wrong
- Accuracy (if we know the answer)
  - RMSD
  - Fraction of correctly modeled residues

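Model accuracy
- $\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum d_{ij}^{2}}$, where $d_{ij}$ is the distance between corresponding atoms of the model and the structure
- Fraction correct $= N_c / N$, where $N_c$ is the number of correctly modeled residues
- Figure: model in blue, structure in yellow, with the distance $d_{ij}$ marked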
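
The two accuracy measures on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model and the experimental structure are taken to be already superimposed, atoms are paired by list position, and the 3.0 distance cutoff used to call a residue "correct" is a hypothetical value that the slides do not specify.

```python
import math


def rmsd(model, structure):
    """Root mean square deviation over paired (x, y, z) coordinates.

    Assumes both coordinate lists are already superimposed and that
    position i in `model` corresponds to position i in `structure`.
    """
    if not model or len(model) != len(structure):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, structure))
    return math.sqrt(squared / len(model))


def fraction_correct(model, structure, cutoff=3.0):
    """Fraction of positions closer than `cutoff` to their true position.

    The cutoff is a hypothetical choice made for this sketch.
    """
    if not model or len(model) != len(structure):
        raise ValueError("need two non-empty coordinate lists of equal length")
    n_correct = sum(1 for a, b in zip(model, structure) if math.dist(a, b) <= cutoff)
    return n_correct / len(model)


# Toy coordinates (not real protein data): three positions match, one is 4 units off.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 4.0)]
print(rmsd(toy_model, toy_structure), fraction_correct(toy_model, toy_structure))
```

Note how a single badly placed position dominates the RMSD while the fraction correct stays high, which is one reason both measures are reported.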

Evaluation of model quality
- Check for proper protein stereochemistry
  - ProCheck (http://biotech.ebi.ac.uk:8400/cgi-bin/sendquery): Ramachandran plot, bond lengths
  - Whatif (http://www.cmbi.kun.nl/gv/servers/wiwwwi): packing quality
  - Both are web servers
- Fitness of sequence to structure
  - ProsaII (http://lore.came.sbg.ac.at/services/prosa.html): program runs on Linux and Unix
  - Verify3D (http://www.doe-mbi.ucla.edu/services/verify_3d/): web server

Amino acid environment
- 1,000,000 different protein sequences
- 10,000 different solved protein structures
- 600 different protein folds
- Typical amino acid environment
- Sequence space is large, structure space is small

Peptide backbone geometry
- Dihedral angles φ, ψ
- α helix: φ, ψ = -60 degrees
- β strand: φ, ψ = 180 degrees
- Figure: peptide planes along the Cα-N-C-Cα backbone
- From speedy.st-and.ac.uk/.../lectures/ 3014/lecture/dars1.htm
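
A backbone dihedral angle (phi or psi) is the torsion defined by four consecutive atoms. The sketch below shows one common way to compute it from Cartesian coordinates using only the Python standard library. The function name and the toy points are illustrative assumptions, not part of the lecture material.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: a fully extended (trans) arrangement gives 180 degrees.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

For phi the four atoms would be C(i-1), N(i), Cα(i), C(i); for psi they would be N(i), Cα(i), C(i), N(i+1).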

Ramachandran plot
- Regions: A. Right-handed helix; B. Beta strand; L. Left-handed helix
- Color coding: white, disallowed; red, most favorable; yellow, allowed region
- Glycines are shown as triangles

Wrong structure: 1RIP, ribosomal protein. NMR structure in the PDB database, 17 Aug 1993

Procheck. Bond length

What-if. Fine packing quality
- Statistical description of local chemical environment in high quality protein structures
- Superimpose tryptophans and find average local environment. Same for other amino acids
- Full atom model
- G. Vriend and C. Sander, 1992

Example. Casp model T0133
- T0133: Casp5 target
- Modeled by X3M (Lund, O., 2002)
- RMSD = 7.3

Casp model: fine packing quality
---Residue-----   State  AllAll   BB-BB   BB-SC   SC-BB   SC-SC
----------------------------------------------------------------
  1 ILE (  33 )     2    -0.737  -0.462   0.331  -1.312  -0.865
  2 SER (  34 )     2    -0.241   0.209  -0.021  -1.437  -1.421
..
245 ALA ( 296 )     2    -1.919  -1.770  -1.264   0.000   0.000
246 GLU ( 297 )     3    -1.384  -0.641  -1.400   0.070  -1.132
247 HIS ( 298 )     3    -1.476  -1.211  -1.736  -0.874  -1.427
================================================================
All contacts   : Average = -0.459  Z-score = -3.05
BB-BB contacts : Average = -0.155  Z-score = -1.14
BB-SC contacts : Average = -0.445  Z-score = -2.94
SC-BB contacts : Average = -0.221  Z-score = -1.39
SC-SC contacts : Average = -0.701  Z-score = -4.10
================================================================
Average protein values ("Z-score for all contacts") can be read as follows:
  -5.0  Guaranteed wrong structure.
          Bad structure or poor model
  -3.0  Probably bad structure or unrefined model.
          Doubtful structure or model
  -2.0  Structure OK or good model.
          Good structures
   0.0  Good structures.
   2.0  Good structures.
          Unusually good structures
   4.0  Probably a strange model of a perfect helix
Bad model

T0133 structure: fine packing quality
---Residue-----   Chain  State  AllAll   BB-BB   BB-SC   SC-BB   SC-SC
-----------------------------------------------------------------------
 18 ILE (  33 )     A      2     0.781   1.018  -0.116   0.661  -0.291
 19 SER (  34 )     A      2     1.435   1.467   0.077   2.284   0.134
..
281 ALA ( 296 )     A      2    -2.272  -2.504  -0.404   0.000   0.000
282 GLU ( 297 )     A      2    -0.778  -1.601  -1.256   0.137   1.471
283 HIS ( 298 )     A      3    -0.836  -0.801  -0.948  -1.094   0.351
=======================================================================
All contacts   : Average =  0.001  Z-score = -0.04
BB-BB contacts : Average = -0.040  Z-score = -0.40
BB-SC contacts : Average =  0.139  Z-score =  0.90
SC-BB contacts : Average = -0.196  Z-score = -1.23
SC-SC contacts : Average = -0.024  Z-score =  0.02
=======================================================================
Average protein values ("Z-score for all contacts") can be read as follows:
  -5.0  Guaranteed wrong structure.
          Bad structure or poor model
  -3.0  Probably bad structure or unrefined model.
          Doubtful structure or model
  -2.0  Structure OK or good model.
          Good structures
   0.0  Good structures.
   2.0  Good structures.
          Unusually good structures
   4.0  Probably a strange model of a perfect helix
Good model
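
The interpretation scale printed with the What-if output can be turned into a small lookup, shown here as a sketch. The band labels are taken from the output above. How the exact boundary values are assigned (for example whether -3.0 itself counts as "bad" or "doubtful") is an assumption made for this illustration, and the function name is hypothetical.

```python
def interpret_packing_zscore(z):
    """Map a What-if 'Z-score for all contacts' to the band it falls in.

    Band labels follow the scale shown on the slides; the handling of the
    exact boundary values is an assumption of this sketch.
    """
    if z < -5.0:
        return "Guaranteed wrong structure"
    if z < -3.0:
        return "Bad structure or poor model"
    if z < -2.0:
        return "Doubtful structure or model"
    if z < 2.0:
        return "Good structures"
    if z < 4.0:
        return "Unusually good structures"
    return "Probably a strange model of a perfect helix"


# The two all-contacts Z-scores reported on the slides.
print(interpret_packing_zscore(-3.05))  # Casp model of T0133
print(interpret_packing_zscore(-0.04))  # T0133 experimental structure
```

With these bands the Casp model lands in the "bad structure or poor model" range and the experimental structure in the "good structures" range, matching the verdicts on the two slides.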

ProsaII (Potential of Mean Force)
- Likelihood of amino acid packing
- Method developed by Manfred Sippl, 1993
- Works for Cα models
- For high quality protein structures, estimate nearest neighbor counts for all amino acids
- $E = -\log \frac{p(n \mid a)}{p(n)}$, where $n$ is the neighbor count and $a$ the amino acid type
- Hydrophobic residues tend to have many neighbors (buried)
- Hydrophilic residues tend to have fewer neighbors (exposed)
- Figure: exposure potential for D
- Sippl, M.J. (1990) J. Mol. Biol. 213, 859-883.
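
The exposure potential E = -log(p(n|a)/p(n)) can be estimated by simple counting, as in the sketch below. It is not the ProsaII implementation: the observations are invented toy data, neighbor counts are used without binning, and no pseudocounts are applied, so combinations that never occur simply get no energy.

```python
import math
from collections import Counter


def exposure_potential(observations):
    """Estimate E(a, n) = -log(p(n | a) / p(n)) from (amino_acid, neighbor_count) pairs.

    Only combinations that actually occur in `observations` receive a value.
    """
    obs = list(observations)
    total = len(obs)
    count_by_n = Counter(n for _, n in obs)    # how often each neighbor count occurs
    count_by_aa = Counter(a for a, _ in obs)   # how often each amino acid occurs
    count_pair = Counter(obs)                  # joint occurrences
    energy = {}
    for (aa, n), c in count_pair.items():
        p_n_given_a = c / count_by_aa[aa]
        p_n = count_by_n[n] / total
        energy[(aa, n)] = -math.log(p_n_given_a / p_n)
    return energy


# Hypothetical toy data: leucine mostly buried (10 neighbors), aspartate mostly exposed (4).
toy = [("L", 10)] * 3 + [("L", 4)] + [("D", 10)] + [("D", 4)] * 3
toy_energy = exposure_potential(toy)
print(toy_energy[("L", 10)], toy_energy[("D", 10)])
```

In the toy data a buried leucine gets a negative (favorable) energy and a buried aspartate a positive one, which is the behavior the slide describes for hydrophobic and hydrophilic residues.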

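ProsaII (Potential of Mean Force)
- Likelihood of amino acid packing
- $E = -\log \frac{p(r \mid a, b, s)}{p(r \mid s)}$, where $r$ is the distance between residues of types $a$ and $b$ that are $s$ positions apart in the sequence
- Figure: pair potential for D, E at s = 3
- Sippl, M.J. (1990) J. Mol. Biol. 213, 859-883.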

Verify 3D (Eisenberg et al. 1997)
- Closely related to ProsaII exposure potential
- How well does an amino acid fit its local environment (hydrophobic/hydrophilic)?
- Example: T0133, Casp5 target, modeled by X3M (Lund, O., 2002), RMSD = 7.3
- Figure: crystal structure in red, model in blue

Model T0133, Verify 3D: sequence has poor match to structure

ProQ. Prediction of model accuracy
- Neural network to identify correct protein models
- B. Wallner and Arne Elofsson, 2003
- http://www.sbc.su.se/~bjorn/proq
- Input: a PDB structure/model
- Output: accuracy measures LGscore and MaxSub score

ProQ: input to neural net
- Atom-atom contacts (C, N, O): how often is C in contact with N?
- Residue-residue contacts: how often is E in contact with D?
- Solvent accessible surface: average exposure of L's
- Secondary structure prediction: how consistent is the prediction with the model?
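
A feature such as "how often is E in contact with D?" can be computed by counting residue pairs that lie within a distance cutoff. The sketch below illustrates the idea only. The slides do not give ProQ's contact definition, so the 8.0 cutoff, the one-point-per-residue representation, and the minimum sequence separation are all hypothetical choices.

```python
import math
from collections import Counter
from itertools import combinations


def residue_contact_fractions(residues, cutoff=8.0, min_separation=2):
    """Fraction of all contacts made by each unordered residue-type pair.

    `residues` is a list of (one_letter_code, (x, y, z)) in sequence order.
    `cutoff` and `min_separation` are assumptions of this sketch, not ProQ values.
    """
    counts = Counter()
    for (i, (aa_i, xyz_i)), (j, (aa_j, xyz_j)) in combinations(enumerate(residues), 2):
        if j - i < min_separation:
            continue  # skip sequence neighbors, which are always close in space
        if math.dist(xyz_i, xyz_j) <= cutoff:
            counts[tuple(sorted((aa_i, aa_j)))] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}


# Toy chain (invented coordinates): only E and D are in contact.
toy_chain = [("E", (0.0, 0.0, 0.0)), ("A", (3.8, 0.0, 0.0)),
             ("D", (5.0, 3.0, 0.0)), ("L", (30.0, 0.0, 0.0))]
print(residue_contact_fractions(toy_chain))
```

The resulting fractions form a fixed-length description of a model that does not depend on protein size, which is the kind of input a neural network can use.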

Casp model T0113

Structure 1RIP

LiveBench data
- 11000 models, 220 targets
- Modeled by Pcons
- Incorrect model: LGscore < 1.5, MaxSub < 0.1

Conclusions
- Correct protein models cannot reliably be identified!!
- Protein fold, on the other hand, can!
- Many methods from the protein crystallography world are useful to identify wrong models
- Bad models can pass all filters
- ProQ is a first attempt at an accuracy prediction server
  - Can integrate information from many sources
  - The future will show if this approach can provide reliable prediction of model accuracy