Protein structures and comparisons ndrew Torda Bioinformatik, Mai 2008

Similar documents
Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

X-ray Crystallography

Protein Struktur (optional, flexible)

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Struktur. Biologen und Chemiker dürfen mit Handys spielen (leise) go home, go to sleep. wake up at slide 39

Figure 1. Molecules geometries of 5021 and Each neutral group in CHARMM topology was grouped in dash circle.

Molecular Modeling lecture 2

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Prediction and refinement of NMR structures from sparse experimental data

Introduction to" Protein Structure

Administration. ndrew Torda April /04/2008 [ 1 ]

CAP 5510 Lecture 3 Protein Structures

Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description. Version Document Published by the wwpdb

NMR, X-ray Diffraction, Protein Structure, and RasMol

Orientational degeneracy in the presence of one alignment tensor.

Lecture 3: Markov chains.

Volume vs. Diameter. Teacher Lab Discussion. Overview. Picture, Data Table, and Graph

Computational Molecular Modeling

Full wwpdb X-ray Structure Validation Report i

Supplementary Figure 3 a. Structural comparison between the two determined structures for the IL 23:MA12 complex. The overall RMSD between the two

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Protein Structure: Data Bases and Classification Ingo Ruczinski

Nitrogenase MoFe protein from Clostridium pasteurianum at 1.08 Å resolution: comparison with the Azotobacter vinelandii MoFe protein

Example questions. Z:\summer_10_teaching\bioinfo\Beispiel_frage_bioinformatik.doc [1 / 5]

Summary of Experimental Protein Structure Determination. Key Elements

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Problem Set 1

Enduring Understanding Paper Post-course Evidence Essay

Quantifying sequence similarity

7.91 Amy Keating. Solving structures using X-ray crystallography & NMR spectroscopy

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Computing RMSD and fitting protein structures: how I do it and how others do it

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Basics of protein structure

Pymol Practial Guide

Esser et al. Crystal Structures of R. sphaeroides bc 1

Biochemistry Quiz Review 1I. 1. Of the 20 standard amino acids, only is not optically active. The reason is that its side chain.

fconv Tutorial Part 2

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Achilles: Now I know how powerful computers are going to become!

Please bring the task to your first physics lesson and hand it to the teacher.

PHYSICS 107. Lecture 27 What s Next?

April, The energy functions include:

Omega notation. Transitivity etc.

Graphing Radicals Business 7

EQ: How do I convert between standard form and scientific notation?

Introductory Quantum Chemistry Prof. K. L. Sebastian Department of Inorganic and Physical Chemistry Indian Institute of Science, Bangalore

Potential Energy (hyper)surface

1. Draw a picture or a graph of what you think the growth of the Jactus might look like.

(Refer Slide Time: 00:10)

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Supporting Online Material for

Week 10: Homology Modelling (II) - HHpred

Details of Protein Structure

Lecture 3: Big-O and Big-Θ

Knots, Coloring and Applications

2. Prime and Maximal Ideals

Why do We Trust X-ray Crystallography?

Analysis and Prediction of Protein Structure (I)

Secondary and sidechain structures

This semester. Books

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Crystal lattice Real Space. Reflections Reciprocal Space. I. Solving Phases II. Model Building for CHEM 645. Purified Protein. Build model.

Course Notes: Topics in Computational. Structural Biology.

Presenter: She Zhang

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Midterm sample questions

Advanced Quantum Mechanics, Notes based on online course given by Leonard Susskind - Lecture 8

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

ml. ph 7.5 ph 6.5 ph 5.5 ph 4.5. β 2 AR-Gs complex + GDP β 2 AR-Gs complex + GTPγS

MATH 22 FUNCTIONS: ORDER OF GROWTH. Lecture O: 10/21/2003. The old order changeth, yielding place to new. Tennyson, Idylls of the King

Experimental Techniques in Protein Structure Determination

Advanced Partial Differential Equations with Applications

Full wwpdb X-ray Structure Validation Report i

Factoring. there exists some 1 i < j l such that x i x j (mod p). (1) p gcd(x i x j, n).

Finding Bonds, H-bonds

Computational Modeling of Protein-Ligand Interactions

Name Date Partners. Lab 4 - GAUSS' LAW. On all questions, work together as a group.

Section III - Designing Models for 3D Printing

One-to-one functions and onto functions

Modeling for 3D structure prediction

NMR in Structural Biology

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Math 3361-Modern Algebra Lecture 08 9/26/ Cardinality

1) NMR is a method of chemical analysis. (Who uses NMR in this way?) 2) NMR is used as a method for medical imaging. (called MRI )

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Supporting Protocol This protocol describes the construction and the force-field parameters of the non-standard residue for the Ag + -site using CNS

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans

CS612 - Algorithms in Bioinformatics

Building small molecules

Experiment 1: The Same or Not The Same?

Protein structure analysis. Risto Laakso 10th January 2005

An analogy from Calculus: limits

SG 4 Elements and Chemical Bonds 5 States of Matter

Transcription:

Protein structures and comparisons ndrew Torda 67.937 Bioinformatik, Mai 2008 Ultimate aim how to find out the most about a protein what you can get from sequence and structure information On the way.. remote similarities between proteins sequence versus structural similarity Detour protein coordinates representation, accuracy measures for similarity of coordinates Later classifications of proteins 19/05/2008 [ 1 ]

Sequence and structure similarity 19/05/2008 [ 2 ] Claim from before if two sequences are similar they are related structures are similar Question if two sequences are different - are their structures different?

Remote similarities 19/05/2008 [ 3 ] 1cbl & 1eca (haemoglobin & erythrocruorin) 14 % sequence id 1fyv & 1udx, TLR receptor and nucleotide binder, 9 % sequence id

No sequence similarity similar structures ost, B.Prot Eng, 12,85 94, 1999 chain length (residues) 19/05/2008 [ 4 ] Are these rare? easy to find 100s of examples does this agree with previous claims? dot in diagram two structures seem different if sequences are similar structures will be similar if sequences are different one does not know % sequence ident random similar

Structure versus sequence similarity 19/05/2008 [ 5 ] Clear statement sequence changes faster than structure Reason? Unclear possibility.. protein function depends on having groups in orientation in space

j9m, 2cdk + aminopyridine Why can sequence change 19/05/2008 [ 6 ] change here residue changes? OK structure changes? Bad a view of molecular evolution

Simple view of molecular evolution mutate continuously mutations which are not lethal may be passed on (fixed) if structure changes protein probably will not function not passed on mutate structure changes? Andrew torda diagram yes no Result evolution will find many sequences compatible with structure compatible with function how else would we see this? 19/05/2008 [ 7 ]

Sequence vs structure evolution 19/05/2008 [ 8 ] Sayings.. Sequence and structure space sequence space is larger many different sequences map to similar structure sequence evolves faster than structure sequences Andrew Torda diagram Truths structures

Practical Consequences 19/05/2008 [ 9 ] Sequences of proteins are nearly always known similar sequence usually similar structure, similar function sequences not (obviously) related maybe similar structure maybe similar function What if structures are known?

Sequence and structure similarity 19/05/2008 [ 10 ] sequence similar different structures similar different frequency always never function similar yes frequency often normal function similar sometimes no summarise from a different point of view

Sequence vs structure similarity When comparing proteins more information is always better (sequence, structure,function) Similar sequences structure and function will be similar remember threshold graphs from earlier Similar structures, different sequences evolutionary relationship implied but bigger evolutionary distance not enough to be confident about function % sequence ident random similar chain length (residues) what do we mean by similar structures? 19/05/2008 [ 11 ]

vhm chain 0 Comparing proteins 19/05/2008 [ 12 ] Representation of proteins comparison classification (later) Representation Proteins are not as smooth as we draw them very discrete set of atoms

Protein coordinate files 19/05/2008 [ 13 ] Important detour - Protein data bank (www.rcsb.org) only significant database of protein coordinates deposition of coordinates often requirement of publication > 50 10 3 structures huge redundancy (> 500 T4 lysozyme) X-ray crystallography 85 % NMR 14 % (more in smaller proteins) File formats standardisation - boring but important all programs agree on a format exchange of information two PDB formats one common flat files..

Protein coordinate files 19/05/2008 [ 14 ] What would you expect? Define the chain direction N to C terminus within each residue order of atoms backbone sidechain going away from backbone unit Å usually no Hydrogens

PDB File ATOM 1 N ARG A 1 26.465 27.452-2.490 1.00 25.18 N ATOM 2 CA ARG A 1 25.497 26.862-1.573 1.00 17.63 C ATOM 3 C ARG A 1 26.193 26.179-0.437 1.00 17.26 C ATOM 4 O ARG A 1 27.270 25.549-0.624 1.00 21.07 O ATOM 5 CB ARG A 1 24.583 25.804-2.239 1.00 23.27 C ATOM 6 CG ARG A 1 25.091 24.375-2.409 1.00 13.42 C ATOM 7 CD ARG A 1 24.019 23.428-2.996 1.00 17.32 C ATOM 8 NE ARG A 1 23.591 24.028-4.287 1.00 17.90 N ATOM 9 CZ ARG A 1 24.299 23.972-5.389 1.00 19.71 C ATOM 10 NH1 ARG A 1 25.432 23.261-5.440 1.00 24.10 N ATOM 11 NH2 ARG A 1 23.721 24.373-6.467 1.00 14.01 N ATOM 12 N PRO A 2 25.667 26.396 0.708 1.00 10.92 N ATOM 38 N CYS A 5 23.095 22.004 2.522 1.00 7.84 N ATOM 39 CA CYS A 5 22.106 21.863 1.467 1.00 9.61 C ATOM 40 C CYS A 5 22.192 20.518 0.830 1.00 10.97 C ATOM 41 O CYS A 5 21.230 20.068 0.167 1.00 9.33 O ATOM 42 CB CYS A 5 22.358 22.904 0.371 1.00 10.97 C ATOM 43 SG CYS A 5 22.145 24.592 0.888 1.00 12.56 S x y z Note coordinates three decimal places often 5 significant digits come back to this in a few Folien 19/05/2008 [ 15 ]

PDB File 19/05/2008 [ 16 ] ATOM 1 N ARG A 1 26.465 27.452-2.490 1.00 25.18 N ATOM 2 CA ARG A 1 25.497 26.862-1.573 1.00 17.63 C ATOM 3 C ARG A 1 26.193 26.179-0.437 1.00 17.26 C ATOM 4 O ARG A 1 27.270 25.549-0.624 1.00 21.07 O ATOM 5 CB ARG A 1 24.583 25.804-2.239 1.00 23.27 C ATOM 6 CG ARG A 1 25.091 24.375-2.409 1.00 13.42 C ATOM 7 CD ARG A 1 24.019 23.428-2.996 1.00 17.32 C ATOM 8 NE ARG A 1 23.591 24.028-4.287 1.00 17.90 N ATOM 9 CZ ARG A 1 24.299 23.972-5.389 1.00 19.71 C ATOM 10 NH1 ARG A 1 25.432 23.261-5.440 1.00 24.10 N ATOM 11 NH2 ARG A 1 23.721 24.373-6.467 1.00 14.01 N ATOM 12 N PRO A 2 25.667 26.396 0.708 1.00 10.92 N ATOM 38 N CYS A 5 23.095 22.004 2.522 1.00 7.84 N ATOM 39 CA CYS A 5 22.106 21.863 1.467 1.00 9.61 C ATOM 40 C CYS A 5 22.192 20.518 0.830 1.00 10.97 C ATOM 41 O CYS A 5 21.230 20.068 0.167 1.00 9.33 O ATOM 42 CB CYS A 5 22.358 22.904 0.371 1.00 10.97 C ATOM 43 SG CYS A 5 22.145 24.592 0.888 1.00 12.56 S residue mobility

How accurate are coordinates? X-ray 19/05/2008 [ 17 ] 3 decimal places? (10-3 Å) 5 significant digits? typical resolution in a protein file? 1 to 2.5 Å what is x Å resolution? two points < x Å separated, they seem to be one point mobility? at room temperature atoms some atoms move by some Å what does 0.001 Å mean? where do the numbers come from? Later NMR?..

pta How accurate are coordinates: NMR 19/05/2008 [ 18 ] NMR precision / accuracy not well defined Most common method generate 100 s structures based on NMR data keep best 20 or 50 submit these to protein data bank How accurate are they? clearly the individual Å mean little

19/05/2008 [ 19 ] Where do the digits come from? We cannot resolve atoms to Å; where do the digits come from? If every atom had an error of 1 or more Å structures would not look so regular but the measurements do have this error.. How to reconcile this start with an idea of covalent geometry CC bonds 1.1 Å, CN bonds 1.3 Å.. standard values for angles treat protein as a chain with these restrictions think of all the rotatable angles fit every atom to where the experiment says it should be result?

Resulting coordinates 19/05/2008 [ 20 ] coordinates look perfect not as accurate as they seem coordinates in PDB are mixture of experiment (where NMR / crystallography puts them) what we expect all bond lengths, angles Given some coordinates, how can we compare them?

Comparing coordinates 19/05/2008 [ 21 ] These are very similar These are clearly related, less similar We want to put numbers on this property First some notation We have spoken of x, y, z coordinates. Easier.. vector r v or for atom i, r v i for two proteins let us have position i in protein a and b va vb r i and r i

Comparing two proteins 19/05/2008 [ 22 ] take one atom (C α ) from residue i what do I know from the picture? va v if my two proteins are similar r i - ri for each residue i va vb v define r - r distance between r i i b a i will be a short vector v and r b i I want a single number that tells me usually how close is a residue in a to the corresponding residue in b va vb think of the set of distances r i - ri how spread out is this population of distances? like a standard deviation (standard Abweichung)

root mean square 19/05/2008 [ 23 ] normal formula for standard deviation σ x N 1 = N i= 1 ( x x) i 2 1 2 something similar for coordinates r rmsd N res 1 v = r Nres i= 1 where proteins a and b have N res residues rmsd is root mean square difference complications a i v r b i 2 1 2

Before calculating rmsd 19/05/2008 [ 24 ] two very similar proteins coordinates are in different orientations not on top of each other what are the orientations of files in PDB? totally arbitrary first some other steps

Superposition of coordinates 19/05/2008 [ 25 ] rotation and translation now use formula for rmsd

first problems with rmsd 19/05/2008 [ 26 ] Before calculating rmsd coordinates must be superimposed (translation + rotation) if you and I use slightly different superpositions our rmsd values (similarity) will be different meaning of rmsd units Å rmsd is size dependent 5 Å in a small protein (50 residues) will not look similar 5 Å in a big protein (250 residues) will look similar

Difficulty with rmsd 19/05/2008 [ 27 ] these two proteins have the same number of residues N res 1 v = a vb rrmsd r i ri Nres i= 1 if i = 1, 2, 3,.. we use residue 1, 2, 3 in both proteins 2 1 2 these two proteins have slightly different numbers of residues we cannot compare residue 1 to 1, 2 to 2..

Proteins of different sizes first version 19/05/2008 [ 28 ] Problem - for each residue i in protein a we need matching residue in protein b One approach first build a sequence alignment?

Selecting residues for alignment 19/05/2008 [ 29 ] take the sequence of each protein, calculate alignment ACDEFG-IK-MNP.. A-DEGGHIKLMNP.. use these residues ACDEFG-IK-MNP.. A-DEGGHIKLMNP.. will find corresponding residues will allow for missing / inserted residues used in some programs chimera problem sequence similarity may be near nothing a sequence based alignment may be very wrong

Selecting residues for alignment - better 19/05/2008 [ 30 ] We need corresponding residues some kind of alignment can one do an alignment based on structures? Answer : yes but.. no guaranteed correct solution many different methods naïve approach how difficult?

1vs60 & 1yl35 Naïve structural alignments 19/05/2008 [ 31 ] how hard is it to superimpose two structures (different sizes) looks easy when answer is known

Superposition 19/05/2008 [ 32 ] 3 types of translation at each step in x, y, z 3 rotations to fix worse problem.. as you move one structure match gets better and worse

Local minima 19/05/2008 [ 33 ] each of these superpositions matches different parts of the helix difficult to calculate quality of match each superposition implies a different alignment

Summary of comparing two structures we want a single measure of similarity (like rmsd) this requires we have a set of corresponding residues in the two proteins if there is good sequence similarity use it naïve methods will not give the best superposition structure-based alignments can be calculated require approximations often slow can not guarantee the best answer Next topic.. what can we do now that we can quantify similarity? classifications 19/05/2008 [ 34 ]