Swiss Institute of Bioinformatics
Protein Structure Bioinformatics: Introduction
Basel, 27 September 2004

Torsten Schwede
Biozentrum - Universität Basel
Swiss Institute of Bioinformatics
Klingelbergstr. 50-70 - 4056 Basel, Switzerland
Tel: +41-61 267 15 81
Torsten.Schwede@unibas.ch

Introduction & Recapitulation
- Proteins
- Polypeptides
- Amino acids
- Physicochemical Properties

Introduction
Three- and one-letter code:

Amino acids with aliphatic side chains
- Ala (A): -CH3
- Val (V): -CH(CH3)2
- Ile (I): -CH(CH3)-CH2-CH3
- Leu (L): -CH2-CH(CH3)2

Side chains with hydroxyl (-OH) groups
- Ser (S): -CH2-OH, pK = 13
- Thr (T): -CH(OH)-CH3, pK = 13

Side chains containing sulphur
- Cys (C): -CH2-SH, pK = 8.3
- Met (M): -CH2-CH2-S-CH3

Acidic amino acids
- Asp (D): -CH2-COO-, pK = 3.9
- Glu (E): -CH2-CH2-COO-, pK = 4.1

Amides of acidic amino acids
- Asn (N): -CH2-CO-NH2
- Gln (Q): -CH2-CH2-CO-NH2

Basic amino acids
- Arg (R): -(CH2)3-NH-C(NH2)2+ (guanidinium group), pK = 12.5
- Lys (K): -(CH2)4-NH3+, pK = 10.8
- His (H): -CH2-imidazole, pK = 6.0

Side chains with aromatic rings
- Phe (F): -CH2-phenyl
- Tyr (Y): -CH2-C6H4-OH, pK = 10.1
- Trp (W): -CH2-indole

Special cases
- Pro (P): side chain closes a ring with the backbone nitrogen (imino acid)
- Gly (G): side chain is a single H

Side Chain Structures
[Figure: side chain structures]

Side Chain Properties
- Neutral, hydrophobic: Alanine, Valine, Leucine, Isoleucine, Proline, Tryptophan, Phenylalanine, Methionine
- Neutral, polar: Glycine, Serine, Threonine, Tyrosine, Cysteine, Asparagine, Glutamine
- Basic: Lysine, Arginine, (Histidine)
- Acidic: Aspartic Acid, Glutamic Acid
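
pH and pKa

pH:
\[ \mathrm{pH} = -\log [H^+] \]

Water ion product:
\[ K_w = [H^+][OH^-] = 10^{-14} \]
\[ -\log [H^+] - \log [OH^-] = -\log 10^{-14} = 14 \]
\[ \mathrm{pH} + \mathrm{pOH} = 14 \]

pH and pKa

Dissociation of weak acids:
\[ HA \rightleftharpoons H^+ + A^- \]
\[ K_a = \frac{[H^+][A^-]}{[HA]} \]
\[ [H^+] = K_a \frac{[HA]}{[A^-]} \]
\[ -\log [H^+] = -\log K_a - \log \frac{[HA]}{[A^-]} \]
\[ -\log [H^+] = -\log K_a + \log \frac{[A^-]}{[HA]} \]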
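
pH and pKa

Henderson-Hasselbalch equation:
\[ \mathrm{pH} = \mathrm{p}K_a + \log \frac{[A^-]}{[HA]} \]

pH and pKa

[Figure: titration curve of glycine. x-axis: equivalents of OH- added (0 to 2); y-axis: pH (0 to 14). Marked points: pK1, isoelectric point, pK2. Species along the curve: NH3+/COOH form (net charge +1), zwitterion (net charge 0), NH2 form (net charge -1).]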
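
The Henderson-Hasselbalch equation can be rearranged to give the fraction of a titratable group that is deprotonated at a given pH: [A-] / ([HA] + [A-]) = 1 / (1 + 10^(pKa - pH)). The following is a minimal sketch of that rearrangement. The function name `fraction_deprotonated` and the example pH of 7.0 are assumptions made for illustration; the pK values are the side-chain values quoted on the amino acid slides, and real values shift with the local protein environment.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of a group present as A- at the given pH.

    Follows from pH = pKa + log([A-]/[HA]):
    [A-]/[HA] = 10**(pH - pKa), so [A-]/([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Side-chain pK values as quoted on the slides (free amino acid values).
SIDE_CHAIN_PK = {
    "Asp": 3.9,
    "Glu": 4.1,
    "His": 6.0,
    "Cys": 8.3,
    "Tyr": 10.1,
    "Lys": 10.8,
    "Arg": 12.5,
}

if __name__ == "__main__":
    ph = 7.0  # assumed example pH, not taken from the slides
    for name, pk in SIDE_CHAIN_PK.items():
        frac = fraction_deprotonated(ph, pk)
        print(f"{name}: pK = {pk:4.1f}, deprotonated fraction at pH {ph} = {frac:.3f}")
```

At pH = pKa the function returns exactly one half, which is the midpoint of each buffering region on a titration curve.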

pH and pKa
[Figure: Glu]

pH and pKa
[Figure: Lys]

pH and pKa
Enzymatic reactions often require proton transfer.
Q: Which amino acid(s) are able to change their protonation state under physiological conditions?

Why do proteins fold?
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL

Anfinsen's paradigm
1957, Nobel Prize 1972

Anfinsen's paradigm
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL
All the information necessary for the three-dimensional structure of an enzyme is contained in its primary structure, the sequence of its amino acids.

Levinthal's Paradox (1968)
Consider a chain of one hundred amino acids and assume that each amino acid can exist in one of three conformations: extended, helical or loop. There are then $3^{100}$ possible ways to arrange this chain, roughly $10^{48}$ conformations. Bond rotation can be estimated to occur at a rate of roughly $10^{14}\ \mathrm{s}^{-1}$. Finding the right conformation by random search alone would therefore take on the order of $10^{34}$ s, or $10^{26}$ years, many orders of magnitude longer than the age of the universe!
[J. Chim. Phys., 1968, 65, 44]

MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL
- Many proteins fold spontaneously to their native structure
- Protein folding is relatively fast (nsec to sec)
- Chaperones speed up folding, but do not alter the structure
The protein sequence contains all information needed to create a correctly folded protein.
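
The arithmetic behind Levinthal's estimate can be checked in a few lines. This is a minimal sketch using the numbers from the slide (100 residues, 3 conformations each, 10^14 conformational changes per second); the function name and the choice of 365.25 days per year are assumptions made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def levinthal_search_time(n_residues=100, states_per_residue=3, rate_per_s=1e14):
    """Return (number of conformations, search time in s, search time in years)
    for an exhaustive random search through all chain conformations."""
    n_conformations = states_per_residue ** n_residues  # exact integer
    seconds = n_conformations / rate_per_s
    return n_conformations, seconds, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    n_conf, seconds, years = levinthal_search_time()
    print(f"conformations: {n_conf:.2e}")
    print(f"search time:   {seconds:.2e} s = {years:.2e} years")
```

The result is about 5 x 10^47 conformations and a search time on the order of 10^26 years, in line with the rounded figures quoted above.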

Why do proteins fold?
-N-Cα(HR1)-CO-N-Cα(HR2)-CO-N-Cα(HR1)-CO-

Side Chain Properties
- Neutral, hydrophobic: Alanine, Valine, Leucine, Isoleucine, Proline, Tryptophan, Phenylalanine, Methionine
- Neutral, polar: Glycine, Serine, Threonine, Tyrosine, Cysteine, Asparagine, Glutamine
- Basic: Lysine, Arginine, (Histidine)
- Acidic: Aspartic Acid, Glutamic Acid

Hydrophobic Effects
Main driving force for protein folding

Surface Definitions
- Van der Waals Radius
- Molecular Surface
- Solvent Accessible Surface

Hydrogen Bonds
H atoms bound to electronegative atoms (e.g. N, O) are polarized and can form H-bonds.
H-bonding partners include:
- main chain atoms
- side chain atoms
- water molecules
- ligands, etc.
[Figure: hydrogen bond between N-H and O=C groups]
Q: Do H-bonds stabilize a protein fold?

Energetics of protein folding
\[ \Delta G = \Delta H - T \Delta S \]
Contributions:
- H-bonds
- hydrophobic effects
- salt bridges
- SS-bonds
- loss of solvation
- entropy change
- dispersion / VdW contacts
- conformational energy

Energetics of protein folding
- Difference of two very large energetic terms
- Low overall stabilization energy

Why do proteins fold?
- Change of energy state from unfolded to folded
- Folded state must have overall lower energy
- Let's assume the folded state is the lowest possible state for this polypeptide

Protein Sequence Space
- How many different proteins are theoretically possible?
- How many of these have been tested during evolution?

Protein sequence space
Assuming a peptide of length 100 aa
Possible combinations:
\[ n_c = 20^{100} \approx 1.27 \times 10^{130} \]
Volume of one peptide:
\[ r_{atom} \approx 2\ \text{Å}, \quad v_{atom} \approx 35\ \text{Å}^3, \quad \text{packing} \approx 75\%, \quad v_{peptide} \approx 1.3 \times 10^{5}\ \text{Å}^3 \]
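
Protein sequence space
$1.27 \times 10^{130}$ combinations. For comparison:
Volume of the Earth:
\[ R \approx 6.4 \times 10^{3}\ \text{km} \approx 6.4 \times 10^{16}\ \text{Å} \]
\[ V = \tfrac{4}{3} \pi r^3 \approx 1.1 \times 10^{51}\ \text{Å}^3 \]
Peptides/Earth:
\[ n_p \approx \frac{1.1 \times 10^{51}}{1.3 \times 10^{5}} \approx 7.7 \times 10^{45} \]

Protein sequence space
$1.27 \times 10^{130}$ combinations. For comparison:
Age of the Earth: $3 \times 10^{9}$ years $\approx 2.6 \times 10^{13}$ hours $\approx 9.5 \times 10^{16}$ sec $\approx 9.5 \times 10^{28}$ psec
If the whole planet consisted of peptides, and peptides were renewed every psec...
\[ n_t \approx (7.7 \times 10^{45}) \times (9.5 \times 10^{28}) \approx 7.3 \times 10^{74} \]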

Protein sequence space
Assuming a peptide of length 100 aa
Possible combinations:
\[ n_c = 20^{100} \approx 1.27 \times 10^{130} \]
If the whole planet consisted of peptides, and peptides were renewed every psec...
\[ n_t \approx (7.7 \times 10^{45}) \times (9.5 \times 10^{28}) \approx 7.3 \times 10^{74} \]
\[ 10^{130} - 10^{75} \approx 10^{130} \ ?!? \]

Introduction & Recap
Principles of Protein Structure
- Primary Structure
- Secondary Structure
- Tertiary Structure
- Quaternary Structure
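
The sequence-space estimate can be reproduced step by step. The sketch below uses only the rounded inputs given on the slides (Earth radius 6.4 x 10^3 km, peptide volume 1.3 x 10^5 Å^3, Earth age 3 x 10^9 years, one renewal per picosecond); the variable names and the 365.25-day year are assumptions made here.

```python
import math

ANGSTROM_PER_KM = 1e13                  # 1 km = 1e3 m = 1e13 Å
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length

# Number of possible sequences for a 100-residue peptide.
n_c = 20 ** 100

# Volume of the Earth in cubic Ångström, from the rounded radius on the slide.
r_earth = 6.4e3 * ANGSTROM_PER_KM
v_earth = 4.0 / 3.0 * math.pi * r_earth ** 3

# Peptides that fit into the Earth's volume, and picoseconds since its formation.
v_peptide = 1.3e5
n_p = v_earth / v_peptide
age_ps = 3e9 * SECONDS_PER_YEAR * 1e12

# Upper bound on sequences ever "tested" if every peptide were renewed each psec.
n_t = n_p * age_ps

if __name__ == "__main__":
    print(f"n_c ~ 10^{math.log10(n_c):.1f}")
    print(f"V_earth ~ {v_earth:.2e} Å^3, n_p ~ {n_p:.2e}")
    print(f"n_t ~ {n_t:.2e}")
    print(f"fraction of sequence space sampled ~ 10^{math.log10(n_t) - math.log10(n_c):.0f}")
```

With these rounded inputs the peptide count comes out near 8 x 10^45 rather than 7.7 x 10^45, and n_t near 8 x 10^74; the order of magnitude, and hence the conclusion that only a vanishing fraction of sequence space can ever have been sampled, is the same.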
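
Principles of protein structure
[Figure: levels of protein structure. Panels: Primary, Secondary, Tertiary, Quaternary]

Geometry of a peptide bond
[Figure: peptide bond geometry, with H and R substituents on adjacent Cα atoms]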

Dihedral angles Φ, Ψ, and ω
Q: Which values would you expect for ω?

Dihedral angles Φ and Ψ
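
A dihedral angle such as Φ, Ψ or ω is defined by four consecutive atoms: it is the angle between the plane through atoms 1-2-3 and the plane through atoms 2-3-4. The following is a minimal sketch of how such an angle can be computed from Cartesian coordinates, using only the standard library. The function name `dihedral_deg` and the toy coordinates are assumptions made for illustration and are not taken from the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the outer atoms lie on the same side (cis) or on
    # opposite sides (trans) of the central bond.
    cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
    trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
    print(f"cis: {cis:.1f} deg, trans: {abs(trans):.1f} deg")
```

For backbone angles the four points would be the atoms C(i-1), N, Cα, C for Φ; N, Cα, C, N(i+1) for Ψ; and Cα, C, N(i+1), Cα(i+1) for ω.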

Dihedral angles Φ and Ψ
[Figure: conformation with Φ = 0°, Ψ = 0°]

Ramachandran Plots
[Figure: Ramachandran plot. x-axis: Φ (deg); y-axis: Ψ (deg)]

Ramachandran Plots
[Figure: Ramachandran plot. x-axis: Φ (deg); y-axis: Ψ (deg)]

Amino acid preferences
Alanine and Arginine
[Figure: Ramachandran plots for ALA and ARG]

Amino acid preferences
Amino acids with special preferences:
[Figure: Ramachandran plots for GLY and PRO]

Alpha helices

Beta strands / beta sheets

Anti-parallel beta sheet

Parallel and anti-parallel beta sheets

Left-handed twist in beta-sheets
0-30° per aa
[Figure: bovine pancreatic trypsin inhibitor]

Turns and loops
[Figure: schematic diagram showing the inter-residue backbone hydrogen bonds that stabilize the reversal of the chain direction. Side chains are depicted as large light purple spheres.]
Due to the tight geometry of the turn, some residues are found more commonly in turns than others.

Turns and loops
Hairpin loops

Conformational Preferences
[Figure: conformational preferences of the amino acids for α, β and RT]
Biochimica et Biophysica Acta 916: 200-204 (1987).

Protein Structure / Fold Databases
- PDB: http://www.pdb.org
- EBI-MSD: http://www.ebi.ac.uk/msd

PDB Holdings
[Figure: PDB holdings]

PDB Growth
[Figure: PDB growth]

Pairwise protein structure comparison

Pairwise protein structure comparison
Root mean square deviation
Comparing two structures A and B:
- $R_{i,A}$ = position of atom $i$ in structure A
- $n$ = number of equivalent atoms
\[ \mathrm{r.m.s.d.} = \sqrt{\frac{\sum_{i=1}^{n} (R_{i,A} - R_{i,B})^2}{n}} \]

RMSD
Comparing two structures:
\[ \mathrm{Min:} \quad \mathrm{r.m.s.d.} = \sqrt{\frac{\sum_{i=1}^{n} (R_{i,A} - R_{i,B})^2}{n}} \]
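
The r.m.s.d. formula above translates directly into code. The following is a minimal sketch using only the standard library; the function names and toy coordinates are assumptions made for illustration. Note that the minimization indicated on the slide requires an optimal superposition of the two structures. This sketch removes only the translational part, by moving both centroids to the origin; finding the optimal rotation as well (for example with the Kabsch algorithm) is deliberately left out.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of equivalent (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def centered(coords):
    """Translate coordinates so that their centroid lies at the origin."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd_after_centering(coords_a, coords_b):
    """RMSD after removing the translation between the structures (no rotation fit)."""
    return rmsd(centered(coords_a), centered(coords_b))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    b = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in a]  # same shape, shifted
    print(f"raw RMSD:      {rmsd(a, b):.3f}")
    print(f"centered RMSD: {rmsd_after_centering(a, b):.3f}")
```

The example shows why superposition matters: two identical structures that are merely shifted relative to each other have a large raw r.m.s.d., which vanishes once the shift is removed.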

RMSD
Comparing two structures

References
- I. Branden, J. Tooze. Introduction to Protein Structure, Garland Publishing.
- P.E. Bourne, H. Weissig. Structural Bioinformatics, Wiley-Liss and Sons.
- G.A. Petsko, D. Ringe. Protein Structure and Function, New Science Press.