Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013

This presentation is based on the presentation by Professor Alexander Dikiy, which is given in the course compendium: Part 4.4, page 165

Outline
- Part 1: Protein structure fundamentals
- Part 2: Determining the protein structure

Part 1: Protein structure fundamentals

Polypeptides
- Biopolymer
- Monomers (building blocks): amino acids
- Monodisperse
- DNA → RNA → Protein
- Defined sequence of amino acids
- A protein: one or more polypeptide chains folded into a structure that has a biological function
- All proteins are polypeptides, but not all polypeptides are proteins

Amino acids: the building blocks

Amino acids: stereochemistry
In a Fischer projection:
- Vertical bonds point away from the viewer, behind the plane of the paper
- Horizontal bonds point toward the viewer, out of the plane of the paper
- L-configuration: functional group (-NH3+) to the left
- D-configuration: functional group (-NH3+) to the right
Proteins are built from L-amino acids

Amino acids: chemistry of the side group
- Nonpolar, aliphatic
- Polar, uncharged
- Aromatic
- Charged: positively or negatively

Chemistry of the side group: function
- 20 different amino acids give many different functional groups in one molecule
- Proteins are tailor-made for specific biological functions and reactions
- Proteins with the same biological function from very different organisms have almost identical or very similar primary structures (homology)
- Complex proteins: glycoproteins, lipoproteins, phosphoproteins
- Functions: catalysis, regulation, structure, movement, transport, signaling

The protein alphabet
The protein alphabet is written in either a one-letter or a three-letter code. Each amino acid has its own unique code.
Amino acid       Three-letter   One-letter
Alanine          Ala            A
Arginine         Arg            R
Asparagine       Asn            N
Aspartic acid    Asp            D
Cysteine         Cys            C
Glutamine        Gln            Q
Glutamic acid    Glu            E
Glycine          Gly            G
Histidine        His            H
Isoleucine       Ile            I
Leucine          Leu            L
Lysine           Lys            K
Methionine       Met            M
Phenylalanine    Phe            F
Proline          Pro            P
Serine           Ser            S
Threonine        Thr            T
Tryptophan       Trp            W
Tyrosine         Tyr            Y
Valine           Val            V
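As a small illustration of the two codes, the table can be written as a lookup that converts a three-letter sequence into its one-letter form. This is a minimal Python sketch; the names `THREE_TO_ONE` and `to_one_letter` are illustrative and not part of the course material.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter code."""
    residues = [r for r in sequence.strip("-").split("-") if r]
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


# First eight residues of the hemoglobin A beta chain, as written in the slides.
print(to_one_letter("Val-His-Leu-Thr-Pro-Glu-Glu-Lys"))
```

An unknown residue name raises a `KeyError`, which is usually preferable to silently skipping it.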

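Acid-base properties of amino acids
Monovalent acid: $\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}$, with $K_a = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}$
Henderson-Hasselbalch equation:
$\mathrm{pH} = \mathrm{p}K_a - \log\frac{[\text{acid}]}{[\text{base}]}$
$\mathrm{pH} = \mathrm{p}K_a + \log\frac{\alpha}{1-\alpha}$, where $\alpha$ is the fraction of the acid that is dissociated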
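The Henderson-Hasselbalch relation can be tried out numerically. The sketch below is only an illustration: the function names are made up for this example, and the pKa value is an arbitrary illustrative number rather than one taken from the slides. It computes the pH from the base/acid ratio and, in the other direction, the dissociated fraction of the acid at a given pH.

```python
import math


def ph_from_ratio(pka: float, base: float, acid: float) -> float:
    """Henderson-Hasselbalch: pH = pKa + log10([base]/[acid])."""
    return pka + math.log10(base / acid)


def dissociated_fraction(ph: float, pka: float) -> float:
    """Fraction of the acid that is present as A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


pka = 4.76  # illustrative value, not taken from the slides
print(ph_from_ratio(pka, base=1.0, acid=1.0))  # equal amounts: pH equals pKa
print(dissociated_fraction(pka + 1.0, pka))    # one pH unit above the pKa
```

At pH = pKa exactly half of the acid is dissociated, which is why the pKa marks the midpoint of a titration curve.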

Ionization of Gly and His

The isoelectric point (pI) of amino acids
Definition: pI is the pH at which the net charge is zero, i.e. the positive charges (+) equal the negative charges (-)
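The definition can be turned into a small calculation: sum the average charge of every ionizable group and search for the pH where the sum is zero. The sketch below does this by bisection. The pKa values for glycine and histidine are assumed textbook values and do not come from the slides; the function names are illustrative.

```python
def net_charge(ph, acidic_pkas, basic_pkas):
    """Average net charge, using Henderson-Hasselbalch for each ionizable group."""
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic_pkas)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas, basic_pkas, tol=1e-6):
    """Find the pH where the net charge is zero by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(mid, acidic_pkas, basic_pkas) > 0:
            low = mid   # still net positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Assumed textbook pKa values (not from the slides).
glycine = isoelectric_point(acidic_pkas=[2.34], basic_pkas=[9.60])
histidine = isoelectric_point(acidic_pkas=[1.82], basic_pkas=[9.17, 6.00])
print(round(glycine, 2), round(histidine, 2))
```

For an amino acid without an ionizable side chain, such as glycine, the result coincides with the average of the two pKa values.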

The properties of the individual amino acids are reflected in the structure and the functional characteristics of the protein

Importance of the nature of the amino acids for protein structure: hemoglobin
Hemoglobin A: -Val-His-Leu-Thr-Pro-Glu-Glu-Lys-
Hemoglobin S: -Val-His-Leu-Thr-Pro-Val-Glu-Lys-
Replacing Glu (hydrophilic) with Val (hydrophobic) at position 6 of the β chain alters the protein structure and causes the disease sickle cell anemia.

The peptide bond
Formed by a condensation reaction: carboxyl + amine → amide + H2O

Rotation flexibility of AA

cis- and trans- AA

Backbone dihedral (torsion) angles
Dihedral angle:
- Angle between two planes
- Determined from 4 atoms
Phi angle (φ): the dihedral angle defined by the four atoms C(i-1)-N(i)-Cα(i)-C(i)
- Free rotation around the N-Cα bond
Psi angle (ψ): the dihedral angle defined by the four atoms N(i)-Cα(i)-C(i)-N(i+1)
- Free rotation around the Cα-C(O) bond
Omega angle (ω): the dihedral angle defined by the four atoms Cα(i)-C(i)-N(i+1)-Cα(i+1)
- Rotation around the C(O)-N bond (the peptide bond) is restricted: ω is 0° (cis) or 180° (trans)
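Since a dihedral angle is determined from four atoms, it can be computed directly from their coordinates. The following is a minimal sketch in plain Python, assuming the common sign convention in which a cis arrangement gives 0° and a trans arrangement gives 180°. The helper names and the test points are illustrative, not real backbone atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: the central bond runs along the z axis.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, abs(trans))
```

For φ of residue i the four points would be C(i-1), N(i), Cα(i) and C(i); for ψ they would be N(i), Cα(i), C(i) and N(i+1).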

The phi and psi dihedral angles cannot take arbitrary combinations of values because of steric hindrance

Main area 1 (α helix):
- φ: -60° to -180°
- ψ: -75° to -15°
Main area 2 (β sheet):
- φ: -60° to -180°
- ψ: 10° to 180°

Polypeptide chain

Protein structure

Primary structure
The amino acid sequence.
The nascent polypeptide chain must, in most cases, adopt the protein fold. Consider a protein with 100 amino acids. If each amino acid could assume only 3 different conformations (in practice there are many more), the protein would have $3^{100} \approx 5 \times 10^{47}$ possible conformations. Nevertheless, a protein finds its unique fold quickly, typically within microseconds to seconds.
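The size of this conformational space is easy to check, together with the time a purely random search would need. The sketch below assumes, purely for illustration, that one conformation is sampled every picosecond. That sampling rate is an assumption and not a measured value.

```python
conformations = 3 ** 100             # 3 conformations for each of 100 residues
seconds = conformations * 1e-12      # assumed: one picosecond per conformation
years = seconds / (3600 * 24 * 365)

print(f"{conformations:.2e} conformations")
print(f"{years:.1e} years to sample them all one by one")
```

Even at this rate, an exhaustive search would take many orders of magnitude longer than any realistic folding time, so folding cannot proceed by trying every conformation.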

Anfinsen's experiment
Proteins adopt their native structure spontaneously: the information needed for folding is contained in the amino acid sequence

Protein folding
Proteins fold through interactions between amino acids.
- Weak interactions: electrostatic, hydrophobic, hydrogen bonding, metal-amino acid coordination bonds
- Covalent bonds in a protein exist only within amino acids, in peptide bonds and in disulfide bridges (S-S)

Secondary structure
Interactions between amino acids lead to different types of secondary structure.
Local folding: α-helix, β-sheet and loops

Different types of helices: 3.6 - 3 - 5 residues per turn

α-helix
Hydrogen bonding network: residue i to i+3 (3_10 helix), residue i to i+4 (normal α-helix), residue i to i+5 (π helix)

β-sheets: antiparallel and parallel

Tertiary structure
The tertiary structure represents the protein fold: the spatial arrangement of the elements of secondary structure (α-helices, β-sheets) together with connecting loops, turns and unfolded (unstructured) regions. The total number of different folds is estimated at approximately 2000.

Protein domains
A protein domain is a part of the protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and can often be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of different proteins. (Source: Wikipedia)
Figure: pyruvate kinase (PDB entry 1pkn)

Quaternary structure
Only multi-chain proteins have quaternary structure. The inter-chain interactions are based on weak interactions and disulfide (S-S) bridges

Some structural examples
1) Membrane protein: Rhodopsin
2) Globular protein: SelW
3) Fibrous protein: Collagen

Membrane protein: Rhodopsin Rhodopsin is the protein component of the light receptor in the retinal rods of the vertebrate eye. Similar molecules are found in the light-sensing structures of all animals

Globular protein: the mammalian SelW protein
- SelW is a selenoprotein involved in cellular redox reactions
- Small: 89 amino acids
- Motif: Cys-X-X-U, where U is selenocysteine
- Its structure reveals a β-α-β-β-β-α fold
- Globular

Fibrous protein: Collagen Collagen is the most abundant protein in mammals. About one quarter of all of the protein in your body is collagen. Collagen is the main protein of connective tissue. It has great tensile strength.

Fibrous protein: Collagen
- Three polypeptide chains with the repeat sequence Gly-X-Y
- X is often proline
- Y is often hydroxyproline (a posttranslational modification)
- Each chain is about 1000 amino acid residues long
- Synthesized as procollagens, which carry globular propeptides that are cleaved off by extracellular enzymes
- Removal of the propeptides allows the triple-chain molecule to polymerize into fibrils
Branden, C., Tooze, J. (1999) Introduction to protein structure, 2nd ed., Garland Publishing, New York, p 284

Fibrous protein: Collagen
- Each of the three polypeptide chains is folded into an extended left-handed helix
- 3.3 residues per turn (α-helix: 3.6)
- Rise per residue: 2.9 Å (α-helix: 1.5 Å)
- Rise per turn: 9.6 Å (α-helix: 5.4 Å)
- More extended conformation than the α-helix
- The three helices in collagen form a trimeric molecule by coiling about a central axis to form a right-handed superhelix
- The side chain of every third residue is close to the central axis, where there is no room for a side chain; consequently, every third residue must be glycine
Branden, C., Tooze, J. (1999) Introduction to protein structure, 2nd ed., Garland Publishing, New York, p 284
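The helix parameters quoted for collagen and the α-helix are linked by simple arithmetic: the rise per turn is the number of residues per turn multiplied by the rise per residue. The short sketch below checks this and, as an additional illustration, estimates how long a 1000-residue chain would be in each conformation. The function name is illustrative.

```python
def rise_per_turn(residues_per_turn: float, rise_per_residue: float) -> float:
    """Rise per turn in angstrom = residues per turn x rise per residue."""
    return residues_per_turn * rise_per_residue


collagen = rise_per_turn(3.3, 2.9)  # about 9.6 angstrom
alpha = rise_per_turn(3.6, 1.5)     # about 5.4 angstrom

# Length of a 1000-residue chain in each conformation, in nanometres
# (10 angstrom = 1 nm).
collagen_length_nm = 1000 * 2.9 / 10
alpha_length_nm = 1000 * 1.5 / 10
print(collagen, alpha, collagen_length_nm, alpha_length_nm)
```

The same number of residues spans almost twice the distance in the collagen helix, which is what "more extended" means in practice.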

Fibrous protein: Collagen

Part 2: Determining the protein structure

What can we learn from analyzing the protein structure?
- Protein function
- Protein mechanism
- Protein evolution
- Protein systems biology
- Structure-based drug design

What does it mean to determine the 3D structure of a protein?
Determine either ALL the distances between each atom and the remaining protein atoms, or ALL of the protein's dihedral angles

Experimental techniques for determining macromolecular structures
Low-resolution techniques (rough structure, topology, quaternary structure of large proteins; not the position of each atom):
1. Electron microscopy
2. SAXS (small-angle X-ray scattering)
High-resolution techniques (position of each atom):
1. X-ray crystallography: first applied in 1961 (Kendrew and Perutz, Nobel Prize winners)
2. NMR spectroscopy: first applied in 1983 (Ernst and Wüthrich, Nobel Prize winners)

X-ray Crystallography
- The most widespread technique for determining high-resolution structures of molecules in the solid state
- The method depends on directing a beam of X-rays onto a regular, repeating array of many identical molecules: a crystal
- The X-rays are diffracted by the crystal, giving a diffraction pattern
- The diffraction data are used to calculate an electron density map
- The map is interpreted as a polypeptide chain with a particular amino acid sequence
Branden, C., Tooze, J. (1999) Introduction to protein structure, 2nd ed., Garland Publishing, New York, p 377

X-ray Crystallography

X-ray Crystallography: prerequisite
- Well-ordered crystals that diffract X-rays must be obtained
- Proteins can be difficult to crystallize: they are large, roughly spherical molecules with irregular surfaces that cannot be packed into a crystal without leaving large channels between the individual molecules, filled with disordered solvent molecules
- There are only a few contact points between the protein molecules. This is also the reason why the structures determined by X-ray crystallography are generally the same as those of the proteins in solution

NMR spectroscopy
- A technique that relies on observing the absorption of energy by nuclei in an external magnetic field under radio-frequency irradiation
- When the protein molecules are placed in a strong magnetic field, the spins of their nuclei align along the field. This alignment is an equilibrium state
- Radio-frequency pulses change the equilibrium alignment to an excited state
- When the nuclei return to the equilibrium state, they emit radio-frequency radiation that can be measured
- The frequency of the emitted radiation depends on the chemical environment of the nucleus and will therefore differ between atoms
- The frequencies are measured relative to a reference signal; the resulting value is called the chemical shift
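Chemical shifts are conventionally reported in parts per million (ppm) of the reference frequency, which makes them independent of the field strength of the instrument. The sketch below shows that conversion; the frequencies are hypothetical numbers chosen only to make the arithmetic easy to follow, and the function name is illustrative.

```python
def chemical_shift_ppm(signal_hz: float, reference_hz: float) -> float:
    """Chemical shift in ppm relative to a reference resonance frequency."""
    return (signal_hz - reference_hz) / reference_hz * 1e6


reference = 600.0e6          # hypothetical reference frequency, 600 MHz
signal = reference + 4800.0  # hypothetical nucleus resonating 4800 Hz higher
print(chemical_shift_ppm(signal, reference))
```

With these numbers the nucleus resonates 8 ppm away from the reference; on an instrument with a different field strength the offset in Hz would change, but the value in ppm would not.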

NMR spectroscopy
- Distinguishes nuclei on the basis of their magnetic properties, which are determined by their chemical environment
- The nuclei must have an intrinsic magnetic moment (non-zero spin): 1H, 13C, 15N
- The nature, duration, and combination of the applied RF pulses can be varied to probe different molecular properties of the sample
- Assign the spectrum of chemical shifts
- Measure distances and dihedral angles
- Solid-state NMR or solution NMR
Complementary techniques
- Crystallography: high resolution, fast technique, strong macromolecular complexes
- NMR: structure in solution, dynamics (folding), weak macromolecular complexes

How can I find out whether the structure I am interested in has already been determined?
All determined structures are deposited in the Protein Data Bank
Internet address: www.rcsb.org
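A deposited entry can also be retrieved programmatically from its four-character PDB ID. The sketch below assumes the download address pattern `https://files.rcsb.org/download/<ID>.pdb`. That pattern and the helper names `pdb_file_url` and `fetch_pdb` are assumptions of this example and are not taken from the slides. The entry 1pkn is the pyruvate kinase structure mentioned earlier in the presentation.

```python
import urllib.request


def pdb_file_url(pdb_id: str) -> str:
    """Build the assumed RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


def fetch_pdb(pdb_id: str) -> str:
    """Download a PDB entry as text (needs network access)."""
    with urllib.request.urlopen(pdb_file_url(pdb_id)) as response:
        return response.read().decode("utf-8")


print(pdb_file_url("1pkn"))
```

Calling `fetch_pdb("1pkn")` would return the coordinate file as a single string, provided the address pattern is valid and the machine is online.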

Statistics available at RCSB on October 9, 2012
85212 released atomic coordinate entries
Molecule type                      Entries
Proteins, peptides, and viruses    78911
Nucleic acids                      2432
Protein/nucleic acid complexes     3845
Other                              24
Experimental technique             Entries
X-ray                              78911
NMR                                9626
Electron microscopy                499
Other                              165

ATOM      1  N   CYS A   1     -23.284   7.726   4.920  1.00  5.78           N
ATOM      2  CA  CYS A   1     -23.838   6.461   5.494  1.00  4.91           C
ATOM      3  C   CYS A   1     -22.786   5.345   5.449  1.00  3.96           C
ATOM      4  O   CYS A   1     -21.826   5.419   4.700  1.00  3.93           O
ATOM      5  CB  CYS A   1     -25.060   6.097   4.640  1.00  5.02           C
ATOM      6  SG  CYS A   1     -26.538   6.897   5.318  1.00  5.60           S
ATOM      7  H2  CYS A   1     -24.029   8.449   4.870  1.00  6.28           H
ATOM      8  HA  CYS A   1     -24.152   6.627   6.514  1.00  5.16           H
ATOM      9  HB2 CYS A   1     -24.908   6.431   3.625  1.00  5.62           H
ATOM     10  HB3 CYS A   1     -25.201   5.025   4.645  1.00  4.44           H
ATOM     11  HG  CYS A   1     -26.624   7.759   4.904  1.00  5.70           H
ATOM     12  H1  CYS A   1     -22.908   7.542   3.966  1.00  5.73           H
ATOM     13  H3  CYS A   1     -22.516   8.073   5.530  1.00  6.16           H
ATOM     14  N   ALA A   2     -22.968   4.318   6.246  1.00  3.36           N
ATOM     15  CA  ALA A   2     -21.993   3.182   6.271  1.00  2.48           C
ATOM     16  C   ALA A   2     -22.085   2.364   4.975  1.00  1.96           C
ATOM     17  O   ALA A   2     -23.145   2.256   4.384  1.00  2.54           O
ATOM     18  CB  ALA A   2     -22.369   2.322   7.481  1.00  3.05           C
ATOM     19  H   ALA A   2     -23.753   4.294   6.832  1.00  3.63           H
ATOM     20  HA  ALA A   2     -20.991   3.564   6.403  1.00  2.30           H
ATOM     21  HB1 ALA A   2     -22.564   2.957   8.333  1.00  2.93           H
ATOM     22  HB2 ALA A   2     -23.252   1.744   7.252  1.00  3.30           H
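ATOM records in the PDB format are fixed-column text. This is why neighbouring fields, such as the residue number and a negative x coordinate, can run together when whitespace is lost. A parser should therefore slice by column position rather than split on spaces. The sketch below reads a few fields from two of the records shown here; the function name `parse_atom` and the choice of fields are illustrative.

```python
def parse_atom(line: str) -> dict:
    """Extract selected fields from a fixed-column PDB ATOM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


records = [
    "ATOM      1  N   CYS A   1     -23.284   7.726   4.920  1.00  5.78           N",
    "ATOM      2  CA  CYS A   1     -23.838   6.461   5.494  1.00  4.91           C",
]
for record in records:
    atom = parse_atom(record)
    print(atom["name"], atom["x"], atom["y"], atom["z"])
```

The coordinates are given in ångström along the x, y and z axes of the coordinate frame used in the file.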

From the coordinates we can easily calculate the distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$: $D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$
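As a worked example, the Euclidean distance between the backbone N and Cα atoms of Cys 1 can be computed from the coordinates in the listing above. This is a minimal sketch; the variable and function names are illustrative.

```python
import math

# Coordinates (in angstrom) of two atoms of Cys 1 from the listing above.
n_atom = (-23.284, 7.726, 4.920)
ca_atom = (-23.838, 6.461, 5.494)


def distance(a, b):
    """Euclidean distance between two points in 3D space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


print(round(distance(n_atom, ca_atom), 2))
```

The result is close to 1.5 Å, which is a reasonable length for a covalent N-Cα bond.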

What does it mean to determine the 3D structure of a protein?
Determine either ALL the distances between each atom and the remaining protein atoms, or ALL of the protein's dihedral angles.
Therefore, the coordinates of each atom allow us to determine ALL the distances within the protein, and thus to describe the structure of our protein

References
- Branden, C., Tooze, J. (1999) Introduction to protein structure, 2nd ed., Garland Publishing, New York
- Smidsrød, O., Moe, S.T. (2008) Biopolymer chemistry, Tapir Academic Press, Trondheim, chapters 3 and 8
- Christensen, B.E. (2013) Compendium TBT4135 Biopolymers