Proteins: Of sequences and structures
First investigations into protein structure
1895: Wilhelm Conrad Röntgen discovers X-rays.
[Figures: the electromagnetic (photon) spectrum; an early X-ray picture (radiograph) taken at a public lecture by Wilhelm Röntgen (Jan 1896). Sources: Wikipedia]
First investigations into protein structure
1909-1915: William Henry Bragg (father) and William Lawrence Bragg (son) found X-ray crystallography (exploring submicroscopic structure through X-ray diffraction of crystals).
[Figure: diffraction pattern of APS kinase, UC Davis Structural Biology Lab. Source: Wikipedia]
First investigations into protein structure
1930s: William Astbury (former graduate student of W.H. Bragg) collects and classifies diffraction spectra for natural fibers (fiber diffraction): wool, feathers, hair, quills, tendons, even DNA preparations. He finds three types of spectra for proteinaceous material: an α-type (native wool), a β-type (denatured wool), and a γ-type (tendon).
[Figure: diffraction spectra of the α-type, β-type and γ-type. Astbury, 1938]
First investigations into protein structure
1948-51: race for the correct structural models for the three types between Linus Pauling and the Cavendish group (Bragg, Perutz, Kendrew, Crick).
Secondary structures
1948: stuck in bed during a trip to Oxford, Pauling uses paper strips with schematic drawings of the polypeptide backbone to explore structural parameters of the α-type. He settles on a model with a rise of 1.5 Å per residue and 3.6 residues per turn. The resulting helical pitch of 5.4 Å (3.6 × 1.5 Å) differs from the dominant meridional arc of the α-type at 5.1 Å, which prevents him from publicising the model.
1950: Bragg, Perutz and Kendrew publish a 35-page paper evaluating all models they deemed compatible with the α-type and settle on a nonhelical ribbon structure (Proc. Roy. Soc. A203:321-357).
- "It was one of those papers you publish mainly because you've done all that work." (Perutz)
- "I have always regarded this paper as the most ill-planned and abortive in which I have ever been involved" (Bragg)
The main problems with the paper:
(I) failure to recognize the planarity of the peptide bond
(II) premise that the structure had to have an integer number of residues per turn
(III) effort to accommodate the 5.1 Å meridional arc
Secondary structures
1950: Pauling and Corey publish a brief communication in JACS announcing models for two spiral configurations of the polypeptide chain. The one with 1.47 Å rise per residue and 3.7 residues per turn is the α-type (J. Am. Chem. Soc. 72:534). The other is approximately a π-helix and does not occur in nature. Pauling understood the planarity of the peptide bond and the importance of backbone hydrogen bonds. The confidence to ignore the 5.1 Å meridional arc came from the diffraction spectrum of a synthetic fiber made by DuPont, which had the arc at 5.4 Å.
1951: Pauling, Corey and Branson publish eight back-to-back papers in PNAS describing structural models for the α-, β-, and γ-type, as well as for other configurations of the polypeptide chain. Essentially, only the structural solutions for the α- and β-type are correct, but...
Secondary structures Pauling, Corey and Branson PNAS 1951
Secondary structure - the Ramachandran plot
Pauling's α-helix was left-handed. It remained for Ramachandran to show in 1963, on theoretical grounds, that the α-helix had to be right-handed because of steric clashes between the backbone and the Cβ atom of the sidechains.
Branden & Tooze, Introduction to Protein Structure
Secondary structure - hydrogen-bonded helices
3₁₀-helix: i+3 → i H-bonds; phi = -49, psi = -26; 3 residues/turn; typically 1-2 turns, longest known instance 11 res
α-helix: i+4 → i H-bonds; phi = -60, psi = -45; 3.6 residues/turn; can be arbitrarily long
π-helix: i+5 → i H-bonds; phi = -57, psi = -70; 4.1 residues/turn; typically 1 turn, longest known instance 10 res
Schulz and Schirmer, Principles of Protein Structure
Secondary structure - helices
π-helices are typically observed as kinks in α-helices, obtained by insertion of one residue.
Sources: Wikipedia; Schulz and Schirmer, Principles of Protein Structure
Beyond secondary structure: 5.4 vs 5.1 Å
Coiled coils (Crick) and compound helices (Pauling)
Pauling's secondary structures were an instant triumph (and a humiliation for the Cavendish lab; see the race for the DNA structure), but did not explain the dominant meridional arc of the α-form at 5.1 Å.
In 1952, Pauling and Crick both realized that the discrepancy could be explained if the helices in fibers of the α-form were distorted by packing with neighboring α-helices in the opposite sense of their own handedness. They submitted papers on this within two weeks of each other.
Crick derived an exact description of the model, based on parametric equations. The distortion resulted from the fact that, if one were to twist the α-helix, which has 3.6 residues per turn, into a spiral with 3.5 residues per turn (7 residues over 2 turns = the heptad repeat), then residues could lock systematically at the interface between helices ('knobs-into-holes').
[Figure: tropomyosin]
Beyond secondary structure: Crick's parametrization of the coiled coil
Transformation of an ideal α-helix extending along the z axis:
$$x_i = r_0\cos\left(\frac{2\pi z}{P}+\varphi_i\right) + x\cos\left(\frac{2\pi z}{P}+\varphi_i\right) - y\cos\alpha\,\sin\left(\frac{2\pi z}{P}+\varphi_i\right)$$
$$y_i = r_0\sin\left(\frac{2\pi z}{P}+\varphi_i\right) + x\sin\left(\frac{2\pi z}{P}+\varphi_i\right) + y\cos\alpha\,\cos\left(\frac{2\pi z}{P}+\varphi_i\right)$$
$$z_i = z - y\sin\alpha$$
where $x$, $y$, $z$ are the coordinates of the ideal α-helix, $i$ indexes the helices of the coiled coil, $\varphi_i$ describes the rotation of each helix relative to the ideal α-helix (0 for helix 1, $\pi/2$ for helix 2, $\pi$ for helix 3 and $3\pi/2$ for helix 4), $r_0$ is the supercoil radius, $P$ is the supercoil pitch, and $\alpha$ is the pitch angle given by:
$$\alpha = \arctan\left(\frac{2\pi r_0}{P}\right)$$
The pitch $P$ is calculated from the supercoil radius $r_0$, the axial rise per amino acid $h$, and the twist differential $\Delta t$:
$$P = \frac{2\pi}{\Delta t}\left[h^2 - (r_0\,\Delta t)^2\right]^{1/2}$$
where the twist differential $\Delta t$ is derived from the number of residues per turn of the ideal α-helix ($a$) and from the periodicity of hydrophobic residues of the sequence to be modeled ($p$):
$$\Delta t = 2\pi\left(\frac{1}{a} - \frac{1}{p}\right)$$
Input values: axial rise per residue $h$, number of residues per turn of the ideal α-helix $a$, supercoil radius $r_0$, number of helices.
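As a rough numerical sketch of this parametrization, the Python below computes the twist differential, the supercoil pitch and the pitch angle, and applies the transformation to one point of the ideal helix. The function names and the example values (h = 1.5 Å, a = 3.6, p = 3.5, r0 = 5.0 Å) are illustrative assumptions and do not come from the slides.

```python
import math

def crick_parameters(h=1.5, a=3.6, p=3.5, r0=5.0):
    """Return (twist differential in rad/residue, supercoil pitch P, pitch angle alpha).

    h  : axial rise per residue (Angstrom), assumed example value
    a  : residues per turn of the ideal alpha-helix
    p  : periodicity of hydrophobic residues in the sequence
    r0 : supercoil radius (Angstrom), assumed example value
    """
    dt = 2 * math.pi * (1 / a - 1 / p)
    pitch = (2 * math.pi / dt) * math.sqrt(h**2 - (r0 * dt) ** 2)
    alpha = math.atan(2 * math.pi * r0 / pitch)
    return dt, pitch, alpha

def transform(x, y, z, r0, pitch, alpha, phi):
    """Map a point (x, y, z) of the ideal helix onto the supercoiled helix with phase phi."""
    w = 2 * math.pi * z / pitch + phi
    xi = r0 * math.cos(w) + x * math.cos(w) - y * math.cos(alpha) * math.sin(w)
    yi = r0 * math.sin(w) + x * math.sin(w) + y * math.cos(alpha) * math.cos(w)
    zi = z - y * math.sin(alpha)
    return xi, yi, zi

# Example: parameters for a 7/2 periodicity, then one transformed point on helix 1 (phi = 0).
dt, pitch, alpha = crick_parameters()
point = transform(2.3, 0.0, 10.0, r0=5.0, pitch=pitch, alpha=alpha, phi=0.0)
```

With these example values, 2π/Δt is -126 residues per supercoil turn and P comes out at roughly -186 Å. In this sign convention a negative pitch means the phase angle decreases with increasing z, i.e. a left-handed supercoil.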
Beyond secondary structure: coiled coils
Periodicity of hydrophobic residues in coiled coils Crick predicted that the distortion (supercoil) of helices in coiled coils would be due to the regular meshing of hydrophobic sidechains in the core and be driven energetically by the exclusion of these sidechains from water (hydrophobic effect). The periodicity required for regular meshing is 7/2 = 3.5 residues per turn. Crick therefore predicted that the sequences of coiled coils would show a repeating pattern of seven residues a-g (the heptad repeat), with hydrophobic residues in positions a and d.
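The heptad prediction can be tried out on a sequence with a few lines of Python. This is a minimal sketch, not an established coiled-coil predictor: the hydrophobic residue set, the function names and the synthetic test sequence are all assumptions made for illustration. It tries the seven possible registers and reports the one in which positions a and d are most often hydrophobic.

```python
HYDROPHOBIC = set("AILMFVWY")  # assumed definition of "hydrophobic"
HEPTAD = "abcdefg"

def best_heptad_frame(seq):
    """Return (offset, fraction) for the register with the most hydrophobic a/d positions.

    offset is the heptad index (0 = a ... 6 = g) assigned to the first residue.
    """
    best = (0, -1.0)
    for offset in range(7):
        # residues falling on positions a (index 0) and d (index 3) in this register
        core = [res for i, res in enumerate(seq) if (i + offset) % 7 in (0, 3)]
        frac = sum(res in HYDROPHOBIC for res in core) / len(core) if core else 0.0
        if frac > best[1]:
            best = (offset, frac)
    return best

def register(seq, offset):
    """Heptad letter for every residue, given the register of the first residue."""
    return "".join(HEPTAD[(i + offset) % 7] for i in range(len(seq)))

# Synthetic example (not a natural sequence): leucines every 3 and 4 residues.
example = "KEE" + "LEKLKEE" * 3
offset, fraction = best_heptad_frame(example)
labels = register(example, offset)
```

For real sequences the a and d positions are rarely all hydrophobic, so the fraction is better read as a ranking of registers than as a yes/no answer.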
Beyond secondary structure: Pauling's periodicities
Pauling did not understand the role of sidechains in bringing about coiled-coil structure (compound helices in his nomenclature), so he proposed a number of additional periodicities, such as 11/3 and 15/4, which would have the opposite handedness from 7/2. The issue with these periodicities would be that sidechains do not mesh perfectly in the core, as they do for 7/2.
Beyond secondary structure: Pauling's periodicities
Today we know that these periodicities occur in natural coiled coils and employ an additional packing mode in which the sidechains do not lock, but face each other ('knobs-to-knobs'). In fact, most combinations of 3 and 4 residue elements starting with a hydrophobic residue can give coiled coils.
[Figure: number of residues per turn (3.0 to 4.0) plotted against sequence repeat length (7 to 27) for 1 to 7 helical turns per repeat; regions labeled right-handed, straight and left-handed; marked periodicities: canonical coiled coil (7/2), stutters, stammers, (11/3), (10/3), (15/4), (18/5), (17/5). Gruber and Lupas, TiBS 2003]
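The arithmetic behind these periodicities is simple enough to check directly. The sketch below assumes an undistorted α-helix with exactly 3.6 residues per turn (an idealization) and uses a hypothetical helper name. A repeat with fewer residues per turn than that is classed as a left-handed supercoil, one with more as right-handed, and one with exactly 3.6 as straight.

```python
from fractions import Fraction

ALPHA_RESIDUES_PER_TURN = Fraction(18, 5)  # 3.6, assumed ideal value

def supercoil_sense(residues, turns, a=ALPHA_RESIDUES_PER_TURN):
    """Classify a periodicity given as residues per repeat over helical turns per repeat."""
    per_turn = Fraction(residues, turns)
    if per_turn < a:
        return "left-handed"
    if per_turn > a:
        return "right-handed"
    return "straight"

# Periodicities named on the slides, as (residues, turns).
periodicities = [(7, 2), (11, 3), (10, 3), (15, 4), (18, 5), (17, 5)]
table = {f"{r}/{t}": (float(Fraction(r, t)), supercoil_sense(r, t)) for r, t in periodicities}
```

Exact fractions are used so that 18/5 compares equal to 3.6 without floating-point noise. The result reproduces the statement that 11/3 and 15/4 have the opposite handedness from 7/2.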
Beyond secondary structure: Pauling's periodicities Lupas and Gruber, Adv Prot Chem 2005
Secondary structure - helices without backbone hydrogen bonds
Polyproline I helix: right-handed, cis peptide bonds, no internal H-bonds, phi = -75, psi = 160, 3.3 residues/turn
Polyproline II helix (collagen): left-handed, trans peptide bonds, no internal H-bonds, phi = -75, psi = 150, 3 residues/turn
http://en.wikipedia.org/wiki/polyproline_helix
Beyond secondary structure: collagen - a triple helix mainly formed of Gly-Pro-hydroxyPro repeats
Both Pauling and Crick proposed incorrect models for collagen. The correct model was derived, after several increasingly accurate attempts starting in 1954, by Ramachandran (Ramachandran and Sasisekharan, Nature 1961).
http://what-when-how.com/protein-structure/secondary-structural-motifs-protein-structure/
Beyond secondary structure: the role of conserved hydrophobic residues
As Crick correctly posited in 1953 for the coiled coil, hydrophobic residues will be repelled by water and driven to associate with each other (hydrophobic collapse).
The conserved hydrophobic positions in the sequence of a protein are therefore powerful indicators of the structure(s) being formed and an important tool in bioinformatic analysis.
Hydrophobic residues (h) in the patterns h--h---h, h--hh--h and h--h--h indicate an amphipathic α-helix, in the pattern h-h-h-h an amphipathic β-strand, and in the pattern hhhh a buried β-strand.
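These pattern rules translate almost literally into code. The following Python is a minimal sketch under stated assumptions: the hydrophobic residue set and the function names are hypothetical choices, and the scan works on a single sequence rather than on conserved positions of an alignment, which is what the slides actually recommend.

```python
HYDROPHOBIC = set("AILMFVWYC")  # assumed definition of "hydrophobic"

PATTERNS = {
    "amphipathic alpha-helix": ["h--h---h", "h--hh--h", "h--h--h"],
    "amphipathic beta-strand": ["h-h-h-h"],
    "buried beta-strand": ["hhhh"],
}

def to_pattern(seq):
    """Reduce a sequence to 'h' (hydrophobic) and '-' (anything else)."""
    return "".join("h" if res in HYDROPHOBIC else "-" for res in seq.upper())

def find_motifs(seq):
    """Return sorted (start, motif, label) tuples for every match, overlaps included."""
    pattern = to_pattern(seq)
    hits = []
    for label, motifs in PATTERNS.items():
        for motif in motifs:
            for start in range(len(pattern) - len(motif) + 1):
                if pattern.startswith(motif, start):
                    hits.append((start, motif, label))
    return sorted(hits)

# Synthetic peptides (not natural sequences) that spell out two of the patterns.
helix_like = find_motifs("LKKLEEEL")
strand_like = find_motifs("VKVEVTV")
```

Exact string matching is deliberately strict; a practical analysis would tolerate an occasional mismatch and would weight positions by their conservation across homologs.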
Structure prediction from hydrophobic patterns
structure  HHHHHHHHHH EEE HHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEE HHHH EEEE HHHH
Quick2D    hhhhhhhhhh eee hhhhhhhh eeeee hhhhhhhhhhh eeee hhhhhhh eeeeeeee hhhhhh
T0077      QESINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDILTTLA..
GI132877   MVDFAFELRKAQDTGKIVMGARKSIQYAKMGGAKLIIVARNARPDIKEDIEYYARLSGIPVYEFEGTSVELGTLLGRPHTVSALAVVDPGESRILALGGKE
GI3257971  .MDLAFELKKALETGKVILGSNETIRLAKTGGAKLIIVARNAPKEIKDDIYYYAKLSDIPVYEFEGTSVELGTLLGKPFVVASLAIVDPGESRILALVKR.
GI132947   SMDINRAIRVAVDTGNVVLGTKQAIKNIKHGEGKLVIIAGNCAKDVKEDIFYYTKLSETPVYTHQVTSIELGAICGKPFPVSALLVLEPGNSAILNINNE.
GI1710554  NMDVNKAIRTAVDTGKVILGSKRTIKFVKHGEGKLVVLAGNIPKDLEEDVKYYAKLSNIPVYQHKITSLELGAVCGKPFPVAALLVLDEGLSNIMELVEKK
GI3122729  .MDIDRGIRVAVDTGNVILGSKRTIQSLKLGKGKLVVMASNIPEDLKEDIEYYAKLSEIPVYTHEGTSVELGSVCGKPFTVGALLIQDPGDSTILEMVG..
GI141382   SQSFEGELKTLLRSGKVILGTRKTLKLLKTGKVKGVVVSSTLRQDLKDDIMTFSKFSDIPIYLYKGSGYELGTLCGKPFMVSVIGIVDEGESKILEFIKEV
GI730563   QENINQKLALVVKSGKYSLGYKSTVKSLRQGKAKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVTILDAGDSDILTALA..
GI2507317  GDTINAKLALTMKSGKYVLGYKSTLKTLRSGKAKLILIAGNCPPLRKSELEYYAMLSKANVHHYAGTNIDLGTACGKLFRVGVLAITDAGDSDILDA...
GI132876   LESINSRLQLVMKSGKYVLGYKQTLKMIRQGKAKLVILANNCPALRKSEIEYYAMLAKTGVHHYSGNNIELGTACGKYYRVCTLAIIDPGDSDIIRSMPEQ
GI2668750  TDNINNKLQLVMKSGKYTLGYKTVLRTLRNSKSKLVIIANNCPPLRKSEIEYYAMLAKVTVHHFHGNNVDLGTACGKYFRVCCLSIIDPGDSDIIKTTPGE
GI2879811  HESINNRLALVMKSGKYTLGYKTVLKSLRSSKGKLIIIANNCPPLRKSEIEYYAMLAKVGVHHYNGNNVDLGTACGKYYRVCCLSIVDPGDSDIIKTLPGD
GI1350718  VDTINTKIQLVMKSGKYVLGTQQSLKTLRQGRSKLVVISANCPPIRKAEIEYYCTLSKTPIHHYSGNNLDLGTACGRHFRACVLSITDVGDSDITSA...
GI730555   VNTINAKLQLVMKSGKYVLGTQQALTTLRQGRSKLVVIANNCPPIRRAEVEYYCTLSKTPIHHYSGNNLDLGTACGKPFRTCVLSVTNVGDSDIAT...
GI2129247  QKELLDAVAKAQ...KIKKGANEVTKAVERGIAKLVIIAEDVPEEVVAHLPYL.CEEKGIPYAYVASKQDLGKAAGLEVAASSVAIINEGDAEELKVLIEK
GI2621305  ADKAAEALEIARETGKVSKGTNEVTKAVERGVAQLVLIAEDVPAEIVAHLPLL.AEEKEIPYIYIPTKDELGAAAGLNVGTASSAIVEAGDAEDLIKEIIE
GI3257921  AEKALQAVEIARDTGKIRKGTNETTKAVERGQAKLVIIAEDVPEEIVAHLPPL.CEEKEIPYIYVPSKKELGAAAGIEVAAASVAIIEPGKARDLVEEIAM
GI2500350  ADKVLEAVRKAKESGKIKKGTNETTKAVERGQAKLVIIAEDVPEEIVAHLPLL.CDEKKIPYVYVSSKKALGEACGLQVATASAAILEPGEAKDLVDEIIK
GI2649836  QNEALSLLEKVRESGKVKKGTNETTKAVERGLAKLVYIAEDVPPEIVAHLPLL.CEEKNVPYIYVKSKNDLGRAVGIEVPCASAAIINEGELRKELGSLVE
GI2648662  MTEVETIIKTVLKTGGYRLGSKSTLKSLRNGEAKAVIVASNCPEEVLEKIKSY...DVKILVYNGTNMELGALCGKPFSVAAMAITEEI...
CASP3 target T0077, ribosomal protein L30
- two inner, two outer β-strands
- doubly wound topology with three parallel and one anti-parallel strand
- possible strand order: 1243 or 1423
Structure prediction from hydrophobic patterns
[Figure panels: structure; model; core superposition (the N- and C-terminal helices are omitted in order to free the view onto the central sheet)]
Structure prediction from hydrophobic patterns
Integral membrane proteins do not fold by hydrophobic collapse and have their own specific patterns of hydrophobic residues:
- eukaryotic and archaeal membrane proteins and bacterial inner membrane proteins consist mainly of transmembrane α-helices
- bacterial outer membrane proteins are mainly formed of transmembrane β-strands
Structure prediction from hydrophobic patterns
Hydrophobic patterns of transmembrane sequences
green: inner membrane proteins are anchored at the membrane interface (primarily on the intracellular side) by positively charged residues
green: outer membrane proteins are anchored at the membrane interface (on both sides) by aromatic residues
Structure prediction from hydrophobic patterns Bioinformatic analysis of inner membrane proteins
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia
>gi|401465|sp|P31489|YADA_YEREN INVASIN PRECURSOR
MTKDFKISVSAALISALFSSPYAFADDYDGIPNLTAVQISPNADPALGLEYPVRPPVPGAGGLNASAKGI
HSIAIGATAEAAKGAAVAVGAGSIATGVNSVAIGPLSKALGDSAVTYGAASTAQKDGVAIGARASTSDTG
VAVGFNSKADAKNSVAIGHSSHVAANHGYSIAIGDRSKTDRENSVSIGHESLNRQLTHLAAGTKDTDAVN
VAQLKKEIEKTQENTNKRSAELLANANAYADNKSSSVLGIANNYTDSKSAETLENARKEAFAQSKDVLNM
AKAHSNSVARTTLETAEEHANSVARTTLETAEEHANKKSAEALASANVYADSKSSHTLKTANSYTDVTVS
NSTKKAIRESNQYTDHKFRQLDNRLDKLDTRVDKGLASSAALNSLFQPYGVGKVNFTAGVGGYRSSQALA
IGSGYRVNENVALKAGVAYAGSSDVMYNASFNIEW
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia
>gi|401465|sp|P31489|YADA_YEREN INVASIN PRECURSOR
MTKDFKISVSAALISALFSSPYAFA/
DDYDGIPNLTAVQISPNADPALGLEYPVRPPV
PGAGGLNASAKGIH
SIAIGATAEAAKGA
AVAVGAGSIATGVN
SVAIGPLSKALGDS
AVTYGAASTAQKD
GVAIGARASTSDT
GVAVGFNSKADAKN
SVAIGHSSHVAANHGY
SIAIGDRSKTDREN
SVSIGHESLNRQ
LTHLAAGTKDTDAVNVAQ
LKKEIEKTQENTNKR
SAELLANANAYADNK
SSSVLGIANNYTDSK
SAETLENARKEAFAQ
SKDVLNMAKAHSNSV
ARTTLETAEEHANSV
ARTTLETAEEHANKK
SAEALASANVYADSK
SSHTLKTANSYTDVTVSNS
TKKAIRESNQYTDHK
FRQLDNRLDKLDTRVDKGLASSAALNS
LFQPYGVGKVNFTAGVGGYRSSQALAIGSGYRVNENVALKAGVAYAGSSDVMYNASFNIEW
Hoiczyk E. et al., EMBO J. 2000
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia
Anchor structure
Annotated regions: coiled coil (core positions a d a d a d x a), β1, β2, β3, β4
YadA   FRQLDNRLDKLDTRVDKGLASSAALNSLFQPYGVGKVNFTAGVGGYRSSQALAIGSGYRVNENVALKAGVAYAGSSDVMYNASFNIEW
XadA   ---VNGQMRRQDRRISRQGAMGAAMLNMATSAAG QNRVGAGVGFQNGQAALSLGYQRAISDRSTVTIGGAFSS-SDSSVGIGAGFGW
NadA   IDSLDKNVANLRKETRQGLAEQAALSGLFQPYNVGRFNVTAAVGGYKSESAVAIGTGFRF ENFAAKAGVAVGTS SAAYHVGVNYEW
DsrA   MEQNTHNINKLSKELQTGLANQSALSMLVQPNGVGKTSVSAAVGGYRDKTALAIGVGSRI DRFTAKAGVAFNTY GMSYGASVGYEF
Hsf    INKLGDHINKVDKDLRAGIAGATAVAFLQRPNEAGKSIVSLGVGSYRSESAIAVGYARNS NKISIKLGGGMNSRGDVNFGGSIGYQW
EibA   LDSQQRQINENHKEMKRAAAQSAALTGLFQPYSVGKFNASAAVGGYSDEQALAVGVGYRFNEQTAAKAGVAFSD-GDASWNVGVNFEF
UspA2  VNAFDGRITALDSKVENGMAAQAALSGLFQPYSVGKFNATAALGGYGSKSAVAIGAGYRVNPNLAFKAGAAINTS KGSYNIGVNYEF
BadA   FEALSYAVEDVRKEARQAAAIGLAVSNLRYYDIPGSLSLSFGTGIWRSQSAFAVGAGYTS GNIRSNLSITNAG-GHWGVGAGITLRL
Koretke et al., J Struct Biol 2006
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia Head and neck structure Nummelin H. et al., EMBO J. 2004
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia
Stalk structure
[Figure: number of residues per turn (3.0 to 4.0) plotted against sequence repeat length (7 to 27) for 1 to 7 helical turns per repeat; regions labeled right-handed, straight and left-handed; marked periodicities: canonical coiled coil (7/2), stutters, stammers, (11/3), (10/3), (15/4), (18/5), (17/5)]
Combining simple rules for structure prediction: the YadA surface adhesin of Yersinia
[Figure panels: model (shear no. 18); structure (shear no. 12); pores with shear numbers of 14, 16 and 18; fiber model]
Superposition of the model (red) onto the structure shows the discrepancy in shear numbers.
Koretke K.K. et al., JMB 2006
Shahid S.A. et al., Nat Meth 2012