Today:
1. Projects
   a. Requirements:
      i. Critical review of one paper
      ii. At least one computational result
   b. Written report and oral presentation are due at noon, Dec. 3rd; submit via email to bphys101@fas.harvard.edu
      i. Account for the role each group member played
      ii. Late policy: 5% lost per day late
   c. Grading rubric: see attached
2. Protein structure
   a. Basics
   b. Experimental determination
      i. X-ray crystallography
      ii. NMR
   c. Classification
   d. Prediction: 2-D, 3-D

Basics of protein structure
1. Primary structure is the sequence of amino acids that compose the protein.
2. Different regions of the sequence form local secondary structures, such as alpha helices and beta strands.
3. Tertiary structure is formed by packing secondary structural elements into one or several compact globular units called domains.
4. The final protein may contain several polypeptide chains arranged in a quaternary structure.

Secondary structure is formed through hydrogen bonds between amino acids in the sequence.
Further interactions result in the formation of the tertiary structure (with help from chaperones, membrane proteins).
Some motifs form distinct, highly predictable secondary structure.
The core of the protein is more tightly packed and highly specialized than the exterior.
Protein Structure Elucidation

X-ray crystallography
The 3-D structure of a protein is determined by directing a beam of x-rays onto a regular, repeating array of many identical protein molecules (a crystal), which diffracts the x-rays. The resulting diffraction pattern can be used to determine the structure of the protein of interest. Crystallization of the protein of interest is usually difficult to achieve.

The amplitudes and phases of the diffraction data from the protein crystals are used to calculate an electron-density map. The quality of the map depends on the resolution of the diffraction data, which in turn depends on how well-ordered the crystals are. The resolution is measured in Å (angstrom) units; the smaller this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.
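The step from amplitudes and phases to an electron-density map is a Fourier synthesis. The sketch below illustrates it on a one-dimensional toy "crystal", which is a simplification assumed here for brevity (real maps are three-dimensional, and the constant scale factor is omitted). The names structure_factors, density, density_map, and TOY_ATOMS are hypothetical, and the atom positions and weights are invented for illustration.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Complex structure factors F(h) for a 1-D unit cell of length 1.

    atoms is a list of (fractional_position, scattering_weight) pairs.
    h_max caps the diffraction order; a larger h_max means higher resolution.
    """
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(factors, x):
    """Fourier synthesis: rebuild the density at x from amplitudes and phases."""
    total = 0.0
    for h, f in factors.items():
        amplitude, phase = abs(f), cmath.phase(f)
        total += amplitude * math.cos(2 * math.pi * h * x - phase)
    return total


def density_map(atoms, h_max, n_points=100):
    """Sample the reconstructed density on an evenly spaced grid over the cell."""
    factors = structure_factors(atoms, h_max)
    return [density(factors, i / n_points) for i in range(n_points)]


# Hypothetical toy crystal: a light atom at 0.25 and a heavier one at 0.70.
TOY_ATOMS = [(0.25, 1.0), (0.70, 2.0)]

low_resolution = density_map(TOY_ATOMS, h_max=2)
high_resolution = density_map(TOY_ATOMS, h_max=10)
```

Comparing low_resolution with high_resolution shows the point made above: including higher diffraction orders gives sharper peaks at the atom positions. In a real experiment only the amplitudes are measured directly; this sketch sidesteps that by computing both amplitudes and phases from known positions.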
NMR (Nuclear Magnetic Resonance)
This method uses the magnetic properties of atomic nuclei. It can be exploited to give information on the distances between atoms in a molecule, using atomic nuclei such as ¹H, ¹³C, ¹⁵N, and ³¹P that have a magnetic moment or spin.

X-ray Crystallography vs. NMR

X-ray Crystallography
1. Large structures possible (e.g., the ribosome)
2. Crystallization parameters difficult to define, largely empirical
3. Crystals hard to get; major bottleneck of the method
4. Proteins are packed in a crystal

NMR
1. Only small structures (<~300 aa)
2. Proteins are in solution

Protein Classification
Parameters: structural and sequence similarity
Might get the same 3-D structure from very different sequences and species, and might get very different structures from highly similar sequences.

Main structural classes:
Class α: Bundle of α helices connected by loops on the protein surface
Class β: Antiparallel β sheets
Class α/β: Mixed helices and sheets
Class α + β: Segregated helices and sheets
Multidomain: Domains that fall into several categories

3-D structures determined via X-ray crystallography and NMR are deposited in the Brookhaven Protein Data Bank as PDB entries: http://www.rcsb.org/pdb/
Protein Structure Prediction
Some proteins can be completely denatured and then renatured: all of the information needed for proper folding is in the amino acid sequence.

Forms of Secondary Structure
Alpha helices, beta sheets: highly constrained in space
Loops: connect regions of defined structure; less constrained, so more substitutions and deletions can occur
Coils: catch-all term for regions that don't fit the above categories

Prediction of Secondary Structure
Assumption: There is a correlation between amino acid sequence and secondary structure. A given short sequence is more likely to form one type of structure than another.

Chou-Fasman/GOR Method
Chou-Fasman: Calculate the frequencies of each amino acid in each type of secondary structure (helix, sheet, and loop) and use them as predictive probabilities for novel sequences. Combine these frequencies to calculate the probability that a window of amino acids forms each type of structure; if the probability is above a threshold, try to extend into larger regions of structure.
GOR: Similar approach, but uses 17-AA windows for probabilities and frequencies. Likely regions of structure are determined using information theory and conditional probabilities.
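The frequency-and-window idea behind Chou-Fasman can be sketched in a few lines. This is a simplified illustration, not the published procedure: the training sequences and labels in TRAINING are invented, the function names propensities and predict are hypothetical, the window size and threshold are arbitrary, and the step that extends a nucleated region is left out.

```python
from collections import Counter

# Hypothetical toy training set: (sequence, per-residue labels).
# H = helix, E = sheet, L = loop. Invented for illustration only.
TRAINING = [
    ("AELLKKAGSPVTVYV", "HHHHHHHLLLEEEEE"),
    ("GSPEALLKMVTVIYG", "LLLHHHHHHEEEEEL"),
]


def propensities(training):
    """Propensity of each residue for each state.

    Defined as (fraction of the residue's occurrences in the state)
    divided by (fraction of all residues in the state), so values
    above 1 mean the residue favours that state.
    """
    state_counts, residue_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, labels in training:
        for aa, state in zip(seq, labels):
            state_counts[state] += 1
            residue_counts[aa] += 1
            pair_counts[(aa, state)] += 1
    total = sum(state_counts.values())
    return {
        aa: {
            state: (pair_counts[(aa, state)] / residue_counts[aa])
            / (state_counts[state] / total)
            for state in state_counts
        }
        for aa in residue_counts
    }


def predict(seq, table, window=5, threshold=1.0):
    """Label each residue with the state of highest mean window propensity.

    Residues absent from the table get a neutral propensity of 1.0.
    If no state reaches the threshold, the residue is labelled 'L'.
    """
    half = window // 2
    states = sorted({s for row in table.values() for s in row})
    prediction = []
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]
        means = {
            s: sum(table.get(aa, {}).get(s, 1.0) for aa in win) / len(win)
            for s in states
        }
        best = max(means, key=means.get)
        prediction.append(best if means[best] >= threshold else "L")
    return "".join(prediction)


table = propensities(TRAINING)
predicted = predict("AELLKKAGSPVTVYV", table)
```

With a training set this small the propensities are not meaningful; the point is only the mechanics of turning observed frequencies into a per-window score. A GOR-style variant would widen the window to 17 residues and replace the averaged propensities with conditional-probability terms.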
Patterns of Hydrophobic Amino Acids
Helices on the protein surface have hydrophobic amino acids facing the core and hydrophilic amino acids facing the exterior, giving a periodic 2:1 ratio of hydrophilic to hydrophobic residues. Characteristic patterns are also observed in other well-known structures such as leucine zippers, supercoils, intermembrane proteins, etc.

Neural Network Models
Computer programs are trained to recognize amino acid patterns that are located in known secondary structures and to distinguish these patterns from other patterns not located in these structures. Weights of units at each layer are adjusted until the input yields optimal output given the training set data. This is the most sophisticated method currently used, and is theoretically able to extract the most information out of the data.

Nearest-Neighbor Methods
These methods also use machine learning: they predict the secondary structural conformation of an amino acid in the query sequence by identifying similar sequences of known structure. This is done using moving windows that slide along the query sequence; the sequence within each window is compared to that in windows of known structures.

Analysis of 3-D Structures
Compare sequences of known structures to identify other proteins that might have similar structural features.
Multiple sequence alignments to identify motifs.
Threading: The sequence of amino acids in a protein of unknown structure is tested for its ability to fit into a known 3-D structure. The size and chemistry of each amino acid's R group, and its proximity to other R groups, are used as parameters for goodness of fit.
Alignments of two sequences via regions of secondary structure:
   o Dynamic programming
   o Distance matrix
   o Fast alignment using similarities of α helices and β sheets, e.g., VAST, SARF
Significance of alignment: The significance is determined in a way analogous to BLAST's E-values.
The number of superimposed secondary structural elements found when comparing two structures is contrasted with the number found if comparing random structures of the same size.

Acknowledgement: This handout contained material written by Doug Selinger
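The comparison against random structures used to judge alignment significance can be sketched generically as follows. This is not the calculation used by VAST or SARF; it is a plain z-score and empirical p-value, assuming the counts for random structure pairs of the same size have already been obtained by some other means. The function name alignment_significance and the example counts are hypothetical.

```python
import statistics


def alignment_significance(observed, random_counts):
    """Compare an observed count of superimposed elements with random counts.

    Returns (z_score, empirical_p_value). The p-value is the fraction of
    random comparisons that match or exceed the observed count, with a
    +1 correction so that it is never exactly zero.
    """
    mean = statistics.mean(random_counts)
    spread = statistics.stdev(random_counts)
    if spread == 0:
        raise ValueError("random counts show no variation")
    z_score = (observed - mean) / spread
    at_least_as_good = sum(1 for count in random_counts if count >= observed)
    p_value = (at_least_as_good + 1) / (len(random_counts) + 1)
    return z_score, p_value


# Invented example: 9 superimposed elements observed, versus counts
# from ten comparisons of random structures of the same size.
z_score, p_value = alignment_significance(9, [2, 3, 3, 4, 2, 3, 4, 3, 2, 4])
```

A large z-score and a small p-value indicate that the observed superposition is unlikely to arise by chance, which is the same reasoning that underlies an E-value.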