Protein Structure Analysis and Verification
Course S-114.2500 Basics for Biosystems of the Cell, exercise work
Maija Nevala, BIO, 67485U
16.1.2008

1. Preface

When faced with an unknown protein, scientists have a multitude of methods for determining its structure and function. Coarsely classified, there are three types of methods. First, there is experimental determination, in which a protein is purified from cells and then subjected to one or more experiments, ranging from rather inefficient in vitro studies of enzyme kinetics to exact spectroscopic techniques. In addition, several computational approaches assist in the work. Comparing the three-dimensional structures, amino acid sequences and DNA sequences of different proteins gives clues to protein function and structure. Ab initio computation, on the other hand, tries to determine the final shape of the protein from its amino acid sequence or the coding DNA alone.

This work briefly introduces methods used in protein structure analysis, ranging from spectrometry to protein structure comparison and computation.

2. Experimental determination of protein structure

2.1 Gel electrophoresis

Gel electrophoresis is a method for separating the proteins in a mixture. Proteins can be separated by both their intrinsic charges and their molecular masses. When separating proteins by molecular mass, the proteins are dissolved in a solution of a negatively charged detergent (SDS) and placed on a polyacrylamide gel, through which they can move only slowly. The detergent unfolds the proteins and releases them from their associations with other proteins and lipids, and a reducing agent is often added to break the disulfide bonds that covalently link polypeptide chains to each other. When a homogeneous electric field is applied across the gel, the proteins start moving slowly toward the positive electrode. The heavier the chain, the slower it moves, and thus proteins of different masses form distinct bands on the gel. Illustration 1 shows DNA fragments separated by gel electrophoresis.

Illustration 1: DNA fragments separated by gel electrophoresis.

Molecules can also be separated by their charges.
Then a different solvent is used and a pH gradient is applied across the polyacrylamide gel. In the presence of an electric field, the proteins will move. The pH affects the charge of each protein, and at some point along the gradient the protein reaches its isoelectric point, where its net charge is zero, and it stops. Since different proteins have different isoelectric points, the pH gradient separates them. The two methods described can also be combined to give two-dimensional protein maps. [2]

2.2 Mass spectrometry

Mass spectrometry is used to determine the exact molecular mass of proteins or peptide fragments. The unknown protein is often cleaved with a protease, and the masses of the fragments can be used to search a database. This method, known as peptide mass fingerprinting or PMF, is valuable because all proteins of the same kind cleave into the same set of peptides (provided that the same protease is used), whereas two different proteins are very unlikely to share the same peptide fingerprint.

In mass spectrometry, the sample is ionized, and often fragmented, because the particles must carry a non-zero charge for the method to work. The ions, now in the gas phase, are accelerated in an electric field and then separated by an analyzer according to their mass-to-charge ratios. Often the separation is achieved by applying a magnetic field that bends their paths. Ions of different masses then hit different parts of a detector, and the relative masses of the particles are calculated from where they hit.

The advantage of mass spectrometry is that it requires only a very small sample. However, it reveals only the molecular mass of the sample. From the mass it is possible to calculate how many atoms of different elements a molecule of the sample contains, but the method gives no clue as to how they are spatially arranged. Still, mass spectrometry is a widely used and helpful tool. [2] [3]

2.3 X-ray crystallography

X-ray crystallography is a very effective way of determining the exact three-dimensional structure of a protein. The wavelength of X-rays is close to the distance between atoms in a molecule. Thus, from the way X-rays scatter when passing through a crystal, it is possible, though rather difficult, to calculate the three-dimensional shape of the molecules forming the crystal. The drawback of the method is that a substantial amount of the protein is needed. In addition, crystallizing the protein is often tremendously difficult.
Still, a majority of known protein structures have been determined using X-ray crystallography. [1]

2.4 Nuclear magnetic resonance

NMR (short for nuclear magnetic resonance) techniques are based on the magnetic properties of certain atomic nuclei, in organic chemistry and biochemistry mostly the isotopes 1H and 13C. When a strong magnetic field is applied, the nuclei align themselves either with or against the field. A pulse of radio-frequency radiation lifts some of the nuclei into the higher-energy alignment, and when they fall back, the stored energy is released as radiation. The frequency of the radiation depends on the environment the atom is in. Thus, the measurements can be used to determine, for example, the functional groups of the molecule and the number and position of the hydrogen atoms in the molecule. [4]

Illustration 2 shows the 1H NMR spectrum of a simple organic compound. The peaks correspond to different hydrogen atoms in the molecule, and the relative areas of the peaks tell how many hydrogen atoms of the same kind there are in the molecule.

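The relation between peak size and hydrogen count described above can be sketched in a few lines of Python. This is only a minimal illustration: the size of each peak is taken to be its integrated area, the function name hydrogen_ratios and the peak labels are hypothetical, the numbers are invented for the example rather than measured, and the smallest peak is assumed to correspond to a single hydrogen atom.

```python
def hydrogen_ratios(peak_areas):
    """Turn integrated peak areas into approximate whole-number hydrogen ratios.

    Assumption: the smallest peak corresponds to exactly one hydrogen atom.
    """
    smallest = min(peak_areas.values())
    # Divide every area by the smallest one and round to the nearest integer.
    return {label: round(area / smallest) for label, area in peak_areas.items()}


# Hypothetical integrals in arbitrary units (invented, not measured data).
example_areas = {
    "peak A": 10.2,
    "peak B": 30.1,
    "peak C": 9.9,
    "peak D": 29.8,
}

ratios = hydrogen_ratios(example_areas)
print(ratios)  # {'peak A': 1, 'peak B': 3, 'peak C': 1, 'peak D': 3}
```

The result gives only ratios between the groups of equivalent hydrogen atoms. If the smallest peak actually stems from two or more hydrogen atoms, all counts must be scaled accordingly.
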
Illustration 2: The proton NMR spectrum of vanillin.

3. Computational methods in protein structure analysis

3.1 Structure similarity

Analyzing the function of a protein using biochemical methods alone can be rather time-consuming. Instead, once the structure of a protein has been determined, its three-dimensional structure can be compared with the structures of known proteins. Similarity in protein structure may imply similarity in function or evolutionary origin.

There are different methods and algorithms for assessing similarity. In sequence alignment, the amino acid sequences of proteins are compared using different algorithms. It is also possible to compare the coordinates of the amino acid residues in three-dimensional space. For example, the root-mean-square distance between corresponding residues of two proteins can be used as a measure of similarity. [1]

3.1.1 Sequence alignment

In sequence alignment, either the amino acid sequences or the coding DNA sequences of two or more proteins are compared to find possible similarities in shape and thus in function, origin or both. The algorithms used in sequence alignment range from formal optimization techniques to often more efficient probabilistic methods.

The comparison of two proteins can be performed either on their amino acid sequences or on the DNA
coding the protein. Amino acid sequence comparison is usually more informative, because many changes in DNA leave the encoded amino acids unchanged and, in addition, different amino acids can perform similar duties. However, DNA sequence analysis is often faster and easier, because it does not require the often laborious isolation and purification needed to determine the exact amino acid sequence of a protein.

3.2 Structure prediction ab initio

Often there are no known homologs of a protein. In such a case, the amino acid sequence of the protein can be used to compute an estimate of its three-dimensional structure. Current methods most often have a very low resolution and do not give generally reliable results. They also require a large amount of computer time.

A general approach to modeling the structure of a protein is based on minimizing the global energy of the protein molecule. The conformation with the lowest energy is most often the natural shape the molecule adopts. The problem with the approach is that the number of possible conformations, and thus of energy values to evaluate, is often too large to go through, even for very small proteins. The field of computational structural biology evolves fast, though, and ab initio prediction of protein structure promises to grow more important and more precise as methods and computers evolve.

Today, while complete protein structures are beyond the capacity of computation, several structural features of proteins can be modeled quite accurately. For example, secondary structures can very often be inferred from the amino acid sequence. Another example of an important predictable feature is the membrane-spanning domains of proteins, which cross the phospholipid bilayer. [1]

4. References

[1] Pevsner, J. 2003. Bioinformatics and Functional Genomics. John Wiley & Sons Inc.
[2] Alberts et al. 2002. Molecular Biology of the Cell. Garland Science.
[3] Mansfield, M., O'Sullivan, C. 1999. Understanding physics. John Wiley & Sons, Praxis Publishing.
[4] Clayden et al. 2001. Organic chemistry. Oxford University Press.