Protein Crystallography Part II Tim Grüne Dept. of Structural Chemistry Prof. G. Sheldrick University of Göttingen http://shelx.uni-ac.gwdg.de tg@shelx.uni-ac.gwdg.de
Overview The Reciprocal Lattice The Ewald Sphere Data Processing and Scaling The Phase Problem SAD, MAD, MIR, RIP, et al. Molecular Biology 1 Protein Crystallography II
Amplitudes and Phases

The electron density can be calculated from the structure factors via the Fourier transformation:

$$\rho(x, y, z) = \frac{1}{V_{unitcell}} \sum_{h,k,l} F(h, k, l)\, e^{-2\pi i(hx+ky+lz)} = \frac{1}{V_{unitcell}} \sum_{h,k,l} |F(h, k, l)|\, e^{i\varphi(h,k,l)}\, e^{-2\pi i(hx+ky+lz)}$$

This is easily done by a computer. The equation, however, contains two unknown quantities for every reflection: the amplitude |F(h, k, l)| and the phase φ(h, k, l). Both must be known before anything can be computed.

The first half of this talk deals with how to extract the first part, the amplitude, from diffraction experiments. The second half is concerned with how to retrieve the phases.
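The Fourier synthesis can be tried out on a one-dimensional toy cell. The following is only a minimal sketch: the atom positions, weights, grid size and the function names are hypothetical, and it assumes the common sign convention with exp(+2πi h x) in the structure factor and exp(-2πi h x) in the synthesis.

```python
import numpy as np

def structure_factors(x_frac, f, h):
    """F(h) = sum_j f_j * exp(+2 pi i h x_j) for point atoms in a 1D cell."""
    x_frac = np.asarray(x_frac, dtype=float)
    f = np.asarray(f, dtype=float)
    h = np.asarray(h)
    return (f[None, :] * np.exp(2j * np.pi * h[:, None] * x_frac[None, :])).sum(axis=1)

def electron_density(F, h, x, volume=1.0):
    """rho(x) = (1/V) * sum_h F(h) * exp(-2 pi i h x)."""
    F = np.asarray(F, dtype=complex)
    phase_factor = np.exp(-2j * np.pi * h[:, None] * x[None, :])
    return (F[:, None] * phase_factor).sum(axis=0) / volume

# Hypothetical toy cell: two point atoms at x = 0.2 and x = 0.7
h = np.arange(-20, 21)
F = structure_factors([0.2, 0.7], [6.0, 8.0], h)
x = np.linspace(0.0, 1.0, 200, endpoint=False)

# Synthesis with amplitudes and phases: peaks appear at the atom positions
rho = electron_density(F, h, x)

# Synthesis with amplitudes only (all phases set to zero): the atoms are lost
rho_no_phase = electron_density(np.abs(F), h, x)

print("highest peak with phases at x =", x[np.argmax(rho.real)])
print("highest peak without phases at x =", x[np.argmax(rho_no_phase.real)])
```

With the phases included, the highest peak sits on the heavier atom; with the amplitudes alone, the map peaks at the origin instead.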
The Reciprocal Lattice

The reciprocal lattice is an important concept in crystallography. It is created by three reciprocal lattice vectors, a*, b*, and c*, derived from the real space vectors a, b, and c.

[Figure: the real space vectors a and b span the area A; the reciprocal vector c* has the length |c*| = A/V_unitcell.]

For an orthorhombic space group (all angles 90°), the reciprocal vectors are parallel to the real space vectors, but have different lengths. For general space groups the reciprocal vectors are not parallel to those of the real space unit cell. In any case, the volume of the reciprocal cell is the inverse of the volume of the real space cell, V* = 1/V.

The point group (symmetry without translations) is the same for the real and the reciprocal lattice. Therefore many symmetry related questions for crystals also apply to their reciprocal lattice.
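As a minimal sketch, the reciprocal vectors can be computed from the real space vectors with cross products. The cell dimensions below are hypothetical, and the convention without a factor of 2π is assumed (so that the reciprocal cell volume is 1/V).

```python
import numpy as np

def reciprocal_cell(a, b, c):
    """Return a*, b*, c* and the real space cell volume V."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    volume = np.dot(a, np.cross(b, c))
    a_star = np.cross(b, c) / volume
    b_star = np.cross(c, a) / volume
    c_star = np.cross(a, b) / volume   # length = area(a, b) / V
    return a_star, b_star, c_star, volume

# Hypothetical monoclinic cell: a = 50, b = 60, c = 70 (Angstrom), beta = 110 degrees
beta = np.radians(110.0)
a = np.array([50.0, 0.0, 0.0])
b = np.array([0.0, 60.0, 0.0])
c = 70.0 * np.array([np.cos(beta), 0.0, np.sin(beta)])

a_star, b_star, c_star, V = reciprocal_cell(a, b, c)
V_star = np.dot(a_star, np.cross(b_star, c_star))

print("V * V_star =", V * V_star)
print("angle between c and c* (degrees):",
      np.degrees(np.arccos(np.dot(c, c_star) / (np.linalg.norm(c) * np.linalg.norm(c_star)))))
```

In this non-orthogonal cell c* is not parallel to c, whereas b* stays parallel to b because b is perpendicular to both a and c.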
The Ewald Sphere

It is important to collect data that are as complete as possible, i.e., to record nearly all reflections up to the resolution limit (which is often set by the crystal quality). In order to understand which reflections are collected, one can look at the Ewald sphere. It is constructed in reciprocal space.

[Figure: Ewald sphere with radius r = 1/λ, showing the incident beam, the origin of the reciprocal lattice, and the angle 2θ.]

When a lattice point crosses the Ewald sphere, a reflection occurs in the direction determined by the centre of the sphere and the point of intersection. The angle 2θ is the same as the angle at which the reflection is recorded on the detector, even though the Ewald sphere is constructed in reciprocal space.
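The geometry of the construction can be checked numerically. This is a sketch under stated assumptions: the beam is taken along +z, the origin of the reciprocal lattice sits where the direct beam leaves the sphere, and the function names and example values (d = 2 Å, λ = 1 Å) are hypothetical.

```python
import math

def two_theta_deg(d_spacing, wavelength):
    """Scattering angle for a lattice point at distance 1/d from the origin.

    The point lies on a sphere of radius 1/lambda through the origin when
    sin(theta) = (1/d) / (2/lambda) = lambda / (2 d).
    """
    s = wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("point lies outside the reach of the Ewald sphere")
    return 2.0 * math.degrees(math.asin(s))

def is_on_ewald_sphere(point, wavelength, tol=1e-9):
    """True if a reciprocal space point lies on the sphere (beam along +z)."""
    centre = (0.0, 0.0, -1.0 / wavelength)
    return abs(math.dist(point, centre) - 1.0 / wavelength) < tol

wavelength = 1.0   # Angstrom
d_spacing = 2.0    # Angstrom
angle = math.radians(two_theta_deg(d_spacing, wavelength))

# Lattice point reached by going from the sphere centre in the direction 2 theta
point = (math.sin(angle) / wavelength, 0.0, (math.cos(angle) - 1.0) / wavelength)

print("2 theta =", math.degrees(angle), "degrees")
print("on sphere:", is_on_ewald_sphere(point, wavelength))
print("distance from origin:", math.hypot(*point), "= 1/d =", 1.0 / d_spacing)
```

The same relation shows that no lattice point with d < λ/2 can ever touch the sphere, which is the resolution limit set by the radius.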
Limits of Data Collection

During data collection the crystal is rotated about an axis. The reciprocal lattice then rotates about the same axis. All lattice points that pass through the Ewald sphere during the rotation are collected. Apart from the resolution limit (radius of the sphere, but more likely the quality of the crystal), two parts of reciprocal space cannot be collected:

[Figure: Ewald sphere with radius r = 1/λ, showing the rotation axis, the camera limit, and a grey shaded zone.]

The grey shaded zone can be minimised by changing the direction of the rotation axis with respect to the incident beam direction.
Data Collection

[Figure: the crystal rotates in the X-ray beam; the frames cover 0–1°, 1–2°, ..., 179–180°.]

Typical frame widths range from 0.2° to 1°. For a 180° scan, this gives 180 to 900 images. This is typical for proteins that diffract to moderate resolution.

A more thorough data collection rotates the crystal about two axes. One easily ends up with a few thousand images.
Data Processing / Integration

Data collection results in a list of images, each representing a wedge of the rotation of the crystal in the beam. The images are distorted sections of reciprocal space. Data integration has to reconstitute the original, undistorted lattice in three dimensions. It provides a (long) list with one line per reflection (x, y, z are the detector coordinates):

   H    K    L    Intensity        error        x        y    z [°]
  -3    0   -3    4.162E+03    1.537E+02   1181.5   1235.6    107.4
  -3   -3    0    2.747E+03   -1.075E+02   1110.9   1205.1     76.0
  -3    0    3    3.946E+03    1.451E+02   1156.2   1233.4     18.3
   1    1   -4    5.933E+03   -2.139E+02   1215.0   1226.7    165.0
   4    1   -1    5.640E+03   -2.064E+02   1209.5   1074.0     57.3
Data Integration Flow Chart

[Figure: flow chart of data integration.]
Scaling I

Calculation of the electron density is based on an ideal crystal: infinitely large, with a perfect unit cell, but also on perfect data collection. This is quite far from reality.

- Different regions of the detector have different sensitivity.
- Beam instability: on some frames the total intensity can be higher than on others. This applies especially to synchrotrons.
- The crystal is not perfectly centred in the beam.
- Data may even be collected from several crystals.
Scaling II

The (experimental) differences in intensities necessitate the scaling of the data: all reflections must be put on a common scale.

To do so, one takes symmetry related reflections into account: reflections that are related by one of the symmetry operators of the crystal's space group must have equal intensities.

Even in the simplest space group (P1), which has no symmetry, scaling can be carried out because of Friedel's law: reflections with negated indices, i.e., (h, k, l) and (-h, -k, -l), have the same intensity. That is because they are reflected from the same set of planes, but on opposite sides.
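The idea can be illustrated with a single scale factor between two frames. This is a deliberately simplified sketch: the intensities are hypothetical, the least-squares formula k = Σ I_ref I_obs / Σ I_obs² is one simple choice, and real scaling programs fit many more parameters than one number per frame.

```python
import numpy as np

def scale_factor(i_ref, i_obs):
    """Least-squares k that minimises sum (i_ref - k * i_obs)^2."""
    i_ref = np.asarray(i_ref, dtype=float)
    i_obs = np.asarray(i_obs, dtype=float)
    return float(np.dot(i_ref, i_obs) / np.dot(i_obs, i_obs))

# Hypothetical intensities of five reflections (h, k, l) on frame 1 ...
frame1 = np.array([4162.0, 2747.0, 3946.0, 5933.0, 5640.0])
# ... and of their Friedel mates (-h, -k, -l) on frame 2, recorded with a weaker beam
frame2 = frame1 / 1.25

k = scale_factor(frame1, frame2)
frame2_scaled = k * frame2

# After scaling, equivalent reflections agree and can be merged (averaged)
merged = 0.5 * (frame1 + frame2_scaled)

print("scale factor for frame 2:", k)
print("merged intensities:", merged)
```

With noisy data the equivalent reflections would not agree exactly after scaling, and the remaining spread is what the merged error estimate has to reflect.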
The Phase Problem

$$\rho(x, y, z) = \frac{1}{V_{unitcell}} \sum_{h,k,l=-\infty}^{\infty} |F(h, k, l)|\, e^{i\varphi(h,k,l)}\, e^{-2\pi i(hx+ky+lz)}$$

Both |F(h, k, l)| and φ(h, k, l) are needed to calculate the electron density, but the experiment gives |F(h, k, l)|, not φ(h, k, l).

The structure factor, from which we could calculate the electron density distribution of the crystal, is a complex quantity. It has an amplitude and a phase. Only the amplitude, but not the phase, can be determined directly from a diffraction experiment.

This loss can be compared with a projection onto a plane wall: the eye may see a three dimensional object, but which face points forward?

This problem is known as the phase problem of crystallography.
The Importance of the Phase

Unfortunately, the phase of the structure factor contains the main information about the shape of the molecule.

[Figure: pictures are Fourier transformed (FT) into amplitudes |F(h, k, l)| and phases φ(h, k, l), and recombined amplitudes and phases are transformed back (inverse FT). The phase φ of the duck determines the picture.]

Pictures from http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html
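The same experiment can be repeated numerically by combining the amplitudes of one picture with the phases of another. The sketch below uses two random test arrays instead of real pictures; the array size, the seed and the function names are assumptions made for this illustration only.

```python
import numpy as np

def swap_phases(amplitude_source, phase_source):
    """Inverse FT of the amplitudes of one picture with the phases of another."""
    A = np.fft.fft2(amplitude_source)
    B = np.fft.fft2(phase_source)
    hybrid = np.abs(A) * np.exp(1j * np.angle(B))
    return np.fft.ifft2(hybrid).real

def correlation(u, v):
    """Pearson correlation coefficient between two pictures."""
    return float(np.corrcoef(u.ravel(), v.ravel())[0, 1])

rng = np.random.default_rng(0)
picture_a = rng.normal(size=(64, 64))   # supplies the amplitudes
picture_b = rng.normal(size=(64, 64))   # supplies the phases

hybrid = swap_phases(picture_a, picture_b)
cc_amplitudes = correlation(hybrid, picture_a)
cc_phases = correlation(hybrid, picture_b)

print("correlation with the amplitude donor:", cc_amplitudes)
print("correlation with the phase donor:   ", cc_phases)
```

The hybrid resembles the picture that donated the phases far more than the one that donated the amplitudes.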
Techniques for Retrieving the Phases: Overview

One of the major efforts of macromolecular crystallography lies in determining good phases. The following are the most frequently used techniques:

1. direct methods (small molecules and high resolution only)
2. molecular replacement
3. isomorphous replacement
4. anomalous dispersion
5. exploitation of radiation damage
Direct Methods

With small molecules (< 1000 unique atoms) and high resolution (better than 1.2 Å), one can manage to find the structure from random starting phases.

The starting phases are optimised using the assumption that the structure consists of resolved atoms. This assumption imposes statistical restraints on the phase probability distribution.

Very small structures can also be solved by interpreting the Patterson function. This is a Fourier transform based on intensities rather than structure factors, i.e., it can be calculated directly from experimental data. The Patterson function has the property that a vector from the origin to a peak is also a vector connecting two atoms in the structure.

With too many atoms, the peaks of the Patterson function come too close to each other to be interpreted.
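The property of the Patterson function can be verified on a one-dimensional toy structure. This is a minimal sketch with hypothetical point atoms on a grid of 32 points; it relies only on the intensities |F|², so the sign convention of the FFT does not matter here.

```python
import numpy as np

def patterson(intensities):
    """Patterson function: Fourier transform of the intensities |F|^2."""
    return np.fft.ifft(intensities).real

n_grid = 32
atom_positions = [3, 10, 25]          # hypothetical atoms on grid points
rho = np.zeros(n_grid)
rho[atom_positions] = 1.0

F = np.fft.fft(rho)
intensities = np.abs(F) ** 2          # all that the experiment provides
P = patterson(intensities)

peaks = sorted(int(u) for u in np.flatnonzero(P > 0.5))
interatomic = sorted({(xj - xi) % n_grid
                      for xi in atom_positions for xj in atom_positions})

print("Patterson peaks:     ", peaks)
print("interatomic vectors: ", interatomic)
```

Three atoms already produce six peaks besides the origin peak; in general N atoms give N(N-1) interatomic vectors, which is why the map becomes crowded for large structures.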
Molecular Replacement

By November 2004, the PDB, the Protein Data Bank (http://www.rcsb.org/pdb), held more than 28,000 structures, both from X-ray crystallography and NMR. Fewer and fewer newly deposited structures reveal a new fold.

Sequence homology between two proteins normally also implies structural similarity, and therefore chances are good that a new structure is similar to an already determined one.

One searches the unit cell with a known structure, or a fragment of it, for the correct orientation and position. These co-ordinates can then be used to calculate first phases for the experimental data. The search is done in two steps:

Rotational search: The Patterson function can be calculated both from the diffraction data and from the search model. It does not depend on the position within the unit cell, but only on the orientation. Hence, we can calculate the Patterson function for the model in different orientations, compare it with the Patterson function of the data, and pick the orientation with the best agreement.

Translational search: The model is moved through the asymmetric unit, keeping the orientation found in the rotational search. At each point, the calculated structure factor amplitudes |F_c| are scored against the experimental data.

Problems: strong model bias (phases!); the method may sometimes not work even with 100% sequence homology (domain movements).
Isomorphous Replacement

Isomorphous replacement is based on the idea that the introduction of a small molecule into a protein or nucleic acid crystal does not, or only barely, alter the structure of the macromolecule.

On the other hand, a few heavy metal atoms can contribute detectably to the structure factors and hence introduce changes in the reflection intensities.

Common heavy metals are Hg (80 e⁻), Pb (82 e⁻), Au (79 e⁻), Pt (78 e⁻), or U (92 e⁻). They can be incorporated by co-crystallisation or by soaking after the crystals have grown.

The first protein structures, like myoglobin or hemoglobin, were solved by isomorphous replacement.

G. Sheldrick
Isomorphous Replacement

In order to use the extra information, one needs at least two data sets: a native one (no heavy metal) and a derivative (with heavy metal).

[Figure: flow chart. The difference between the derivative |F_T| and the native |F_P| leads to the heavy atom co-ordinates and thus to F_H, φ_H; the Harker construction then leads to F_T, φ_T.]

The co-ordinates of the heavy metal(s) can be derived via either direct methods or Patterson methods. From the co-ordinates one can calculate structure factors (amplitude and phase!). The phases for the derivative follow from the Harker construction.
The Harker Construction

With a single derivative, the Harker construction provides phases for the protein structure up to a twofold ambiguity:

1. Draw a circle with radius |F_T|.
2. Draw the vector for the heavy atom, F_H, φ_H.
3. From its endpoint, draw a circle with radius |F_P|.

The two circles have two points of intersection, from which one reads the two possible phases φ_T for the derivative or (drawing the vector from the endpoint of the heavy atom vector) the two possible phases φ_P for the native structure.

[Figure: Harker construction with the circles of radius |F_T| and |F_P|, the heavy atom vector F_H, φ_H, and one solution F_T, φ_T.]

With only one derivative, one speaks of SIR, single isomorphous replacement; with more than one, one speaks of MIR, multiple isomorphous replacement. MIR removes the ambiguity of SIR. The more derivatives there are, the better the phases (and their errors) can be determined.
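The two intersection points can also be computed instead of drawn. The sketch below assumes perfect isomorphism, F_T = F_P + F_H, so that |F_T|² = |F_P|² + |F_H|² + 2 |F_P| |F_H| cos(φ_P - φ_H); the amplitudes, phases and the function name are hypothetical.

```python
import cmath
import math

def harker_phases(f_p, f_t, f_h, phi_h):
    """Two candidate native phases phi_P (radians) from |F_P|, |F_T|, |F_H|, phi_H."""
    cos_arg = (f_t ** 2 - f_p ** 2 - f_h ** 2) / (2.0 * f_p * f_h)
    cos_arg = max(-1.0, min(1.0, cos_arg))   # guard against rounding errors
    delta = math.acos(cos_arg)
    return phi_h + delta, phi_h - delta

# Hypothetical reflection: native and heavy atom structure factors
F_P = cmath.rect(100.0, math.radians(40.0))
F_H = cmath.rect(30.0, math.radians(110.0))
F_T = F_P + F_H                              # derivative

# Only the amplitudes |F_P| and |F_T| are measured; F_H comes from the heavy atom model
candidates = harker_phases(abs(F_P), abs(F_T), abs(F_H), cmath.phase(F_H))

print("candidate phases (degrees):", [math.degrees(p) % 360.0 for p in candidates])
print("true native phase (degrees):", math.degrees(cmath.phase(F_P)))
```

One of the two candidates is the true native phase; a second derivative with a different F_H would give another pair of candidates, and only the true phase is common to both pairs.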
Anomalous Dispersion

For a normal diffraction experiment, Friedel's law is valid. It states that the intensities of the reflections (h, k, l) and (-h, -k, -l) are equal and that the phases of the underlying structure factors have opposite signs, φ(h, k, l) = -φ(-h, -k, -l).

For heavy atoms, the wavelength of X-rays lies in a region where this is no longer true under all circumstances. This effect is due to absorption by these atoms at specific wavelengths. The wavelength is different for every type of atom and normally has to be determined before data collection by a fluorescence scan (scattering of X-rays at right angle to the incident beam).

The difference in intensities can be exploited by a Harker construction similar to that of isomorphous replacement, but with F_T and F_P replaced by F(h, k, l) and F(-h, -k, -l).

With this SAD (single-wavelength anomalous dispersion) approach, the twofold ambiguity of the phases remains.
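The breakdown of Friedel's law can be reproduced with a one-dimensional toy structure. In this sketch the anomalous scatterer is modelled by adding an imaginary part to its scattering factor; the positions and scattering factors, including the imaginary part of 10, are hypothetical values chosen for illustration.

```python
import numpy as np

def structure_factor(h, x_frac, f):
    """F(h) = sum_j f_j * exp(2 pi i h x_j) for a 1D toy structure."""
    x_frac = np.asarray(x_frac, dtype=float)
    f = np.asarray(f, dtype=complex)
    return complex(np.sum(f * np.exp(2j * np.pi * h * x_frac)))

x = [0.10, 0.20, 0.35]                  # two light atoms and one heavy atom
f_normal = [6.0, 7.0, 80.0]             # real scattering factors
f_anomalous = [6.0, 7.0, 80.0 + 10.0j]  # heavy atom with an imaginary component

h = 1
for label, f in (("normal", f_normal), ("anomalous", f_anomalous)):
    F_plus = structure_factor(h, x, f)
    F_minus = structure_factor(-h, x, f)
    print(label, "I(h) =", abs(F_plus) ** 2, " I(-h) =", abs(F_minus) ** 2)

# Values kept for inspection
diff_normal = abs(structure_factor(h, x, f_normal)) ** 2 \
    - abs(structure_factor(-h, x, f_normal)) ** 2
diff_anomalous = abs(structure_factor(h, x, f_anomalous)) ** 2 \
    - abs(structure_factor(-h, x, f_anomalous)) ** 2
```

With real scattering factors F(-h) is the complex conjugate of F(h), so the intensities are equal and the phases have opposite signs; the imaginary component of the heavy atom breaks this relation.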
SIRAS and MAD Phasing

To overcome the twofold phase ambiguity, two methods can be applied:

1. SIRAS: Often a native crystal or data set is available when SAD data are collected. This leads to the combination of SIR and SAD, called SIRAS: SIR from the comparison of native to derivative, SAD from the derivative.
2. MAD: Instead of changing crystals, one can change the wavelength, because the strength of the anomalous signal varies with the wavelength. This results in multi-wavelength anomalous dispersion, or MAD.
Some Exotic Experimental Techniques

RIP: Radiation Induced Phasing makes use of the fact that radiation forms radicals. They damage the molecule, and apart from random destruction, carboxyl groups are removed and disulphides destroyed. For RIP, a normal data set is collected ("native"), then the crystal is exposed to a high dose of X-rays, then a second data set ("derivative") is collected.

Sulphur SAD: exploitation of the very weak signal of native S (or P for nucleic acid structures).

Halide soaking: iodide SIRAS or bromide MAD after a quick soak (10–30 s) in 1 M KI or NaBr.
Example Phases

[Figure panels: (initial) centroid phases; resolved twofold ambiguity; final (refined) phases.]