X-Ray structure analysis Kay Diederichs kay.diederichs@uni-konstanz.de
Analysis of what? Proteins ( /ˈproʊˌtiːnz/ or /ˈproʊti.ɨnz/) are biochemical compounds consisting of one or more polypeptides typically folded into a globular or fibrous form, facilitating a biological function. A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids. (Wikipedia) 2
What is macromolecular crystallography? The art of getting a protein to sit still, then taking a 3D picture. What are protein crystals? Static, well-ordered arrays of protein molecules. 3D pictures are stitched together from 2D ones. How are the 2D pictures made? By irradiating the ordered protein array with X-rays, collecting the constructively diffracted X-rays, and reconstructing a likely model of the protein's 3D structure using a computer. 3
Overview: History and current status; Practical aspects; Theory; Comparison with other techniques 4
Absorption of X-rays Wilhelm Conrad Röntgen (1845-1923) January 23, 1896 5
Diffraction of X-rays: Max von Laue (1879-1960), X-ray diffraction of crystals (1912); Paul Peter Ewald (1888-1985), theoretical explanation (1912); William Henry Bragg (1862-1942) and William Lawrence Bragg (1890-1971), Bragg's equation: nλ = 2d sin θ (Nobel Prize, 1915) 6
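Bragg's equation can be turned around in either direction: from a measured scattering angle to a lattice spacing d, or from d to the angle at which those planes reflect. The sketch below assumes the detector reports the scattering angle 2θ, and the wavelengths in the example (Cu Kα, 1.5418 Å, and a 1.0 Å synchrotron beam) are illustrative choices, not values from these slides.

```python
import math

def bragg_d_spacing(wavelength: float, two_theta_deg: float, n: int = 1) -> float:
    """Lattice-plane spacing d (same unit as wavelength) from n*lambda = 2*d*sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)  # the measured scattering angle is 2*theta
    return n * wavelength / (2.0 * math.sin(theta))

def bragg_two_theta(wavelength: float, d: float, n: int = 1) -> float:
    """Scattering angle 2*theta (degrees) at which planes of spacing d reflect in order n."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("n*lambda > 2d: these planes cannot diffract at this wavelength")
    return 2.0 * math.degrees(math.asin(s))

# Example values (assumed): Cu K-alpha (1.5418 A) and a 1.0 A synchrotron wavelength
for wavelength in (1.5418, 1.0):
    tt = bragg_two_theta(wavelength, d=2.0)
    print(f"lambda = {wavelength} A: d = 2.0 A reflects at 2theta = {tt:.2f} deg")
```

Finer spacings (higher resolution) appear at larger angles, and because sin θ ≤ 1 no planes finer than λ/2 can diffract at all.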
X-ray diffraction patterns of DNA: Rosalind Franklin and Maurice Wilkins (1953). The central cross-shaped pattern is indicative of a helical structure. The heavy dark patterns (left and right) indicate that the bases are stacked perpendicular to the axis of the molecule. http://www.pbs.org/wgbh/nova/photo51/ 7
DNA Structure: History 8
Myo- and haemoglobin models at 5.5 Å resolution (1959): "sausage" model and balsa wood model http://www2.mrc-lmb.cam.ac.uk/about-lmb/archive-service/models-and-artefacts 9
2 Å Myoglobin model built by A. A. Barker, Model Maker in Cambridge (UK), 1960 10 www.umass.edu/molvis/francoeur/barker/barker.html
So far, 29 Nobel Prizes are associated with crystallography (for a list, see http://www.iucr.org/people/nobel-prize): for the physical basis or mathematical treatment (Physics), for important chemical compounds (Chemistry), or in Physiology or Medicine (DNA: Crick, Watson, Wilkins, 1962). Most recently: (2013: Karplus, Levitt & Warshel); 2012: Lefkowitz & Kobilka; 2011: Shechtman, quasicrystals; 2009: Ramakrishnan, Steitz, Yonath, studies of the structure and function of the ribosome. 14 of the 29 were awarded in structural biology (starting in 1946); see http://www.ebi.ac.uk/pdbe/docs/nobel/ 12
Examples of high-profile structures: protein translocation through the SecA-SecY complex; ribosome with mRNA; structure of Ebola virus 13
From a Protein Data Bank (PDB) file: "Crystal Structure at 1.9 Å Resolution of HIV II Protease", J. Biol. Chem. 269, 26344-26348 (1994) 15
HEADER    HYDROLASE (ACID PROTEINASE)             31-MAR-95
COMPND   2 MOLECULE: HIV-1 PROTEASE;
COMPND   3 CHAIN: A, B;
SOURCE   2 ORGANISM_SCIENTIFIC: HUMAN IMMUNODEFICIENCY VIRUS TYPE 1;
SOURCE   3 GENE: HIV-1 PROTEASE FROM THE NY5 ISOLATE;
EXPDTA    X-RAY DIFFRACTION
REMARK   2
REMARK   2 RESOLUTION. 2.0 ANGSTROMS.
REMARK   3   R VALUE 0.166
REMARK   3   RMSD BOND DISTANCES 0.017 ANGSTROMS
REMARK   3   RMSD BOND ANGLES 1.9 DEGREES
SEQRES   1 A   99  PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE
ATOM      1  N   PRO A   1      29.361  39.686   5.862  1.00 38.10
ATOM      2  CA  PRO A   1      30.307  38.663   5.319  1.00 40.62
ATOM      3  C   PRO A   1      29.760  38.071   4.022  1.00 42.64
ATOM      4  O   PRO A   1      28.600  38.302   3.676  1.00 43.40
ATOM      5  CB  PRO A   1      30.508  37.541   6.342  1.00 37.87
ATOM      6  CG  PRO A   1      29.296  37.591   7.162  1.00 38.40
ATOM      7  CD  PRO A   1      28.778  39.015   7.019  1.00 38.74
ATOM      8  N   GLN A   2      30.607  37.334   3.305  1.00 41.76
ATOM      9  CA  GLN A   2      30.158  36.492   2.199  1.00 41.30
ATOM     10  C   GLN A   2      30.298  35.041   2.643  1.00 41.38
ATOM     11  O   GLN A   2      31.401  34.494   2.763  1.00 43.09
ATOM     12  CB  GLN A   2      30.970  36.738   0.926  1.00 40.81
ATOM     13  CG  GLN A   2      30.625  35.783  -0.201  1.00 46.61
ATOM     14  CD  GLN A   2      31.184  36.217  -1.549  1.00 50.36
ATOM     15  OE1 GLN A   2      32.006  35.518  -2.156  1.00 53.89
ATOM     16  NE2 GLN A   2      30.684  37.339  -2.061  1.00 51.46
ATOM     17  N   ILE A   3      29.160  34.436   2.919  1.00 37.80
ATOM     18  CA  ILE A   3      29.123  33.098   3.397  1.00 34.13
ATOM     19  C   ILE A   3      28.968  32.155   2.198  1.00 33.19
ATOM     20  O   ILE A   3      28.088  32.330   1.368  1.00 32.74
16
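PDB coordinate records are fixed-width text, so each field can be sliced out by column. The sketch below follows the column layout of the classic wwPDB format for ATOM/HETATM records; it ignores alternate locations, insertion codes and element columns, and the `Atom`, `parse_atoms` and `distance` names are hypothetical helpers, not part of any library.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_record(line: str) -> Atom:
    """Slice one ATOM/HETATM record by the fixed PDB columns (1-based ranges in comments)."""
    return Atom(
        serial=int(line[6:11]),          # cols 7-11  atom serial number
        name=line[12:16].strip(),        # cols 13-16 atom name
        res_name=line[17:20].strip(),    # cols 18-20 residue name
        chain=line[21],                  # col  22    chain identifier
        res_seq=int(line[22:26]),        # cols 23-26 residue number
        x=float(line[30:38]),            # cols 31-38 orthogonal coordinates in Angstrom
        y=float(line[38:46]),            # cols 39-46
        z=float(line[46:54]),            # cols 47-54
        occupancy=float(line[54:60]),    # cols 55-60
        b_factor=float(line[60:66]),     # cols 61-66 temperature (B) factor
    )

def parse_atoms(text: str) -> List[Atom]:
    """Return all coordinate records; other record types (HEADER, REMARK, ...) are skipped."""
    return [parse_atom_record(l) for l in text.splitlines() if l.startswith(("ATOM  ", "HETATM"))]

def distance(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))

excerpt = """REMARK   2 RESOLUTION. 2.0 ANGSTROMS.
ATOM      1  N   PRO A   1      29.361  39.686   5.862  1.00 38.10
ATOM      2  CA  PRO A   1      30.307  38.663   5.319  1.00 40.62
ATOM     15  OE1 GLN A   2      32.006  35.518  -2.156  1.00 53.89
"""
atoms = parse_atoms(excerpt)
for a in atoms:
    print(a.serial, a.name, a.res_name, a.res_seq, (a.x, a.y, a.z), a.b_factor)
print(f"N-CA bond length of Pro 1: {distance(atoms[0], atoms[1]):.3f} A")
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can touch when numbers get large or negative.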
Thermus thermophilus 70S ribosome, PDB id 2WDI: 32 chains; 90,700 atoms; 890,000 reflections; 3.3 Å resolution. Voorhees et al. (2009) Nature Structural & Molecular Biology 16, 528 17
Structure determination: crystal → h, k, l, I, σ(I) + phases φ(hkl) → ρ(x,y,z) → structure 18
First steps of X-ray structure analysis: choice of protein/organism/expression system; expression and purification; crystallization (http://hamptonresearch.com) 19
Crystals (space groups R32 & R3, P321, C2) 20
Synchrotron Radiation: synchrotron radiation is emitted when a charge moving at relativistic speed follows a curved trajectory. Properties: 1. high brilliance; 2. large spectral range; 3. time structure 21
Data collection: Swiss Light Source Paul-Scherrer-Institut (PSI), Villigen (CH) 22
Diffraction Data Collection: the data are three-dimensional, so the crystal has to be rotated through a large angular range, and for each orientation a diffraction image is recorded on the detector. Because of the symmetry of the diffraction pattern, depending on the space group a rotation of e.g. 90° may suffice. 23
Diffraction Data Collection (2): pieces of information per reflection: h, k, l (Miller indices); I(h,k,l) (intensity); σ(I(h,k,l)) (standard deviation of I(h,k,l)) 24
The measured intensity (and the accuracy of its measurement) is influenced by: - Crystal quality - Poisson (counting) statistics - Beam strength and quality; exposure time - Radiation damage - Beamline setup and quality (figure: Murakami et al., Nature (2002)) 25
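The Poisson (counting) contribution to σ(I) can be made concrete: a photon count N has variance N, so precision only improves with the square root of the number of photons. The sketch below assumes a simplified integration in which the background is counted over as many pixels as the peak, and the signal and background levels are hypothetical.

```python
import numpy as np

def integrate(peak_counts, background_counts):
    """Background-subtracted intensity and its Poisson standard deviation.

    Simplifying assumption: the background was counted over as many pixels as the peak,
    so I = P - B and var(I) = var(P) + var(B) = P + B (Poisson: variance = mean).
    """
    intensity = peak_counts - background_counts
    sigma = np.sqrt(peak_counts + background_counts)
    return intensity, sigma

def expected_i_over_sigma(signal, background):
    """Expected I/sigma(I) for mean signal S on background B under the same assumption."""
    return signal / np.sqrt(signal + 2.0 * background)

rng = np.random.default_rng(0)
S, B = 400.0, 100.0                       # hypothetical mean photon counts per reflection
peak = rng.poisson(S + B, size=10_000)    # 10,000 simulated repeat measurements
bkg = rng.poisson(B, size=10_000)
I, sig = integrate(peak, bkg)
print(f"mean I = {I.mean():.1f}, spread of I = {I.std():.1f}, mean sigma = {sig.mean():.1f}")

# Counting longer (x4 photons) only doubles I/sigma: precision grows like sqrt(counts),
# while radiation damage limits how many photons a crystal can take.
for scale in (1, 4, 16):
    print(scale, round(float(expected_i_over_sigma(S * scale, B * scale)), 1))
```

The spread of the simulated intensities matches the predicted σ, which is why exposure time and radiation damage appear together in the list above.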
Ewald sphere: Bragg's eqn in 3D 26
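In the Ewald construction, a reflection diffracts when its reciprocal-lattice vector d* (length 1/d) satisfies |s0 + d*| = 1/λ, where s0 is the incident-beam vector of length 1/λ; this is Bragg's law written in 3D. The sketch below rotates a single reciprocal-lattice point about a spindle axis, as during data collection, and finds where it crosses the sphere. The beam direction (+x), spindle axis (z) and the example reflection are assumptions chosen for illustration.

```python
import numpy as np

def rotation_about_z(phi_deg):
    """Rotation matrix about z, taken here as the goniometer (spindle) axis."""
    p = np.radians(phi_deg)
    c, s = np.cos(p), np.sin(p)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def distance_from_sphere(d_star, phi_deg, wavelength):
    """|s0 + R(phi) d*| - 1/lambda: zero when the reciprocal-lattice point touches the sphere."""
    s0 = np.array([1.0 / wavelength, 0.0, 0.0])   # incident beam along +x (assumed geometry)
    return np.linalg.norm(s0 + rotation_about_z(phi_deg) @ d_star) - 1.0 / wavelength

def diffracting_angles(d_star, wavelength, step=0.01):
    """Rotate through 360 deg; return the angles where the point crosses the Ewald sphere."""
    phis = np.arange(0.0, 360.0, step)
    excess = np.array([distance_from_sphere(d_star, p, wavelength) for p in phis])
    crossing = np.nonzero(np.sign(excess[:-1]) != np.sign(excess[1:]))[0]
    return phis[crossing]

# Hypothetical reflection with d = 2 A (|d*| = 0.5 1/A), lambda = 1 A
d_star = np.array([0.5, 0.0, 0.0])
angles = diffracting_angles(d_star, wavelength=1.0)
print(angles)
theta = np.degrees(np.arcsin(1.0 / (2 * 2.0)))   # Bragg angle for the same d and lambda
print(f"Bragg theta = {theta:.2f} deg; sphere crossings at 90 + theta and 270 - theta")
```

Each reflection passes through the sphere twice per full turn, at angles fixed by its Bragg angle; points with |d*| > 2/λ (outside the limiting sphere) never cross it.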
Theory: the electromagnetic wave... can be mathematically described by Maxwell's equations (1864):
What does this mean? It can be visualized with Radiation2D; see T. Shintake, "New Mathematical Method for Radiation Field of Moving Charge", Proc. EPAC (2002), http://accelconf.web.cern.ch/accelconf/e02/papers/wepri038.pdf. Binaries for Linux, Mac and Windows can be downloaded from http://www-xfel.spring8.or.jp 30
Diffraction maths: the superposition of all waves emanating from all electrons of an object results in a diffraction image. The wave from position (x,y,z) is described mathematically as $f\,e^{2\pi i(hx+ky+lz)}$. Mathematically, this addition of waves is a Fourier transform (an array of complex numbers that is related one-to-one to the electrons of the object). The amplitude of the Fourier transform can be measured by a detector; its phase cannot be measured (the "phase problem") but is required to calculate the electron density. A regularly ordered (i.e. crystalline) sample gives a diffraction image consisting of regularly spaced reflections, each characterized by its position and intensity on the detector. All electrons of the object contribute to all reflections! 31
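Why does a crystal give discrete reflections? Summing the waves from many identical unit cells makes them cancel everywhere except at integer values of the scattering coordinate, where the single-cell signal is amplified N² times. The 1D sketch below shows this; the three-scatterer motif and the 20-cell crystal are hypothetical, and all scatterers have equal strength f = 1.

```python
import numpy as np

def scattered_wave(s, positions, f=1.0):
    """Superpose waves f*exp(2*pi*i*s*x) from point scatterers at positions x (in cell units)."""
    return (f * np.exp(2j * np.pi * np.outer(s, positions))).sum(axis=1)

motif = np.array([0.10, 0.23, 0.61])     # one hypothetical unit cell with three scatterers
n_cells = 20
crystal = (motif[None, :] + np.arange(n_cells)[:, None]).ravel()  # 20 cells along a 1D lattice

s = np.linspace(0.0, 3.0, 3001)          # continuous scattering coordinate; s = h at reflections
I_motif = np.abs(scattered_wave(s, motif)) ** 2
I_crystal = np.abs(scattered_wave(s, crystal)) ** 2

for h in (1.0, 1.5, 2.0):
    i = int(round(h * 1000))
    print(f"s = {h}: single cell {I_motif[i]:.3f}, crystal {I_crystal[i]:.3f}")
```

The reflection positions come from the lattice alone, while their intensities sample the transform of one unit cell, to which every scatterer contributes.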
The Structure Factor Equation: $F(hkl) = |F(hkl)|\,e^{i\varphi(hkl)} = \sum_j f_j\,e^{2\pi i(hx_j+ky_j+lz_j)}$, with structure factor amplitude $|F(hkl)| \propto I(hkl)^{1/2}$, atomic form factor $f_j$, and phase $\varphi(hkl)$ (the angle of $F(hkl)$ in the complex plane). The calculation of $F(hkl)$ from a structure $(x_j,y_j,z_j)$ is just a summation of the waves originating from each atom $j$ in the direction defined by $(hkl)$. 32
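The structure factor equation translates almost literally into code: one complex exponential per atom, summed. In the sketch below the atoms are hypothetical, coordinates are fractional, and each $f_j$ is simplified to a constant (its value at sin θ/λ = 0, i.e. the atom's electron count) instead of a resolution-dependent form factor.

```python
import numpy as np

def structure_factor(hkl, frac_xyz, f):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    frac_xyz: (n_atoms, 3) fractional coordinates; f: (n_atoms,) scattering factors.
    Simplification: each f_j is a constant (its value at sin(theta)/lambda = 0, i.e. the
    atom's electron count) instead of a resolution-dependent atomic form factor.
    """
    arg = 2.0 * np.pi * (np.asarray(frac_xyz, dtype=float) @ np.asarray(hkl, dtype=float))
    return np.sum(np.asarray(f, dtype=float) * np.exp(1j * arg))

atoms = np.array([[0.10, 0.20, 0.30],   # hypothetical C, N and O positions
                  [0.40, 0.15, 0.70],
                  [0.65, 0.80, 0.05]])
f = np.array([6.0, 7.0, 8.0])           # electrons per atom

for hkl in [(0, 0, 0), (1, 0, 0), (1, 2, 3), (-1, -2, -3)]:
    F = structure_factor(hkl, atoms, f)
    print(hkl, f"|F| = {abs(F):6.3f}  phi = {np.degrees(np.angle(F)):7.1f} deg  I ~ {abs(F) ** 2:8.2f}")
```

F(000) equals the total number of electrons, and with real $f_j$ the Friedel mates (hkl) and (-h-k-l) have equal amplitudes and opposite phases.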
The Electron Density Equation: $\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F(hkl)|\,e^{i\varphi(hkl)}\,e^{-2\pi i(hx+ky+lz)}$, with structure factor amplitude $|F(hkl)| \propto I(hkl)^{1/2}$ and phase $\varphi(hkl)$. Mathematically inclined people will notice: this is just the Fourier back transform! 33
The Electron Density Equation: the electron density ρ(x,y,z) is a three-dimensional function (with the unit e/Å³) which describes where in the unit cell of the crystal the electrons (and therefore the atoms) are. It is essentially the image of the structure we want to determine: $\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F(hkl)|\,e^{i\varphi(hkl)}\,e^{-2\pi i(hx+ky+lz)}$. It is important to note that every reflection (hkl) of the diffraction pattern contributes to the electron density at each and every position (x,y,z) in the unit cell of the crystal. 34
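A 1D Fourier synthesis shows both points: every reflection adds a wave across the whole cell, and leaving out the high-order reflections (a lower resolution limit) blurs the map. The sketch assumes a hypothetical density made of three Gaussian "atoms" on a 64-point grid, with V taken as the number of grid points so that the complete series reproduces the density exactly; the cutoff h_max plays the role of the resolution limit d_min = a/h_max.

```python
import numpy as np

N = 64                                   # grid points along a 1D unit cell, x = n/N
x = np.arange(N) / N

def gaussian_atoms(x, centers, heights, width=0.02):
    """Hypothetical 1D electron density: periodic Gaussian 'atoms' in the unit cell."""
    d = (x[:, None] - np.asarray(centers)[None, :] + 0.5) % 1.0 - 0.5   # wrapped distance
    return (np.asarray(heights) * np.exp(-0.5 * (d / width) ** 2)).sum(axis=1)

rho = gaussian_atoms(x, centers=[0.15, 0.42, 0.70], heights=[6.0, 7.0, 8.0])

# Structure factors of the sampled density, same sign convention as the structure factor equation
h = np.arange(-N // 2, N // 2)
F = np.exp(2j * np.pi * np.outer(h, x)) @ rho

def synthesis(F, h, x, h_max):
    """rho(x) = (1/V) * sum_h F(h) exp(-2*pi*i*h*x), using only reflections with |h| <= h_max.

    On this grid V is taken as N, which makes the complete series reproduce rho exactly.
    """
    keep = np.abs(h) <= h_max
    waves = F[keep][:, None] * np.exp(-2j * np.pi * np.outer(h[keep], x))
    return waves.sum(axis=0).real / len(x)

for h_max in (4, 8, 16, N // 2):
    err = np.abs(synthesis(F, h, x, h_max) - rho).max()
    print(f"h_max = {h_max:2d} (d_min = a/{h_max}): max map error {err:.3f}")
```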
Interactive tutorials http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html http://www.ysbl.york.ac.uk/~cowtan/sfapplet/sfintro.html 35
Detectors do not measure amplitudes! They measure deposited energy, and the energy is ∝ |amplitude|². Thus, detectors do not measure the phase, because $|\text{amplitude}\cdot e^{i\varphi(hkl)}| = |\text{amplitude}|$. 36
Data processing: indexing; integration (= summation); space group determination; scaling => all h, k, l, I(hkl), σ(I(hkl)). [Figure: PDB depositions per year, 2006-2016, by data processing program: XDS; DENZO or HKL; MOSFLM] 37
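One ingredient of the final step is merging: repeated and symmetry-equivalent measurements of a reflection are combined into a single I(hkl) and σ(I(hkl)). The sketch below uses an inverse-variance weighted mean and assumes Laue class mmm (all sign combinations of h, k, l equivalent, no anomalous signal); the `to_asu_mmm` and `merge` helpers and the observations are hypothetical. Real programs such as those in the figure also refine scale factors between images and reject outliers, which is not shown.

```python
import math
from collections import defaultdict

def to_asu_mmm(hkl):
    """Map hkl to a unique representative for Laue class mmm (orthorhombic), where all
    sign combinations (+-h, +-k, +-l) are equivalent (no anomalous signal assumed)."""
    h, k, l = hkl
    return (abs(h), abs(k), abs(l))

def merge(observations, to_asu=to_asu_mmm):
    """Inverse-variance weighted mean of all observations of each unique reflection.

    observations: iterable of ((h, k, l), I, sigma). Returns {hkl: (I_merged, sigma_merged)}.
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[to_asu(hkl)].append((intensity, sigma))
    merged = {}
    for hkl, obs in groups.items():
        weights = [1.0 / s ** 2 for _, s in obs]
        i_mean = sum(w * i for w, (i, _) in zip(weights, obs)) / sum(weights)
        merged[hkl] = (i_mean, 1.0 / math.sqrt(sum(weights)))
    return merged

obs = [((1, 2, 3), 100.0, 10.0), ((-1, 2, -3), 110.0, 10.0),
       ((1, -2, 3), 90.0, 10.0), ((2, 0, 0), 50.0, 5.0)]   # hypothetical measurements
for hkl, (i, s) in sorted(merge(obs).items()):
    print(hkl, round(i, 2), round(s, 2))
```

Three equivalent observations of equal quality shrink σ by a factor of √3, one reason why high-symmetry space groups and multiple passes improve data quality.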
The Phase Problem: from the diffraction pattern, we can only obtain the intensities I(hkl) of the reflections (hkl). Intensities are the squares of the (complex) amplitudes: $I(hkl) \propto F(hkl)\,F^*(hkl) = |F(hkl)|\,e^{i\varphi(hkl)}\cdot|F(hkl)|\,e^{-i\varphi(hkl)} = |F(hkl)|^2$. The phase φ(hkl) cannot be measured. 38
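The consequence can be simulated: keep the measured amplitudes, give them different phases, and the intensities are unchanged while the electron density is not. The 1D sketch below uses hypothetical point "atoms" on a 64-point grid; the random phases are constrained by Friedel's law φ(-h) = -φ(h) so the wrong map is still real-valued.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
x = np.arange(N) / N
rho = np.zeros(N)
rho[[10, 27, 45]] = [6.0, 7.0, 8.0]            # hypothetical point "atoms" on a 1D grid

h = np.arange(N)                                # index N - h plays the role of -h
F = np.exp(2j * np.pi * np.outer(h, x)) @ rho  # F(h) = sum_x rho(x) exp(+2*pi*i*h*x)
I = (F * F.conj()).real                        # what the detector records: |F|^2

def back_transform(F):
    """rho(x) = (1/N) sum_h F(h) exp(-2*pi*i*h*x); returns the complex result."""
    return np.exp(-2j * np.pi * np.outer(x, h)) @ F / N

# Random phases that still obey Friedel's law phi(-h) = -phi(h), so the map stays real
phi = rng.uniform(0.0, 2.0 * np.pi, N)
phi[0] = phi[N // 2] = 0.0
phi[N // 2 + 1:] = -phi[1:N // 2][::-1]
F_wrong = np.abs(F) * np.exp(1j * phi)

rho_true = back_transform(F).real
rho_wrong = back_transform(F_wrong)
print("same intensities:", np.allclose(np.abs(F_wrong) ** 2, I))
print("true map peaks at", np.sort(np.argsort(rho_true)[-3:]))
print("wrong-phase map max deviation from truth:", round(float(np.abs(rho_wrong.real - rho).max()), 2))
```

Both phase sets are equally consistent with the measured data, so the intensities alone cannot decide between them.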
How to solve the phase problem / an X-ray structure: - Direct methods: suitable for the highest-resolution data and few atoms; usually not applicable to macromolecules. - Molecular replacement: obtain a related/similar (= approximately correct) structure from the PDB, orient it correctly in the crystal lattice, then identify and remove errors until the atomic model agrees with the experimental data. Not applicable to new/unknown structures; used for 2/3 of the X-ray entries in the PDB. - Experimental phase determination (MIR/MAD/SAD): modify the scattering of the object, measure the intensities again, and work out the phase from the change in intensities. Requires highly accurate measurement of intensities. Always applicable; used for the other 1/3 of the X-ray entries in the PDB. 39
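For isomorphous replacement the "work out phase from change in intensities" step reduces to triangle geometry: F_PH = F_P + F_H, so |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H| cos(φ_P - φ_H). The idealized, error-free sketch below assumes the heavy-atom contribution F_H is already known (heavy-atom sites located) and uses hypothetical values. One derivative leaves two candidate phases; a second derivative, as in MIR, selects the one both agree on. SAD/MAD use anomalous differences in an analogous way, which is not shown here.

```python
import cmath
import math

def sir_phase_candidates(F_P_amp, F_PH_amp, F_H):
    """Two candidate protein phases from one isomorphous derivative (idealized, error-free).

    From F_PH = F_P + F_H it follows that
    |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(phi_P - phi_H).
    F_H (complex) is assumed known, i.e. the heavy-atom sites have already been located.
    """
    F_H_amp, phi_H = abs(F_H), cmath.phase(F_H)
    cos_delta = (F_PH_amp ** 2 - F_P_amp ** 2 - F_H_amp ** 2) / (2.0 * F_P_amp * F_H_amp)
    if abs(cos_delta) > 1.0:
        raise ValueError("amplitudes inconsistent with F_PH = F_P + F_H")
    delta = math.acos(cos_delta)
    two_pi = 2.0 * math.pi
    return ((phi_H + delta) % two_pi, (phi_H - delta) % two_pi)

def angular_distance(a, b):
    """Smallest difference between two phase angles, in radians."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))

# Hypothetical "true" values, used only to simulate the measured amplitudes
F_P = cmath.rect(100.0, 1.0)          # protein: |F_P| = 100, phi_P = 1.0 rad (unknown in practice)
F_H1 = cmath.rect(30.0, 2.5)          # heavy-atom contribution, derivative 1
F_H2 = cmath.rect(25.0, -0.7)         # heavy-atom contribution, derivative 2

cand1 = sir_phase_candidates(abs(F_P), abs(F_P + F_H1), F_H1)
cand2 = sir_phase_candidates(abs(F_P), abs(F_P + F_H2), F_H2)
print("derivative 1 allows", [round(c, 3) for c in cand1])
print("derivative 2 allows", [round(c, 3) for c in cand2])

# A second derivative (MIR) resolves the twofold ambiguity: keep the phase both agree on
best = min(((a, b) for a in cand1 for b in cand2), key=lambda ab: angular_distance(*ab))
print("phase consistent with both:", round(best[0], 3))
```

With real, noisy data the candidates never coincide exactly, which is why accurate intensities are essential and why practical programs treat phases probabilistically.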