Direct Method. Very few protein diffraction data meet the 2nd condition

Direct Method
Two conditions:
- atoms in the structure are equal-weighted
- the resolution of the data is finer than the distances between the atoms in the structure
Very few protein diffraction data sets meet the 2nd condition.
Heavy atoms in a protein => determine the sub-structure of heavy atoms in a derivative: $F_H = F_{PH} - F_P$

Direct Method
1. Determine the substructure of heavy atoms
2. Determine the overall protein structure at 1.2 Å resolution
3. Programs:
   - Shake-and-Bake (SnB)
   - SHELXD
4. Phases refined by SHARP

Large substructures solved mainly by direct methods
http://www.hwi.buffalo.edu/substructure/viewdb.htm
- The number of heavy atoms (anomalous scatterers) is > 20
- Applies mainly to protein crystals that contain many Se-Met residues or have been soaked with halides

Molecular Replacement
1. A homologous model (structure) is available
2. The structure is determined by placing the model in the proper orientation and precise position in the target unit cell
   - rotation search (rotation matrix)
   - translation search (translation vector)
3. New structure $X_B$: $X_B = [C] X_A + d$, where $[C]$ is the rotation matrix and $d$ the translation vector
4. Programs: AMoRe, MOLREP, CNS, Phaser
5. Over 50% of protein structures are solved by MR
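
The placement step $X_B = [C] X_A + d$ can be illustrated with a minimal NumPy sketch. The helper names (`rotation_z`, `place_model`) and the toy coordinates are hypothetical; a real MR program searches for $[C]$ and $d$, whereas here they are simply given.

```python
import numpy as np

def rotation_z(angle_deg):
    """Rotation matrix [C] for a rotation about the z axis (illustrative choice)."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def place_model(x_a, c, d):
    """Return X_B = [C] X_A + d for an (N, 3) array of model coordinates."""
    return x_a @ c.T + d

# Toy search model with two atoms (hypothetical coordinates, in Å).
x_a = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0]])
c = rotation_z(90.0)              # orientation from the rotation search
d = np.array([10.0, 0.0, 5.0])    # position from the translation search
x_b = place_model(x_a, c, d)
print(x_b)
```

Because $[C]$ is a pure rotation, interatomic distances in the model are unchanged by the placement; only its orientation and position differ.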

Solvent Content
Matthews coefficient ($V_M$): $V_M = \frac{V_O}{Z \times n \times MW}$, typically $V_M$ = 1.7-3.5 Å³/Da
- $V_O$: unit cell volume
- $Z$: number of asymmetric units (au) in a unit cell
- $MW$: molecular weight of the protein
- $n$: number of protein molecules in the au
Protein content: $V_{protein} = 1.23 / V_M$, where 1.23 Å³/Da is the specific volume of protein (the reciprocal of its density)
Solvent content: $V_{sol} = 1 - 1.23 / V_M$
When $V_M$ = 2.5 Å³/Da, $V_{sol}$ = 0.51 => the most common $V_M$ and $V_{sol}$
- Solvent fractions of 0.3-0.7 are common.
- Crystals with high solvent content generally diffract poorly and are fragile.
- But high solvent content is a great advantage in phase improvement by density modification.
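
As a worked example of the two formulas above, here is a small sketch. The function names and the cell parameters (400,000 Å³ cell, Z = 4, n = 1, 40 kDa protein) are made up for illustration.

```python
def matthews_coefficient(cell_volume, z, n, mw):
    """V_M = V_O / (Z * n * MW), in Å^3/Da."""
    return cell_volume / (z * n * mw)

def solvent_fraction(v_m):
    """V_sol = 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m

# Hypothetical crystal: 400,000 Å^3 cell, 4 asymmetric units,
# 1 molecule per asymmetric unit, 40 kDa protein.
v_m = matthews_coefficient(400000.0, z=4, n=1, mw=40000.0)
v_sol = solvent_fraction(v_m)
print(v_m, round(v_sol, 2))   # 2.5 and about 0.51
```

In practice the same calculation is repeated for several trial values of n; values of $V_M$ far outside 1.7-3.5 Å³/Da suggest the assumed number of molecules per asymmetric unit is wrong.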

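Fourier transforms
Electron density: $\rho(x,y,z) = \frac{1}{V} \sum_h \sum_k \sum_l F_{hkl} \exp[-2\pi i(hx + ky + lz)]$
Structure factor: $F_{hkl} = \sum_j f_j \exp[2\pi i(hx_j + ky_j + lz_j)]$
$\rho(x)$ <=> $F(h)$ via the Fourier transform (FT) and its inverse (FT$^{-1}$)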
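
The pair of transforms can be tried on a one-dimensional toy "crystal". This is only a sketch: the two point atoms, their scattering factors, and the resolution cutoff (|h| <= 15) are invented, and the scattering factors are treated as constants.

```python
import numpy as np

def structure_factors(h, x_atoms, f_atoms):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for a 1D cell with fractional x_j."""
    return (f_atoms * np.exp(2j * np.pi * np.outer(h, x_atoms))).sum(axis=1)

def density(x, h, f_h, volume=1.0):
    """rho(x) = (1/V) sum_h F(h) exp(-2*pi*i*h*x)."""
    return (f_h * np.exp(-2j * np.pi * np.outer(x, h))).sum(axis=1).real / volume

x_atoms = np.array([0.20, 0.65])   # hypothetical fractional coordinates
f_atoms = np.array([6.0, 8.0])     # hypothetical scattering factors
h = np.arange(-15, 16)             # reflections out to |h| = 15

f_h = structure_factors(h, x_atoms, f_atoms)
x = np.linspace(0.0, 1.0, 200, endpoint=False)
rho = density(x, h, f_h)
peak = x[np.argmax(rho)]           # highest peak falls on the heavier atom
print(peak)
```

Truncating the sum at a smaller |h| broadens the peaks, which is the one-dimensional analogue of a lower-resolution map.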

Density modification
We use the current phases (model) to calculate a density map. Then we modify that map so that it conforms better to prior expectations of what an electron density map should look like. The modified map is back-transformed to calculate structure factors, whose phases should be more accurate than those used for the original map. Iterating until the map no longer changes should give a map that is closer to the true map.

Solvent flattening/flipping
Density in protein and solvent regions
Solvent flattening:
- replace all the density values within the solvent region with the average value over the solvent region
- $\rho_{out}(x) = (\rho_{in}(x) - \rho_{sol}) \mu(x) + \rho_{sol}$, where $\mu(x)$ is the mask (1 in the protein region, 0 in the solvent region)
Solvent flipping:
- a modification of solvent flattening that removes bias from the original map
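
The flattening formula is simple enough to apply directly to a gridded map. The sketch below assumes the mask is already known (1 = protein, 0 = solvent); the function name and the six-point "map" are hypothetical.

```python
import numpy as np

def solvent_flatten(rho_in, mask):
    """rho_out = (rho_in - rho_sol) * mu + rho_sol, with mu = 1 in protein, 0 in solvent."""
    mu = mask.astype(float)
    rho_sol = rho_in[mask == 0].mean()   # average density over the solvent region
    return (rho_in - rho_sol) * mu + rho_sol

# Toy 1D map: three protein grid points and three solvent grid points.
rho_in = np.array([0.9, 1.4, 0.2, -0.1, 0.3, 1.1])
mask = np.array([1, 1, 0, 0, 0, 1])
rho_out = solvent_flatten(rho_in, mask)
print(rho_out)
```

Protein grid points pass through unchanged, while every solvent grid point is set to the solvent mean.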

NCS averaging
NCS copies in the AU
- Non-crystallographic symmetry (NCS): symmetry relations among identical copies in the AU
- The NCS copies in the AU should have the same electron density
- Differences between NCS copies in a noisy density map are caused by random errors
- A new map made by averaging the copies of density related by non-crystallographic symmetry should be more accurate, since the noise is averaged out
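
The noise-averaging argument can be checked numerically. This sketch assumes the NCS copies have already been superimposed onto a common grid (the hard part in practice) and uses an invented 1D "true" density with independent Gaussian noise on each of four copies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true density sampled on a common grid.
true_density = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))

# Four NCS copies: same density, independent random errors.
n_copies = 4
copies = true_density + rng.normal(0.0, 1.0, size=(n_copies, true_density.size))
averaged = copies.mean(axis=0)

def rms_error(density):
    """Root-mean-square deviation from the true density."""
    return float(np.sqrt(np.mean((density - true_density) ** 2)))

single_errors = [rms_error(c) for c in copies]
averaged_error = rms_error(averaged)
print(single_errors, averaged_error)
```

For independent noise the error of the averaged map falls roughly as one over the square root of the number of copies, so four copies halve it.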

Model Building
Putting blocks of protein structure into electron density
Mainly with interactive computer graphics programs:
- O, XtalView, Coot
Automatic model-building programs:
- ARP/wARP, RESOLVE, MAID

Map and Model 6.0 Å map

Map and Model 1.0 Å map

Fourier Method
$F(h) = |F| \exp(i\alpha_c)$
- $F$ is the true structure factor, but only its amplitude $|F|$ is measured
- $F_c$, with phase $\alpha_c$, is calculated from an initial model or a molecular replacement search model
- $F(h)$ is the model-phased structure factor, closer to the true $F$ than $F_c$ was
- So the map will have the features of the true structure
- Used to solve ligand-soaked structures

Map from Fourier Method: model-phased map

Difference Map
$F(h) = (|F| - |F_c|) \exp(i\alpha_c)$, the $F_o - F_c$ map, where $|F|$ is the observed amplitude
The difference map highlights the differences between the true structure and the model:
- positive density (blue) indicates that atoms should be added
- negative density (red) indicates that atoms should be moved elsewhere (or removed)
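
Both kinds of map coefficients combine observed amplitudes with model phases, which a short sketch makes explicit. The function names and the three reflections are hypothetical; `f_calc` holds complex model structure factors and `f_obs_amp` the measured amplitudes.

```python
import numpy as np

def model_phased_coefficients(f_obs_amp, f_calc):
    """|F| exp(i*alpha_c): observed amplitudes with calculated phases."""
    alpha_c = np.angle(f_calc)
    return f_obs_amp * np.exp(1j * alpha_c)

def difference_coefficients(f_obs_amp, f_calc):
    """(|F| - |Fc|) exp(i*alpha_c): coefficients for an Fo - Fc map."""
    alpha_c = np.angle(f_calc)
    return (f_obs_amp - np.abs(f_calc)) * np.exp(1j * alpha_c)

# Three made-up reflections.
f_calc = np.array([3.0 + 4.0j, -2.0 + 0.0j, 0.0 - 1.0j])
f_obs_amp = np.array([6.0, 2.0, 0.5])

fo_map = model_phased_coefficients(f_obs_amp, f_calc)
diff_map = difference_coefficients(f_obs_amp, f_calc)
print(fo_map, diff_map)
```

A reflection whose observed and calculated amplitudes agree contributes nothing to the difference map, which is why that map shows only what the model gets wrong.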

Refinement of Model
Now we have:
1) X-ray diffraction intensities for each reflection (h, k, l) from the data
2) x, y, z positions of atoms in the unit cell from the model
3) known scattering factors of atoms
Let's adjust the model to find closer agreement between the calculated and observed structure factors:
- minimize the difference $\sum_{hkl} (|F_{cal}| - |F_{obs}|)^2$
- correct the errors in the initial atomic model
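
To make the minimization concrete, here is a deliberately tiny sketch: a 1D cell with two atoms, synthetic "observed" amplitudes, and a grid search over one coordinate. Everything here is an assumption for illustration (atom positions, scattering factors, search range), and the grid search only stands in for the least-squares or maximum-likelihood optimizers that real refinement programs use.

```python
import numpy as np

H = np.arange(1, 11)               # reflections h = 1..10
F_ATOMS = np.array([6.0, 8.0])     # hypothetical scattering factors

def amplitudes(x_atoms):
    """|F_cal(h)| for a 1D two-atom model with fractional coordinates."""
    phases = np.exp(2j * np.pi * np.outer(H, x_atoms))
    return np.abs((F_ATOMS * phases).sum(axis=1))

def residual(x_atoms, f_obs):
    """Sum over h of (|F_cal| - |F_obs|)^2."""
    return float(((amplitudes(x_atoms) - f_obs) ** 2).sum())

# Synthetic "observed" data from the true structure (x1 = 0.10, x2 = 0.40).
f_obs = amplitudes(np.array([0.10, 0.40]))

# Initial model has the second atom slightly misplaced.
start = np.array([0.10, 0.37])

# Refine x2 by scanning nearby positions and keeping the best agreement.
trial_x2 = np.linspace(0.30, 0.50, 201)
scores = [residual(np.array([0.10, x2]), f_obs) for x2 in trial_x2]
best_x2 = float(trial_x2[int(np.argmin(scores))])
print(residual(start, f_obs), best_x2)
```

The search range is kept close to the starting model on purpose: amplitudes alone cannot distinguish some alternative arrangements, so refinement improves a roughly correct model rather than solving the structure from scratch.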

Thermal Motion
- atoms undergo motion in the crystal
- they are not exactly fixed at $x_j, y_j, z_j$
- B factor: $B_j = 8\pi^2 u_j^2$
  - $B_j$ is a measure of motion
  - $u_j$ is the amplitude of vibration
  - $B_j$ = 80 Å², $u_j$ = 1.00 Å
  - $B_j$ = 20 Å², $u_j$ = 0.5 Å
$F(h,k,l) = \sum_{j=1}^{atoms} f_j \exp[2\pi i(hx_j + ky_j + lz_j)] \exp[-B_j (\sin\theta/\lambda)^2]$
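
The two numerical examples, and the size of the attenuation term in the structure factor, can be reproduced with a few lines. The function names are hypothetical; the only extra ingredient is Bragg's law, which gives sin(theta)/lambda = 1/(2d) for a reflection at resolution d.

```python
import math

def vibration_amplitude(b):
    """u from B = 8 * pi^2 * u^2 (B in Å^2, u in Å)."""
    return math.sqrt(b / (8.0 * math.pi ** 2))

def thermal_attenuation(b, d_spacing):
    """exp[-B (sin(theta)/lambda)^2] for a reflection at resolution d (Å)."""
    s = 1.0 / (2.0 * d_spacing)    # sin(theta)/lambda from Bragg's law
    return math.exp(-b * s ** 2)

print(vibration_amplitude(80.0))        # close to 1.0 Å
print(vibration_amplitude(20.0))        # close to 0.5 Å
print(thermal_attenuation(20.0, 2.0))   # scattering retained at 2.0 Å for B = 20 Å^2
```

Because the exponent grows with B and with 1/d², atoms with high B factors contribute little to high-resolution reflections.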

Resolution and Refinement
Resolution    Observations/parameters
3.5 Å         0.5
3.0 Å         0.8
2.5 Å         1.4
2.0 Å         2.8
1.5 Å         6.2
- For a protein crystal with a typical packing density and 4 parameters (x, y, z, and B) per non-hydrogen atom.
- At low resolution, where the ratio of observations to parameters is < 1.0, the refinement is poorly determined.

Resolution and Structure
[Figure: maps and model at 1.0 Å, 2.5 Å, 3.0 Å and 4.0 Å]
Source: Cambridge course, http://www-structmed.cimr.cam.ac.uk/course/fitting/fittingtalk.html

Restraints and Constraints
Additional observations are incorporated in the refinement:
- stereochemical data from small-molecule structures, e.g. bond lengths and angles
Constraints: the stereochemical data are taken as rigid and only dihedral angles are varied in the model
- effectively reduces the number of parameters
Restraints: the stereochemical data are allowed to vary around a standard value, controlled by an energy term
$E = E_{chem} + w E_{xray}$
$E_{xray} = \sum_h (|F_{cal}(h)| - k |F_{obs}(h)|)^2$
- measures the difference between the model and the X-ray data
$E_{chem} = \sum (M_{ideal} - M_{model})$
- $M$: bond lengths and angles, torsion angles, van der Waals contacts, etc.

R Factor
Difference between $F_{obs}$ and $F_{calc}$:
$R = \frac{\sum_{hkl} \left| |F_{obs}| - |F_{calc}| \right|}{\sum_{hkl} |F_{obs}|}$
Quality of model:
R factor    Fit
0.00        perfect fit
0.20        good fit
0.60        random fit
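
A minimal sketch of the R factor calculation follows, using four invented reflections. It assumes the observed and calculated amplitudes are already on the same scale.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum | |F_obs| - |F_calc| | / sum |F_obs| over all reflections."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc))
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())

# Made-up amplitudes for four reflections.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
r = r_factor(f_obs, f_calc)
print(r)   # 0.125
```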

Subjectivity and Overfitting
- Subjectivity: misinterpreting the density map
- Overfitting: lowering R factors without removing errors in the model
Protein crystals usually do not diffract to atomic resolution, which leaves room for both of these sources of error.
Overfitting comes from introducing too many adjustable parameters, e.g. fitting too many water molecules to the diffraction data, which compensates for errors in the model or the data. Certain subtle errors introduced by overfitting can still produce a low R factor.

Validation and Model Evaluation
- Ramachandran plot
- Bond lengths/angles
- Homolog structure comparison
- Independent structural solutions

Ramachandran Plot

R free and Cross-validation
$R_{free} = \frac{\sum_{hkl \in T} \left| |F_{obs}| - |F_{calc}| \right|}{\sum_{hkl \in T} |F_{obs}|}$
- Difference between $F_{obs}$ and $F_{calc}$ for the test data
- $hkl \in T$: all the reflections that belong to the test set, a random selection of ~5% of the observed reflections that are never used in refinement
- Every observation contains information from all the atoms in a structure.
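
The bookkeeping behind cross-validation can be sketched as follows: flag a random ~5% of reflections as the test set, then evaluate the same R formula separately on the test and working sets. The function names, the synthetic amplitudes, and the 20% error model are all assumptions for illustration.

```python
import numpy as np

def select_test_set(n_reflections, fraction=0.05, seed=0):
    """Boolean flags marking a random test set of about `fraction` of the reflections."""
    rng = np.random.default_rng(seed)
    n_test = int(round(fraction * n_reflections))
    flags = np.zeros(n_reflections, dtype=bool)
    flags[rng.choice(n_reflections, size=n_test, replace=False)] = True
    return flags

def r_value(f_obs, f_calc):
    """sum | |F_obs| - |F_calc| | / sum |F_obs| over the given reflections."""
    return float(np.abs(np.abs(f_obs) - np.abs(f_calc)).sum() / np.abs(f_obs).sum())

# Synthetic amplitudes: the "model" differs from the data by about 20%.
rng = np.random.default_rng(1)
f_obs = rng.uniform(10.0, 100.0, size=2000)
f_calc = f_obs * (1.0 + rng.normal(0.0, 0.2, size=f_obs.size))

is_test = select_test_set(f_obs.size)
r_free = r_value(f_obs[is_test], f_calc[is_test])
r_work = r_value(f_obs[~is_test], f_calc[~is_test])
print(is_test.sum(), r_work, r_free)
```

No refinement is run in this toy, so the two values come out similar. In a real refinement only the working set drives the model, and a gap that opens between the working R and R free is the signal of overfitting.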

Structure Quality
What to look for:
1) R free near 0.25
2) Resolution (better than 3.5 Å)
3) Completeness ~ 95%
4) Ramachandran plot
5) RMSD of bond lengths and angles from ideal values (<0.02 Å, <2.0°)

Refine and Rebuild Model
- Check the correctness of the model
  - 2Fo-Fc and Fo-Fc maps (re-building)
  - database in O, Coot, XtalView
- Refine the updated model against the diffraction data
  - use R free as a monitor
- Evaluate the structure
  - torsion angles by Ramachandran plot
  - geometry: bond lengths and angles
- Iterate the three steps until the model is satisfactory