Macromolecular Crystallography Part II


Molecular Biology Course 2010 Macromolecular Crystallography Part II University of Göttingen Dept. of Structural Chemistry November 2010 http://shelx.uni-ac.gwdg.de tg@shelx.uni-ac.gwdg.de Crystallography 1/73

Interpretation of Crystallographic Data

Data & Model
Data:
1. Unit cell dimensions (a, b, c, α, β, γ)
2. List of reflections (Miller indices (hkl), intensity I, and error estimate σ(I), each for several thousand reflections)
Model: chemically sensible molecule(s) that are most consistent with the data.

Electron Density Map: Data + Phases
Later we will see why experimental phases (from MAD, SAD, SIR, ...) are not represented by the model.

The Role of the Model
The electron density map is the actual result of an X-ray experiment. The map is difficult to interpret. It does not tell much about the (bio-)chemistry of the molecule.

The Role of the Model
The model lets the map make a lot more sense. It shows the different atom types and how they are connected, and visualises their interactions (e.g. with ligands).

Model & Resolution
Low resolution: helix region of a molecular replacement solution at 3.4 Å. Humps for the side chains can be seen, but not identified. The helix does not look like a staircase, but rather like a rod or cylinder.

Model & Resolution
Low resolution: loop or coil region of the same molecule at 3.4 Å. Breaks in density, no density for side chains.

Model & Resolution
Medium to high resolution: thermolysin at 1.9 Å. Side chains can be distinguished (one Phe ring even shows a hole). Single atoms are not visible, but e.g. S in Met shows stronger density than C or N.

Model & Resolution
Ultra-high resolution: guanine in a 0.95 Å DNA structure. Separate atoms are visible, so clearly that there is not even connected density for the main chain. Model building would be: place atom, name atom.

Model & Resolution
In the previous pictures, all models looked similar: they are atom positions connected by sticks, which we understand as amino acid residues (or nucleotides). How much the model can tell us about the molecule it represents depends, however, on the data resolution and the overall data quality (which are related but not the same!).

The PDB-file

The PDB-file
The PDB-file is the most common format for macromolecular structural information. Its content can be displayed in many ways: ball and stick, CPK (space filling), Cα trace (smooth), Cα trace (coloured by B-factor), ball-and-stick (coloured by B-factor), ribbons.

The Protein Data Bank
Macromolecular structural data from crystallography or NMR are stored at the Protein Data Bank (PDB, www.pdb.org) or the Nucleic Acid Database (NDB, ndbserver.rutgers.edu). The data are stored as PDB-files. Access to the PDB is free. Small-molecule data are stored in the Cambridge Structural Database (CSD), a commercial product for which a license must be obtained. Small-molecule data are stored in CIF format (which we will not discuss).

The PDB-format
The PDB-file is a plain text file which stores the information of the model.
    HEADER    LIGASE                                  28-APR-99   1CLI
    TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
    AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
    REMARK   2 RESOLUTION. 2.50 ANGSTROMS.
    ...
    CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
    ...
    ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
    ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
    ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
                                         X       Y       Z   occ B-fact
    ...
The ATOM lines contain the coordinates and atom types. All other lines contain additional information (publication, resolution, refinement statistics, ...) which is worth reading when working with a PDB-file.
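To make the layout of the ATOM records concrete, the following is a minimal Python sketch that reads one such line. It assumes the simple whitespace-separated form of the three ATOM lines shown on the slide; real PDB files are fixed-column, so a robust reader would slice each field by its column position instead. The names `Atom` and `parse_atom_line` are made up for this illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (illustrative only)."""
    fields = line.split()
    # Assumption: exactly 12 fields, as in the slide's example lines.
    if len(fields) != 12 or fields[0] != "ATOM":
        raise ValueError(f"not a simple ATOM record: {line!r}")
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        res_name=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
        element=fields[11],
    )


atom = parse_atom_line("ATOM 2 CA THR A 5 15.093 82.326 61.723 1.00 22.09 C")
print(atom.name, atom.res_name, atom.res_seq, atom.occupancy, atom.b_factor)
```

Splitting on whitespace fails as soon as two fields touch (for example large coordinates or four-character atom names), which is why fixed-column slicing is the safer choice for real files.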

Occupancy and B-factor
The electron density map obtained from X-ray data is the average over all unit cells in the crystal. Crystallography can only detect those atoms which are at the same position all over the crystal. Two types of deviation from the average can be described in the PDB-file, by the B-factor and by the occupancy. However, when the deviations are too big and too arbitrary, there are no data and the atoms cannot be modelled.

B-Factor
The B-factor of an atom describes its thermal motion. Even though data are usually collected at 100 K, the atoms are not frozen, but move slightly. At about 1.5 Å resolution and better, every atom has six parameters (anisotropic displacement parameters, ADPs) which describe the thermal displacement of the atom in three directions independently. Between 1.5 Å and 3.5 Å resolution there are not enough data for such a detailed description, and the thermal motion is described by only one isotropic B-factor per atom. At worse than 3.5 Å resolution this is reduced even further to one B-factor per residue and eventually to one parameter for the whole molecule.
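To give a feeling for the size of these movements: the isotropic B-factor is conventionally related to the mean square displacement ⟨u²⟩ of the atom by B = 8π²⟨u²⟩. The slide does not state this relation; it is added here as the standard definition. A small Python sketch (the function name is made up) converts a B-factor in Å² into a root-mean-square displacement in Å:

```python
import math


def rms_displacement(b_factor: float) -> float:
    """Return sqrt(<u^2>) in Å for an isotropic B-factor in Å^2 (B = 8 pi^2 <u^2>)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B-factor of the first atom in the 1CLI excerpt shown earlier.
print(round(rms_displacement(20.99), 2))  # prints 0.52
```

A B-factor of about 20 Å² therefore corresponds to a displacement of roughly half an ångström.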

Isotropic vs. Anisotropic B-Factor
[Figure: two panels, isotropic B-factors (spheres) and anisotropic B-factors (ellipsoids)]
The spheres and ellipsoids show the volume where the corresponding atom can be found with a probability of 50%. An ellipsoid is more accurate, but requires high-resolution data to be calculated.

An Example for Occupancy
High-resolution map (1.3 Å). In most parts the positions of the backbone and side chains are visible. At the centre the density looks a little blobby. The main chain splits into two parts: 40% of all unit cells contain one conformation, 60% the other one.


Occupancy vs. B-factor
The occupancy describes discrete conformations of side chains or even whole parts of a molecule. The B-factor describes small movements of atoms. Any larger flexibility is not visible in crystallography. This often affects long side chains, e.g. of Arg, that stick out from the surface of the molecule into the solvent region.

Model Building

Model Building
The creation of a model for the electron density consists of two parts:
1. Model Building
2. Refinement
The two steps are cycled several times (Model Building → Refinement → Model Building → ...) until the crystallographer decides that the model cannot be improved further.

Getting Started
At medium or better resolution (better than 2.5 Å) the first model is most likely created by a program for automated model building (ARP/wARP, shelxe, resolve, ...); such a model can be 80% complete or better. After successful molecular replacement, one also has at least most of the backbone and only needs to make minor corrections. At low resolution, such comfort may not be available and one has to create a model from scratch. The best way to start is to find secondary structure elements, i.e. α-helices and β-sheets. α-helices in particular are visible even at low resolution. For nucleic acids, the backbone phosphates and base stacking are the features to look out for in the electron density map.

α-helices: the Christmas Tree
[Figure: 2.4 Å map after SeMet-MAD]
The side chains of an α-helix, in particular the Cβ atoms, tend to point backwards to the N-terminus of the sequence. This is a good way to get the direction of the helix right.

β-sheets
[Figure: 2.4 Å map after SeMet-MAD]
β-sheets are also striking, but their direction is not as obvious and they can easily be placed the wrong way round.


Sequence Docking
Model building begins with the placement of the Cα atoms. Once they are placed, the Cα chain can relatively easily be turned into a poly-alanine chain (this fixes the direction of the chain). In order to place the side chains correctly, it is good to start with bulky, large side chains like Trp, Phe, Tyr. Marker atoms from the phasing experiment are also good anchors, especially the Se atoms after SeMet-MAD.

Refinement

Phases for the Map
With the phases from MAD, or SAD, or Molecular Replacement, or ..., one calculates an initial electron density map and builds an initial model of the structure.
[Diagram: amplitudes F(hkl) + phases φ(hkl) → initial map → initial model]

Phases for the Map
The phases from any of these methods are usually of very poor quality: they are only rough estimates of the real phases. The phases (and the amplitudes/intensities) can be calculated from the model coordinates. Even a partially complete model provides much better estimates for the phases than any experimentally determined phases. The initial phases are discarded after the very first few cycles of model building. The model itself then serves as the source for the phases required to calculate the electron density.
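How amplitudes and phases follow from model coordinates can be sketched with the standard structure factor sum F(hkl) = Σ_j f_j exp(2πi(h x_j + k y_j + l z_j)). The Python example below is a deliberately simplified illustration, not what a refinement program does in full: it assumes fractional coordinates, constant (resolution-independent) scattering factors, no B-factors and no symmetry. The two atoms are made-up values.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)) over all atoms.

    atoms: list of (f, x, y, z), f being a constant scattering factor and
    x, y, z fractional coordinates (simplifying assumptions, see text).
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )


# Hypothetical two-atom "model": a carbon-like and an oxygen-like scatterer.
atoms = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.40, 0.15, 0.70)]

f_calc = structure_factor((1, 2, 0), atoms)
amplitude = abs(f_calc)                        # comparable to measured |F(hkl)|
phase_deg = math.degrees(cmath.phase(f_calc))  # the quantity we cannot measure
print(f"|Fcalc| = {amplitude:.2f}, phase = {phase_deg:.1f} degrees")
```

The amplitude can be compared with the measurement, while the phase is only available from the model; this is why the model acts as the storage container for the phases.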

The Role of the Model (revisited)
The purpose of the model is therefore twofold:
1. Facilitate understanding and interpretation of the data
2. Serve as storage container for the phases, which we cannot measure experimentally.

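The Refinement-Building-Cycle
[Diagram: the model is refined by a program against the data (the program also checks chemical correctness); the phases φ calculated from the model and the amplitudes F from the data are used to calculate a map; the crystallographer builds the model and matches it to the map, giving a new model that is better with respect to the map; this new model is refined again.]
improving the model = improving the phases = improving the map = improving the model ...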

Refinement vs. Model Building
Model Building: includes local changes to the model. Atoms can be moved over greater distances (several Å), or even added and removed, to ensure that
1. the model fits the calculated electron density map
2. the model makes sense (bio-)chemically (e.g. Do water molecules have partners for hydrogen bonding? Is a metal ion in a chemically sensible environment? Does the ligand orientation make sense?)
Some parts of model building can be automated, but the last check has to be done by the crystallographer in front of a graphical display of the map and the molecule.

Refinement vs. Model Building
Refinement: improves the parameters describing the model (coordinates, B-factors) with respect to the experimental data, i.e. the measured intensities/amplitudes. It is a global improvement of the model and takes stereochemistry into account, i.e. expected bond distances and angles. These values are based on those published by R. A. Engh & R. Huber, Acta Cryst. A47 (1991). Refinement is a computational minimisation procedure carried out by programs (e.g. refmac5, phenix.refine, shelxl).

Data to Parameters Ratio
The spot intensities are the data of the X-ray experiment. The model has to be created so that it is consistent with the data. The model is described by parameters: we want to determine the coordinates and B-factor of each atom to get the model which is most consistent with our data. The more data points we can measure, the more reliably the parameters can be determined. Therefore, high-resolution structures (many reflections) provide better models than low-resolution structures (few reflections).

Data to Parameter Ratio: Example Estimates
Resolution [Å]   Refined parameters (a)              Data/parameters ratio
3.0              x,y,z                               0.9:1
2.3              x,y,z; B                            1.5:1
1.8              x,y,z; B                            3.1:1
1.5              x,y,z; B                            5.4:1
1.5              x,y,z; U11 U12 U13 U23 U22 U33      2.4:1
1.1              x,y,z; U11 U12 U13 U23 U22 U33      6.1:1
0.8              x,y,z; U11 U12 U13 U23 U22 U33      16:1
(a) x,y,z: coordinates; B: isotropic B-value; Uij: anisotropic B-values (G. Sheldrick)
Technically there would not be enough data points to create a reliable model at resolutions worse than about 1.8 Å. The data to parameter ratio can be improved by additional (bio-)chemical etc. information. This information can take the form of constraints and restraints.
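The two rows at 1.5 Å show how the choice of model changes the ratio for the same data: going from 4 parameters per atom (x, y, z, B) to 9 parameters per atom (x, y, z and six Uij) divides the ratio by 9/4. A short Python check of that arithmetic (the function name is made up; the per-atom parameter counts are the only assumption):

```python
PARAMS_ISOTROPIC = 4    # x, y, z, B
PARAMS_ANISOTROPIC = 9  # x, y, z, U11, U22, U33, U12, U13, U23


def rescale_ratio(ratio: float, old_params: int, new_params: int) -> float:
    """Data/parameter ratio for the same data after changing parameters per atom."""
    return ratio * old_params / new_params


# 1.5 Å row of the table: 5.4:1 with isotropic B-values.
print(round(rescale_ratio(5.4, PARAMS_ISOTROPIC, PARAMS_ANISOTROPIC), 1))  # prints 2.4
```

The result reproduces the 2.4:1 listed in the table for anisotropic refinement at 1.5 Å.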

Restraints and Constraints
Constraints and restraints are introduced into refinement in order to improve the data to parameter ratio.
Restraints increase the number of data. They are "should be" or "should be approximately" expressions (mathematically: inequalities), e.g. angle (N, C, O) ≈ 122°.
Constraints reduce the number of parameters. They are "must have" or "must be" expressions (mathematically: equalities), e.g. the sequence information is a very important constraint.
The Engh-Huber parameters are restraints. Only small molecules at very high resolution (better than 0.8 Å) can be freely refined, i.e. without using constraints and restraints.
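One common way to picture the difference, given here as a generic least-squares sketch rather than the exact target function of any particular program: a restraint adds a weighted penalty term to the function being minimised, so it acts like an additional observation, whereas a constraint is an exact equation that removes a parameter. The symbols w_hkl, w_r, t_r and the occupancies o_A, o_B are notation introduced for this illustration.

```latex
% Restraints: extra "observations" t_r (e.g. a target bond length or angle)
% added to the data term of the minimised function M.
M = \sum_{hkl} w_{hkl}\,\bigl(|F(hkl)| - |F_{\mathrm{calc}}(hkl)|\bigr)^{2}
  + \sum_{r} w_{r}\,\bigl(t_{r} - t_{r}^{\mathrm{model}}\bigr)^{2}

% Constraint: an exact relation, e.g. the occupancies of two alternate
% conformations A and B must add up to one, so only o_A is refined.
o_{A} + o_{B} = 1 \quad\Longrightarrow\quad o_{B} = 1 - o_{A}
```

In the occupancy example shown earlier (40% and 60%), only one of the two occupancies would need to be refined under such a constraint.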

Model Bias and Overfitting
The refinement programs minimise the difference between Imeas(hkl) and Icalc(hkl) ...
... including the difference density for this beautifully displayed Phe, which is missing in the model, because the refinement program does not add or remove atoms; it only wiggles them around.

Model Bias and Overfitting
At high resolution such strong difference density would not disappear: it is too strongly anchored in the data I(hkl), no matter what the phases say (i.e. what the model says). But ...
... at low or medium resolution ...
... at the beginning of model building ...
... after molecular replacement ...
the phases are still poor, and when the data are weak (low or medium resolution), it may happen that the refinement program levels out such features from the difference map, or enhances errors in the model.
Therefore: always do as much model building as possible before running the refinement program.

Refinement Summary
Model building and refinement are a bit of a vicious circle: because of the lack of reliable experimental phases, the model is required for creating the model. Especially at low resolution it is easy to introduce or overlook errors. Therefore it is important to understand what refinement does and to validate the structure one has built.

Validation

Why do we Need Validation?
"The mistake so clearly illustrates [...] that those lovely colored ribbons festooning the covers and pages of journals are just models, not data [...]" (C. Miller, Science 2007, 315, p. 459, about the Great Pentaretraction)
"This showed that the structures of MsbA and EmrE were incorrect [...]. In this case, unfortunately it appears that the incorrect structures have had serious adverse effects on the development of the field and possibly also on the distribution of grant money." (A. M. Davis, S. A. St-Gallay, G. J. Kleywegt, Drug Discovery Today (2008), Vol. 13, pp. 831-841)

Creativity in Crystallography
Scientific presentations should display the results of scientific experiments.

Creativity in Crystallography
The way crystal structures are presented allows for a lot of fantasy and creativity, e.g. because atoms have no colour.

Crystallography is Seductive
Crystallography has always been computer based. Crystallography programs are well advanced and easy to use. It is easy to play around and reassemble various structures. Pictures easily stay in mind, and sometimes it is overlooked that the picture displays pure imagination. It is becoming more and more difficult to publish a structure by itself: there has to be a story.

Low Resolution Maps are Difficult to Interpret
[Figure: correct structure of SarA protein (PDB ID 2FRH, 2006) compared with the previously determined structure of SarA (PDB ID 1FZN, 2001); from Davis, St-Gallay, and Kleywegt, Drug Discovery Today (2008), Vol. 13, p. 831]
Structure validation has advanced so that such drastic examples of misinterpretation have become very unlikely.


Waters cover up for Anything
Initial interpretation of PPAR-β/δ as apo-form (PDB ID 2GWX, 1999). Re-evaluation of the same data reveals a fatty acid bound in the active site (PDB ID 2BAW, 2006) (Davis, St-Gallay, and Kleywegt, Drug Discovery Today (2008), Vol. 13, p. 831).
Placing water molecules into unexplained density may lower the R-value (see below), but may also wash out features of the data.

Validation: Who and When
Crystallographers must validate to ensure they deposit a correct structure. Users (including non-crystallographers) should check the quality of the deposited model before drawing conclusions. Understanding the quality of a structure becomes particularly important for non-crystallographers involved in ligand design.

Misconceptions in Crystallography
The main misconceptions about crystal structures:
1. Correct structure: correct amino acid sequence (despite possible undetected cloning/PCR errors); complete model (including ligands & waters); correct and accurate coordinates.
2. In vivo significance: crystallisation conditions can be quite unnatural, e.g. crystallisation kits vary the pH between 4 and 9. While this probably does not affect the overall structure, it might well affect the interaction between ligand and protein (protonation state, ...).

Means of Validation
There are two types of validation:
1. Conformance of the model with what we expect, e.g. for the model: bond angle deviation, bond distance deviation, R, Rfree; for reflections: completeness, I/σ(I), Rint.
2. Validation of the correctness of the model by information/knowledge that was not used during the construction process of the model.
The author of a structure should understand both types; a user of the PDB should at least understand the second type.

The R-value
The term R-value is very common in statistics: the letter R stands for "residual", and one usually has to work out the correct definition from the context. Even within crystallography there are several R-values. The general meaning of an R-value is, however, always the same: it describes the discrepancy between measured data and calculated or predicted data, i.e. it tests our theory/model against the experiment.

Data-Conformance: The Rwork
The R-value for refinement is sometimes called Rwork. It is calculated by all refinement programs (phenix, refmac5, shelxl, ...) as
$$R_{\mathrm{work}} = \frac{\sum_{hkl} \bigl|\, |F(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}{\sum_{hkl} |F(hkl)|}$$
The programs differ, though, in how they calculate Fcalc.
At normal resolution ranges (1.8-3 Å, say) the Rwork should be around one tenth of the resolution value, e.g. a 2.3 Å data set should have a final Rwork around 0.23 = 23%. Irrespective of the resolution, an Rwork worse than 30% should raise suspicion. For more precise estimates, see e.g. Tickle et al., Acta Cryst. 1998, D54, pp. 547.
NB: For an engineer, such an R-value (for any type of measurement) would be horrendously high. But that is the fate of protein crystallography: the data are very poor and we have to make the best of it.
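As a worked illustration of the definition, here is a minimal Python sketch that computes an R-value from two lists of amplitudes. It assumes the calculated amplitudes are already on the same scale as the observed ones (refinement programs also determine a scale factor, which is omitted here), and the numbers are made up.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 60.0, 20.0, 30.0]
print(r_factor(f_obs, f_calc))  # prints 0.15, i.e. 15%
```

The absolute differences add up to 30 and the observed amplitudes to 200, which gives R = 0.15.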

Limits of Rwork
The rationale behind Rwork seems reasonable: we want to create a model that comes as close as possible to the data, so we want to reduce its difference from the data, i.e. the Rwork. However, it is possible to reduce the Rwork arbitrarily, e.g. by filling up the difference density with water molecules. This is called overfitting and probably happens to some extent in all structures which are not at atomic resolution (1.2 Å or better). It is e.g. possible to fit a protein completely the wrong way round and still arrive at the same Rwork (Kleywegt, Jones, Structure 1995, pp. 535).


Fooling Rwork
A drastic example of how to create a model that fits the data well:
1. Take a 1.2 Å data set.
2. Fill the unit cell with a grid of atoms (no chemical meaning at all).
3. Refine the atoms without constraints or restraints.
The result still makes no chemical sense at all. What does Rwork say?

Fooling Rwork
Resulting Rwork at different resolutions:
Resolution   Rwork
3.0 Å        7.2%
2.5 Å        17.8%
2.0 Å        27.7%
1.5 Å        33.7%
1.2 Å        38.6%
Observations: the Rwork drops as the resolution gets worse: the lower the resolution, the easier it is to fool Rwork. At 3 Å resolution Rwork is suspiciously low. Even at 1.2 Å resolution, the Rwork is not outrageously high, but could simply indicate an incomplete model, e.g. after molecular replacement.

The Guard: Rfree
A very good way of validating a structure is to give the data to two (or more) crystallographers and have them build a model independently. When the structures agree, the structure is probably correct (at least some of the subjectivity of the structure would be removed). This approach is rather impractical. One calculates the Rfree instead. The concept of Rfree has been known in statistics for some time and was introduced to crystallography in 1992 by Axel Brünger.

The Guard: Rfree
Rwork suffers from model bias; it can easily be fooled. Therefore, a test set of 500-1000 randomly selected reflections is generated before refinement and model building. This test set is put aside and never used for model building (map generation) or refinement. Rfree is calculated just like Rwork, but because the test set is not used in refinement or model building, it is not biased.
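The bookkeeping behind Rfree can be sketched in a few lines of Python. This is only an illustration of the split, under stated assumptions: the amplitudes are made up, the test set is tiny, no refinement takes place, and the function names are hypothetical. In a real project the test-set flags are stored with the reflection file and kept fixed for the whole structure determination.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def split_reflections(n_reflections, n_test, seed=0):
    """Return (work, test) index lists; the test set is drawn at random once."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_reflections), n_test))
    test_set = set(test)
    work = [i for i in range(n_reflections) if i not in test_set]
    return work, test


# Made-up data: 20 reflections, 5 of them set aside as the test set.
f_obs = [float(10 + i) for i in range(20)]
f_calc = [f + (1.0 if i % 2 else -1.0) for i, f in enumerate(f_obs)]
work, test = split_reflections(len(f_obs), n_test=5)

# Only the work reflections would be passed to refinement and map calculation.
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because nothing is refined in this toy example, the two values differ only by chance; the gap between Rwork and Rfree appears once a model has been fitted against the work set alone.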

The Guard: Rfree
The value of Rfree should be roughly 3-5% worse than Rwork (again, see Tickle et al., Acta Cryst. 1998, D54, pp. 547). Example: one of the structures of the Great Pentaretraction reported an Rwork = 38% and Rfree = 45%. Even though this is only a 4.5 Å structure, both the absolute values and the gap of 7% are very high. The correctly refined structure reported Rwork = 28% and Rfree = 31% (P. D. Jeffrey, Acta Cryst. 2009, D65).

Rfree of our Thought Experiment
Resolution   Rwork    Rfree
3.0 Å        7.2%     55.3%
2.5 Å        17.8%    52.6%
2.0 Å        27.7%    56.6%
1.5 Å        33.7%    54.5%
1.2 Å        38.6%    54.4%
The Rfree is equally poor at all resolutions (an R-value above 50% generally indicates a random model).

Global and Local Validation
With Rwork and Rfree we have one number each that describes the quality of the whole model. They are therefore called global quality indicators or global figures of merit. More detailed insight into the quality of a model is provided by local quality indicators like
1. Real space correlation coefficient
2. Ramachandran plot
3. Kleywegt plot.

Real Space Correlation Coefficient
The Real Space Correlation Coefficient (RSCC) compares the model to the electron density on a per-residue basis (instead of calculated data to measured data, as the R-values do).
[Figure: RSCC (black) and B-factor (blue) for the BotLC/B protease in complex with synaptobrevin-II (2.0 Å)]
The B-factor for the ligand is very high and the RSCC for the ligand very low. Maybe the authors were too optimistic when fitting the ligand.
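The idea of a correlation in real space can be illustrated with an ordinary (Pearson) correlation coefficient between two sets of density values sampled at the same grid points around a residue: one from the map, one calculated from the model. The Python sketch below assumes exactly that simplified setup; how validation servers select grid points and scale maps is not covered, and the density values are made up.

```python
import math


def correlation(rho_obs, rho_calc):
    """Pearson correlation between two equally long lists of density values."""
    n = len(rho_obs)
    if n != len(rho_calc) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_o * var_c)


# Hypothetical density samples around a well-fitted and a poorly fitted residue.
map_values = [0.2, 0.9, 1.4, 0.8, 0.1]
good_model = [0.3, 1.0, 1.5, 0.7, 0.2]
poor_model = [1.2, 0.3, 0.2, 0.9, 1.1]
print(round(correlation(map_values, good_model), 2))
print(round(correlation(map_values, poor_model), 2))
```

A value close to 1 means that the model density follows the map; values near zero or below indicate that the residue or ligand is not supported by the density.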

The Ramachandran Plot
The Ramachandran plot shows the φ vs. ψ backbone dihedral angles for a structure, together with the most probable regions derived from 500 high-quality protein structures.
[Figure: interactive Ramachandran window of the model building program Coot, with the β-strand and α-helix regions labelled]
Everything outside the shaded region is an outlier and deserves a closer look. Outliers can be justified in well-ordered regions if, e.g., the residue makes a special contact with another residue.
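The angles plotted in a Ramachandran plot are ordinary dihedral angles between four consecutive backbone atoms: φ is defined by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). A minimal Python sketch of the dihedral calculation follows; the helper names and the test coordinates are made up, and the sign convention is the usual one (cis = 0°, trans = 180°).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four made-up points with a right-angle twist around the central bond.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

For φ one would pass the coordinates of C(i-1), N(i), Cα(i), C(i) in that order, and for ψ those of N(i), Cα(i), C(i), N(i+1).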

The Kleywegt Plot
The Kleywegt plot is derived from the Ramachandran plot and applies when homo-oligomers are present in the crystal. It compares corresponding dihedral angles in the different molecules and plots large deviations. Proteins crystallise from solution. Therefore, all molecules in the crystal are similar, and their Ramachandran plots should be similar, unlike in this example. (Kleywegt, Acta Cryst. (2000), D56)
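The comparison behind such a plot can be sketched as follows: for every residue present in two copies of the molecule, take the difference in φ and in ψ (respecting the 360° periodicity) and flag residues where the combined deviation is large. This Python example is only an illustration; the threshold of 30° and the angle values are arbitrary assumptions, not values from the slide or from any validation program.

```python
import math


def angle_difference(a, b):
    """Smallest signed difference a - b between two angles in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def ncs_deviations(chain_a, chain_b, threshold=30.0):
    """Flag residues whose (phi, psi) differ strongly between two chains.

    chain_a, chain_b: dict mapping residue number -> (phi, psi) in degrees.
    threshold: arbitrary cut-off in degrees (an assumption for this sketch).
    """
    flagged = []
    for res in sorted(set(chain_a) & set(chain_b)):
        dphi = angle_difference(chain_a[res][0], chain_b[res][0])
        dpsi = angle_difference(chain_a[res][1], chain_b[res][1])
        distance = math.hypot(dphi, dpsi)
        if distance > threshold:
            flagged.append((res, round(distance, 1)))
    return flagged


# Made-up angles for three residues in two copies of the same protein.
chain_a = {10: (-60.0, -45.0), 11: (-120.0, 130.0), 12: (175.0, 170.0)}
chain_b = {10: (-63.0, -41.0), 11: (60.0, 40.0), 12: (-178.0, -175.0)}
print(ncs_deviations(chain_a, chain_b))  # prints [(11, 201.2)]
```

Residue 12 is not flagged although its raw angle values look very different (175° vs. -178°): after accounting for periodicity the two copies differ by only a few degrees.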

Validation Tools
These validation programs are freely available and everyone using PDB-files should be aware of them.
Uppsala EDS (eds.bms.uu.se/eds)
MolProbity (molprobity.biochem.duke.edu)
WhatCheck (swift.cmbi.ru.nl/gv/whatcheck)
The PDB web service www.pdbe.org has a direct link to the EDS server, which offers several visualisation tools for validation.

Molprobity
The Molprobity server provides a convenient way to check the geometry of a PDB-file. It works for both proteins and nucleic acids. Molprobity checks for
(too) close contacts of atoms, e.g. between ligand and protein
possible flips for Asn, Gln, His
Ramachandran plot
... and a few more things

Molprobity - Flips
At anything less than atomic resolution one cannot distinguish between N, C, O. For His, Gln, Asn, this means we cannot tell their orientation except by investigating the chemical environment (network of hydrogen bonding).

[Figure: Molprobity example output - flips]

[Figure: Molprobity example output - rotamers, clashes, ...]

[Figure: Molprobity example output - score]

Final Advice
The Molprobity server should be used by everyone who wants to submit a structure to the PDB, and all of its complaints should be thoroughly investigated and explained. The Molprobity server can also be used for structures containing nucleic acids. Users of the PDB can use it to make sure the structure of interest is at least statistically sound. The EDS service (via www.pdbe.org) should definitely be used if the structure contains a (biologically interesting) ligand.