Removing bias from solvent atoms in electron density maps

Journal of Applied Crystallography, ISSN 0021-8898. Editor: Anke R. Pyzalla.

Removing bias from solvent atoms in electron density maps. Eric N. Brown. J. Appl. Cryst. (2008). 41, 761-767.

Copyright © International Union of Crystallography. Author(s) of this paper may load this reprint on their own web site or institutional repository provided that this cover page is retained. Republication of this article or its storage in electronic databases other than as specified above is not permitted without prior permission in writing from the IUCr. For further information see http://journals.iucr.org/services/authorrights.html

Received 22 February 2008. Accepted 30 May 2008.

Eric N. Brown
Department of Biochemistry, University of Iowa, Iowa City, IA, USA. Correspondence e-mail: eric-n-brown@uiowa.edu

© 2008 International Union of Crystallography. Printed in Singapore, all rights reserved.

Atomic structures of proteins determined via protein crystallography contain numerous solvent atoms. The experimental data for the determination of a water molecule's O-atom position is often a small contained blob of unidentified electron density. Unfortunately, the nature of crystallographic refinement lets poorly placed solvent atoms bias the future refined positions of all atoms in the crystal structure. This research article presents the technique of omit-maps applied to remove the bias introduced by poorly determined solvent atoms, enabling the identification of incorrectly placed water molecules in partially refined crystal structures. A total of 160 protein crystal structures with 45 912 distinct water molecules were processed using this technique. Most of the water molecules in the deposited structures were well justified. However, a few of the solvent atoms in this test data set changed appreciably in position, displacement parameter or electron density when fitted to the solvent omit-map, raising questions about how much experimental support exists for these solvent atoms.

1. Introduction

Unlike small-molecule crystals, protein crystals contain a large quantity of water (Matthews, 1968). Water molecules that are ordered or semi-ordered in the crystal lattice contribute greatly to the X-ray diffraction but are not as tightly restrained during refinement as the protein components of the crystal. Thus, incorrectly placed water molecules can bias electron density maps and, through further model building and refinement, the protein structure.
Techniques such as simulated annealing and omit-maps can be used to remove or identify bias in crystal structures (Artymiuk & Blake, 1981; Bhat & Cohen, 1984; Bhat, 1988; Hodel et al., 1992; Adams et al., 1999; Terwilliger et al., 2008).

In the building of X-ray crystallographic structures, prior structures can be utilized to generate initial phases for structure building (Read, 2001). Often, all solvent atoms are removed and protein side chains are trimmed to include only the conserved side chains or atoms (Schwarzenbacher et al., 2004). Although it is rare to include water molecules in traditional molecular-replacement techniques, when determining the structure of mutant proteins or proteins with ligands bound, an earlier structure including water molecules can be used to increase the efficiency of the crystallographer. If water molecules are retained, they may bias the final model structure.

Here we present an omit-map method developed exclusively for the unbiasing of solvent atoms in protein crystal structures. It is implemented as a straightforward Perl script whose inputs are the structure factors and coordinates of the crystallographic structure and whose output is an unbiased electron density map covering the solvent in the structure. This map can be used to identify and remove the biasing effects of misplaced solvent molecules. Finally, a statistical model is presented that estimates the log-likelihood of a given set of water molecules and can be used to make objective decisions about which water molecules to remove from the structure.

The initial use for this method was for identification of poorly resolved solvent atoms in numerous crystal structures of proteins with active-site mutations. Ferraro et al. (2006, 2007) used the all-atom wild-type model as the starting structure for each active-site mutation.
After introducing active-site mutations in the correct locations, the solvent omit-map method was used to eliminate water molecules that were absent in the mutant crystals prior to further model building and refinement.

A primary use for this method is to simplify the solvent content of protein crystal structures. It is important for crystallographers to refine and publish protein crystal structures that are well supported by statistically significant experimental diffraction data. Excess, unsupported water molecules included in refinement introduce bias by highlighting noise present in the solvent region of the crystal. Their presence can also complicate later interpretation of the structure. The solvent omit-map technique provides a method by which unlikely and unsupported water molecules can be identified and removed from a crystal structure prior to structure deposition and publication. By removing water molecules whose presence does not contribute to the quality of the structure, simplified protein crystal structures are created, thus easing their future interpretation.

doi:10.1107/S0021889808016609

1.1. Omit-maps

All omit-map techniques attempt to remove the bias introduced by a set of atoms from the electron density map (Artymiuk & Blake, 1981; Bhat & Cohen, 1984; Langs et al., 2001a,b; Terwilliger et al., 2008). These methods basically (1) ignore a subset of either observed amplitudes or modeled atoms (the omit step), (2) shake up the remaining model to remove any bias introduced by the ignored information, (3) re-refine the remaining model to optimize the model on the basis of the diffraction data, and (4) predict the ignored model data on the basis of the newly refined model.

There are numerous points in the omit-map technique where the crystallographer makes choices. The first step is choosing a subset of the data to omit. The data selected to omit could be geometrically neighboring atoms (Artymiuk & Blake, 1981; Bhat & Cohen, 1984; Bhat, 1988) or geometrically close solvent atoms (this method). The second step involves removing the existing bias of those data omitted from the remaining model. Commonly used methods to remove bias include randomly displacing every remaining atom and simulated annealing (Hodel et al., 1992). Thirdly, the remaining model is refined. A common procedure is to refine the coordinates of remaining atoms in the structure using existing techniques (Terwilliger et al., 2008). This completely ignores the impact of the atomic details that have been omitted (Bhat & Cohen, 1984). Other procedures utilize simulated annealing or density refinement on the remaining data to refine the electron density and hence the model (Hodel et al., 1992). The fourth step in the omit-map technique is to predict the data that had initially been omitted. The traditional method is to extract the electron density for the region of space covered by the atoms that had been omitted (Vellieux & Dijkstra, 1997). Each of these steps is repeated multiple times, each time omitting different subsets of the data.
The final output is the average of the electron density maps predicted in the last step of each iteration. This electron density map is then presented to the crystallographer for manual model building or comparison with electron density maps calculated by alternative means.

1.2. Identification of solvent atoms

A general rule of thumb is that one water molecule can be found for every residue in a protein structure at 2.0 Å resolution (Carugo & Bordo, 1999). Using a set of low-temperature crystal structures, Carugo and Bordo estimated the number of water molecules, $N_{\mathrm{H_2O}}$, as $N_{\mathrm{H_2O}} = N_{\mathrm{at}}(0.334 - 0.11\,r_{\max})$, where $r_{\max}$ is the resolution (in ångströms but used without units) and $N_{\mathrm{at}}$ is the total number of protein atoms. The standard error of this estimate is $0.043\,[0.030 + 0.167\,(r_{\max} - 2.2)^2]^{1/2}$. A thorough analysis of the effect of temperature on the structure of lysozyme found that the number of water molecules bound to main-chain atoms was temperature independent, while the number of water molecules near side chains varied inversely with temperature (Kurinov & Harrison, 1995). Most of the well ordered solvent atoms were within 4.0 Å of the protein surface and had on average 2.6 neighboring atoms.

Automated methods exist to insert solvent atoms into X-ray crystallographic structures. The ARP/wARP method iteratively adds water molecules (Morris et al., 2004). During each iteration, it identifies positive density regions in the difference electron density map that are within a set distance, $r_{\mathrm{protein}}$, of existing O and N atoms ($2.3 \le r_{\mathrm{protein}} \le 3.5$ Å). Additional water molecules are then added in the maximal density regions. This is followed by the removal of the water molecules fitting other criteria.
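As a worked illustration of the Carugo & Bordo (1999) estimate quoted earlier in this section, the following minimal Python sketch evaluates the linear relation for a given atom count and resolution. The function name, the clamp at zero and the example inputs are illustrative assumptions, not part of the published method.

```python
def expected_waters(n_protein_atoms, resolution):
    """Carugo & Bordo (1999) estimate: N_H2O = N_at * (0.334 - 0.11 * r_max).

    resolution is r_max in angstroms, used without units.
    """
    waters = n_protein_atoms * (0.334 - 0.11 * resolution)
    # Assumption: clamp at zero, since the linear fit turns negative
    # for resolutions worse than about 3.0 angstroms.
    return max(waters, 0.0)


if __name__ == "__main__":
    # Hypothetical structure: 1000 protein atoms at 2.0 angstrom resolution.
    print(round(expected_waters(1000, 2.0)))
```

For the hypothetical 1000-atom structure at 2.0 Å this gives roughly 114 water molecules, in line with the rule of thumb of about one water per residue.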
The water_pick method of CNS (Brünger et al., 1998; Brunger, 2007), the ordered_solvent method of PHENIX (Afonine et al., 2005), and the CCP4 programs peakmax and watpeak (Collaborative Computational Project, Number 4, 1994) work in a similar manner but with different distance constraints. SHELX also contains a program, SHELXWAT, that finds peaks in the difference electron density map and classifies them as water molecules (Sheldrick, 2008).

The crystallography editing software O (Jones et al., 1991) and COOT (Emsley & Cowtan, 2004) also implement water molecule finding functions. Similar to a single iteration of the ARP/wARP water molecule finding algorithm, they search for peaks in the electron density map that are near the protein. In addition to using an electron density cutoff to identify peaks, both programs use the spherical shape of the electron density around a prospective peak to choose candidate water molecules. This limits the placement of water molecules in the electron density of larger ligands.

The present study, in addition to proposing a solvent omit-map methodology, utilizes the electron density of the solvent atoms to determine their validity. The interpolated solvent omit-map electron densities at the positions of all solvent atoms are extracted. Since the aim here is to detect incorrect water molecules rather than locate additional solvent molecules, distance restraints similar to those used by ARP/wARP and COOT are not employed. Instead, the statistical distribution of interpolated electron density and difference electron density is used to estimate the likelihood of a solvent atom having those characteristics.

2. Methods

2.1. Solvent omit-map

The solvent omit-map method iteratively builds up an electron density map in the region covered by existing solvent atoms that is not biased by any prior water molecule locations.
The algorithm has been coded in Perl and utilizes the existing CCP4 programs MTZDUMP, MAPMASK, NCSMASK, REFMAC, OVERLAPMAP, PDBSET and FFT (Collaborative Computational Project, Number 4, 1994; Murshudov et al., 1997; Pannu et al., 1999; Ten Eyck, 1973).

The algorithm created here proceeds in iterations as follows: a solvent atom that has not been previously processed, $s_i$, is chosen at random. An intermediate Protein Data Bank (PDB) file containing all atoms except solvent atoms within $r + 4$ Å of $s_i$ is created. The locations of all of these atoms are randomly perturbed and then refined using REFMAC. FFT is then used to generate an electron density map. The electron density values within $r + 2$ Å of $s_i$ are extracted and represent the nonbiased electron density for all solvent atoms within $r$ Å of $s_i$. This density is merged with the growing omit-map and all solvent atoms within $r$ Å of $s_i$ are marked as processed. Finally, an unprocessed solvent atom, $s_{i+1}$, is chosen and the procedure is repeated until all solvent atoms have been processed.

The algorithm run-time is a function of the number of iterations needed to cover all water molecules and the number of cycles of maximum likelihood refinement performed by REFMAC. Each iteration updates the omit-map electron density for a volume equal to $(4/3)\pi(r + 2)^3$ Å$^3$. To minimize the number of iterations, the radius of the omit-map region created at each iteration, $r$, can be increased. The default omit-map radius of $r = 20.0$ Å was chosen to balance run-time versus the quality of the resulting omit-map. The algorithm has been implemented in parallel to accelerate omit-map generation on multi-core and multi-processor workstations.

2.2. Statistical modeling

All protein structures that had coordinates and structure factors deposited in the PDB in the first three months of 2006 were downloaded and tested with the solvent omit-map method. Structures that did not contain water molecules, contained ligands unknown to REFMAC, lacked a FREE column in the deposited structure factors, contained unit cells too large for MAPMASK, or refined with REFMAC to unrealistically high $R$ or $R_{\mathrm{free}}$ factors were excluded from the analysis. A modified version of NCSMASK was compiled that allowed for larger maps to be used. This required the modification of just one parameter, maxsec, in the source code.
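To make the iteration scheme of Section 2.1 concrete, the following Python sketch plans the omit iterations for a list of solvent coordinates. It is only a sketch under stated assumptions: it is not the author's Perl script, it ignores crystal symmetry and periodic images, and it performs no perturbation, REFMAC refinement or FFT map calculation. The function names and the returned dictionary layout are hypothetical.

```python
import math
import random


def plan_omit_iterations(solvent_xyz, r=20.0, seed=0):
    """Plan omit-map iterations over solvent coordinates (angstroms).

    Each step records the randomly chosen centre s_i, the solvent atoms
    omitted from the intermediate model (within r + 4 of s_i) and the
    solvent atoms marked as processed (within r of s_i).
    Assumption: plain Euclidean distances, no crystal symmetry.
    """
    rng = random.Random(seed)
    unprocessed = set(range(len(solvent_xyz)))
    plan = []
    while unprocessed:
        centre = rng.choice(sorted(unprocessed))
        dist = [math.dist(solvent_xyz[centre], p) for p in solvent_xyz]
        omitted = [i for i, d in enumerate(dist) if d <= r + 4.0]
        covered = [i for i, d in enumerate(dist) if d <= r]
        plan.append({"centre": centre, "omitted": omitted, "covered": covered})
        unprocessed.difference_update(covered)  # the centre is always covered
    return plan


def map_update_volume(r=20.0):
    """Volume (cubic angstroms) of the sphere of radius r + 2 merged per iteration."""
    return 4.0 / 3.0 * math.pi * (r + 2.0) ** 3


if __name__ == "__main__":
    waters = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    print(len(plan_omit_iterations(waters)), "iterations")
    print(round(map_update_volume(20.0)), "cubic angstroms per iteration")
```

The sketch shows why a larger radius reduces the iteration count: every iteration marks all solvent atoms within $r$ of the chosen centre as processed, so fewer centres are needed to cover the structure.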
Each structure was subjected to the solvent omit-map method using a radius of $r = 20.0$ Å. This generated an electron density map covering all existing solvent atoms. The COOT (Version 0.3.1) scripting function fit-waters was then used to real-space refine the positions of each water molecule into the solvent omit electron density map (Emsley & Cowtan, 2004). This was followed by five cycles of maximum likelihood refinement against the original diffraction data using REFMAC. The changes in position and in the isotropic displacement parameters (B values) were monitored for every water molecule in each structure. Since the structures had been deposited by crystallographers utilizing differing refinement protocols, the original structure was also run through five cycles of REFMAC refinement for comparison with the refined structures.

To identify superfluous water molecules, a statistical model was constructed that estimated the likelihood of observing each water molecule in the structure. This likelihood was based on the electron density at the water molecule in the solvent omit-map ($\rho_i$), the distance the water molecule moved upon refinement against the solvent omit-map ($x_i$) and the change in B value upon refinement against the solvent omit-map ($\beta_i$). The interpolated electron density from the solvent-atom positions was extracted from the omit-map by the Uppsala Software Factory program MAPMAN (Kleywegt & Jones, 1996).

Table 1. Statistics for the 160 structures used in validation of the solvent omit-map procedure. The quality metric is defined by Brown & Ramaswamy (2007); lower is better. This quality metric has a mean of zero and a standard deviation of one computed over 16 000 structures in the PDB.

                     Average    Range
Resolution (Å)       2.03       1.13 to 3.40
Atoms                4687       603 to 39 642
Water molecules      287        3 to 1930
Quality metric       0.33       -2.21 to 4.10
It is assumed that water molecules with little experimental support will have little electron density in the solvent omit-map, a large change in position or a large change in B value. The statistical distributions of these three measures were determined from all monitored water molecules using the statistical software R (R Development Core Team, 2006). The likelihood of each individual water molecule was then computed using the determined probability distributions: $P(\rho_i, x_i, \beta_i) \approx P(\rho \le \rho_i)\,P(x \ge x_i)\,P(\beta \ge \beta_i)$. The computed likelihoods of all water molecules were ranked to identify the least likely 5% of the water molecules. These water molecules were removed from the structures and the structures re-refined. Finally, the quality metric developed by Brown & Ramaswamy (2007) was used to compare the qualities of a structure before and after excluding water molecules. A paired Student's t-test was employed in R.

3. Results

A total of 160 protein crystal structures with 45 912 distinct water molecules were processed. Summary information for those structures is presented in Table 1. The number of iterations required to generate the density for all solvent atoms ranged from 1 to 68 (Fig. 1) and appears to be weakly correlated with the number of water molecules in the structure (cycles $= 3.8 + 0.02\,N_{\mathrm{H_2O}}$, $R^2 = 0.401$). For example, the protocol took 68 iterations to generate an electron density map for all 642 water molecules present in the structure of the G6 anti-vascular endothelial growth factor antibody (PDB code 2fjf; Fuh et al., 2006). On the other hand, only a single iteration was required to process all 103 water molecules of the β-PIX SH3 domain (PDB code 2g6f; Hoelz et al., 2006).

Figure 1. Number of iterations required for solvent omit-map creation.

The electron density in the solvent omit-map is a measure of how much experimental support exists for each water molecule in the original protein structure. To compare the likelihood of individual water molecules, the logarithm of the omit-map electron density of the refined solvent atoms was fitted to a shifted and scaled Student's t-distribution, defined as $t(x; m, s, d_f) = (1/s)\,t[(x - m)/s; d_f]$, where $t(x; d_f)$ is the t-distribution with $d_f$ degrees of freedom. The fitted distributional parameters were shift $m_\rho = -1.152\,(4)$, scale $s_\rho = 0.668\,(4)$ and degrees of freedom $d_{f,\rho} = 6.2\,(2)$ (Fig. 2). This t-distribution was chosen since it generalizes both Cauchy and normal distributions: a Cauchy distribution is a t-distribution with one degree of freedom, while a normal distribution is a t-distribution with infinite degrees of freedom.

Figure 2. Distribution of peak electron density in the solvent omit-map at the solvent atom's refined position. The line labeled Fit is a scaled Student's t-distribution with parameters $m_\rho = -1.152$, $s_\rho = 0.668$ and $d_{f,\rho} = 6.2$.

Following refinement of the structure using the solvent omit-map, the coordinates and B values for water molecules changed. The set of water molecule displacements obtained from the solvent omit-map algorithm was fitted to a lognormal distribution with mean $\mu_x = -2.549\,(9)$ Å and standard deviation $\sigma_x = 1.056\,(6)$ (Fig. 3). The vast majority of the 45 912 water molecules shifted position very little when refined using the solvent omit-map's electron density. Changes in B values were fitted to a Cauchy distribution [location $\beta_0 = 0.192\,(6)$ Å$^2$ and scale $\gamma_\beta = 0.601\,(7)$; Fig. 3]. More than 97% of the isotropic displacement parameters changed by less than 5 Å$^2$. The water molecules that had an increase in displacement parameter by more than 5 Å$^2$ were statistically farther displaced (0.66 versus 0.06 Å shift in coordinates, $p < 0.001$) and less electron dense (0.27 versus 0.40 e Å$^{-3}$, $p < 0.001$) compared with the average water molecule. This indicates that the original electron density with which the solvent atoms were refined was acceptable for most, but not all, of the water molecules. There does not appear to be any correlation between these displacements and the changes in B values (Fig. 3).

Figure 3. Change in (a) position and (b) B value for solvent atoms following fitting to the solvent omit-map. Water molecule displacements were fitted to a lognormal distribution with mean $\mu_x = -2.549$ Å and standard deviation $\sigma_x = 1.056$. Changes in B value were fitted to a Cauchy distribution with location $\beta_0 = 0.192$ Å$^2$ and scale $\gamma_\beta = 0.601$. (c) Correlation between the changes in position and B value.

Water molecule number 432 in the crystal structure of the cofactor-binding domain of the Cbl transcription factor (PDB code 2fyi; Stec et al., 2006) is used as an example of a poor quality water molecule. In the traditional $2F_o - F_c$ electron density map (Fig. 4a), this water molecule has reasonable electron density. However, in the solvent omit-map (Fig. 4b), there is no density for this particular water molecule. After refining water molecule positions against the solvent omit-map, this molecule shifted position by over 1 Å and increased its B value by over 20 Å$^2$.

All 45 912 water molecules were then ranked on the basis of the probability of observing another water molecule in the data set that is at least as poorly justified as the water molecule being considered. The least likely 5% (2294) were then excluded and the structures re-refined. This included removing water molecule number 432 in structure 2fyi. The overall quality metrics computed after re-refining the structures were not statistically different from the original structure quality ($p > 0.7$), despite having fewer solvent molecules in the structure. Thus the re-refined, simpler structures are better protein models given the diffraction data.

4. Discussion

Ordered water molecules contribute significantly to the total X-ray scattering in a diffraction experiment. Unfortunately, given the dearth of restraints upon the positions of water molecules, modeling water molecules in a protein structure too early can easily result in over-fitting noise, biasing the structure and subsequent refinement. Ultimately, this results in complicated structures with superfluous atoms. The solvent omit-map algorithm presented in this paper provides a method to remove bias from over-fitting noise in protein crystal structures by identifying unnecessary solvent atoms.

This program's run-time is dependent on the number of water molecules in the structure. A larger radius results in fewer iterations and thus faster completion.
However, a larger radius also omits more atomic data (including the positions of correct water molecules) that could be used in the refinement step of the protocol, lowering the quality of the predicted electron density. Thus a balance is needed between run-time and the quality of the resulting omit-map. Users of the program are free to adjust the radius to optimize this trade-off for their particular structure.

Tests of the algorithm on the 160 deposited protein structures show that most of the water molecules in the deposited structures are well justified. A few of the solvent atoms in the test data set changed appreciably in position, displacement parameter or electron density when fitted to the solvent omit-map. Thus these atoms are probably not true water molecules in the original structure but rather over-fit noise in the electron density. Their removal can thus be justified.

Removal of the least likely 5% of water molecules present in these 160 structures produced structures with no significant decrease in overall structural quality [when compared with the quality measure presented by Brown & Ramaswamy (2007)]. The resulting simpler structural models, containing fewer atoms, are as good as the original deposited protein structures. We assert that a simpler structural model, containing fewer water molecules in this instance, with fewer adjustable parameters is a better overall structure.

The recommended use of our solvent omit-map program would be for crystallographers to begin by refining a structure using traditional crystallographic techniques. Prior to deposition in the PDB and publication, the crystallographer would run the solvent omit-map algorithm to rank the likelihood of all water molecules. The least likely water molecules would be iteratively removed while monitoring a quality metric. Only those water molecules whose removal did not decrease the structure's overall quality would be excluded.
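The recommended removal loop can be sketched in Python as follows. This is a hedged illustration, not the author's program: `quality_of` is a hypothetical callback standing in for re-refinement followed by evaluation of a quality metric for which lower is better, and the water identifiers are placeholders.

```python
def prune_waters(ranked_waters, quality_of):
    """Iteratively remove the least likely waters while monitoring quality.

    ranked_waters: water identifiers ordered from least to most likely.
    quality_of: hypothetical callable returning a quality metric for a list
        of kept waters (lower is better), e.g. after re-refinement.
    A water is excluded only if its removal does not worsen the metric.
    """
    kept = list(ranked_waters)
    current = quality_of(kept)
    removed = []
    for water in ranked_waters:
        trial = [w for w in kept if w != water]
        score = quality_of(trial)
        if score <= current:  # removal did not decrease overall quality
            kept, current = trial, score
            removed.append(water)
    return kept, removed


if __name__ == "__main__":
    # Toy metric: each noise water adds 1, each real water subtracts 1.
    def toy(ws):
        return sum(1 if w.startswith("noise") else -1 for w in ws)

    print(prune_waters(["noise1", "real1", "noise2", "real2"], toy))
```

In practice each call to the quality callback would involve a refinement run, so a real workflow would probably test only the lowest-ranked few percent of water molecules rather than every one.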
The final structure would contain only those water molecules that are strongly supported by the experimental diffraction data, thus making it easier for future users to interpret the structure.

Figure 4. Stereoview comparison of (a) a traditional σA-weighted 2Fo − Fc (Srinivasan, 1966) electron density map with (b) a solvent omit-map. Water molecule number 432 in structure 2fyi is centered. Maps are thresholded at 1σ.

J. Appl. Cryst. (2008). 41, 761–767. Eric N. Brown, Solvent omit-map.

As described above, the solvent omit-map method presents a process by which the likelihood of every water molecule in a protein crystal structure can be assessed. In summary, the probability of observing a set of water molecules (with densities up to $\rho_i$, changes in position of at least $x_i$ and changes in B value of at least $\beta_i$) is

\[ p = p(\rho_1, \rho_2, \ldots, \rho_N; x_1, x_2, \ldots, x_N; \beta_1, \beta_2, \ldots, \beta_N) \approx \prod_i p(\rho_i, x_i, \beta_i), \]

where

\[ p(\rho_i, x_i, \beta_i) \approx p(\rho < \rho_i)\, p(x > x_i)\, p(\beta > \beta_i), \]

\[ \rho \sim t(m_\rho, s_\rho, df_\rho), \qquad x \sim \mathrm{lognormal}(\mu_x, \sigma_x), \qquad \beta \sim \mathrm{Cauchy}(\beta_0, \gamma), \]

and parameters for these distributions have been given previously.

Extensions of the statistical model are possible by including additional information. After incorporating the number of water molecules expected for a given resolution (Carugo & Bordo, 1999), two possible choices for water molecule sets (such as after the addition or deletion of water molecules) can be compared. The traditionally calculated electron density peak height used by ARP/wARP and COOT and the variance measure used by COOT could also be useful additions to the likelihood calculation. One benefit from such an extension would be to help identify electron density that would be better modeled as belonging to ligands, cryoprotectants or alternative conformations of neighboring side chains.

It should be recognized that correct water molecules cannot be associated with incorrect side-chain placement. For example, consider a structure containing an amino acid with two alternative conformations, only one of which is modeled, and a water molecule where the second conformation should be. The solvent omit-map procedure may not identify the water molecule as suspect even though the observed electron density results from an alternative side-chain conformation and not a water molecule. Similar problems would occur when a water molecule is incorrectly placed in electron density belonging to small molecules such as ligands or cryoprotectants. Use of an electron density variance measure or a shape measure, as utilized by O and COOT, may assist in these cases.

5. Availability

The source code for the solvent omit-map program is available from the S. Ramaswamy laboratory website at the University of Iowa: http://structure.biochem.uiowa.edu/omit-map/. It uses Perl and the CCP4 collection of crystallography programs and is executed using a command-line form similar to other CCP4 programs.
Work on integration into the CCP4i graphical interface and the COOT structure refinement program, and on the use of PHENIX in place of REFMAC, is currently underway.

6. Conclusion

A method is presented for removing bias introduced by arbitrary placement of solvent atoms in X-ray protein crystal structures during the refinement process. This is accomplished by neglecting the contribution of solvent atoms when generating the electron density of the region surrounding the solvent atom (the classical omit-map approach). This solvent omit-map method can be used to validate the presence and position of solvent atoms in published X-ray crystal structures. When tested on 160 deposited crystal structures, this method identified approximately 5% of water molecules as having questionable validity. When refining and depositing protein crystal structures, it is important to remember that the ultimate goal of protein crystallography is to obtain a structure that will answer a useful biological question. Extra water molecules with poor justification not only introduce bias during refinement but also hinder later interpretation. Deposited protein crystal structures should contain only water molecules with sufficient experimental evidence.

The assistance of S. Ramaswamy, Daniel Ferraro, Lokesh Gakhar, Adam Okerlund and Bryce Plapp was instrumental in tuning the protocol and finding bugs. Elizabeth Kamp was crucial in proofreading the manuscript. ENB is a University of Iowa MSTP trainee and would like to acknowledge financial support through a fellowship from the University of Iowa Center for Biocatalysis and Bioprocessing.

References

Adams, P. D., Pannu, N. S., Read, R. J. & Brunger, A. T. (1999). Acta Cryst. D55, 181–190.
Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 42, 8.
Artymiuk, P. J. & Blake, C. C. F. (1981). J. Mol. Biol. 152, 737–762.
Bhat, T. N. (1988). J. Appl. Cryst. 21, 279–281.
Bhat, T. N. & Cohen, G. H. (1984). J. Appl. Cryst. 17, 244–248.
Brown, E. N. & Ramaswamy, S. (2007). Acta Cryst. D63, 941–950.
Brünger, A. T. (2007). Nat. Protoc. 2, 2728–2733.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Carugo, O. & Bordo, D. (1999). Acta Cryst. D55, 479–483.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Ferraro, D. J., Brown, E. N., Yu, C. L., Parales, R. E., Gibson, D. T. & Ramaswamy, S. (2007). BMC Struct. Biol. 7, 10.
Ferraro, D. J., Okerlund, A. L., Mowers, J. C. & Ramaswamy, S. (2006). J. Bacteriol. 188, 6986–6994.
Fuh, G., Wu, P., Liang, W.-C., Ultsch, M., Lee, C. V. & Moffat, B. (2006). J. Biol. Chem. 281, 6625–6631.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Acta Cryst. A48, 851–858.
Hoelz, A., Janz, J. M., Lawrie, S. D., Corwin, B., Lee, A. & Sakmar, T. P. (2006). J. Mol. Biol. 358, 509–522.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.
Kleywegt, G. J. & Jones, T. A. (1996). Acta Cryst. D52, 826–828.
Kurinov, I. V. & Harrison, R. W. (1995). Acta Cryst. D51, 98–109.
Langs, D. A., Blessing, R. H. & Guo, D. (2001a). Acta Cryst. D57, 574–578.
Langs, D. A., Blessing, R. H. & Guo, D. (2001b). Acta Cryst. D57, 1351–1353.
Matthews, B. W. (1968). J. Mol. Biol. 33, 491–497.
Morris, R. J., Zwart, P. H., Cohen, S., Fernandez, F. J., Kakaris, M., Kirillova, O., Vonrhein, C., Perrakis, A. & Lamzin, V. S. (2004). J. Synchrotron Rad. 11, 56–59.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Murshudov, G. N., Vagin, A. A., Lebedev, A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247–255.
R Development Core Team (2006). R Foundation for Statistical Computing, Vienna, Austria.
Read, R. J. (2001). Acta Cryst. D57, 1373–1382.
Schwarzenbacher, R., Godzik, A., Grzechnik, S. K. & Jaroszewski, L. (2004). Acta Cryst. D60, 1229–1236.
Sheldrick, G. M. (2008). Acta Cryst. A64, 112–122.

Srinivasan, R. (1966). Acta Cryst. 20, 143–144.
Stec, E., Witkowska-Zimny, M., Hryniewicz, M. M., Neumann, P., Wilkinson, A. J., Brzozowski, A. M., Verma, C. S., Zaim, J., Wysocki, S. & Bujacz, G. D. (2006). J. Mol. Biol. 364, 309–322.
Ten Eyck, L. F. (1973). Acta Cryst. A29, 183–191.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Adams, P. D., Read, R. J., Zwart, P. H. & Hung, L.-W. (2008). Acta Cryst. D64, 515–524.
Vellieux, F. M. D. & Dijkstra, B. W. (1997). J. Appl. Cryst. 30, 396–399.