Macromolecular Crystallography Part II


Molecular Biology Course 2009 Macromolecular Crystallography Part II Tim Grüne University of Göttingen Dept. of Structural Chemistry November 2009 http://shelx.uni-ac.gwdg.de tg@shelx.uni-ac.gwdg.de

From Experiment to Model Introduction 1/56

So Far... From Experiment to Model 2/56

The Model
The electron density map is the actual result of the X-ray experiment. It is per se quite useless: difficult to interpret.
Model: atom positions; atom types; relationships between atoms (secondary structure, domains, etc.).
Biologically and chemically, the model is the final goal of crystallography.
From Experiment to Model 3/56

Visualising a Model
[Figure panels: ball and stick; CPK (space filling); Cα trace (smooth); Cα trace (coloured by B-factor); ball-and-stick (coloured by B-factor); ribbons]
The PDB File 4/56

Storing Structural Data: the PDB-File
Macromolecular structural data from crystallography or NMR are stored at the Protein Data Bank (PDB, www.pdb.org) or the Nucleic Acid Database (NDB, ndbserver.rutgers.edu). The data are stored as PDB-files. Access to the PDB is free.
Small molecule data are stored in the Cambridge Structural Database (CSD), a commercial product for which a license must be obtained. Small molecule data are stored in CIF format (which we will not discuss).
The PDB-File 5/56

The PDB file: an Example
HEADER    LIGASE                                  28-APR-99   1CLI
TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
REMARK   2 RESOLUTION. 2.50 ANGSTROMS.
...
CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
...
ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
...
The PDB File 6/56

Storing Structural Data: the PDB-File
A PDB-file is a simple text file. It contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group and the unit cell dimensions.
The main part of the file consists of ATOM entries, one per line. An ATOM entry contains the atom type, the atom name, the residue type the atom belongs to, the coordinates, the occupancy, and the B-factor.
The PDB-File 7/56
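
The fields of an ATOM entry sit in fixed columns, so they are read by position rather than by splitting on whitespace. The following is a minimal sketch of that idea, assuming the standard column layout of the PDB format; the function names `format_atom` and `parse_atom` are made up for this illustration and belong to no library.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z, occ, b, element):
    """Build an ATOM record in the fixed-column layout (helper for this example)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}{'':4}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}{'':10}{element:>2}"
    )


def parse_atom(line):
    """Read the fields of one ATOM record by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# First atom of the example entry 1CLI shown on the previous slide.
line = format_atom(1, " N", "THR", "A", 5, 15.163, 80.897, 61.279, 1.00, 20.99, "N")
atom = parse_atom(line)
print(atom["res_name"], atom["res_seq"], atom["name"], atom["b_factor"])
```

Reading by column matters because neighbouring fields may touch without a separating space, for example when a coordinate is negative and fills its whole field.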

Occupancy: An Example of Multiple Conformations
Initially the model contained only one position for the tyrosine. But the electron density map suggests that in about half the molecules in the crystal, the side chain of the tyrosine points in a different direction. This can be modelled by setting the occupancies for both orientations of the side chain to 0.5.
The PDB-File 8/56

Temperature factor of an Atom: the B factor
Even though data are usually collected at 100 K, atoms are not immobile but vibrate (thermal motion). The (isotropic) temperature factor (or B factor) describes the vibration as a sphere within which the atom oscillates. This is quite a coarse assumption.
At high resolution (< 1.6 Å), when enough data are available, the vibrations in each of the three directions can be described separately. In that case, 6 parameters are necessary to describe the thermal motion. They are called Anisotropic Displacement Parameters (ADPs).
A low B-factor indicates a rigid, stable region, while a high B-factor indicates flexibility (e.g. in loops). It will be explained later why ADPs cannot always be used and the less accurate isotropic B-factor must be used instead.
The PDB-File 9/56
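
To get a feeling for the size of a B factor, it can be converted into a displacement. The sketch below assumes the standard relation B = 8π²⟨u²⟩ between the isotropic B factor (in Å²) and the mean-square displacement ⟨u²⟩; this relation is not stated on the slide, and the function name is chosen only for this illustration.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement in Å for an isotropic B factor in Å^2.

    Assumes B = 8 * pi^2 * <u^2>.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B factor of the first atom in the example PDB entry shown earlier (20.99 Å^2).
print(f"{rms_displacement(20.99):.2f} Å")
```

Under this assumption a B factor of about 20 Å² corresponds to a displacement of roughly half an ångström.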

Illustration of the B factor
[Figure: isotropic B factors (spherical movement of atoms) next to anisotropic B factors (ellipsoidal movement of atoms, more exact)]
The PDB-File 10/56

Data Reliability: The Data to Parameter Ratio Data to Parameter Ratio 11/56

Reliability of Data: The Data to Parameter Ratio
Measurements are inexact and only approximations. The more often a value is measured, the more trustworthy it becomes: the error estimate becomes better.
In macromolecular crystallography we want to determine at least the coordinates of every atom of the structure, i.e., we require at least 3 data points for every atom position.
The more data are collected for a fixed number of parameters, the more reliable our model can be. We aim at a high data to parameter ratio.
Data to Parameter Ratio 12/56

Data to Parameter Ratio: Example Estimates
Resolution [Å]   Refined parameters (a)              Data/parameter ratio
3.0              x,y,z                               0.9:1
2.3              x,y,z; B                            1.5:1
1.8              x,y,z; B                            3.1:1
1.5              x,y,z; B                            5.4:1
1.5              x,y,z; U11 U12 U13 U23 U22 U33      2.4:1
1.1              x,y,z; U11 U12 U13 U23 U22 U33      6.1:1
0.8              x,y,z; U11 U12 U13 U23 U22 U33      16:1
(a) x,y,z: coordinates; B: isotropic B-value; Uij: anisotropic B-values. Source: G. Sheldrick.
Effectively, at resolutions worse than 1.8 Å there would not be enough data points to create a reliable model. The data to parameter ratio can be improved by additional (bio)chemical etc. information.
Data to Parameter Ratio 13/56
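
The two rows at 1.5 Å can be related by simple arithmetic: the data stay the same, but each atom needs 9 parameters (x,y,z plus six Uij) instead of 4 (x,y,z plus B). A small sketch of that bookkeeping follows; the dictionary keys and function names are invented for this example, and the reflection and atom counts in the last line are arbitrary illustration values, not taken from any structure.

```python
# Refined parameters per atom for the three models in the table.
PARAMS_PER_ATOM = {"xyz": 3, "xyz+B": 4, "xyz+Uij": 9}


def data_to_parameter_ratio(n_reflections, n_atoms, model):
    """Number of unique reflections divided by the number of refined parameters."""
    return n_reflections / (n_atoms * PARAMS_PER_ATOM[model])


def rescale_ratio(ratio, old_model, new_model):
    """Ratio for the same data and atoms after switching the parameter model."""
    return ratio * PARAMS_PER_ATOM[old_model] / PARAMS_PER_ATOM[new_model]


# 1.5 Å row of the table: 5.4:1 with isotropic B becomes about 2.4:1 with ADPs.
print(round(rescale_ratio(5.4, "xyz+B", "xyz+Uij"), 1))

# Arbitrary numbers, for illustration only.
print(round(data_to_parameter_ratio(40000, 3000, "xyz+B"), 2))
```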

An Example: Data to Parameter Ratio (1/7)
Scenario: measure data along a graph.
Experiment 1: high resolution, 21 data points with errors.
Experiment 2: low resolution, 3 data points with errors.
[Figure: two plots, one with 21 measurements and one with 3 measurements, each with the ideal curve f(x) = x²]
Data to Parameter Ratio 14/56

An Example: Data to Parameter Ratio (2/7)
Two Models
Model 1: $g(x) = g_2 x^2 + g_1 x + g_0$
Model 2: $h(x) = h_3 x^3 + h_1 x + h_0$
Both models contain three parameters, i.e., at least three data points are required for their unambiguous determination.
Data to Parameter Ratio 15/56

An Example: Data to Parameter Ratio (3/7)
Fitting High Resolution Data
[Figure: the 21 data points with the fitted x² model and x³ model]
x² model: $1.19x^2 + 0.00x - 0.51$, $\chi^2 = 1.14 \approx 1$: good
x³ model: $0.16x^3 + 0.52x + 0.47$, $\chi^2 = 22.4 \gg 1$: bad
Data to Parameter Ratio 16/56

An Example: Data to Parameter Ratio (4/7)
Remarks on χ²
χ² is a common error estimator in statistics.
χ² should be close to 1 for a good model.
χ² makes a clear distinction between the two models.
The reliability of χ² depends on a good estimate of the errors of the data points.
Data to Parameter Ratio 17/56

An Example: Data to Parameter Ratio (5/7)
Fitting Low Resolution Data
[Figure: the 3 data points with the fitted x² model and x³ model]
x² model: $0.72x^2 + 0.00x + 1.17$
x³ model: $0.48x^3 - 2.66x + 2.62$
Data to Parameter Ratio 18/56

An Example: Data to Parameter Ratio (6/7)
Problems with Fitting Low Resolution Data
Both models fit the data perfectly.
There are no error estimates because #data = #parameters.
Additional knowledge is required to decide on the correct model.
Data to Parameter Ratio 19/56

An Example: Data to Parameter Ratio (7/7)
Fitting Low Resolution Data with Constraints
Assumed constraint: the data pass through (0, 0), so the constant terms $g_0$ and $h_0$ are dropped.
Model 1: $g(x) = g_2 x^2 + g_1 x$
Model 2: $h(x) = h_3 x^3 + h_1 x$
[Figure: the 3 data points with the constrained x² model and x³ model]
x² model: $0.94x^2 - 0.15x$, $\chi^2 = 1.30$
x³ model: $0.83x^3 - 5.35x$, $\chi^2 = 14.4$
Data to Parameter Ratio 20/56
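
The whole seven-slide example can be replayed in a few lines. The sketch below is not the lecture's code and its data are invented: the 21 "high resolution" points follow x² with a small deterministic wobble standing in for measurement errors, the 3 "low resolution" y values are arbitrary, and the error estimate `SIGMA` is an assumption. It fits both models by ordinary least squares and reports the reduced χ², which is undefined when the number of data equals the number of parameters.

```python
import math


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(xs, ys, basis):
    """Least-squares coefficients for a model that is linear in its parameters."""
    rows = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(coeffs, basis, x):
    return sum(c * f(x) for c, f in zip(coeffs, basis))


def reduced_chi2(xs, ys, sigma, coeffs, basis):
    """chi^2 per degree of freedom; None when #data <= #parameters."""
    dof = len(xs) - len(coeffs)
    if dof <= 0:
        return None
    chi2 = sum(((y - predict(coeffs, basis, x)) / sigma) ** 2 for x, y in zip(xs, ys))
    return chi2 / dof


QUADRATIC = [lambda x: x ** 2, lambda x: x, lambda x: 1.0]  # model 1
CUBIC = [lambda x: x ** 3, lambda x: x, lambda x: 1.0]      # model 2
SIGMA = 0.5                                                  # assumed error estimate

# Invented "high resolution" data: 21 points near f(x) = x^2.
hi_x = [-2.0 + 0.25 * i for i in range(21)]
hi_y = [x ** 2 + 0.4 * math.sin(7.0 * i) for i, x in enumerate(hi_x)]
# Invented "low resolution" data: 3 points.
lo_x = [-2.0, 0.5, 3.0]
lo_y = [4.1, 0.4, 8.7]

for label, basis in (("x^2 model", QUADRATIC), ("x^3 model", CUBIC)):
    hi = reduced_chi2(hi_x, hi_y, SIGMA, fit(hi_x, hi_y, basis), basis)
    lo = reduced_chi2(lo_x, lo_y, SIGMA, fit(lo_x, lo_y, basis), basis)
    # Constraint "passes through (0, 0)": drop the constant term.
    con = reduced_chi2(lo_x, lo_y, SIGMA, fit(lo_x, lo_y, basis[:2]), basis[:2])
    print(label, "21 points:", hi, "| 3 points:", lo, "| 3 points, constrained:", con)
```

With three points and three parameters both models pass exactly through the data, so nothing is left over to judge them by. Removing one parameter through the constraint frees one degree of freedom, and χ² becomes available again.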

Crystallographic Model Building Model Building 21/56

Model Building: Getting Started
The first steps in building the model consist of finding larger groups of residues with special features. In proteins this is the (Cα) main chain, in nucleic acids the position of the bases.
The secondary structure elements of proteins are good starting points. α helices are particularly easy to locate, even at medium to low resolution (2.5 to 4 Å).
Model Building 22/56

Directionality of α Helices
From the main chain (Cα chain) one can determine neither the direction nor which part of the sequence it covers.
One gets help from the so-called Christmas tree: the side chains of an α helix point towards the N-terminal end of the protein chain.
Model Building 23/56

β Strands
The other secondary structure elements of proteins, β strands, are also striking but more difficult to build. The direction of the peptide chain in particular can be difficult to find.
Model Building 24/56

Sequence Docking
The secondary structure is basically a poly-alanine model with no sequence information.
Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence to the map. Disulphide bridges or metals bound to an active centre can also be helpful.
Model Building 25/56

Automated Model Building
Until a couple of years ago, a crystallographer had to place every residue by hand.
At resolutions better than, say, 2.5 Å, building is greatly facilitated by programs like ARP/wARP (A. Perrakis, V. Lamzin), Buccaneer (K. Cowtan), or Resolve (T. Terwilliger), which automatically build large parts of the structure in a couple of hours.
Model Building 26/56

Manual Model Building
Computer programs do not know about biology, and certainly not about a specific molecule or structure. Human interaction is therefore required to pay attention to:
- the presence and identification of ligands and/or metal ions (from crystallisation or protein preparation)
- special interactions in complexes
- exceptions from the standard values used in refinement
- the correct placement of solvent (water) molecules
- model interpretation
Model Building 27/56

Hydrogen Atoms?
X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron. They cannot be detected by X-ray diffraction (except with very high resolution data, < 1 Å).
During refinement, hydrogens are treated as riding atoms, that is, in a fixed position relative to the groups they belong to (like the carbons of a phenylalanine ring). Compared with ignoring hydrogens completely, this method improves the quality of the model and also helps to keep the correct distances to neighbouring groups. Because of their fixed positions, riding atoms do not increase the number of parameters.
Model Building 28/56

Empty Space? The Solvent Region
[Figure: arrangement of molecules in the unit cell, next to the electron density map]
The holes in both pictures are not vacuum. They are filled with solvent, i.e., mostly water molecules. These are disordered, therefore one does not see explicit density in these parts of the crystal. Yet they still contribute (a little) to the diffraction pattern at low resolution.
The treatment of the solvent region in crystallography leaves room for improvement.
Model Building 29/56

Model Refinement Model Refinement 30/56

Refinement & Building
Model Building describes the construction of the model: the addition and deletion of atoms and ligands. It is mostly done by the crystallographer in front of a computer screen.
Model Refinement describes the improvement of that model to better match the experimental data ($|F_\mathrm{meas}(hkl)|$). It is mostly done by computer programs. The program tries small changes of the coordinates and modifications of the temperature factors to minimise the difference between calculated and measured amplitudes.
Model Refinement 31/56

Excursus: Crystallographic Theory
Given the structure factors $F_\mathrm{meas}(hkl) = |F_\mathrm{meas}(hkl)|\, e^{i\phi(hkl)}$, the electron density at position $(x, y, z)$ is given by the Fourier transformation
$$\rho(x, y, z) = \frac{1}{V_\mathrm{unit\ cell}} \sum_{h,k,l} |F_\mathrm{meas}(hkl)|\, e^{i\phi(hkl)}\, e^{-2\pi i(hx+ky+lz)}$$
Once a model is known with atom coordinates $(x_j, y_j, z_j)$, the structure factors can be calculated from the spherical atomic scattering factors $f_j$ by
$$F_\mathrm{calc}(h, k, l) = \sum_j f_j\, e^{2\pi i(hx_j+ky_j+lz_j)} \qquad (1)$$
The spherical atomic scattering factors $f_j$ can be calculated from per-atom properties. They are also tabulated (e.g. in the International Tables for Crystallography, Volume C). They include the effect of the temperature factor.
Model Refinement 32/56
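
Equation (1) is a plain sum over atoms and can be evaluated directly. The sketch below makes two simplifying assumptions that are not part of the lecture: each scattering factor $f_j$ is treated as a constant (in reality it falls off with resolution and with the temperature factor), and coordinates are fractional. The function name and the two-atom test case are invented for illustration.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F_calc(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    atoms: list of (f_j, (x_j, y_j, z_j)) with fractional coordinates.
    f_j is taken as a constant here, which is a simplification.
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


# Two identical scatterers half a cell apart along x (hypothetical toy cell).
atoms = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.0, 0.0))]

for hkl in [(1, 0, 0), (2, 0, 0)]:
    f_calc = structure_factor(hkl, atoms)
    print(hkl, "amplitude:", round(abs(f_calc), 6), "intensity:", round(abs(f_calc) ** 2, 6))
```

In this toy cell the two contributions to the (1,0,0) reflection are exactly out of phase and cancel, while for (2,0,0) they add up. The complex number returned carries both the amplitude $|F_\mathrm{calc}|$ and the phase, whereas the experiment delivers only the amplitude.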

Excursus: Crystallographic Theory
There are two sources for the intensities $I(hkl)$:
- $I_\mathrm{meas}(hkl) = |F_\mathrm{meas}(hkl)|^2$, which are measured in the X-ray experiment
- $I_\mathrm{calc}(hkl) = |F_\mathrm{calc}(hkl)|^2$, which are calculated from the model coordinates
Model refinement minimises the difference between calculated and measured structure factor amplitudes (e.g. with least-squares methods).
Model Refinement 33/56

Initial Map Generation
[Figure: amplitudes |F(hkl)| and phases φ(hkl) give the initial map, which gives the initial model]
For the first map, phases were determined with MAD, or SIR, or Molecular Replacement, etc. These phases are generally of low quality, i.e., they have large errors compared to the real values.
Model Refinement 34/56

[Figure: iterative cycle with the labels: data |F|; phases φ; calculate map; build model / match model to map; model refinement by program (checks chemical correctness); new model (better w.r.t. map!)]
The model is created or modified based on the map. The map is calculated using the phases from the model. Therefore, the new model is biased towards the old model: errors may persist.
Model Refinement 35/56

Model Building and Refinement (1/2)
Creating a model from X-ray data is an iterative process consisting of model building and refinement.
Refinement: global improvement of the model with respect to the experimental data. The coordinates of all atoms together with their temperature factors (and sometimes, at very high resolution, even the occupancies) are adjusted in order to minimise the difference between the measured intensities and the ones calculated from the model.
Refinement 36/56

Model Building and Refinement (2/2)
Model Building: local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved in order to ensure that
1. the model makes sense (bio)chemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)
2. the model fits the calculated electron density (e.g. check for multiple conformations)
Refinement 37/56

Restraints and Constraints
The reflection data alone would not be sufficient to create a trustworthy model at resolutions worse than, say, 1.5 Å. There are too many parameters.
Therefore it is necessary to incorporate additional information. There are two types of auxiliary information: restraints and constraints.
Refinement 38/56

Restraints and Constraints
Constraints reduce the number of parameters. They are expressions like "property X must have value Y", e.g.: the temperature factor is isotropic instead of anisotropic, giving 4 parameters per atom instead of 9.
Restraints increase the number of data. They are "should be" or "should be approximately" expressions, e.g. distance(N–Cα) ≈ 1.458 Å.
Restraints used in refinement encompass bond lengths and bond angles. They are important for macromolecular crystallography, and solving a structure without them would be impossible.
Refinement 39/56
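
One common way to let a restraint act as an extra data point is to add a weighted squared deviation from the target value to the quantity being minimised. The sketch below illustrates this for the N–Cα distance quoted above, using the first two atoms of the example PDB entry shown earlier. The standard deviation `sigma` of 0.02 Å is an assumption made for this example, not a value from the lecture, and the function name is hypothetical.

```python
import math


def bond_restraint_term(p1, p2, target, sigma):
    """Weighted squared deviation of a bond length from its target value.

    A term of this form can be added to the refinement target once per
    restraint, so every restraint behaves like one additional observation.
    """
    distance = math.dist(p1, p2)
    return ((distance - target) / sigma) ** 2


# N and CA of Thr A5 from the example PDB entry (coordinates in Å).
n_atom = (15.163, 80.897, 61.279)
ca_atom = (15.093, 82.326, 61.723)

print("distance:", round(math.dist(n_atom, ca_atom), 3), "Å")
# sigma = 0.02 Å is an assumed tolerance for illustration.
print("restraint term:", round(bond_restraint_term(n_atom, ca_atom, 1.458, 0.02), 2))
```

A constraint, in contrast, would not appear as a term at all: the quantity would simply be fixed and removed from the list of refined parameters.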

Traps: Local Minima
Refinement programs cannot cross this barrier: they would get stuck in the local minimum and could not move the phenylalanine into the right position.
These local minima and the vicious circle of model bias make validation of the model necessary.
Model Refinement 40/56

Refinement: R and R free R and R free 41/56

The R Value
The difference between calculated and measured amplitudes is summarised by the so-called R value
$$R = \frac{\sum_{hkl} \bigl|\, |F_\mathrm{meas}| - |F_\mathrm{calc}| \,\bigr|}{\sum_{hkl} |F_\mathrm{meas}|}$$
For small molecules, R values between 2% and 5% are normal; for macromolecules, the range is approximately 10% to 30%.
As a rule of thumb, the R value should be about 1/10 of the resolution: a 2.5 Å structure should have an R value of 0.25 = 25%.
R and R free 42/56
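
The R value is simple to compute once measured and calculated amplitudes are available for the same reflections. A minimal sketch follows, assuming both lists are in the same order and on the same scale (real programs also determine a scale factor, which is left out here). The amplitudes are made-up numbers.

```python
def r_factor(f_meas, f_calc):
    """R = sum(| |F_meas| - |F_calc| |) / sum(|F_meas|) over all reflections."""
    numerator = sum(abs(abs(fm) - abs(fc)) for fm, fc in zip(f_meas, f_calc))
    denominator = sum(abs(fm) for fm in f_meas)
    return numerator / denominator


# Made-up amplitudes for four reflections.
f_meas = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
print(f"R = {r_factor(f_meas, f_calc):.3f}")
```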

Refinement and Overfitting
For macromolecules, the data to parameter ratio is not very high in the normal resolution range. Therefore, the R value can be reduced nearly arbitrarily by adding more and more atoms that are not really present in the crystal structure, or by allowing positions that do not make much sense chemically (stereochemical clashes).
This is called overfitting the data.
Refinement 43/56

Quality Measures (2): The R free value
One measure to reduce overfitting is the R free value. About 5% to 10% of the reflections are excluded from both refinement and model building. They remain unconsidered and act like an "independent judge": after refinement, the R free value is calculated like the R value, but with the excluded reflections only.
The two values should not differ too much (model errors) but should also not be too close (model bias).
This kind of cross-validation is common in statistics, but was introduced to crystallography only in the early 1990s by Axel Brünger.
Refinement 44/56
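
The bookkeeping behind R free can be sketched as follows: a random subset of reflections is flagged once, kept out of refinement, and used only for the final comparison. The code below is an illustration under simple assumptions (a flat 5% random selection with a fixed seed, made-up amplitudes, no actual refinement step); the function names are invented for this example.

```python
import random


def r_factor(f_meas, f_calc):
    """R = sum(| |F_meas| - |F_calc| |) / sum(|F_meas|)."""
    numerator = sum(abs(abs(fm) - abs(fc)) for fm, fc in zip(f_meas, f_calc))
    return numerator / sum(abs(fm) for fm in f_meas)


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Return (work_indices, free_indices); the free set is chosen at random."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_free(f_meas, f_calc, work, free):
    """R computed separately on the working set and on the excluded set."""
    r_work = r_factor([f_meas[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_meas[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free


# Made-up amplitudes: 200 reflections, model amplitudes off by a few percent.
f_meas = [50.0 + (i % 17) for i in range(200)]
f_calc = [fm * (1.0 + 0.1 * ((i % 5) - 2) / 2) for i, fm in enumerate(f_meas)]
work, free = split_work_free(len(f_meas), free_fraction=0.05)
print(len(work), len(free), r_work_and_free(f_meas, f_calc, work, free))
```

The selection must be made once and then kept fixed for the whole project; reflections that have ever been used in refinement can no longer serve as an independent judge.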

Structure Validation Validation 45/56

Why Validation?
Experimental data are never free of errors. Scientists are never free of prejudice.
Compared to other technical or physical disciplines, the errors in X-ray experiments are huge. It is easy to create erroneous models unintentionally.
The results (structural models) are often used by non-crystallographers. They must be able to check the quality without knowing too much about crystallography.
Validation 46/56

Photoactive Yellow Protein: 1989 and 1995
[Figure: the 1989 model next to the 1995 model]
This model was published in 1989 (PDB entry 1phy). The correct version was published six years later (PDB entry 2phy). Kleywegt, Acta D (2000), D56
NB: The first structure was published before R free and other means of validation came into use. It is nowadays very unlikely that such coarse misinterpretations happen.
Validation 47/56

Caveat: Modelling Models
The structure of the TATA-box binding protein (TBP or TFIIDτ) was published in 1992 (Nikolov et al., Nature 360, pp. 40–46). The shape of the molecule suggested that the TATA box sits straight in the groove of the protein.
The structure of the complex, published a year later by Kim et al. (Nature 365, pp. 520–527), revealed that the DNA was actually heavily bent.
Validation 48/56

Caveat: What You See is What You Get?
Another issue with PDB files is that they contain more information than a graphical viewer might be able to display. Many crystallographers include atoms or residues in their structures without experimental support and set their occupancy to zero. While this makes sense chemically, the procedure is error-prone for users of the structure.
Validation 49/56
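
A user of a PDB file can guard against this pitfall by listing the atoms whose occupancy is zero before interpreting the structure. The sketch below reads the occupancy from its fixed columns, assuming the standard PDB column layout. The helper `make_atom_line` and the three test atoms are invented for the example; in particular, the zero occupancy of the third atom is made up.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, occ):
    """Build a column-correct ATOM record with dummy coordinates (example helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}{'':4}"
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{20.0:6.2f}{'':10}{name.strip()[0]:>2}"
    )


def zero_occupancy_atoms(pdb_lines):
    """Return (chain, residue number, residue name, atom name) for atoms with occupancy 0."""
    flagged = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 60:
            if float(line[54:60]) == 0.0:
                flagged.append(
                    (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
                )
    return flagged


lines = [
    make_atom_line(1, " N", "THR", "A", 5, 1.00),
    make_atom_line(2, " CA", "THR", "A", 5, 1.00),
    make_atom_line(3, " CB", "LYS", "A", 6, 0.00),  # made-up zero-occupancy atom
]
print(zero_occupancy_atoms(lines))
```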

How to Validate
Validation means assessing the model in comparison with the data. However, since the model was created by refinement against the data, the model is biased. Therefore, there is a need for independent criteria. All information can be used
1. that did not participate in the creation of the model or in the minimisation of the difference between model and data
2. for which ideal or average values are known. This means that this information must be the same or similar for all proteins.
Validation 50/56

The Real Space R factor
R and R free are global figures of merit: one number describes the quality of the whole structure.
A local figure of merit is the real space R factor or the real space correlation coefficient between model and electron density map. It expresses the fit between the electron density and the model.
Validation 51/56

Dihedral Angles: the Ramachandran-Plot
The Ramachandran-plot is probably the most famous validation tool. It is based on the two dihedral angles φ and ψ.
φ is the angle between the two planes defined by $C_{i-1}$, $N_i$, $C^\alpha_i$ and $N_i$, $C^\alpha_i$, $C_i$.
ψ is the angle between the two planes defined by $N_i$, $C^\alpha_i$, $C_i$ and $C^\alpha_i$, $C_i$, $N_{i+1}$.
Validation 52/56
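
Both angles are dihedral angles defined by four consecutive atoms, so one routine covers φ and ψ. The sketch below uses a common projection-and-atan2 formulation for the signed dihedral angle; the function names and the test coordinates are chosen for illustration and do not come from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Idealised test geometry, not taken from a real structure.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Computing φ and ψ for every residue and plotting one against the other gives the raw material of a Ramachandran plot.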

The Ramachandran-Plot
The Ramachandran plot shows the φ vs. ψ angles for a structure and the most probable regions derived from the 500 best determined protein structures.
[Figure: interactive Ramachandran window of the model building program Coot, with the regions labelled β strand, α helix and left-handed α helix]
Everything outside the shaded region is an outlier.
Validation 53/56

The Kleywegt Plot
Even more information can be read from the Ramachandran plot if there is more than one copy of a molecule: the two (or more) copies should be rather similar to each other.
If one plots the Ramachandran plots for all molecules into the same diagram and connects corresponding residues, one should NOT obtain a picture like this. Kleywegt, Acta D (2000), D56
Validation 54/56

Validation Tools, also for non-crystallographers
Various programs are available to check the quality of a PDB-file, e.g. WhatIF, SFcheck, ProCheck, MolProbity.
The MolProbity program is available online: http://molprobity.biochem.duke.edu
One can upload a PDB-file or enter a PDB ID-code and obtain various plots. It even checks the flip states of Asn, Gln and His residues based on possible hydrogen bonds.
Validation 55/56

Validation: Summary
Most of the pretty pictures of proteins represent structures determined by X-ray diffraction. But do not be deceived by colours and artistic compositions. Everyone who makes use of PDB files or structural data should be aware of possible pitfalls.
1. Read the header information.
2. Consider the resolution and data quality.
3. Do the quality and resolution allow for the details you want to extract?
4. Make use of programs that examine the structure and (if available/possible) the data.
Validation 56/56