Accuracy and Uncertainty
Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule)? How things have changed in a digital world. Spatial data quality issues (Metadata).
Why an Issue? Imperfect or uncertain reconciliation [science >< practice] [concepts >< application] [analytical capability >< social context] It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable Changing (temporal) patterns (e.g., census) Fractals (e.g., coastlines)
Why an Issue? GIGO (?) Often little is known of the input data quality, and far too much is assumed about the output quality GIS fosters data merging Engineering Planning Taxation Single municipal database
Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule)? How things have changed in a digital world. Spatial data quality issues (Metadata). Accounting for uncertainty.
Sources of Error Measurement error: different observers, measuring instruments, etc. Setup Observer Procedure Software Environment Error Equipment
Sources of Error Specification error: omitted variables (regression discussion) Ambiguity, vagueness and the quality of a GIS representation Blunders or spurious errors Uncertainty has become a catch-all for incomplete representations or a quality measure. It can be characterized and managed but not eliminated.
Uncertainty Uncertainty: a reflection of our imperfect and inexact knowledge of the world. Data uncertainty: our observations or measurements encompass ambiguity / inherent variability. (e.g., standard deviation) Rule uncertainty: how we reason with observations we are unsure of the conclusions we can draw from even perfect data. (e.g., linear versus non-linear models)
Uncertainty: Data to Rule Conception Measurement & Representation Analysis Real World
Data Uncertainty Spatial uncertainty Natural geographic units? Multivariate extensions? Census vs samples Vagueness Statistical, cartographic (i.e., generalization), cognitive Ambiguity Values, language (semantics, ontologies) http://xkcd.com/776/
Data Uncertainty (& MAUP) Regions Uniformity (one or all aspects?) Relations typically grow stronger when based on larger geographic units (statistical uncertainty appears to decrease while our ability to assign characteristics to individuals decreases) # of regions r 48.2189 24.2963 12.5757 6.7649 3.9902
Dealing with Data Uncertainty In fuzzy set theory, it is possible to have partial membership in a set membership can vary, e.g., from 0 to 1 this adds a third option to classification: yes, no, and possibly Fuzzy approaches have been applied to the mapping of soils, vegetation cover, and land use
Fuzzy Membership Functions 1 Short Medium Tall Membership 0 5 Height 6
Measurement/Representation Representational models filter reality differently Vector crisp Raster can be fuzzy Savannah Forest Forest membership 0.9 1.0 0.5 0.9 0.1 0.5 0.0 0.1
Measures of Uncertainty Given that uncertainty exists in every layer in a GIS, and that decisions are made based on analyses of that data, how can we quantify the certainty we have in our data? The measures of uncertainty we can use are dependent on the type of data we are working with (NOIR). Measures of central tendency Measures of dispersion
How to measure the accuracy of nominal attributes? e.g., a vegetation cover map The confusion or misclassification matrix compares recorded classes (the observations) with classes obtained by a more accurate process, or from a more accurate source (the reference) Measures of Uncertainty
Examining every parcel (or pixel) may not be practical Rarer classes should be sampled more often in order to reliably assess accuracy sampling is often stratified by class (recall notes on spatial sampling) Collecting the Reference Data
The bolded numbers reflect correct classification (i.e., where the land use in the database equaled the land use observed in the field). The off-diagonal numbers reflect incorrect land use records in the database. Land use: in the field A B C D E Total Land use: in the database A 80 4 0 15 7 106 B 2 17 0 9 2 30 C 12 5 9 4 8 38 D 7 8 0 65 0 80 E 3 2 1 6 38 50 Total 104 36 10 99 55 304 Misclassification Matrix
The bolded numbers reflect areas where the land use has not changed. The offdiagonal numbers reflect areas where the land use has changed over time. Land use: at time B A B C D E Total A 80 4 0 15 7 106 B 2 17 0 9 2 30 Land use: at time A C 12 5 9 4 8 38 D 7 8 0 65 0 80 E 3 2 1 6 38 50 Total 104 36 10 99 55 304 Transition Matrix Markov models
Misclassification Statistics Percent correctly classified total of diagonal entries divided by the grand total, times 100 209/304*100 = 68.8% but chance would give a score of better than 0 Kappa statistic (more) normalized to range from 0 (chance) to 100 evaluates to 58.3%
Per-Polygon and Per-Pixel Assessment Error can occur in the attributes of polygons, but it can also occur in the positions of the boundaries better to conceive of the map as a field, and to sample points this also reflects how the data are likely to be used (to query the class at a point)
Assessment An example of a vegetation cover map. Two strategies for accuracy assessment are available: to check by area (polygon), or to check by point. In the former case a strategy would be devised for field checking each area, to determine the area's correct class. In the latter, points would be sampled across the state and the correct class determined at each point. It will be much easier to sample using points.
Errors typically distort interval / ratio measurements by small amounts (unless a blunder or outlier) Accuracy refers to the amount of deviation from the true value Precision the variation among repeated measurements, and also the amount of detail in the reporting of a measurement Measures of Uncertainty
The term precision is often used to refer to the repeatability of measurements. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher (no apparent bias). Precision
The amount of detail in a reported measurement [precision] (e.g., output from a GIS) should reflect its accuracy 14.4 m implies an accuracy of ±0.05 m 14 m implies an accuracy of ±0.5 m Excess precision should be removed by rounding Reporting Measurements
Measuring Accuracy Root Mean Square Error is the square root of the average squared error or deviation the primary measure of accuracy in map accuracy standards and GIS databases e.g., elevations in a digital elevation model might have an RMSE of 6.1m (TRIM Specs) the abundances of errors of different magnitudes often closely follow a Gaussian or normal distribution where y i is an elevation from the DEM, y j is the "true" known or measured elevation of a test point and N is the number of sample points. Standard deviations
Uncertainty based on an assumed RMSE of 7 m. The Gaussian distribution with a mean of 350 m and a standard deviation of 7 m gives a 95% probability that the true location of the 350 m contour lies in the colored area, and a 5% probability that it lies outside. Visualizing Uncertainty
Positional accuracy of features on a paper map is roughly 0.5mm on the map e.g., 0.5mm on a map at scale 1:20,000 gives a positional accuracy of 10m this is approximately the U.S. National Map Accuracy Standard, used by BC TRIM and also allows for digitizing error, stretching of the paper, and other common sources of positional error A Useful Rule of Thumb
Correlation of Errors Absolute positional errors may be high reflecting the technical difficulty of measuring distances from the Equator and the Greenwich Meridian, or elevations from mean sea level or the geoide Relative positional errors over short distances may be much lower positional errors tend to be strongly correlated over short distances As a result, positional errors can largely cancel out in the calculation of properties such as distance or area
Measures of central tendency: Mean (Arithmetic, geometric, harmonic), median or mode? The form of the relation? (linear, nonlinear, multiple regression models) Alternative approaches to many operations such as slope, spatial interpolation. More than just incomplete knowledge. Rule Uncertainty Interpolation methods
What is the slope? Use a 2x2 window or a 3x3 window? Maximum difference (2262-2017) Best fitting surface Several other options Slope Uncertainty
Slope Uncertainty SAGA GIS has nine different slope derivation methods
Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule) How things have changed in a digital world. Spatial data quality issues (Metadata) Accounting for uncertainty
Standards, such as the National Map Accuracy Standard (NMAS), defined positional accuracy but little else. Most maps were made by governmental mapping agencies with typically high standards. Scale, accuracy and resolution are linked on paper maps. The Digital Divide: Paper Maps
Digital data can come from anyone (unknown standards) Data entry can create problems Digitizing errors Conflation of data layers Datum differences, scale differences Georegistration (rubber sheeting) Mixed pixels An OpenStreetMap.org trace Mandates Software GIS Data Filter Filter Mandates The Digital Divide: Digital Data
Trust in the data provider? How the big three of online maps compare to a more precise source.
Geocoding results Bing Maps Google Maps Yahoo! Maps Threshold Quantile 95% Threshold Outlier 67.08 15.36 42.62 111.76 40.81 68.41
Map conflation problems Digitizing errors Data Entry
Data entry A common source of error working directly off-of an aerial photograph (ignoring relief displacement) But even orthophotos aren t perfect.
Georegistration error? Rubber sheet transformation
Analysis: Error Propagation Addresses the effects of errors and uncertainty on the results of GIS analysis Typical mathematical or statistical approaches are not appropriate. Almost every input to a GIS is subject to error and uncertainty In principle, every output should have confidence limits or some other expression of uncertainty
Geostatistical interpolation (kriging) Zinc data: Predicted values Zinc data: Estimated standard deviation Showing your [uncertainty] colours
Accounting for Uncertainty Traditional statistics vs spatial statistics spatial autocorrelation is assumed when using spatial data, but it is the antithesis to aspatial statistics [IID] Analytical error propagation is not possible in such cases.
Accounting for Uncertainty Monte Carlo Simulation (MCS) or sensitivity analysis Elevation Repeat 100 times Determine slope Add random value How likely is the slope we determine the actual slope value, given that we know the elevation values have an underlying uncertainty? Generate a random raster Iterators in ArcMap Toolbox Examples of Iterators
Three realizations of a model simulating the effects of error on a digital elevation model. The three data sets differ only to a degree consistent with known error. Error has been simulated using a model designed to replicate the known error properties of this data set the distribution of error magnitude, and the spatial autocorrelation between errors. Error Modelling: M C Simulation
Error Modelling Error in the measurement of the area of a square 100 m on a side. Each of the four corner points has been surveyed; the errors are subject to bivariate Gaussian distributions with standard deviations in x and y of 1 m (dashed circles). The red polygon shows one possible surveyed square (one realization of the error model). In this case the measurement of area is subject to a standard deviation of 200 sq m; a result such as 10,014.603 is quite likely, though the true area is 10,000 sq m. In principle, the result of 10,014.603 should be rounded to the known accuracy and reported as 10,000 sq m.
Living with Uncertainty It is easy to see the importance of uncertainty in GIS but much more difficult to deal with it effectively but we may have no option, especially in disputes that are likely to involve litigation
Some Basic Principles Uncertainty is inevitable in GIS Data obtained from others should never be taken as truth efforts should be made to determine quality Effects on GIS outputs are often much greater than expected there is an automatic tendency to regard outputs from a computer as the truth
Some Basic Principles Use as many sources of data as possible and cross-check them for accuracy Be honest and informative in reporting results add relevant caveats and cautions Interactive Map Disclaimer Data presented should be considered in conjunction with the report. Product demand projections detailed in the report will vary depending on the amount of retrofit activity the 1200 Buildings program stimulates and the extent of retrofit that is implemented in buildings. As there are inherent uncertainties in predicting demand from future activity the product demand projections and findings may not be accurate. Specific advice should be sought and further detailed analysis undertaken before acting or relying on the product demand projections
Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule) How things have changed in a digital world. Spatial data quality issues (Metadata)
Metadata Definition: Metadata are "data about data They describe the content, quality, condition, and other characteristics of data. Metadata help a person to locate and understand data. In epistemology the word meta means "about (its own category). Thus metadata is "data about the data".
Metadata Lineage (author, manipulations) Positional accuracy ( x, y, z) Attribute accuracy (misclassification) Completeness (how thorough?) Logical consistency (turn tables) Semantic accuracy (one-to-many) Temporal information (date obtained, entered into system) Air photo taken Event happens Air photo entered into GIS Why event not acted upon?
Examples of Metadata Identification Title? Area covered? Themes? Currentness? Restrictions? Data Quality Accuracy? Completeness? Logical Consistency? Lineage? Spatial Data Organization Vector? Raster? Type of elements? Number? Spatial Reference Projection? Grid system? Datum? Coordinate system?
Examples of Metadata Entity and Attribute Information Features? Attributes? Attribute values? Distribution Distributor? Formats? Media? Online? Price? Metadata Reference Metadata currentness? Responsible party?
A classification of uncertainty
Conclusion Uncertainty is ever-present in GIS analyses from the source data through to the analyses and to the final output. It is important to be conscious of uncertainty in your data / analyses / presentation and, if necessary, determine the impact of it on your results. (MCS) Knowledge of uncertainty allows us to make estimates of the confidence limits of the results. Metadata!