Accuracy and Uncertainty

Similar documents
Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Geog183: Cartographic Design and Geovisualization Spring Quarter 2016 Lecture 15: Dealing with Uncertainty

Uncertainty in Geographic Information: House of Cards? Michael F. Goodchild University of California Santa Barbara

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

Basics of GIS. by Basudeb Bhatta. Computer Aided Design Centre Department of Computer Science and Engineering Jadavpur University

Software. People. Data. Network. What is GIS? Procedures. Hardware. Chapter 1

Quality and Coverage of Data Sources

Introduction to GIS. Phil Guertin School of Natural Resources and the Environment GeoSpatial Technologies

Representation of Geographic Data

Introduction to GIS I

NR402 GIS Applications in Natural Resources. Lesson 9: Scale and Accuracy

Least-Cost Transportation Corridor Analysis Using Raster Data.

Why Is It There? Attribute Data Describe with statistics Analyze with hypothesis testing Spatial Data Describe with maps Analyze with spatial analysis

Cell-based Model For GIS Generalization

UNCERTAINTY AND ERRORS IN GIS

A General Framework for Conflation

GIS for ChEs Introduction to Geographic Information Systems

USE OF RADIOMETRICS IN SOIL SURVEY

Geog 469 GIS Workshop. Data Analysis

Version 1.1 GIS Syllabus

Errors in GIS. Errors in GIS. False: If its digital, it must be correct, or at least more correct than a manual product

DATA SOURCES AND INPUT IN GIS. By Prof. A. Balasubramanian Centre for Advanced Studies in Earth Science, University of Mysore, Mysore

Data Quality and Uncertainty

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

Projections & GIS Data Collection: An Overview

Outline. Geographic Information Analysis & Spatial Data. Spatial Analysis is a Key Term. Lecture #1

UNIVERSITY OF TECHNOLOGY, SYDNEY

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Overview key concepts and terms (based on the textbook Chang 2006 and the practical manual)

Lecture 1: Geospatial Data Models

METADATA. Publication Date: Fiscal Year Cooperative Purchase Program Geospatial Data Presentation Form: Map Publication Information:

Supplementary material: Methodological annex

Data Quality and Uncertainty. Accuracy, Precision, Data quality and Errors

Spatial Process VS. Non-spatial Process. Landscape Process

Geostatistics and Spatial Scales

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

Introduction-Overview. Why use a GIS? What can a GIS do? Spatial (coordinate) data model Relational (tabular) data model

Lecture 9: Reference Maps & Aerial Photography

Geographically weighted methods for examining the spatial variation in land cover accuracy

SPOT DEM Product Description

Geographic Information Systems. Introduction to Data and Data Sources

GIST 4302/5302: Spatial Analysis and Modeling

The Geodatabase Working with Spatial Analyst. Calculating Elevation and Slope Values for Forested Roads, Streams, and Stands.

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Popular Mechanics, 1954

It s a Model. Quantifying uncertainty in elevation models using kriging

RESEARCH ON SPATIAL DATA MINING BASED ON UNCERTAINTY IN GOVERNMENT GIS

Outline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System

In this exercise we will learn how to use the analysis tools in ArcGIS with vector and raster data to further examine potential building sites.

GIST 4302/5302: Spatial Analysis and Modeling

USING GIS IN WATER SUPPLY AND SEWER MODELLING AND MANAGEMENT

NR402 GIS Applications in Natural Resources

Using Fuzzy Logic as a Complement to Probabilistic Radioactive Waste Disposal Facilities Safety Assessment -8450

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

USING GIS CARTOGRAPHIC MODELING TO ANALYSIS SPATIAL DISTRIBUTION OF LANDSLIDE SENSITIVE AREAS IN YANGMINGSHAN NATIONAL PARK, TAIWAN

Illustrator: Vector base Each line/point store some sort of information Mapping Representation of the world

Lesson 6: Accuracy Assessment

UNCERTAINTY EVALUATION OF MILITARY TERRAIN ANALYSIS RESULTS BY SIMULATION AND VISUALIZATION

Automatic Determination of Uncertainty versus Data Density

Spatial Analysis II. Spatial data analysis Spatial analysis and inference

Dr.Weerakaset Suanpaga (D.Eng RS&GIS)

Evaluating Urban Vegetation Cover Using LiDAR and High Resolution Imagery

Geographical Information System GIS

A MultiGaussian Approach to Assess Block Grade Uncertainty

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Name NRS 409 Exam I. 1. (24 Points) Consider the following questions concerning standard data for GIS systems.

Chapter 02 Maps. Multiple Choice Questions

Give 4 advantages of using ICT in the collection of data. Give. Give 4 disadvantages in the use of ICT in the collection of data

GEO-DATA INPUT AND CONVERSION. Christos G. Karydas,, Dr. Lab of Remote Sensing and GIS Director: Prof. N. Silleos

INTEGRATION OF GIS AND MULTICRITORIAL HIERARCHICAL ANALYSIS FOR AID IN URBAN PLANNING: CASE STUDY OF KHEMISSET PROVINCE, MOROCCO

Formats for Expressing Acceptable Uncertainty

Harris: Quantitative Chemical Analysis, Eight Edition CHAPTER 03: EXPERIMENTAL ERROR

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

Lecture 2. A Review: Geographic Information Systems & ArcGIS Basics

Experimental Uncertainty (Error) and Data Analysis

GIS = Geographic Information Systems;

INTRODUCTION TO GIS. Dr. Ori Gudes

Introducing GIS analysis

Display data in a map-like format so that geographic patterns and interrelationships are visible

Overview of Statistical Analysis of Spatial Data

ENV208/ENV508 Applied GIS. Week 1: What is GIS?

Quality Assessment and Uncertainty Handling in Uncertainty-Based Spatial Data Mining Framework

Propagation of Errors in Spatial Analysis

Improving Spatial Data Interoperability

Harris: Quantitative Chemical Analysis, Eight Edition CHAPTER 03: EXPERIMENTAL ERROR

Understanding Geographic Information System GIS

Introduction to Geographic Information Systems (GIS): Environmental Science Focus

Data Structures & Database Queries in GIS

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS

VISUAL ANALYTICS APPROACH FOR CONSIDERING UNCERTAINTY INFORMATION IN CHANGE ANALYSIS PROCESSES

KNOWLEDGE-BASED CLASSIFICATION OF LAND COVER FOR THE QUALITY ASSESSEMENT OF GIS DATABASE. Israel -

Chapter 6. Fundamentals of GIS-Based Data Analysis for Decision Support. Table 6.1. Spatial Data Transformations by Geospatial Data Types

Spatial Webs. Michael F. Goodchild University of California Santa Barbara

Deriving Uncertainty of Area Estimates from Satellite Imagery using Fuzzy Land-cover Classification

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

4th HR-HU and 15th HU geomathematical congress Geomathematics as Geoscience Reliability enhancement of groundwater estimations

a system for input, storage, manipulation, and output of geographic information. GIS combines software with hardware,

Bathymetry Data and Models: Best Practices

Concepts and Applications of Kriging. Eric Krause Konstantin Krivoruchko

Transcription:

Accuracy and Uncertainty

Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule)? How things have changed in a digital world. Spatial data quality issues (Metadata).

Why an Issue? Imperfect or uncertain reconciliation [science >< practice] [concepts >< application] [analytical capability >< social context] It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable Changing (temporal) patterns (e.g., census) Fractals (e.g., coastlines)

Why an Issue? GIGO (?) Often little is known of the input data quality, and far too much is assumed about the output quality GIS fosters data merging Engineering Planning Taxation Single municipal database

Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule)? How things have changed in a digital world. Spatial data quality issues (Metadata). Accounting for uncertainty.

Sources of Error Measurement error: different observers, measuring instruments, etc. Setup Observer Procedure Software Environment Error Equipment

Sources of Error Specification error: omitted variables (regression discussion) Ambiguity, vagueness and the quality of a GIS representation Blunders or spurious errors Uncertainty has become a catch-all for incomplete representations or a quality measure. It can be characterized and managed but not eliminated.

Uncertainty Uncertainty: a reflection of our imperfect and inexact knowledge of the world. Data uncertainty: our observations or measurements encompass ambiguity / inherent variability. (e.g., standard deviation) Rule uncertainty: how we reason with observations we are unsure of the conclusions we can draw from even perfect data. (e.g., linear versus non-linear models)

Uncertainty: Data to Rule Conception Measurement & Representation Analysis Real World

Data Uncertainty Spatial uncertainty Natural geographic units? Multivariate extensions? Census vs samples Vagueness Statistical, cartographic (i.e., generalization), cognitive Ambiguity Values, language (semantics, ontologies) http://xkcd.com/776/

Data Uncertainty (& MAUP) Regions Uniformity (one or all aspects?) Relations typically grow stronger when based on larger geographic units (statistical uncertainty appears to decrease while our ability to assign characteristics to individuals decreases) # of regions r 48.2189 24.2963 12.5757 6.7649 3.9902

Dealing with Data Uncertainty In fuzzy set theory, it is possible to have partial membership in a set membership can vary, e.g., from 0 to 1 this adds a third option to classification: yes, no, and possibly Fuzzy approaches have been applied to the mapping of soils, vegetation cover, and land use

Fuzzy Membership Functions 1 Short Medium Tall Membership 0 5 Height 6

Measurement/Representation Representational models filter reality differently Vector crisp Raster can be fuzzy Savannah Forest Forest membership 0.9 1.0 0.5 0.9 0.1 0.5 0.0 0.1

Measures of Uncertainty Given that uncertainty exists in every layer in a GIS, and that decisions are made based on analyses of that data, how can we quantify the certainty we have in our data? The measures of uncertainty we can use are dependent on the type of data we are working with (NOIR). Measures of central tendency Measures of dispersion

How to measure the accuracy of nominal attributes? e.g., a vegetation cover map The confusion or misclassification matrix compares recorded classes (the observations) with classes obtained by a more accurate process, or from a more accurate source (the reference) Measures of Uncertainty

Examining every parcel (or pixel) may not be practical Rarer classes should be sampled more often in order to reliably assess accuracy sampling is often stratified by class (recall notes on spatial sampling) Collecting the Reference Data

The bolded numbers reflect correct classification (i.e., where the land use in the database equaled the land use observed in the field). The off-diagonal numbers reflect incorrect land use records in the database. Land use: in the field A B C D E Total Land use: in the database A 80 4 0 15 7 106 B 2 17 0 9 2 30 C 12 5 9 4 8 38 D 7 8 0 65 0 80 E 3 2 1 6 38 50 Total 104 36 10 99 55 304 Misclassification Matrix

The bolded numbers reflect areas where the land use has not changed. The offdiagonal numbers reflect areas where the land use has changed over time. Land use: at time B A B C D E Total A 80 4 0 15 7 106 B 2 17 0 9 2 30 Land use: at time A C 12 5 9 4 8 38 D 7 8 0 65 0 80 E 3 2 1 6 38 50 Total 104 36 10 99 55 304 Transition Matrix Markov models

Misclassification Statistics Percent correctly classified total of diagonal entries divided by the grand total, times 100 209/304*100 = 68.8% but chance would give a score of better than 0 Kappa statistic (more) normalized to range from 0 (chance) to 100 evaluates to 58.3%

Per-Polygon and Per-Pixel Assessment Error can occur in the attributes of polygons, but it can also occur in the positions of the boundaries better to conceive of the map as a field, and to sample points this also reflects how the data are likely to be used (to query the class at a point)

Assessment An example of a vegetation cover map. Two strategies for accuracy assessment are available: to check by area (polygon), or to check by point. In the former case a strategy would be devised for field checking each area, to determine the area's correct class. In the latter, points would be sampled across the state and the correct class determined at each point. It will be much easier to sample using points.

Errors typically distort interval / ratio measurements by small amounts (unless a blunder or outlier) Accuracy refers to the amount of deviation from the true value Precision the variation among repeated measurements, and also the amount of detail in the reporting of a measurement Measures of Uncertainty

The term precision is often used to refer to the repeatability of measurements. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher (no apparent bias). Precision

The amount of detail in a reported measurement [precision] (e.g., output from a GIS) should reflect its accuracy 14.4 m implies an accuracy of ±0.05 m 14 m implies an accuracy of ±0.5 m Excess precision should be removed by rounding Reporting Measurements

Measuring Accuracy Root Mean Square Error is the square root of the average squared error or deviation the primary measure of accuracy in map accuracy standards and GIS databases e.g., elevations in a digital elevation model might have an RMSE of 6.1m (TRIM Specs) the abundances of errors of different magnitudes often closely follow a Gaussian or normal distribution where y i is an elevation from the DEM, y j is the "true" known or measured elevation of a test point and N is the number of sample points. Standard deviations

Uncertainty based on an assumed RMSE of 7 m. The Gaussian distribution with a mean of 350 m and a standard deviation of 7 m gives a 95% probability that the true location of the 350 m contour lies in the colored area, and a 5% probability that it lies outside. Visualizing Uncertainty

Positional accuracy of features on a paper map is roughly 0.5mm on the map e.g., 0.5mm on a map at scale 1:20,000 gives a positional accuracy of 10m this is approximately the U.S. National Map Accuracy Standard, used by BC TRIM and also allows for digitizing error, stretching of the paper, and other common sources of positional error A Useful Rule of Thumb

Correlation of Errors Absolute positional errors may be high reflecting the technical difficulty of measuring distances from the Equator and the Greenwich Meridian, or elevations from mean sea level or the geoide Relative positional errors over short distances may be much lower positional errors tend to be strongly correlated over short distances As a result, positional errors can largely cancel out in the calculation of properties such as distance or area

Measures of central tendency: Mean (Arithmetic, geometric, harmonic), median or mode? The form of the relation? (linear, nonlinear, multiple regression models) Alternative approaches to many operations such as slope, spatial interpolation. More than just incomplete knowledge. Rule Uncertainty Interpolation methods

What is the slope? Use a 2x2 window or a 3x3 window? Maximum difference (2262-2017) Best fitting surface Several other options Slope Uncertainty

Slope Uncertainty SAGA GIS has nine different slope derivation methods

Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule) How things have changed in a digital world. Spatial data quality issues (Metadata) Accounting for uncertainty

Standards, such as the National Map Accuracy Standard (NMAS), defined positional accuracy but little else. Most maps were made by governmental mapping agencies with typically high standards. Scale, accuracy and resolution are linked on paper maps. The Digital Divide: Paper Maps

Digital data can come from anyone (unknown standards) Data entry can create problems Digitizing errors Conflation of data layers Datum differences, scale differences Georegistration (rubber sheeting) Mixed pixels An OpenStreetMap.org trace Mandates Software GIS Data Filter Filter Mandates The Digital Divide: Digital Data

Trust in the data provider? How the big three of online maps compare to a more precise source.

Geocoding results Bing Maps Google Maps Yahoo! Maps Threshold Quantile 95% Threshold Outlier 67.08 15.36 42.62 111.76 40.81 68.41

Map conflation problems Digitizing errors Data Entry

Data entry A common source of error working directly off-of an aerial photograph (ignoring relief displacement) But even orthophotos aren t perfect.

Georegistration error? Rubber sheet transformation

Analysis: Error Propagation Addresses the effects of errors and uncertainty on the results of GIS analysis Typical mathematical or statistical approaches are not appropriate. Almost every input to a GIS is subject to error and uncertainty In principle, every output should have confidence limits or some other expression of uncertainty

Geostatistical interpolation (kriging) Zinc data: Predicted values Zinc data: Estimated standard deviation Showing your [uncertainty] colours

Accounting for Uncertainty Traditional statistics vs spatial statistics spatial autocorrelation is assumed when using spatial data, but it is the antithesis to aspatial statistics [IID] Analytical error propagation is not possible in such cases.

Accounting for Uncertainty Monte Carlo Simulation (MCS) or sensitivity analysis Elevation Repeat 100 times Determine slope Add random value How likely is the slope we determine the actual slope value, given that we know the elevation values have an underlying uncertainty? Generate a random raster Iterators in ArcMap Toolbox Examples of Iterators

Three realizations of a model simulating the effects of error on a digital elevation model. The three data sets differ only to a degree consistent with known error. Error has been simulated using a model designed to replicate the known error properties of this data set the distribution of error magnitude, and the spatial autocorrelation between errors. Error Modelling: M C Simulation

Error Modelling Error in the measurement of the area of a square 100 m on a side. Each of the four corner points has been surveyed; the errors are subject to bivariate Gaussian distributions with standard deviations in x and y of 1 m (dashed circles). The red polygon shows one possible surveyed square (one realization of the error model). In this case the measurement of area is subject to a standard deviation of 200 sq m; a result such as 10,014.603 is quite likely, though the true area is 10,000 sq m. In principle, the result of 10,014.603 should be rounded to the known accuracy and reported as 10,000 sq m.

Living with Uncertainty It is easy to see the importance of uncertainty in GIS but much more difficult to deal with it effectively but we may have no option, especially in disputes that are likely to involve litigation

Some Basic Principles Uncertainty is inevitable in GIS Data obtained from others should never be taken as truth efforts should be made to determine quality Effects on GIS outputs are often much greater than expected there is an automatic tendency to regard outputs from a computer as the truth

Some Basic Principles Use as many sources of data as possible and cross-check them for accuracy Be honest and informative in reporting results add relevant caveats and cautions Interactive Map Disclaimer Data presented should be considered in conjunction with the report. Product demand projections detailed in the report will vary depending on the amount of retrofit activity the 1200 Buildings program stimulates and the extent of retrofit that is implemented in buildings. As there are inherent uncertainties in predicting demand from future activity the product demand projections and findings may not be accurate. Specific advice should be sought and further detailed analysis undertaken before acting or relying on the product demand projections

Accuracy and Uncertainty Why is this an issue? What is meant by accuracy and uncertainty (data vs rule) How things have changed in a digital world. Spatial data quality issues (Metadata)

Metadata Definition: Metadata are "data about data They describe the content, quality, condition, and other characteristics of data. Metadata help a person to locate and understand data. In epistemology the word meta means "about (its own category). Thus metadata is "data about the data".

Metadata Lineage (author, manipulations) Positional accuracy ( x, y, z) Attribute accuracy (misclassification) Completeness (how thorough?) Logical consistency (turn tables) Semantic accuracy (one-to-many) Temporal information (date obtained, entered into system) Air photo taken Event happens Air photo entered into GIS Why event not acted upon?

Examples of Metadata Identification Title? Area covered? Themes? Currentness? Restrictions? Data Quality Accuracy? Completeness? Logical Consistency? Lineage? Spatial Data Organization Vector? Raster? Type of elements? Number? Spatial Reference Projection? Grid system? Datum? Coordinate system?

Examples of Metadata Entity and Attribute Information Features? Attributes? Attribute values? Distribution Distributor? Formats? Media? Online? Price? Metadata Reference Metadata currentness? Responsible party?

A classification of uncertainty

Conclusion Uncertainty is ever-present in GIS analyses from the source data through to the analyses and to the final output. It is important to be conscious of uncertainty in your data / analyses / presentation and, if necessary, determine the impact of it on your results. (MCS) Knowledge of uncertainty allows us to make estimates of the confidence limits of the results. Metadata!