Data Quality and Uncertainty

GIS analysis is built on assembling layers of data, but as the number of layers increases, errors compound and overall quality decreases. Garbage in, garbage out.
- High-quality data are expensive.
- Analysis is worthless without good data quality.
http://www.colorado.edu/geography/gcraft/notes/error/error_f.html

Accuracy, Precision, Data Quality and Errors
- Accuracy is the degree to which information on a map or in a digital database matches true or accepted values (roughly, the absence of errors).
- Precision refers to the level of measurement and exactness of description in a GIS database (roughly, data quality).
- Precise data, no matter how carefully measured, may be inaccurate.
- High precision does not imply high accuracy, nor does high accuracy imply high precision.
- High accuracy and high precision are both expensive.
http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
Accuracy and Precision

Natural variation, in either the instruments used or the object measured, affects both accuracy and precision.

Data quality: scale, precision and accuracy (mapping tradition)
Precision yardstick = the width of a drawn line, or ~0.5 mm at map scale:
- NTS/NTDB 1:50,000 = < 25 metres
- BC TRIM 1:20,000 = 10 metres
- BC/Federal 1:250,000 = 125 m
These values are also referred to as positional accuracy.
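The 0.5 mm line-width yardstick converts directly to ground units by multiplying by the scale denominator. A minimal sketch (the function name is illustrative, not from the slides):

```python
# Rule of thumb from the slide: the precision "yardstick" is the width of a
# drawn line, ~0.5 mm at map scale.
# Ground precision = line width (in metres) x scale denominator.

def ground_precision_m(scale_denominator: int, line_width_mm: float = 0.5) -> float:
    """Ground distance (metres) represented by a map line of the given width."""
    return line_width_mm / 1000.0 * scale_denominator

# The three mapping series from the slide:
for denom in (50_000, 20_000, 250_000):
    print(f"1:{denom:,} -> {ground_precision_m(denom):.0f} m")
# 1:50,000 -> 25 m, 1:20,000 -> 10 m, 1:250,000 -> 125 m
```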
Raster Precision
- Raster precision = pixel size (resolution), e.g. Landsat 30 m, SPOT 5-10 m, GeoEye 50 cm.

Scale and Precision (not accuracy)
- Data from a smaller scale have lower resolution (precision).
- Detail and the number of features decrease with smaller scale [both spatial location details and attributes].
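The same ~0.5 mm line-width rule can be applied in reverse to estimate the largest map scale at which a raster's pixels are appropriate. This inverse form is a common cartographic rule of thumb, not stated explicitly on the slide:

```python
# Largest appropriate scale denominator for a raster, assuming a pixel
# should occupy no less than ~0.5 mm on the printed map.

def max_scale_denominator(pixel_size_m: float, line_width_mm: float = 0.5) -> int:
    """Approximate largest usable scale denominator for a given pixel size."""
    return round(pixel_size_m / (line_width_mm / 1000.0))

print(max_scale_denominator(30))   # Landsat 30 m  -> ~1:60,000
print(max_scale_denominator(0.5))  # GeoEye 50 cm -> ~1:1,000
```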
Data and Display Scale
- Higher-resolution data shouldn't be used at smaller scales (too much data), and vice versa (too little).
- Degree of flexibility: roughly 0.5x to 2x the source scale.
- Too little detail vs. too much detail.

Coastline and lake boundaries: location uncertainty related to tides and fluctuating water levels.
(Materials by Austin Troy, 2008; image source: Esri)

Uncertainty is greatest in natural resources and gradual boundaries:
- Partly subjective: 10 people might digitise 10 different sets of lines, polygons and attributes.
- Similarly for soils and geology.
- Field checking is needed.
Classification Accuracy and Precision (e.g. vegetation)
1. Low precision, likely high accuracy: low cost, high certainty (e.g. "Land" and "Water").
2. Intermediate: Biogeoclimatic Ecosystem Classification (e.g. "Interior Cedar Hemlock, Cariboo wet cool").
3. BC ecosystem "site series" (sedge fern, etc.): high precision, likely lower accuracy; higher cost, lower certainty.
http://www.for.gov.bc.ca/hre/resources/classificationreports/index.html
http://www.for.gov.bc.ca/hre/becweb/system/how/index.html

Earth Observation for Sustainable Development (EOSD) project, Canadian Forest Service: based on circa-2000 Landsat imagery (30 m resolution); download as shapefiles (updates?). Available for all of Canada from geobase.ca.
DEM Example: The Effect of Scale on Precision
e.g. Forests for the World
- BC TRIM DEM: 25 m grid (precision), vertical accuracy 10-20 m.
- City DEM, 1 metre (from contours): higher precision, but with interpolation errors.
- LiDAR, Forests for the World, 2009: precision 30 cm.
Factors Affecting Loss in Data Quality
- Density of observations: model sufficiency.
- Area cover: gaps due to accessibility and unavailability.
- Age of data: precision and changes.
- Instrument inaccuracies and human processing: different agencies using different methods, hence different precision and accuracy (?).

Example: 1888 first Canadian topographic series, 1:40,000 (Eldon sheet), contour interval 200; 1982 NTDB 1:50,000, contour interval 40 metres. Note the lack of gullies on the north side of Bourgeau Lake.
ETM+ 1999
Inconsistent processing methods: different groups/agencies (e.g. provinces) using different methods or classifications (sample polygon, e.g. soil type or vegetation). The TRIM DEM (viewed here via Google Maps) was created by more than one company.
DEMs: inadequate sample points due to snow saturation. Elevation artefacts, Homathko Icefield, Coast Mountains (TRIM).

GPS Data
- Satellites launched 1989-1994; data now usually available.
- Initially designed to pinpoint locations and reduce civilian casualties.
GPS: Dilution of Precision (DOP)
- DOP is an indicator of the quality of the satellite geometry.
- PDOP < 8.0: acceptable; PDOP < 4.0: excellent.
- High DOP (poor geometry) vs. low DOP (good geometry).
- More visible GPS satellites generally means a lower DOP.
Selective Availability (SA)
- Random error added to GPS signals before 2000: up to 100 metres of error, introduced by scrambling the last 3 decimals of the time signal.
- Turned off May 1, 2000 at midnight; no intent to ever use it again.
- May 1, 2000: Selective Availability on. May 3, 2000: Selective Availability off.
- The differential correction industry solved the problem of SA:
  - Uncorrected GPS: ~10 m
  - Corrected (DGPS): ~1 m
GPS Data Quality: Current Factors

Types of error (typical magnitudes):
  Ionosphere    4.0 m
  Clock         2.1 m
  Ephemeris     2.1 m
  Troposphere   0.7 m
  Receiver      0.5 m
  Multipath     1.0 m
  Total        10.4 m
(Selective Availability, pre-2001: ~100 m)

Time and Changes (Outdated Data)
This is a huge factor in a country as large as Canada.
- Spatial
  - Gradual 'natural' changes: river courses, glacier recession
  - Catastrophic change: fires, floods, landslides
  - Seasonal and daily changes: lake / sea / river levels
  - Man-made: urban development, new roads, cut blocks
- Attributes
  - Attribute change: forest growth (height etc.), road surfacing
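The error budget above sums the individual contributions to reach its 10.4 m total; a minimal sketch reproducing that arithmetic (the root-sum-square line is an added comparison, since independent errors are often combined that way, which gives a smaller figure):

```python
# GPS error budget from the slide (metres).
error_budget_m = {
    "ionosphere": 4.0,
    "clock": 2.1,
    "ephemeris": 2.1,
    "troposphere": 0.7,
    "receiver": 0.5,
    "multipath": 1.0,
}

# Simple sum, as on the slide:
total = sum(error_budget_m.values())
print(f"total: {total:.1f} m")  # total: 10.4 m

# Root-sum-square, an alternative combination for independent errors:
rss = sum(v ** 2 for v in error_budget_m.values()) ** 0.5
print(f"root-sum-square: {rss:.1f} m")
```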
Physical Changes
- Mississippi River: many changes over the centuries.
- Edziza Park: glacier change, 1950-2000.

Frequency of Mapping and Updating
- The number of map versions ranges from 1 (north) to ~10 (the most popular sheets in the south).
Prince George 93G015, 2000 / 2010
- Roads updated from 1980, but nothing else: hence the need to document the date and collection methods in metadata.

Process for national data updates:
- Elevations: assumed unchanged
- Hydrography: unchanged
- Glaciers: change (~20 m retreat per year), but not updated
- Roads: change, usually updated

Metadata ("data about data") should give sources and dates, e.g. the year of aerial photography (or satellite imagery).