Data Quality and Uncertainty
The power of GIS analysis is based on the assembly of layers of data, but as the number of data layers increases, errors multiply and quality decreases.
Garbage in, garbage out.
High-quality data are expensive.

Uncertainty: our imperfect and inexact knowledge of the world
- Positional uncertainty
- Attribute uncertainty
- Definitional uncertainty
- Measurement uncertainty
www.geog.ucsb.edu/~kclarke/g176b/lecture07.ppt
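One simple way to picture "errors multiply" is to treat each layer's accuracy as the fraction of locations mapped correctly and to multiply those fractions for an overlay. The sketch below assumes layer errors are independent, which is a simplification, and the per-layer accuracies are hypothetical values chosen for illustration.

```python
from math import prod

def overlay_accuracy(layer_accuracies):
    """Estimate the combined accuracy of an overlay.

    Assumes the errors in each layer are independent (a simplification).
    """
    for a in layer_accuracies:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
    return prod(layer_accuracies)

# Hypothetical per-layer accuracies (fraction of locations correctly mapped)
layers = [0.90, 0.90, 0.90]
for n in range(1, len(layers) + 1):
    print(f"{n} layer(s): combined accuracy {overlay_accuracy(layers[:n]):.3f}")
```

Under these assumptions, three layers that are each 90% correct give an overlay that is only about 73% correct.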
Updating Glacier Polygons: Garibaldi Provincial Park
Uncertainty due to late-lying snow, shadows (highlighted in blue) and debris cover
- Often needs a statement of % uncertainty or ± margin, e.g. area = 12.5 km² ± 0.8

Accuracy, Precision, Data Quality and Errors
Accuracy is the degree to which information matches true values (~ absence of errors).
Precision refers to the level of measurement and exactness (~ data quality).
Precise data, no matter how carefully measured, may be inaccurate.
High precision does not indicate high accuracy, nor does high accuracy imply high precision.
High accuracy and high precision are both expensive.
http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
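The distinction between accuracy and precision can be sketched numerically. The example below uses one common statistical reading, which is an assumption rather than the definition given in these notes: accuracy as the offset of the mean from a known true value, and precision as the spread of repeated measurements. The benchmark elevation and both sets of readings are hypothetical.

```python
from statistics import mean, stdev

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread): bias relates to accuracy, spread to precision."""
    bias = mean(measurements) - true_value  # systematic offset from the true value
    spread = stdev(measurements)            # sample standard deviation of the readings
    return bias, spread

true_elevation = 100.0  # hypothetical surveyed benchmark, metres

# Precise but inaccurate: tightly clustered, offset from the true value
precise_biased = [104.9, 105.0, 105.1, 105.0]
# Accurate but imprecise: centred on the true value, widely scattered
accurate_scattered = [96.0, 104.0, 99.0, 101.0]

for name, data in [("precise / inaccurate", precise_biased),
                   ("accurate / imprecise", accurate_scattered)]:
    bias, spread = accuracy_and_precision(data, true_elevation)
    print(f"{name}: bias = {bias:+.2f} m, spread = {spread:.2f} m")
```

The first set repeats well but sits about 5 m from the benchmark; the second averages to the benchmark but scatters by several metres.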
Accuracy and Precision
If there are natural variations in either the instruments used or the object measured, this affects both accuracy and precision.

Data quality: Scale, precision and accuracy
Mapping tradition: precision yardstick = the width of a line, or ~0.5 mm
NTS/NTDB 1:50,000 = < 25 metres
BC TRIM 1:20,000 = 10 metres
BC/Federal 1:250,000 = 125 metres
These values are also referred to as positional accuracy.
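The yardstick values above follow from multiplying the 0.5 mm line width by the scale denominator. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
LINE_WIDTH_MM = 0.5  # precision yardstick from the notes: width of a drawn line

def ground_precision_m(scale_denominator, line_width_mm=LINE_WIDTH_MM):
    """Ground distance in metres covered by a map line of the given width."""
    return line_width_mm * scale_denominator / 1000.0  # convert mm to m

for name, denom in [("BC TRIM", 20_000), ("NTS/NTDB", 50_000), ("BC/Federal", 250_000)]:
    print(f"{name} 1:{denom:,} -> {ground_precision_m(denom):g} m")
```

This reproduces the 10 m, 25 m and 125 m figures listed for the three map series.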
Raster precision = pixel size (resolution), e.g. Landsat 30m, (Google) GeoEye 50cm 5m

SCALE and PRECISION (not accuracy)
Data from a smaller scale have lower resolution (precision).
Detail and the number of features decrease with smaller scale [both spatial location details and attributes].
Data precision and display scale
Higher-resolution data shouldn't be used at smaller scales (too much data), and vice versa (too little).
Degree of flexibility = 0.5 to 2?
Too little detail / Too much detail?
Coastline and lake boundaries: location uncertainty related to tides and fluctuating water levels
Materials by Austin Troy 2008. Image source: Esri

Uncertainty in natural resources and gradual boundaries
Subjective: 10 people might digitise 10 different sets of lines, polygons and attributes.
Consistency required, e.g. provincial guidelines; likewise for soils and geology.
Field checking needed to give accuracy %.
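An "accuracy %" from field checking can be computed as the share of visited sites where the mapped class agrees with what was found on the ground. The sketch below is one simple way to do this; the site classes and the ten-site sample are hypothetical.

```python
def percent_accuracy(mapped, observed):
    """Percentage of field-checked sites where the mapped class matches the field class."""
    if len(mapped) != len(observed) or not mapped:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(m == o for m, o in zip(mapped, observed))
    return 100.0 * matches / len(mapped)

# Hypothetical field check of 10 sites: class on the map vs class seen in the field
mapped = ["forest", "forest", "wetland", "forest", "water",
          "wetland", "forest", "water", "forest", "wetland"]
observed = ["forest", "forest", "forest", "forest", "water",
            "wetland", "forest", "water", "wetland", "wetland"]

print(f"Accuracy from field checking: {percent_accuracy(mapped, observed):.0f}%")
```

A single percentage hides which classes are confused with which; a per-class breakdown would be a natural extension.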
Classification accuracy and precision, cost and certainty, e.g. vegetation
1. Land versus Water (for example): low precision, high accuracy (?); low cost, high certainty
2. BC Biogeoclimatic Ecosystem Classification (BEC zones): intermediate, e.g. Interior Cedar Hemlock, Cariboo wet cool
3. BC ecosystem site series (sedge fern, etc.): high precision, lower accuracy (?); higher cost, lower certainty
http://www.for.gov.bc.ca/hre/resources/classificationreports/index.html

Factors causing loss in data quality
A. Scale - data: spatial / attributes
B. Density of observations and processing
C. Area cover - gaps due to accessibility
D. Age of data - precision and changes
DEM example: the effect of scale on precision, e.g. Forests for the World
BC TRIM DEM: 25m grid (precision), vertical accuracy 10-20m
City DEM, 1 metre (from contours): higher precision, but with interpolation errors
LiDAR, Forests for the World, 2009: precision 30cm
b. Processing methods
Different groups/agencies (e.g. provinces) using different methods or classifications
Sample polygon, e.g. soil type or vegetation
TRIM DEM (displayed on Google Maps) created by multiple companies

DEMs: inadequate sample points due to snow saturation
Elevation artefacts, Homathko Icefield, Coast Mountains (TRIM)
Missing data: satellite-borne (ASTER) Global DEM (Chile) with cloud holes
Area coverage: TRIM II, showing areas with orthophotos
Area coverage and precision: GPS data
- Satellites launched 1989-1994; initially for military use only, data now usually available

GPS: Dilution of Precision (DOP)
(P)DOP is an indicator of the quality of the geometry of the satellites.
PDOP < 8.0: acceptable
PDOP < 4.0: excellent
High DOP (poor) / Low DOP (good)
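The PDOP thresholds above can be applied as a simple screening rule when logging GPS positions. This is a minimal sketch: the function name is made up, and the label "poor" for values of 8.0 and above is an assumption based on the "High DOP (poor)" caption.

```python
def classify_pdop(pdop):
    """Rate a PDOP value using the thresholds quoted in the notes."""
    if pdop <= 0:
        raise ValueError("PDOP must be positive")
    if pdop < 4.0:
        return "excellent"
    if pdop < 8.0:
        return "acceptable"
    return "poor"  # assumed label; the notes only say that high DOP is poor

# Hypothetical PDOP readings from a receiver log
for value in [2.1, 5.5, 9.3]:
    print(f"PDOP {value}: {classify_pdop(value)}")
```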
Number of GPS satellites: more satellites = lower dilution of precision (DOP)

d. Time and changes - outdated data
** This is a huge factor in a country as large as Canada
Updating cycle:
Municipal: continuous updating, new photography every 3-5 years
Provincial: 10-25 years?
Federal: 5-50 years, depending on remoteness
Other countries: 1-100 years, depending on size / resources
Spatial changes
Gradual 'natural' changes: river courses, glacier recession
Catastrophic change: landslides, floods, volcanic eruptions
Seasonal and daily changes: lake, sea, river levels
Man-made: urban development, new roads, cut blocks

Attribute changes
Forest growth (height, age etc.), fire impacts
Road surfacing / upgrading / deactivation
Land use - new parks

Physical changes
Mississippi River - many changes over the centuries
Edziza Park - glacier change 1950-2000
Earth Observation for Sustainable Development (EOSD) project, Canadian Forest Service
From ~2000 Landsat imagery (30m resolution); download as shapefiles, no updates
Available for all of Canada from geobase.ca

Process for national data updates
Elevations - assume unchanged
Hydrography - mostly unchanged (?)
Glaciers - change, but not updated (~20m retreat per year)
Roads - change, periodically updated
Metadata ("data about data") should give sources and dates, e.g. year of aerial photography (or satellite imagery)
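The ~20 m per year glacier retreat figure gives a rough sense of how quickly un-updated data can go stale. The sketch below assumes a constant, linear retreat rate (a simplification) and compares the accumulated change with the 10 m, 25 m and 125 m positional values quoted for the 1:20,000, 1:50,000 and 1:250,000 map series. The function names are made up for illustration.

```python
def positional_drift_m(rate_m_per_year, data_age_years):
    """Accumulated change in position, assuming a constant rate of change."""
    return rate_m_per_year * data_age_years

def years_until_exceeds(rate_m_per_year, tolerance_m):
    """Years of constant change before the drift reaches the tolerance."""
    return tolerance_m / rate_m_per_year

RETREAT_RATE = 20.0  # m/year, the approximate figure quoted in the notes

for label, tolerance in [("1:20,000", 10.0), ("1:50,000", 25.0), ("1:250,000", 125.0)]:
    years = years_until_exceeds(RETREAT_RATE, tolerance)
    print(f"{label}: {tolerance:g} m tolerance reached after {years:g} years")

# Drift for a hypothetical 10-year-old glacier outline
print(f"10-year-old outline: {positional_drift_m(RETREAT_RATE, 10):g} m of retreat")
```

Under these assumptions a glacier terminus mapped at 1:50,000 moves by more than the map's positional value in little over a year, which is why the date of the source imagery belongs in the metadata.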
Number of map versions ranges from 1 (north) to ~10 (most popular in the south)
Prince George 93G015, 2000 / 2010: roads updated from 1980, but nothing else. Hence the need to document date and collection methods in metadata.