NR402 GIS Applications in Natural Resources. Lesson 9: Scale and Accuracy


Map scale
Map scale specifies the amount of reduction between the real world and the map.
The map scale specifies how much the map is reduced in size compared to the real world. A map scale of 1:24,000 means that 1 cm (about 0.4 inch) on the map represents 24,000 cm (240 m, or approximately 790 feet) in the real world. For a printed map the scale is fixed, while in a GIS the map can be displayed at any scale. It is, however, very important to remember that each map layer in a GIS was developed at a certain scale, which has implications for the amount of detail, the spatial accuracy, and the appropriate use of the GIS data layer.
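
The scale arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name and the list of scales are chosen here and are not part of the lesson material.

```python
METERS_PER_FOOT = 0.3048


def ground_distance_m(map_distance_cm: float, scale_denominator: float) -> float:
    """Convert a distance measured on the map (cm) to a ground distance (m)."""
    return map_distance_cm * scale_denominator / 100.0  # 100 cm per metre


# How far does 1 cm on the map reach on the ground at common scales?
for denom in (24_000, 100_000, 250_000):
    meters = ground_distance_m(1.0, denom)
    feet = meters / METERS_PER_FOOT
    print(f"1 cm at 1:{denom:,} = {meters:,.0f} m ({feet:,.0f} ft)")
```

The same function works for any map unit as long as the map distance is first converted to centimetres.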

Current display scale
Map scale for current display
You may not have noticed, but the scale of the current map is always displayed in ArcMap. You can change the display scale by typing the desired scale in the scale box shown above.

Scale in Ecology
Grain; Extent
To landscape ecologists, scale represents two different concepts: grain and extent. Grain is the resolution of the dataset, for example the pixel size of a raster dataset. Extent is the size of the dataset, i.e. how much area the dataset covers.

Scale has many meanings
Avoid using the terms small scale and large scale, as they have different meanings in different professions. Cartographers interpret a large scale map as a map with fine resolution, while ecologists think of a large scale map as having low resolution and covering a large extent. Preferred terminology is broad scale and fine scale.
In the following I will avoid the terms small scale and large scale, since they mean different things to cartographers and ecologists. Instead, I will use the terms broad scale and fine scale.

Scale examples
Select map scale depending on the extent of your study area. Craig Mountain: fine scale. State of Idaho: mid scale. North America: broad scale.
Datasets with a small extent (a watershed, for example) often have a fine grain and are referred to as fine scale data. Mid scale data have an intermediate extent and grain, while broad scale data (such as a continent) cover large areas with a coarser grain. Notice that a fine scale dataset usually has both a fine grain and a small extent, and a broad scale dataset has a coarse grain and covers a large extent. It is unusual to find a dataset with a large extent and a very fine grain, because it would take up a great deal of memory and be slow to display with current computer technology. A dataset with a small extent and a coarse grain is also unusual; in studies of small extents a fine grain is usually required or desired.
Always report the grain (pixel size) of a dataset as part of the metadata. It is considered inappropriate to mix datasets with different grains in a GIS analysis; the data should always be resampled to the coarsest common pixel size. However, some variables are inherently broad scale, i.e. the dataset would not change much in appearance if displayed with a finer pixel size because there is no more detail to be found in the data. The scale at which the maximum detail of the dataset is found is sometimes referred to as the functional scale of the dataset.
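
To see why a large extent with a very fine grain is unusual, it helps to count cells. The sketch below uses made-up extents (100 km² for a watershed-sized area, 200,000 km² for a state-sized area) purely as assumptions for illustration, and the helper names are hypothetical.

```python
def cell_count(extent_km2: float, grain_m: float) -> int:
    """Approximate number of raster cells needed to cover an extent at a given grain."""
    cell_area_m2 = grain_m ** 2
    return round(extent_km2 * 1_000_000 / cell_area_m2)


def coarsest_common_grain(grains_m):
    """Pixel size to resample to when layers are combined: the coarsest one present."""
    return max(grains_m)


# Hypothetical extents, chosen only to show the order of magnitude.
for label, extent_km2 in (("watershed-sized", 100), ("state-sized", 200_000)):
    for grain_m in (1, 30, 250):
        print(f"{label:16s} {grain_m:4d} m grain: {cell_count(extent_km2, grain_m):,} cells")

# Layers at 10 m, 30 m and 250 m would be analysed together at 250 m.
print(coarsest_common_grain([10, 30, 250]))
```

The cell count grows with the inverse square of the grain, so halving the pixel size quadruples the number of cells.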

Comparing scales in vector data
Stream not included in the 1:100K or 1:250K scale data
The topographic map displayed here was developed at the 1:24,000 scale. Notice the streams on this map. Overlaid on the topo map are two GIS vector stream layers, one at 1:100,000 scale (purple) and one at 1:250,000 scale (blue). The 1:100,000 scale streams line up fairly well with the 1:24,000 scale topo map. Notice, however, that a few ephemeral streams are present in the 1:24,000 data but missing from the 1:100,000 data. The 1:250,000 scale streams fall in the correct general area relative to the 1:24,000 scale topo map, but there are large spatial errors (a poor match).

Effects of Scale
Is there an acceptable error? Acceptable error: 1/50 inch on the map.
Map scale      Error
1:24,000       10 m
1:100,000      40 m
1:250,000      100 m
40 ft is an acceptable error for the USGS 1:24,000 scale topo maps.
Imagine yourself digitizing lines on a map. Each time your hand trembles a little, an error is introduced in the map. Let's say that it is acceptable to tremble up to 1/50 of an inch when digitizing; this is not very much! A slight hand tremor of 1/50 inch translates to an error of roughly 10 m on a 1:24,000 map, 40 m on a 1:100,000 map, and 100 m on a 1:250,000 map. Notice how the spatial error increases as the scale becomes coarser. All maps are created at a specific scale with a pre-determined acceptable error; this original scale determines the acceptable (expected) error in a GIS dataset. Where GIS data do not overlay perfectly with an air photo or other data, you should investigate the scale at which the dataset was originally created. Yes, there is an acceptable error, unless the map was created by a 100% perfect robot! 100% perfection does not exist. Be aware that in a GIS you can zoom in far beyond the scale at which the map was created. The original scale at which the map was created limits the amount of detail you will see in the GIS!
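
The conversion from a map error to a ground error is a single multiplication. The sketch below works it out exactly for the 1/50 inch tolerance; the function name is chosen here for illustration. Exact arithmetic gives about 12 m (40 ft), 51 m, and 127 m, so the 10 m, 40 m, and 100 m figures on the slide appear to be rounded values.

```python
METERS_PER_INCH = 0.0254
METERS_PER_FOOT = 0.3048


def ground_error_m(map_error_inch: float, scale_denominator: float) -> float:
    """Ground error (m) produced by a given error on the map (inches)."""
    return map_error_inch * scale_denominator * METERS_PER_INCH


TOLERANCE_INCH = 1 / 50  # acceptable digitizing error stated on the slide

for denom in (24_000, 100_000, 250_000):
    err_m = ground_error_m(TOLERANCE_INCH, denom)
    print(f"1:{denom:,}: {err_m:.1f} m ({err_m / METERS_PER_FOOT:.0f} ft)")
```

At 1:24,000 the result is 40 ft, which matches the acceptable error quoted for the USGS topo maps.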

Error in Area measurements
True area: 1 ha (100 x 100 m). Max area: 1.21 ha. Min area: 0.81 ha. About 20% error in area. Error buffer: 10 m radius.
Consider the orange square with a true area of 1 ha (100 x 100 m). Let's assume that the corners are located with a GPS unit with an accuracy of 10 meters. Given the GPS error, each corner may be recorded anywhere within 10 m of its true location. The maximum estimated area occurs if all corners are recorded outside the actual square; with the sides lengthened to 110 m the calculated area is 1.21 ha. The minimum estimated area occurs if all corners are recorded inside the actual square; with the sides shortened to 90 m the calculated area is 0.81 ha. In both cases the area error is about 20%. All GIS data have errors! Be aware of how large they are. The area column in the GIS attribute table will show the area with many decimals (probably 5 or 6). The number of decimals carried in the area and perimeter columns of the attribute table does not represent the accuracy of the data, but rather the precision of the numeric calculations performed in the GIS.
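
The area figures can be reproduced with the shoelace formula. The sketch below is an illustration under stated assumptions: the 1.21 ha and 0.81 ha values are read here as the side length changing by 10 m in total (110 m and 90 m), which is an interpretation of the slide arithmetic rather than something the notes state. The second case, where every corner is displaced the full 10 m along the diagonal, is an added assumption and gives a larger error.

```python
import math


def polygon_area_m2(corners):
    """Shoelace formula for a simple polygon given as (x, y) corners in metres."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shifted_square(side_m, shift_m):
    """Square whose corners are each moved shift_m along x and along y,
    away from the centre (positive) or towards it (negative)."""
    lo, hi = -shift_m, side_m + shift_m
    return [(lo, lo), (hi, lo), (hi, hi), (lo, hi)]


TRUE_SIDE_M = 100.0
true_area = polygon_area_m2(shifted_square(TRUE_SIDE_M, 0.0))

# Slide figures: side length changes by 10 m in total (5 m at each end).
slide_max = polygon_area_m2(shifted_square(TRUE_SIDE_M, 5.0))   # 110 m side
slide_min = polygon_area_m2(shifted_square(TRUE_SIDE_M, -5.0))  # 90 m side

# Assumed worst case: each corner displaced the full 10 m along the diagonal.
diag = 10.0 / math.sqrt(2.0)
worst_max = polygon_area_m2(shifted_square(TRUE_SIDE_M, diag))
worst_min = polygon_area_m2(shifted_square(TRUE_SIDE_M, -diag))

for label, area in (("slide max", slide_max), ("slide min", slide_min),
                    ("diagonal max", worst_max), ("diagonal min", worst_min)):
    change = (area - true_area) / true_area
    print(f"{label:13s} {area / 10_000:.2f} ha ({change:+.0%})")
```

Either way, the relative area error is far larger than the many decimals in an attribute table would suggest.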

What's the difference?
Fine scale data: more detail, higher spatial accuracy, usually a small extent. Broad scale data: less detail, lower spatial accuracy, usually a large extent.
In general we can draw the following conclusion: fine scale data contain more detail and have a higher spatial accuracy than broad scale data. Be aware that in a GIS you can zoom in far beyond the scale at which the map was created. Naturally, you will not see the detail present in the real world or in a map created at the scale you are zoomed in to. The original scale of the map limits the amount of detail you will see in the GIS! The same is true for analysis: you should not use a broad scale map for fine scale analysis. The detail necessary for a fine scale analysis is not present in a broad scale map, and the spatial errors are too large.

Minimum mapping unit
The minimum mapping unit (MMU) of a map is the smallest object that is present in the map.
If an object is smaller than the MMU it will be included in other objects and will not show as a separate entity on the map.
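
As a small illustration of how an MMU acts as a threshold, the sketch below separates polygons that would be mapped on their own from those that would be absorbed into a neighbour. The polygon names, their areas, and the 1 ha threshold are all hypothetical values chosen for the example.

```python
MMU_HA = 1.0  # assumed minimum mapping unit, in hectares

# Hypothetical polygons: (name, area in hectares)
polygons = [
    ("aspen stand", 0.4),
    ("riparian strip", 2.5),
    ("old juniper woodland", 12.0),
    ("spring", 0.05),
]


def split_by_mmu(polys, mmu_ha):
    """Return (mapped, absorbed): polygons at or above the MMU, and those below it."""
    mapped = [p for p in polys if p[1] >= mmu_ha]
    absorbed = [p for p in polys if p[1] < mmu_ha]
    return mapped, absorbed


mapped, absorbed = split_by_mmu(polygons, MMU_HA)
print("Shown as separate polygons:", [name for name, _ in mapped])
print("Absorbed into neighbouring polygons:", [name for name, _ in absorbed])
```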

Resolution
Landsat 7 imagery at 30 m resolution; ASTER imagery at 15 m resolution
The resolution of a raster dataset is determined by the camera that recorded the data. The data can of course be resampled to a smaller pixel size, but this does not increase the amount of detail that can be observed in the image. Usually a raster dataset is displayed at the pixel size at which it was recorded. If vector data are converted to raster data, a pixel size should be chosen that matches the spatial error of the line-work.

Resolution
UI campus. SPOT imagery at 10 m resolution; orthophoto at 1 m resolution
Another example of two images of the same area with different pixel resolutions. Fine scale data (small pixel size) show more detail at a higher spatial accuracy.

Smith Creek Watershed
MMU = 1 ha. Riparian: 110 ha.
This is a vector dataset of the Smith Creek watershed in southwestern Idaho. The minimum mapping unit (MMU) is 1 ha, which means that the smallest polygon in the dataset is approximately 1 ha (100 x 100 m). In the following slides we will convert this vector dataset to raster data with different pixel sizes. The appropriate pixel size is around 100 m, so that one pixel covers about 1 ha (the MMU). A smaller pixel size is acceptable from an analysis standpoint, but produces a dataset that is unnecessarily large (takes up more memory than necessary).

Smith Creek Watershed
Grid, 30 m cell size. Riparian: 110 ha.
Here the original vector dataset has been converted to a raster with a 30 m pixel size, which is smaller than the MMU. As you can see, there is not much change in the spatial representation of the Smith Creek watershed when comparing the vector and the raster (30 m) data.

Smith Creek Watershed
Grid, 100 m cell size. Riparian: 104 ha.
Resampling to 100 m pixels still shows most of the detail that was originally in the vector map. A 100 m pixel (1 ha) matches the MMU. Notice how the streams are beginning to be broken up into pixels; they obviously had a width smaller than 100 m in the original vector dataset. In this raster map the stream lines are overlaid on the 100 m raster as a reference.

Smith Creek Watershed
Grid, 500 m cell size. Riparian: 104 ha.
When resampling to 500 m pixels, most of the original detail in the vector map (1 ha MMU) is lost. In this raster map the stream lines are overlaid on the 500 m raster as a reference. The GIS user must be aware of the consequences of such resampling! Is this acceptable? Well, as usual it depends on what question the GIS analysis is intended to answer! If the analysis is focused on the stream network and the smaller polygons, resampling to 500 m is unacceptable.
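
The loss of narrow features at coarse pixel sizes can be imitated on a toy grid. The sketch below aggregates a small categorical grid with a majority rule; the grid, the class codes (U for upland, S for stream), and the majority rule itself are assumptions made for illustration and are not necessarily how the Smith Creek grids were produced.

```python
from collections import Counter


def aggregate_majority(grid, factor):
    """Resample a categorical grid to coarser cells using the majority class in each block."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[rr][cc]
                     for rr in range(r, min(r + factor, rows))
                     for cc in range(c, min(c + factor, cols))]
            coarse_row.append(Counter(block).most_common(1)[0][0])
        coarse.append(coarse_row)
    return coarse


# Toy fine-grain grid: a one-cell-wide stream (S) crossing upland (U).
fine = [
    "UUSUUU",
    "UUSUUU",
    "UUUSUU",
    "UUUSUU",
    "UUUUSU",
    "UUUUSU",
]

coarse = aggregate_majority(fine, 3)  # cells three times as wide
stream_cells_fine = sum(row.count("S") for row in fine)
stream_cells_coarse = sum(row.count("S") for row in coarse)
print(coarse)
print(f"stream cells: {stream_cells_fine} fine, {stream_cells_coarse} coarse")
```

In this toy case the stream never fills the majority of any coarse cell, so it disappears from the coarse grid entirely, which is the kind of consequence the user needs to check for before resampling.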

Errors add up in an overlay analysis
All GIS layers have an associated error. In an overlay analysis, the errors from the different layers add up.

Thematic map accuracy: estimating map errors
The bright green polygons on this map represent aspen woodlands. If we go to one of those polygons, what are the chances that we actually find aspen woodlands?
So far we have been talking about the spatial accuracy of maps. Another type of accuracy is thematic accuracy, which refers to the certainty with which the polygon attributes are mapped. The map above shows the vegetation cover types in the Smith Creek watershed. The bright green polygons represent aspen woodlands. If we jump in a truck and drive to one of those polygons, what are the chances that it is actually aspen woodland? If the thematic accuracy of the map is 70%, there is a 70% chance that the polygon is actually what the map key says it is.

Succession in a Western Juniper / Sagebrush Steppe Mosaic
Landscape composition, disturbance patterns, management strategies
Landsat 5 imagery, July 1992. Watersheds: Smith Creek, Current Creek, Hurry Back Creek, Red Canyon Creek.
In the next few slides and in Exercise 9 I will show you how to carry out an assessment of thematic map accuracy based on GPS ground reference locations. This Landsat image shows four watersheds in the Owyhee Mountains in southwestern Idaho. The image is displayed in a near infrared color combination showing the landscape dominated by western juniper (Juniperus occidentalis) and sagebrush steppe (Artemisia spp.). The near infrared display causes the juniper to appear red in the image, while the sagebrush appears blue/green/gray. The vegetation types and seral stages of the Smith Creek watershed were mapped using aerial photography. This map was later used to better understand the composition of the landscape and the succession and disturbance dynamics in the juniper/sagebrush ecosystem.

Succession in a juniper community
Grassland after fire; Mountain big sagebrush steppe; Stand initiation juniper; Open young juniper; Young multistory juniper; Old multistory juniper
Six successional stages of the juniper woodlands were mapped. A few ancillary vegetation types such as aspen woodlands, riparian, and mountain shrub were also mapped to create a wall-to-wall map.
1. Grassland, which occurs after a fire.
2. Mountain big sagebrush steppe, which follows the grassland stage successionally.
3. Stand initiation juniper. This stage follows about 30-40 years after the fire.
4. Open young juniper woodland. Individual trees are 50-80 years in age.
5. Young multi-story woodland.
6. Old multi-story woodland, composed of individuals over 200 years in age, many of them much older.

From air photo to GIS vegetation layer
Stand initiation woodland; Old multi-story woodland
The successional stages were delineated on aerial photographs and transferred to a GIS. Today vegetation mapping is more commonly done using satellite imagery. Training data from ground assessments are commonly used to train the photo interpreter in an aerial photo interpretation, or to train the computer in a computerized image classification.

Map with GPS ground reference points
Smith Creek
The thematic accuracy of the map can be assessed by comparing the map to reference data collected using GPS. During the reference data collection the vegetation is assessed in the field and classified into the same categories as the map. It is recommended that approximately 30 ground reference points be collected for each vegetation category. The ground reference points are often stratified, for example by elevation, and should be placed far enough apart to be considered spatially independent. This map shows a subset of the entire study area, and there are therefore fewer than 30 points in each vegetation category. In reality the number of sampling locations is often dictated by the amount of time and money available for the accuracy assessment. It is important to use the resources wisely and gather the best reference data possible.
Ground reference plots should be larger than the minimum mapping unit and have a homogeneous vegetation cover representative of the class. The reference plots should also be larger than the estimated GPS error.
The accuracy of a thematic map varies with its thematic precision. A map that only differentiates between forest and non-forest will likely have a very high accuracy (> 90%), while a map that classifies the vegetation into many classes will likely have a lower accuracy. Rather than using GPS ground data as a reference, aerial photographs are often used as the reference data in accuracy assessments of maps with a larger pixel size, for example Landsat (30 m) or MODIS (250 m).
Further recommended reading on field methods in remote sensing and mapping: Roger M. McCoy, Field Methods in Remote Sensing.

Confusion matrix
Rows: map classification from Landsat TM imagery. Columns: ground classification from GPS points.
Map class                         Low  Mtn big    Stand     Open    Young      Old Juniper-      Sum  Commis.    Error   User's
                                 sage     sage initiat.    young  m-story m-strata mahogany              error      (%) acc. (%)
Low sagebrush                       3        0        0        0        0        0        0        3        0      0.0    100.0
Mountain big sagebrush              0       12        2        0        0        0        0       14        2     14.3     85.7
Stand initiation woodland           4        3        4        0        0        0        0       11        7     63.6     36.4
Open young woodland                 0        3        4        6        0        0        0       13        7     53.8     46.2
Young multi-story woodland          1        1        0        6       26        8        4       46       20     43.5     56.5
Old multi-strata woodland           0        2        0        1        1       27        3       34        7     20.6     79.4
Juniper-mountain mahogany           0        1        0        1        3        3       24       32        8     25.0     75.0
Sum                                 8       22       10       14       30       38       31      153
Omission error                      5       10        6        8        4       11        7
Error (%)                        62.5     45.5     60.0     57.1     13.3     28.9     22.6
Producer's accuracy (%)          37.5     54.5     40.0     42.9     86.7     71.1     77.4

                      Landsat TM   Aerial photo
Total points:                153            150
Accurate points:             102            109
Percent accuracy:           66.7           72.7
Kappa statistic:            59.2           67.2

The accuracy assessment is commonly presented in a confusion matrix (also called an error matrix or contingency table). In this matrix the map classification is represented in the rows, while the reference data are represented in the columns. For example (see column 1), there were three instances where an area was mapped as Low sagebrush and the ground reference was also assessed as Low sagebrush. Continue down column 1: there were 4 polygons that were mapped as Stand initiation woodland but were assessed to be Low sagebrush in the reference data (circled in blue). The red circles on the diagonal show all data that were correctly assessed. The total number of correct points in this matrix was 102. The total number of points was 153, yielding an overall accuracy of 102/153 = 66.7%. Error of omission is the percent of the samples that should have been put into a given class but were not. Error of commission is the percent of the samples that were placed in a given class while they actually belong to another.
User's accuracy represents the probability that a given polygon (or pixel) will appear on the ground as it is classed, while producer's accuracy represents the percentage of a class that is correctly classified on the map. Further recommended reading: http://www.ncgia.ucsb.edu/publications/tech_reports/91/91-23.pdf
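
The accuracy measures described above can be recomputed from the Landsat TM matrix with a short script. The sketch below is illustrative: the function and variable names are chosen here, and the kappa statistic is computed with the standard Cohen's kappa formula, which is assumed (not stated in the notes) to be the one behind the 59.2 figure.

```python
CLASSES = [
    "Low sagebrush", "Mountain big sagebrush", "Stand initiation woodland",
    "Open young woodland", "Young multi-story woodland",
    "Old multi-strata woodland", "Juniper-mountain mahogany",
]

# Rows: map classification (Landsat TM). Columns: ground reference (GPS points).
MATRIX = [
    [3, 0, 0, 0, 0, 0, 0],
    [0, 12, 2, 0, 0, 0, 0],
    [4, 3, 4, 0, 0, 0, 0],
    [0, 3, 4, 6, 0, 0, 0],
    [1, 1, 0, 6, 26, 8, 4],
    [0, 2, 0, 1, 1, 27, 3],
    [0, 1, 0, 1, 3, 3, 24],
]


def accuracy_report(matrix):
    """Overall, user's and producer's accuracy plus Cohen's kappa for a square matrix."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]                      # points mapped as each class
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]  # reference points per class
    total = sum(row_sums)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    users = [matrix[i][i] / row_sums[i] for i in range(n)]       # 1 - commission error
    producers = [matrix[i][i] / col_sums[i] for i in range(n)]   # 1 - omission error
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"total": total, "correct": correct, "overall": overall,
            "users": users, "producers": producers, "kappa": kappa}


report = accuracy_report(MATRIX)
print(f"Overall accuracy: {report['overall']:.1%}  Kappa: {report['kappa'] * 100:.1f}")
for name, user, producer in zip(CLASSES, report["users"], report["producers"]):
    print(f"{name:28s} user's {user:6.1%}  producer's {producer:6.1%}")
```

Keeping the matrix in map-by-reference order matters: transposing it would swap user's and producer's accuracy.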

Biases in classification accuracy
Conservative biases (lower the estimated accuracy):
1. Incorrect location due to GPS errors
2. Errors in reference data assessment
3. Ground control points taken in an area smaller than the minimum mapping unit of the map
4. Changes in vegetation between map creation and collection of ground control data
Optimistic biases (lead to an overestimate of accuracy):
1. Use of training data in the accuracy assessment
2. Sampling of reference data not independent of training data
3. Reference data collected only in large homogeneous areas
There are many sources of bias in estimates of classification accuracy. Verbyla (http://nrm.salrm.uaf.edu/~dverbyla/online/errormatrix.html) divides the biases into conservative and optimistic biases. Conservative biases tend to lower the estimated accuracy. Such biases are GPS errors, errors in reference data assessment, reference locations in areas smaller than the minimum mapping unit, and changes in the vegetation between map creation and collection of ground control data. Optimistic biases lead to overestimates of the map accuracy and occur when training data and accuracy data are related (not independent). Verbyla also states that an overestimate of map accuracy occurs when reference locations are collected only in large homogeneous areas. Training data are ground data that are used to create the vegetation classification.
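
One simple guard against the first optimistic bias is to keep the points used for training completely separate from the points used for the accuracy assessment. The sketch below shows a random, non-overlapping split; the function name, the 50/50 fraction, and the point identifiers are assumptions for illustration. A random split does not by itself guarantee spatial independence between the two sets.

```python
import random


def split_reference_points(point_ids, assessment_fraction=0.5, seed=0):
    """Randomly split ground points into disjoint training and assessment sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_assess = round(len(shuffled) * assessment_fraction)
    assessment = shuffled[:n_assess]
    training = shuffled[n_assess:]
    return training, assessment


# Hypothetical GPS point identifiers.
all_points = [f"gps_{i:03d}" for i in range(1, 61)]
training, assessment = split_reference_points(all_points)

print(len(training), "training points,", len(assessment), "assessment points")
print("overlap:", set(training) & set(assessment))
```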