Possible links between a sample of VHR images and LUCAS

EUROPEAN COMMISSION
EUROSTAT
Directorate E: Sectoral and regional statistics
Unit E-1: Farms, agro-environment and rural development

CPSA/LCU/08
Original: EN (available in EN)

WORKING PARTY "LAND COVER/USE STATISTICS" OF THE STANDING COMMITTEE FOR AGRICULTURAL STATISTICS
Meeting on 20 October 2009, 9:30 a.m. in Luxembourg, BECH Building, Room: Quetelet

Point 3.3. of the agenda
Possible links between a sample of VHR images and LUCAS
Presented by J. Gallego (JRC)

The potential use of a sample of Very High Resolution satellite images for the estimation of land cover area change in the EU

F.J. Gallego
JRC-IPSC. MARS unit

SUMMARY

This document reports the results of simulations performed to assess the feasibility and potential cost-efficiency of a sample of very high resolution (VHR) images for two related but distinct problems in the European Union (EU): land cover area estimation and land cover area change estimation. The results suggest that there are few chances to reach cost-efficiency for land cover area estimation, but that there is good potential for land cover area change estimation by combining VHR images with ground observations from LUCAS or other national area frame surveys. The simulations have been performed using CORINE Land Cover 2000 (CLC2000) and the layer of changes CLC90-CLC2000. The impact of the geographic smoothing of CLC, compared with reality, needs to be studied in more detail, but is unlikely to modify the conclusions of the simulations.

1 DATA AND STUDY AREA

CLC2000 has been produced by photo-interpretation, with common rules, of Image2000, a coverage of Landsat ETM+ images (multispectral + panchromatic) resampled to 12.5 m resolution (JRC-EEA, 2005). The nomenclature of CLC2000 has 44 classes. The minimum mapping unit is 25 ha; smaller units are included in the dominant surrounding land cover type or grouped in an area coded as heterogeneous. The class heterogeneous is important because of the relatively coarse scale of CLC. The nominal location accuracy is 100 m, but the accuracy actually reached is much better; in fact the location accuracy of Image2000 is generally under 20 m (Gallego, 2005). A raster layer has been produced to facilitate certain operations of spatial analysis. The results presented in this paper were obtained using a raster version of CLC2000 with a resolution of 100 m.

The study area is the set of countries for which the change layer CLC2000-CLC90 is available (Figure 1).

Figure 1: Study area (countries for which CLC change is available).

2 SAMPLING 10X10 KM SITES FOR LAND COVER AREA ESTIMATION

Using CLC2000 as pseudo-truth in the CLC change area, we have simulated the potential sampling error of estimates from simple random samples (srs) of 350 sites of 10x10 km. Since we are using a pseudo-truth for which we know the whole population, the variance of the mean can be computed exactly without simulations:

$$V(\bar{y}) = \frac{V(Y)}{n}$$

where $V(Y)$ is the population variance. We have compared the coefficients of variation of srs with n=350 sample sites with the estimated coefficients of variation of LUCAS 2006 (Table 1).

Table 1: Sampling errors of LUCAS and expected sampling errors of a sample of VHR images. Coefficient of Variation (CV), in %.

                       Sample of 350 images   LUCAS 2006
                       10 x 10 km             (11 countries)
artificial             9.3                    1.1
arable                 5.2                    0.4
perm crops             16.6                   0.83
pastures               7.3                    0.46
heterogeneous          5.7                    n.a.
total agriculture      3.6                    0.33
forest and woodland    4.9                    0.44
bare                   25.5                   1.3
other vegetation       13.6                   2.1
glaciers               113.9                  90
water                  21.1                   1.8

The sampling errors of the first column can be improved with stratification and co-variables (Euroland, CLC), but it is not realistic to reach coefficients of variation similar to those of LUCAS. For example, in agricultural surveys on an area frame a relative efficiency of 2 (dividing the CV by $\sqrt{2}$) is considered a good result (Taylor et al., 1997; Carfagna and Gallego, 2005), and only exceptionally can the CV be divided by 3 (relative efficiency = 9).

The non-sampling errors are mainly linked with identification mistakes (commission and omission errors); assessing them requires further information from confusion matrices (using the validation survey for LUCAS and crossing LUCAS data with classified images).

2.1 Equivalent number of points

The previous paragraph does not take the cost issue into account. One way to compare costs is to compute what we can call the equivalent number of points of a site.
The concept can be defined in the following way. Let us consider a single-stage sampling plan $\Psi$ of $n$ elementary units (points in this case); let $\Phi$ be a sampling plan of $m$ clusters, and let $s^2_{c,\Psi}$ and $s^2_{c,\Phi}$ be the estimated variances for land cover class $c$ with both sampling plans. $s^2_{c,\Psi}$ is computed with the standard binomial formula

$$s^2_{c,\Psi} = \frac{p\,(1-p)}{n}$$

and $s^2_{c,\Phi}$ is computed from the set of 10x10 km tiles of CLC. The equivalent number of points of a cluster can be defined as

$$\xi_c = \frac{n\, s^2_{c,\Psi}}{m\, s^2_{c,\Phi}}$$

This means that, for land cover $c$, a sample of $m$ sites gives the same sampling error as a sample of $m\,\xi_c$ non-clustered points.

Table 2: Estimated equivalent number of points of a site of 10x10 km for some land cover types

                       % area   CV 350 points   CV 350 sites   equivalent number
                                (%)             10 km (%)      of points/site
artificial             4.65     24.2            9.3            6.8
arable                 28.64    8.4             5.2            2.6
perm crops             2.92     30.8            16.6           3.5
pastures               12.31    14.3            7.3            3.8
heterogeneous          11.97    14.5            5.7            6.4
total agriculture      43.53    6.1             3.6            2.8
forest and woodland    26.64    8.9             4.9            3.2
bare                   1.29     46.8            25.5           3.4
other vegetation       3.46     28.2            13.6           4.3
glaciers               0.04     263             114            5.3
water                  1.42     44.6            21.1           4.5

Table 2 shows that the equivalent number of points of sites of 10 x 10 km, using CLC2000 as pseudo-truth, ranges roughly between 3 and 7. This number is probably underestimated because the spatial auto-correlation of CLC2000 is overestimated due to the smoothing effect. We can conjecture that the equivalent number of points may reach an indicative value of 10-20. Taking into account that the average cost of ground survey per point in LUCAS was EUR 23 (Eurostat, 2007), the cost per site, including image purchase and analysis, should be less than EUR 250-500, depending on which land cover classes are given priority. This level of cost is not realistic at the moment, and therefore the target of land cover area estimation should be excluded with the current cost structure in the EU.

3 LAND COVER AREA CHANGE ESTIMATION

The conclusions of the previous section change if we consider the CLC land cover change as pseudo-truth. For this purpose changes have been regrouped as described in Table 3.

Table 3: Groups of land cover change categories (rows: CLC 1990; columns: CLC 2000)

CLC 1990 \ CLC 2000   artificial   crops   pasture   heterog   forest & wood   natural
artificial            0            4       4         4         4               4
crops                 1            0       4         3         3               3
pasture               1            4       0         3         3               3
heterog               1            2       2         0         4               4
forest & wood         1            2       2         4         0               5
natural               1            2       2         4         4               0

0 No change
1 New artificial
2 Agricultural expansion
3 Agricultural abandonment
4 Other changes

3.1 Equivalent number of points

The lower spatial autocorrelation for land cover change leads to a much higher number of equivalent points (Table 4), often around 100 for sites of 30x30 km. This means that there may be room for cost-efficiency of remote sensing if the cost per site can be kept below EUR 2000 or 3000.

Table 4: Estimated equivalent number of points of a site for main land cover change types

                                        new          new           agricultural   other
                                        artificial   agriculture   abandonment    changes
% area                                  0.27         0.15          0.21           1.64
CV (%), points, n=200                   136.7        183.5         154.6          54.8
CV (%), n=200 sites of 10 km            21.8         34.7          25.2           15.6
Number of points for the same CV        7915         5658          7600           2508
Equivalent number of points per site    39.6         28.3          38.0           12.5
CV (%), n=200 sites of 30 km            14.5         19.8          14.2           11.6
Number of points for the same CV        18528        18252         25222          4703
Equivalent number of points per site    92.6         91.3          126.1          23.5

3.2 Comparing identification errors

The above considerations on sampling errors from ground surveys per point and from remote sensing surveys per site implicitly assume that the non-sampling errors (mainly identification mistakes) are similar. For land cover change we can reasonably expect that the identification errors due to mislocation are smaller on satellite images because of the better overview of the context, although this needs to be confirmed.

4 THE CLC CHANGE LAYER AS PSEUDO-TRUTH

We have also implicitly assumed that the spatial correlation structure of CLC change is close to the spatial correlation structure of the real changes. In fact, a visual inspection of a map of the abundance of changes raises some doubts. We have aggregated the changes reported in CLC change with two criteria: thematically, with the grouping described in Table 3, and geographically, with a grid of 10x10 km. This leads to the maps reported in Figure 2 to Figure 5.
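The geographic aggregation described above can be illustrated with a short sketch. It assumes a binary raster stored as a list of rows (1 where a pixel belongs to the change group of interest, 0 elsewhere) and a hypothetical function name; with 100 m pixels, blocks of 100 x 100 pixels correspond to cells of 10x10 km. Handling of sea and no-data pixels is deliberately left out.

```python
def change_rate_per_cell(change, block=100):
    """Share of changed pixels in each block x block cell of a 0/1 raster.

    change: list of equal-length rows of 0/1 values.
    block:  cell side in pixels (100 pixels of 100 m = 10 km).
    Cells on the right and bottom edges may be smaller than block x block.
    """
    n_rows, n_cols = len(change), len(change[0])
    rates = []
    for r0 in range(0, n_rows, block):
        row_rates = []
        for c0 in range(0, n_cols, block):
            pixels = [
                change[r][c]
                for r in range(r0, min(r0 + block, n_rows))
                for c in range(c0, min(c0 + block, n_cols))
            ]
            row_rates.append(sum(pixels) / len(pixels))
        rates.append(row_rates)
    return rates


# Toy example: a 4 x 4 raster aggregated into 2 x 2 pixel cells
toy = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
rates = change_rate_per_cell(toy, block=2)
print(rates)  # [[0.25, 0.0], [1.0, 0.25]]
```

The per-cell rates obtained this way are the quantities mapped per cell of 10x10 km, and their variance across cells is what drives the sampling error of a sample of sites.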

Figure 2: Rate of artificialisation in CLC change per cell of 10 x 10 km.

Figure 3: Rate of change to agriculture in CLC change per cell of 10 x 10 km.

Figure 4: Rate of agricultural abandonment in CLC change per cell of 10 x 10 km.

Figure 5: Rate of other changes in CLC change per cell of 10 x 10 km.

5 SIZE OF SAMPLING SITES

We have tested the impact of larger sites (this would correspond to SPOT or RapidEye images) on the potential sampling error. The optimal size of the site strongly depends on the cost function, i.e. the cost per site as a function of its size. In a first analysis we have tested cost functions of the type:

$$C = \alpha + \beta n + n \gamma s = \alpha + n(\beta + \gamma s)$$

where $n$ is the number of sites and $s$ the size of a site. We try three different assumptions for the cost per unit $\beta + \gamma s$, assuming $\beta = 1$ and $s$ expressed in number of cells of 10x10 km. Simple random samples on CLC change (as pseudo-truth) have been used at this stage. The coefficients of variation reported correspond to samples of 350 sites of 10x10 km and to the number of sites of other dimensions that has the same cost.

5.1 Comparison with two types of cost functions

A parameter γ=0.5 corresponds to an intensive manual input (labour-intensive photo-interpretation). The cost of a site of 50x50 km is 9 times the cost of a site of 10x10 km.

Table 5: Estimated coefficients of variation (%) assuming a labour-intensive image interpretation.

                            Size of sites
LC change                   10 km   20 km   30 km   40 km   50 km   60 km
New artificial              16.4    17.9    21.3    24.3    28.5    33.8
New agriculture             26.1    25.4    28.8    32.8    38.3    45.4
Agricultural abandonment    19.0    18.5    20.6    23.5    26.1    31.0
Other changes               11.7    13.8    16.9    20.3    23.6    28.0

A value γ=0.01 corresponds to a mainly automatic processing. The cost of a site of 50x50 km is only about 25% higher than the cost of a site of 10x10 km.

Table 6: Estimated coefficients of variation (%) assuming a mainly automatic image processing.

                            Size of sites
LC change                   10 km   20 km   30 km   40 km   50 km   60 km
New artificial              16.4    12.7    11.2    10.0    9.6     9.6
New agriculture             26.1    18.0    15.1    13.5    12.9    12.8
Agricultural abandonment    19.0    13.2    10.9    9.7     8.8     8.8
Other changes               11.7    9.8     8.9     8.4     7.9     7.9

Assuming a highly automated approach is more realistic with the set up of SA. However, the cost function is difficult to determine because of the links with pricing policies for different image types.
For example the cost of a 30x30 or 50x50 km site will be very different if we assume the acquisition of SPOT images, IRS LISS-IV or RapidEye images.
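The equal-cost comparison of this section can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, β = 1, the site size s is counted in cells of 10x10 km, and the fixed cost α is left out because it does not affect the number of sites that the same variable budget can buy.

```python
def cost_per_site(s, beta=1.0, gamma=0.5):
    """Variable cost of one site of size s (in cells of 10 x 10 km): beta + gamma * s."""
    return beta + gamma * s


def sites_at_equal_cost(side_km, gamma, n_ref=350, ref_side_km=10, beta=1.0):
    """Number of sites of side_km that cost as much as n_ref sites of ref_side_km."""
    s_ref = (ref_side_km / 10) ** 2
    s = (side_km / 10) ** 2
    budget = n_ref * cost_per_site(s_ref, beta, gamma)  # alpha is ignored here
    return budget / cost_per_site(s, beta, gamma)


for gamma in (0.5, 0.01):
    ratio = cost_per_site(25, gamma=gamma) / cost_per_site(1, gamma=gamma)
    n50 = sites_at_equal_cost(50, gamma)
    print(f"gamma={gamma}: a 50 km site costs {ratio:.2f} x a 10 km site; "
          f"{n50:.0f} sites of 50 km cost as much as 350 sites of 10 km")
```

Under these assumptions a 50x50 km site costs 9 times a 10x10 km site for γ=0.5, so the budget of 350 small sites buys only about 39 large ones; for γ=0.01 the ratio is about 1.24 and the same budget buys about 283 large sites. This is why larger sites lose precision under the labour-intensive assumption and gain precision under the mainly automatic one.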

6 CONCLUSION: POSSIBLE LINKS BETWEEN A SAMPLE OF VHR IMAGES AND LUCAS

6.1 For land cover area estimation

There seems to be little room for a cost-efficient use of a sample of VHR images for land cover area estimation. In the current situation, a sample of VHR images is more expensive and less reliable than data acquired on the ground. Exceptions appear when access to sampled points or areas is problematic: mountain areas, very large forests with few roads, large private properties for which entrance permits are difficult to obtain, military areas, etc. The definition of a stratum "areas difficult to access" needs to be studied in more depth on the basis of the observation mode per point in LUCAS 2006 and 2009. We should also assess the possible cost of a complementary survey on a sample of VHR images to cover non-accessible areas.

6.2 For land cover change area estimation

Consistent estimates of LULC change matrices in the EU are not available at the moment. CLC change gives a valuable idea of the location of the main changes, but its direct use to derive change matrices gives heavily biased estimates. LUCAS attempted to produce more consistent estimates between 2001 and 2003, but the results were unrealistic because of a wrongly designed ground survey scheme: surveyors in 2003 did not have the information on the 2001 observation, and co-location inaccuracy resulted in a large number of spurious changes. LUCAS 2009 seems to have been designed to avoid past mistakes, but its suitability remains to be checked. A combined use of ground observations (LUCAS 2006-2009) and photo-interpretation might be a more suitable solution, but further work is still needed to define the precise procedure.

REFERENCES

Carfagna E., Gallego F.J. (2005) Using remote sensing for agricultural statistics. International Statistical Review, 73(3), 389-404.

Eurostat (2007) LUCAS 2006 Quality Report. Standing Committee for Agricultural Statistics, 22-23 November 2007. Document ESTAT/CPSA/5a, Luxembourg.

Gallego F.J. (2005) Stratified sampling of satellite images with a systematic grid of points. ISPRS Journal of Photogrammetry and Remote Sensing, 59, 369-376.

JRC-EEA (2005) CORINE Land Cover updating for the year 2000: Image2000 and CLC2000; Products and methods. Ed. Vanda Lima, Report EUR 21757 EN. JRC Ispra.

Taylor J., Sannier C., Delincé J., Gallego F.J. (1997) Regional Crop Inventories in Europe Assisted by Remote Sensing: 1988-1993. Synthesis Report. EUR 17319 EN, JRC Ispra, 71 pp.