Technical specifications for implementation of a new land-monitoring concept based on EAGLE


EEA/IDM/R0/17/003
Technical specifications for implementation of a new land-monitoring concept based on EAGLE
D4: Draft design concept and CLC-Backbone and CLC-Core technical specifications, including requirements review

Version history

Date: /10/2017
Author: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Draft for review by NRC LC
Distribution: EEA, NRC LC

Date: /11/2017
Author: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Updated draft integrating the comments following the NRC LC workshop
Distribution: For distribution at Copernicus User Day

Date: /01/2018
Author: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Updated draft integrating the comments following the Brussels COPERNICUS Land monitoring workshop
Distribution: For EEA distribution / consultations

Table of Contents

1 Context
Introduction
Background
Concept
Role of industry
Role of Member States
Engagement with stakeholder community
Structure of this document
Requirements analysis
CLC-Backbone - the industry call for tender
Introduction
Description of CLC-Backbone
Level 1 Objects - hard bones
Level 2 Objects - soft bones
Technical specifications
Spatial scale / Minimum Mapping Unit
Delineation accuracy of geometric units
Reference year
EO data
Thematic attribution
Thematic classes
European ancillary data sets for Level 1 hard bone creation
Criteria for ancillary data input
Ancillary input data
Remark: land parcel identification system (LPIS)
Short-form of CLC-Backbone end product technical specifications
CLC-Core - the grid approach
Background
Populating the database
Data modelling in CLC-Core
Database implementation approach for CLC-Core
Database concepts revisited: from Relational to NoSQL and Triple Stores
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach
Processing and Publishing of CLC-Core Products
Conclusion and critical remarks
Short-form of CLC-Core end product technical specifications
CLC+ - the long-term vision
CLC-Legacy
Experiences to be considered
Annex 1: CLC nomenclature
Annex 2: CLC-Backbone data model
Integration of temporal dimension into LISA
Integration of multi-temporal observations into LISA
Annex 3: OSM data
OSM road nomenclature
OSM roads: completeness
Annex 4: LPIS data in Europe

List of Figures

Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components
Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved European land monitoring (2nd generation CLC)
Figure 2-2: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC)
Figure 2-3: A scale versus format schematic for the current and proposed CLMS products
Figure 4-1: Illustration of a merger of current local component layers covering (with overlaps) 26.3% of EEA39 territory (legend: red: UA, yellow: N2K, blue: RZ LC)
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques, (3) the pixel-based classification of EAGLE land cover components and (4) the attribution of Level 2 objects based on this classification and additional Sentinel-1 information
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania (near Parta) and Serbia (near Indija) (top to bottom). Bing/Google and EOX Sentinel-2 cloud free services as background (left to right)
Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: Jan. 2018)
Figure 4-5: Distribution of number of downloadable datasets for INSPIRE transport service across member countries (Status: Jan. 2018)
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). daa is a Norwegian unit: 10 daa = 1 ha
Figure 5-2: Representation of real world data in the CLC-Core
Figure 5-3: Example of an RDF triple (subject - predicate - object)
Figure 5-4: GeoSPARQL query for Airports near the City of London
Figure 5-5: Result of the GeoSPARQL query of Figure 5-4 as map and in the JSON format
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture
Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to Sparsely vegetated areas in Norway and Moors and heathland in Sweden. Both classifications are correct
Figure 7-2: Result of changing the CLC implementation method in Germany
Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data
Figure 7-4: Generalization levels used in LISA generalization in Austria
Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC (top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The generalized CLC is more fragmented than CLC2012, even if the 25 ha MMU is kept. The high-resolution CLC illustrates well the potential lying in CLC
Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover extended application schema)
Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component
Figure 9-3: The new CLC-Basis model (based on LISA)
Figure 9-4: Illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (example taken from project Cadaster Env Austria, Geoville 2017)
Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model
Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit
Figure 11-2: Overview of available GIS datasets LPIS and their thematic content (Synergise, project LandSense)

List of Tables

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC based on initial discussions with EEA
Table 2-2: Summary (matrix) of potential roles associated with each element / product
Table 4-2: Main characteristics of Level 1 objects (hard bones) and Level 2 objects (soft bones)
Table 4-1: Overview of existing relevant products which were analysed as potential input to support the construction of the geometric structure (Level 1 hard bones) of the CLC-Backbone
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data
Table 6-1: List of current CLC classes and requirements for external information


1 CONTEXT

The European Environment Agency (EEA) and the European Commission DG Internal Market, Industry, Entrepreneurship and SMEs (DG GROW) have decided to develop a conceptual strategy and associated technical specifications for a new series of products within the Copernicus Land Monitoring Service (CLMS) portfolio (Figure 1-1), which should meet the current and future requirements for European Land Use Land Cover (LULC) monitoring and reporting obligations. This collection of products is nominally called the "2nd generation CORINE Land Cover (CLC)".

Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components.

After a call for tender in 2017, the EEA tasked the European Environment Information and Observation Network (EIONET) Action Group on Land monitoring in Europe (EAGLE Group) with developing an initial response to the needs listed in the following chapters, in the form of a conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC follows a sequence of stages, in which individual elements or products of the overall concept can be developed relatively independently and at different rates, to allow time for broad consultation with stakeholders. This also allows the continuous inclusion of Member State (MS) input, the exploitation of industrial production capacity, and the necessary feedback, lessons learnt and refinement to reach the ultimate goal of a coherent and harmonized European Land Monitoring Framework.

The first stage of this development process was to engage with the EEA and DG GROW to refine their requirements and build on the ideas presented by the EAGLE Group in the proposal.
These first developments were undertaken within a set of constraints outlined by EEA and DG GROW for the initial product to be delivered within the 2nd generation CLC:

- Industrial production by service providers,
- Outcome product in vector format,
- Highly automated production process,
- Short timeframe of production phase,
- Driven by Earth Observation (Sentinel-2),
- Complete the picture started by the Local Component¹ products (EEA-39).

These developments resulted in the first distributed deliverable (D2), which outlined the conceptual strategy and proposed a draft technical specification for the first product to be developed (CLC-Backbone, see later). This stage also involved a presentation of the concepts in D2 to the EIONET National Reference Centres (NRCs) Land Cover in Copenhagen in October 2017. The NRCs' feedback, collected during the NRC meeting and afterwards, was carefully taken into consideration for the continuation of the development process into the next stage.

The conceptual strategy and technical specification were both refined and extended for the production of the next distributed deliverable (D3). The concepts and specification for the 2nd generation of CLC contained in D3 were presented to the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017². This meeting generated lively debate and a set of survey results from the participants.

This document is the penultimate deliverable (D4) of the project and represents the outcomes of the latest stage of development of the 2nd generation of CLC. It contains a further expansion of the conceptual strategy and extended technical specifications for the proposed products. Its aim is to continue to communicate these developments to the stakeholders involved in the field of European land monitoring and to elicit their feedback. The EEA will present this document at a number of high-level meetings and to stakeholders. A final revised version of this document (D5) will be produced and made available in early March 2018.
1 The term component has a twofold function: 1) as the Copernicus local component products, 2) as a term for Land Cover Component as an element within the EAGLE data model. In this case the former.
2 Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+, 16/11/2017 Report. Framework Contract 385/PP/2014/FC Lot 2 (Copernicus User Uptake Framework Contract). Prepared for: European Commission - DG GROW


2 INTRODUCTION

This chapter provides the background to the concept, an outline of the proposed approach, an overview of the elements of the concept and the current status of the developments.

2.1 Background

Monitoring of Land Use and Land Cover (LULC) is among the most fundamental environmental survey efforts required to support policy development and effective environmental management³. Information on LULC plays a key role in a large number of European environmental directives and regulations. Many current environmental issues are directly related to the land surface, such as habitats, biodiversity, phenology and distribution of plant species, ecosystem services, as well as other issues relevant to population growth and climate change. Human activities and behaviour in space (living, working, production, education, supplying, recreation, mobility & communication, socializing) have significant impacts on the environment through settlements, transportation and industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism.

The land surface, whose negative changes of state can be reversed only with huge effort, if at all, is therefore a crucial ecological factor, an essential economic resource, and a key societal determinant for all spatially relevant basic functions of human existence and, not least, nations' sense of identity. Land thus plays a central role in all three factors of sustainable development: ecology, economy, and society.

LULC products so far tend to be produced independently of each other at the global, European, national and sub-national levels, as well as by different disciplines and sectors, each focusing on a fairly similar end product but with different emphases on thematic content and spatial detail. This independent and uncoordinated production is explained and justified by the large variation in objectives and requirements of these products, but it leads to reduced interoperability and sometimes also to duplication of work and, thereby, inefficient use of resources.

At the European level, CORINE Land Cover (CLC) is the flagship programme for long-term land monitoring. It is now part of the Copernicus Land Monitoring Service (CLMS). CLC has been produced for the reference years 1990, 2000, 2006 and 2012, with 2018 under preparation and expected to be available by late 2018. CLC is produced by the MS with technical coordination by the EEA and guided by well-established specifications and methodological guidelines. The CLC specification aims to provide consistent localized geographical information on LULC using 44 classes at level 3 of the hierarchical nomenclature (see Annex 1). The vector databases have a minimum mapping unit (MMU) of 25 ha and a minimum mapping width (MMW) of 100 m, with a single thematic class attribute per land parcel. At the European level, the database is also made available as 100 x 100 m and 250 x 250 m raster products, which have been aggregated from the original vector data at 1:100 000 scale. CLC also includes a directly mapped change layer (i.e. not created by intersecting CLC status layers), which records changes between two of the 44 thematic classes between two dates with an MMU of 5 ha.

Although CLC has become well established and has been successfully used, mainly at the pan-European level, there are a number of deficiencies and limitations that restrict its wider exploitation, particularly at the MS level and below. This is partly because the MMU of CLC (25 ha) is too coarse to capture the fine details of the landscape at the local and regional scales, and partly because some MS have access to more detailed, precise and timely information from national programmes.
In consequence of the first point, features of smaller size that represent landscape diversity and complexity are not mapped, either because of geometric generalization or because they are absorbed by classes with broad definitions. Moreover, landscape dynamics that are highly relevant to locally decided but globally effective policy, such as small-scale forest rotation, changes in agricultural practices and urban in-filling, may be missed due to the low spatial resolution and / or limited thematic depth of the CLC class definitions. The successful use of CLC in combination with higher spatial resolution products to detect and document urban in-filling has given a clear indication of the required direction of development for European land monitoring.

To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide pan-European information on selected surface characteristics in a 20 m raster format, also available aggregated to 100 m raster cells. They provide information on basic surface properties such as imperviousness, tree cover density and grassland, and can be described as intermediate products. The LoCo products, in vector format, are based on very high spatial resolution EO data and ancillary information tailored to a specific landscape monitoring purpose (e.g. Urban Atlas), and provide more detailed thematic LULC information at polygon level with an MMU in the range of 0.25 to 1 ha. However, the LoCo products do not provide wall-to-wall coverage of the EEA-39 countries, even when combined. Their nomenclatures and geometries are not harmonised and have thus caused interoperability problems, which are now being addressed.

3 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.

2.2 Concept

Given the above issues and the known reporting obligations listed in the next chapter, a revised concept for European Land Monitoring is required which both provides improved spatial and thematic performance and builds on the existing heritage.
In addition, recent developments in the field of land monitoring (improved Earth Observation (EO) input data from, for example, the Sentinel programme, national bottom-up approaches, new processing methods, etc.) and in desktop and cloud-computing capacities offer opportunities to deliver these improvements more effectively and efficiently. The EEA, in conjunction with DG GROW, has identified this need and now aims to harmonise and integrate some of the CLMS activities by investigating the concept and technical specifications for a higher performance pan-European mapping product under the banner of "2nd generation CLC" (Figure 1-1).

EEA and DG GROW determined that such a 2nd generation CLC should build upon recent conceptual development and expertise, while still guaranteeing backwards compatibility with the conventional CLC datasets. The proposed approach should also be able to serve the needs arising from recent developments in European policies, such as reporting obligations on land use, land use change and forestry (LULUCF), plans for an upcoming Energy Union, or long-term climate mitigation objectives. After a successful establishment of the 2nd generation CLC there would then also be knock-on benefits for a broad range of other European policy requirements, land monitoring activities and reporting obligations.

Given these requirements, no single product or specification would address all needs, and the complexity of the solution requires a staged development and implementation. The proposed conceptual strategy therefore consists of a number of interlinked elements (Figure 2-1) which represent separate products and can therefore be delivered independently in discrete production phases. Each of the products has its own technical specification and production methodology and can be produced through its own funding / resourcing mechanisms.

The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is based on the elements shown in Figure 2-1. The reports associated with this work will expand incrementally, giving more details on the conceptual design and providing technical specifications of increasing elaboration for the individual products.

Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved European land monitoring (2nd generation CLC).

Although the four new elements / products of the conceptual strategy will be described in more detail later in this report and in the final report (D5), they can be summarised as follows to aid understanding of the conceptual design:

1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a geometric spatial structure, attributed with raster data, for landscape features with limited, but robust, EO-based land cover thematic detail on which to build other products.
2. CLC-Core is a consistent multi-use grid⁴ database repository for environmental land monitoring information, populated with a broad range of land cover, land use and ancillary data from the CLMS and other sources, forming the information engine to deliver and support tailored thematic information requirements.
3. CLC+ is the nominal end point or final product in the establishment of the 2nd generation CLC. It is a vector and raster product derived from the CLC-Core and CLC-Backbone and will be a LULC monitoring product with improved spatial and thematic performance, relative to the current CLC, for reporting and assessment.
4. The final element of the conceptual design, although not strictly a new product, is the ability to continue producing the existing CLC, which already has a well-established and agreed specification and may be referred to as CLC-Legacy in the future.

Table 2-1 provides the current overview of the main characteristics of the four elements, to allow the reader to compare the key characteristics of the products.

To manage the project efficiently and effectively, it has been necessary to stage the work of developing and specifying these new products. Throughout the documents delivered by this project, the staging of the work given in Figure 2-2 will be used as a key graphic to identify the product and stage being described.

4 A data structure whose grid cells are linked to a data model that can be populated with the information from the different sources.
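The grid idea in footnote 4 (cells linked to a data model that is populated from different sources) can be illustrated with a minimal Python sketch. It is not the CLC-Core data model: the class name `GridCell`, the cell identifier, the attribute names and all values are assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GridCell:
    """One cell of a hypothetical CLC-Core style grid (illustrative only)."""
    cell_id: str  # invented identifier, not an official grid code
    attributes: dict = field(default_factory=dict)

    def populate(self, source: str, name: str, value) -> None:
        """Store a value under the source it came from, keeping provenance."""
        self.attributes.setdefault(source, {})[name] = value

    def get(self, source: str, name: str, default=None):
        """Read back a value for a given source and attribute name."""
        return self.attributes.get(source, {}).get(name, default)


# Populate one cell from several (assumed) sources.
cell = GridCell(cell_id="E4321N3210")
cell.populate("HRL", "imperviousness_pct", 35)
cell.populate("HRL", "tree_cover_density_pct", 10)
cell.populate("CLC-Backbone", "land_cover", "sealed")
cell.populate("national", "land_use", "residential")

print(sorted(cell.attributes))                # sources that contributed
print(cell.get("HRL", "imperviousness_pct"))  # value from one source
```

Keeping values grouped by source is one possible design choice; it lets a later step decide which source to trust when several describe the same cell.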

Figure 2-2: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC).

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC based on initial discussions with EEA.

Description
- CLC-Backbone: Detailed wall to wall (EEA-39) geometric vector reference layer with basic thematic content and a raster attribution.
- CLC-Core: All-in-one data container for environmental land monitoring information according to EAGLE data model.
- CLC+: Thematically and geometrically detailed LULC product.
- CLC-Legacy: A more generalised LULC product consistent with the CLC specification.

Role / purpose
- CLC-Backbone: Support to CLMS products and services at the pan-European and local levels.
- CLC-Core: Thematic characterisation of CLMS products and services at the pan-European and local levels.
- CLC+: Support to EU and national reporting and policy requirements.
- CLC-Legacy: Maintain the time series (backwards compatibility) and support legacy systems.

Format
- CLC-Backbone: Raster and vector.
- CLC-Core: GRID database.
- CLC+: Raster and vector.
- CLC-Legacy: Raster and vector.

Thematic detail
- CLC-Backbone: <10 basic land cover classes, few spectral-temporal attributes.
- CLC-Core: Rich attribution of LU, LC and ancillary information.
- CLC+: High thematic detail including LC and LU with improvements compared to CLC.
- CLC-Legacy: CLC nomenclature with 44 classes, plus changes between them.

Geometric detail (MMU, grid size)
- CLC-Backbone: 1.0 ha.
- CLC-Core: 1.0 ha, 100 x 100 m grid.
- CLC+: 1.0 ha for status and changes.
- CLC-Legacy: 25 ha for status, 5 ha for changes (raster: 100 x 100 m).

Update cycle
- CLC-Backbone: 3-6 years.
- CLC-Core: Dynamic update as new information becomes available.
- CLC+: 3-6 years.
- CLC-Legacy: Standard 6 years.

Reference year: / 2024
Production year: TBD / TBD

Method
- CLC-Backbone: Geospatial data integration and image segmentation for geometric boundaries, and attribution / labelling by pixel-based EO derived land cover.
- CLC-Core: EAGLE-Grid approach: population and attribution of regular GRID.
- CLC+: Derived by SQL from CLC-Core.
- CLC-Legacy: Derived from CLC+ and CLC-Core.

Input data
- CLC-Backbone: Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products.
- CLC-Core: CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU), photointerpretation.
- CLC+: CLC-Core and CLC-Backbone.
- CLC-Legacy: CLC+ and / or CLC-Core.
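Table 2-1 states that CLC+ is to be "derived by SQL from CLC-Core". What such a derivation could look like in principle is sketched below using Python's built-in `sqlite3` module as a stand-in database. The table layout, column names, thresholds and class labels are all invented for illustration and are not part of any CLC specification.

```python
import sqlite3

# In-memory stand-in for a grid database; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE core_cells (
        cell_id TEXT PRIMARY KEY,
        imperviousness_pct REAL,  -- e.g. taken from an HRL
        tree_cover_pct REAL,      -- e.g. taken from an HRL
        land_use TEXT             -- e.g. from a national data provision
    )
    """
)
conn.executemany(
    "INSERT INTO core_cells VALUES (?, ?, ?, ?)",
    [
        ("c1", 80.0, 5.0, "residential"),
        ("c2", 2.0, 90.0, "forestry"),
        ("c3", 1.0, 10.0, "agriculture"),
    ],
)

# Illustrative rule set only: thresholds and labels are assumptions.
DERIVE_SQL = """
    SELECT cell_id,
           CASE
               WHEN imperviousness_pct >= 30 THEN 'artificial'
               WHEN tree_cover_pct >= 30 THEN 'woodland'
               ELSE 'other'
           END AS derived_class
    FROM core_cells
    ORDER BY cell_id
"""
derived = conn.execute(DERIVE_SQL).fetchall()
conn.close()

print(derived)
```

The point of the sketch is only that a thematic product can be expressed as a query over per-cell attributes, so different queries over the same repository can yield different tailored products.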

Figure 2-3 is a schematic description of the conceptual design showing the relationship of the new elements / products to existing CLMS products in terms of their format and level of spatial detail. In this representation:

1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.) is a polygon map with fewer geometric details.
2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.) are polygon maps with more geometric and thematic details.
3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.) are raster products for specific surface characteristics with high spatial detail.
4. The proposed CLC-Core is a grid product for which the level of detail has still to be decided.
5. The CLC-Backbone and CLC+ are both polygon maps, with different levels of thematic detail.

Figure 2-3: A scale versus format schematic for the current and proposed CLMS products.

In any monitoring system the requirement for updates is central to the implementation of the activity. This project focuses on the technical specifications, so the update cycle will only be addressed in terms of the suggested time between repeat productions. Details of the update process are more closely associated with the methodologies, which still need to be finalised by the service providers, national teams or the EEA in a later phase of development. However, particularly in relation to the CLC-Legacy, reference will be made to the issues associated with the update cycle and change mapping where appropriate.

Given the context, requirements and issues, the aim should be to find a means of delivering the concept and its proposed products with an efficient mixture of industrially produced material backed up by ancillary (e.g. land use) information from national programmes where these exist. It is important to propose a viable system with enough flexibility to adapt the later steps to issues of feasibility and practicality once the implementation of the earlier steps is underway. For instance, the outcomes of the CLC-Backbone production should be able to influence the thematic content of the CLC-Core and / or the technical specifications of the CLC+.
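The levels of spatial detail compared in this chapter (25 ha and 5 ha MMU for CLC, 1 ha for the proposed products, 0.25 to 1 ha for LoCo products, 100 m and 20 m raster cells) can be related to each other with simple arithmetic. The Python sketch below uses only figures quoted in this chapter; the helper name `cells_per_mmu` is an assumption introduced for illustration.

```python
HA_IN_M2 = 10_000  # 1 hectare = 10 000 square metres


def cells_per_mmu(mmu_ha: float, cell_size_m: float) -> float:
    """Number of square raster cells needed to cover one MMU."""
    return mmu_ha * HA_IN_M2 / (cell_size_m ** 2)


clc_status = cells_per_mmu(25, 100)   # CLC status MMU on the 100 m raster
clc_change = cells_per_mmu(5, 100)    # CLC change MMU on the 100 m raster
one_ha_100m = cells_per_mmu(1, 100)   # 1 ha MMU on a 100 m grid
one_ha_20m = cells_per_mmu(1, 20)     # 1 ha MMU on a 20 m HRL raster
loco_min_20m = cells_per_mmu(0.25, 20)  # smallest LoCo MMU on a 20 m raster

print(clc_status, clc_change, one_ha_100m, one_ha_20m, loco_min_20m)
```

Read this way, a 1 ha MMU corresponds to a single cell of a 100 x 100 m grid, while the 25 ha MMU of the conventional CLC spans 25 such cells.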

2.3 Role of industry

The development and production of CLC to this point has had only a limited role for industry, mainly focused on the production of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012), the validation of the CLC2012 products and, in some MS, the actual production. Conversely, the production of the non-CLC products within the CLMS has been dominated by industry through a series of service contracts to generate consistent products across Europe. Industry has the ability to produce operational solutions which exploit automation, can handle large data volumes and are scalable to European-wide coverage. It may also be easier to standardize an industrial (top-down) production process than a (bottom-up) process based on MS participation, in which case industrialized production would lead to more harmonized products across Europe. As the amount of available EO and geographic information data increases, highly efficient and effective mechanisms for production will be required for at least parts of the European land monitoring process, and economies of scale must be exploited. Industry therefore offers a number of capabilities which will be required at selected points within the production of the 2nd generation CLC.

DG GROW has expressed the wish that industry should have an initial role in the production of CLC-Backbone, which has therefore been designed in part to exploit industrial capabilities. Further opportunities for industrial involvement are shown in Table 2-2.

2.4 Role of Member States

The MS have always been intimately linked with the CLC, as its production has been the responsibility of the nominated authorities through EIONET, i.e. NRC Land Cover.
The MS have provided the production teams for the actual mapping, allowing the process to exploit local knowledge and familiarity with native landscape types, and providing access to national datasets, which may not otherwise have been possible. The bottom-up production approach has been the key to successful delivery of this important time series of LULC data. With respect to the non-CLC products within the CLMS, the MS involvement has so far been limited to verification and, in the case of the 2012 products, enhancement. Although the MS have provided valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have not yet been fully exploited within the CLMS. The EEA is looking into possibilities to further increase the role of MS in non-CLC products, for instance by introducing a new task on enrichment of these products. It is therefore of the utmost importance to ensure a balanced contribution to the CLC+ suite of products between industry and MS via Eionet NRC/LCs.

Table 2-2 shows the potential for a greater role for the MS across all of the new elements / products within the 2nd generation CLC. It is important that the MS experts have oversight of the specifications and an understanding of the opportunities to provide feedback on the monitoring process and products. This includes the opportunities to contribute with data, experience and local knowledge to the production and validation activities where appropriate.

The MS are currently involved in verification, but not in the validation of CLMS products. Verification is the evaluation of whether or not the product corresponds to the reality of the landscape at stake, by confronting the product with the regional / local expertise of the terrain, and hence whether it meets the expectation of the stakeholders. Validation is the independent geostatistical control on the accuracy with reference to the product specifications.
The intended use of CLMS products and services at the national level calls for verification against national user requirements not as an acceptance test but as a documentation of the usefulness, a promotional tool, and possibly also as assistance to improve the products for this intended use.

Table 2-2: Summary (matrix) of potential roles associated with each element / product.

Actor | CLC-Backbone | CLC-Core | CLC+ | CLC-Legacy
EEA | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user.
MS | Review of specification, provision of geometric data, verification, validation, user. | Review of specification, population with national datasets, land use information, user. | Review of specification, support to production, verification, validation, user. | Production, verification, validation, user.
Industry | Production. | Implementation and maintenance of DB infrastructure. | Support production, validation. | Validation.

2.5 Engagement with stakeholder community

The success of the project and the long-term development of land monitoring in Europe are intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant stakeholders are aware of this activity and contribute their requirements and opinions to this process. Revised versions of this document are foreseen at specific milestones towards a final version for delivery in early

Structure of this document

This specific deliverable, D4, is the third step towards the definition of the conceptual strategy and the potential technical specifications for a series of new CLMS products. It aims at communicating these details to the stakeholders involved in European land monitoring to elicit feedback, comments and questions. The work reported here begins with a review of requirements which goes beyond the remit of the call for tender. The four elements of the 2nd generation CLC within the CLMS and their potential products are described in further detail in the following chapters.
This version includes the first feedback from the NRC LC meeting in Copenhagen in October 2017, the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017 and dedicated written feedback from various MS. For CLC-Backbone, this document updates the draft versions of the technical specifications, but with less emphasis on the outline of the implementation methodology that was provided in D3. These details are still open for discussion and review by stakeholders and will ultimately be used as input for an open call for tenders to industrial service providers for a production to start in For CLC-Core, this document further develops the technical specification and proposes a number of options for the implementation. The technical design of CLC-Core will continue to be developed based on stakeholder feedback in future steps. The current thinking around CLC+ has been developed and is approaching a technical specification.

3 REQUIREMENTS ANALYSIS

In line with the principle of Copernicus, this analysis, in support of the development of new products within the CLMS, is driven by user needs rather than the current technical capabilities of EO sensors and processing systems. The technical issues will be dealt with in the chapters related to the individual products, focusing on their specification and, to a lesser extent, methodological development.

This analysis was initially based around the requirements set out in the original call for tender, but has been extended through inclusion of recent work by the EC and ETC to provide a broader range of stakeholder needs. As this project has developed, further requirements have been identified through meetings and stakeholder feedback to support the increasingly detailed specifications of a complete 2nd generation CLC.

The MMU and nomenclature of the current CLC have been operational for three decades. Backward compatibility is therefore an essential issue when a new monitoring framework is considered. Still, the specification for a future, improved and harmonised European land monitoring product is now open for revision. The proposed products should support current European policies through continuity of current monitoring activities, and also support new policies. Examples are the reporting obligations (on the European level) on land use, land use change and forestry (LULUCF), plans for an upcoming Energy Union, Greening of the CAP, Urban Planning, Biodiversity and long-term climate mitigation objectives.

A confounding element is that many of the policies that could exploit EO-based land monitoring information rarely give clear quantitative requirements that can provide a basis for spatial, temporal or thematic specifications. Also, despite the fact that land monitoring information is crucial in many domains, there has so far been no Land Framework Directive in Europe.
The higher-level strategic policies, such as the EU Energy Union, often refer to the monitoring linked to other activities such as REDD+ and LULUCF. Where actual quantitative analysis and assessment take place, they tend to rely on currently available products. For instance, LULUCF assessments in Europe (independently from national reporting obligations) use CLC (with a 25 ha MMU), while monitoring at the global scale uses Global Land Cover 2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU). Reference was made to future monitoring in Decision No 529/2013/EU of the European Parliament and of the Council on accounting rules on greenhouse gas emissions and removals resulting from activities relating to LULUCF, but this only refers to MS activities with MMUs between 0.05 ha and 1 ha. Many MS also employ sample-based field surveys (e.g. national forest inventories) to collect the required LULUCF data. Some of the initiatives associated with LULUCF have explored the potential of improved monitoring performance, such as the use of SPOT4 imagery with a 20 m spatial resolution to show land-use change between 2000 and 2010 for REDD+ reporting, but until very recently it has not been practical for these types of monitoring to become operational globally.

A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of biodiversity and improve the state of Europe's species, habitats, ecosystems and the services. It defines six major targets with the second target focusing on maintaining and enhancing ecosystem services, and restoring degraded ecosystems across the EU, in line with the global goal set in Within this target, Action 5 is directed at improving knowledge of ecosystems and their services in the EU.
Member States, with the assistance of the Commission, will map and assess the state of ecosystems and their services in their national territory, assess the economic value of such services, and promote the integration of these values into accounting and reporting systems.

The Mapping and Assessment of Ecosystems and their Services (MAES) initiative is supporting the implementation of Action 5. Although much of the early work in this area at European level has used inputs such as CLC, it is obvious that fully addressing this action requires an improved approach, especially for a better characterisation of land, freshwater and coastal habitats, particularly in watershed and landscape approaches. The spatial resolution at which ecosystems and services should be mapped and assessed will vary depending on the context and the purpose for which the mapping/assessment is carried out. However, information is required with a more detailed thematic characterisation and classification and at a higher spatial resolution, which is compatible with the European-wide classification and could be aggregated in a consistent manner if needed. A first version of a European ecosystem map covering spatially explicit ecosystem types for land and freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of the HRL imperviousness with 20 m resolution, the JRC forest layer with 25 m spatial resolution, plus a range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully exploit the recently available Copernicus Sentinel data and land products, and move to a higher spatial resolution.

The primary goal of the 2nd generation CLC is to underpin European policies and, where possible, fulfil their needs. Furthermore, the EEA also wants to make the new products useful at national, regional and even local levels.
To this end the requirements gathering exercise was extended to a broader stakeholder community at the national level and below, including representatives of commercial service providers, NGOs and academia. This work drew on existing surveys (e.g. the MS CLC survey conducted by ETC-ULS, see footnote 5) and the feedback from the two events where the 2nd generation CLC was promoted.

The recent EC-funded NextSpace project collected user requirements for the next generation of Sentinel satellites to be launched around These requirements were analysed on a domain basis. From the land domain, those requirements which were given a context of land cover (including vegetation), glaciers, lakes, above-ground biomass, leaf area index and snow were considered. A broad range of spatial resolutions was requested, from sub 2.5 m to 10 km. Two thirds of the collated requirements wanted data in the m region. The requested MMUs were also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field / city block level for much of Europe. The temporal resolution / update frequency was dominated by yearly revisions, although some users could accept 5-yearly updates. Some users also wished for monthly updates, although the reason was unclear. The users were less specific about the thematic detail required, and the few references given were to CLC, EUNIS, LCCS and basic land cover. The acceptable accuracy of the products was in the range 85 to 90%.

The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of EIONET members involved with the CLMS and CLC production, considering a range of topics in advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the potential improvements that could be made towards a 2nd generation CLC product. As expected, it was noted that some CLC classes cause problems because of their mixed nature and instances on the ground that are sometimes complex and difficult to disentangle.
It was suggested that this situation could be improved by the use of a smaller MMU so that the mapped features have more homogeneous characteristics. Also, a MMW reduction, particularly for highways, would allow linear features to be represented more realistically. Of the 32 countries that responded to the questionnaire, 25 would support a finer spatial resolution, showing that there is, in general, a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to remaining at the current 25 ha, but the majority favoured to 5 ha. Thematic refinement is also supported by around one third of the respondents, who requested improved thematic detail, separation of land cover from land use, splitting formerly mixed CLC classes and the addition of further attributes to the spatial polygons.

5 Service Contract No 3436/R0-Copernicus/EEA.56586, Task 3: Planning of cooperation with EEA member and cooperating countries. Final report in preparation of the CLC2018 exercise

During the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017, a number of MS representatives were asked to present to the whole event their requirements for land monitoring and their thoughts on the 2nd generation CLC proposals. Their required spatial detail ranged from 0.5 ha MMU down to 5 m pixel resolutions, which are at the most detailed end of what had been suggested to date; the smaller spatial resolutions are potentially out of scope for the 2nd generation CLC. For the required thematic detail, it was suggested to go beyond the current CLC 44 class nomenclature and aim for either a full level-4 classification or a split of the more vague or ambiguous classes. There was a unanimous request for yearly updates and a general feeling that the quality of the data should be improved relative to the local component products.

As the final session in the workshop, a survey of the attendees (via an online voting tool) was undertaken against a set of questions designed to understand the audience's feeling towards the 2nd generation CLC and assess their preferences for some of the key technical specifications.
There was strong support for the 2nd generation CLC approach, with 88 % of respondents considering the products to have relevance at European and national levels and 35 % also thinking they would be useful at sub-national levels. Similar levels of support were found for a spatially detailed land cover product with limited thematic content, i.e. CLC-Backbone. On the technical specification of CLC-Backbone, 80 % felt a MMU of 1.0 ha or smaller would be appropriate, but this was almost evenly split across 0.25 ha, 0.5 ha and 1 ha. The CLC-Backbone MMW was evenly split between 10 m and 20 m, and there was a lively debate around the impact of the input EO data. The dominant level of the preferred thematic detail was around 10 classes with 51 %, although 27 % voted for an even less detailed nomenclature of around 5 classes. In terms of the update cycle for CLC-Backbone, the majority of the audience thought 3 years was sufficient; however, almost a quarter wanted yearly updates. For the CLC-Core the preferred grid size was less clear, ranging from 10 m to 100 m, with the extremes dominating and representing almost equal numbers of votes. Further issues regarding the data sources and contents of the 2nd generation CLC products were also surveyed, but these will be dealt with in the relevant chapters.

During the project, the EEA and the EAGLE Group have received a large amount of written feedback from MS and other stakeholders, which has been addressed where appropriate in the relevant product chapters below. Some of the feedback, although important, is out of scope for this project and is being dealt with separately by the EEA and DG-GROW. The key themes of the feedback have been:

Clarification of the overall 2nd generation CLC concept.
The team appreciate that this is quite a departure from current practice within the CLMS, and during the iterations of the deliverables the description of the concept has been clarified and improved, particularly in visualising the relationship with the existing CLMS components and the links between the individual products of the 2nd generation CLC.

ha being required by LULUCF

Suggestions regarding the technical specifications. A large number of technical issues have been identified from the feedback from the stakeholders. These issues have been reviewed and incorporated where appropriate, practical and feasible within the revision of the technical specifications given in the later chapters of this deliverable.

Use cases for the 2nd generation CLC products. The requirement for an improved version of CLC has been known for some time and the detailed specification for CLC+ is now well in line with a range of global and European policy and environmental management needs. The new products within the 2nd generation CLC, such as CLC-Backbone and CLC-Core, have a vital role to play in the delivery of CLC+. Still, there are so far no admissible use cases. The required use cases should be developed as end users become more familiar with the capabilities.

Inputs to the products from MS and other sources. Inclusion of MS data in the production of the 2nd generation CLC products matches the bottom-up approaches promoted by the EEA and EAGLE, and may help make the products more relevant for use at the national and subnational level. There are, however, a number of technical and organisational challenges which need to be addressed. This project will address the technical issues by proposing additional specifications for potential MS inputs, while EEA and DG-GROW will deal with the organisational issues.

Quality of the 2nd generation CLC products. A further issue of improved spatial and thematic detail is the need to increase the quality of the products so that they will be accepted by the user community. The conceptual strategy aims to support high quality products by allowing CLC-Backbone to focus on improvement of the spatial detail and CLC-Core to focus on the enrichment of thematic detail.
Change detection, update cycle and maintenance. The aim of this project was to propose the initial technical specification for the 2nd generation CLC products, not to design a methodology for the updating and maintenance of the product in an operational monitoring environment. These issues have been noted where relevant in the technical specifications and are likely to feature more prominently in later development work.

Feasibility studies. The project was designed as a desk study and there was no requirement or resources to undertake any testing of the specifications proposed or methods that may be relevant. Where appropriate, existing feasibility study results were incorporated into the reviews and technical specification development, such as the work undertaken in Hungary.

Governance and implementation of the 2nd generation CLC. As was expected, the proposal of a set of products more closely aligned to MS activities, yet integral to the CLMS, has raised a number of issues related to governance and implementation of work. These issues are out of scope for this project, but have been noted and are being dealt with by the EEA and DG GROW.

From the requirements review so far, the feedback from the workshops and considering the specific requirement associated with LULUCF, it is suggested that the ultimate product of a 2nd generation CLC+ should have a MMU of around 1 ha (with the ability to estimate areas down to 0.5 ha), a MMW of 20 m and be based on EO data with a spatial resolution of between 10 and 20 m. The thematic content would be a refinement of the current CLC nomenclature to cope with changing MMU, and a separation of land cover and land use for the needs of ecosystem mapping and assessment. The temporal update could come down from 6 years to 3 years in the short term, and potentially to 1 year in the longer term.

To reach this ultimate goal and initiate implementation of the 2nd generation CLC, the requirements review supports the role of the CLC-Backbone and CLC-Core in contributing to the development and delivery of the required land monitoring system.
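As a rough cross-check of the figures suggested above, the relation between MMU, MMW and EO pixel size can be worked through in a few lines. This is only an arithmetic sketch: the function names are illustrative and not part of any specification, and the inputs are simply the values quoted in the text (1 ha and 0.5 ha MMU, 20 m MMW, 10 m and 20 m pixels).

```python
def pixels_per_mmu(mmu_ha: float, pixel_size_m: float) -> float:
    """Number of square pixels needed to cover one minimum mapping unit."""
    mmu_m2 = mmu_ha * 10_000  # 1 ha = 10,000 square metres
    return mmu_m2 / (pixel_size_m ** 2)


def pixels_across_mmw(mmw_m: float, pixel_size_m: float) -> float:
    """Number of pixels spanning the minimum mapping width."""
    return mmw_m / pixel_size_m


if __name__ == "__main__":
    for pixel in (10, 20):
        print(
            f"{pixel} m pixels: "
            f"1 ha MMU = {pixels_per_mmu(1.0, pixel):.0f} px, "
            f"0.5 ha = {pixels_per_mmu(0.5, pixel):.1f} px, "
            f"20 m MMW = {pixels_across_mmw(20, pixel):.0f} px wide"
        )
```

With 10 m data a 1 ha object is covered by about 100 pixels and a 20 m wide feature by two pixels, whereas with 20 m data the same feature is only one pixel wide, which illustrates why the choice of input EO data matters for the MMW.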

4 CLC-BACKBONE - THE INDUSTRY CALL FOR TENDER

4.1 Introduction

CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-wall coverage of CLC-Backbone shall draw on, complete and amend the incomplete picture formed by the current local component products (Urban Atlas, Riparian Zones and Natura 2000), which cover approximately a quarter of the total area as shown in Figure 4-1. CLC-Backbone will provide a comprehensive and detailed coverage of the whole EEA-39.

The objective of the CLC-Backbone is to provide:
- A useful standalone product
- A spatially detailed geometric base for CLC+
- Basic homogeneous land cover units (EAGLE terminology) as thematic input to CLC-Core and CLC+

Figure 4-1: Illustration of a merger of current local component layers covering (with overlaps) 26.3% of the EEA-39 territory (legend: red = UA, yellow = N2K, blue = RZ LC)

CLC-Backbone will initially be produced by a mostly automated industrial approach divided into three distinct steps (Figure 4-2). The spatial units of CLC-Backbone, called landscape objects, are vector polygons. They are attributed using a wall-to-wall pixel-based 10 m x 10 m classification and further object-based characteristics.

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques, (3) the pixel-based classification of EAGLE land cover components and (4) the attribution of Level 2 objects based on this classification and additional Sentinel-1 information.

For implementation of CLC-Backbone the following principles are taken into account in order to contribute to a harmonized European land cover monitoring framework:
- Heritage: The concept takes into account the existing pan-European and local Copernicus layers. The technical specifications to be provided by this contract are therefore based on a thorough review of these datasets and their geometry and thematic content.
- Integration: CLC-Backbone will integrate selected geometric information from existing Copernicus products and ancillary data within the process.

4.2 Description of CLC-backbone

CLC-backbone will consist of two sub-products:
- Raster product: pixel-based (Sentinel-2 based), 10 m x 10 m, pure land cover classes
- Vector product: 1 ha MMU, derived from linear networks and image segmentation, attributed with aggregated statistics from the raster product and additional characteristics from Sentinel-1
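A minimal sketch of how the vector product might be attributed with aggregated statistics from the raster product is given below. It assumes that the polygons have already been rasterised onto the same 10 m grid as the classification, so that every pixel carries an object identifier. The function name, the class labels and the output fields are hypothetical and are not taken from the specification.

```python
from collections import Counter, defaultdict

PIXEL_AREA_M2 = 10 * 10  # one pixel of the raster product (10 m x 10 m)


def attribute_objects(class_raster, object_raster):
    """Summarise the pixel classes falling inside each landscape object.

    class_raster  -- 2D list of land cover labels (one per pixel)
    object_raster -- 2D list of object ids on the same grid (assumed input)
    """
    counts = defaultdict(Counter)
    for class_row, object_row in zip(class_raster, object_raster):
        for land_cover, object_id in zip(class_row, object_row):
            counts[object_id][land_cover] += 1

    attributes = {}
    for object_id, counter in counts.items():
        total = sum(counter.values())
        # On a tie, most_common keeps the class that was counted first.
        dominant, _ = counter.most_common(1)[0]
        attributes[object_id] = {
            "pixel_count": total,
            "area_ha": total * PIXEL_AREA_M2 / 10_000,
            "dominant_class": dominant,
            "class_share": {c: n / total for c, n in counter.items()},
        }
    return attributes


if __name__ == "__main__":
    classes = [
        ["tree", "tree", "grass"],
        ["tree", "grass", "grass"],
        ["water", "water", "grass"],
    ]
    objects = [
        [1, 1, 2],
        [1, 1, 2],
        [3, 3, 2],
    ]
    for object_id, attrs in sorted(attribute_objects(classes, objects).items()):
        print(object_id, attrs["dominant_class"], attrs["class_share"])
```

Keeping the full class shares alongside the dominant class means that a mixed object (here object 1, three quarters trees and one quarter grass) remains distinguishable from a pure one, which is the kind of information the later CLC-Core and CLC+ steps could draw on.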

The production of the CLC-backbone foresees four main steps:
- Step 1: The first level of object borders (geometric skeleton) derived in this step represents persistent object borders ("hard bones") in the landscape. The borders are derived from ancillary data (roads, railways and rivers).
- Step 2: On a second level, a subdivision of the persistent objects (Level 1) will be achieved through image segmentation ("soft bones"), based on multi-temporal Sentinel data within a defined observation period (resulting in Level 2 landscape objects = polygons). The geometric output of CLC-Backbone, the delineated landscape objects, represents spectrally and/or texturally homogeneous features that are further characterised and attributed in Step 3.
- Step 3: Production of an independent pixel-based (10 m x 10 m) land cover classification with pure land cover classes based on the EAGLE land cover component concept.
- Step 4: Characterization of the landscape (objects/features) from Step 2 using the pixel values from Step 3. Further attribution using spectral characteristics of the object (mean, variance) from Sentinel-2 and Sentinel-1.

Level 1 Objects (hard bones)

The idea behind the Level 1 objects ("hard bones") is to derive a partition of the landscape that reflects relatively stable (persistent) structures and borders built from anthropogenic and natural features. The transportation network, i.e. roads and railways (both without tunnels), is the major component, followed by the river networks. When building the spatial framework, there is a potentially long list of sources for persistent object geometries. However, throughout the consultation phase the decision was taken to concentrate solely on linear networks and provide all other object geometries directly from image segmentation.

Level 2 Objects (soft bones)

The aim of this second level segmentation is to derive objects that can be differentiated by the spectral response and behaviour throughout a year.
With the increasing number of observation dates per pixel throughout a year, the likelihood of finding homogeneous management units increases, but at the same time pixel-based noise increases, as environmental conditions within a field parcel may vary significantly throughout a year (e.g. soil moisture content). Service providers will have to balance the number of acquisitions to maximize the identification of single field parcels on the one hand and to reduce salt-and-pepper effects due to varying spectral response on the other.

The Level 2 landscape objects represent, in the ideal case, a consistent management of field parcels, and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a year. As an example, different settlement structures (e.g. single houses or building blocks) should appear spectrally different (due to the different levels of sealing, shape, spatial extent and spatial built-up pattern) and will therefore be delineated as separate polygons. The actual cultivation measures applied on the land will also influence the type of delineation, as e.g. in agriculture the field structure is visible in multi-temporal images due to the differences in temporal-spectral response according to the applied management practices on field level (mowing, ploughing, sowing).

Table 4-1: Main characteristics of Level 1 objects (hard bones) and Level 2 objects (soft bones)

Characteristic | Level 1 objects | Level 2 objects
Name | Hard bones | Soft bones
Method | Partition of landscape according to existing spatial data | Image segmentation
Input data | Persistent landscape objects: transport infrastructure (roads, railways excluding tunnels); river network | Sentinel-2 complete time-stacks of one vegetation period (amended by Landsat 8 data in case of clouds/haze)
Special issue | Geometric transformation of line geometry into vector polygons and further transformation of vector polygons into raster-based regions | Pre-processing (atmospheric and topographic) of S-2 images
Advantage | Very good cost/benefit ratio; no doubling of efforts; reuse of existing European geospatial data infrastructure | Independent data source; clearly defined observation period (2018)
Disadvantage | Data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data | Reliability and reproducibility of segmentation algorithms
Concerns | European data vs. national data | Technical feasibility (big data processing); cooperation with data centres necessary

Technical specifications

Spatial scale / Minimum Mapping Unit

- Raster product: MMU 100 m² (10 m x 10 m pixel)
- Vector product: CLC-Backbone shall address landscape objects with a predefined
  o MMU of 1 ha and a
  o minimum mapping width (MMW) of 20 m.
- Units (final landscape objects resulting from merged Level 1 and Level 2)
  o Spectrally and/or texturally homogeneous (over time) units that represent meaningful objects in the real world (unique land cover and/or homogeneous vegetation dynamics)
  o They are characterized with the land cover class that is predominantly occurring throughout the season (e.g. differentiation between trees and clear-cut according to the dominating time period of occurrence within the observation period)

  o Examples of meaningful objects are:
     - Single agricultural management units (land parcels)
     - Tree stands that are homogeneous according to the criteria age, mixture and/or management
     - Clear cuts
     - City blocks
     - Extraction sites
     - Lakes
     - Wider roads, railways and rivers (>= 20 m)
  o Short-term temporary effects (less than 6 months), like variation in soil water content or temporary flooding, do not constitute a meaningful object in the context of CLC-Backbone
  o Adjacent units can have the same land cover code
     - They might only be separated by a road or river, but nevertheless comprise different spatial units with a unique management cycle and spatial configuration (and thus different spectral-temporal characteristics)
- Area coverage of units
  o Wall-to-wall: 100% of the area has to be covered by segmented units

Delineation accuracy of geometric units

The appropriate size and delineation are defined by a sample of verification polygons derived via visual interpretation. The delineation accuracy of the resulting polygons (vector product) is defined as follows:

- Appropriate size:
  o Too large polygons (overestimation of object area): not more than 10% of all polygons
  o Too small polygons (underestimation of object area): not more than 15% of all polygons
  o Any deviation of more than 15% of polygon area is considered too small or too large
- Appropriate delineation (positional accuracy)
  o Shift of border: max. 20 m shift; not more than 10% of perimeters is shifted more than 15 m

Reference year

The production of CLC-Backbone shall generally make use of images from the years 2017/2018, i.e. the Sentinel-2 HR layer stack. The image stack covers one full vegetation season with 2018 as reference year, but takes regional conditions into account. The vegetation season in Mediterranean regions starts in October of the previous year, and the vegetation season for this region is therefore set to 10/2017-9/2018. Northern regions, with a short vegetation season, and areas along the Atlantic coast (with frequent and high cloud cover) may, even with Sentinel-2, be impossible to cover with data from a single season. A pragmatic solution is then to include imagery from 2017 or earlier if necessary. Special conditions, currently not known to the development team, may of course also imply similar pragmatic deviations from the reference year in other parts of Europe.

EO data

The production of CLC-Backbone will need to make use of all available Sentinel-2 observations from the ESA service hubs. The aim is to provide a multi-spectral, multi-date, m spatial resolution imagery as basic EO input. This selection can be amended by Sentinel-1, where helpful.

As the full Sentinel-2 satellite constellation (using 2A and 2B) is available operationally from late 2017 onwards, it is anticipated that the last months of 2017 and the full year 2018 form the first full vegetation period with a comprehensive multi-temporal coverage of Europe.

The complete layer stack of multiple time series of all S-2 (and, if necessary, S-1) acquisitions has to be considered in order to obtain at minimum 6 (i.e. monthly) acquisitions for the 2018 vegetation period (extended to 2017 observations in Mediterranean areas), so as to:

- fully track the seasonal cycles of vegetation
- increase the automation of the processing
- ensure the homogeneity of results across Europe

Older Sentinel-2 archive data (before 2018) might be used

- to close data gaps for 2017/18, e.g. due to cloud cover (specifically in northern countries)
- to confirm stable object geometries ("soft bones")

If a European-wide coverage of VHR data for 2018 is available, its integration in the processing chain should be considered (either for accompanying the production or for verification).

The synergetic use of Sentinel-1 (SAR) imagery shall be considered for improving classification accuracy and enhancing the thematic information, especially on thematic issues like soil properties (wetness).
Recent studies on the use of merged optical-SAR imagery as well as SAR visual products, and the experiences in HRL production, have confirmed their applicability for both semi-automated classification and visual interpretation.

Landsat 8 or equivalent data shall be used for gap filling in case of insufficient coverage by Sentinel-2.

Pre-processing of Sentinel-2 data

As Sentinel-2 is an optical system, the average cloud coverage influences the number of observations significantly. All cloud-free (and shadow-free) pixels of an image can be analysed. The constellation of Sentinel-2 satellites provides an average revisit frequency of 3-4 days in Europe. The coverage is even more frequent in northern regions due to the overlaps of the S-2 path footprints.

EO imagery has to be pre-processed to correct for atmospheric and topographic conditions. ESA has decided that Sen2Cor will be used for the correction of imagery for Europe and will reprocess the L2A level of Sentinel-2 imagery for Europe back to May 2017 using this algorithm. The current plan is thus to start atmospheric correction using Sen2Cor in Europe and to reassess the algorithms (e.g. MAJA) in .

If radiometric pre-processing via the ESA Sentinel data hub is not available in time, service providers will have to perform their own radiometric pre-processing. Service providers will therefore have to prove their ability to perform the pre-processing as well as a quality assurance that excludes scenes with displacement errors (a common problem with S-2).
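The requirement of at minimum 6 (i.e. monthly) acquisitions within the vegetation period can be illustrated with a small per-pixel check. The sketch below is not part of the specification: it assumes that "monthly" means at least six distinct calendar months with one cloud-free observation, and the function names and the sample observations are purely illustrative. The only value taken from the text is the Mediterranean season window 10/2017-9/2018.

```python
from datetime import date


def months_in_window(start, end):
    """Return the (year, month) pairs from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def covered_months(observations, start, end):
    """Months of the window with at least one cloud-free observation.

    observations: iterable of (acquisition_date, is_cloud_free) for one pixel.
    """
    wanted = set(months_in_window(start, end))
    seen = {
        (d.year, d.month)
        for d, is_clear in observations
        if is_clear and start <= d <= end
    }
    return sorted(seen & wanted)


def has_minimum_coverage(observations, start, end, minimum=6):
    """Assumed reading of the requirement: >= 6 distinct cloud-free months."""
    return len(covered_months(observations, start, end)) >= minimum


# Mediterranean season window as stated in the text: 10/2017-9/2018.
MED_START, MED_END = date(2017, 10, 1), date(2018, 9, 30)

# Invented observations for a single pixel (illustration only).
pixel_obs = [
    (date(2017, 10, 12), True),
    (date(2017, 11, 3), False),   # cloudy, ignored
    (date(2017, 12, 21), True),
    (date(2018, 2, 8), True),
    (date(2018, 4, 19), True),
    (date(2018, 4, 24), True),    # same month, counted once
    (date(2018, 6, 30), True),
    (date(2018, 8, 2), True),
    (date(2018, 10, 5), True),    # outside the window
]

print(covered_months(pixel_obs, MED_START, MED_END))
print(has_minimum_coverage(pixel_obs, MED_START, MED_END))
```

Pixels that fail such a check would be candidates for the gap filling with older Sentinel-2 archive data or Landsat 8 described above.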
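As a closing illustration for the technical specifications, the size criteria under "Delineation accuracy of geometric units" can be checked roughly as follows. This is a sketch under assumptions: each mapped polygon is assumed to be already matched to exactly one verification polygon (the matching rule is not defined in the text), and the function names and sample areas are invented. The thresholds (15% area deviation, at most 10% too large, at most 15% too small) are the ones quoted in that section.

```python
def size_deviation(mapped_area, reference_area):
    """Relative area deviation of a mapped polygon from its verification polygon."""
    return (mapped_area - reference_area) / reference_area


def assess_size_accuracy(pairs, tolerance=0.15,
                         max_too_large=0.10, max_too_small=0.15):
    """Evaluate the size criteria for a sample of polygon pairs.

    pairs: list of (mapped_area_m2, reference_area_m2), one entry per
    verification polygon (the pairing itself is assumed to be given).
    """
    n = len(pairs)
    too_large = sum(1 for m, r in pairs if size_deviation(m, r) > tolerance)
    too_small = sum(1 for m, r in pairs if size_deviation(m, r) < -tolerance)
    share_large = too_large / n
    share_small = too_small / n
    return {
        "share_too_large": share_large,
        "share_too_small": share_small,
        "passes": share_large <= max_too_large and share_small <= max_too_small,
    }


# Invented sample: 8 polygons within tolerance, 1 too large, 1 too small.
sample = [(10_200.0, 10_000.0)] * 8 + [(12_000.0, 10_000.0), (8_000.0, 10_000.0)]
result = assess_size_accuracy(sample)
print(result)
```

The positional criterion (shift of border) would need the actual geometries and is deliberately left out of this sketch.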

4.4 Thematic attribution

Raster product: Classification of individual pixels according to the nomenclature described in section 4.4.1 below.

Vector product: The polygons produced by geometric segmentation (merging Level 1 and Level 2) are classified using three different approaches.

1. Attribution with basic land cover class statistics based on the pixels of the raster product found in the landscape object. Statistical parameters to be assigned to the landscape object:
   i. Dominating land cover class (majority)
   ii. Ordered list of land cover class percentages (3 most dominating land cover classes)
   iii. Percentage of land cover class i ... j: count of pixels per land cover class / total number of pixels per landscape object
2. Land cover class (section 4.4.1) determined by conventional analysis of the spectral properties of the entire object
3. Attribution according to Sentinel-1 data
   (1) Wetness
   (2) Roughness
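The statistical parameters of approach 1 can be sketched for a single landscape object as follows. This is a minimal illustration, assuming the class codes of all raster pixels inside the object have already been extracted into a list. The function name, the tie-breaking rule and the sample pixel counts are assumptions, not part of the specification.

```python
from collections import Counter


def object_statistics(pixel_classes, top_n=3):
    """Majority class, top-N classes and class percentages for one object.

    pixel_classes: land cover class codes (1..9) of the raster pixels that
    fall inside the landscape object.
    """
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    # Sort by descending pixel count; ties are broken by class code
    # (an assumption made here only to keep the result deterministic).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Percentage = count of pixels per class / total pixels per object.
    percentages = {cls: 100.0 * n / total for cls, n in ranked}
    return {
        "majority": ranked[0][0],
        "top": [(cls, percentages[cls]) for cls, _ in ranked[:top_n]],
        "percentages": percentages,
    }


# Invented object with 100 pixels: mostly periodically herbaceous (class 5).
pixels = [5] * 60 + [4] * 25 + [3] * 10 + [1] * 5
stats = object_statistics(pixels)
print(stats["majority"])   # 5
print(stats["top"])        # [(5, 60.0), (4, 25.0), (3, 10.0)]
```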

4.4.1 Thematic classes

Thematic detail: After obtaining the hard bones and soft bones of the geometric skeleton, each landscape object is labelled with a simple land cover class. The descriptive nomenclature applied for the labelling contains the following EAGLE-derived Land Cover Components:

9 land cover classes

1. Sealed surface (buildings and flat sealed surfaces)
2. Woody coniferous
3. Woody broadleaved
4. Permanent herbaceous (i.e. grasslands)
5. Periodically herbaceous (i.e. arable land, natural grasslands with periodic vegetation cover)
6. Permanent bare soil
7. Non-vegetated or scantily vegetated surfaces (i.e. rock, screes and sand, lichen, sparsely vegetated alpine heath)
8. Water surfaces
9. Snow & ice

Remark on class woody: Representatives from some member countries have expressed the wish to further differentiate the class woody into trees and shrubs. This differentiation would be useful, but currently cannot be achieved with sufficient accuracy from spectral attributes alone. A robust differentiation is only possible by using a normalized digital surface model (nDSM).

No land use terminology has been used in the CLC-Backbone nomenclature code list. This is done to avoid semantic confusion between land cover and land use. Land use is, however, implied in the separation between permanent and periodically herbaceous land.

- Permanent herbaceous
  o Permanent herbaceous areas are characterized by a continuous vegetation cover throughout a year. No bare soil occurs within a year. These areas are either unmanaged or extensively managed natural grasslands, permanently managed grasslands, arable areas with a permanent vegetation cover (e.g. fodder crops) or even set-aside land in agriculture. For managed grasslands, the biomass will vary over the year, depending on the number of mowing (grassland cuts) or grazing events.
- Periodically herbaceous
  o Periodically herbaceous areas are characterized by at least one land cover change (in the sense of EAGLE land cover components) between bare soil and herbaceous vegetation within one year. Depending on the management intensity, these areas can also show several changes between these two EAGLE land cover components within a year. Normally these areas are managed as arable areas.
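The distinction between the two herbaceous classes can be expressed as a simple rule on a one-year sequence of land cover component labels. The sketch below only illustrates the definitions above and is not the method described in Annex 3. It assumes one label per observation date, and the label strings and function names are hypothetical.

```python
BARE = "bare_soil"
HERB = "herbaceous"


def count_changes(series):
    """Count switches between bare soil and herbaceous cover.

    series: chronologically ordered labels for one year; labels other than
    BARE and HERB (e.g. "cloud") are skipped.
    """
    relevant = [label for label in series if label in (BARE, HERB)]
    return sum(1 for a, b in zip(relevant, relevant[1:]) if a != b)


def herbaceous_type(series):
    """Apply the definitions: at least one change within the year means periodic."""
    if HERB not in series:
        return None  # not a herbaceous object at all
    if count_changes(series) >= 1:
        return "periodically herbaceous"
    return "permanent herbaceous"


# Invented monthly label sequences (illustration only).
meadow = [HERB] * 12
arable = [BARE, BARE, BARE, HERB, HERB, HERB, HERB, BARE, BARE, HERB, HERB, HERB]

print(herbaceous_type(meadow), count_changes(meadow))
print(herbaceous_type(arable), count_changes(arable))
```

A rule of this kind evaluates a single year only, which is why a permanent grassland that happens to be renewed in the observation year needs separate consideration.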

A permanent, managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes, followed by an artificial seeding phase and a renewal of the vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered when evaluating shorter time spans, as a managed permanent grassland may happen to be renewed within the 1-year observation period.

Complex classes, either as a mixture of existing land cover classes (e.g. CLC mixed classes) or in the sense of ecologically diverse classes (e.g. wetlands), are avoided because the nomenclature concentrates solely on classification according to surface structure and properties. A detailed technical description of how to model the temporal changes is given in Annex 3.

Important note: As the main emphasis is on the geometric delineation of landscape objects, neighbouring objects may be assigned the same land cover code. The polygons should still be kept as unique entities, as they can differ with respect to attributes that may be added at a later stage (e.g. broadleaved vs. needle-leaved coniferous trees, number of grass cuts, number of bare soil occurrences, wetness).

Accuracy: The accuracy of the thematic differentiation is defined for 9 land cover classes:

- 85% overall accuracy for the 10 m × 10 m pixel-based product
  o Omission errors: max. 20%
  o Commission errors: max. 20%
- 90% overall accuracy for the vector-based product
  o Omission errors: max. 20%
  o Commission errors: max. 20%

Remark on further development towards CLC+: Apart from the technical specifications for the foreseen tender for CLC-Backbone, additional attribution from various data sources is an option for future enrichment of classes towards the development of CLC+. First of all, member countries can be asked to populate the resulting geometries with national data. In addition, crowd-sourced data may be used to characterize the land use within these polygons (either by in-situ point observations or by wall-to-wall datasets, e.g. OSM land use).

4.5 European ancillary data sets for Level 1 hard bone creation

Throughout the consultation process a number of potential datasets that could contribute to the creation of the hard bones (Level 1 objects in Figure 4-2) were discussed. The team decided, after thorough consideration of all proposals, to concentrate solely on linear networks such as roads, railways and rivers as input to the hard bones.

Table 4-2: Overview of existing relevant products which were analysed as potential input to support the construction of the geometric structure (Level 1 "hard bones") of the CLC-Backbone.

| Product | Comment | Format | Potential use for constructing basic geometry | Reference year |
|---|---|---|---|---|
| Roads and railways | | | | |
| National data portals [7] | National reference data | Line | National INSPIRE data set provision; not a harmonized European dataset, varying license conditions, varying levels of roads, varying information on tunnels | Varying throughout member countries |
| Open Street Map roads and railways | Crowd-sourced data | Line | Linear transportation network (centre lines), standardized classes, information on tunnels available | Frequent change mapping; full time history |
| HERE maps (Navmart) | Operational commercial data | Line | Commercial layer; licensing constraints | Regular updates based on business processes |
| Rivers | | | | |
| EU Hydro (Copernicus reference layer) | Free and open COPERNICUS product | Line | Geometric quality derived from VHR images, very high location accuracy | |
| WISE WFD surface water bodies | Bottom-up dataset from official national hydrography networks | Waterbodies as polygons; waterbody line as line geometry | Based on WFD2016 reporting (UK and Slovenia only for viewing) | 6-year update cycle, next update 2022 |

[7] At the current stage not all geographic information is available via the national INSPIRE portals.

The usage of additional geospatial information for constructing Level 1 hard bones is based on the following arguments:

- The European landscape is a highly anthropogenically transformed landscape, in which transport and river networks (among others) form the basic subdivision of the landscape
- The location of transport and river networks is fairly well known thanks to geospatial data at national and/or European scale
- Many of the borders defined by transport and river networks can also be identified from Sentinel-2 images, but at high cost and with lower accuracy

Usage of linear networks: The linear networks are used to represent line features (in the sense of being narrower than 20 m) on the earth surface. Therefore, all kinds of tunnels are excluded. The linear networks (centre lines of roads, railways and rivers) represent relatively stable borders of landscape objects (e.g. parcels, agricultural fields or city blocks that are surrounded by roads, railways and rivers). This division of the landscape is used as input to step 2 for the subsequent segmentation. Wider roads (> 20 m), however, constitute landscape objects themselves.

It is important to note that no attribute information is needed for the integration of the linear networks. Any part of the linear network represents a border of the adjacent polygon. In the final partition of the landscape it is therefore no longer of any importance whether the underlying border represents a road, railway or river.

The vector line network is transformed into polygons (1 ha MMU). These polygons are further subdivided in step 2 using automated image segmentation of Sentinel-2 data. Roads, railways and rivers that are wider than 20 m have to be covered in step 2 as polygon features. If automated image segmentation algorithms do not fully recover these narrow and elongated polygons, a manual editing phase has to be foreseen to sufficiently cover those wider linear networks (that comprise polygon features).

Criteria for ancillary data input

The geospatial data for the creation of the CLC-Backbone have to meet the following specifications:

- Provision of a consistent set of information across European countries
  o Scale: 1: or better
  o Geometric quality: +/- 5 m positional accuracy
  o Geometric representation:
     - Centre line
     - Parallel lines (distance less than 20 m, e.g. driving directions) have to be merged
  o Completeness: no spatial or thematic data gaps
     - Spatial completeness: > 95% of the network covered
     - Thematic completeness: all types of roads, from highways down to agricultural tracks, covered
  o Network:
     - connected network
     - no dangles
     - snapping tolerance 10 m

  o Specifically, for roads and railways
     - Information on tunnels per line string
     - Road types: standardized and comparable road type categories across Europe, covering all types of roads (from highways to agricultural tracks)
  o Specifically, for rivers
     - Completeness: all rivers within catchments larger than 10 km²
- Provision of data at no cost or at reasonable cost, where the cost reduction in the overall process justifies the costs
- Support of the full, free and open data policy of the end product
  o Access to final data is based on a principle of full, open and free access as established by the Copernicus data and information policy Regulation (EU) No 1159/2013 of 12 July 2013
- Provision of data to industrial service providers, potentially outside the country of the data provider
- Availability of the data to the service provider at the start of the project implementation
- A combination of datasets is feasible (e.g. WFD-WISE dataset within the EU amended by EU-Hydro outside EU coverage), if both datasets comply with the specification criteria

Ancillary input data

Three different possible data sources are identified for the transport network (Table 4-2):

- National reference data
- Open Street Map data
- Commercially available navigation data

For rivers, two European datasets are available:

- Water Framework Directive WISE
- EU-Hydro (COPERNICUS product)

The choice of the dataset to be used for the production has to consider the criteria defined in the section "Criteria for ancillary data input". If alternative datasets exist (e.g. availability of both national reference data and OSM), these objective criteria are used to select the data with the highest fitness for purpose.

Open Street Map data

Over the last years Open Street Map has become the major, comprehensive provider of road and railway networks worldwide. The data are collected by volunteers and provided under an open-content license. One major criticism of crowd-sourced data is their incompleteness and varying quality. For the OSM road dataset, recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball [8]) have shown a completeness in Europe of well over 96% for the majority of the EEA 39 countries (the completeness figures per country are available in Annex 3). According to their study, only Serbia, Bosnia-Herzegovina and Turkey have a completeness below 90%. For these three countries the roads may need to be refined using additional national data sources. In Figure 4-3 below the completeness of OSM in four countries (Estonia, Portugal, Romania and Serbia) is visualised on aerial images and Sentinel-2 background data.

Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania (near Parta) and Serbia (near Indija) (top to bottom). Bing/Google and EOX Sentinel-2 cloud-free services as background (left to right).

License: The Open Street Map data is licensed under the Open Data Commons Open Database License (ODbL) by the Open Street Map Foundation (OSMF). The input data from OSM is restricted to geometric properties, and these properties are substantially altered by the geometric transformation from the original vector data into the grid database and by the combination with remote sensing data. The CLC-Backbone is therefore not regarded as a derivative database of OSM, but as a collective database. Consequently, according to Art. 4.5 of the ODbL v1.0 [9], the COPERNICUS standard license (guaranteeing full, free and open access with the possibility to derive business applications) can be applied to the dataset.

INSPIRE national portals: roads & railways

The project ELF (European Location Framework) [10] has compiled overviews of existing national services, mainly for INSPIRE Annex I themes (roads, buildings, hydrology). It tries to serve as a single point of access for harmonised reference data from National Mapping, Cadastre and Land Registry Authorities. The INSPIRE regulation will certainly lead to increased availability and accessibility of European reference data, but at the current point in time the data situation, even for core national data, is still quite patchy and requires large efforts for searching, finding, compiling and harmonizing (semantically and geometrically) the various datasets. Therefore, ELF is currently not suited to provide data according to the criteria defined in the section "Criteria for ancillary data input".

A recent survey (status: 19 Jan. 2018) of the official INSPIRE portal [11] reveals that in total 101 datasets from 10 member countries are available for download in the data theme transport networks. Some countries only provide regionally distributed datasets, but no single nationwide download service.

Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: Jan. 2018)

Figure 4-5: Distribution of the number of downloadable datasets for INSPIRE transport services across member countries (Status: Jan. 2018)

The advantage of using national data instead of data available Europe-wide has to be carefully evaluated with regard to costs and benefits. First of all, it is still a challenge to find the appropriate dataset. Although it is known that other national services exist (e.g. Spain), they cannot be found using this central access portal. This has to be considered when replacing existing Europe-wide data sources with national data. The currently available national data are not harmonized across member countries regarding, e.g., the definition of road types or the completeness of road types.

Rivers and lakes

One example of a joint European (EU 28) dataset is the reference spatial data set of the Water Framework Directive (WFD), which is part of the Water Information System for Europe (WISE) [12]. It compiles national information reported by the EU Member States into one homogeneous dataset. In line with the reporting cycle of the WFD, which requires an update every 6 years, the dataset is available in the versions 2010 and 2016.

Such datasets may provide a valuable data source for the generation of hard bones, but still do not provide the optimal solution. In the case of the WISE WFD dataset the coverage is in principle restricted to the EU 28, and is not even complete within the EU 28, as countries like the UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only available to EEA and EC for visualisation purposes). Compared to the EU-Hydro dataset, the WISE WFD data provide a higher geometric quality but are less complete, as smaller rivers with catchments of less than 10 km² are not included by definition. A combination of these datasets is recommended.

Remark: land parcel identification system (LPIS)

The land parcel identification system (LPIS) in Europe is at the current stage not suited to be integrated into the Level 1 hard bone process, as the data currently are

- not completely available at national level (approx. 50% of countries provide data)
- partly only available at regional level (Spain, Germany)
- not representing the same spatial scale across countries
- not available with the same semantic information (e.g. grassland/arable land differentiation)
- if harmonized under INSPIRE, provided either under Annex II land cover or under Annex III land use (but not in one homogeneous data format)
- not accessible for all users / uses

A detailed explanation is given in Annex 4.

It is expected that the availability and homogeneity of IACS/LPIS data in Europe will improve in the coming years. For a future update of the CLC-Backbone this improved availability of LPIS data will play an important role.

4.6 Short-form of CLC-Backbone end product technical specifications

CLC-Backbone

[Example image if available.]

Description
CLC-Backbone is a spatially detailed, large-scale, EO-based land cover inventory in a vector format (accompanied by a raster product layer) providing a geometric backbone with limited, but robust thematic detail on which to build other products.

Input data sources (EO & ancillary)

EO data used in the production
- Spatial resolution: m
- Spectral content: Visible, NIR, SWIR, SAR
- Temporal resolution: at minimum monthly cloud-free acquisitions
- Source data: Sentinel-2, amended by Landsat 8 where necessary
- Contributing data: Sentinel-1

Ancillary data used in the production
- Roads and railways (national datasets and/or OSM)
- Hydrography: WISE WFD and/or EU Hydro

Thematic content
EAGLE Land Cover Components:
1. Sealed
2. Woody vegetation - coniferous
3. Woody vegetation - broadleaved
4. Permanent herbaceous (i.e. grasslands)
5. Periodically herbaceous (i.e. arable land)
6. Permanent bare soil
7. Non-vegetated bare surfaces (i.e. rock and screes)
8. Water surfaces
9. Snow & ice

Methodology
The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step, the boundaries (hard bones) of relatively stable landscape objects are detected based on existing data sets for roads, railways and rivers (Level 1 objects). In a second step, image segmentation techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2 objects). The third step contains a classification and spectral-temporal attribution (Sentinel-1 and -2) of the landscape objects derived from steps 1 and 2, using a pixel-based classification of the Land Cover Components. The pixel-based classification is generalized at object level using indicators like the dominating land cover class and the percentage mixture of land cover classes.

The land cover classification of individual pixels is carried out independently of the polygons. Land cover class values (dominating class, percentages of classes, ...) are then assigned to the Level 2 polygons using the pixel map. In addition, other attribution methods are used to characterize the polygons with spectral-temporal characteristics and indices (e.g. Sentinel-1 characteristics).

Geometric resolution (Scale)
~1:

Geographic projection / Reference system
ETRS89-LAEA: ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, etrs-laea/)

43 Coverage EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium, Bosnia- Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular CLC dataflow. The number of countries involved is 39. The area covered is, approximately, 6 Million km 2. Spatial resolution / Minimum Mapping Unit (MMU) Raster product: 100 m2 (10*10m 2 ) Vector product: 1 ha Min. Width of linear features Vector product: 20 m Topological correctness [TBD] Raster coding Encoding identical to [thematic detail] Target / reference year Geometric accuracy (positioning accuracy) Raster product: equals Sentinel-2 Vector product: Less than half a pixel based on 10 m spatial resolution input data Thematic accuracy (in %) / quality method The target will be set at 90 %. A quantitative approach will be used based on a set of stratified random point samples compared to external datasets (e.g. VHR, GoogleEarth, national orthophotos) (full vegetation season, starting partly in autumn 2017 in southern Europe) Up-date frequency [TBD] Information actuality Plus and minus 1 year from target year ( ). A longer time window may be required for the ancillary datasets. Verwijderd: Advantage Verwijderd: Independent data... 
source; [34] Verwijderd: Very good cost/benefit ratio; no doubling of efforts, reuse existing European geospatial data infrastructure; Verwijderd: Disadvantage Verwijderd: Actuality of data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data Verwijderd: Reliability and reproducibility of segmentation algorithms 42 P age

44 Delivery format Raster product: Grid Vector product: Vector Data type Raster product: Grid Vector product: Vector Naming conventions [TBD] Medium [TBD] Delivery reliability [TBD] Delivery time [TBD] Archive [TBD] P age Verwijderd: Concerns... [35] Verwijderd: Verwijderd: <#>Level 1 Objects... [36] Verwijderd: <#>Lakes Verwijderd: <#> (as polygons, from... [37] Verwijderd: With the increasing... [38] Verwijderd: different polygons. Verwijderd: The actual cultivation... [39] Verwijderd:... [40] Verwijderd: <#>Appropriate size: Verwijderd: <#>Too large polygons:... [41] Verwijderd: <#>Shift of border:... max. [42] Verwijderd: <#>not more than... 10% [43] of Verwijderd: Thematic detail:... [44] Verwijderd: The descriptive... [45] Verwijderd: <#>Sealed surface... [46] Verwijderd: <#>permanent Verwijderd: <#>herbaceous (i.e.... [47] Verwijderd: <#>arable land) Verwijderd: <#>Permanent bare soil Verwijderd: <#>Non-vegetated... bare [48] Verwijderd: <#>Water surfaces... [49] Verwijderd: Member countries... [50] Verwijderd: A robust differentiation... [51] Verwijderd: This is on purpose... to [52] Verwijderd: <#>permanent... [53] Verwijderd: <#>Depending on... the [54] Verwijderd: <#> have up to several... [55] Verwijderd: A permanent and... [56] Verwijderd: Thus, a managed... [57]... [58]... [59]... [60]... [61]... [62]... [63]
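The thematic accuracy row above names a 90 % target checked with stratified random point samples. The sketch below shows one common way such a figure could be computed, assuming an area-weighted estimator; the estimator choice, the stratum names and all numbers are illustrative assumptions and are not taken from the specification.

```python
def stratified_overall_accuracy(strata):
    """Area-weighted overall accuracy from per-stratum sample counts.

    Each stratum is a dict with:
      area    - mapped area of the stratum (any consistent unit)
      sampled - number of random check points drawn in the stratum
      correct - number of those points where map and reference agree
    """
    total_area = sum(s["area"] for s in strata)
    return sum(
        (s["area"] / total_area) * (s["correct"] / s["sampled"])
        for s in strata
    )


# Hypothetical figures, for illustration only.
example_strata = [
    {"name": "woody vegetation", "area": 600.0, "sampled": 200, "correct": 190},
    {"name": "herbaceous", "area": 300.0, "sampled": 100, "correct": 85},
    {"name": "sealed", "area": 100.0, "sampled": 50, "correct": 40},
]

TARGET = 0.90  # thematic accuracy target stated in the specification

accuracy = stratified_overall_accuracy(example_strata)
print(f"overall accuracy: {accuracy:.3f}, target met: {accuracy >= TARGET}")
```

Weighting by stratum area keeps a small but heavily sampled class from dominating the result; a production assessment would normally also report confidence intervals and per-class accuracies.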

5 CLC-CORE - THE GRID APPROACH

The CLC-Core product should be seen as an underpinning, semantically and geometrically harmonized data container and data modeller that holds, or allows referencing of, land monitoring information from different sources in a grid-based information system. The grid database will essentially store geospatial and thematic data from sources such as CLC-Backbone, CLC, Local Component data, HRLs and in-situ data, as well as information provided by the MS. The contents of the grid database can be exploited directly within grid-based analyses, or alternatively subsets of the grid database can be extracted and analysed by spatial queries driven by objects from an external vector dataset. In this way, the CLC-Core can support the other elements of the 2nd generation CLC. For instance, a CLC-Backbone object could gain additional thematic information by aggregating or analysing the equivalent portion of the CLC-Core as a starting point for CLC+ production (see next chapter).

5.1 Background

Textbooks usually differentiate between raster GIS and vector GIS (Couclelis 17, Congalton 18). The vector GIS does, like the paper map, attempt to represent individual objects that can be identified, measured, classified and digitized. The raster GIS attempts to represent continuous spatial fields approximated by a partition of the land into small spatial units with uniform size and shape. Raster GIS therefore has structural properties that make it particularly suitable for use in modelling exercises. A grid is a spatial model falling somewhere between the vector model and the raster model. The grid looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is characterized by how much it contains of each land cover class or other information.
The grid is, as a result, a more information-rich data model than the raster. The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units, in this case 1 x 1 km. The result can either be a raster where a single CLC class is assigned to each pixel (representing, for example, the dominant land cover class inside the pixel, or the CLC class found at the centre of the pixel unit). Alternatively, the result is a grid where an attribute vector representing the complete composition of land cover classes inside the unit is assigned to each grid cell. Geometric information on a more detailed scale than the size of the spatial unit is in both cases lost, but the overall loss of information is less in the grid than in the raster.

The loss of information when using a grid structure can be further reduced if the grid size is selected to be less than the minimum mapping unit of the input product. For instance, with an MMU of 0.25 ha (50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products derived from the grid-based information must also consider grid size in relation to their requirements.

17 Couclelis, H. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65-77, Springer Berlin Heidelberg.
18 Congalton, R.G. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63.

Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). daa is a Norwegian unit: 10 daa = 1 ha.

A grid data model is not the result of mapping in any traditional sense. The grid is populated with thematic information from (one or more) existing sources (Figure 5-2). One method frequently used to populate grids with LULC information is to calculate the spatial coverage of each LULC class based on a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes can furthermore involve counting or statistical processing of registered data for observations made inside each grid cell (for instance, not only the area of a class, but also the number of objects it represents). Ancillary databases, which provide important and relevant information about the content and context of the grid cells, can also be added, for instance the number of buildings, the length of roads or the number of people found within the grid cells.

5.2 Populating the database

The grid as a spatial model consists of square spatial units of uniform size and shape that can have an (internally) heterogeneous thematic composition, i.e. the grids are geometrically uniform polygons. Each grid cell has a unique ID that is related to a database containing the attribute data.
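The raster versus grid distinction described in Section 5.1 can be made concrete with a small sketch. It aggregates a fine pixel map into larger cells and keeps, for each cell, both the single dominant class (the raster encoding) and the full composition vector (the grid encoding). The function name, the tie-breaking rule and the 4 x 4 example map are assumptions for illustration; the class codes follow the EAGLE land cover component numbering used for CLC-Backbone.

```python
from collections import Counter


def aggregate_to_grid(pixels, block):
    """Aggregate a 2D list of class codes into cells of block x block pixels.

    Returns {(cell_row, cell_col): {"composition": {...}, "dominant": code}}.
    """
    rows, cols = len(pixels), len(pixels[0])
    cells = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            counts = Counter(
                pixels[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            total = sum(counts.values())
            # Grid encoding: share of every class present in the cell.
            composition = {cls: n / total for cls, n in counts.items()}
            # Raster encoding: one class only; ties go to the lowest code
            # (an arbitrary rule chosen here to keep the result deterministic).
            dominant = min(counts, key=lambda cls: (-counts[cls], cls))
            cells[(r0 // block, c0 // block)] = {
                "composition": composition,
                "dominant": dominant,
            }
    return cells


# Hypothetical 4 x 4 pixel map (1 sealed, 2 coniferous, 3 broadleaved,
# 4 permanent herbaceous, 5 periodically herbaceous, 8 water).
example_pixels = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [8, 8, 5, 4],
    [8, 8, 5, 4],
]
example_cells = aggregate_to_grid(example_pixels, block=2)
for cell_id, record in sorted(example_cells.items()):
    print(cell_id, record)
```

The lower-right cell shows why the grid is the richer model: its composition records an even split between classes 4 and 5, whereas the dominant-class raster has to discard one of them.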

In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see also chapter 4.4):
- Local Component data (Urban Atlas, N2000, Riparian Zones)
- High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)
- CLC-Backbone
- Any other data, such as CLC (25 ha) and MS data

Upon agreement, Member States are also invited to provide their national information to populate the information system, mainly on land use and agricultural themes.

Figure 5-2: Representation of real world data in the CLC-Core.

5.3 Data modelling in CLC-Core

CLC-Core will be the effective engine to model, process and integrate the individual input data sets to create new, value-added information. The modelling will make use of synergies (the same information provided for one cell by different sources) as well as disagreements (different information provided for one cell by different sources) during the data processing. We refer to the modelling and combination of different layers of the database, including the modelling parameters, as an instance of the database. In this understanding, any output of a database modelling process using different input layers and different parameters will create a specific instance with associated metadata.

Technically CLC-Core shall:
- Allow further characterisation (e.g. by additional input data / knowledge, see also chapter 5.2, Populating the database).
- Allow for (full or partial) inclusion of Copernicus High Resolution Layers and local component products.
- Allow the best possible transformation of national datasets into CLC+ (capturing the wealth of local knowledge existing in the countries).
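The notion of an instance built from synergies and disagreements between sources can be sketched as follows. This is a minimal illustration, assuming a simple majority vote per grid cell; the function name, the `min_agreement` parameter, the cell identifiers and the labels are hypothetical and do not describe the actual CLC-Core modelling rules.

```python
from collections import Counter


def build_instance(cell_sources, min_agreement=0.5):
    """Combine per-source labels into one instance with metadata.

    cell_sources: {cell_id: {source_name: class_label}}
    A label is accepted only if more than min_agreement of the sources
    covering the cell report it; otherwise the cell stays unresolved (None).
    """
    records = {}
    for cell_id, by_source in cell_sources.items():
        votes = Counter(by_source.values())
        label, count = votes.most_common(1)[0]
        share = count / len(by_source)
        records[cell_id] = {
            "label": label if share > min_agreement else None,
            "agreement": share,         # synergy: share of sources agreeing
            "conflict": len(votes) > 1,  # disagreement between sources
        }
    metadata = {
        "sources": sorted({name for v in cell_sources.values() for name in v}),
        "min_agreement": min_agreement,
    }
    return {"records": records, "metadata": metadata}


# Hypothetical input for three grid cells.
example_sources = {
    "cell_a": {
        "CLC-Backbone": "Woody vegetation",
        "HRL Forest": "Woody vegetation",
        "Urban Atlas": "Sealed",
    },
    "cell_b": {"CLC-Backbone": "Sealed", "HRL Imperviousness": "Sealed"},
    "cell_c": {
        "CLC-Backbone": "Water surfaces",
        "HRL Grassland": "Permanent herbaceous",
    },
}
example_instance = build_instance(example_sources)
print(example_instance["metadata"])
```

Storing the parameters next to the records is what makes the output an instance in the sense used above: running the same inputs with a different threshold yields a different, separately documented result.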

Storing information in a grid structure will also help to overcome the problem of different geometries: as the data are converted to grid cells, differences in vector geometries will have a lesser impact.

5.4 Database implementation approach for CLC-Core

The implementation approach for the database of CLC-Core is based on paradigms originating from the semantic web and knowledge engineering. This opens up new possibilities beyond the classical object-relational database management system concept. In the following subsections, we describe the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL) concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that utilizes a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we propose strategies to generate CLC-Core products, i.e. flat shapefiles or GeoJSON files, based on a spatio-temporal Triple Store.

5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores

The contemporary object-relational database model relies on the work of Codd (1970) 20. Although object-relational database management systems (ORDBMS) have been developed further since the 1970s, their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the Semantic Web, the requirements for databases have changed drastically, which led to the development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012) 23,24.

ORDBMSs are databases that follow the relational model published by Codd (1970). The relational model defines how data are stored, retrieved and altered within databases based on n-ary relations on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an environment, reasoning is done using a two-valued predicate logic. Data stored using the relational model are viewed as tables, where each row represents an n-tuple of the relation itself. Each column is labelled with the name of the corresponding set (domain). Operations are performed using relational algebra, which allows data transactions to be expressed in a formal way.

ORDBMSs are designed to adhere to the ACID principles (atomicity, consistency, isolation and durability), which are defined by Gray (1981) 26. ACID principles ensure that database transactions are performed in a reliable way. In contemporary ORDBMSs, consistency is the central issue limiting performance and the ability to scale horizontally, which is crucial for Web 2.0 or Big Data requirements. In addition, ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data model are difficult, and data models in an ORDBMS are regarded as static. Other downsides of contemporary ORDBMSs are the data load performance and the handling of large, fast-changing data volumes with replication. Nevertheless, ORDBMSs are still widely used in business applications, as they guarantee consistency, which is crucial for e.g. bank accounting. In addition, commercial and open-source ORDBMSs are mature products and offer a variety of options and extensions.

20 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13 (6): 377.
23 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
24 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing.
26 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International Conference on Very Large Databases, Cannes, France.
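The consistency guarantee that keeps relational systems in use for bank accounting can be illustrated with a short sketch of an atomic transfer. It uses SQLite through the Python standard library purely as a convenient stand-in for an ORDBMS; the table layout, account names and amounts are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial rows
        conn.executemany(
            "INSERT INTO account (id, balance) VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; undo both on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


example_conn = make_db()
print(transfer(example_conn, "alice", "bob", 150), balances(example_conn))
```

The credit is deliberately executed before the debit: when the debit violates the constraint, the rollback also undoes the credit that had already succeeded, which is the atomicity part of ACID.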

The term NoSQL database emerged in 2009 (Evans, 2009) 28, as a name for a meeting of database specialists on distributed structured data storage in San Francisco on June 11. Since then the term has grown into a widely known umbrella term for a number of different database concepts (Friedland et al., 2011). NoSQL databases have the following characteristics in common:
- non-relational data model
- absence of ACID principles (especially consistency, which is replaced by BASE: basically available, soft state, eventually consistent)
- flexible schema
- simple APIs (no join operations)
- simple replication approach
- tailored towards distributed and horizontal scalability

NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require the handling of large data volumes with replication. In addition, NoSQL databases are designed to support horizontal scalability, which is necessary for handling big data with high turnover rates 30. Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo, Facebook, and Twitter.

Types of NoSQL databases are as follows (see e.g. Scholz (2011) 32, Sadalage & Fowler (2012)):
- Column databases: have tables, rows and columns, but the column names and their format can change, i.e. each row in a table can be different (Khoshafian et al., 1987; Abadi, 2007) 34,36. Generally speaking, a wide column store can be regarded as a 2D key-value store. Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hypertable, Google's Bigtable.
- Key-value databases: store data in the form of a key and an associated value (similar to a hash), and strictly no relations. Key-value stores show exceptional performance in terms of horizontal scaling and handling big data volumes. Examples of key-value stores are e.g. OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.
- Document databases: rely on the document metaphor, which implies that the data/information is contained in documents that share a given format or encoding. Thus, encodings like XML or JSON are used to represent documents. Each document can be schema free, which is a key advantage for Web 2.0 applications. Examples of document databases are Apache CouchDB, MongoDB, Cosmos DB (Microsoft), or IBM Domino.
- Graph databases: utilize the notion of a mathematical graph structure for database purposes (Robinson et al., 2015) 38. A graph consists of nodes and edges that connect nodes. In a graph database, the data/information is stored in nodes which are connected via relationships, represented by edges. The relations can be annotated in a way that a graph database may contain semantic information. The concept allows modelling of real-world phenomena in terms of graphs, e.g. supply chains, road networks, or medical history for populations. Graph databases are capable of storing semantics and ontologies, especially in the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015) 40. They became very popular due to their suitability for social networking, and are used in systems like Facebook Open Graph, Google Knowledge Graph, or Twitter FlockDB (Miller, 2013) 42.
- Multi-model: such databases are capable of supporting multiple NoSQL models/types. Examples are: Cosmos DB (Microsoft), Couchbase, Oracle, OrientDB.

Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al., 2009, Heath & Bizer, 2011) 45,46. Originating from a web of documents, Bizer et al. (2009) describe an approach to develop a web of data (i.e. the data-driven web) by publishing structured data such that data from different sources can be interlinked with typed links. Additionally, they formulate four characteristics of linked data:
- They are published in machine-readable form
- They are published in a way that their meaning is explicitly defined
- They are linked to other datasets
- They can be linked from other data sets

In order to technically fulfil these characteristics, Berners-Lee (2009) 48 established a number of Linked Data principles:
- Uniform Resource Identifiers (URIs) are used to denote things
- HTTP URIs shall be used so that things can be referred to and dereferenced
- W3C standards like the Resource Description Framework (RDF) should be used to provide information (Subject Predicate Object)
- Data about anything should link out to other data, so that it can contribute to the Linked Data Cloud

RDF is a family of W3C specifications that was originally designed as a metadata model. Over the years it has evolved into a general method for describing resources. RDF is based on the notion of making statements about resources which follow the form subject predicate object, known as a triple. Here the subject denotes the resource, and the predicate represents aspects of the resource and is therefore regarded as an expression of the relationship between the subject and the object (see Figure 5-3).

28 Evans, E. (2017): NOSQL. Url: visited:
30 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
32 Scholz, J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior? In: M. Schrenk, V. Popovich & P. Zeile (Eds.) Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany.
34 Khoshafian, S., Copeland, G., Jagodis, T., Boral, H., Valduriez, P. (1987). A query processing strategy for the decomposed storage model. In: ICDE.
36 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR). Url: visited: Nov. 08,
38 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data. O'Reilly Media Inc.
40 Lampoltshammer, T. J. & Wiegand, S. (2015). Improving the Computational Performance of Ontology-Based Classification Using Graph Databases. Remote Sensing, vol. 7(7).
42 Miller, J.J. (2013). Graph database applications and concepts with Neo4j. Proceedings of the Southern Association for Information Systems Conference, Atlanta, vol. 2324.
45 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data - the story so far. Semantic services, interoperability and web applications: emerging concepts.
46 Heath, T. and Bizer, C. (2011). Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1).
48 Berners-Lee, T. (2009). Linked Data - Design Issues. Online: visited: Nov. 09,
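The subject predicate object form of an RDF statement can be illustrated without any RDF library. The sketch below stores triples as plain Python tuples and answers a pattern query in which unspecified positions act as wildcards. The `ex:` names are hypothetical identifiers invented for the example, and the code is a teaching aid rather than a triplestore implementation.

```python
# Each statement is a (subject, predicate, object) tuple.
EXAMPLE_TRIPLES = [
    ("ex:cell_001", "ex:hasLandCover", "ex:Sealed"),
    ("ex:cell_001", "ex:referenceYear", "2018"),
    ("ex:cell_002", "ex:hasLandCover", "ex:WaterSurface"),
    ("ex:cell_003", "ex:hasLandCover", "ex:Sealed"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None matches anything."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "Which resources have the land cover ex:Sealed?"
sealed_cells = [
    s for s, _, _ in match(
        EXAMPLE_TRIPLES, predicate="ex:hasLandCover", obj="ex:Sealed"
    )
]
print(sealed_cells)
```

A SPARQL basic graph pattern works on the same principle, with variables taking the place of the `None` wildcards.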

Figure 5-3: Example of an RDF triple (subject - predicate - object).

To make use of semantics and ontologies in RDF, there is the opportunity to integrate an RDF Schema (RDFS) or the Web Ontology Language (OWL). These technologies can be utilized to allow semantic reasoning within RDF triples. Both RDF and RDFS need to be stored in a specific data storage called a Triplestore. Triplestores are databases for the purpose of storing and retrieving (RDF) triples through semantic queries. Triplestores are specific graph databases, document-oriented stores, or specific relational data stores that are designed for RDF data. Many of the available Triplestores are able to store data with spatial and temporal extensions. The following table gives an overview of the available triplestores (both open-source and commercial products).

Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.

Product name        Supports spatial data   Supports temporal data
Apache Jena         Yes                     No
Apache Marmotta     Yes                     Yes
OpenLink Virtuoso   Yes                     Yes
Oracle              Yes                     Yes
Parliament          Yes                     Yes
Sesame/RDF4J        Yes                     No
AllegroGraph        No                      Yes
Strabon             Yes                     Yes (stSPARQL)
GraphDB             Yes                     No

To query and manipulate RDF data, the W3C standardized language SPARQL is used. SPARQL allows queries to be written that use quantitative (typically proximity-based) relations as well as qualitative relations (spatial "on" and temporal "after", "from"). Spatial and temporal extensions of SPARQL (such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed, but expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014) 50. Nevertheless, spatial queries using GeoSPARQL can look as depicted in Figure 5-4, which shows the GeoSPARQL query "list airports near the city of London". The result of the query is shown in a map metaphor and as JSON (Figure 5-5).

One major feature of triplestores and SPARQL is the possibility to create federated queries. A federated query is a single query that is distributed to several SPARQL endpoints (capable of accepting SPARQL queries and returning results), which individually compute results. Finally, the originating SPARQL endpoint gathers the individual results, compiles them into one single output, and returns the output accordingly. This makes it possible to query a large number of different data storages that can be geographically dispersed (e.g. over Europe).

50 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In Temporal Representation and Reasoning (TIME), st International Symposium on. IEEE.

    PREFIX gn: <
    PREFIX xsd: <
    PREFIX spatial: <
    PREFIX geo: <
    SELECT ?link ?name ?lat ?lon
    WHERE {
      ?link spatial:nearby( 'km') .
      ?link gn:name ?name .
      ?link gn:featureCode gn:S.AIRP .
      ?link geo:lat ?lat .
      ?link geo:long ?lon
    }

Figure 5-4: GeoSPARQL query for airports near the City of London.

Figure 5-5: Result of the GeoSPARQL query of Figure 5-4 as map and in the JSON format.
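The fan-out and merge behaviour of a federated query, as described above, can be sketched in a few lines. The example uses in-memory functions as stand-ins for national SPARQL endpoints and a dictionary of required property values as a stand-in for the query pattern; no network access or SPARQL parsing is involved, and all names and records are hypothetical. In SPARQL 1.1 itself, federation is expressed with the SERVICE keyword, which this sketch does not attempt to reproduce.

```python
def make_endpoint(rows):
    """Return a stand-in for a national SPARQL endpoint (no network involved)."""
    def endpoint(query):
        # 'query' is a dict of property -> required value, standing in for
        # the graph pattern of a real SPARQL query.
        return [
            row for row in rows
            if all(row.get(key) == value for key, value in query.items())
        ]
    return endpoint


def federated_query(endpoints, query):
    """Send the same query to every endpoint and merge the answers."""
    merged = []
    for country in sorted(endpoints):
        for row in endpoints[country](query):
            # Tag each result with the endpoint that produced it.
            merged.append({"endpoint": country, **row})
    return merged


# Hypothetical endpoints and records.
example_endpoints = {
    "Austria": make_endpoint([
        {"cell": "AT-001", "land_cover": "Water surfaces"},
        {"cell": "AT-002", "land_cover": "Sealed"},
    ]),
    "France": make_endpoint([
        {"cell": "FR-001", "land_cover": "Water surfaces"},
    ]),
}
example_result = federated_query(
    example_endpoints, {"land_cover": "Water surfaces"}
)
print(example_result)
```

Each endpoint evaluates the query against its own data only; the caller sees one merged result set and does not need to know where the individual records are hosted.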

5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach

The approach followed for the CLC-Core database can be based on a triplestore approach. This approach could exploit the advantages of NoSQL databases (especially triplestores) while overcoming the drawbacks of contemporary ORDBMS, especially with regard to horizontal scalability and consistency. Hence, we propose a distributed triplestore approach, where each member country of the European Union could host its own CLC-Core data in an RDF format. Each member country could host a triplestore that provides a SPARQL endpoint. This approach can help to overcome data integrity and replication issues that might arise with contemporary ORDBMS, because each member country can host its own data and offer them to be queried via the SPARQL endpoint. The databases of the member countries can be connected via federated queries, which require only one SPARQL query at a single endpoint (i.e. at exactly one member country's CLC endpoint), see Figure 5-6. Member countries that are not sure whether they should operate a triplestore (and a SPARQL endpoint) with the CLC-Core data can also ask another country to host their data on its infrastructure. Such a concept would give CLC+ the flexibility to react to changing situations in EU member countries regarding e.g. the ability to host the CLC dataset. For example, Austria could host the CLC+ data of another country that does not have the resources to develop the infrastructure for hosting and publishing the data.

[Figure 5-6 diagram: CLC+ SPARQL endpoints for Germany, France, Austria and Italy, connected by arrows labelled "Federated SPARQL queries" and "Flow of result data".]

Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.

A prerequisite for such an approach is the development of an underlying shared semantic model, i.e. an ontology. An abstract semantic model in RDFS or OWL can be developed that describes the abstract model of CLC+ (e.g. classes, properties, etc.) and can be stored in a triplestore. This allows the utilization of the abstract semantic model in queries (semantic reasoning based on the developed knowledge base). An example of semantic reasoning could be the determination of the land cover type based on a number of properties of grid cells.
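One simple form of semantic reasoning that a shared ontology enables is subclass inference: a grid cell typed with a specific class is implicitly also a member of every more general class. The sketch below imitates this RDFS-style entailment with a plain dictionary. The class hierarchy and the `ex:` names are hypothetical and are not the CLC+ ontology; the code also assumes that each class has at most one direct superclass.

```python
# Hypothetical class hierarchy: child class -> direct parent class.
SUBCLASS_OF = {
    "ex:ConiferousWoody": "ex:WoodyVegetation",
    "ex:BroadleavedWoody": "ex:WoodyVegetation",
    "ex:WoodyVegetation": "ex:Vegetation",
    "ex:PermanentHerbaceous": "ex:Vegetation",
}


def inferred_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return the asserted type followed by all of its superclasses."""
    types = [asserted_type]
    current = asserted_type
    while current in subclass_of:
        current = subclass_of[current]
        if current in types:  # guard against accidental cycles
            break
        types.append(current)
    return types


def cells_of_type(cell_types, wanted, subclass_of=SUBCLASS_OF):
    """List cells whose asserted type is 'wanted' or one of its subclasses."""
    return sorted(
        cell for cell, asserted in cell_types.items()
        if wanted in inferred_types(asserted, subclass_of)
    )


# Hypothetical grid cells with one asserted land cover type each.
example_cell_types = {
    "cell_a": "ex:ConiferousWoody",
    "cell_b": "ex:Sealed",
    "cell_c": "ex:PermanentHerbaceous",
}
print(cells_of_type(example_cell_types, "ex:Vegetation"))
```

A query for the general class returns cells that were only ever labelled with a specific subclass, which is the practical benefit of keeping the semantic model next to the data.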

5.4.3 Processing and Publishing of CLC-Core Products

To generate CLC+ products based on the distributed triplestore architecture, we need to define the products requested by the customers. Based on the defined products, we need to develop a migration/transition strategy describing the data flow from the triplestores to the desired product. We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc queries and analyses, whereas customers would prefer a single flat file containing the CLC+ data. For historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated automatically by using a spatial Extract, Transform and Load (ETL) tool like SAFE FME, GeoKettle (open source) or GDAL/OGR (open source). The migration can be performed with e.g. different geographical or temporal coverage. These files can be hosted and published on a central server, where they are available to the public (see Figure 5-7).

[Figure 5-7 diagram: Triple Store #1, #2 ... #n feed a spatial ETL tool, which produces shapefiles and GeoJSON files per country (France, Germany, Austria, Finland) published on a public web server.]

Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.

5.4.4 Conclusion and critical remarks

The intended fine spatial and temporal resolution of CLC+ will result in high data volumes with high turnover rates (velocity), which are two characteristics of Big Data 58. Thus, it seems advisable to take the nature of these datasets into account when planning a sustainable architecture for CLC+. As contemporary ORDBMS are not designed for Big Data applications, which manifests itself in their weaknesses regarding horizontal scaling, consistency and replication, we advise taking a closer look at NoSQL concepts. Triplestores with RDF schemas in particular seem appropriate to cope with the requirements of CLC+ with respect to data volume, data turnover rate, distributed architecture and semantic queries (and reasoning).

Although NoSQL datastores are widely used in business applications at Facebook, Twitter, Amazon and Google, the technology and its theoretical foundation are still under active development. Hence, NoSQL databases do not reach the technological maturity of contemporary ORDBMS like Oracle or PostgreSQL. In our view, this is both a weakness and an opportunity. By using NoSQL databases, CLC+ would be at the leading edge of technological developments and would be able to contribute to the future of NoSQL databases and to strengthen their spatio-temporal abilities.

58 Hilbert, M. (2015): Big Data for Development: A Review of Promises and Challenges. Development Policy Review. martinhilbert.net. visited: Nov. 08,
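The product-generation step of Section 5.4.3, which turns query results into a flat file, could look roughly like the following sketch for the GeoJSON case. It uses only the Python standard library instead of one of the ETL tools named above. The record layout, cell identifiers and coordinates are hypothetical. Coordinates are given as WGS 84 longitude/latitude because that is what the GeoJSON specification (RFC 7946) expects; reprojection from ETRS89-LAEA is assumed to have happened beforehand and is not shown.

```python
import json


def cell_to_feature(cell):
    """Convert one grid-cell record into a GeoJSON Feature with a square polygon."""
    west, south, east, north = cell["bbox"]
    # Exterior ring, counter-clockwise, first position repeated at the end.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "cell_id": cell["cell_id"],
            "land_cover": cell["land_cover"],
        },
    }


def to_geojson(cells):
    """Serialise grid-cell records as a GeoJSON FeatureCollection string."""
    collection = {
        "type": "FeatureCollection",
        "features": [cell_to_feature(cell) for cell in cells],
    }
    return json.dumps(collection, indent=2)


# Hypothetical records, e.g. as returned by a triplestore query.
example_cells = [
    {"cell_id": "AT-001", "bbox": (16.30, 48.20, 16.31, 48.21),
     "land_cover": "Sealed"},
    {"cell_id": "AT-002", "bbox": (16.31, 48.20, 16.32, 48.21),
     "land_cover": "Water surfaces"},
]
print(to_geojson(example_cells))
```

Writing the returned string to a file per country and reference year would give the kind of flat, directly downloadable product described above.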

5.5 Short-form of CLC-Core end product technical specifications

CLC-Core

[Example image if available.]

Description: CLC-Core is a consistent, multi-use grid database repository for environmental information populated with a broad range of land cover, land use and ancillary data, forming the information engine to deliver and support tailored thematic information requirements.

Input data sources (EO & ancillary)
- EO data used in the production: No direct EO inputs.
- Ancillary data used in the production: CLMS dataset; MS datasets; other relevant environmental datasets

Thematic content: EAGLE Data Model: TBD

Methodology: CLC-Core will be based on a grid spatial model which consists of square spatial units of uniform size and shape that can have an (internally) heterogeneous thematic composition, i.e. the grid cells are geometrically uniform polygons. Each grid cell has a unique ID that is related to a database containing the attribute data. In a first step, CLC-Core will be populated with existing thematic information from the CLMS, such as LoCo, HRL, CLC-Backbone, CLC, and any other data, such as in-situ and MS data, mainly on land use and agricultural themes.

Geometric resolution (Scale): n/a

Geographic projection / Reference system: ETRS89-LAEA, ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, etrs-laea/)

Coverage: EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular CLC dataflow. The number of countries involved is 39. The area covered is approximately 6 million km².

Geometric accuracy (positioning accuracy): Grid: less than half a grid cell

Spatial resolution / Minimum Mapping Unit (MMU): Grid: 1 ha

Min. width of linear features: Grid: 100 m

Topological correctness: [TBD]

Raster coding: n/a

Thematic accuracy (in %) / quality method: To be explored as it will aggregate a disparate range of data types, products and sources.

Target / reference year: 2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Up-date frequency: [TBD]

Information actuality: Plus and minus 1 year from target year ( ). A longer time window may be required for the ancillary datasets.

Delivery format: Grid

Data type: Grid

Naming conventions: [TBD]

Medium: [TBD]

Delivery reliability: [TBD]

Delivery time: [TBD]

Archive: [TBD]
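The grid model in the specification above (uniform 1 ha cells in EPSG:3035, each with a unique ID linked to an attribute database) can be illustrated with a minimal sketch. The ID pattern, the function names and the attribute key used here are assumptions made for illustration only; the naming conventions are still [TBD] in the specification.

```python
import math

CELL_SIZE_M = 100  # 100 m x 100 m = 1 ha, as in the specification above


def cell_index(easting_m, northing_m, cell_size=CELL_SIZE_M):
    """Return (column, row) of the grid cell containing a point given in EPSG:3035 metres."""
    return math.floor(easting_m / cell_size), math.floor(northing_m / cell_size)


def cell_id(easting_m, northing_m, cell_size=CELL_SIZE_M):
    """Build a hypothetical unique cell ID from the lower-left corner of the cell."""
    col, row = cell_index(easting_m, northing_m, cell_size)
    return f"{cell_size}mE{col * cell_size}N{row * cell_size}"


def cell_centre(easting_m, northing_m, cell_size=CELL_SIZE_M):
    """Return the centre coordinates of the cell containing the point."""
    col, row = cell_index(easting_m, northing_m, cell_size)
    return (col + 0.5) * cell_size, (row + 0.5) * cell_size


# The attribute data live in a separate table keyed by the cell ID.
attributes = {}
example_id = cell_id(4321050.0, 3210099.0)
attributes[example_id] = {"land_cover": None}  # placeholder attribute, to be populated later
example_centre = cell_centre(4321050.0, 3210099.0)
```

Any two points that fall inside the same 100 m cell receive the same ID, which is what allows heterogeneous input data to be attached to one common spatial unit.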

6 CLC+ - THE LONG-TERM VISION

CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in Europe. CLC+ shall meet today's and tomorrow's requirements for European LC/LU information, building upon the EAGLE concept and expertise, while still retaining capabilities within the 2nd generation CLC to guarantee backwards compatibility with the original CLC datasets. That implies:
- addressing and overcoming the limitations of the spatial resolution of CLC;
- overcoming the problems created by the use of heterogeneous classes, while still allowing their creation in CLC-Legacy to maintain the landscape vision of CLC;
- a better discrimination of specific landscape features by introducing a limited number of new or sub-level classes;
- keeping the CLC nomenclature unchanged to a large degree, while allowing it to address known major shortcomings in the classical CLC nomenclature (by attribution and/or introduction of subclasses);
- making use of a consistent and reproducible method of mapping changes at the 1 ha level for future land monitoring at a much higher spatial resolution than with the traditional CLC.

Technically, CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m), created from CLC-Core, i.e. already including information from CLC-Backbone (e.g. pixel classification), existing HRL, LoCo and national data stored in CLC-Core. CLC+ shall be understood as one specific instance (i.e. realisation) of CLC-Core in which the stored information is aggregated to a specific nomenclature, using the geometry defined by the CLC-Core grid cells [59]. The 2018 reference year provides a unique opportunity for creating both a traditional CLC update (by the MS) and a CLC-Legacy (via CLC-Backbone, CLC-Core and CLC+). The proposed nomenclature and structure of CLC+ will not deviate too much from the classical CLC, to allow a comparison of both approaches.
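The aggregation of a pixel classification to the 1 ha grid cells can be sketched as follows. This is a minimal illustration, not the production method: the class labels are placeholders, the raster is a plain nested list, and the block size is a parameter (with 10 m pixels, which is an assumption here, one 1 ha cell would correspond to a block of 10 x 10 pixels).

```python
from collections import Counter


def aggregate_fractions(pixels, block):
    """Aggregate a pixel class raster into cells of block x block pixels.

    Returns a grid of dicts mapping each class to its area fraction in the cell.
    """
    rows, cols = len(pixels), len(pixels[0])
    cells = []
    for r0 in range(0, rows, block):
        cell_row = []
        for c0 in range(0, cols, block):
            counts = Counter(
                pixels[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            total = sum(counts.values())
            cell_row.append({code: n / total for code, n in counts.items()})
        cells.append(cell_row)
    return cells


def dominant_class(fractions):
    """Return the class with the largest area fraction in one cell."""
    return max(fractions, key=fractions.get)


# Toy raster: 2 x 4 pixels aggregated with block=2 gives 1 x 2 cells.
demo_pixels = [
    ["forest", "forest", "water", "grass"],
    ["forest", "grass", "water", "water"],
]
demo_cells = aggregate_fractions(demo_pixels, block=2)
demo_dominant = [dominant_class(cell) for cell in demo_cells[0]]
```

Keeping the per-class fractions in each cell, rather than only the dominant class, is what would allow a cell to be re-aggregated later to a different nomenclature.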
Obviously, many other land monitoring products with various nomenclatures and spatial frameworks may also be defined on the basis of CLC-Core in the future, also addressing the separation of land cover and land use (already foreseen in CLC-Core), the use of attributes in the object characterisation or the inclusion of ecosystem or habitat information. But this will be a topic of future discussions and developments, not part of the current work.

Despite the 1 ha MMU of CLC+, the information provided from the pixel-based classification of CLC-Backbone and stored in CLC-Core will allow the reporting of fractions of 1 ha units to respond to more detailed (i.e. 0.5 ha) reporting obligations.

Due to the higher geometric resolution, the use of some of the often-debated heterogeneous classes, like (complex cultivation pattern) and (land principally occupied by agriculture, with significant areas of natural vegetation), will be significantly reduced, if not disappear altogether. Other conceptually heterogeneous classes like (agro-forestry) will remain, as they also have a meaning at the 1 ha level.

[59] For this first realisation of CLC+ the vector geometry of CLC-Backbone is not used. Nonetheless, this could be considered as a further option to be explored in the future.

The CLC+ nomenclature should be formulated in a way that it is compatible with the traditional level-3 CLC classes, and takes into account user needs expressed e.g. in the CLC survey conducted by ETC-ULS in , former CLC production experience, as well as feasibility. Considering the frequency of classes in CLC and proposals from MS, the following are primary candidate classes for division into subclasses:
- 121: green energy industry separated
- 133: construction of artificial features and nature construction
- 142: cemeteries and sport/recreation (subclasses e.g. sport arenas, sport fields and racecourses, golf courses, ski slopes, parks (outside urban fabric), allotment gardens, archaeological sites and museums, holiday villages (temporary residential areas), sport airfields)
- 231: agricultural grasslands (pastures, meadows) and non-agricultural grasslands under strong human impact
- 324: forest growth areas (succession, afforestation, reforestation) and decline of forests (clear-cutting, damage)
- 512: natural and man-made

Attributes that can be applied to a number of classes are proposed to have priority for introduction into CLC+. These are e.g.:
- abandoned / out of use: can be applied for disused artificial sites / brownfields (121), uncompleted construction sites (133), suburban unused ruderal areas (231), salines (422)
- irrigated: all 2xx classes
- burnt: all 22x and 3xx classes
- presence of windmills: all 2xx, most 3xx classes and 523

As the original CLC nomenclature is not a pure land cover classification, there will be the need to make use of more information than provided by the Copernicus Land Monitoring Service.
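The class scopes of the proposed attributes can be expressed as simple code patterns, where "x" stands for any digit. The following minimal sketch assumes CLC codes are handled as three-character strings; the dictionary and function names are hypothetical. The windmill attribute is left out because the text does not enumerate which 3xx classes are meant by "most".

```python
import re

# Scope of each attribute as CLC code patterns ("x" = any digit), taken from the list above.
ATTRIBUTE_SCOPE = {
    "abandoned": ["121", "133", "231", "422"],
    "irrigated": ["2xx"],
    "burnt": ["22x", "3xx"],
}


def pattern_matches(pattern, clc_code):
    """Check a three-digit CLC code against a pattern such as '22x'."""
    regex = "^" + pattern.replace("x", r"\d") + "$"
    return re.match(regex, clc_code) is not None


def applicable_attributes(clc_code):
    """Return the sorted names of all attributes that may be applied to a CLC class."""
    return sorted(
        name
        for name, patterns in ATTRIBUTE_SCOPE.items()
        if any(pattern_matches(p, clc_code) for p in patterns)
    )


pasture_attributes = applicable_attributes("231")
vineyard_attributes = applicable_attributes("221")
water_attributes = applicable_attributes("512")
```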
Table 6-1 provides an overview of the traditional 44 CLC classes and the information needed, in addition to the information provided by the Local Components, the High Resolution Layers and the land cover information collected by CLC-Backbone, in order to derive a specific CLC class. The new nomenclature will demand similar and probably even more ancillary information. The Hungarian test case [61] for implementing the EAGLE concept in a bottom-up CLC production has clearly shown that the content and quality of a bottom-up derived CLC is ultimately dependent on input data on:
- Land use (critical);
- Crops and use intensity (e.g. from LPIS);
- Habitat (vegetation composition) information.

[60] Service Contract No 3436/R0-Copernicus/EEA Task 3: Planning of cooperation with EEA member and cooperating countries. FINAL report in preparation of the CLC2018 exercise.

[61] Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 on the European Environment Agency and the European Environment Information and Observation Network. Final report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country.

The situation is different for each CLC class, and Table 6-1 clearly shows that many CLC classes cannot be derived without additional information on the land use aspect of the area. The current LoCo data are only partly able to mitigate this lack of land use information, as they are only available in selected areas and do not cover all of the European territory. In almost all cases this information gap can be closed with data from MS (databases or photo-interpretation). Where MS data are not available, a few classes can be approximated via European-level information, but most certainly at the expense of data accuracy and precision. The MS information should be provided as input to CLC-Core. Where national land use information is missing and no other European-level replacement data [62] exist, traditional CLC data or production methods (e.g. photo-interpretation) might need to be used as a back-up solution.

[62] Wall-to-wall HRL can be used to address density issues or leaf types, but they cannot be used as a replacement for land use information.

Table 6-1: List of current CLC classes and requirements for external information

7 CLC-LEGACY

The history of European land monitoring is a story of permanent evolution of approaches and concepts, resulting in different solutions for individual needs and specifications. In this context, the CLC specifications provided the first set of European-wide accepted de-facto standards, i.e. a nomenclature of land cover classes, a geometric specification, an approach for land cover change mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC represents a unique European-wide, if not global, consensus on land monitoring, supported by 39 countries.

Being the thematically most detailed wall-to-wall European LC/LU dataset and representing a time series started in the early 1990s, CLC data became the basis of several analyses and indicator developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that the planned new European land monitoring framework (2nd generation CLC) has to ensure the maximum possible backwards compatibility with standard CLC data.

This comparability will be ensured by CLC-Legacy, a data set with 25 ha MMU and the standard 44 thematic CLC classes, but derived by spatial and thematic generalisation from CLC+ (and/or CLC-Core, tbd). Today's already existing different CLC production methods, i.e. traditional visual photo-interpretation vs. national bottom-up solutions, show that those specific methodologies result in CLC data with slightly different characteristics in terms of LC/LU statistics or fine geometric detail, while still respecting and adhering to the general CLC specifications.

Although original CLC vector data are still used mainly for cartographic purposes, in practice the most commonly used versions of CLC data are the CLC status and CLC change raster layers. That is why CLC-Legacy is also proposed as a 100 x 100 m raster data set.

As already mentioned above, 2018 offers the unique opportunity of comparing two different production approaches for CLC, i.e. the classical approach implemented by the MS (CLC2018) and a modernised approach making use of capacities from the Copernicus service industry and MS. Comparing CLC2018 and CLC_Legacy_2018 will allow the strengths and weaknesses of both approaches to be assessed.

Production method is, however, not the only source of variation. The specification of the nomenclature, combined with the variation in land cover and landscape across the European continent, necessarily leads to large variability within each class and partial overlap between classes (Figure 7-1). Variation is inherent in the CLC methodology, and variation due to the production method should not cause any major concern.

Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to Sparsely vegetated areas in Norway and Moors and heathland in Sweden. Both classifications are correct.

For the practical creation of a CLC-Legacy, the production process faces a classical dilemma that is associated with a change of methodology: how to derive a CLC-Legacy 2018? The production process of CLC-Backbone, CLC-Core and CLC+ will only provide status information for 2018, not any changes relative to CLC2012. A possible solution to this dilemma (i.e. how to map changes relative to the last update) is the approach used by Germany, which moved from a traditional CLC mapping approach to an automated approach in 2012, using a "status layer first" approach instead of "change mapping first".
The result of that methodological change was a delineation of CLC2012 objects based on national high spatial resolution data with a different geometry than the previous CLC2006 objects. For an illustration, see also the areas marked in Figure 7-2. In a subsequent step it was then necessary to separate the differences between CLC2006 and the new CLC2012 into technical changes (due to the new mapping approach) and real changes. This step involved a mostly manual revision and the constitution of the change layer.

Footnote: The process of deriving CLC-Legacy from CLC-Backbone, CLC-Core and finally CLC+ should not be confused with the traditional creation of CLC2018 by the MS.

In subsequent update cycles it may be possible to derive changes from the new versions of CLC+ by creating a difference layer and then manually separating real changes from differences between datasets (resulting from differences in CLC-Core input data). Additional manual change detection might be needed for changes that are not automatically detectable due to a lack of up-to-date input information (e.g. changes of land use without land cover change).

Figure 7-2: Result of changing the CLC implementation method in Germany (panels: CLC2006, CLC2012)

For a future CLC2024 (assuming the 6-year update cycle) the change mapping approach could be a combination of the traditional "change mapping first" approach and the "update first" approach. Information from the 1 ha CLC+2018 and CLC+2024 would be used to derive 1 ha changes with strong manual control, to separate differences from real changes. These 1 ha changes would be aggregated to 5 ha change objects (i.e. CLC-Changes). The 5 ha change objects would then be added to CLC2018 to create a CLC2024. Methods and approaches for obtaining 25 ha status layers and 5 ha changes from higher spatial resolution data already exist in a number of MS.

7.1 Experiences to be considered

The following experiences can be considered for establishing the grounds for creating CLC-Legacy:
- CLC accounting layers are used as input for EEA's Land and Ecosystem Accounting (LEAC) system. The methodology was developed by ETC/ULS to create these layers, as well as for the raster generalization of CLC accounting layers [65].
- Methodologies for bottom-up creation of CLC data have been developed by several CLC national teams. This includes spatial and thematic aggregation of in-situ based high spatial resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain). In most cases, rules were developed to allow thematic conversion in an automated or semi-automated fashion. For spatial generalisation different techniques are available, depending on the starting point in each of the countries.

[65] Generalization of adjusted CLC layers. ETC/ULS report.

- CLC+ national test case for Hungary [66]

CLC Accounting Layers

One of the strongest assets of CLC is its ability to provide consistent information on land cover changes and trends of the environment over multiple decades. Any new land monitoring framework therefore must ensure the continuity of the land cover change information, such as provided by the CLC Accounting Layers.

The specific characteristics and the variety of the CLC creation and update methodologies have shown that, while CLC change data represent land cover changes between two given reference dates well, there are significant "breaks" in the continuity of the CLC time series, both in a statistical and in a geometric sense. The solution applied for the harmonization of the CLC time series is based on the idea of combining CLC status and change information to create a time series of CLC / CLC-change layers of homogeneous quality for accounting purposes, fulfilling the relation:

CLC change = CLC accounting new status - CLC accounting old status

Additional criteria of the realization were:
- Add more detail to the latest CLC status layer (CLC2012) from previous CLCC information and use this "adjusted" layer as a reference.
- Create previous CLC status layers by "backdating" the reference, realized by subtracting CLCC-based information from CLC2012.

Based on the above principles, the working steps of the creation of adjusted CLC layers were as follows:
1 Include formation information from CLC-change layers into the current CLC2012 status by creating an adjusted CLC2012 layer:
  a. Overwrite CLC2012 first with code_2006 from CLC-change. Intermediate result: A1_CLC2012
  b. Overwrite A1_CLC2012 with code_2012 from CLC-change. Result: A2_CLC2012
2 Create adjusted CLC2006 by including consumption information (code_2006 from CLC-change) into A2_CLC2012. Result: A1_CLC2006
3 Create adjusted CLC2000 by including consumption information (code_2000 from CLC-change) into A1_CLC2006. Result: A1_CLC2000

Because the MMU is different for CLC status (25 ha) and CLC change (5 ha) layers, the resulting CLC accounting layers may include, at several locations, more spatial detail than the original CLC status layers. Although this contradicts the original CLC specifications, CLC accounting layers are still considered the most consistent harmonized time series of CLC data. CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine details are added to this layer from the formation codes of previous CLC-change inventories. All the former CLC accounting status layers are calculated backwards by overlaying CLC-change consumption codes.

[66] Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 on the European Environment Agency and the European Environment Information and Observation Network. FINAL REPORT: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country.
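The working steps above amount to a sequence of raster overlays. The sketch below illustrates the principle on a toy one-row raster. It is not the ETC/ULS implementation: rasters are nested lists of CLC codes, a change layer holds either None or a dict with the hypothetical keys "old" (consumption code) and "new" (formation code), and the assignment of the 2000-2006 and 2006-2012 change layers to the individual steps is an assumption, since the layer names are not fully preserved in the text.

```python
def overlay(status, change, key):
    """Return a copy of `status` in which cells covered by a change object take its code."""
    return [
        [chg[key] if chg is not None else code for code, chg in zip(status_row, change_row)]
        for status_row, change_row in zip(status, change)
    ]


# Toy inputs: one row of four 100 m cells.
clc2012 = [[211, 311, 311, 112]]  # generalised status layer (25 ha MMU)
change_2000_2006 = [[None, {"old": 311, "new": 211}, None, None]]  # 5 ha change objects
change_2006_2012 = [[None, None, None, {"old": 211, "new": 112}]]

# Step 1: add formation information to the latest status layer.
a1_clc2012 = overlay(clc2012, change_2000_2006, "new")      # code_2006
a2_clc2012 = overlay(a1_clc2012, change_2006_2012, "new")   # code_2012

# Steps 2 and 3: backdate the reference with consumption codes.
a1_clc2006 = overlay(a2_clc2012, change_2006_2012, "old")   # code_2006
a1_clc2000 = overlay(a1_clc2006, change_2000_2006, "old")   # code_2000

# The accounting relation: the cells that differ between two adjusted status
# layers are exactly the cells covered by the corresponding change layer.
diff_2006_2012 = [
    (r, c)
    for r, row in enumerate(a2_clc2012)
    for c, code in enumerate(row)
    if code != a1_clc2006[r][c]
]
```

In the toy data, the second cell of the adjusted 2012 layer carries class 211 although the generalised status layer showed 311, which mirrors the remark above that accounting layers can contain more spatial detail than the original status layers.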


Figure 7-3 to Figure 7-5 illustrate different techniques and results of national generalisation approaches to produce standard CLC from more detailed data.

Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data.

Figure 7-4: Generalization levels used in LISA generalization in Austria (panels: Original; 3000 m² Level III; 1 ha Level III; 5 ha Level III; 25 ha Level III)

Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
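To make the effect of a minimum mapping unit on a raster concrete, the following sketch merges every patch smaller than a given number of cells into its most frequent neighbouring class. This is a deliberately simple stand-in, not the expand-and-shrink technique used in Norway nor any of the national methods shown in the figures; the function names are hypothetical. On a 100 m raster, a 25 ha MMU would correspond to mmu_cells=25.

```python
from collections import Counter, deque


def neighbours(y, x, rows, cols):
    """Yield the 4-connected neighbours of a cell that lie inside the raster."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < rows and 0 <= nx < cols:
            yield ny, nx


def patches(grid):
    """Return a list of (class_code, cells) for every 4-connected patch."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            code, cells, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in neighbours(y, x, rows, cols):
                    if not seen[ny][nx] and grid[ny][nx] == code:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append((code, cells))
    return found


def apply_mmu(grid, mmu_cells):
    """Merge patches smaller than mmu_cells into their most frequent neighbouring class."""
    grid = [row[:] for row in grid]  # work on a copy
    rows, cols = len(grid), len(grid[0])
    while True:
        small = [p for p in patches(grid) if len(p[1]) < mmu_cells]
        if not small:
            return grid
        _, cells = min(small, key=lambda p: len(p[1]))  # smallest patch first
        inside = set(cells)
        border = Counter(
            grid[ny][nx]
            for y, x in cells
            for ny, nx in neighbours(y, x, rows, cols)
            if (ny, nx) not in inside
        )
        if not border:  # the whole raster is one patch below the MMU
            return grid
        target = border.most_common(1)[0][0]
        for y, x in cells:
            grid[y][x] = target


demo_grid = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
generalised = apply_mmu(demo_grid, mmu_cells=2)
```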

[Figure 7-6 panels: CLC2012; Generalized CLC, MMU = 25 ha; High spatial resolution CLC, MMU = 20 m cells]

Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC (top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The generalized CLC is more fragmented than CLC2012, even if the 25 ha MMU is kept. The high-resolution CLC illustrates well the potential lying in CLC+.

8 ANNEX 1: CLC NOMENCLATURE

9 ANNEX 2: CLC-BACKBONE DATA MODEL

The complete list of feature types within CLC-Backbone contains:
- INSPIRE
  o Application schema: land cover vector
    Land cover dataset
    Land cover unit
- INSPIRE extension
  o Application schema: land cover extension (not endorsed)
    Parametric observation
- EAGLE
  o Land cover component

Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects). Each land cover unit contains a valid geometry (in our case a polygon border). A specific land cover unit can consist of one or more land cover components (in time and/or space), each with a specific lifespan.

It is important to note that the parametric observation, as defined in the (not endorsed) INSPIRE land cover extension application schema, is an absolutely necessary part, as it is needed to attach specific observations (e.g. the temporal profile of NDVI) to the objects (land cover units).

Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover extended application schema)

Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component

Each land cover unit can contain one or more so-called parametric observations. According to INSPIRE, these parametric observations are limited to percentage, countable and presence parameters. To broaden the usage and to integrate the required single-date observations (e.g. mean NDVI, or vegetation trajectories), this parameter type is extended with the data type IndexParameter. The CLC-Backbone data model therefore includes the single-date observations as ParameterType, with a specific ObservationDate and a measurement stored as a real number (0 to 1 or 0 to 100). The new CLC-Backbone base model is shown in Figure 9-3.

Figure 9-3: The new CLC-Basis model (based on LISA)

The new CLC-Backbone data model now includes three feature types:
- LandCoverDataset
- LandCoverUnit
- LandCoverComponent

And two additional data types:
- LandCoverUnit
  o ParameterType
    Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, ...)
    ObservationDate: date of the observation (e.g. acquisition date of the satellite scene)
    IndexParameter: value of e.g. NDVI_max, etc. to be stored here
- LandCoverComponent
  o Occurrence: cover percentage of a single land cover component within the LC unit
  o Overlaying: 1 if the LC component is overlaying another component (e.g. temporarily covered with water, snow); 0 (default value) if the component is not overlaying
  o validfrom: the time when the activity complex started to exist in the real world
  o validto: the time when the activity complex no longer exists in the real world
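The feature types and data types listed above could be represented roughly as follows. This is a minimal sketch of the structure only: the Python attribute names, the identifier field, the WKT geometry string and the sample values are assumptions for illustration and are not part of the INSPIRE or EAGLE schemas.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ParameterType:
    """Single-date observation attached to a land cover unit."""
    name: str                # e.g. "NDVI_mean"
    observation_date: date   # e.g. acquisition date of the satellite scene
    index_parameter: float   # measured value, 0 to 1 or 0 to 100


@dataclass
class LandCoverComponent:
    land_cover: str                    # placeholder label for the EAGLE component
    occurrence: float                  # cover percentage within the unit
    overlaying: bool = False           # True if it overlays another component
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None


@dataclass
class LandCoverUnit:
    unit_id: str                       # hypothetical identifier
    geometry_wkt: str                  # polygon border, here as WKT text
    components: List[LandCoverComponent] = field(default_factory=list)
    observations: List[ParameterType] = field(default_factory=list)


@dataclass
class LandCoverDataset:
    name: str
    units: List[LandCoverUnit] = field(default_factory=list)


# One unit with one component and one NDVI observation (made-up values).
unit = LandCoverUnit(
    unit_id="unit-1",
    geometry_wkt="POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))",
)
unit.components.append(
    LandCoverComponent("herbaceous vegetation", occurrence=100.0, valid_from=date(2018, 4, 1))
)
unit.observations.append(ParameterType("NDVI_mean", date(2018, 5, 12), 0.71))
dataset = LandCoverDataset(name="example dataset", units=[unit])
```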

9.1.1 Integration of temporal dimension into LISA

As LISA was historically set up as a single snapshot within a 3-year interval (repetition cycle of the orthophoto campaign), no temporal phenomena could be modelled. The underlying assumption for the adaptation of the model is that the land cover unit establishes the more or less stable geometry over time (>1 year continuity). Within the land cover unit several land cover components can exist, each with a specific lifetime. The land cover unit can be considered as a stable container for these changing land cover components.

Types of changes:
- A change of the land cover unit itself is in general regarded as a status change.
- Changes within the land cover components are considered cyclic changes.
- Long-term ecosystem changes (conditional changes) will rather be recorded in various attributes that will be defined within the products nr. 5.

The implementation of the temporal dynamics is realized using a 1:many relation between the LandCoverUnit and the LandCoverComponent, in combination with the defined lifespan of the specific component.

Example: a typical agricultural field (the land cover unit) could have the following LC components:
- Bare soil: January to March
- Herbaceous vegetation: April to June
- Bare soil: 2nd half of June
- Herbaceous vegetation: July to September
- Bare soil: 2nd half of September
- Herbaceous vegetation: October to December

A web-based application within the GSE Cadaster Environment project (financed by ESA) is available to demonstrate the above-mentioned concept (Figure 9-4).
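The agricultural field example can be written down as a list of components with lifespans and queried by date. In this minimal sketch the year 2018 and the exact day boundaries (e.g. 15 and 16 June for the switch to the "2nd half of June") are assumptions, since the example only gives months.

```python
from datetime import date

# (land cover component, validfrom, validto) for one land cover unit
FIELD_COMPONENTS = [
    ("bare soil", date(2018, 1, 1), date(2018, 3, 31)),
    ("herbaceous vegetation", date(2018, 4, 1), date(2018, 6, 15)),
    ("bare soil", date(2018, 6, 16), date(2018, 6, 30)),
    ("herbaceous vegetation", date(2018, 7, 1), date(2018, 9, 15)),
    ("bare soil", date(2018, 9, 16), date(2018, 9, 30)),
    ("herbaceous vegetation", date(2018, 10, 1), date(2018, 12, 31)),
]


def component_on(components, day):
    """Return the land cover component whose lifespan contains the given day."""
    for label, valid_from, valid_to in components:
        if valid_from <= day <= valid_to:
            return label
    return None


def days_per_component(components):
    """Sum the number of days each component is present during the year."""
    totals = {}
    for label, valid_from, valid_to in components:
        totals[label] = totals.get(label, 0) + (valid_to - valid_from).days + 1
    return totals


cover_on_20_june = component_on(FIELD_COMPONENTS, date(2018, 6, 20))
presence = days_per_component(FIELD_COMPONENTS)
```

A sequence such as this one, with short bare soil phases between herbaceous phases, is the kind of information from which a classification of the unit as a whole could be derived.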

Figure 9-4: Illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (example taken from project Cadaster Env Austria, Geoville 2017)

Important note: It is not foreseen in the technical specifications that each temporal LC component is stored in the underlying CLC Base production database. The production chain may just provide the final classification of the LC unit in the final CLC-Backbone product, without giving information on the appearance of land cover components throughout the year.

9.1.2 Integration of multi-temporal observations into LISA

Spectral profiles over time constitute a valuable information source (Figure 9-5), not only for the automated classification, but also for the visual interpretation of object characteristics. For each CLC-Backbone object various time series of NDVI indices (NDVI_mean, NDVI_max, NDVI_min, etc.) can be stored in the data model using the parametricobservation.

Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model

The graph illustrates the variation within a year within a LandCoverUnit. The geometric borders of the unit are pre-fixed based on the Level 2 landscape object geometries.
- Red points: single observations from each S-2 scene calculated on object level (Level 2 landscape polygon). From these observations (stored in the parametricobservation of the LandCoverUnit) the complete temporal profile for the specific object can be reconstructed as a graph and visually illustrated to domain experts.
- Brown and green bars: lifespan of the single LandCoverComponents (green: herbaceous vegetation, brown: bare soil) within a land cover unit.
- Dark green bar: classification of the land cover unit according to the information on the sequential appearance of land cover components.

Within the LandCoverComponent the attributes validfrom and validto are used to store the lifetime information of the component. If only the start, but not the end, of a LandCoverComponent can be observed (e.g. due to missing observations in time), only validfrom can be used. The element will automatically be set to invalid if a new LandCoverComponent appears (with 100% coverage of the land cover unit).
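The rule for components with an unknown end can be sketched as follows. The interpretation used here, that "set to invalid" means closing the open validto on the day before the new component starts, is an assumption; the dictionary keys and the function name are hypothetical.

```python
from datetime import date, timedelta


def add_component(components, label, occurrence, valid_from):
    """Append a new component; a 100% component closes all still-open components."""
    if occurrence >= 100:
        for comp in components:
            if comp["validto"] is None:
                # Assumed reading of "set to invalid": end the lifespan the day before.
                comp["validto"] = valid_from - timedelta(days=1)
    components.append(
        {"label": label, "occurrence": occurrence, "validfrom": valid_from, "validto": None}
    )
    return components


unit_components = []
add_component(unit_components, "bare soil", 100, date(2018, 1, 1))
add_component(unit_components, "herbaceous vegetation", 100, date(2018, 4, 1))
```

After the second call, the bare soil component ends on 31 March 2018, while the herbaceous component stays open until a later observation closes it.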

10 ANNEX 3: OSM DATA

10.1 OSM road nomenclature

The following table identifies all OSM road types that should be considered for CLC-Backbone Level 1 delineation. In many countries, most roads are tagged as unclassified. Only highway = motorway/motorway_link implies anything about quality. (Source: )

| CLC-Backbone | Potential 20m width | Key | Value | Comment |
|---|---|---|---|---|
| y | wide | highway | motorway | A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc. |
| y | wide | highway | trunk | The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.) |
| y | | highway | primary | The next most important roads in a country's system. (Often link larger towns.) |
| y | | highway | secondary | The next most important roads in a country's system. (Often link towns.) |
| y | | highway | tertiary | The next most important roads in a country's system. (Often link smaller towns and villages) |
| y | | highway | unclassified | The least most important through roads in a country's system, i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.) |
| y | | highway | residential | Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing. |
| y | | highway | service | For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances. |

Link roads

| CLC-Backbone | Potential 20m width | Key | Value | Comment |
|---|---|---|---|---|
| y | wide | highway | motorway_link | The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions. |
| y | | highway | trunk_link | The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway. |
| y | | highway | primary_link | The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway. |
| y | | highway | secondary_link | The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway. |
| y | | highway | tertiary_link | The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway. |

Special road types

| CLC-Backbone | Potential 20m width | Key | Value | Comment |
|---|---|---|---|---|
| y | | highway | track | Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track. |
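Selecting the road types from the table for Level 1 delineation comes down to a filter on the highway tag. The sketch below assumes OSM features have already been read into dictionaries with a "tags" entry; that structure and the function name are hypothetical and do not correspond to any specific OSM library.

```python
# highway values marked "y" for CLC-Backbone in the table above
LEVEL1_HIGHWAY_VALUES = {
    "motorway", "trunk", "primary", "secondary", "tertiary",
    "unclassified", "residential", "service",
    "motorway_link", "trunk_link", "primary_link", "secondary_link", "tertiary_link",
    "track",
}


def select_level1_roads(features):
    """Keep only features whose highway tag is one of the listed road types."""
    return [
        feature
        for feature in features
        if feature.get("tags", {}).get("highway") in LEVEL1_HIGHWAY_VALUES
    ]


sample_features = [
    {"id": 1, "tags": {"highway": "motorway"}},
    {"id": 2, "tags": {"highway": "footway"}},   # not in the table, dropped
    {"id": 3, "tags": {"highway": "track", "tracktype": "grade2"}},
    {"id": 4, "tags": {"building": "yes"}},      # no highway tag, dropped
]
selected_ids = [f["id"] for f in select_level1_roads(sample_features)]
```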

10.2 OSM roads: completeness

Estimation of OSM completeness according to the parametric estimation in the study of Barrington-Leigh and Millard-Ball (2017) [67]. A sigmoid curve was fitted to the cumulative length of contributions within OSM data and used to estimate the saturation level for each country.

| Country | ISO code | frccomplete_fit | OSM length [km] |
|---|---|---|---|
| Serbia | SRB | 74.3% | |
| Bosnia and Herzegovina | BIH | 86.5% | |
| Turkey | TUR | 89.7% | |
| Sweden | SWE | 92.7% | |
| Norway | NOR | 93.6% | |
| Hungary | HUN | 93.8% | |
| Romania | ROU | 94.3% | |
| Ireland | IRL | 96.0% | |
| Spain | ESP | 96.2% | |
| Liechtenstein | LIE | 96.3% | 359 |
| Croatia | HRV | 96.8% | |
| Bulgaria | BGR | 96.9% | |
| Cyprus | CYP | 97.0% | |
| Albania | ALB | 97.6% | |
| Italy | ITA | 97.9% | |
| Belgium | BEL | 98.3% | |
| Lithuania | LTU | 98.5% | |
| Poland | POL | 98.5% | |
| Iceland | ISL | 98.6% | |
| France | FRA | 99.9% | |
| Denmark | DNK | 100.3% | |
| Estonia | EST | 100.3% | |
| Portugal | PRT | 100.4% | |
| United Kingdom | GBR | 100.4% | |
| Netherlands | NLD | 100.5% | |
| Greece | GRC | 100.8% | |
| Slovakia | SVK | 101.4% | |
| Austria | AUT | 101.4% | |
| Luxembourg | LUX | 101.5% | |
| Germany | DEU | 101.6% | |
| Latvia | LVA | 101.7% | |
| Montenegro | MNE | 102.2% | |
| Macedonia (FYROM) | MKD | 102.4% | |
| Switzerland | CHE | 103.0% | |
| Czech Republic | CZE | 104.0% | |
| Finland | FIN | 105.3% | |
| Kosovo | XKO | 107.4% | |
| Malta | MLT | 109.8% | |
| Slovenia | SVN | NoData | |

[67] Barrington-Leigh C, Millard-Ball A (2017) The world's user-generated road map is more than 80% complete. PLOS ONE 12(8): e
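The idea of estimating a saturation level from a sigmoid curve can be illustrated with a small, self-contained sketch. It is not the procedure of the cited study: it fits a three-parameter logistic curve by a coarse grid search over candidate values, uses synthetic data, and all function and parameter names are assumptions. The fraction complete is then taken as the latest mapped length divided by the fitted saturation level, so a mapped length above the fitted saturation would presumably yield a value above 100%.

```python
import math
from itertools import product


def logistic(t, saturation, rate, midpoint):
    """Cumulative mapped length at time t under a logistic growth model."""
    return saturation / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(times, lengths, saturations, rates, midpoints):
    """Pick the parameter combination with the smallest sum of squared errors."""
    def sse(params):
        return sum((logistic(t, *params) - y) ** 2 for t, y in zip(times, lengths))
    return min(product(saturations, rates, midpoints), key=sse)


# Synthetic cumulative road length (km) for years 0..8, generated from known parameters.
times = list(range(9))
lengths = [logistic(t, 1000.0, 1.0, 5.0) for t in times]

best = fit_logistic(
    times,
    lengths,
    saturations=[float(s) for s in range(800, 1301, 50)],
    rates=[0.5, 0.75, 1.0, 1.25, 1.5],
    midpoints=[3.0, 4.0, 5.0, 6.0, 7.0],
)
saturation_km = best[0]
fraction_complete = lengths[-1] / saturation_km
```

In practice a proper non-linear least-squares routine would replace the grid search; the grid keeps the sketch free of external dependencies.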

11 ANNEX 4: LPIS DATA IN EUROPE

The land parcel identification system (LPIS) is the geographic information system component of IACS (Integrated Administration and Control System). It contains the delineation of the reference areas, ecological focus areas and agricultural management areas that are eligible for CAP payments. Due to different national priorities in the implementation of the CAP, the LPIS data are not homogeneous throughout Europe. Some countries use rather aggregated physical blocks as geometric reference, while other countries use single parcel management units as the smallest geographic entity within LPIS. Three types of spatial objects can in principle be differentiated within LPIS (Figure 11-1):

- Physical block
- Field parcel
- Single management unit

A physical block is defined as the area that is surrounded by permanent non-agricultural objects like roads, rivers, settlements, forest or other features. A physical block normally contains several field management units that may belong to various farmers.

A field parcel is owned by a single farmer and is the dedicated management unit for a specific agricultural main category. It contains either grassland or arable land, but never both categories. It may be split into several single sub-management units.

A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific grassland management type (e.g. one cut meadow, two cut meadow, three and more cut meadow). The relation between field parcels and single management units is a 1:m relation: one field parcel can contain one or many single management units.

Remark on the relation with the real estate database: In many countries land properties are recorded in the real estate database. An elementary part of the real estate database is the cartographic representation of real estate parcel boundaries. These parcels should not be confused with the field parcels. The parcels in the real estate database constitute ownership rights and are documented as official boundaries. The boundaries of agricultural management units, however, are delineated to derive correct area figures as a basis for payments under the CAP. In most cases the parcel boundaries in the real estate database and in the LPIS will be the same, but not in all cases.
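The three LPIS object types and their nesting can be sketched as a small data structure. This is an illustrative model only: the class and attribute names are assumptions, not part of any national LPIS schema, and the check in add_unit merely encodes the rule that a field parcel never mixes grassland and arable land.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SingleManagementUnit:
    """Smallest unit: one crop type or one grassland management type."""
    unit_id: str
    main_category: str  # "arable" or "grassland"
    management: str     # e.g. "winter wheat", "two cut meadow"


@dataclass
class FieldParcel:
    """Belongs to a single farmer; holds either arable land or grassland."""
    parcel_id: str
    farmer_id: str
    main_category: str
    units: List[SingleManagementUnit] = field(default_factory=list)

    def add_unit(self, unit: SingleManagementUnit) -> None:
        # A field parcel never mixes grassland and arable land.
        if unit.main_category != self.main_category:
            raise ValueError("unit category differs from field parcel category")
        self.units.append(unit)


@dataclass
class PhysicalBlock:
    """Area enclosed by permanent non-agricultural objects (roads, rivers, ...)."""
    block_id: str
    parcels: List[FieldParcel] = field(default_factory=list)

    def farmers(self) -> set:
        # A physical block may contain parcels of several farmers.
        return {p.farmer_id for p in self.parcels}


# Hypothetical example: one block, two farmers, 1:m parcel-to-unit relation.
block = PhysicalBlock("B1")
arable = FieldParcel("P1", "farmer_a", "arable")
arable.add_unit(SingleManagementUnit("U1", "arable", "winter wheat"))
arable.add_unit(SingleManagementUnit("U2", "arable", "maize"))
meadow = FieldParcel("P2", "farmer_b", "grassland")
meadow.add_unit(SingleManagementUnit("U3", "grassland", "two cut meadow"))
block.parcels.extend([arable, meadow])
print(sorted(block.farmers()), [len(p.units) for p in block.parcels])
```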

Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit

From year to year the availability of national LPIS data is increasing. An overview of available and downloadable datasets has been created, based on a previous ETC-ULS report, by Synergise within the Horizon 2020 project LandSense (Figure 11-2). The minimum thematic detail that should be harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3) permanent crops, (4) ecological focus areas and (5) others. Harmonization is challenging, because these categories contain heterogeneous classes at national level and are applied inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any level could already provide valuable input, provided that LPIS is available for the large majority of countries. Until a critical number of countries with available LPIS data is reached, the inclusion of LPIS will add more bias to the resulting geometric CLC-Backbone than a method based solely on image segmentation.
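A crosswalk to the five harmonized categories named above could look like the following sketch. The country identifiers and national class codes are invented placeholders (no real LPIS code list is used), and sending unknown codes to "others" is an assumption of this example rather than a documented rule.

```python
HARMONISED_CLASSES = (
    "arable land",
    "permanent grassland",
    "permanent crops",
    "ecological focus areas",
    "others",
)

# Hypothetical national codes, for illustration only.
CROSSWALK = {
    ("XX", "A01"): "arable land",
    ("XX", "G10"): "permanent grassland",
    ("YY", "wheat_w"): "arable land",
    ("YY", "orchard"): "permanent crops",
    ("YY", "efa_strip"): "ecological focus areas",
}


def harmonise(country: str, national_code: str) -> str:
    """Map a national LPIS class to one of the five harmonized classes."""
    return CROSSWALK.get((country, national_code), "others")


def area_by_class(parcels):
    """Sum parcel areas (ha) per harmonized class.

    parcels: iterable of (country, national_code, area_ha) tuples.
    """
    totals = {name: 0.0 for name in HARMONISED_CLASSES}
    for country, code, area_ha in parcels:
        totals[harmonise(country, code)] += area_ha
    return totals


sample = [("XX", "A01", 2.5), ("YY", "wheat_w", 1.5), ("YY", "unknown", 0.4)]
totals = area_by_class(sample)
print(totals)
```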

Figure 11-2: Overview of available LPIS GIS datasets and their thematic content (Synergise, project LandSense)


Figure 2-1: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC).

Page 24: [6] Deleted EAGLE 12/03/18 13:43

Step 1: The first level of object borders (skeleton) represents persistent objects ("hard bones") in the landscape.

Step 2: On a second level, the persistent landscape features (Level 1) are subdivided through image segmentation ("soft bones"), based on multi-temporal Sentinel data within a defined observation period, resulting in Level 2 landscape features (polygons).

Step 3: The output of CLC-Backbone, the delineated objects, represents spectrally and/or texturally homogeneous features that are further characterised and attributed using the EAGLE land cover component concept. The characterisation of objects (segments) can be achieved using two options:

- Attributing the segments using summary indicators based on a pixel-based classification of fairly simple land cover classes (e.g. dominating class, percentage mixture of classes for each segment)
- And/or attributing the segments based on spectral mean values of segments

Page 26: [7] Deleted EAGLE 12/03/18 13:43

Sequence of production: Ideally, the production of CLC-Backbone and of the other Local Component data sets would be arranged in a sequential order, so that CLC-Backbone can build on and integrate the most up-to-date Local Component data to achieve the best consistency among the layers. Realistically, the production will need to make use of selected elements of the existing Local Component layers and High Resolution Layers and of existing ancillary data whose reference year might not be identical with that of the production. The time lag between input data and production reference year is not a limiting factor. As the final delineation of polygons is achieved by image segmentation ("soft bones") of Sentinel-2 images of the actual reference year, any deviation of the hard-bone geometry from the actual landscape structure (due to outdated data) will be compensated by the image segmentation step. Considering an annual land cover change of approx. 0,5% (at CLC spatial scale), the existing layers and datasets will still be able to provide a reliable input that is adapted by the image segmentation step.

Page 27: [8] Deleted EAGLE 12/03/18 13:43

Each object (delineated polygon) in CLC-Backbone will be encoded according to a basic land cover nomenclature (5 to 15 pure land cover classes, in line with the EAGLE Land Cover Components) and additionally characterised, depending on the user requirements, by a number of attributes (e.g. NDVI time series) giving more detailed information about the land cover and its dynamics inside the polygon.

Page 28: [9] Deleted EAGLE 12/03/18 13:43

The Sen2Cor algorithm identifies clouds and shadows scene by scene, whereas MAJA (a combination of MACCS (CNES/CESBIO) and ATCOR) identifies clouds and shadows using the complete image stack over time. ESA will decide by the end of 2017 which algorithm and which data processing will be implemented for atmospheric correction. Current plans foresee

Page 31: [10] Deleted EAGLE 12/03/18 13:43

other data will be used to attribute the thematic content of the GRID-database in CLC-Core.
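The first attribution option listed under Step 3 (dominating class and percentage mixture of classes per segment) can be sketched as follows. The flat per-pixel input layout, the function name and the class labels are assumptions chosen for illustration; a production system would work on raster arrays rather than Python lists.

```python
from collections import Counter


def attribute_segments(segment_ids, class_labels):
    """Summarise a pixel-based classification per segment.

    segment_ids and class_labels are equally long sequences with one entry
    per pixel: the segment the pixel belongs to and its land cover class.
    Returns, per segment, the dominating class and the share of each class.
    """
    counts = {}
    for seg, cls in zip(segment_ids, class_labels):
        counts.setdefault(seg, Counter())[cls] += 1

    summary = {}
    for seg, counter in counts.items():
        total = sum(counter.values())
        dominant, _ = counter.most_common(1)[0]
        summary[seg] = {
            "dominant_class": dominant,
            "share": {cls: n / total for cls, n in counter.items()},
        }
    return summary


# Hypothetical example: seven pixels in two segments.
segments = [1, 1, 1, 1, 2, 2, 2]
classes = ["grass", "grass", "grass", "trees", "water", "water", "bare"]
result = attribute_segments(segments, classes)
print(result[1]["dominant_class"], result[1]["share"])
```

The second option (spectral mean values per segment) would follow the same grouping pattern, with band values averaged instead of class labels counted.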
Geospatial data (especially

Page 31: [11] Deleted EAGLE 12/03/18 13:43

, buildings, LPIS and land cover/land use) are held at national level, in many cases at a higher level of detail. Some of the European data might be substituted by national data, given that the criteria concerning technical and licensing issues as described in chapter 0 are met. Besides free and open data at European level, commercial data could also be considered

Page 31: [12] Deleted EAGLE 12/03/18 13:43

e.g. street data (e.g. Navmart HERE maps, etc.) or the TanDEM-X DEM for small woody features and structure information for tree covered areas.

Page 32: [13] Deleted EAGLE 12/03/18 13:43

Copernicus land monitoring and other free and open

Page 32: [14] Deleted EAGLE 12/03/18 13:43

Pan-European
CORINE Land Cover and accounting layers; Roads and railways; 25 ha status layer, 5 ha change layer; 100 m; Vector; outlines of basic vegetation types in remote areas without roads and settlement network

Page 32: [15] Deleted EAGLE 12/03/18 13:43

25 ha status layer, 5 ha change layer

Page 32: [16] Deleted EAGLE 12/03/18 13:43

outlines of basic vegetation types in remote areas without roads and settlement network

Page 32: [17] Deleted EAGLE 12/03/18 13:43

HRL imperviousness: 20 m pixel (0,04 ha); Raster; 20 m HR satellite imagery
HRL tree cover density: 20 m pixel (0,04 ha); Raster
HRL forest types: 0,5 ha; Raster; minimum crown cover 10%
HRL permanent water bodies: 20 m; Raster; 20 m HR satellite imagery
European Settlement Map (ESM): 10*10 m pixel (0,01 ha); Raster; JRC one-off scientific product; 2.5 m VHR imagery; scientific product reference year 2012
GUF+: 10*10 m; Raster; S1 10 m and Landsat 30 m, GUF will be based on S1/S2
HRL small woody features: 0,002 ha (raster product); linear elements; Vector and Raster (5 m and 100 m); streamlining halted due to VHR 2015 concerns
HRL phenology: 10 m; Raster; only attribution in CLC-Core

Local Components

Urban Atlas: 0,25 ha / 1 ha; 10 m; Vector; delineation of roads as polygon feature and outer border of settlement structure (large cities)
Riparian Zone: 0,5 ha; 10 m; Vector; delineation of rivers and roads as polygon feature
N2K product: 0,5 ha; 10 m; Vector
RPZ Green linear elements: 0,05 ha to 0,5 ha; < 10 m width, < 100 m length; Vector; delineation of small woody vegetation elements

National products
Variety of products for land cover, land use, population, environmental variables

Accompanying/Ancillary Layers
IACS/LPIS: varying from block to parcel level, depending on national system; Vector; freely available and accessible for less than 1/3 of Europe. Increasing availability from year to year due to INSPIRE regulation. Varying thematic content (from reference parcel only to detailed crop types)
National data portals (1): national reference data; aggregated data according to EAGLE data model; line, Vector; national INSPIRE data set provision; not harmonized European dataset, varying license conditions, varying levels of roads, varying information on tunnels

Page 32: [18] Deleted EAGLE 12/03/18 13:43

Varying from block to parcel level, depending on national system

Page 32: [19] Deleted EAGLE 12/03/18 13:43

Freely available and accessible for less than 1/3 of Europe. Increasing availability from year to year due to INSPIRE regulation. Varying thematic content (from reference parcel only to detailed crop types)

Page 33: [20] Deleted EAGLE 12/03/18 13:43

European coastline: Line; has been produced by Copernicus for HRL production.
Crowd-sourced data (citizen observatories): n.a.; Point; observations from citizens as promoted through Horizon 2020 research initiatives (e.g. LandSense, groundtruth 2.0)

Page 34: [21] Deleted EAGLE 12/03/18 13:43

1 At the current stage not all geographic information is available via the national INSPIRE portals

level 1 "hard bones" in two different forms: As

Page 34: [22] Deleted EAGLE 12/03/18 13:43

and rivers) As polygons that define a-priori

Page 34: [23] Deleted EAGLE 12/03/18 13:43

Concretely, we suggest the following use of existing Copernicus and ancillary data for the production of CLC-Backbone (i.e. mainly for the provision of geometric information) and CLC-Core (i.e. thematic information). For more detailed information on the suggested production methodology, please refer to chapter 4.8.

CLC-Backbone:

Urban Atlas:
- Use of outer delineation of linear transport infrastructure (class 122xx roads and class 122) for Level 1 hardbone delineation
- Use of outer delineation of water areas (class water) for Level 1 hardbone delineation
- Use of outer delineation of cities as persistent segments (Level 1) for the delineation of settlement areas (class 111xx continuous urban fabric, class 112xx continuous urban fabric, isolated structures and class industrial and commercial units)

Riparian Zones:
- Use of river delineation (class 911 interconnected water courses, 912 highly modified water courses and 913 separated water bodies) for Level 1 hardbone delineation
- Use of outer delineation of cities as persistent segments (Level 1) for the delineation of settlement areas (class 1111 continuous urban fabric, class 1112 dense urban fabric, 1113 low density fabric, 1120 industrial and commercial units)

Natura 2000:
- Use, in analogy to RZ and UA, of the same classes for delineation of rivers, transport infrastructure and urban outline

HRL forest types:
- Creation of a tree mask based on the HRL forest types, as this layer already applies a useful tree cover density threshold (>30% TCD) and a MMU of 0,5 ha, for the Level 1 hardbone delineation of forest boundaries

OSM roads:
- Roads > 10 m: delineation of roads as polygons according to S-2 pixel based information
- Roads < 10 m: transportation network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)

EU-Hydro:
- River network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)

HRL imperviousness:
- Derivation of outer boundary of settlements (Level 1 objects)
- Refined with ESM (European Settlement Map)

OSM buildings:
- Enhancement of settlement mask using buffer methods to delineate and integrate OSM single buildings

OSM land use:
- Enhancement of road and railway infrastructure using polygon area from the land use dataset with specific OSM tags (e.g. "roads"); full OSM tag list available in Annex 1
- Enhancement of settlement mask in areas where the OSM building layer is not complete, using polygon area from the land use dataset with specific OSM tags (e.g. "residential"); full OSM tag list available in Annex 1

CLC-Core:

National data (MS contribution):
- National datasets (generalized / resampled if not freely available in original resolution)
- Manual input by MS without readily available sufficient national datasets: land use data, crop types, land use intensity
- Specific data input to be defined for CLC-Core to maintain CLC heritage and to improve thematic content (e.g. detailed LU, LC or habitat information)

UA, RZ, N2K:
- Use of thematic information (aggregated to one single harmonized nomenclature) for populating the grid database

All HRLs:
- Use of thematic information (assigned to EAGLE land cover components) for populating the grid database

OSM land use:

- Attribution of land use according to the existing cross-walk between OSM land use tags and CLC nomenclature Level 2

Crowd-sourced data:
- Data from citizen observatories are becoming increasingly available, either as validation/verification datasets (e.g. LacoWiki) or as attributive data sources (e.g. FotoQuest GO app, specific campaigns and social media like Flickr, Foursquare, etc.)

CLC accounting layers / CLC mineral extraction sites:
- Data from CLC used for identification of thematic features not covered by any other (finer resolution) datasets, e.g. mineral extraction areas, airports, glaciers
- Outlines of basic LC types in remote areas without road and settlement network and without forest

EO parameters (from Sentinel-2 image stacks):
- This includes daily vegetation trajectories as well as key indicators for the characterisation of the growing season (start day, end day) and biological productivity (maximum, amplitude, integral)

Any geometric information ingested in

Page 36: [24] Deleted EAGLE 12/03/18 13:43

Nova

Page 36: [25] Deleted EAGLE 12/03/18 13:43

National data

The use of national data replacing the foreseen European geospatial data for the creation of the CLC-Backbone could be considered if the national data fulfil the following prerequisites:
- Provision of a consistent set of information across European countries, i.e. similar scale and amount of geometric detail, comparable thematic information
- Provision of data at no cost (or at least at reasonable cost, if the cost reduction in the overall process justifies it)
- Provision of data in a technically useful form (i.e. roads as a connected network, not as DXF-like line work for drawing)

Page 38: [26] Deleted EAGLE 12/03/18 13:43

- Support of the free and open data policy of the end product, including the nationally provided input data
- Deficiencies in the European dataset concerning completeness and fitness for purpose in the respective country

In case these prerequisites are fulfilled, the national data can be included based on an evaluation of economic (e.g. work load for creation of harmonized datasets) and technical criteria. Candidates for national data (to be used for Level 1 "hard bone" delineation) are first of all the datasets that fall under INSPIRE Annex I and should be available in harmonized version from November 2017 onwards:
- Transportation network (including information on tunnels)
- Buildings
- Water courses (running and standing water)
- Coastline

Example:

Page 39: [27] Deleted EAGLE 12/03/18 13:43

Example: roads

Page 39: [28] Deleted EAGLE 12/03/18 13:43

First of all, it is still a challenge to find the appropriate dataset, and even specialised projects like ELF, which aim to serve as a single entry point, do not manage to provide the full picture. In the case of roads, according to the ELF project only Iceland, Norway, Finland, Denmark and the Czech Republic provide adequate services (Figure 4-4). It is very likely that other national services exist but cannot be found easily. This has to be considered when replacing European

Page 39: [29] Deleted EAGLE 12/03/18 13:43

existing data sources with national data.

Figure 4-4: Search result for the availability of national INSPIRE transport network (road) services using the ELF interface (Nov. 2017).

Example

Page 39: [30] Deleted EAGLE 12/03/18 13:43

GIS data in the respective country / region.

Figure 4-5: Overview of available LPIS GIS datasets and their thematic content (Synergise, project LandSense).

Specification for geometric delineation

The vector geometry

Page 39: [31] Deleted EAGLE 12/03/18 13:43

is derived using a two-step approach (Table 4-2).
- The first level ("hardbones") is derived by partitioning the landscape based on existing geospatial information (transport network, river network, urban outline)
- The second level ("softbones") is derived by applying an automatic image segmentation within the first level objects

The resulting polygons have to comply with the MMU and MWU and will be further attributed according to the thematic specification (chapter 4.7).

Persistency in landscape is always a relative term, as landscape objects are subject to various temporal dynamics. Within the traditional CORINE Land Cover mapping approach, changes were perceived as changes of land cover (change between two categories of the defined nomenclature) within a 6-year period. Such changes can be referred to as status changes, meaning a change over a specific number of years, but not the changes within a selected year. In parallel, however, changes occur at a much higher temporal frequency, e.g. in agricultural management. Those changes are referred to as cyclic changes and represent frequent land cover changes, mostly within one year or one vegetation cycle (e.g. on arable fields from bare soil to vegetated areas and vice versa). Due to the recent

Page 39: [32] Deleted EAGLE 12/03/18 13:43

multi-temporal observations in the order of 1-2 weeks, these kinds of changes can be observed at an appropriate level of scale (e.g. farm management unit).

Page 39: [33] Deleted EAGLE 12/03/18 13:43

Therefore, within CLC-Backbone those features of the landscape that are very unlikely to be altered within one year are defined as persistent objects. These structures (roads, settlements) are very unlikely to be removed, but may be subject to expansion (linear networks: roads, railways and rivers; physical boundaries like urban outline). A robust reference geometry ("hardbones") built up of transportation network, river network (incl. lakes) and urban outline (and potentially forest outline) will form the first level of division of landscape based on a-priori data.

Table 4-2: Main characteristics of Level 1 objects (hardbones) and Level 2 objects (softbones)

Page 42: [34] Deleted EAGLE 12/03/18 13:43

Independent data source; clearly defined observation period

Page 43: [35] Deleted EAGLE 12/03/18 13:43

Concerns: European data vs. national data; technical feasibility (big data processing); cooperation with data centres necessary

Page 43: [36] Deleted EAGLE 12/03/18 13:43

Level 1 Objects "hardbones"

The idea behind the Level 1 objects ("hardbones") is to derive a partition of the landscape that reflects relatively stable (persistent) structures and borders. The borders of this landscape partition are built from anthropogenic and natural features. The transportation network, i.e. roads and railways (without tunnels), forms the major component, followed by the partly natural and partly man-made structures of rivers and lakes. Other features like the borders of settlements and forest are not as persistent as roads and rivers, but give an additional partition of the landscape that is fairly well known through existing geospatial data. As geospatial data can be outdated, depending on the reference year, the hard bone partition of the landscape will not be perfect. It does not have to be perfect, as it is only an initial step: missing elements and corrections of outdated polygons will be integrated within the second level (soft bones) of the production process, which is based on current and independent image segmentation.

The a-priori partition of the landscape is promoted here because of its better cost-benefit ratio. If image segmentation is used from the very beginning (without the hard bone working step), much more effort is needed to derive already known features. Moreover, some of the features (small roads) are not recognisable at a 10 by 10 m pixel scale, which leads to polygons that are not acceptable for most of the foreseen applications.

When building the spatial framework, there is a long list of potential sources for persistent object geometries. Some can be derived from existing Copernicus products (such as the HRL imperviousness and Urban Atlas) or European reference layers (EU-Hydro), others from additional data sources like crowdsourced data (OSM) or non-operational scientific products (e.g. European Settlement Map, ESM).
National data may replace the named input data in case the criteria mentioned under chapter 0 are met. From the long list of input sources, the above-mentioned optimal selection (see Table 4-1) and the most appropriate integration process are chosen to provide the best available spatial framework at pan-European level. A limited visual inspection and correction of the input data in this phase is necessary for all data that do not comply with the requested accuracies (10 m positional accuracy). As a priority, the wider OSM roads (European major highways and roads wider than 10 m) and the accuracy of the delineation of the local components have to be analysed.

Types of borders

The first step will therefore be to build the Level 1 objects ("hard bones") of the geometric borders from the ancillary data capturing the persistent features of the landscape such as roads, rivers, railway lines etc. The information source for the hard bones will be existing vector data for roads and railroads (OSM) and river networks (CCM / Riparian Zones), supported by ancillary datasets (OSM, HRL imperviousness, ESM) and Copernicus Local Components for outer boundaries of settlement areas, and HRL for tree covered areas (HRL forest types). Technically speaking, the first level objects will identify both
- persistent line objects and
- persistent polygon objects.

It is important to note that at this stage only geometric information is taken over from existing geospatial datasets, without any thematic information. Whether a specific border line constitutes a road, river or urban border is irrelevant for the further process in CLC-Backbone, as the thematic classification is derived in the last step completely independently from any existing geospatial input data, solely based on the spectral and textural information from the Sentinel-2 images.

The majority of the transportation network (roads and railways) as well as smaller rivers are represented as a line network, where each line identifies the centreline of the object and represents a border of a Level 1 landscape object.

Roads and rivers wider than the defined MMW of 10 m (still to be defined through the consultation process) have to be represented as polygons.

Rivers: For wider rivers, the delineation in the Riparian Zone 2012 is considered complete and quite accurate within Europe, as the dataset covers all rivers above Strahler level 3 (and will be extended to Strahler level 2 by Q4/2018).

Roads and railway: As a normal 2-lane road does not meet the requirement of being wider than 10 m, the selection of roads as polygons is restricted to the primary road infrastructure (highways). All OSM roads with the special tags "motorway" and "trunk" have to be transformed from line features to polygon features using a mixed automated buffering method combined with a visual inspection and correction step based on the Sentinel-2 data. Roads and railways within tunnels have to be excluded from the hardbone, as they do not establish a recognisable landscape feature on the ground.

Besides the typical linear networks, persistent features in the landscape are identified as
- settlement outside borders
- forest borders.

Settlement borders: A best estimate of the outside borders of settlements can be derived by combining existing datasets: first of all the LoCo Urban Atlas and, outside of the coverage of the LoCo Urban Atlas, the HRL imperviousness (refined with the ESM) in combination with a building layer (e.g. OSM).
Morphological operations are needed to adapt the combined input layer to comply with the MMUs defined in the CLC-Backbone product specification and to transform the mostly vector oriented input data to a raster dataset in line with the Sentinel-2 grid system.

Forest borders: For the partition of the landscape through forest borders, all forest polygons from the LoCo product suite (UA, RZ, N2K) can be used. Outside of the LoCo, the HRL forest types (FTY) layer will be used, as it already applies a specific forest definition with a density threshold of >30% and a MMU of 0,5 ha.

Hierarchical order of borders

As the input borders partly overlap, it is important to define a relative weighting and importance of the input layers. The ordering is derived from the quality, type and geometric representation (line or polygon) of the input data. The following order is specified (with increasing importance):
- HRL forest type (as polygon)
- LoCo forest classes (as polygon, from UA, RZ, N2K)
- Settlement outline (as polygon, from HRL imp, ESM, OSM)
- Road + railway (as line, excluding tunnels)
- Rivers (as line)
- Road + railway (as polygon): from UA higher level streets; from UA lower level streets
- Rivers (as polygons) from RZ (code 911)

Page 43: [37] Deleted EAGLE 12/03/18 13:43

(as polygons, from EU-Hydro, WFD, national data)
- Coastline (as line)
- AoI (as line)

Level 2 Objects "softbones"

The aim of this second level segmentation is to derive objects that can be differentiated spatially according to their land cover and land use aspects that have an influence on the spectral response and behaviour throughout a year.

Page 43: [38] Deleted EAGLE 12/03/18 13:43

With the increasing number of observation dates per pixel throughout a year, the likelihood of finding homogeneous management units increases, but at the same time pixel based noise increases, as environmental conditions within a field parcel may vary significantly throughout a year (e.g. soil moisture content). Service providers will have to balance the number of acquisitions to maximise the identification of single field parcels while reducing salt-and-pepper effects due to varying spectral response.

The Level 2 landscape objects represent in the ideal case a consistent management of field parcels, and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a year. As an example, different settlement structures (e.g. single houses or building blocks) should appear spectrally different (due to the different levels of sealing, shape, spatial extent and spatial built-up pattern) and will therefore be delineated as

Page 43: [39] Deleted EAGLE 12/03/18 13:43

The actual cultivation measures applied on the land will also influence the type of delineation, as e.g. in agriculture the field structure is visible in multi-temporal images due to the differences in temporal-spectral response according to the applied management practices at field level (mowing, ploughing, sowing).
Accuracy of geometric delineation: The geometric accuracy is defined with the following criteria:

- Too large polygons: not more than 15% of all polygons
- Too small polygons: not more than 20% of all polygons
- Any deviation of more than 20% of polygon area is considered too small or too large
- Appropriate delineation, shift of border: max. 20 m shift; not more than 10% of perimeters is shifted more than 20 m

Specification for thematic attribution

The polygons (hard bones and soft bones) derived from the geometric specification are attributed according to a set of simple land cover classes. There are various options to attribute the thematic content of the polygons:

- Option 1: attribution with basic land cover classes based on a pixel-based classification (10 m or 20 m) of all pixels within the object
- Option 2: attribution with basic land cover classes based on spectral properties of the object itself
- Option 3: combination of Option 1 and Option 2

Option 1: pixel-based classification

Similar to the HRLs, a pixel-based classification of crisp land cover classes (no intensities) is conducted based on the Sentinel-2 images. The number and type of land cover classes are discussed below. The object (polygon) is attributed with various statistics derived from the pixel-based classification:

- Dominating class (majority)
- Percentage of classes i to j
- Ordered list of class percentages (3-5 most dominating classes)
- Count of regions per class i to j (one region is one spatially homogeneous area with only one class type)

Option 2: object-based classification

The objects (polygons) are classified using the same crisp land cover classes as above, not on a per-pixel basis but based on the spectral values of the object itself. Statistical parameters like mean value, standard deviation and combinations of bands (e.g. NDVI) are helpful to define the class.

Option 3: combined pixel-based and object-based classification

Combination of Option 1 and Option 2. In Option 3 a polygon is described using the pixel-based approach and, in addition, by a classification of the polygon itself.
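The per-object statistics listed for the pixel-based attribution (dominating class, class percentages, ordered list of the most dominating classes, count of regions per class) can be illustrated with a minimal sketch. It assumes that the classified pixels inside one polygon are already available as a mapping from (row, column) to a class label; the function names, the input layout and the use of 4-connectivity for regions are illustrative assumptions and not part of the specification.

```python
from collections import Counter


def count_regions(pixels):
    """Count 4-connected single-class regions per class (connectivity is an assumption)."""
    seen = set()
    regions = Counter()
    for start, label in pixels.items():
        if start in seen:
            continue
        regions[label] += 1
        seen.add(start)
        stack = [start]
        while stack:
            r, c = stack.pop()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                # Pixels outside the object are absent from the mapping.
                if nb not in seen and pixels.get(nb) == label:
                    seen.add(nb)
                    stack.append(nb)
    return dict(regions)


def attribute_object(pixels, top_n=3):
    """Derive the object attributes from the classified pixels of one polygon."""
    counts = Counter(pixels.values())
    total = sum(counts.values())
    percentages = {cls: 100.0 * n / total for cls, n in counts.items()}
    # Sort by decreasing share; ties are broken alphabetically for stability.
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "dominating_class": ranked[0][0],
        "percentages": percentages,
        "ranked": ranked[:top_n],
        "regions": count_regions(pixels),
    }


# Hypothetical object of four pixels in one row.
example = {(0, 0): "woody", (0, 1): "water", (0, 2): "woody", (0, 3): "woody"}
result = attribute_object(example)
```

In this toy object the class "woody" dominates but is split into two regions by a single water pixel, which is the kind of information the region count is meant to expose.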

Thematic classes

Accuracy: The accuracy for the thematic differentiation is defined with:

- Option 1 (5 classes): 90% pixel-based overall accuracy, 95% object-based overall accuracy
- Option 2 (9 classes): 85% pixel-based overall accuracy, 90% object-based overall accuracy

Thematic detail: After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape object is then labelled by simple land cover classes. The descriptive nomenclature which is applied for the labelling contains the following EAGLE-derived Land Cover Components:

Option 1 (5 classes): water, permanent vegetation, transient vegetation, bare ground / sealed surface, snow / ice

Option 2 (9 classes):

- Sealed surface (buildings and flat sealed surfaces)
- Woody coniferous
- Woody broadleaved
- Permanent herbaceous (i.e. grasslands)
- Periodically herbaceous (i.e. arable land)
- Non-vegetated bare surfaces (i.e. rock and screes, mineral extraction sites)
- Water surfaces
- Snow & ice
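As an illustration of how the overall accuracy targets for the two nomenclature options could be checked, the sketch below computes the overall accuracy from a confusion matrix and compares it with the target values quoted above. The function names and the layout of the confusion matrix (rows as reference, columns as mapped classes) are assumptions made for this example only.

```python
# Target overall accuracies taken from the text: (number of classes, unit) -> target.
TARGETS = {
    (5, "pixel"): 0.90,
    (5, "object"): 0.95,
    (9, "pixel"): 0.85,
    (9, "object"): 0.90,
}


def overall_accuracy(confusion):
    """Share of correctly classified samples: diagonal sum divided by total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def meets_target(accuracy, n_classes, unit):
    """Compare an overall accuracy with the target for the chosen option."""
    return accuracy >= TARGETS[(n_classes, unit)]


# Hypothetical two-class confusion matrix, for demonstration only.
confusion = [[45, 5],
             [5, 45]]
accuracy = overall_accuracy(confusion)
```

Overall accuracy is only one possible quality measure; class-specific measures such as user's and producer's accuracy are not covered by this sketch.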

Remark to class woody: Member countries have expressed the wish to further differentiate the class woody into trees and shrubs. This differentiation would be useful but currently cannot be derived with sufficient accuracy from spectral attributes alone. A robust differentiation is only possible by using a normalized digital surface model (nDSM).

In the CLC-Backbone nomenclature code list no land use terminology has been used. This is on purpose, to avoid semantic confusion between land cover and land use. From a purist's point of view only land cover can be derived from remote sensing data. However, due to the multi-temporal observations a number of land use and land management characteristics can be derived as well. Nevertheless, for land cover, semantic terms like grassland or arable land have to be avoided. We therefore suggest characterizing landscape objects (Level 2 polygons) using classes that are defined by land cover only. This is especially true for agricultural areas, where a differentiation of arable and grassland areas is in principle a key element. What can be observed using RS images, however, is only whether a specific agricultural parcel is ploughed within one year, as ploughing means a reduction of biomass and the occurrence of bare soil. Ploughed parcels can therefore be differentiated from non-ploughed parcels, as the bare soil count is a good indicator for the separation of these two classes:

Permanent herbaceous

Permanent herbaceous areas are characterized by a continuous vegetation cover throughout a year. No bare soil occurs within a year. These areas are either unmanaged or extensively managed natural grasslands, permanently managed grasslands, arable areas with a permanent vegetation cover (e.g. fodder crops) or even set-aside land in agriculture. For managed grasslands, the biomass will vary over the year, depending on the number of mowing (grassland cuts) or grazing events.

Periodically herbaceous

Periodically herbaceous areas are characterized by at least one land cover change (in the sense of EAGLE land cover components) between bare soil and herbaceous vegetation within one year.

Depending on the management intensity, these areas can have up to several changes between these two EAGLE land cover components within a year. Normally these areas are managed as arable areas.

A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes, followed by an artificial seeding phase, i.e. a renewal of the vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered when evaluating shorter time periods, as a managed grassland may happen to be subject to renewal within the observation period of 1 year. Arable areas are differentiated by their much higher frequency of ploughing events (either more than once a year, or at a minimum 2 times within 3 years).

All kinds of complex classes, either as mixtures of existing land cover classes (e.g. CLC mixed classes) or complex classes in the sense of ecologically highly diverse classes (e.g. wetlands), are avoided by the classification, as it concentrates solely on classification according to surface structure and properties. A detailed technical description of how to model the temporal changes is given in Annex 1.

Option to be discussed: In addition to the class labels, the following spectral attributes are attached to the landscape objects (in the data model: parametric observations):

- Vegetation trajectories: NDVI profiles (e.g. weekly observations) within the vegetation season
- Aggregated indicators for vegetation indices (within vegetation season): median of NDVI, 20% percentile, 80% percentile
- Further indicators tbd.: band-specific indicators, time-specific indicators, texture indicators
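The aggregated NDVI indicators and the bare-soil criterion that separates permanent from periodically herbaceous areas can be sketched as follows. This is a simplified illustration only: the NDVI threshold used to flag bare soil (0.2) is a hypothetical value chosen for the example, the percentile uses linear interpolation as one of several possible definitions, and the function names are not taken from the specification. The approach described in Annex 1 may differ.

```python
from statistics import median


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty sequence, using linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def ndvi_indicators(profile):
    """Aggregated indicators of an NDVI profile within the vegetation season."""
    return {
        "median": median(profile),
        "p20": percentile(profile, 20),
        "p80": percentile(profile, 80),
    }


def herbaceous_class(profile, bare_soil_threshold=0.2):
    """Label a herbaceous object by whether any observation indicates bare soil.

    The threshold is a hypothetical assumption, not a value from the specification.
    """
    has_bare_soil = any(value < bare_soil_threshold for value in profile)
    return "periodically herbaceous" if has_bare_soil else "permanent herbaceous"


# Hypothetical NDVI profiles for two parcels.
grassland_profile = [0.5, 0.6, 0.7, 0.65, 0.6]
arable_profile = [0.6, 0.15, 0.4, 0.7, 0.3]
indicators = ndvi_indicators([0.1, 0.2, 0.3, 0.4, 0.5])
```

A single-year profile cannot distinguish a grassland under renewal from an arable parcel, which is why the text recommends considering the ploughing frequency over several years.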

Important note: As the main emphasis is on the geometric delineation of landscape objects, neighbouring objects may have the same land cover code. They should be kept as unique entities, as they can be differentiated by specific attributes that may be added at a later stage to this polygon (e.g. needle-leaved coniferous trees).

Apart from the technical specifications for the foreseen tender for CLC-Backbone, additional attribution from various data sources is an option for future enrichment of classes. First of all, member countries can be asked to populate the resulting geometries with national data following the prescribed classification system or even more detailed EAGLE land cover components. In addition, crowd-sourced data may be used to characterize the land use within these polygons (either by in-situ point observations or by wall-to-wall datasets, e.g. OSM land use).

Proposal for production methodology

Within this consultation version (Nov. 2017) of the CLC-Backbone specifications, an outline of a possible production methodology is included to illustrate the processing that is envisaged to generate a detailed geometric structure as a geometric vector skeleton, which is subsequently described according to a simple land cover classification (approx. 10 classes) on a per-pixel level. The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step (hardbone), boundaries of relatively stable areas are derived based on existing (European [...]

[...] are kept. As a consequence, it should be specified in what sense the outputs of the new land monitoring system may and should be consistent with CLC-legacy.
Moreover, for the purposes of LEAC analysis, CLC information has been transformed into CLC Accounting Layers.

The backwards compatibility of CLC+ with CLC-legacy [...] foreseen to be ensured in a similar way. This method is already applied in practice, since some of the member states produce CLC layers in a bottom-up way, and these data are then combined with manually interpreted CLC changes. The applied methods of bottom-up CLC creation differ, but they are variations of the basic idea CLC+ relies on. In the sense of the above, the aim of CLC-[...] much as possible original 25 ha MMU vector CLC as it would be if created continuously by visual photo-interpretation. Instead, CLC-legacy will be created by considering the following aspects:

- CLC-Legacy is to be created as a [...] layer, representing the CLC (like) status at a certain reference date.
- CLC-Legacy will be derived from CLC-Core (and potentially CLC+) in a way which ensures homogeneous and consistent content and quality across Europe. Note that several CLC layers are already derived based on similar principles in countries using bottom-up methods; all the outputs correspond to the basic CLC specifications, but have slightly different characteristics in terms of geometry and statistics.
- Instead of the 25 ha MMU vector CLC, CLC accounting layers are considered as reference layers concerning targeted content:
  - Applying standardized raster generalization on CLC-Core outputs to reach 5 ha MMU (instead of 25 ha), corresponding to CLC-change layers.
  - Less significance of complex CLC classes, like 243.
  - A standard methodology is to be developed to reproduce relevant information contained in some other complex CLC classes, like 244.
- CLC status layers of past reference dates (2012, 2006, 2000, ...) are to be created by combining CLC-legacy with previous CLC-change data.
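To make the idea of a raster generalization towards a given MMU more concrete, the following sketch converts an MMU in hectares into a pixel count and dissolves regions below that size into their most frequent neighbouring class. It is a deliberately simple, single-pass sieve and not the standardized generalization method referred to above, which is still to be defined. The 10 m pixel size, the 4-connectivity and all function names are assumptions for illustration.

```python
import math
from collections import Counter, deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)


def mmu_in_pixels(mmu_ha, pixel_size_m=10.0):
    """Number of pixels covering the MMU; 1 ha equals 10 000 square metres."""
    return math.ceil(mmu_ha * 10_000 / (pixel_size_m ** 2))


def label_regions(grid):
    """Return (class value, list of cells) for every connected single-class region."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            value, cells, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((value, cells))
    return regions


def sieve(grid, min_pixels):
    """Single pass: recode regions smaller than min_pixels to the dominant neighbour class."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for value, cells in label_regions(grid):
        if len(cells) >= min_pixels:
            continue
        border = Counter()
        for y, x in cells:
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] != value:
                    border[grid[ny][nx]] += 1
        if border:  # a region covering the whole grid has no neighbours
            new_value = border.most_common(1)[0][0]
            for y, x in cells:
                out[y][x] = new_value
    return out


# Toy raster: a single pixel of class 2 inside class 1.
generalized = sieve([[1, 1, 1], [1, 2, 1], [1, 1, 1]], min_pixels=2)
```

Because the sieve runs only once, two adjacent undersized regions may be merged into each other and still fall below the MMU; an operational workflow would iterate or use priority rules between classes.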
The above concept assumes that in the future, besides the update of CLC-Core and CLC+ status, CLC-compatible change information is to be derived as well. Obviously, there are several methodological and technical details to be clarified and solved during upcoming practical test exercises.

Description: CLC-Backbone is a spatially detailed, large-scale, EO-based land cover inventory in a vector format, providing a geometric backbone with limited but robust thematic detail on which to build other products.

Input data sources (EO & ancillary)

EO overview:

- Spatial resolution: [...] m
- Spectral content: Visible, NIR, SWIR, SAR
- Temporal revisit: Bi-monthly to achieve monthly cloud-free acquisitions
- Acquisition approach: likely to be multi-mission
- Candidate missions: Sentinel-2 (Landsat 8)
- Contributing data: Sentinel-1

Ancillary data:

- OSM roads and railways (or comparable national datasets)
- EU Hydro (or WISE WFD)
- European coastline
- OSM buildings
- OSM land use
- HRL imperviousness (refined with ESM)
- LoCo Riparian Zones
- LoCo Urban Atlas

Thematic content

EAGLE Land Cover Components (Option 2, 9 classes):

- Sealed
- Woody vegetation - coniferous
- Woody vegetation - broadleaved
- Permanent herbaceous (i.e. grasslands)
- Periodically herbaceous (i.e. arable land)
- Permanent bare soil
- Non-vegetated bare surfaces (i.e. rock and screes)
- Water surfaces
- Snow & ice

Methodology

The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step (hardbone), boundaries of relatively stable larger areas are detected based on existing data sets. In a second step, image segmentation techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2 objects). The third step contains a classification and spectral-temporal attribution (Sentinel-1 and Sentinel-2) of the landscape objects derived from steps 1 and 2, using a pixel-based classification of the Land Cover Components. The pixel-based classification is generalized at object level using indicators like the dominating land cover class and the percentage mixture of land cover classes.

Geometric resolution (scale): ~1:[...]

Geographic projection / reference system: European Terrestrial Reference System 1989 (ETRS89)

Coverage: EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), which are participating in the regular CLC dataflow. The number of countries involved is 39. The area covered is approximately 6 million km².

Spatial resolution / Minimum Mapping Unit (MMU): 0.5 to 5 ha for final product

Min. width of linear features: [...] m

Topological correctness: [TBD]

Raster coding: [n/a]

Thematic accuracy (in %) / quality method: The target will be set at 90%. A quantitative approach will be used, based on a set of stratified random point samples compared to external datasets (e.g. GoogleEarth, national orthophotos or national grassland datasets).

Target / reference year: 2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Update frequency: [TBD]

Information actuality: Plus and minus 1 year from target year (2017-2019). A longer time window may be required for the ancillary datasets.

Delivery format: Vector

Data type: Vector
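The quality method mentioned in the specification table, a set of stratified random point samples compared to external reference data, can be sketched as below. The sketch stratifies by mapped class and draws the same number of points per stratum; the allocation rule, the fixed random seed and all names are assumptions for illustration, and the actual sampling design is not defined in this document.

```python
import random


def stratified_sample(mapped, n_per_stratum, seed=0):
    """Draw up to n_per_stratum random point ids from each mapped class (stratum)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = {}
    for point_id, label in mapped.items():
        strata.setdefault(label, []).append(point_id)
    sample = []
    for label in sorted(strata):
        candidates = sorted(strata[label])
        k = min(n_per_stratum, len(candidates))
        sample.extend(rng.sample(candidates, k))
    return sample


def agreement(sample, mapped, reference):
    """Share of sampled points where the mapped class equals the reference class."""
    hits = sum(1 for point_id in sample if mapped[point_id] == reference[point_id])
    return hits / len(sample)


# Hypothetical point set: ids 0-9 mapped as water, ids 10-29 mapped as woody.
mapped = {i: ("water" if i < 10 else "woody") for i in range(30)}
reference = dict(mapped)
reference[12] = "water"  # one simulated disagreement with the reference data

sample = stratified_sample(mapped, n_per_stratum=5, seed=1)
share = agreement(sample, mapped, reference)
target_met = share >= 0.90  # 90% target from the specification table
```

With equal allocation per stratum the simple agreement share is not an area-weighted estimate; in practice the stratum results would be weighted by the mapped area of each class.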

CLC-Core

[...]: OSM DATA

CROSSWALK BETWEEN OSM LAND USE TAGS AND [...]

Source: Schulz et al. (2017): Open land cover from OpenStreetMap and remote sensing, Int J Appl Earth Obs Geoinformation 63 (2017)

ANNEX 3: ILLUSTRATION OF STEP 1 HARDBONE GEOMETRY OF CLC-BACKBONE
