Technical specifications for implementation of a new land-monitoring concept based on EAGLE


EEA/IDM/R0/17/003

Technical specifications for implementation of a new land-monitoring concept based on EAGLE

D3: Draft design concept and CLC-Backbone, CLC-Core technical specifications, including requirements review.

Version

Version history

Date: /10/2017
Authors: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Draft for review by NRC LC
Distribution: EEA, NRC LC

Date: /11/2017
Authors: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Updated draft integrating the comments following the NRC LC workshop
Distribution: For distribution at Copernicus User Day

Table of Contents

1 Context
Introduction
Background
Concept
Role of industry
Potential role of Member States
Engagement with stakeholder community
Requirements analysis
CLC-Backbone - the industry call for tender
Spatial scale / Minimum Mapping Unit
Reference year
EO data
Pre-processing of Sentinel-2 data
Copernicus and European ancillary data sets
Completeness OSM road data
National data
Example: Rivers and lakes
Example: roads
Example: land parcel identification system (LPIS)
Specification for geometric delineation
Level 1 Objects - hardbones
Level 2 Objects - softbones
Accuracy of geometric delineation
Specification for thematic attribution
Thematic classes
Proposal for production methodology
Step 1: Definition and Selection of persistent boundaries
Step 2: Image segmentation
CLC-Core - the grid approach
Background
Populating the database
Data modelling in CLC-Core
Database implementation approach for CLC-Core
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores
CLC-Core Database: spatio-temporal Triple Store Approach
Processing and Publishing of CLC-Core Products
Conclusion and critical remarks
CLC+ - the long-term vision
CLC-Legacy
Experiences to be considered
Technical specifications
CLC-Backbone
CLC-Core
Annex 1: OSM data
Crosswalk between OSM land use tags and CLC nomenclature Level
OSM road nomenclature
OSM roads completeness
Annex 2: LPIS data in Europe
Annex 3: illustration of Step 1 hardbone geometry of CLC-Backbone
Annex 4: CLC-Backbone data model
Integration of temporal dimension into LISA
Integration of multi-temporal observations into LISA
Annex 5: CLC nomenclature

List of Figures

Figure 2-1: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC)
Figure 2-2: A scale versus format schematic for the current and proposed CLMS products
Figure 4-1: Illustration of a merger of current local component layers covering (with overlaps) 26.3% of EEA39 territory (legend: red - UA, yellow - N2K, blue - RZ LC)
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects based on this classification
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova), Romania (near Parta) and Serbia (near Indija) (top to bottom). Bing/Google and EOX Sentinel-2 cloud free services as background (left to right)
Figure 4-4: Search result for the availability of national INSPIRE transport network (road) services using the ELF interface (Nov. 2017)
Figure 4-5: Overview of available GIS datasets LPIS and their thematic content (Synergise, project LandSense)
Figure 4-6: Illustration of results of step 1 delineation of Level 1 landscape objects. The border defines the area for which Urban Atlas data is available (shaded) versus the area where no UA data was available (not shaded)
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). daa is a Norwegian unit: 10 daa = 1 ha
Figure 5-2: Representation of real world data in the CLC-Core
Figure 5-3: Example of an RDF triple (subject - predicate - object)
Figure 5-4: GeoSPARQL query for Airports near the City of London
Figure 5-5: Result of the GeoSPARQL query of Figure 5-4 as map and in the JSON format
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture
Figure 7-1: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data
Figure 7-2: Generalisation levels used in LISA generalisation in Austria
Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
Figure 7-4: Result of the generalisation process in Germany with technical changes in different classes
Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit [68]
Figure 10-2: Overview of available GIS datasets LPIS and their thematic content (Synergise, project LandSense)
Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit
Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component
Figure 12-3: The new CLC-Basis model (based on LISA)
Figure 12-4: Illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (example taken from project Cadaster Env Austria, Geoville 2017)
Figure 12-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model

List of Tables

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC
Table 2-2: Summary (matrix) of potential roles associated with each element / product
Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which were analysed as potential input to support the construction of the geometric structure (Level 1 "hard bones") of the CLC-Backbone. The products finally proposed for further processing are marked in green
Table 4-2: Main characteristics of Level 1 objects (hardbones) and Level 2 objects (softbones)
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data
Table 6-1: List of CLC classes and requirements for external information

1 CONTEXT

The European Environment Agency (EEA) and the European Commission DG Internal Market, Industry, Entrepreneurship and SMEs (DG GROW) have decided to develop a conceptual strategy and associated technical specifications for a new series of products within the Copernicus Land Monitoring Service (CLMS) portfolio, which should meet the current and future requirements for European Land Use Land Cover (LULC) monitoring. These products are nominally called the "2nd generation CORINE Land Cover (CLC)".

After a call for tender in 2017, the EEA tasked the EIONET Action Group on Land monitoring in Europe (EAGLE Group) with developing an initial response to these needs through a conceptual design and technical specifications. The approach adopted is a sequence of development stages in which single elements of the whole concept can be developed relatively independently and at different rates, to allow time for broad consultation with stakeholders. This also allows for the inclusion of Member State (MS) input, the exploitation of industrial production capacity, and the feedback, lessons learnt and refinement needed to reach the ultimate goal of a coherent and harmonised European Land Monitoring Framework.

The first stage of this process outlined the conceptual strategy and proposed a draft technical specification for the first product to be developed (CLC-Backbone). It also involved a presentation of the concept to the EIONET NRCs Land Cover and the collection of the NRCs' feedback, which was carefully taken into consideration for the continuation of the process into the second stage.

The first stage was undertaken within the constraints outlined by EEA and DG GROW for the initial product:

- Industrial production by service providers,
- Outcome product in vector format,
- Highly automated production process,
- Short timeframe of production phase,
- Driven by Earth Observation (Sentinel-2),
- Complete the picture started by the Local Component [1] products (EEA-39).

This document represents the second stage of development. It expands on the conceptual strategy, extends the current technical specifications, proposes additional follow-up products in more detail and continues to communicate these to the stakeholders involved in European land monitoring. This second stage also aims at enlarging the circle of stakeholders to all interested Copernicus users and beyond. It is vital for the success of the project and the long-term evolution of land monitoring in Europe that all relevant stakeholders communicate their requirements and opinions to this process. Revised versions of this document will continue to be produced at specific milestones towards a final version in early [...].

[1] The term "component" has a twofold function: 1) it refers to the Copernicus local component products, and 2) it is the term for a Land Cover Component as an element within the EAGLE data model.

2 INTRODUCTION

This chapter provides the background to the concept, an outline of the proposed approach, an overview of the elements of the concept and the current status of the developments.

2.1 Background

Monitoring of Land Use and Land Cover (LULC) and their evolving nature is among the most fundamental environmental survey efforts required to support policy development and effective environmental management [2]. Information on LULC plays a key role in a large number of European environmental directives and regulations. Many current environmental issues are directly related to the land surface, such as habitats, biodiversity, phenology and distribution of plant species, ecosystem services, as well as other issues relevant to climate change.

Human activities and behaviour in space (living, working, education, supplying, recreation, mobility and communication, socialising) have significant impacts on the environment through settlements, transportation and industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The land surface, whose negative change of state can be reversed only with huge effort, if at all, is therefore a crucial ecological factor, an essential economic resource, and a key societal determinant for all spatially relevant basic functions of human existence and, not least, nations' sense of identity. Land thus plays a central role in all three factors of sustainable development: ecology, economy, and society.

LULC products have so far tended to be produced independently of each other at the global, European, national and sub-national levels, each with a similar but still different thematic emphasis. Such diversity leads to reduced interoperability and duplication of work, and thereby to inefficient use of resources. At the European level, CORINE Land Cover (CLC) is the flagship programme for long-term land monitoring and is now part of the Copernicus Land Monitoring Service (CLMS).

CLC has been produced for the reference years 1990, 2000, 2006 and 2012, with 2018 under preparation and expected to be available by late [...]. The CLC specification aims to provide consistent localised geographical information on LULC using 44 classes at level 3 of the nomenclature (see Annex 5). The vector databases have a minimum mapping unit (MMU) of 25 ha and a minimum feature width (MFW) of 100 m, with a single thematic class attribute per land parcel. At the European level, the database is also made available as 100 x 100 m and 250 x 250 m rasters, which have been aggregated from the original vector data at 1:[...] scale. CLC also includes a change layer, which records changes between two of the 44 thematic classes with an MMU of 5 ha.

Although CLC has become well established and has been used successfully, mainly at the pan-European level, a number of deficiencies and limitations restrict its wider exploitation, particularly at the Member State (MS) level and below. This is partly because many MS have access to more detailed, precise and timely information from national programmes, and partly because the MMU of CLC (25 ha) is too coarse to capture the fine details of the landscape at local and regional scales. In consequence, smaller features that represent landscape diversity and complexity are not mapped, either because of geometric generalisation or because they are absorbed by thematically mixed classes with very broad definitions. Moreover, changes of features and landscape dynamics that are highly relevant to locally decided but globally effective policy, such as small-scale forest rotation, changes in agricultural practices and urban in-filling, may be missed due to the low spatial resolution and/or limited thematic depth of the CLC class definitions.
The successful use of CLC in combination with higher spatial resolution products to detect and document urban in-filling has given a clear indication of the required direction of development for European land monitoring.

To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide pan-European information on selected surface characteristics in a 20 m raster format, also available aggregated to 100 m raster cells. They provide information on basic surface properties such as imperviousness, tree cover density and permanent grassland, and can be described as intermediate products. The LoCo products are in vector format and are based in part on very high spatial resolution EO data tailored to a specific landscape monitoring purpose (e.g. Urban Atlas). They provide detailed thematic LULC information at polygon level with an MMU in the range of 0.25 to 1 ha. However, even when combined, the LoCo products do not provide wall-to-wall coverage of the EEA-39 countries. Their nomenclatures are not harmonised, which causes interoperability problems.

[2] Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.

2.2 Concept

Given the above issues, a revised concept for European land monitoring is required which both provides improved spatial and thematic performance and builds on the existing heritage. Recent developments in the field of land monitoring (e.g. improved Earth Observation (EO) input data from the Sentinel programme, national bottom-up approaches, processing methods, additional reporting requirements, the INSPIRE directive) and in desktop and cloud-computing capacities offer opportunities to deliver these improvements effectively and efficiently. The EEA, in conjunction with DG GROW, has identified this need and now aims to harmonise and integrate some of the CLMS activities by investigating the concept and technical specifications for a higher performance pan-European mapping product under the banner of "2nd generation CLC".
It was determined that such a 2nd generation CLC should build upon recent conceptual development and expertise, while still guaranteeing backwards compatibility with the conventional CLC datasets. The proposed approach should also be suitable to serve the needs arising from recent developments in European policies, such as reporting obligations on land use, land use change and forestry (LULUCF), plans for an upcoming Energy Union and long-term climate mitigation objectives. Once a 2nd generation CLC has been successfully established, there would also be knock-on benefits for a broad range of other European policy requirements, land monitoring activities and reporting obligations.

The proposed conceptual strategy consists of a number of interlinked elements (Figure 2-1), which stand for separate products and can therefore be delivered independently in separate stages. Each product has its own technical specification and production methodology and can be produced through its own funding / resourcing mechanisms. Throughout the documents delivered by this project, the conceptual design given in Figure 2-1 will be used as a key graphic to identify the product and stage being described.

Figure 2-1: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC).

The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is based on the four elements shown in Figure 2-1. The reports associated with this work will expand incrementally on the conceptual design and provide increasingly elaborate technical specifications for the products. Although the four elements / products are described in more detail later in this and subsequent reports, they can be summarised as follows:

1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a geometric spatial structure for landscape features with limited, but robust EO-based land cover thematic detail on which to build other products.
2. CLC-Core is a consistent, multi-use grid [3] database repository for environmental information populated with a broad range of land cover, land use and ancillary data, forming the information engine to deliver and support tailored thematic information requirements.
3. CLC+ is the end point for this specific exercise: a vector and raster product derived from CLC-Core and CLC-Backbone. It will be a LULC monitoring product with improved spatial and thematic performance relative to the current CLC, for reporting and assessment.
4. The final element of the conceptual design, although not strictly a new product, is the ability to continue producing the existing CLC, which already has a well-established and agreed specification and may be referred to as CLC-Legacy in the future.

Table 2-1 provides a first overview of the main characteristics of the four elements, which are developed in more detail in the following chapters of this document. The table allows the reader to compare the key characteristics of the products.

[3] A data structure whose grid cells are linked to a data model that can be populated with information from the different sources.

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC.

Description
  CLC-Backbone: Detailed wall-to-wall (EEA-39) geometric vector reference layer with basic thematic content.
  CLC-Core: All-in-one data container for land monitoring information according to EAGLE data model.
  CLC+: Thematically and geometrically detailed LULC product.
  CLC-Legacy: A more generalised LULC product consistent with the CLC specification.

Role / purpose
  CLC-Backbone: Support to CLMS products and services at the pan-European and local levels.
  CLC-Core: Thematic characterisation of CLMS products and services at the pan-European and local levels.
  CLC+: Support to EU and national reporting and policy requirements.
  CLC-Legacy: Maintain the time series (backwards compatibility) and support legacy business systems.

Format
  CLC-Backbone: Vector.
  CLC-Core: GRID database.
  CLC+: Raster and vector.
  CLC-Legacy: Raster and vector.

Thematic detail
  CLC-Backbone: <10 basic land cover classes, few spectral-temporal attributes.
  CLC-Core: Rich attribution of LU, LC and ancillary information.
  CLC+: High thematic detail including LC and LU with improvements compared to CLC.
  CLC-Legacy: CLC nomenclature with 44 classes + changes between the 44.

Geometric detail (MMU, grid size)
  CLC-Backbone: 0.25 to 5.0 ha.
  CLC-Core: 10 x 10 m to 1 x 1 km.
  CLC+: TBD, related to CLC-Backbone.
  CLC-Legacy: 25 ha for status, 5 ha for changes (raster: 100 x 100 m).

Update cycle
  CLC-Backbone: 3-6 years.
  CLC-Core: Dynamic update as new information becomes available.
  CLC+: 3-6 years.
  CLC-Legacy: Standard 6 years.

Reference year: Production year / TBD TBD

Method
  CLC-Backbone: Geospatial data integration and image segmentation for geometric boundaries, and attribution / labelling by pixel-based EO derived land cover.
  CLC-Core: EAGLE-Grid approach: population and attribution of regular GRID.
  CLC+: Derived by SQL from CLC-Core.
  CLC-Legacy: Derived from CLC+ or directly from CLC-Core (TBD).

Input data
  CLC-Backbone: Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products.
  CLC-Core: CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU).
  CLC+: CLC-Core and CLC-Backbone.
  CLC-Legacy: CLC+ and / or CLC-Core.

Figure 2-2 is a schematic description of the conceptual design showing the relationship of the new elements / products to existing CLMS products in terms of their format and level of spatial detail. The representation shows:

1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map with less detail.
2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more spatial and thematic detail.
3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific surface characteristics with high spatial detail.
4. The proposed CLC-Core, a grid product for which the level of detail is still to be decided.
5. The CLC-Backbone and CLC+, both polygon maps with more detail.

Figure 2-2: A scale versus format schematic for the current and proposed CLMS products.

Given the context, requirements and issues, the aim should be to deliver the concept and its proposed products with an efficient mixture of industrially produced material backed up by auxiliary information from various national programmes. It is important to propose a viable system with the flexibility to adapt the later steps to issues of feasibility and practicality once the implementation of the earlier steps is underway. For instance, the outcomes of the CLC-Backbone production should be able to influence the thematic content of the CLC-Core and / or the technical specifications of the CLC+.

2.3 Role of industry

Industry has so far had only a limited role in the development and production of CLC, mainly focused on the production of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012), the validation of the CLC2012 products and, in some MS, the subcontracting of the actual production. Conversely, the production of the non-CLC products within the CLMS has been dominated by industry through a series of service contracts to generate consistent products across Europe. Industry has the ability to produce operational solutions which exploit automation, can handle large data volumes and are scalable to Europe-wide requirements. As the amount of available EO and GI data increases, highly efficient and effective production mechanisms will be required for at least parts of the European land monitoring process, and economies of scale must be exploited. Industry therefore offers a number of capabilities which will be required at selected points within the 2nd generation CLC.

DG GROW has expressed the wish that industry should have an initial role in the production of CLC-Backbone, which has therefore been designed in part to exploit industrial capabilities. Further opportunities for industrial involvement are shown in Table 2-2.

2.4 Potential role of Member States

The MS have always been intimately linked with CLC, as its production has been the responsibility of the nominated authorities through EIONET. The MS have provided the production teams for the actual mapping in order to exploit local knowledge, familiarity with native landscape types and access to national datasets which could not necessarily be shared further. This bottom-up production approach has been key to the successful delivery of this important time series. For the non-CLC products within the CLMS, MS involvement has so far been limited to verification and, in the case of the 2012 products, enhancement.

Although the MS have provided valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have not been fully exploited within the CLMS. Table 2-2 shows the potential for a greater role for the MS across all elements / products. It is important that the MS experts have oversight of the specification, as is happening within this project, and opportunities to contribute data, experience and local knowledge to the production and validation activities where appropriate.

Table 2-2: Summary (matrix) of potential roles associated with each element / product.

EEA / DG GROW / EC
  CLC-Backbone: Definition, coordination, main user.
  CLC-Core: Definition, coordination, main user.
  CLC+: Definition, coordination, main user.
  CLC-Legacy: Definition, coordination, main user.

MS
  CLC-Backbone: Review of specification, validation, user.
  CLC-Core: Review of specification, population with national datasets, user.
  CLC+: Review of specification, support to production, validation, user.
  CLC-Legacy: Production, validation, user.

Industry
  CLC-Backbone: Production.
  CLC-Core: Implementation and maintenance of DB infrastructure.
  CLC+: Support production, validation.
  CLC-Legacy: Validation.

2.5 Engagement with stakeholder community

The success of the project and the long-term development of land monitoring in Europe are intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant stakeholders are aware of this activity and contribute their requirements and opinions to this process. Revised versions of this document are foreseen at specific milestones towards a final version for delivery in early [...].

This specific deliverable, D3, is the second step towards the definition of the conceptual strategy and the potential technical specifications for a series of new CLMS products. It aims at communicating these details to the stakeholders involved in European land monitoring to elicit feedback, comments and questions. The work reported here begins with a requirements review which goes beyond the remit of the call for tender. The four elements and potential products of the 2nd generation CLC within the CLMS are described in further detail in the following chapters. This version already includes the first feedback from the NRC LC meeting in Copenhagen in October [...].

For CLC-Backbone, this document updates the draft versions of the technical specifications and the outline implementation methodology provided in D2. These details are still open for discussion and review by stakeholders and will ultimately be used as input for an open call for tenders to industrial service providers for a production to start in [...].

For CLC-Core, this document provides a more advanced outline of the technical specification and proposes a number of options for the implementation. The technical design of CLC-Core will continue to be developed based on stakeholder feedback in future steps. Similarly, the current thinking around CLC+ will continue to be developed to elicit feedback to guide expansion of the technical specifications at a future step within the project.

3 REQUIREMENTS ANALYSIS

In line with the principles of Copernicus, this analysis in support of the development of new products within the CLMS is driven by user needs rather than the current technical capabilities of EO sensors and processing systems. The technical issues are dealt with in the product chapters, which focus on specification and methodological development. This analysis was initially based on the requirements set out in the original call for tender, but has now been extended with recent work by the EC and ETC to cover a broader range of stakeholder needs. As this project develops, further requirements will be integrated to support the increasingly detailed specifications towards those for a complete 2nd generation CLC.

Although the MMU and thematic issues of CLC have been extensively reported for some time, the target specification for a future improved harmonised European land monitoring product is less clear. The proposed products should support European policies and, in particular, assist reporting obligations at European level (not at national level) on land use, land use change and forestry (LULUCF), plans for an upcoming Energy Union and long-term climate mitigation objectives. However, many of the policies that could exploit EO-based land monitoring information rarely give a clear quantitative requirement for spatial, temporal or thematic specifications. The higher-level strategic policies, such as the EU Energy Union, often refer to the monitoring of other activities such as REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, it tends to rely on the products currently available. For instance, LULUCF assessments in Europe (independently of national reporting obligations) use CLC, while at the global scale they use Global Land Cover 2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU).

Some of these initiatives have explored the potential of improved monitoring performance, such as the use of SPOT4 imagery with a 20 m spatial resolution to show land-use change between 2000 and 2010 for REDD+ reporting, but until very recently it has not been practical for these types of monitoring to become operational globally.

A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of biodiversity and improve the state of Europe's species, habitats, ecosystems and the services they provide. It defines six major targets, the second of which focuses on maintaining and enhancing ecosystem services and restoring degraded ecosystems across the EU, in line with the global goal set in [...]. Within this target, Action 5 is directed at improving knowledge of ecosystems and their services in the EU. Member States, with the assistance of the Commission, will map and assess the state of ecosystems and their services in their national territory, assess the economic value of such services, and promote the integration of these values into accounting and reporting systems.

The Mapping and Assessment of Ecosystems and their Services (MAES) initiative is supporting the implementation of Action 5. Although much of the early work in this area at European level has used inputs such as CLC, fully addressing this action requires an improved approach, especially for a better characterisation of land, freshwater and coastal habitats, particularly in watershed and landscape approaches. The spatial resolution at which ecosystems and services should be mapped and assessed will vary depending on the context and the purpose for which the mapping / assessment is carried out.
However, what is required is information with a more detailed thematic characterisation and classification, at a higher spatial resolution, which is compatible with the European-wide classification and can be aggregated in a consistent manner if needed. A first version of a European ecosystem map covering spatially explicit ecosystem types for land and freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of the HRL imperviousness with 20 m resolution, the JRC forest layer with 25 m spatial resolution, and a range of other ancillary datasets (e.g. ECRINS water bodies) with a wide variety of spatial resolutions (from detailed Open Street Map data to the 10 x 10 km grid data used in Article 17 reporting of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully exploit the recently available Copernicus Sentinel data and land products, and move to a higher spatial resolution.

The recent EC-funded NextSpace project collected user requirements for the next generation of Sentinel satellites to be launched in the 2030 time horizon. These requirements were analysed on a domain basis. From the land domain, those requirements given a context of land cover (including vegetation) were considered first; this was then extended to related contexts such as glaciers, lakes, above-ground biomass, leaf area index and snow. A broad range of spatial resolutions was requested, from below 2.5 m to 10 km, but two thirds of the collated requirements wanted data in the [...] m region. As would be expected, the requested MMUs were also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field / city block level for much of Europe. The temporal resolution / update frequency was dominated by yearly revisions, although some users could accept 5-yearly updates and some wished for monthly updates, for reasons that were unclear. The users were less specific about the thematic detail required, and the few references given were to CLC, EUNIS, LCCS and basic land cover. The preferred accuracy of the products was in the range 85 to 90%.

The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of EIONET members involved with the CLMS and CLC production, covering a range of topics in advance of the 2018 activities.
Some of the questions referred to the shortcomings of CLC and the potential improvements that could be made towards a 2nd generation CLC product. As expected, it was noted that some CLC classes cause problems because of their mixed nature and because their instances on the ground are sometimes complex and difficult to disentangle. It was suggested that this situation could be improved by using a smaller MMU, so that the mapped features have more homogeneous characteristics. A reduction of the MMW, particularly for highways, would also allow linear features to be represented more realistically. Of the 32 countries that responded to the questionnaire, 25 would support a finer spatial resolution, showing that there is, in general, a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to remaining at the current 25 ha, but the majority favoured to 5 ha. Thematic refinement is also supported by around one third of the respondents, who requested improved thematic detail, separation of land cover from land use, splitting of formerly mixed CLC classes and the addition of further attributes to the spatial polygons.

From the requirements review so far, and considering the requirements put forward by LULUCF, it is suggested that the ultimate 2nd generation product would have an MMU of 0.5 ha and be based on EO data with a spatial resolution of between 10 and 20 m. The thematic content would be a refinement of the current CLC nomenclature to cope with the changing MMU and with the separation of land cover and land use needed for ecosystem mapping and assessment. The temporal update could come down from 6 years to 3 years in the short term, and potentially to 1 year in the longer term.

ha being required by LULUCF

4 CLC-BACKBONE THE INDUSTRY CALL FOR TENDER

CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-wall coverage of CLC-Backbone shall draw on, complete and amend the picture started by the current local component (LoCo) products (Urban Atlas, Riparian Zones and Natura 2000), which cover approx. 1/3 of the total area, as shown in Figure 4-1. It will be comprehensive and effectively complete the coverage of the EU-28 as a minimum, but preferably of the whole EEA-39.

Figure 4-1: Illustration of a merger of current local component layers covering (with overlaps) 26,3% of EEA39 territory (legend: red - UA, yellow - N2K, blue - RZ LC)

CLC-Backbone will initially be produced by a mostly automated, industrial approach, where production can be separated into different levels, including different degrees and sequential order of automation and human interaction (Figure 4-2). Within CLC-Backbone, landscape objects are defined as vector polygons and identified on different levels.

Step 1: The first level of object borders (skeleton) represents persistent objects ("hard bones") in the landscape.

Step 2: On a second level, a subdivision of the persistent landscape features (Level 1) will be achieved through image segmentation ("soft bones"), based on multi-temporal Sentinel data within a defined observation period (resulting in Level 2 landscape features = polygons).

Step 3: The output of CLC-Backbone, the delineated objects, represents spectrally and/or texturally homogeneous features that are further characterised and attributed using the EAGLE land cover component concept. The characterisation of objects (segments) can be achieved using two options:
- attributing the segments using summary indicators based on a pixel-based classification of fairly simple land cover classes (e.g. dominating class, percentage mixture of classes for each segment), and / or
- attributing the segments based on spectral mean values of segments.

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects based on this classification

For implementation of CLC-Backbone the existing European land cover monitoring framework has to be considered:

Heritage: The concept has to take into account the existing pan-European and local Copernicus layers and their spatial and thematic representation of the landscape. The technical specifications to be provided by this contract are therefore based on a thorough review of available and feasible datasets and their geometry and thematic content.

Integration: CLC-Backbone will integrate selected geometric information from existing Copernicus products at different steps within the process.

Sequence of production: Ideally, the production of CLC-Backbone and of the other Local Component data sets would be arranged in a sequential order such that CLC-Backbone can build and

integrate on the most updated Local Component data to achieve best consistency among the layers. Realistically, the production will need to make use of selected elements of the existing Local Component layers and High Resolution Layers, and of existing ancillary data whose reference year might not be identical with that of the production.

The time lag between input data and production reference year is not a limiting factor. As the final delineation of polygons is achieved by image segmentation ("soft bones") of Sentinel-2 images of the actual reference year, any deviation of the hard bone based geometry from the actual landscape structure (due to outdated data) will be compensated by the image segmentation step. Considering an annual land cover change of approx. 0,5% (on CLC spatial scale), the existing layers and datasets will still be able to provide a reliable input that is adapted by the image segmentation step.

4.1 Spatial scale / Minimum Mapping Unit

CLC-Backbone shall address landscape features with a predefined MMU of 0.5 ha and a minimum mapping width (MMW) of 10 m (these values are subject to the ongoing consultation process and might be revised). Each object (delineated polygon) in CLC-Backbone will be encoded according to a quite basic land cover nomenclature (between 5 and 15 pure land cover classes, in line with the EAGLE Land Cover Components) and additionally characterised, depending on the user requirements, by a number of attributes (e.g. NDVI time series) giving more detailed information about the land cover and its dynamics inside the polygon.

4.2 Reference year

The production of CLC-Backbone shall generally make use of images from the years 2017/2018, i.e. the Sentinel-2 HR layer stack. It is suggested that the image stack covers one full vegetation season ranging from 2017 to 2018, noting that the vegetation season in the south of Europe already starts in October of the previous year.
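The attribution of delineated objects described for Step 3 and in section 4.1 (a basic land cover code per polygon plus summary attributes) can be illustrated with a short sketch of the first option, summary indicators per segment derived from a pixel-based classification. This is only an illustration under stated assumptions: the function name `summarise_segments`, the tiny grids and the class names are hypothetical, and a real implementation would operate on Sentinel-2 rasters rather than Python lists.

```python
from collections import Counter, defaultdict

def summarise_segments(segment_ids, pixel_classes):
    """Return {segment_id: (dominant_class, {class: percent})}.

    segment_ids and pixel_classes are equally shaped 2-D lists: one holds the
    segment (polygon) label of each pixel, the other its classified land cover.
    """
    counts = defaultdict(Counter)
    for seg_row, cls_row in zip(segment_ids, pixel_classes):
        for seg, cls in zip(seg_row, cls_row):
            counts[seg][cls] += 1

    summary = {}
    for seg, counter in counts.items():
        total = sum(counter.values())
        shares = {cls: 100.0 * n / total for cls, n in counter.items()}
        # Ties are broken alphabetically so the result is reproducible.
        dominant = min(counter, key=lambda c: (-counter[c], c))
        summary[seg] = (dominant, shares)
    return summary

# Hypothetical 3 x 4 pixel example with two segments.
segments = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
classes = [
    ["grass", "grass", "trees", "trees"],
    ["grass", "bare", "trees", "trees"],
    ["grass", "grass", "trees", "water"],
]
result = summarise_segments(segments, classes)
```

In this toy example segment 1 is dominated by "grass" (5 of 6 pixels) and segment 2 by "trees" (5 of 6 pixels); the percentage mixture is kept alongside the dominant class so that mixed segments remain visible.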
4.3 EO data

The production of CLC-Backbone will need to make use of all available Sentinel-2 observations from the ESA service hubs to provide multi-spectral, multi-date, m spatial resolution imagery as input. As the full Sentinel-2 constellation (2A and 2B) will be available operationally from late 2017 onwards, it is anticipated that the last month of 2017 and the full year 2018 will be the first full vegetation period with a comprehensive multi-temporal coverage of Europe. If a full European-wide coverage of VHR data becomes available for 2018, its integration in the processing chain should be considered.

The synergistic use of Sentinel-1 (SAR) imagery shall be considered for enhancing the thematic information, especially on issues like soil properties (wetness), tillage and harvesting activities. Recent studies on the use of merged optical-SAR imagery as well as SAR visual products, and the experiences in HRL production, have confirmed their applicability for both semi-automated and visual interpretation.
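Using all available observations implies working pixel by pixel across the whole image stack rather than scene by scene. The sketch below counts, for each pixel, how many dates offer a clear (cloud-free and shadow-free) observation and takes the median of the clear values. The function name, the tiny arrays and the mask convention (True = clear) are assumptions made for illustration; the specification does not prescribe a compositing method.

```python
from statistics import median

def clear_stats(stack, clear_masks):
    """stack[t][r][c] is a reflectance; clear_masks[t][r][c] is True if clear.

    Returns (count, composite): the per-pixel number of clear observations and
    the median of the clear values (None where no clear observation exists).
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    count = [[0] * cols for _ in range(rows)]
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [
                stack[t][r][c]
                for t in range(len(stack))
                if clear_masks[t][r][c]
            ]
            count[r][c] = len(values)
            if values:
                composite[r][c] = median(values)
    return count, composite

# Hypothetical stack: three dates, one row, two pixels.
stack = [
    [[0.10, 0.80]],
    [[0.12, 0.30]],
    [[0.14, 0.90]],
]
masks = [
    [[True, False]],
    [[True, True]],
    [[True, False]],
]
count, composite = clear_stats(stack, masks)
```

Pixels with a low count show where the optical time series alone is thin and additional data would be needed.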

4.3.1 Pre-processing of Sentinel-2 data

As Sentinel-2 is an optical system, the average cloud coverage influences the number of observations significantly. Nowadays scenes are no longer ordered according to their average cloud coverage; instead, all cloud-free (and shadow-free) pixels of an image can be analysed. The constellation of both Sentinel-2 satellites improves the revisit time to 3-4 days on average in Europe, with more frequent coverage in the north due to the overlaps of the S-2 path footprints.

Each scene has to be pre-processed to correct for atmospheric conditions. ESA has evaluated two different atmospheric correction algorithms for producing an L2A product: Sen2Cor and MAJA. The Sen2Cor algorithm identifies clouds and shadows in a single scene-by-scene approach, whereas MAJA (a combination of MACCS (CNES/CESBIO) and ATCOR) identifies clouds and shadows according to the complete image stack over time. ESA will decide by the end of 2017 which algorithm and which data processing will be implemented for atmospheric correction. Current plans foresee to start atmospheric correction using MAJA in Europe from 1/2018 onwards and to reassess the algorithms in The cloud detection is an important element in the processing chain, but for northern countries the typical cloud coverage may still be a very limiting factor. Therefore, either archive data from longer periods or data from additional sensors (optical and radar) should be considered to fill gaps due to cloud coverage.

4.4 Copernicus and European ancillary data sets

There are a number of potential input sources beyond EO image data that can be used in the delineation of landscape objects. Table 4-1 shows the potential datasets that were analysed to form the majority of the "hard bones" or persistent boundaries in the landscape. Only data that provide an adequate spatial resolution can be considered.
Those data that are finally suggested to contribute to CLC-Backbone are marked in green, whereas all other data will be used to attribute the thematic content of the GRID database in CLC-Core. Geospatial data (especially on roads, rivers, buildings, LPIS and land cover/land use) are in many cases held at national level in a higher level of detail. Some of the European data might be substituted by national data, provided that the criteria concerning technical and licensing issues as described in chapter 0 are met. Besides free and open data at European level, commercial data could also be considered as input, e.g. street data (Navmart HERE maps, etc.) or the TanDEM-X DEM for small woody features and structure information for tree covered areas.

Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which were analysed as potential input to support the construction of the geometric structure (Level 1 "hard bones") of the CLC-Backbone. The products finally proposed for further processing are marked in green.

Product | MMU | Minimum width | Format | Potential use for constructing basic geometry | Reference year
Pan-European
CORINE Land Cover and accounting layers | 25 ha status layer, 5 ha change layer | 100 m | Vector | Outlines of basic vegetation types in remote areas without roads and settlement network | 2012, 6-year update cycle
HRL imperviousness | 20 m pixel (0,04 ha) | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle
HRL tree cover density | 20 m pixel (0,04 ha) | - | Raster | | 2015, 3-year update cycle
HRL forest types | 0,5 ha | - | Raster | Minimum crown cover 10% | 2015, 3-year update cycle
HRL Permanent water bodies | 20 m | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle
European Settlement Map (ESM) | 10*10 m pixel (0,01 ha) | - | Raster | JRC one-off scientific product: 2.5 m VHR imagery; scientific product | reference year 2012
GUF+ | 10*10 m | | Raster | S1 10 m and Landsat 30 m, GUF will be based on S1/S2 |
HRL Small woody feature | 0,002 ha (raster product) | Linear elements | Vector and Raster (5 m and 100 m) | Streamlining halted due to concerns | VHR 2015 2014, no update foreseen
HRL phenology | 10 m | - | Raster | Only attribution in CLC-Core | 2018, likely 3-years
Local Components
Urban Atlas | 0,25 ha - 1 ha | 10 m | Vector | Delineation of roads as polygon feature and outer border of settlement structure (large cities) |

Product | MMU | Minimum width | Format | Potential use for constructing basic geometry | Reference year
Riparian Zone | 0,5 ha | 10 m | Vector | Delineation of rivers and roads as polygon feature. |
N2K product | 0,5 ha | 10 m | Vector | |
RPZ Green linear elements | 0,05 ha - 0,5 ha | < 10 m width, < 100 m length | Vector | Delineation of small woody vegetation elements |
National products
Variety of products for land cover, land use, population, environmental variables | | | | Aggregated data according to EAGLE data model | irregular
Accompanying/Ancillary Layers
IACS/LPIS | Varying from block to parcel level, depending on national system | - | Vector | Freely available and accessible for less than 1/3 of Europe. Increasing availability from year to year due to INSPIRE regulation. Varying thematic content (from reference parcel only to detailed crop types) | 2017, yearly updates
Open Street Map - roads | n.d. | | Line | Linear transportation network (centre lines) | Up-to-date; full time history
Open Street Map buildings | n.d. | | Vector - Polygon | Delineation of single buildings (polygons) | Up-to-date; full time history
Open Street Map land use | TBD. | | Vector - Polygon | Polygon selection of settlement areas and transportation infrastructure according to OSM land use tags | Up-to-date; full time history
HERE maps (Navmart) | n.d. | | Line | Commercial layer to replace OSM or national road database | Regular updates

Product | MMU | Minimum width | Format | Potential use for constructing basic geometry | Reference year
EU Hydro (Copernicus reference layer) | | | Line | Geometric quality derived from VHR images, very high location accuracy | Tbd.
WISE WFD surface water bodies | Waterbodies as polygons, Waterbodyline as line geometry | | Polygon + line | Based on WFD2016 reporting (UK and Slovenia only for viewing) | 6 years (next update 2022)
European coastline | | | Line | Has been produced by Copernicus for HRL production. |
Crowd-sourced data (citizen observatories) | n.a. | | Point | Observations from citizens as promoted through Horizon 2020 research initiatives (e.g. LandSense, groundtruth 2.0) | Irregular

Selected classes of the Local Component products (Urban Atlas, Riparian Zones and Natura 2000) have valuable information to offer for the production of CLC-Backbone. The HRL layers will be used mainly for populating CLC-Core, with one exception that is also used in CLC-Backbone.

The usage of additional geospatial information for constructing Level 1 "hard bones" is based on the following arguments:
- The European landscape is highly transformed by human activity, and transport and river networks (among others) form its basic subdivision.
- The location of transport and river networks is fairly well known from geospatial data at national and/or European scale.
- Many of the borders defined by transport and river networks can also be identified as features directly from Sentinel-2 images, but this identification partly comes with very high costs and lower accuracies.

The geospatial datasets are used in the delineation of Level 1 "hard bones" in two different forms:
- as linear networks that define the border of landscape objects (e.g. parcels that are surrounded by roads and rivers)
- as polygons that define a-priori landscape objects (e.g. wide roads, wide rivers)

Concretely, we suggest the following use of existing Copernicus and ancillary data for the production of CLC-Backbone (i.e. mainly for the provision of geometric information) and CLC-Core (i.e. thematically related information). For more detailed information on the suggested production methodology, please refer to chapter 4.8.
CLC-Backbone:
o Urban Atlas:
  - Use of outer delineation of linear transport infrastructure (class 122xx roads and class 122) for Level 1 hardbone delineation
  - Use of outer delineation of water areas (class water) for Level 1 hardbone delineation
o Riparian Zones:
  - Use of outer delineation of cities as persistent segments (Level 1) for the delineation of settlement areas (class 111xx continuous urban fabric, class 112xx continuous urban fabric, isolated structures and class industrial and commercial units).
  - Use of river delineation (class 911 interconnected water courses and 912 highly modified water courses and 913 separated water bodies) for Level 1 hardbone delineation.
  - Use of outer delineation of cities as persistent segments (Level 1) for the delineation of settlement areas (class 1111 continuous urban fabric, class 1112 dense urban fabric, 1113 low density fabric, 1120 industrial and commercial units).

o Natura 2000:
  - Using, in analogy to RZ and UA, the same classes for delineation of rivers, transport infrastructure and urban outline.
o HRL forest types:
  - Creation of a tree mask based on the HRL forest types, as this layer already applies a useful tree cover density threshold (>30% TCD) and an MMU of 0,5 ha for the Level 1 hardbone delineation of forest boundaries.
o OSM roads:
  - Roads > 10 m: delineation of roads as polygons according to S-2 pixel based information
  - Roads < 10 m: transportation network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)
o EU-Hydro:
  - River network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)
o HRL imperviousness:
  - Derivation of outer boundary of settlements (Level 1 objects)
  - Refined with ESM (European Settlement Map)
o OSM buildings:
  - Enhancement of settlement mask using buffer methods to delineate and integrate OSM single buildings
o OSM land use:
  - Enhancement of road and railway infrastructure using polygon area from land use dataset with specific OSM tags (e.g. "roads"); full OSM tag list available in Annex 1
  - Enhancement of settlement mask in areas where OSM buildings are not complete, using polygon area from land use dataset with specific OSM tags (e.g. "residential"); full OSM tag list available in Annex 1

CLC-Core:
o National data (MS contribution):
  - National datasets (generalized / resampled if not freely available in original resolution)
  - Manual input by MS without readily available sufficient national datasets
  - Land Use data
  - Crop types
  - Land Use intensity

  - Specific data input to be defined for CLC-Core to maintain CLC-heritage and to improve thematic content (e.g. detailed LU, LC or habitat information).
o UA, RZ, N2K:
  - Use of thematic information (aggregated to one single harmonized nomenclature) for populating the grid database.
o all HRLs:
  - Use of thematic information (assigned to EAGLE land cover components) for populating the grid database.
o OSM land use:
  - Attribution of land use according to existing cross-walk between OSM land use tags and CLC nomenclature Level 2
o Crowd-sourced data:
  - Data from citizen observatories are becoming increasingly available, either as validation/verification datasets (e.g. LacoWiki) or as attributive data sources (e.g. FotoQuest GO app, specific campaigns and social media like FlickR, foursquare, etc.)
o CLC accounting layers
o CLC Mineral extraction sites
  - Data from CLC used for identification of thematic features not covered by any other (finer resolution) datasets, e.g. mineral extraction areas, airport, glaciers.
  - Outlines of basic LC types in remote areas without road and settlement network and without forest
o EO parameters (from Sentinel-2 image stacks):
  - This includes daily vegetation trajectories as well as key indicators for the characterization of the growing season (start day, end day) and biological productivity (maximum, amplitude, integral)

Any geometric information ingested in CLC-Backbone from existing products (partially 3-6 years older than current RS data) will be cross-checked and corrected during the image segmentation process. Every geometry that is outdated due to the time lag of the ancillary data, and where recent land cover changes occurred, is updated in the phase of the Level 2 "soft bone" segmentation, where actual EO data are used.

Completeness OSM road data

One major criticism of crowd-sourced data is their incompleteness and varying quality. For the OSM road dataset, recent scientific reviews (e.g.
Barrington-Leigh and Millard-Ball, ) have shown a completeness in Europe of far more than 96% for the large majority of the EEA 39 countries (the completeness figures per country are available in the Annex). According to

their study, only Serbia, Bosnia-Herzegovina and Turkey have completeness values below 90%. For these three countries the road data may need to be refined using additional national data sources. Figure 4-3 below gives an impression of the completeness of OSM in four countries (Estonia, Portugal, Romania and Serbia), based on aerial images and Sentinel-2 background data.

Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova), Romania (near Parta) and Serbia (near Indija) (top to bottom). Bing/Google and EOX Sentinel-2 cloud-free services as background (left to right).

4.5 National data

The use of national data replacing the foreseen European geospatial data for the creation of the CLC-Backbone could be considered if the national data fulfil the following prerequisites:
- Provision of a consistent set of information across European countries, i.e. similar scale and amount of geometric detail, comparable thematic information.
- Provision of data at no cost (or at least at reasonable cost, if the cost reduction in the overall process justifies it).
- Provision of data in a technically useful form (i.e. roads as a connected network, not as DXF-like line work for drawing).
- Provision of data to industrial service providers, potentially outside the country of the data provider.
- Support of the free and open data policy of the end product, including the nationally provided input data.
- Deficiencies in the European dataset concerning completeness and fitness for purpose in the respective country.

If these prerequisites are fulfilled, the national data can be included based on an evaluation of economic (e.g. work load for creation of harmonised datasets) and technical criteria.

Candidates for national data (to be used for Level 1 "hard bone" delineation) are first of all the datasets that fall under INSPIRE Annex I and should be available in a harmonised version from November 2017 onwards:
- Transportation network (including information on tunnels)
- Buildings
- Water courses (running and standing water)
- Coastline

Example: Rivers and lakes

One example of a joint European (EU 28) dataset is the reference spatial data set of the Water Framework Directive (WFD), which is part of the Water Information System for Europe (WISE). It compiles national information reported by the EU Member States into one homogeneous dataset.
According to the reporting cycle of the WFD, which requires an update every 6 years, the dataset is available in a version 2010 and Such datasets may provide a valuable data source for the generation of "hard bones", but still do not provide the optimal solution. In the case of the WISE WFD dataset the coverage is in principle restricted to the EU 28, and it is not even complete within the EU 28, as countries like the UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only available to EEA and EC for visualisation purposes). Compared to the EU-Hydro dataset, it provides a higher geometric quality but is less complete, as smaller rivers with catchments of less than 10 km2 are not included by definition.

4.5.2 Example: roads

The project ELF (European Location Framework) has compiled overviews of existing national services, mainly for INSPIRE Annex I themes (roads, buildings, hydrology). It aims to serve as a single point of access for harmonised reference data from National Mapping, Cadastre and Land Registry Authorities. The INSPIRE regulation will certainly lead to an increased availability and accessibility of European reference data, but at the current point in time the data situation, even for core national data, is still quite patchy and requires large efforts for searching, finding, compiling and harmonising (semantically and geometrically) the various datasets.

The advantage of using national data instead of data available European-wide has to be carefully evaluated with regard to costs and benefits. First of all, it is still a challenge to find the appropriate dataset, and even specialised projects like ELF, which would like to serve as a single entry point, do not manage to provide the full picture. In the case of roads, according to the ELF project only Iceland, Norway, Finland, Denmark and the Czech Republic provide adequate services (Figure 4-4). It is very likely that other national services exist but cannot be found easily. This has to be considered when replacing existing European-wide data sources with national data.

Figure 4-4: Search result for the availability of national INSPIRE transport network (road) services using the ELF interface (Nov. 2017)

Example: land parcel identification system (LPIS)

The land parcel identification system (LPIS) in Europe is not suited to be integrated into the Level 1 "hard bone" process, as the data are currently either
- not available at national level (approx. 50%),
- partly only available at regional level (Spain, Germany),
- not at the same spatial scale across countries,
- not available with the same semantic information (e.g. grassland/arable land differentiation), or
- not accessible for all users / uses.

Figure 4-5 illustrates the availability of LPIS datasets across Europe. Availability in this sense means the possibility to physically download the GIS data in the respective country / region.

Figure 4-5: Overview of available LPIS GIS datasets and their thematic content (Synergise, project LandSense).

4.6 Specification for geometric delineation

The vector geometry of the CLC-Backbone is derived using a two-step approach (Table 4-2):
- The first level ("hardbones") is derived using the partition of landscape based on existing geospatial information (transport network, river network, urban outline).
- The second level ("softbones") is derived by applying an automatic image segmentation within the first level objects.

The resulting polygons have to comply with the MMU and MMW and will be further attributed according to the thematic specification (chapter 4.7).

Persistency in landscape is always a relative term, as landscape objects are subject to various temporal dynamics. Within the traditional CORINE Land Cover mapping approach, changes were perceived as changes of land cover (a change between two categories of the defined nomenclature) within a 6-year period. Such changes can be referred to as status changes, meaning a change between a specific number of years, but not the changes within a selected year. In parallel, however, changes occur at a much higher temporal frequency, e.g. in agricultural management. Those changes are referred to as cyclic changes and represent frequent land cover changes mostly within one

year or one vegetation cycle (e.g. on arable fields from bare soil to vegetated areas and vice versa). Due to the recent availability of multi-temporal observations in the order of 1-2 weeks, these kinds of changes can be observed at an appropriate level of scale (e.g. farm management unit).

Therefore, within CLC-Backbone those features of the landscape are defined as persistent objects that are very unlikely to be altered within one year. These structures (roads, settlements) are very unlikely to be removed, but may be subject to expansion (linear networks: roads, railways and rivers; physical boundaries like the urban outline). A robust reference geometry ("hardbones") built up of the transportation network, the river network (incl. lakes) and the urban outline (and potentially the forest outline) will form the first level of division of the landscape based on a-priori data.

Table 4-2: Main characteristics of Level 1 objects (hardbones) and Level 2 objects (softbones)

 | 1. Level objects | 2. Level objects
Name | Hard bones | Soft bones
Method | Partition of landscape according to existing spatial data | Image segmentation
Input data | Persistent landscape features: transport infrastructure (roads, railways excluding tunnels); river network, lakes, coast line, urban outline, forest outline | Sentinel-2 complete time-stacks of one vegetation period
Special issue | Differentiation between linear networks as border of segments and (larger) linear networks that comprise polygons | Pre-processing (atmospheric and topographic) of S-2 images; cloud detection and cloud free observation in Nordic countries
Advantage | Very good cost/benefit ratio; no doubling of efforts, reuse of existing European geospatial data infrastructure | Independent data source; clearly defined observation period
Disadvantage | Actuality of data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data | Reliability and reproducibility of segmentation algorithms
Concerns | European data vs. national data | Technical feasibility (big data processing); cooperation with data centres necessary

Level 1 Objects hardbones

The idea behind the Level 1 objects ("hardbones") is to derive a partition of the landscape that reflects relatively stable (persistent) structures and borders. The borders of this landscape partition are built from anthropogenic and natural features. The transportation network, i.e. roads and railways (without tunnels), forms the major component, followed by the partly natural and partly man-made structures of rivers and lakes. Other features like the borders of settlements and forests are not as persistent as roads and rivers, but give an additional partition of the landscape that is fairly well known through existing geospatial data. As geospatial data can be outdated depending on the reference year, the hard bone partition of the landscape will not be perfect. But it does not have to be perfect, as it is only an initial step: missing elements and corrections of outdated polygons will be integrated within the second level (soft bones) of the production process, as the second level is based on actual

and independent image segmentation. The reason why the a-priori partition of landscape is promoted here is its better cost-benefit ratio. If image segmentation is used from the very beginning (without the hard bone working step), much more effort is needed to derive already known features. Moreover, some of the features (small roads) are not recognisable at a 10 by 10 m pixel scale, which leads to polygons that are not acceptable for most of the foreseen applications.

When building the spatial framework, there is a long list of potential sources for persistent object geometries. Some can be derived from existing Copernicus products (such as the HRL imperviousness and Urban Atlas) or European reference layers (EU-Hydro), others from additional data sources like crowd-sourced data (OSM) or non-operational scientific products (e.g. European Settlement Map - ESM). National data may replace the named input data if the criteria mentioned under chapter 0 are met. From the long list of input sources, the above-mentioned optimal selection (see Table 4-1) and a most appropriate integration process are chosen to provide the best available spatial framework at pan-European level.

A limited visual inspection and correction of the input data is necessary in this phase for all data that do not comply with the requested accuracies (10 m positional accuracy). As a priority, the wider OSM roads (European major highways and roads wider than 10 m) and the accuracy of the delineation of the local components have to be analysed.

Types of borders

The first step will therefore be to build the Level 1 objects ("hard bones") of the geometric borders from the ancillary data capturing the persistent features of the landscape such as roads, rivers, railway lines etc.
The information source for the hard bones will be existing vector data for roads and railways (OSM) and river networks (CCM / Riparian Zones), supported by ancillary datasets (OSM, HRL imperviousness, ESM) and Copernicus Local Components for the outer boundaries of settlement areas, and by the HRL forest types for tree-covered areas. Technically speaking, the first level objects will identify both persistent line objects and persistent polygon objects.

It is important to note that at this stage only geometric information is taken over from existing geospatial datasets, without any thematic information. Whether a specific border line constitutes a road, river or urban border is irrelevant for the further process in CLC-Backbone, as the thematic classification is derived in the last step completely independently of any existing geospatial input data, solely based on the spectral and textural information from the Sentinel-2 images.

The majority of the transportation network (roads and railways) as well as smaller rivers are represented as a line network, where each line identifies the centreline of the object and presents a border of a Level 1 landscape object.

Roads and rivers wider than the defined minimum mapping width (MMW) of 10 m (still to be defined through the consultation process) have to be represented as polygons.

Rivers: For wider rivers, the delineation in the Riparian Zones 2012 dataset is considered complete and quite accurate within Europe, as the dataset covers all rivers above Strahler level 3 (and will be extended to Strahler level 2 by Q4/2018).

Roads and railways: As a normal two-lane road is not wider than 10 m, the selection of roads as polygons is restricted to the primary road infrastructure (highways). All OSM roads with the tags "motorway" and "trunk" have to be transformed from line features to polygon features using automated buffering combined with a visual inspection and correction step based on the Sentinel-2 data. Roads and railways within tunnels have to be excluded from the hardbone, as they do not establish a recognisable landscape feature on the ground.

Besides the typical linear networks, further persistent features in the landscape are the outer borders of settlements and forest borders.

Settlement borders: A best estimate of the outer borders of settlements can be derived by combining existing datasets: first of all the LoCo Urban Atlas and, outside the coverage of the LoCo Urban Atlas, the HRL imperviousness (refined with the ESM) in combination with a building layer (e.g. OSM). Morphological operations are needed to adapt the combined input layer to the MMUs defined in the CLC-Backbone product specification and to transform the mostly vector-oriented input data to a raster dataset in line with the Sentinel-2 grid system.

Forest borders: For the partition of the landscape through forest borders, all forest polygons from the LoCo product suite (UA, RZ, N2K) can be used.
Outside of the LoCo, the HRL forest types (FTY) layer will be used, as it already applies a specific forest definition with a density threshold of >30% and an MMU of 0.5 ha.

Hierarchical order of borders

As the input borders partly overlap, it is important to define a relative weighting and importance of the input layers. The ordering is derived from the quality, type and geometric representation (line or polygon) of the input data. The following order is specified (with increasing importance):

- HRL forest type (as polygon)
- LoCo forest classes (as polygon, from UA, RZ, N2K)
- Settlement outline (as polygon, from HRL imp, ESM, OSM)
- Road + railway (as line, excluding tunnels)
- Rivers (as line)
- Road + railway (as polygon)
  o from UA higher level streets
  o from UA lower level streets
- Rivers (as polygons)
  o from RZ (code 911)
- Lakes (as polygons, from EU-hydro, WFD, national data)
- Coastline (as line)
- AoI (as line)
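The hierarchical order of borders can be read as a priority overlay: where input layers overlap, the layer that comes later in the order wins. The following is only a minimal sketch of that reading, assuming all layers have already been rasterised to a common grid of boolean masks. The layer keys and the `burn_in` helper are illustrative names, not part of the specification.

```python
# Layer keys follow the order of borders given above (increasing importance).
# The names are illustrative assumptions, not identifiers from the specification.
PRIORITY = [
    "hrl_forest_type",
    "loco_forest",
    "settlement_outline",
    "road_rail_line",
    "river_line",
    "road_rail_polygon",
    "river_polygon",
    "lake_polygon",
    "coastline",
    "aoi",
]


def burn_in(layers, shape):
    """Return a grid that holds, per cell, the most important layer covering it.

    layers: dict mapping a layer key to a 2-D list of booleans (same shape).
    shape:  (rows, cols) of the common grid.
    Cells covered by no layer stay None.
    """
    rows, cols = shape
    result = [[None] * cols for _ in range(rows)]
    for name in PRIORITY:  # later (more important) layers overwrite earlier ones
        mask = layers.get(name)
        if mask is None:
            continue
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    result[r][c] = name
    return result
```

In this sketch a river centreline crossing a forest polygon replaces the forest label in the cells it touches, which mirrors the intention that more reliable geometries take precedence.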

4.6.2 Level 2 Objects softbones

The aim of this second-level segmentation is to derive objects that can be differentiated spatially according to those land cover and land use aspects that influence the spectral response and its behaviour throughout a year. With an increasing number of observation dates per pixel throughout a year, the likelihood of finding homogeneous management units increases, but at the same time pixel-based noise increases, as environmental conditions within a field parcel (e.g. soil moisture content) may vary significantly throughout a year. Service providers will have to balance the number of acquisitions: enough to maximise the identification of single field parcels, but not so many that salt-and-pepper effects due to varying spectral response dominate.

In the ideal case, the Level 2 landscape objects represent consistently managed field parcels and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a year. As an example, different settlement structures (e.g. single houses or building blocks) should appear spectrally different (due to different levels of sealing, shape, spatial extent and built-up pattern) and will therefore be delineated as different polygons. The cultivation measures applied on the land will also influence the delineation: in agriculture, for example, the field structure is visible in multi-temporal images because the temporal-spectral response differs with the management practices applied at field level (mowing, ploughing, sowing).

Accuracy of geometric delineation

The geometric accuracy is defined as follows:

- Appropriate size:
  o Too large polygons: not more than 15% of all polygons
  o Too small polygons: not more than 20% of all polygons
  Any deviation of more than 20% of the polygon area is considered too small or too large.
- Appropriate delineation:
  o Shift of border: max. 20 m shift; not more than 10% of the perimeter is shifted by more than 20 m

4.7 Specification for thematic attribution

The polygons (hard bones and soft bones) derived from the geometric specification are attributed according to a set of simple land cover classes. There are various options to attribute the thematic content of the polygons:

1. Option: Attribution with basic land cover classes based on a pixel-based classification (10 m or 20 m) of all pixels within the object
2. Option: Attribution with basic land cover classes based on spectral properties of the object itself
3. Option: Combination of Option 1 and Option 2
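The polygon size criteria from the specification of geometric accuracy can be expressed as a simple check. The sketch below assumes that each delivered polygon has been matched to a reference polygon and that "deviation" means the relative difference between the two areas. That matching step and the `size_check` helper are assumptions made for illustration only.

```python
def size_check(pairs, tolerance=0.20, max_too_large=0.15, max_too_small=0.20):
    """Evaluate the polygon size criteria for matched polygon pairs.

    pairs: list of (mapped_area, reference_area) tuples in the same unit.
    A polygon counts as too large / too small when its area deviates from the
    reference area by more than `tolerance` (20% in the specification).
    """
    n = len(pairs)
    too_large = sum(1 for mapped, ref in pairs if mapped > ref * (1 + tolerance))
    too_small = sum(1 for mapped, ref in pairs if mapped < ref * (1 - tolerance))
    share_large = too_large / n
    share_small = too_small / n
    return {
        "share_too_large": share_large,
        "share_too_small": share_small,
        # Thresholds from the specification: max. 15% too large, max. 20% too small.
        "passed": share_large <= max_too_large and share_small <= max_too_small,
    }
```

The border shift criterion (not more than 10% of the perimeter shifted by more than 20 m) would need actual geometries and is therefore not covered by this sketch.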

Option 1: pixel-based classification

Similar to the HRLs, a pixel-based classification of crisp land cover classes (no intensities) is conducted based on the Sentinel-2 images. The number and type of land cover classes is discussed below. The object (polygon) is attributed with various statistics derived from the pixel-based classification:

- Dominating class (majority)
- Percentage of class i ... j
- Ordered list of class percentages (3-5 most dominating classes)
- Count of regions per class i ... j (one region is one spatially homogeneous area with only one class type)

Option 2: object-based classification

The objects (polygons) are classified using the same crisp land cover classes as above, not on a per-pixel basis but based on the spectral values of the object itself. Statistical parameters like mean value, standard deviation and combinations of bands (e.g. NDVI) are helpful to define the class.

Option 3: combined pixel-based and object-based classification

Combination of Option 1 and Option 2. In Option 3 a polygon is described using the pixel-based approach and, in addition, by a classification of the polygon itself.

Thematic classes

Accuracy: The accuracy for the thematic differentiation is defined as follows:

- Option 1: 5 classes
  o 90% pixel-based overall accuracy
  o 95% object-based overall accuracy
- Option 2: 9 classes
  o 85% pixel-based overall accuracy
  o 90% object-based overall accuracy

Thematic detail: After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape object is labelled with simple land cover classes. The descriptive nomenclature applied for the labelling contains the following EAGLE-derived Land Cover Components:

Option 1: 5 classes
1. water
2. permanent vegetation
3. transient vegetation
4. bare ground / sealed surface
5. snow / ice
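The per-object statistics listed for the pixel-based attribution (dominating class, class percentages, ordered list, count of regions per class) could be derived roughly as follows. This is a sketch under stated assumptions: the classified pixels of one object are given as a dictionary keyed by (row, column), and a "region" is taken to be a 4-connected group of pixels of the same class. Both function names are hypothetical.

```python
from collections import Counter


def count_regions(pixels):
    """Count 4-connected regions per class with an iterative flood fill."""
    seen = set()
    regions = Counter()
    for start, cls in pixels.items():
        if start in seen:
            continue
        regions[cls] += 1  # a new, not yet visited region of this class
        seen.add(start)
        stack = [start]
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb not in seen and pixels.get(nb) == cls:
                    seen.add(nb)
                    stack.append(nb)
    return dict(regions)


def object_statistics(pixels, top_n=3):
    """pixels: {(row, col): class_label} for all pixels inside one object."""
    counts = Counter(pixels.values())
    total = len(pixels)
    percentages = {cls: 100.0 * n / total for cls, n in counts.items()}
    # Sort by descending share; ties are broken by class label for stability.
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "dominant_class": ranked[0][0],
        "percentages": percentages,
        "top_classes": ranked[:top_n],
        "region_counts": count_regions(pixels),
    }
```

The region count gives a rough indication of how fragmented a class is inside an object, which the dominant class alone does not reveal.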

Option 2: 9 classes
1. Sealed surface (buildings and flat sealed surfaces)
2. Woody coniferous
3. Woody broadleaved
4. Permanent herbaceous (i.e. grasslands)
5. Periodically herbaceous (i.e. arable land)
6. Permanent bare soil
7. Non-vegetated bare surfaces (i.e. rock and screes, mineral extraction sites)
8. Water surfaces
9. Snow & ice

Remark on the class woody: Member countries have expressed the wish to further differentiate the class woody into trees and shrubs. This differentiation would be useful but currently cannot be derived with sufficient accuracy from spectral attributes alone. A robust differentiation is only possible by using a normalised digital surface model (nDSM).

In the CLC-Backbone nomenclature code list no land use terminology has been used. This is on purpose, to avoid semantic confusion between land cover and land use. From a purist's viewpoint only land cover can be derived from remote sensing data. However, due to the multi-temporal observations a number of land use and land management characteristics can be derived as well. Nevertheless, semantic terms like grassland or arable land have to be avoided for land cover. We therefore suggest characterising landscape objects (Level 2 polygons) by classes that rely on land cover definitions only. This is especially true for agricultural areas, where a differentiation of arable and grassland areas is in principle a key element. What can be observed in remote sensing images, however, is only whether a specific agricultural parcel is ploughed within one year, as ploughing means a reduction of biomass and the occurrence of bare soil. Therefore, ploughed parcels can be differentiated from non-ploughed parcels, as the bare soil count is a good indication for the separation of these two classes:

- permanent herbaceous
  o Permanent herbaceous areas are characterised by a continuous vegetation cover throughout a year. No bare soil occurs within a year. These areas are either unmanaged or extensively managed natural grasslands, permanently managed grasslands, arable areas with a permanent vegetation cover (e.g. fodder crops) or even set-aside land in agriculture. For managed grasslands, the biomass will vary over the year, depending on the number of mowing events (grassland cuts) or grazing events.
- periodically herbaceous
  o Periodically herbaceous areas are characterised by at least one land cover change (in the sense of EAGLE land cover components) between bare soil and herbaceous vegetation within one year. Depending on the management intensity, these areas can have several changes between these two EAGLE land cover components within a year. Normally these areas are managed as arable areas.
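The distinction between permanent and periodically herbaceous areas rests on whether bare soil occurs, and whether the cover changes, within one year. A minimal sketch of that rule is given below. It assumes that each parcel already has a chronologically ordered, cloud-free series of per-date land cover components with the two labels `herbaceous` and `bare_soil`. The labels, the function name and the fallback to "permanent bare soil" are illustrative assumptions.

```python
def classify_herbaceous(observations):
    """Classify one parcel from its within-year sequence of cover components.

    observations: chronologically ordered list with the values
    'herbaceous' or 'bare_soil' (one entry per usable observation date).
    """
    # Number of transitions between the two components within the year.
    changes = sum(
        1 for prev, cur in zip(observations, observations[1:]) if prev != cur
    )
    if "bare_soil" not in observations:
        # Continuous vegetation cover, no bare soil within the year.
        return "permanent herbaceous"
    if changes >= 1:
        # At least one change between bare soil and herbaceous vegetation.
        return "periodically herbaceous"
    # Bare soil at every date (assumption: maps to the bare soil class).
    return "permanent bare soil"
```

A real implementation would have to cope with observation gaps and misclassified dates, which can create or hide transitions.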

A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes, followed by an artificial seeding phase to renew the vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered when evaluating shorter time periods, as a managed grassland may happen to be subject to renewal within the observation period of one year. It is differentiated from arable areas by their much higher frequency of ploughing events (either more than once a year, or at minimum twice within 3 years).

All kinds of complex classes, either as a mixture of existing land cover classes (e.g. CLC mixed classes) or complex classes in the sense of ecologically highly diverse classes (e.g. wetlands), are avoided by the classification, as it concentrates solely on classification according to surface structure and properties. A detailed technical description of how to model the temporal changes is given in Annex 1.

Option to be discussed: In addition to the class labels, the following spectral attributes are attached to the landscape objects (in the data model: parametric observations):

- Vegetation trajectories
  o NDVI profiles (e.g. weekly observations) within the vegetation season
- Aggregated indicators for vegetation indices (within vegetation season)
  o Median of NDVI
  o 20% percentile
  o 80% percentile
- Further indicators tbd.
  o Band-specific indicators
  o Time-specific indicators
  o Texture indicators

Important note: As the main emphasis is on the geometric delineation of landscape objects, neighbouring objects may have the same land cover code. They should be kept as unique entities, as they can be differentiated by specific attributes that may be added to these polygons at a later stage (e.g. broadleaved trees vs. needle-leaved coniferous trees).
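The aggregated NDVI indicators proposed as an option (median, 20% and 80% percentile within the vegetation season) are simple to compute once an NDVI profile exists for an object. The sketch below uses the standard NDVI definition, (NIR - Red) / (NIR + Red), and a linear-interpolation percentile. The choice of percentile method and the function names are assumptions, since the specification does not prescribe them.

```python
from statistics import median


def ndvi(nir, red):
    """Normalised Difference Vegetation Index; assumes nir + red > 0."""
    return (nir - red) / (nir + red)


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def season_indicators(profile):
    """profile: NDVI values of one object within the vegetation season."""
    return {
        "median": median(profile),
        "p20": percentile(profile, 20),
        "p80": percentile(profile, 80),
    }
```

Percentiles are less sensitive to single outlier dates (e.g. undetected clouds) than the minimum and maximum, which is presumably why they are proposed here.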
Apart from the technical specifications for the foreseen tender for CLC-Backbone, additional attribution from various data sources is an option for future enrichment of classes. First of all, member countries can be asked to populate the resulting geometries with national data following the prescribed classification system or even more detailed EAGLE land cover components. In addition, crowd-sourced data may be used to characterise the land use within these polygons (either by in-situ point observations or by wall-to-wall datasets, e.g. OSM land use).

4.8 Proposal for production methodology

Within this consultation version (Nov. 2017) of the CLC-Backbone specifications, an outline of a possible production methodology is included to illustrate the processing that is envisaged to generate a detailed geometric structure as a geometric vector skeleton, which is subsequently described according to a simple land cover classification (approx. 10 classes) on a per-pixel level.

The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step (hardbone), boundaries of relatively stable areas are derived based on existing (European-wide or comparable national) data sets. In a second step, image segmentation techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2 objects). The third step contains an attribution of the landscape objects derived from steps 1 and 2, based on a pixel-based pure land cover classification (EAGLE land cover components).

The term geometric skeleton is used to stress that the added value of CLC-Backbone is the breakdown of the landscape into homogeneous entities. These entities represent zones (delineated areas) that can be attributed using spectral indices and classified according to basic land cover classes. The basic land cover classes correspond to a spatial aggregation and temporal sequence of EAGLE land cover components.

Step 1: Definition and Selection of persistent boundaries

As various geospatial data are integrated, their order in a hierarchy is of major importance. The hierarchy of input data defined under "Hierarchical order of borders" may be adapted to improve the quality of the results.

In addition to the centrelines of roads and railways, OSM data delivers a polygon layer expressing the land use. As the land use is labelled using a free text assignment (any text can be attributed to the polygons), it is not modelled using a controlled vocabulary. However, the OSM land use project has shown that it is possible to transform the OSM land use labels into the CORINE Land Cover nomenclature (see Annex 1). For roads, the land use labels "roads" and "traffic", and for railways the label "railway", were used to define the transportation areas (land use class). Tunnels are identified with a special tag in OSM and have to be EXCLUDED from the analysis.
Woody vegetation and settlements are rather persistent landscape features as well. Their outer boundary can be initiated from the HRL forest types and the Local Component products (boundary taken from the aggregation of polygons at Level 1 of the nomenclature). The geometric outline of settlements can be initiated using a density threshold (e.g. >5%) of the HRL imperviousness (refined with the European Settlement Mask, ESM) combined with a buffer-and-shrink gap-filling algorithm (e.g. 50 m) to reduce the typical salt-and-pepper effect, which occurs especially in settlements due to smaller green garden areas within built-up areas. The commission errors in the HRL imperviousness and ESM are relatively low, but omission errors (settlements that are not detected) occur occasionally.

The settlement outline will be amended using the OSM building vector layer and the OSM land use layer (residential and industrial). As the built-up area is always larger than the area occupied by buildings, a buffer of 10 m can be applied to derive a first built-up outline from single OSM buildings. The same buffer-and-shrink operation (using a 40 m buffer) can be applied to this built-up outline. All urban data sets together will form a realistic outer boundary of settlements. The HRL imperviousness has to be refined in this step, as it also includes sealed areas OUTSIDE of settlements (road infrastructure). The combination of the two different settlement layers (HRL with ESM, and OSM) should reveal the most realistic outer boundary of settlements (see footnote 8). The inner differentiation of settlements is derived (1) through the linear networks (transportation and rivers) and (2) through the subsequent image segmentation.

8 But not a perfect outer boundary: post-processing steps will be needed to refine these boundaries.
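The buffer-and-shrink gap filling used for the settlement outline corresponds to a morphological closing (dilation followed by erosion). The sketch below shows the idea on a small boolean raster. It assumes a square window instead of a true circular buffer and expresses the buffer distance in pixels (a 50 m buffer on a 10 m grid would be `radius=5` under that assumption). The function names are illustrative.

```python
def _window(mask, r, c, radius):
    """Yield the values of all in-grid cells within a square window."""
    rows, cols = len(mask), len(mask[0])
    for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
        for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
            yield mask[rr][cc]


def dilate(mask, radius):
    """Buffer: a cell is set if any cell within the window is set."""
    return [
        [any(_window(mask, r, c, radius)) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def erode(mask, radius):
    """Shrink: a cell stays set only if all in-grid cells in the window are set."""
    return [
        [all(_window(mask, r, c, radius)) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def buffer_and_shrink(mask, radius):
    """Morphological closing: fills gaps narrower than about 2 * radius."""
    return erode(dilate(mask, radius), radius)
```

Gaps such as gardens between buildings are closed, while the outer extent of the settlement is restored by the shrink step. Cells outside the grid are ignored, so results at the raster edge should be treated with care.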

The urban outline can be automatically checked for omissions (entirely missing villages or larger parts of villages) using the Global Urban Footprint + (GUF+) raster data set on sealed areas from DLR. Data access conditions to GUF+ are considered as free and open.

For the delineation of the forest outline outside the local components, the HRL forest types layer will be used, as it already applies a specific forest definition with a density threshold of >30% and an MMU of 0.5 ha.

The outcome of this first step shall be a coarse but robust delineation of landscape objects, including the a-priori information on the kind of land cover according to the integrated datasets, which can be used as a further indication for the subsequent classification of smaller landscape objects. Any potential error (systematic or random) that is introduced by the method ("burning in" from ancillary data) can be corrected in the subsequent Step 2, provided that the error is larger than the MMU and is detectable from Sentinel-2 images.

The final delineated objects have to be cleaned according to the mandatory MMU of 0.5 ha. All Level 1 landscape objects smaller than the MMU have to be merged with a neighbouring polygon, using the longest-common-border method. In Annex 2 an illustration of the working procedure is provided. A look-and-feel illustration of the final geometric delineation in step 1 (colour coded according to the source data) is provided in Figure 4-6.

Figure 4-6: Illustration of the results of step 1, the delineation of Level 1 landscape objects. The border separates the area for which Urban Atlas data is available (shaded) from the area where no UA data was available (not shaded).
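The MMU cleaning with the longest-common-border method can be sketched without any geometry library if polygon areas and shared border lengths are already known. The code below is a simplified illustration under that assumption. The data layout (`areas`, `borders`) and the rule of processing the smallest polygon first are choices made for this sketch and are not prescribed by the specification.

```python
def merge_small_polygons(areas, borders, mmu_ha=0.5):
    """Merge polygons below the MMU into the neighbour sharing the longest border.

    areas:   {polygon_id: area in ha}
    borders: {frozenset({id_a, id_b}): shared border length in m}
    Returns the areas after merging; the inputs are not modified.
    """
    areas = dict(areas)
    borders = dict(borders)
    isolated = set()  # small polygons without any neighbour cannot be merged
    while True:
        small = [p for p in areas if areas[p] < mmu_ha and p not in isolated]
        if not small:
            return areas
        pid = min(small, key=lambda p: (areas[p], str(p)))  # smallest first
        neighbours = {}
        for pair, length in borders.items():
            if pid in pair:
                (other,) = pair - {pid}
                neighbours[other] = length
        if not neighbours:
            isolated.add(pid)
            continue
        # Neighbour with the longest common border receives the small polygon.
        target = max(neighbours, key=lambda n: (neighbours[n], str(n)))
        areas[target] += areas.pop(pid)
        # Hand the remaining borders of the merged polygon over to the target.
        for other, length in neighbours.items():
            del borders[frozenset((pid, other))]
            if other != target:
                key = frozenset((target, other))
                borders[key] = borders.get(key, 0.0) + length
```

For example, a 0.25 ha polygon that shares 100 m with one neighbour and 40 m with another is dissolved into the first, and its 40 m border is added to the border between the two remaining polygons.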

A cleaning of endpoints is the last step in the process. All lines with open endpoints have to be removed from the final geometric skeleton.

Step 2: Image segmentation

A second landscape object level (Level 2) represents the soft bones of the landscape and is based on the automatic segmentation of available EO data. These second-level objects subdivide the first-level landscape objects into spectrally and/or texturally homogeneous subunits according to the predefined minimum mapping unit of 0.5 ha. For this second level, assistance and available data from Member States can be foreseen in the long term; as a short-term solution, the image segmentation derived directly from Sentinel-2 HR data can provide a first and intermediate solution.

The segmentation is based on spectral information from multi-temporal EO imagery, which covers spatial differences caused by seasonal variation. It will be used to subdivide the area into spectrally and/or texturally homogeneous and stable spatial units, within the constraints set by the pre-defined MMU and the hard bones framework (Level 1 landscape objects). The resulting segments still represent pixel-based borders and will be vectorised and integrated with the hard bones to produce a polygon map of spectrally homogeneous areas. There is no classification involved at this stage, only segmentation. The Level 2 landscape objects will be used in step 3 as input for the labelling and description of these polygons.

5 CLC-CORE: THE GRID APPROACH

The CLC-Core product should be seen as an underpinning, semantically and geometrically harmonised data container and data modeller that holds, or allows referencing of, land monitoring information from different sources in a grid-based information system. The grid database will essentially store geospatial and thematic data sources, such as CLC-Backbone, CLC, Local Component data, HRLs, in-situ data as well as information provided by Member States. The contents of the grid database can be exploited directly within grid-based analyses; alternatively, subsets of the grid database can be extracted and analysed by spatial queries driven by objects from an external vector dataset. In this way, CLC-Core can support the other elements of the 2nd generation CLC. For instance, a CLC-Backbone object could gain additional thematic information by aggregating or analysing the equivalent portion of CLC-Core as a starting point for CLC+ production (see next chapter).

5.1 Background

Textbooks usually differentiate between raster GIS and vector GIS (Couclelis 9, Congalton 10). The vector GIS, like the paper map, attempts to represent individual objects that can be identified, measured, classified and digitised. The raster GIS attempts to represent continuous spatial fields approximated by a partition of the land into small spatial units of uniform size and shape. The raster GIS therefore has structural properties that make it particularly suitable for use in modelling exercises.

A grid is a spatial model falling somewhere between the vector model and the raster model. The grid looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is characterised by how much it contains of each land cover class or other information.
The grid is, as a result, a more information-rich data model than the raster. The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units of, in this case, 1 x 1 km. The result can either be a raster, where a single CLC class is assigned to each pixel (representing for example the dominant land cover class inside the pixel, or the CLC class found at the centre of the pixel unit). Alternatively, the result is a grid, where an attribute vector representing the complete composition of land cover classes inside the unit is assigned to each grid cell. Geometric information on a more detailed scale than the size of the spatial unit is lost in both cases, but the overall loss of information is less in the grid than in the raster.

The loss of information when using a grid structure can be further reduced if the grid size is selected to be less than the minimum mapping unit of the input product. For instance, with an MMU of 0.25 ha (50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream product derived from the grid-based information must also consider the grid size in relation to its requirements.

9 Couclelis, H. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS. In: Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65-77, Springer Berlin Heidelberg.
10 Congalton, R.G. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63.

Figure 5-1: CLC with a 1 km raster/grid superimposed (top), illustrating the difference between encoding a particular unit as a raster pixel (centre) or a grid cell (bottom). daa is a Norwegian unit: 10 daa = 1 ha.

A grid data model is not the result of mapping in any traditional sense. The grid is populated with thematic information from one or more existing sources (Figure 5-2). One method frequently used to populate grids with LULC information is to calculate the spatial coverage of each LULC class based on a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes can furthermore involve counting or statistical processing of registered data for observations made inside each grid cell (for instance, not only the area of a class, but also the number of objects representing it). Ancillary databases, which provide important and relevant information about the content and context of the grid cells, can also be added, for instance the number of buildings, the length of roads or the number of people found within the grid cells.

5.2 Populating the database

The grid as a spatial model consists of square spatial units of uniform size and shape that can have an (internally) heterogeneous thematic composition, i.e. the grid cells are geometrically uniform polygons. Each grid cell has a unique ID that is related to a database containing the attribute data.
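The difference between a raster and a grid can be made concrete by aggregating a fine class raster into coarser cells in both ways: the raster keeps only the dominant class per cell, the grid keeps the full class composition. This is a minimal sketch, assuming the input is a small 2-D list of class codes whose dimensions are multiples of the aggregation factor. The function name and the use of CLC-like codes in the usage example are illustrative.

```python
from collections import Counter


def aggregate(fine, factor):
    """Aggregate a fine class raster by `factor` pixels per cell side.

    Returns (raster, grid):
      raster: one dominant class code per coarse cell
      grid:   one {class code: area fraction} dict per coarse cell
    """
    n_rows, n_cols = len(fine) // factor, len(fine[0]) // factor
    cells = factor * factor
    raster, grid = [], []
    for big_r in range(n_rows):
        raster_row, grid_row = [], []
        for big_c in range(n_cols):
            counts = Counter(
                fine[r][c]
                for r in range(big_r * factor, (big_r + 1) * factor)
                for c in range(big_c * factor, (big_c + 1) * factor)
            )
            # Raster: dominant class only (ties resolved by lowest code).
            raster_row.append(min(counts, key=lambda cls: (-counts[cls], cls)))
            # Grid: attribute vector with the share of every class present.
            grid_row.append({cls: k / cells for cls, k in counts.items()})
        raster.append(raster_row)
        grid.append(grid_row)
    return raster, grid
```

For the input `[[211, 211, 311, 311], [211, 512, 311, 311]]` with `factor=2`, the raster cell on the left only reports class 211, while the grid cell also records that a quarter of the cell is class 512.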

In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see also chapter 4.4):

- Local Component data (Urban Atlas, N2000, Riparian Zones)
- High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)
- CLC-Backbone
- Any other data, such as CLC (25 ha) or in-situ data

Upon agreement, Member States are also invited to provide their national information to populate the information system, mainly on land use and agricultural themes.

Figure 5-2: Representation of real world data in the CLC-Core.

5.3 Data modelling in CLC-Core

CLC-Core will be the effective engine to model, process and integrate the individual input data sets to create new, value-added information. The modelling will make use of synergies (the same information provided for one cell by different sources) as well as disagreements (different information provided for one cell by different sources) during the data processing.

We refer to the modelling and combination of different layers of the database, including the modelling parameters, as an instance of the database. In this understanding, any output of a database modelling process using different input layers and different parameters will create a specific instance.

Technically, CLC-Core shall:

- Allow further characterisation (e.g. by additional input data / knowledge, see also chapter 5.2 Populating the database).
- Allow for (full or partial) inclusion of Copernicus High Resolution Layers and local component products.
- Allow the best possible transformation of national datasets into CLC+ (capturing the wealth of local knowledge existing in the countries).

Storing information in a grid structure will also help to overcome the problem of different geometries: as the data are converted to grid cells, differences in vector geometries will have a lesser impact.

5.4 Database implementation approach for CLC-Core

The implementation approach for the database of CLC-Core is based on paradigms originating from the semantic web and knowledge engineering. This opens up new possibilities beyond the classical object-relational database management system concept. In the following subsections, we describe the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL) concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that utilises a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we propose strategies to generate CLC-Core products, i.e. flat shapefiles or GeoJSON files, based on a spatio-temporal Triple Store.

Database concepts revisited: from Relational to NoSQL and Triple Stores

The contemporary object-relational database model relies on the work of Codd (1970) 11. Although object-relational database management systems (ORDBMS) have been developed further since the 1970s, their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the Semantic Web, the requirements for databases have changed drastically, which led to the development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012) 12,13.

ORDBMSs are databases that follow the relational model published by Codd (1970). The relational model defines how data are stored, retrieved and altered within databases based on n-ary relations on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an environment, reasoning is done using a two-valued predicate logic. Data stored using the relational model are viewed as tables, where each row represents an n-tuple of the relation itself.
Each column is labelled with the name of the corresponding set (domain). Operations are performed using relational algebra, which allows data transactions to be expressed in a formal way. ORDBMSs are designed to adhere to the ACID principles (atomicity, consistency, isolation and durability), which are defined by Gray (1981) 14. ACID principles ensure that database transactions are performed in a reliable way. In contemporary ORDBMSs, consistency is the central issue limiting performance and the ability to scale horizontally, which is crucial for Web 2.0 or Big Data requirements. In addition, ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data model are hardly possible, so data models in an ORDBMS are regarded as static. Other downsides of contemporary ORDBMSs are the data load performance and the handling of large, high-velocity data volumes with replication. Nevertheless, ORDBMSs are still widely used in business applications, as they guarantee consistency, which is crucial for e.g. bank accounting. In addition, commercial and open-source ORDBMSs are mature products and offer a variety of options and extensions.

11 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13 (6): 377.
12 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
13 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing.
14 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International Conference on Very Large Databases, Cannes, France.

The term NoSQL database emerged in 2009 (Evans, 2009) 15 as the name for a meeting of database specialists on distributed structured data storage, held in San Francisco on June 11 of that year. Since then the term has grown into a widely known umbrella term for a number of different database concepts (Friedland et al., 2011). NoSQL databases have the following characteristics in common:

- non-relational data model
- absence of ACID principles (consistency in particular is replaced by BASE: basically available, soft state, eventually consistent)
- flexible schema
- simple APIs (no join operations)
- simple replication approach
- tailored towards distributed and horizontal scalability

NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require the handling of large data volumes with replication. In addition, NoSQL databases are designed to support horizontal scalability, which is necessary for handling big data with high turnover rates 16. Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo, Facebook, and Twitter.

Types of NoSQL databases are as follows (see e.g. Scholz (2011) 17, Sadalage & Fowler (2012)):

- Column databases: have tables, rows and columns, but the column names and their format can change, i.e. each row in a table can be different (Khoshafian et al., 1987; Abadi, 2007) 18,19. Generally speaking, a wide column store can be regarded as a 2D key-value store. Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hypertable, Google's Bigtable.
- Key-value databases: store data in the form of a key and an associated value (similar to a hash), with strictly no relations. Key-value stores show exceptional performance in terms of horizontal scaling and handling big data volumes. Examples of key-value stores are OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.
- Document databases: rely on the document metaphor, which implies that the data/information is contained in documents that share a given format or encoding. Encodings like XML or JSON are used to represent documents. Each document can be schema-free, which is a key advantage for Web 2.0 applications. Examples of document databases are Apache CouchDB, MongoDB, Cosmos DB (Microsoft) and IBM Domino.

15 Evans, E. (2017): NOSQL. Url: visited:
16 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
17 Scholz, J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior? In: M. Schrenk, V. Popovich & P. Zeile (Eds.), Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany.
18 Khoshafian, S., Copeland, G., Jagodis, T., Boral, H., Valduriez, P. (1987). A query processing strategy for the decomposed storage model. In: ICDE.
19 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR). Url: visited: Nov. 08,

- Graph databases: use the notion of a mathematical graph structure for database purposes (Robinson et al., 2015) 20. A graph consists of nodes and of edges that connect nodes. In a graph database, the data/information is stored in nodes, which are connected via relationships represented by edges. The relations can be annotated, so that a graph database may contain semantic information. The concept allows real-world phenomena to be modelled as graphs, e.g. supply chains, road networks or medical histories of populations. Graph databases are capable of storing semantics and ontologies, especially in the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015) 21. They became very popular due to their suitability for social networking, and are used in systems like Facebook Open Graph, Google Knowledge Graph or Twitter FlockDB (Miller, 2013) 22.
- Multi-model: such databases are capable of supporting multiple NoSQL models/types. Examples are Cosmos DB (Microsoft), Couchbase, Oracle and OrientDB.

Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al., 2009; Heath & Bizer, 2011) 23,24. Starting from a web of documents, Bizer et al. (2009) describe an approach to develop a web of data (i.e. the data-driven web) by publishing structured data such that data from different sources can be interlinked with typed links. Additionally, they formulate four characteristics of linked data:
- They are published in machine-readable form
- They are published in a way that their meaning is explicitly defined
- They are linked to other datasets
- They can be linked from other datasets

In order to fulfil these characteristics technically, Berners-Lee (2009) 25 established a number of Linked Data principles:
- Uniform Resource Identifiers (URIs) are used to denote things
- HTTP URIs shall be used so that things can be referred to and dereferenced
- W3C standards like the Resource Description Framework (RDF) should be used to provide information (subject - predicate - object)
- Data about anything should link out to other data, so that it can contribute to the Linked Data Cloud

RDF is a family of W3C specifications that was originally designed as a metadata model. Over the years it has evolved into a general method for describing resources. RDF is based on the notion of making statements about resources in the form subject - predicate - object, known as a triple. Here the subject denotes the resource, and the predicate represents an aspect of the resource and is therefore regarded as an expression of the relationship between the subject and the object (see Figure 5-3).

20 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data. O'Reilly Media Inc.
21 Lampoltshammer, T. J. & Wiegand, S. (2015). Improving the Computational Performance of Ontology-Based Classification Using Graph Databases. Remote Sensing, vol. 7(7).
22 Miller, J. J. (2013). Graph database applications and concepts with Neo4j. Proceedings of the Southern Association for Information Systems Conference, Atlanta, vol. 2324.
23 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data - the story so far. Semantic services, interoperability and web applications: emerging concepts.
24 Heath, T. and Bizer, C. (2011). Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1).
25 Berners-Lee, T. (2009). Linked Data - Design Issues. Online: visited: Nov. 09,
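The subject - predicate - object form described above can be illustrated with a minimal Python sketch. It is not part of the specification: triples are held as plain tuples in memory, a pattern with None as wildcard plays the role of a very simple query, and identifiers such as ex:cell_42 or ex:hasLandCover are hypothetical names invented for the example.

```python
# Minimal in-memory illustration of RDF-style triples (subject, predicate, object).
# All identifiers below are hypothetical and only serve as example data.
TRIPLES = [
    ("ex:cell_42", "ex:hasLandCover", "ex:Woodland"),
    ("ex:cell_42", "ex:observedIn", "2018"),
    ("ex:cell_43", "ex:hasLandCover", "ex:Water"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples that fit the pattern; None matches any value."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Which resources have land cover Woodland?"
woodland_cells = [s for s, _, _ in match(TRIPLES, p="ex:hasLandCover", o="ex:Woodland")]
```

A real triplestore adds persistent storage, indexing and a query language on top of this idea; the sketch only shows how a statement decomposes into its three parts.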

Figure 5-3: Example of an RDF triple (subject - predicate - object).

To make use of semantics and ontologies in RDF, an RDF Schema (RDFS) or the Web Ontology Language (OWL) can be integrated. These technologies can be used to allow semantic reasoning over RDF triples. Both RDF and RDFS need to be stored in a dedicated data store called a triplestore. Triplestores are databases built for storing and retrieving (RDF) triples through semantic queries. They are specific graph databases, document-oriented stores or specific relational data stores that are designed for RDF data. Many of the available triplestores can store data with spatial and temporal extensions. The following table gives an overview of the available triplestores (both open-source and commercial products):

Table 5-1: Comparison of selected triplestores with respect to spatial and temporal data.

Product name      | Supports spatial data | Supports temporal data
Apache Jena       | Yes                   | No
Apache Marmotta   | Yes                   | Yes
OpenLink Virtuoso | Yes                   | Yes
Oracle            | Yes                   | Yes
Parliament        | Yes                   | Yes
Sesame/RDF4J      | Yes                   | No
AllegroGraph      | No                    | Yes
Strabon           | Yes                   | Yes (stSPARQL)
GraphDB           | Yes                   | No

To query and manipulate RDF data, the W3C standardized language SPARQL is used. SPARQL allows queries to be written that use quantitative (typically proximity-based) relations as well as qualitative relations (spatial "on" and temporal "after", "from"). Spatial and temporal extensions of SPARQL (such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed, but expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014) 26. Nevertheless, a spatial query using GeoSPARQL can look as depicted in Figure 5-4, which shows the GeoSPARQL query "list airports near the city of London". The result of the query is shown as a map and as JSON in Figure 5-5.

26 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In: Temporal Representation and Reasoning (TIME), International Symposium on. IEEE.

One major feature of triplestores and SPARQL is the possibility to create federated queries. A federated query is a single query that is distributed to several SPARQL endpoints (services capable of accepting SPARQL queries and returning results), which compute their results individually. Finally, the originating SPARQL endpoint gathers the individual results, compiles them into one single output and returns it. This makes it possible to query a large number of different data stores that can be geographically dispersed (e.g. over Europe).

PREFIX gn: <
PREFIX xsd: <
PREFIX spatial: <
PREFIX geo: <
SELECT ?link ?name ?lat ?lon
WHERE {
  ?link spatial:nearby( 'km') .
  ?link gn:name ?name .
  ?link gn:featureCode gn:S.AIRP .
  ?link geo:lat ?lat .
  ?link geo:long ?lon
}

Figure 5-4: GeoSPARQL query for airports near the City of London.

Figure 5-5: Result of the GeoSPARQL query of Figure 5-4 as a map and in JSON format.
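The scatter-and-gather behaviour of a federated query can be sketched as follows. This is only an illustration under strong assumptions: the endpoints are simulated by in-memory lists instead of real SPARQL services, the "query" is a simple filter on a land cover value, and all names (ENDPOINTS, query_endpoint, federated_query) as well as the data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "endpoint" is simulated by an in-memory list of result rows.
# A real deployment would send a SPARQL query over HTTP instead.
ENDPOINTS = {
    "France": [{"cell": "fr:1", "cover": "Woodland"}],
    "Germany": [{"cell": "de:7", "cover": "Water"},
                {"cell": "de:9", "cover": "Woodland"}],
    "Austria": [{"cell": "at:3", "cover": "Woodland"}],
}


def query_endpoint(country, cover):
    """Simulate one endpoint evaluating the query on its own data only."""
    return [dict(row, source=country)
            for row in ENDPOINTS[country] if row["cover"] == cover]


def federated_query(cover):
    """Send the same query to every endpoint and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda c: query_endpoint(c, cover), sorted(ENDPOINTS)))
    merged = [row for rows in partial for row in rows]
    return sorted(merged, key=lambda r: r["cell"])


result = federated_query("Woodland")
```

The caller sees one merged result list and does not need to know which endpoint contributed which row, although the source field keeps that information available.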

5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach

The CLC-Core database can be based on a triplestore approach. This approach could exploit the advantages of NoSQL databases (especially triplestores) while overcoming the drawbacks of contemporary ORDBMSs, especially with regard to horizontal scalability and consistency. Hence, we propose a distributed triplestore approach, in which each member country of the European Union could host its own CLC-Core data in an RDF format. Each member country could host a triplestore that provides a SPARQL endpoint and offers the national data to be queried via that endpoint. This approach can help to overcome data integrity and replication issues that might arise with contemporary ORDBMSs. The databases of the member countries can be connected via federated queries, which require only one SPARQL query at a single endpoint (i.e. at exactly one member country's CLC endpoint), see Figure 5-6. Member countries that are not sure whether they should operate a triplestore (and a SPARQL endpoint) with the CLC-Core data can also ask another country to host their data on its infrastructure. Such a concept would give CLC+ the flexibility to react to changing situations in EU member countries, e.g. regarding the ability to host the CLC dataset. For example, Austria could host the CLC+ data of another country that does not have the resources to develop the infrastructure for hosting and publishing the data.

[Figure 5-6 diagram: SPARQL endpoints for Germany, France, Austria and Italy (each labelled "CLC+ SPARQL"), with arrows for the SPARQL query, the federated SPARQL queries and the flow of result data.]

Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.

Such an approach requires the development of an underlying shared semantic model, i.e. an ontology. An abstract semantic model in RDFS or OWL can be developed that describes the abstract model of CLC+ (e.g. classes, properties, etc.) and can itself be stored in a triplestore. This allows the abstract semantic model to be used in queries (semantic reasoning based on the developed knowledge base). An example of semantic reasoning could be the determination of the land cover type based on a number of properties of grid cells.
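As a rough illustration of deriving a land cover type from grid cell properties, the following Python sketch applies an ordered list of rules to a cell. It is a strong simplification of ontology-based reasoning in RDFS or OWL, and the property names (imperviousness, water_fraction, tree_cover), the thresholds and the simplified class labels are assumptions made for the example, not values from the CLC+ model.

```python
# Ordered rules: the first rule whose condition holds determines the class.
# Property names and thresholds are hypothetical, for illustration only.
RULES = [
    ("Sealed", lambda c: c["imperviousness"] >= 50),
    ("Water surfaces", lambda c: c["water_fraction"] >= 50),
    ("Woody vegetation", lambda c: c["tree_cover"] >= 30),
]


def infer_land_cover(cell, default="Unclassified"):
    """Return the first class whose rule matches the cell properties."""
    for label, condition in RULES:
        if condition(cell):
            return label
    return default


urban_cell = {"imperviousness": 80, "water_fraction": 0, "tree_cover": 5}
forest_cell = {"imperviousness": 0, "water_fraction": 0, "tree_cover": 70}
open_cell = {"imperviousness": 5, "water_fraction": 10, "tree_cover": 10}

labels = [infer_land_cover(c) for c in (urban_cell, forest_cell, open_cell)]
```

In a triplestore the rules would be expressed in the shared ontology rather than in application code, so that every endpoint derives the same class from the same properties.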

5.4.3 Processing and Publishing of CLC-Core Products

To generate CLC+ products based on the distributed triplestore architecture, the products requested by the customers need to be defined. Based on the defined products, a migration/transition strategy needs to be developed that describes the data flow from the triplestores to the desired product. We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc queries and analyses, whereas customers would prefer a single flat file containing the CLC+ data. For historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated automatically by using a spatial Extract, Transform and Load (ETL) tool like SAFE FME 27, GeoKettle 28 (open source) or GDAL/OGR 29 (open source). The migration can be performed for e.g. different geographical or temporal coverages. The resulting files can be hosted and published on a central server, where they are available to the public (see Figure 5-7).

[Figure 5-7 diagram: Triple Store #1, Triple Store #2 ... Triple Store #n feed a spatial ETL tool, which writes SHAPE files and GeoJSON files per country (France, Germany, Austria, Finland) to a public web server.]

Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.

Conclusion and critical remarks

The intended fine spatial and temporal resolution of CLC+ will result in high data volumes with high turnover rates (velocity), which are two characteristics of Big Data 30. It therefore seems advisable to take the nature of these datasets into account when planning a sustainable architecture for CLC+. As contemporary ORDBMSs are not designed for Big Data applications, which manifests itself in their weaknesses regarding horizontal scaling, consistency and replication, we advise a deeper look into NoSQL concepts. Triplestores with RDF schemas in particular seem appropriate to cope with the requirements of CLC+ with respect to data volume, data turnover rate, distributed architecture and semantic queries (and reasoning). Although NoSQL datastores are widely used in business applications at Facebook, Twitter, Amazon and Google, the technology and its theoretical foundation are still under active development. Hence, NoSQL databases have not yet reached the technological maturity of contemporary ORDBMSs like Oracle or PostgreSQL. In our eyes, this is both a weakness and an opportunity. By using NoSQL databases, CLC+ would be at the edge of technological development and would be able to contribute to the future of NoSQL databases and to strengthen their spatio-temporal abilities.

30 Hilbert, M. (2015): "Big Data for Development: A Review of Promises and Challenges". Development Policy Review. martinhilbert.net. visited: Nov. 08,
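The export step of Section 5.4.3, from queried records to a flat file, can be sketched for the GeoJSON case with the Python standard library alone. This is a minimal sketch and not a replacement for an ETL tool: the record layout (cell id, lower-left corner, cell size, land cover), the function names and the coordinates are hypothetical, and the coordinates are assumed to be longitude/latitude in degrees.

```python
import json


def cell_to_feature(cell_id, lon_min, lat_min, size, land_cover):
    """Build one GeoJSON Feature for a square grid cell (closed exterior ring)."""
    ring = [
        [lon_min, lat_min],
        [lon_min + size, lat_min],
        [lon_min + size, lat_min + size],
        [lon_min, lat_min + size],
        [lon_min, lat_min],  # first and last position must be identical
    ]
    return {
        "type": "Feature",
        "id": cell_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"land_cover": land_cover},
    }


def to_geojson(cells):
    """Wrap all cells in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection",
            "features": [cell_to_feature(*c) for c in cells]}


# Hypothetical records as they might be returned by a query.
cells = [("cell_1", 16.0, 48.0, 0.001, "Sealed"),
         ("cell_2", 16.001, 48.0, 0.001, "Water surfaces")]
collection = to_geojson(cells)
geojson_text = json.dumps(collection)
```

Writing geojson_text to a file per country and reference year would give the kind of flat product file described above; shapefile or raster output would need a dedicated library or ETL tool.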

6 CLC+ - THE LONG-TERM VISION

CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in Europe. CLC+ shall meet today's and tomorrow's requirements for European LC/LU information, building upon the EAGLE concept and expertise, while this 2nd generation CLC still retains the capability to guarantee backwards compatibility with the original CLC datasets. CLC+ shall be understood as one instance (i.e. realisation) of CLC-Core (possibly within the context of CLC-Backbone), enabling it to address various reporting obligations, such as the EU Energy Union, the Paris Agreement, MAES and LULUCF, to name just some examples. Many other land monitoring products with various nomenclatures and spatial frameworks may also be defined on the basis of CLC-Core.

Technically, CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m), aggregated from CLC-Core. Due to the higher geometric resolution, the use of some of the often debated heterogeneous classes, like "complex cultivation patterns" and "land principally occupied by agriculture, with significant areas of natural vegetation", will be significantly reduced, if not disappear.

As the original CLC nomenclature is more than a pure land cover classification, information other than that currently provided by the Copernicus Land Monitoring Service will also be needed. Table 6-1 provides an overview of the 44 CLC classes and of the type of information that would still be needed to derive a specific CLC class in addition to the information provided by the Local Components, the High Resolution Layers and the land cover information collected by CLC-Backbone. The Hungarian test case 31 for implementing the EAGLE concept in a bottom-up CLC production has clearly shown that the content and quality of a bottom-up derived CLC ultimately depend on input data on:
- land use (critical);
- crops and use intensity (e.g. from LPIS).

Obviously, the situation is different for each CLC class and also depends on the available Copernicus data (i.e. UA, RZ or N2K), but Table 6-1 clearly shows that almost no CLC class can be derived without additional information on the use aspect of the area. In almost all cases this information gap can be closed with data from Member States. Where MS data are not available, a few classes can be approximated via European-level information, but most certainly at the expense of data accuracy and precision. The national information should be provided as input to CLC-Core. Where national land use information is missing and no other European-level replacement data exist, traditional CLC data will be used as a back-up solution. Areas with obvious contradictions between land cover (from CLC-Backbone) and traditional OBS classes will need to undergo special post-processing during the creation of CLC+.

31 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 on the European Environment Agency and the European Environment Information and Observation Network. Final report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country.

Table 6-1: List of CLC classes and requirements for external information

The update cycle of CLC+ is proposed to be consistent with the update cycles of the major input data sets (i.e. the 3 year cycle of the LoCo or the 3-6 year cycle of CLC-Backbone). Apart from that, as CLC+ is just an instance of CLC-Core, it can be derived dynamically upon request.
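Deriving a coarser raster "upon request" from finer grid cells can be sketched as a block aggregation that reports the dominant class and the class shares per output cell. The sketch below makes several assumptions that are not in the specification: the fine grid is a small list of lists with single-letter class labels, ties are resolved alphabetically, and a factor of 2 is used for readability (aggregating 10 m cells to 1 ha cells would correspond to a factor of 10).

```python
from collections import Counter


def aggregate(grid, factor):
    """Aggregate factor x factor blocks to (dominant class, percentage shares)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            counts = Counter(block)
            # sorted() makes ties deterministic: the alphabetically first class wins
            dominant = max(sorted(counts.items()), key=lambda kv: kv[1])[0]
            shares = {k: 100.0 * v / len(block) for k, v in counts.items()}
            out_row.append((dominant, shares))
        out.append(out_row)
    return out


# Hypothetical fine grid: W = woody, S = sealed, G = grassland, A = arable
fine = [
    ["W", "W", "S", "S"],
    ["W", "G", "S", "S"],
    ["G", "G", "A", "G"],
    ["G", "G", "A", "A"],
]
coarse = aggregate(fine, 2)
```

Keeping the class shares next to the dominant class preserves some of the information that a purely categorical 1 ha cell would lose.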

7 CLC-LEGACY

The history of European land monitoring is a story of permanent evolution of approaches and concepts, resulting in different solutions for individual needs and specifications. In this context, the CORINE Land Cover (CLC) specifications provided the first set of de-facto standards accepted across Europe, i.e. a nomenclature of land cover classes, a geometric specification, an approach for land cover change mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC represents a unique European-wide, if not global, consensus on land monitoring, supported by 39 countries.

Being the thematically most detailed wall-to-wall European LC/LU dataset and representing a time series starting in the early 1990s, CLC data became the basis of several analyses and indicator developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that the planned new European land monitoring system (CLC-Core & CLC+) has to ensure the maximum possible backwards compatibility with standard CLC data, i.e. CLC-Legacy.

On the other hand, a comparison of the different CLC production methods, i.e. traditional visual photo-interpretation vs. national bottom-up solutions, shows that those specific methodologies result in CLC data with slightly different characteristics in terms of LC/LU statistics or fine geometric detail, even though the general CLC specifications are kept. Consequently, it should be specified in what sense the outputs of the new land monitoring system may and should be consistent with CLC-Legacy.

Although original CLC vector data are still used, mainly for cartographic purposes, the most commonly used versions of CLC data are the raster CLC status and change layers. Moreover, for the purposes of LEAC analysis, CLC information has been transformed into CLC Accounting Layers.

The specific characteristics and the variety of the CLC creation and update methodologies mean that, while CLC change data represent land cover changes between two given reference dates well, there are significant "breaks" in the continuity of the CLC time series in both the statistical and the geometric sense. The solution applied for the harmonization of CLC time series is based on the idea of combining CLC status and change information to create a time series of CLC / CLC-change layers of homogeneous quality for accounting purposes, fulfilling the relation:

CLC change = CLC accounting new status - CLC accounting old status

Because the Minimum Mapping Unit (MMU) is different for CLC status (25 ha) and CLC change (5 ha) layers, the resulting CLC accounting layers may in several locations include more spatial detail than the original CLC status layers. Although this contradicts the original CLC specifications, CLC accounting layers are still considered the most consistent harmonized time series of CLC data.

CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine details are added to this layer from the formation codes of previous CLC-change inventories. All the former CLC accounting status layers are calculated backwards by overlaying CLC-change consumption codes.
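The backward calculation described above can be sketched on a tiny raster. The sketch assumes that each change cell carries a pair (consumption code, formation code), i.e. the class before and after the change, and that cells without change are marked None; the grids and the function name previous_status are hypothetical, and the CLC codes are used only as example values.

```python
def previous_status(new_status, change):
    """Derive the older status layer from the newer one and the change layer.

    change[r][c] is None where nothing changed, otherwise a tuple
    (consumption_code, formation_code) = (class before, class after).
    """
    return [
        [ch[0] if ch is not None else new for new, ch in zip(status_row, change_row)]
        for status_row, change_row in zip(new_status, change)
    ]


# Hypothetical 2 x 2 example: one cell changed from 211 (arable land)
# to 112 (discontinuous urban fabric) between 2006 and 2012.
status_2012 = [[112, 211],
               [311, 311]]
change_2006_2012 = [[(211, 112), None],
                    [None, None]]

status_2006 = previous_status(status_2012, change_2006_2012)

# Cells that differ between the two status layers are exactly the change cells,
# which is the relation "change = new status - old status" in raster form.
changed_cells = [(r, c)
                 for r in range(2) for c in range(2)
                 if status_2006[r][c] != status_2012[r][c]]
```

Repeating the step with the 2000-2006 change layer would yield the 2000 accounting status, and so on back through the time series.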

The backwards compatibility of CLC+ with CLC-Legacy is foreseen to be ensured in a similar way. This method is already applied in practice, since some of the member states produce CLC layers in a bottom-up way, and these data are then combined with manually interpreted CLC changes. The applied methods of bottom-up CLC creation differ, but they are variations of the basic idea that CLC+ relies on.

In this sense, the aim of CLC-Legacy is NOT to reproduce as closely as possible the original 25 ha MMU vector CLC as it would be if created continuously by visual photo-interpretation. Instead, CLC-Legacy will be created by considering the following aspects:
- CLC-Legacy is to be created as a 100 m resolution raster layer, representing the CLC(-like) status at a certain reference date.
- CLC-Legacy will be derived from CLC-Core (and potentially CLC+) in a way that ensures homogeneous and consistent content and quality across Europe. Note that several CLC layers are already derived on similar principles in countries using bottom-up methods; all the outputs correspond to the basic CLC specifications, but have slightly different characteristics in terms of geometry and statistics.
- Instead of the 25 ha MMU vector CLC, CLC accounting layers are considered as reference layers concerning the targeted content:
  o Standardized raster generalization is applied to CLC-Core outputs to reach a 5 ha MMU (instead of 25 ha), corresponding to CLC-change layers.
  o Complex CLC classes, like 243, have less significance. A standard methodology is to be developed to reproduce relevant information contained in some other complex CLC classes, like 244.
  o CLC status layers of past reference dates (2012, 2006, 2000, ...) are to be created by combining CLC-Legacy with previous CLC-change data.

The above concept assumes that in the future, besides the update of the CLC-Core and CLC+ status, CLC-compatible change information is to be derived as well. Obviously, there are several methodological and technical details to be clarified and solved during upcoming practical test exercises.

7.1 Experiences to be considered

The following experiences will be considered for establishing the grounds for creating CLC-Legacy:
- CLC accounting layers are used as input for the EEA's Land and Ecosystem Accounting (LEAC) system. The methodology for creating these layers, as well as for the raster generalization of CLC accounting layers, was developed by ETC/ULS 32.
- Methodologies for bottom-up creation of CLC data have been developed by several CLC national teams. This includes spatial and thematic aggregation of in-situ based high resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain). In most cases, rules were developed to allow thematic conversion in an automated or semi-automated fashion. For spatial generalisation different techniques are available, depending on the starting point in each of the countries.

Figure 7-1 to Figure 7-3 illustrate different techniques and results of national generalisation approaches to produce standard CLC from more detailed data.

32 Generalization of adjusted CLC layers. ETC/ULS report.

Figure 7-1: Generalisation technique applied in Norway, based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data.

[Figure 7-2 panels: Original 3000 m²; Level III 1 ha; Level III 5 ha; Level III 25 ha]

Figure 7-2: Generalization levels used in the LISA generalization in Austria

Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany

Specific example

During the transition phase from traditional CLC to CLC-Legacy, special attention needs to be paid to the handling of geometries and to the distinction between resulting geometric differences and real changes. Figure 7-4 shows the results from Germany, where the process changed from mapping to a bottom-up approach using existing national data. While there are very few changes on the left side of the river (located in Luxembourg), the German side clearly shows some differences due to the changed approach. The CLC change layer is then used to differentiate technical changes from real changes. This step is needed once to make this transition. Future updates using the bottom-up approach will be consistent with the 2012 situation.

[Figure 7-4 panels: CLC 2012; CLC 2006]

Figure 7-4: Result of the generalisation process in Germany with technical changes in different classes
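The expand-and-shrink idea named in the caption of Figure 7-1 can be illustrated for raster data with a small Python sketch. It is only an assumption about the general principle (a morphological closing on a binary mask with an 8-cell neighbourhood) and does not claim to reproduce the Norwegian implementation; the grid and the function names are hypothetical.

```python
def _neighbourhood(grid, r, c):
    """Yield the values of the cell and its in-grid 8-neighbours."""
    rows, cols = len(grid), len(grid[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield grid[rr][cc]


def expand(grid):
    """A cell becomes 1 if any cell in its neighbourhood is 1."""
    return [[int(any(_neighbourhood(grid, r, c))) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def shrink(grid):
    """A cell stays 1 only if every cell in its neighbourhood is 1."""
    return [[int(all(_neighbourhood(grid, r, c))) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def expand_then_shrink(grid):
    return shrink(expand(grid))


# Hypothetical class mask: a 3 x 3 patch with a one-cell hole in the middle.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1
mask[3][3] = 0

generalised = expand_then_shrink(mask)
```

The outline of the patch is unchanged while the small hole is absorbed, which is the kind of simplification a minimum mapping unit calls for; narrow gaps between neighbouring patches of the same class would be closed in the same way.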

8 TECHNICAL SPECIFICATIONS

8.1 CLC-Backbone

CLC-Backbone [Example image if available.]

Description: CLC-Backbone is a spatially detailed, large scale, EO-based land cover inventory in a vector format providing a geometric backbone with limited, but robust thematic detail on which to build other products.

Input data sources (EO & ancillary)
EO
  Overview
    Spatial resolution: m
    Spectral content: Visible, NIR, SWIR, SAR
    Temporal revisit: Bi-monthly to achieve monthly cloud-free acquisitions
    Acquisition approach: likely to be multi-mission
  Candidate missions: Sentinel-2 (Landsat 8)
  Contributing data: Sentinel-1
Ancillary data
  - OSM roads and railways (or comparable national datasets)
  - EU Hydro (or WISE WFD)
  - European coastline
  - OSM buildings
  - OSM land use
  - HRL imperviousness (refined with ESM)
  - LoCo Riparian Zones
  - LoCo Urban Atlas

Thematic Content: EAGLE Land Cover Components (Option 2, 9 classes):
  - Sealed
  - Woody vegetation - coniferous
  - Woody vegetation - broadleaved
  - Permanent herbaceous (i.e. grasslands)
  - Periodically herbaceous (i.e. arable land)
  - Permanent bare soil
  - Non-vegetated bare surfaces (i.e. rock and screes)
  - Water surfaces
  - Snow & ice

Methodology: The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step (hardbone), boundaries of relatively stable larger areas are detected based on existing data sets. In a second step, image segmentation techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2 objects). The third step contains a classification and spectral-temporal attribution (Sentinel-1 and Sentinel-2) of the landscape objects derived from steps 1 and 2, using a pixel-based classification of the land cover components. The pixel-based classification is generalized at object level using indicators like the dominating land cover class and the percentage mixture of land cover classes.

Geometric resolution (Scale): ~1:

Geographic projection / Reference system: European Terrestrial Reference System 1989 (ETRS89)

Coverage: EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular CLC dataflow. The number of countries involved is 39. The area covered is approximately 6 million km².

Geometric accuracy (positioning accuracy): Less than half a pixel based on 10 m spatial resolution input data

Spatial resolution / Minimum Mapping Unit (MMU): 0.5 to 5 ha for final product

Min. width of linear features: m

Topological correctness: [TBD]

Raster coding: [n/a]

Thematic accuracy (in %) / quality method: The target will be set at 90 %. A quantitative approach will be used, based on a set of stratified random point samples compared to external datasets (e.g. GoogleEarth, national orthophotos or national grassland datasets).

Target / reference year: 2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Up-date frequency: [TBD]

Information actuality: Plus and minus 1 year from target year. A longer time window may be required for the ancillary datasets.

Delivery format: Vector

Data type: Vector

Naming conventions: [TBD]

Medium: [TBD]

Delivery reliability: [TBD]

Delivery time: [TBD]

Archive: [TBD]
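A check of the 90 % thematic accuracy target against stratified random point samples could be computed along the following lines. This is a sketch under assumptions: the estimator weights the per-stratum share of correct samples by the area share of each stratum, and the strata, weights and sample counts are invented for illustration; the actual validation design is not defined in this specification.

```python
TARGET = 0.90  # thematic accuracy target named in the table above


def stratified_accuracy(strata):
    """Area-weighted overall accuracy.

    strata: list of (area_weight, n_correct, n_samples); the weights are
    normalised, so they do not have to sum exactly to 1.
    """
    total_weight = sum(w for w, _, _ in strata)
    return sum(w * n_correct / n for w, n_correct, n in strata) / total_weight


# Hypothetical validation result for three strata.
strata = [
    (0.5, 45, 50),  # e.g. agricultural areas: 90 % of points correct
    (0.3, 27, 30),  # e.g. forest: 90 % of points correct
    (0.2, 16, 20),  # e.g. urban: 80 % of points correct
]

accuracy = stratified_accuracy(strata)
meets_target = accuracy >= TARGET
```

With these example numbers the weighted accuracy is 0.88, so the product would miss the target even though two of the three strata reach 90 % on their own.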

8.2 CLC-Core

CLC-Core [Example image if available.]

Description: CLC-Core is a consistent, multi-use grid database repository for environmental information populated with a broad range of land cover, land use and ancillary data, forming the information engine to deliver and support tailored thematic information requirements.

Input data sources
CLMS products
  - Corine Land Cover: CLC1990, CLC2000, CLC2006, CLC2012
  - High Resolution Layers (Imperviousness, Forest, Grassland, Wetland & Water)
  - Reference datasets (EU-DEM, EU-Hydro)
  - Related pan-European products (ESM)
  - Local components (Urban Atlas, Riparian Zones, Natura 2000, Coastal)
Ancillary data
  - Open Street Map
  - Member State data, particularly land use information

Thematic Content: Input data sources stored within a version of the EAGLE data model.

Methodology: TBD, see chapter 5.4

Geometric resolution (Scale): TBD

Geographic projection / Reference system: European Terrestrial Reference System 1989 (ETRS89)

Coverage: EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular CLC dataflow. The number of countries involved is 39. The area covered is approximately 6 million km².

Geometric accuracy (positioning accuracy): Less than half a pixel based on 10 m spatial resolution input data

Spatial resolution / Minimum Mapping Unit (MMU): 0.5 ha to 1 km² for final product

Min. width of linear features: Grid size

Topological correctness: Grid spatial structure

Raster coding: [n/a]

Thematic accuracy (in %) / quality method: TBD

Target / reference year: 2018

Up-date frequency: [TBD]

Information actuality: Plus and minus 1 year from target year. A longer time window may be required for the ancillary datasets.

Delivery format: Grid

Data type: Grid

Naming conventions: [TBD]

Medium: [TBD]

Delivery reliability: [TBD]

Delivery time: [TBD]

Archive: [TBD]

9 ANNEX 1: OSM DATA

9.1 Crosswalk between OSM land use tags and CLC nomenclature Level 2

Source: Schultz et al. (2017): Open land cover from OpenStreetMap and remote sensing, Int J Appl Earth Obs Geoinformation 63 (2017)

9.2 OSM road nomenclature

The following table identifies all OSM road types that should be considered for CLC-Backbone Level 1 delineation. Road types that are subject to visual control, because they are large enough to be represented as area features (>= 10 m width), are marked in the second column. In many countries, most roads are tagged as unclassified. Only highway = motorway/motorway_link implies anything about quality.

CLC-Backbone | Potential 10m width | Key | Value | Comment
y | 10m | highway | motorway | A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc.
y | | highway | trunk | The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.)
y | 10m | highway | primary | The next most important roads in a country's system. (Often link larger towns.)
y | | highway | secondary | The next most important roads in a country's system. (Often link towns.)
y | | highway | tertiary | The next most important roads in a country's system. (Often link smaller towns and villages.)
y | | highway | unclassified | The least important through roads in a country's system, i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.)
y | | highway | residential | Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing.
y | | highway | service | For access roads to, or within, an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances.

Link roads
y | 10m | highway | motorway_link | The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions.
y | | highway | trunk_link | The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway.
y | | highway | primary_link | The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway.
y | | highway | secondary_link | The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway.
y | | highway | tertiary_link | The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway.

Special road types
y | | highway | track | Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track.
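The selection rule in the table above can be sketched as a simple tag filter. This is only an illustration: the names LEVEL1_HIGHWAY_VALUES, POTENTIAL_AREA_FEATURES and classify_way are hypothetical, the input is assumed to be a plain dictionary of OSM tags per way, and the 10 m flags are copied as they appear in the table.

```python
# Hypothetical helper names; tag values and 10 m flags follow the table above.
LEVEL1_HIGHWAY_VALUES = {
    "motorway", "trunk", "primary", "secondary", "tertiary",
    "unclassified", "residential", "service",
    "motorway_link", "trunk_link", "primary_link",
    "secondary_link", "tertiary_link",
    "track",
}

# Road types flagged "10m" in the table: candidates for an area representation.
POTENTIAL_AREA_FEATURES = {"motorway", "primary", "motorway_link"}


def classify_way(tags):
    """Return (use_for_level1, needs_visual_control) for one OSM way."""
    value = tags.get("highway")
    use_for_level1 = value in LEVEL1_HIGHWAY_VALUES
    needs_visual_control = use_for_level1 and value in POTENTIAL_AREA_FEATURES
    return use_for_level1, needs_visual_control


# Example ways (assumed input format: one tag dictionary per way).
ways = [
    {"highway": "motorway"},
    {"highway": "residential"},
    {"highway": "footway"},   # not listed in the table, so it is skipped
    {"building": "yes"},      # no highway key at all
]
results = [classify_way(tags) for tags in ways]
```

Keeping the two sets separate means the 10 m flags can be revised after visual control without touching the list of road types used for delineation.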

9.3 OSM roads completeness

Estimation of OSM completeness according to the parametric estimation in the study of Barrington-Leigh and Millard-Ball (2017). A sigmoid curve was fitted to the cumulative length of contributions within OSM data and used to estimate the saturation level for each country.

country | ISO-Code | frccomplete_fit | OSM length [km]
Serbia | SRB | 74.3% |
Bosnia and Herzegovina | BIH | 86.5% |
Turkey | TUR | 89.7% |
Sweden | SWE | 92.7% |
Norway | NOR | 93.6% |
Hungary | HUN | 93.8% |
Romania | ROU | 94.3% |
Ireland | IRL | 96.0% |
Spain | ESP | 96.2% |
Liechtenstein | LIE | 96.3% | 359
Croatia | HRV | 96.8% |
Bulgaria | BGR | 96.9% |
Cyprus | CYP | 97.0% |
Albania | ALB | 97.6% |
Italy | ITA | 97.9% |
Belgium | BEL | 98.3% |
Lithuania | LTU | 98.5% |
Poland | POL | 98.5% |
Iceland | ISL | 98.6% |
France | FRA | 99.9% |
Denmark | DNK | 100.3% |
Estonia | EST | 100.3% |
Portugal | PRT | 100.4% |
United Kingdom | GBR | 100.4% |
Netherlands | NLD | 100.5% |
Greece | GRC | 100.8% |
Slovakia | SVK | 101.4% |
Austria | AUT | 101.4% |
Luxembourg | LUX | 101.5% |
Germany | DEU | 101.6% |
Latvia | LVA | 101.7% |
Montenegro | MNE | 102.2% |
Macedonia (FYROM) | MKD | 102.4% |
Switzerland | CHE | 103.0% |
Czech Republic | CZE | 104.0% |
Finland | FIN | 105.3% |
Kosovo | XKO | 107.4% |
Malta | MLT | 109.8% |
Slovenia | SVN | NoData |
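The idea of fitting a sigmoid to cumulative mapped length can be illustrated with a minimal sketch. It is not the procedure of the cited study: the logistic form, the coarse grid search and the noise-free synthetic series below are all assumptions made for illustration. For each candidate growth rate and midpoint, the saturation level that minimises the squared error has a closed form, so only two parameters need to be searched.

```python
import math


def logistic_shape(t, rate, midpoint):
    """Logistic curve scaled to a saturation level of 1."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_saturation(years, cum_km):
    """Grid-search rate and midpoint; solve the saturation level in closed form."""
    best = None
    for rate in (i / 20 for i in range(1, 61)):                    # 0.05 .. 3.00
        for midpoint in (min(years) + j / 4 for j in range(81)):   # 20-year span
            shape = [logistic_shape(t, rate, midpoint) for t in years]
            # Least-squares optimum of k in: cum_km ~ k * shape
            k = sum(y * s for y, s in zip(cum_km, shape)) / sum(s * s for s in shape)
            sse = sum((y - k * s) ** 2 for y, s in zip(cum_km, shape))
            if best is None or sse < best[0]:
                best = (sse, k, rate, midpoint)
    return best[1], best[2], best[3]


# Synthetic, noise-free cumulative road length in km (hypothetical country).
years = list(range(2008, 2018))
cum_km = [1000.0 * logistic_shape(t, 0.9, 2012.0) for t in years]

saturation_km, rate, midpoint = fit_saturation(years, cum_km)
completeness = cum_km[-1] / saturation_km   # fraction of the fitted saturation level
```

Under this reading, a completeness value above 100% corresponds to a mapped length that exceeds the fitted saturation level, which can happen when the fitted curve flattens earlier than the actual contributions do.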

10 ANNEX 2: LPIS DATA IN EUROPE

The land parcel identification system (LPIS) is the geographic information system component of IACS (Integrated Administration and Control System). It contains the delineation of the reference areas, ecological focus areas and agricultural management areas that are eligible for CAP payments. Because national priorities in the implementation of the CAP differ, LPIS data are not homogeneous throughout Europe. Some countries use rather aggregated physical blocks as geometric reference, whereas other countries use single parcel management units as the smallest geographic entity within LPIS.

Three types of spatial objects can in principle be differentiated within LPIS (Figure 10-1):
- Physical block
- Field parcel
- Single management unit

A physical block is defined as the area that is surrounded by permanent non-agricultural objects such as roads, rivers, settlements, forest or other features. A physical block normally contains several field management units that may belong to various farmers.

A field parcel is owned by a single farmer and is the dedicated management unit for a specific agricultural main category. It contains either grassland or arable land, but never both categories. It may be split into several single sub-management units.

A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific grassland management type (e.g. one cut meadow, two cut meadow, 3 and more cut meadow). The relation between field parcels and single management units is a 1:m relation, where one field parcel can contain one or many single management units.

Remark on the relation with the real estate database: In many countries, land properties are recorded in the real estate database. An elementary part of the real estate database is the cartographic representation of real estate parcel boundaries. These parcels should not be confused with the field parcels. The parcels in the real estate database constitute ownership rights and are documented as official boundaries. The boundaries of agricultural management units, however, are delineated to derive correct area figures as the basis for payments under the CAP. In most cases the parcel boundaries in the real estate database and in the LPIS will be the same, but not in all cases.
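The three LPIS object types and their nesting can be sketched as a small data structure. This is an illustrative reading of the text, not an LPIS schema: the class and field names are hypothetical, and summing parcel areas per block is an assumption made only to show how the 1:m relations can be traversed.

```python
from dataclasses import dataclass, field
from typing import List

# A field parcel holds exactly one agricultural main category.
MAIN_CATEGORIES = ("arable land", "grassland")


@dataclass
class SingleManagementUnit:
    management: str    # crop type or grassland management type
    area_ha: float


@dataclass
class FieldParcel:
    farmer_id: str
    main_category: str
    units: List[SingleManagementUnit] = field(default_factory=list)

    def __post_init__(self):
        if self.main_category not in MAIN_CATEGORIES:
            raise ValueError("a field parcel is either arable land or grassland")

    def area_ha(self) -> float:
        return sum(unit.area_ha for unit in self.units)


@dataclass
class PhysicalBlock:
    block_id: str
    parcels: List[FieldParcel] = field(default_factory=list)

    def farmers(self) -> List[str]:
        """Farmers with at least one field parcel in this block."""
        return sorted({parcel.farmer_id for parcel in self.parcels})

    def parcel_area_ha(self) -> float:
        return sum(parcel.area_ha() for parcel in self.parcels)


# Hypothetical block with two farmers.
block = PhysicalBlock("block-1", [
    FieldParcel("A", "arable land", [
        SingleManagementUnit("winter wheat", 2.0),
        SingleManagementUnit("maize", 1.5),
    ]),
    FieldParcel("B", "grassland", [
        SingleManagementUnit("one cut meadow", 3.0),
    ]),
])
```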


Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit

The availability of national LPIS data is increasing from year to year. An overview of available and downloadable datasets has been created by Sinergise within the Horizon 2020 project LandSense, based on a previous ETC-ULS report (Figure 10-2).

The minimum thematic detail that should be harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3) permanent crops, (4) ecological focus areas and (5) others. Harmonization is challenging because these categories contain quite heterogeneous classes at national level and are applied inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any level could already provide valuable input, given that LPIS is available for the large majority of countries. Until a critical number of countries with available LPIS data is reached, including LPIS will add more bias to the resulting CLC-Backbone geometry than a method based solely on image segmentation.
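A minimal sketch of the harmonization into the five classes named above could use a lookup table per country. The national labels below are invented for illustration, and mapping every unknown label to "others" is an assumption, not a rule stated in the specification.

```python
from collections import Counter

HARMONISED_CLASSES = (
    "arable land",
    "permanent grassland",
    "permanent crops",
    "ecological focus areas",
    "others",
)

# Hypothetical national labels; a real crosswalk would be compiled per country.
NATIONAL_TO_HARMONISED = {
    "winter wheat": "arable land",
    "maize": "arable land",
    "two cut meadow": "permanent grassland",
    "vineyard": "permanent crops",
    "flower strip": "ecological focus areas",
}


def harmonise(national_label):
    """Map a national LPIS label to one of the five harmonised classes."""
    key = national_label.strip().lower()
    return NATIONAL_TO_HARMONISED.get(key, "others")


labels = ["Maize ", "vineyard", "Two cut meadow", "greenhouse", "winter wheat"]
class_counts = Counter(harmonise(label) for label in labels)
```

Counting the labels that end up in "others" gives a quick indication of how much of a national nomenclature the crosswalk does not yet cover.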

Figure 10-2: Overview of available LPIS GIS datasets and their thematic content (Sinergise, project LandSense)

11 ANNEX 3: ILLUSTRATION OF STEP 1 HARDBONE GEOMETRY OF CLC-BACKBONE


12 ANNEX 4: CLC-BACKBONE DATA MODEL

The complete list of feature types within CLC-Backbone comprises:

INSPIRE
o Land cover dataset
o Land cover unit
EAGLE
o Land cover component

Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects). Each land cover unit contains a valid geometry (in our case a polygon border). A specific land cover unit can consist of one or more land cover components (in time and/or space), each with a specific lifespan.

Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit

Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component

Each land cover unit can contain one or more so-called parametric observations. According to INSPIRE, these parametric observations are limited to percentage, countable and presence parameters. To broaden the usage and to integrate the required single-date observations (e.g. mean NDVI or vegetation trajectories), this parameter type is extended with the data type IndexParameter. The CLC-Backbone data model therefore includes the single-date observations as ParameterType, with a specific observation date and a measurement stored as a real number (0 to 1 or 0 to 100). The new CLC-Backbone base model is shown in Figure 12-3.

Figure 12-3: The new CLC-Basis model (based on LISA)

The new CLC-Backbone data model now includes three feature types:
LandCoverDataset
LandCoverUnit
LandCoverComponent

and two additional data types:

LandCoverUnit
o ParameterType
  Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, ...)
  ObservationDate: date of the observation (e.g. acquisition date of the satellite scene)
  IndexParameter: value of e.g. NDVI_max, etc. to be stored here

LandCoverComponent
o Occurrence: cover percentage of a single land cover component within the LC unit
o Overlaying: 1 if the LC component is overlaying another component (e.g. temporarily covered with water or snow); 0 (default value) if the component is not overlaying
o validFrom: the time when the land cover component started to exist in the real world
o validTo: the time when the land cover component no longer exists in the real world

Integration of temporal dimension

Integration of temporal dimension into LISA

As LISA was historically set up as a single snapshot within a 3-year interval (the repetition cycle of the orthophoto campaign), no temporal phenomena could be modelled. The underlying assumption for the adaptation of the model is that the land cover unit establishes the more or less stable geometry over time (>1 year continuity). Within the land cover unit, several land cover components can exist, each with a specific lifetime. The land cover unit can be considered a stable container for these changing land cover components.

Types of changes: Changes within a land cover unit are in general regarded as status changes. We consider changes within the land cover components as cyclic changes. Long-term ecosystem changes (conditional changes) will rather be recorded in various attributes that will be defined within product no. 5.

The temporal dynamics are implemented using a 1:many relation between the LandCoverUnit and the LandCoverComponent, in combination with the defined lifespan of the specific component.

Example: A typical agricultural field (the land cover unit) could have the following LC components:
- Bare soil: January to March
- Herbaceous vegetation: April to June
- Bare soil: 2nd half of June
- Herbaceous vegetation: July to September
- Bare soil: 2nd half of September
- Herbaceous vegetation: October to December

A web-based application developed within the GSE Cadaster Environment project (financed by ESA) is available to demonstrate the above-mentioned concept (Figure 12-4).
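The 1:many relation between a land cover unit and its components, each with its own lifespan, can be sketched as follows. This is an illustrative reading of the model, not the LISA or EAGLE schema: the Python names are hypothetical, the end of a lifespan is treated as inclusive, and the year 2018 and the exact day boundaries of the example field are assumptions chosen so that the periods do not overlap.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class LandCoverComponent:
    name: str
    occurrence: float   # cover percentage of the component within the LC unit
    overlaying: int     # 1 if it overlays another component, 0 (default) otherwise
    valid_from: date
    valid_to: date      # treated as inclusive here (assumption)


@dataclass
class LandCoverUnit:
    unit_id: str
    components: List[LandCoverComponent] = field(default_factory=list)

    def components_at(self, day: date) -> List[LandCoverComponent]:
        """Components whose lifespan contains the given day."""
        return [c for c in self.components if c.valid_from <= day <= c.valid_to]


def component(name, start, end):
    """Helper for the example: full cover, not overlaying, within 2018."""
    return LandCoverComponent(name, 100.0, 0, date(2018, *start), date(2018, *end))


# The agricultural field from the example, with assumed day boundaries.
field_unit = LandCoverUnit("field-001", [
    component("bare soil", (1, 1), (3, 31)),
    component("herbaceous vegetation", (4, 1), (6, 15)),
    component("bare soil", (6, 16), (6, 30)),
    component("herbaceous vegetation", (7, 1), (9, 15)),
    component("bare soil", (9, 16), (9, 30)),
    component("herbaceous vegetation", (10, 1), (12, 31)),
])

on_20_june = [c.name for c in field_unit.components_at(date(2018, 6, 20))]
```

The geometry stays with the unit, so a query for a given day only filters the component list and never touches the polygon.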

Figure 12-4: Illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (Example taken from project Cadaster Env Austria, Geoville 2017)

Important note: The technical specifications do not foresee that each temporal LC component is stored in the underlying CLC Base production database. The production chain may just provide the final classification of the LC unit in the final CLC-Backbone product, without giving information on the appearance of land cover components over time.

Integration of multi-temporal observations into LISA

Spectral profiles over time constitute a valuable information source (Figure 12-5), not only for the automated classification but also for the visual interpretation of object characteristics. For each CLC-Backbone object, various time series of NDVI indices (NDVI_mean, NDVI_max, NDVI_min, etc.) can be stored in the data model using the parametric observation.
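How such time series might be stored as parametric observations can be sketched as follows. The sketch assumes that NDVI_min, NDVI_mean and NDVI_max are statistics over the pixels of one object on one acquisition date, which is one possible reading of the text. The class and function names, the pixel values and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class ParameterType:
    name: str                 # e.g. "NDVI_mean"
    observation_date: date    # e.g. acquisition date of the satellite scene
    index_parameter: float    # value of the index


def observe(scenes: Dict[date, List[float]]) -> List[ParameterType]:
    """Turn per-scene pixel NDVI values of one object into parametric observations."""
    records = []
    for day, pixels in sorted(scenes.items()):
        records.append(ParameterType("NDVI_min", day, min(pixels)))
        records.append(ParameterType("NDVI_mean", day, mean(pixels)))
        records.append(ParameterType("NDVI_max", day, max(pixels)))
    return records


def time_series(records: List[ParameterType], name: str) -> List[Tuple[date, float]]:
    """Extract one named index as a date-ordered series."""
    selected = [r for r in records if r.name == name]
    return [(r.observation_date, r.index_parameter)
            for r in sorted(selected, key=lambda r: r.observation_date)]


# Hypothetical pixel NDVI values of one object in two scenes.
scenes = {
    date(2018, 3, 10): [0.10, 0.20, 0.30],
    date(2018, 6, 5): [0.60, 0.70, 0.80],
}
records = observe(scenes)
ndvi_max_series = time_series(records, "NDVI_max")
ndvi_mean_series = time_series(records, "NDVI_mean")
```

Storing one record per index and date keeps the structure flat, so further indices can be added without changing the data type.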