GENERALISATION: THE GAP BETWEEN RESEARCH AND PRACTICE

J.E. Stoter
ITC, International Institute for Geo-Information Science and Earth Observation
P.O. Box 6, 7500 AA Enschede, the Netherlands
stoter@itc.nl

ABSTRACT

This paper reports on the main outcomes of the generalisation workshop organised on 31 March and 1 April 2005 at ITC, Enschede, the Netherlands. The workshop was attended by twelve National Mapping Agencies (NMAs). The two main goals of the workshop were: a) to obtain an overview of the state of the art of generalisation within NMAs and b) to gain insight into the topics that still need to be addressed by research to better support generalisation processes in NMAs. These questions will be addressed in this paper. An overview will be given of current generalisation practice in the participating NMAs. In addition, a list is defined of research topics that need further attention in order to better serve generalisation practice.

1. INTRODUCTION

Generalisation has been a popular research topic since the start of geo-science, and this research has yielded interesting results. But what is the state of the art of generalisation in practice? Do National Mapping Agencies (NMAs) profit from research results, and which topics need more attention? To answer these questions a generalisation workshop was organised on 31 March and 1 April 2005 at ITC, Enschede, the Netherlands. The workshop was attended by twelve NMAs: Belgium, Catalonia (Spain), Denmark, France, Great Britain, the Netherlands, Ireland, Sweden, Switzerland and three Bundesländer from Germany: North Rhine Westphalia, Baden-Württemberg and Lower Saxony. In this paper the most important outcomes of the workshop are described. First, the generalisation process within the participating NMAs, including their policies on generalisation, is summarised (section 2). A more extensive description of current generalisation processes within the participating NMAs can be found in (Stoter, 2005).
In section 3 the research topics are outlined that need further attention in order to better serve practice. The paper ends with concluding remarks (section 4). It should be noted that this paper specifically focuses on the datasets that NMAs maintain to represent topography; datasets such as cadastral data (in some countries also maintained by NMAs), orthophotos and DTMs will not be considered.

2. GENERALISATION IN PRACTICE

All participating NMAs maintain vector datasets at different scales in order to support their production processes. Either one seamless database is maintained per scale, or several databases are maintained for one scale based on (old) map sheets (one database per map sheet). All participating NMAs recognise the importance of reconsidering current production processes in order to introduce automatic generalisation (or at least generalisation that is as automatic as possible). Some NMAs have made more fundamental steps towards automatic generalisation than others (see table 1), as will be seen in this section. The introduction of automatic generalisation can be divided into four phases. These phases are described in this section, together with the progress that the different NMAs have made in each phase.

2.1 Renewed data models

The first phase is the restructuring of existing datasets into data models that meet today's requirements of geo-information, such as data delivery within an SDI (Spatial Data Infrastructure), history management, unique IDs, object-oriented datasets, and datasets no longer divided into map sheets. This step has more or less been taken by all participating NMAs. At least for the base datasets, new data models have been designed. In the future the smaller scales will also be restructured into new data models.
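The modelling requirements just listed (unique IDs, history management, object orientation) can be illustrated with a minimal, hypothetical object type. The names and fields below (`TopoObject`, `retire`, the WKT geometry field) are assumptions made for illustration only, not the actual schema of any NMA's renewed data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional
from uuid import uuid4

@dataclass
class TopoObject:
    """One feature in a seamless, object-oriented base dataset (illustrative)."""
    feature_class: str                       # e.g. "building", "road"
    geometry_wkt: str                        # geometry kept as WKT for simplicity
    # stable unique ID, independent of map sheets
    object_id: str = field(default_factory=lambda: uuid4().hex)
    # history management: a validity interval per object version
    valid_from: date = field(default_factory=date.today)
    valid_to: Optional[date] = None          # open interval = current version

    def retire(self, on: date) -> "TopoObject":
        """Close this version; a successor version keeps lineage via object_id."""
        self.valid_to = on
        return self
```

With such a structure, an update never overwrites history: the old version is retired and a new version is inserted, so earlier states of the dataset remain reconstructable.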
[Table 1 is garbled in the source; only its caption is recoverable.]

Table 1: Summary of generalisation practice within NMAs that participated in the generalisation workshop

2.2 Conceptual architecture

The second phase is the design of a conceptual architecture in which the (automatic) generalisation process is to be implemented. Generalisation has been an important concept in NMAs since the availability of digital products: smaller-scale datasets are already updated by generalising the updates from the base dataset. The new approach followed by all NMAs is to maintain the currently available datasets and to convert these into the new data models. After this step, the smaller scales are updated by generalising the updates from the base dataset. This means that generalisation within NMAs still focuses mainly on the generalisation of updates. In some countries the smaller-scale datasets are newer than the base dataset due to different update cycles. In these cases the smaller scales are updated independently of the base dataset (the 1:100k dataset in Denmark, the 1:50k dataset in Belgium, the 1:100k dataset in the Netherlands). If a specific dataset did not yet exist (the 1:50k dataset in Germany and Denmark, the 1:25k database in Catalonia, the 1:100k dataset in France), a first edition has been generated once by generalising a large-scale dataset, after which only the updates are generalised. Dynamic generalisation (in which a smaller-scale dataset is produced dynamically on demand) is not considered realistic for the near future because of the human interaction that is still expected to be needed (see section 2.3). Generalisation that yields datasets at a wide variety of scales, instead of at predefined standard scales, is likewise not feasible for the near future.
An important decision concerning the conceptual architecture is whether to follow the ladder or the star approach (see figures 1a and 1b; EuroGeographics, 2005). In the ladder approach (followed by Belgium and Germany) the (updates of) smaller scales are derived from (the updates of) a larger-scale dataset in steps
(scale by scale). The alternative is the star approach, in which every smaller-scale dataset is generalised from the same base dataset. Denmark, France, Switzerland and Catalonia have chosen a mixture of both: the large- to middle-scale datasets are derived from the base dataset, while the smaller scales are derived from one middle-scale dataset. Great Britain, Ireland, the Netherlands and Sweden have still to decide on the ladder or star approach.

Figure 1a: Ladder approach in generalisation
Figure 1b: Star approach in generalisation

Another important decision is whether to distinguish between model generalisation (aiming at producing a lower-resolution database) and cartographic generalisation (aiming at producing readable cartographic output). NMAs such as Denmark and Catalonia, and the LGN approach in Germany (see below), argue that objects should not overlap in databases either, and should therefore be displaced if necessary. Customers who want non-displaced objects should use a larger-scale dataset (with topographical precision instead of cartographical precision). Other NMAs consider the displacement of objects appropriate only when producing readable cartographic output.

2.3 Implementation

The third phase is the implementation of automatic generalisation (or generalisation that is as automatic as possible) in production lines. The implementation consists of two main elements: a) automatic propagation of relevant updates and b) the actual generalisation of the updates. The relevance of updates matters because updates are not meaningful for all scales. For example, a geometry change of 5 m is important at scale 1:10k, but irrelevant for a 1:50k dataset. Objects should only be updated if there is a gain of information. In all participating NMAs specific generalisation operations have been automated, but fully automatic generalisation processes do not yet exist.
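The relevance test described above can be sketched as a simple threshold rule: an update is propagated to a target scale only if it exceeds that scale's tolerance. The tolerance values below are illustrative assumptions, not figures used by any NMA; only the ordering implied by the 5 m example in the text is reflected.

```python
# Scale-dependent tolerances in metres (illustrative assumptions only):
# a geometry change smaller than the tolerance carries no information
# gain at that scale and need not be propagated.
TOLERANCE_M = {10_000: 2.0, 50_000: 10.0, 100_000: 20.0}

def update_is_relevant(displacement_m: float, target_scale: int) -> bool:
    """Propagate a geometry change only if it exceeds the target-scale tolerance."""
    return displacement_m >= TOLERANCE_M[target_scale]
```

Under these assumed tolerances, the paper's example holds: a 5 m change is relevant for the 1:10k dataset but filtered out for the 1:50k dataset.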
The NMAs of Catalonia, Denmark, Germany, France and Great Britain have made major steps towards automatic generalisation, although during the workshop it was concluded that generalisation in the new production processes will still need human interaction due to the complexity of the generalisation processes (see section 3). A wide variety of software is currently used: Laser-Scan (Lamps2/Clarity), CHANGE/PUSH, SICAD/Open, MicroStation, ArcGIS, Safe (FME) and also many self-developed algorithms (Belgium,
Catalonia, Denmark, France, Great Britain, Sweden, North Rhine Westphalia, Baden-Württemberg and Lower Saxony). Four NMAs are working on bundling their experiences and knowledge on generalisation: IGN Belgium, IGN France, KMS (Denmark) and OSGB (Great Britain) work together in the MAGNET (Mapping Agencies Generalisation NETwork) group, a users' group of Clarity (Laser-Scan). The MAGNET group, which may be extended with other Clarity users in the future, meets twice a year to exchange experiences and algorithms. The progress of the implementation of automatic generalisation within the participating NMAs is briefly described below.

Great Britain

In current production in Great Britain most generalisation still needs a lot of human interaction. Automatic generalisation has until now only been implemented in a research environment (prototypes). Most focus has been on generalising the 1:50k map from the master map. It is anticipated that the knowledge acquired, and possibly the prototypes developed, will gradually be reused to improve the current map production flow lines.

Catalonia

Catalonia has worked on automatic generalisation since the beginning of the nineties, but human interaction is still an important part of the generalisation there. Catalonia produces and maintains topographic vector databases as well as digital maps. The products are derived from each other using generalisation software (CHANGE and self-developed software) and human interaction (see figure 2). It should be noted that the 1:50k and 1:250k digital maps are updated separately from, and more frequently than, the underlying databases, since producing the maps from the vector databases requires a lot of work. Since the up-to-dateness requirements for the maps are stricter than those for the databases, the emphasis is on updating the maps first. This leads to the unusual situation that the maps are newer than the underlying databases.
Figure 2: Generalisation workflow in Catalonia

Denmark

Denmark is currently deriving a dataset at scale 1:50k from the 1:10k dataset, as automatically as possible, using Lamps2/Clarity and various self-written algorithms. The output of this automated process is checked and edited manually afterwards. The generalisation of buildings is currently being improved with self-developed algorithms in Clarity.
Germany

The current generalisation workflow in Germany is illustrated in figure 3. Datasets at 1:250k and 1:1000k are maintained for the whole of Germany by the Bundesamt für Kartographie und Geodäsie (Federal Agency for Cartography and Geodesy). Every federal state (Bundesland) in Germany has its own mapping agency responsible for topographic datasets and maps at scales 1:10k to 1:100k. Common goals have been identified for all mapping agencies in Germany. This has led to specifications for a Base-DLM and a DLM50 for the whole of Germany in the ATKIS project (Authoritative Topographic-Cartographic Information System) (Birth, 2003).

Figure 3: Datasets and generalisation in the ATKIS project

The Base-DLM has been filled with data by the individual Bundesländer using orthophotos and by digitising existing large-scale maps. The DLM50 is currently being derived once from the Base-DLM; after the DLM50 has been obtained, only updates will be generalised. Three approaches are followed in Germany to obtain the DLM50: the AdV project (followed by seven Bundesländer, among which North Rhine Westphalia and Baden-Württemberg), the LGN application (followed by eight Bundesländer, among which Lower Saxony) and the self-developed application of Bavaria. Bavaria did not participate in the workshop in Enschede, but its approach is very similar to the LGN application. The AdV project distinguishes between model generalisation (resulting in DLM50.1) and cartographic generalisation (resulting in DLM50.2). The model generalisation is already in production and is performed 100% automatically. It should be noted that the topographic geometry of the Base-DLM is maintained in the produced DLM50, i.e. no displacements occur, although the geometry type can be changed and generalised. Explicit references are built and stored between objects in the Base-DLM and in the DLM50.
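One way to realise such explicit references between base and derived objects is a link table that is consulted when updates arrive, so that only the affected derived objects are re-generalised. The sketch below is a hypothetical illustration of that idea, not the actual ATKIS data structure; the class and method names are invented for this example.

```python
from collections import defaultdict

class LinkTable:
    """Records which derived (e.g. DLM50) objects came from which base
    (e.g. Base-DLM) objects. Links are many-to-many in general, because
    generalisation may aggregate several base objects into one derived object."""

    def __init__(self) -> None:
        self._derived_of = defaultdict(set)   # base object id -> derived object ids

    def link(self, base_id: str, derived_id: str) -> None:
        """Store one reference from a base object to a derived object."""
        self._derived_of[base_id].add(derived_id)

    def affected_derived(self, updated_base_ids) -> set:
        """Given the IDs of updated base objects, return the derived objects
        whose generalisation must be redone (update propagation)."""
        affected = set()
        for base_id in updated_base_ids:
            affected |= self._derived_of.get(base_id, set())
        return affected
```

The design choice mirrors the text: because the references are stored explicitly, the first full derivation happens once, and afterwards only updates need to be generalised.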
The cartographic generalisation is in development and will be carried out with a mix of commercial and self-developed software. It is expected still to require 30% to 40% human interaction. The LGN approach does not distinguish between a topographic dataset and a cartographic dataset: only one dataset is produced (called the DLM50), based on both model and cartographic generalisation. In this process CHANGE (for buildings) as well as self-developed algorithms in SICAD/Open are used. The developed algorithms are similar to those in the AdV project: selection, elimination, aggregation, merging, reclassification, typification, change of geometry type (e.g. from area to point representation), point reduction (Douglas-Peucker) and smoothing. Interactive assistance is applied in complex situations, based on SICAD/Open software (see figure 4). The result is a vector dataset with cartographic accuracy (compare the AdV project). The Base-DLM and DLM50 are stored independently, and updates are performed in parallel for both datasets (in contrast to the AdV project). Note that problems may occur in the DLM50 where Bundesländer following different approaches border each other: for example, roads that are displaced in the LGN project may not connect to the non-displaced roads in the AdV project.
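Of the operations listed above, point reduction with Douglas-Peucker is the most standardised. A minimal sketch of the algorithm (not the SICAD/Open or CHANGE implementation) shows the idea: keep the vertex farthest from the chord between the endpoints whenever its distance exceeds the tolerance, and recurse on the two halves.

```python
import math

def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                   # degenerate chord: a == b
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    dists = [_point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__)
    if dists[i_max] <= tolerance:
        return [points[0], points[-1]]        # the chord is a good enough fit
    # split at the farthest vertex (offset +1: dists skips the first point)
    left = douglas_peucker(points[:i_max + 2], tolerance)
    right = douglas_peucker(points[i_max + 1:], tolerance)
    return left[:-1] + right                  # drop the duplicated split vertex
```

The tolerance directly encodes the target scale: a larger tolerance removes more vertices, which is why the same line is reduced differently for a 1:50k dataset than for a 1:10k dataset.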
Figure 4: When applying automated model generalisation all ponds would be eliminated, because each pond on its own is too small. In the LGN project the ponds are aggregated in such a way that the style of the landscape is maintained: the ponds that should be aggregated are identified (a) and then aggregated (b)

France

The NMA of France has a dedicated research laboratory in geographic information, with a team specialised in generalisation. Research on generalisation started in 1992 and has yielded many results in the areas of generalisation platforms, generalisation algorithms, space characterisation, generalisation models, knowledge acquisition and evaluation of the generalised result. The research results are tested in projects, and if these projects achieve good results, the research results are implemented in the production line. Most of the generalisation focus has been on cartographic generalisation. France maintains two basic topographic datasets: BDTopo and BDCarto (see figure 5). France is currently working on finalising BDTopo, a topographic 2.5D vector map with 1-metre accuracy (~1:10k), for which one database is maintained per map sheet. In 2007 BDTopo will cover the whole of France. BDCarto is a geographic vector database with 10-metre accuracy (~1:50k), already available for the whole of France. BDCarto is maintained to produce a cartographic representation at scale 1:100k, a departmental map at scale 1:120k (which includes fewer themes than the 1:100k map) and a cartographic representation at scale 1:250k. The first edition was digitised from the old 1:50k map. BDTopo and BDCarto differ in semantic resolution as well as in capture process: BDTopo is captured by stereo-restitution from aerial pictures, whereas for BDCarto a selection of features was captured from existing 1:50k map sheets. The 1:50k map itself, however, is not produced from BDCarto.
The semantic resolution of BDCarto is not as good as that of a traditional 1:50k map; for instance, buildings have not been captured. It is currently being studied how to produce a Top50 from BDTopo. An example of automatic generalisation in the French production process is the generalisation of updates in BDCarto to the 1:100k dataset, the result of a project called Carto2001, in which research results were evaluated successfully. The process and system set up by this project are based on the Lamps2/Agent technology for the generalisation part. The process produces a seamless 1:100k cartographic dataset from BDCarto. The generalisation includes contextual generalisation of the road network, internal generalisation of roads to avoid coalescence, and propagation of the induced displacements to other themes (hydrography, administrative boundaries, settlement areas). The generalisation phase is more than 95% automated, and some semi-automatic tools are provided for the necessary interactive work. The links between BDCarto and the 1:100k dataset are kept, so that after the first occurrence of this dataset only the updates need to be propagated and generalised. The first version of the departmental map (1:120k) was produced from BDCarto between 1998 and 2004 as one cartographic dataset per map sheet. The generalisation performed was minor (only displacement of highways) and was performed interactively in a commercial GIS (not in a research platform). After the results of the Carto2001 project for the 1:100k map, the production of the departmental map was renewed. A new cartographic dataset was first produced (completed February 2005) with the same generalisation as before, i.e. displacement of highways, but this dataset was seamless (one dataset for the whole of France), and the links between BDCarto and this dataset are kept. The generalisation was performed interactively with
support of the semi-automatic displacement tools developed by Carto2001 in Lamps2. The current updating process is the same as the one developed for the 1:100k map.

Figure 5: Topographic base datasets (displayed at reduced scale) maintained by the French NMA: BDTopo (a) and BDCarto (b)

Belgium, the Netherlands, Ireland, Sweden and Switzerland

The other participating NMAs are also faced with requirements for automatic generalisation in order to achieve major improvements in the production line, but they have not yet devoted many resources to the problem; they are still in the phase of answering other fundamental questions. Belgium faces the question whether to maintain one reference database at scale 1:10k and generalise the datasets at scales 1:50k and 1:100k from it as automatically as possible, or to keep the different datasets and only generalise the updates (as automatically as possible). The integration of Top10v-GIS and Top50v-GIS is now being studied. One of the issues identified is the difference in update date, which complicates the feature-linking process. The first focus in the Netherlands is on converting the old 1:10k databases to a renewed data model (TOP10NL). The next step is the design of model generalisation to produce the 1:50k dataset from TOP10NL according to new specifications of the 1:50k dataset. Ireland, too, is converting its base dataset into a seamless Oracle database based on a renewed data model (expected to be complete in 2006). An extra problem Ireland faces is that the scale of its base dataset varies with the type of area: 1:1k in urban areas, 1:2.5k in non-urban areas and 1:5k in remote areas. At this moment the different specifications are being reviewed and compared in order to identify differences and similarities; in the current dataset, for example, the 1:1k data sometimes contains less information for some themes than the 1:2.5k and 1:5k data. Sweden is redesigning its data models for all scales while trying to keep the coherence between them. Switzerland has just finished a fundamental study on how to implement automatic generalisation. A data model for a base dataset has been defined (2.5D, scale 1:10k), and a tender has been prepared for automatically generalising all other products from it while maintaining the relationships between them. It is expected that a lot of interactive enhancement will still be needed after the automatic generalisation process in order to refine the results. The new system is expected to be in production in 2008.

2.4 Relationships between different scales

The last phase is the creation of explicit links between data models, as well as between objects at different scales. The AdV project in Germany already provides the possibility to build and maintain references between different datasets.
Catalonia has adjusted its data models at different scales in order to keep the coherence between the semantics at different scales. In France, relationships are maintained between BDCarto on the one side and the 1:100k and 1:120k datasets on the other. The other NMAs are remodelling, or will remodel, their data models in order to assure consistency, and are building, or will build, relationships between the different datasets.

3. RESEARCH NEEDED TO IMPROVE CURRENT GENERALISATION PRACTICE

The second important question in this paper is: which research is needed to improve current generalisation practice? During the discussions at the workshop it was concluded that research results achieved during the last decades have not always found their way into practice, for three main reasons. Firstly, results have to be implemented in commercial software to become available to NMAs. However, generalisation requirements are very diverse and NMA-specific, depending on data models, software, the scales that have to be produced, the specifications of the different scales, etc. It may be hard for software vendors to provide a general off-the-shelf solution that still takes individual NMA demands into account. The second reason why research results are introduced into practice with difficulty is the robustness required by practice. Robustness is a problem specifically in automatic generalisation, since generalisation is applied to existing datasets that may contain errors or have limitations with respect to today's requirements of geo-information (e.g. topology, object orientation, and a lack of information on the semantic, geometric and topological relationships between objects that is needed to avoid conflicts in generalisation). The last reason is the subjectivity of generalisation: when two cartographers are given the same generalisation rules for the same area, they will produce different results.
Exceptions are common in the generalisation process, and specifically the interplay and sequence of generalisation rules are extremely important. Generalisation processes are therefore not easy to translate into an ordered set of if...then...else rules that can be understood by computers. When formalising generalisation rules, a 'but' clause is needed for the frequently occurring exceptions, which will not be easy to formalise. Nonetheless, during the workshop a list of topics was created that need more attention. Here we will not answer the question whether these topics should be addressed by software vendors or by the research community:

A system that offers a comprehensive and wide approach to generalisation functionality (for both geometry and attributes) and that takes the context (e.g. mountains, rural, urban) and the relationships
between objects (such as: a building area cannot overlap a road area) into account. According to Weibel and Dutton (1999) such a system should include:
o data representations that facilitate the proper functioning of generalisation algorithms (e.g. taking specific characteristics into account, such as bends of roads)
o data models that support context-dependent generalisation by: allowing the representation of relevant (metric) proximity, topological and semantic object relationships within and across feature classes; enabling object modelling, including the differentiation between primitives and features, complex objects and shared primitives; and permitting the integration of auxiliary data structures such as triangulations and uniform grids for computing and representing proximity relationships
o structure and shape recognition
o generalisation algorithms, including model generalisation
o knowledge-based methods
o human-computer interaction
o generalisation quality assessment

General generalisation functions that are adaptable to the specific data models that hold the already existing datasets. This means that:
o implicit data should be made explicit by developing ontologies and formalising semantics
o current databases need to be enriched, i.e.
information necessary for generalisation should be added, such as shape characteristics, object density and distribution, relative importance of objects, and semantic and topological relationships between objects (Weibel and Dutton, 1999)
o algorithms in software should work on different data models
o the knowledge of cartographers needs to be formalised

Support for Multi-Representation Databases: maintenance of links between derived and original datasets, automatic updating of derived datasets, and relevance checks during updates

Generalisation of contour lines

Generalisation of map names

Intersubjective, repeatable and quantitative methods to evaluate generalisation results

It is important to note that these issues were defined by the NMAs and are based on daily problems. This means that the topics are practice-oriented, aiming at solving generalisation problems in the near future. For science it is a challenge to continue with, and intensify work on, more ambitious and more fundamental problems, such as:

formalising generalisation output, i.e. formalising the aim of generalisation (instead of using a visual description); this also means that the objectives of generalisation in a digital context need to be identified and specified by discriminating between data reduction and cartographic presentation

dynamic generalisation and generalisation within a continuous scale range (see for example van Oosterom, 2005)

data structures and data models that are specifically suitable for applying generalisation algorithms

Multi-Representation Databases based on true object-oriented DBMSs

data structures and data models to manage (implicit) Multi-Representations

4.
CONCLUDING REMARKS

Apart from general generalisation characteristics, the way generalisation is approached also depends on NMA-specific factors such as:

Type of data:
o scale of the data sets to be produced
o characteristics of the landscape, such as mountains (hairpin bends), the configuration of towns and villages, remote areas, and the size of the country
o one common scale for the largest-scale data set, or variation across the country according to the type of area (rural/urban/mountainous)

Technical aspects:
o distinguishing between model and cartographic generalisation or not
o data sets stored in databases versus data sets stored in files
o different scales maintained and updated separately or not
o main focus on model generalisation versus main focus on cartographic generalisation
o ladder or star approach in the production line

Organisational aspects:
o special resources available for strategic research or not
o the type of customers to serve (internal/external, private/governmental, map users/GIS users)
o the NMA benefits from the produced data versus the produced data is free (as in Catalonia)

Introducing a comprehensive generalisation system into an NMA's own, specific production lines therefore also requires implementation and remodelling efforts from the NMAs themselves. However, an international, common approach to the generalisation problem may help individual NMAs decide on what steps to take, based on the experiences of others. Science can help in this process by addressing the fundamental research topics defined above and by continuing to communicate research results. On the other hand, NMAs and software vendors also have a responsibility to become aware of research developments and to report on their experiences. The directions for future research stated by Müller et al. (1995) still seem valid: research cooperation between NMAs and academic research should be intensified; NMAs should state their requirements with respect to generalisation functions more clearly, and academic research should take up these issues; likewise, the third player in R&D, the software vendors, should be in close contact with developments taking place at NMAs and should sponsor research at academic institutions.

ACKNOWLEDGEMENTS

I could never have written this paper without the active contribution of all participants of the workshop.
Therefore I would like to thank Jan De Waele and Anne Féchir (Belgium), Nicolas Regnauld (Great Britain), Ernst Jäger (Lower Saxony), Sabine Urbanke (Baden-Württemberg), Ulrich Düren (North Rhine Westphalia), Bernard Farell and Colin Bray (Ireland), Anne Ruas and Cécile Duchêne (France), Nico Bakker and Jeroen de Vries (the Netherlands), Novit Kreiter (Switzerland), Inger Persson (Sweden), Marlene Meyer and Peter Rosenstand (Denmark), Maria Pla and Blanca Baella (Catalonia), Peter Woodsford (EuroSDR) and Menno-Jan Kraak (ITC).

BIBLIOGRAPHY

Birth, 2003, Projekt Modell- und kartographische Generalisierung, Kartographische Nachrichten 3/2003

EuroGeographics, 2005, Generalisation Processes: a benchmark study of the expert group on quality, February 2005, http://www.eurogeographics.org/eng/05_quality_reports.asp

Müller, J.C., R. Weibel, J.P. Lagrange and F. Salgé, 1995, Generalization: state of the art and issues, in GIS and Generalization, edited by J.C. Müller, J.P. Lagrange and R. Weibel, Taylor & Francis

Oosterom, P.J.M. van, 2005, Scaleless topological data structures suitable for progressive transfer: the GAP-face tree and GAP-edge forest, Autocarto 2005, 21-23 March 2005, Las Vegas, USA

Stoter, J.E., 2005, Generalisation within NMAs in the 21st century, International Cartographic Conference (ICC), July 2005, A Coruña, Spain

Weibel, R. and G. Dutton, 1999, Generalising spatial data and dealing with multiple representations, in Geographical Information Systems, edited by P.A. Longley, M.F. Goodchild, D.J. Maguire and D.W. Rhind, John Wiley & Sons