NcML-GML: encoding NetCDF datasets using GML

S. Nativi*, J. Caron^ and B. Domenico^

* University of Florence at Prato, Piazza Ciardi, 25, 59100 Prato, Italy
stefano.nativi@pin.unifi.it
^ Unidata Program Center, University Corporation for Atmospheric Research, Boulder, CO 80307, USA
{caron, ben}@ucar.edu

Topics: GML, Geographic Information Metadata

Abstract

In this paper we describe NcML-GML, an extension of the NetCDF Markup Language (NcML) that encodes NetCDF datasets using GML 3.0 elements. It was conceived to facilitate interoperability between the Atmospheric Science and GIS communities: NcML-GML leverages the effectiveness of NcML for encoding multi-dimensional arrays, while encoding geo-metadata in GML. The abstract model and the general content model which lead to NcML-GML are described.

1 Introduction

The Atmospheric Science community can access and transfer datasets encoded in the NetCDF format using protocols such as OpenDAP (Open-source Project for a Network Data Access Protocol) from the University of Rhode Island [1] or ADDE (Abstract Data Distribution Environment) from the SSEC at the University of Wisconsin [2].

NetCDF (network Common Data Form) [3] is an interface for array-oriented data access and a library that provides an implementation of the interface. The NetCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. The NetCDF software was developed at the Unidata Program Center in Boulder, Colorado.

The NetCDF Markup Language (NcML) [4] is an XML representation of NetCDF metadata.
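As a rough illustration of what "an XML representation of NetCDF metadata" means in practice, the following sketch reads dimension and variable metadata from a small NcML-style document with Python's standard xml.etree.ElementTree module. The namespace URI and the comma-separated shape attribute follow the example given in Section 5 of this paper; the sample document and the helper name summarize_ncml are assumptions made for illustration only and are not part of the NcML specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the example of Section 5.
NCML_NS = "http://www.ucar.edu/schemas/netcdf"

# Hypothetical sample document, modelled on the Section 5 example.
NCML_DOC = f"""
<netcdf xmlns="{NCML_NS}">
  <dimension name="x" length="3"/>
  <dimension name="y" length="3"/>
  <variable name="temperature" type="double" shape="x,y"/>
</netcdf>
""".strip()


def summarize_ncml(xml_text):
    """Return (dimensions, variables) described by an NcML-style document.

    Illustrative helper: it only looks at top-level dimension and
    variable elements and ignores everything else.
    """
    root = ET.fromstring(xml_text)
    ns = {"nc": NCML_NS}

    # Map each dimension name to its declared length.
    dimensions = {
        d.get("name"): int(d.get("length"))
        for d in root.findall("nc:dimension", ns)
    }

    # Map each variable name to its type and list of dimension names.
    variables = {}
    for v in root.findall("nc:variable", ns):
        shape = v.get("shape", "")
        variables[v.get("name")] = {
            "type": v.get("type"),
            # Assumption: shape is comma-separated, as in Section 5.
            "shape": shape.split(",") if shape else [],
        }
    return dimensions, variables


dimensions, variables = summarize_ncml(NCML_DOC)
print(dimensions)
print(variables)
```

Because the metadata is plain XML, any generic XML toolchain can inspect a dataset's structure without linking against the NetCDF library. The GIS extension described in the following sections builds on this property.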
There are three parts to NcML, each with a separate schema document:

- NcML Core Schema represents the NetCDF data model;
- NcML Coordinate System extends the core schema and the NetCDF data model to add explicit support for general and georeferencing coordinate systems; it works with NetCDF files which follow standard conventions [5];
- NcML Dataset extends the core schema to use NcML to define a NetCDF file as well as to redefine, aggregate, and subset existing NetCDF files.

NcML Geography (NcML-G) extends the NcML Coordinate System schema to facilitate the use of NetCDF datasets by GIS. Figure 1 shows the use scenario: GIS import NetCDF datasets as NcML-G documents.

Figure 1 NcML-G use scenario

2 The NcML-GML document

The main objective of the GIS extension model for NcML is to facilitate interoperability between the Atmospheric Science and GIS communities.
That allows the Atmospheric Science community to better provide information to society. The presented extension introduces a set of optional metadata structures which capture and formalize the GIS facets of a NetCDF dataset. These facets are taken from the geospatial standard models introduced by OGC OpenGIS and ISO TC211. The expected result is to facilitate the use of NetCDF datasets in GIS, providing them with all the metadata necessary for interoperability.

The NcML-G specification introduces:
1. An abstract model for the extension;
2. A general content model for the extension.

Two XML encoding implementations of the content model were introduced:
i. A GML-based implementation, called NcML-GML;
ii. A self-consistent inner implementation, called NcML-GIN.

The present work deals with the GML-based implementation. NcML-GML was conceived to make it as easy as possible for the GIS community to use NcML datasets, by:
- encoding Atmospheric Science geographic aspects using GML 3.0 [6];
- using GIS community semantics.

2.1 The abstract model

The extension abstract model represents the approach used to interconnect the Atmospheric Science and GIS data models, achieving model interoperability.

2.1.1 From Dataset to Coverage

The model of an Atmospheric Science dataset is generally different from the GIS data model. In particular, most Atmospheric Science data is based on a composite approach, whereas most GIS data is organized following a geo-relational approach [7]. The composite approach organizes data bottom-up (i.e. from single measurement values to eventual aggregated entities made up of measurement values). The geo-relational approach, on the contrary, organizes data top-down (i.e. from meaningful aggregation entities to their actual measurement content). GIS consider two fundamental geographic types: features and coverages.
Coverages can be used to map composite data; indeed they are considered a special case of feature. Therefore, we needed to map the NetCDF composite model into the general coverage model of GIS. The introduced abstract model facilitates the interconnection between these two models, as far as the NcML [4] and standard coverage models [8] are concerned. The following diagrams depict this scenario: Figures 1 and 2 present the considered composite and geo-relational models; Figure 3 shows the introduced abstract model for the extension.

Figure 1 Atmospheric Science Composite model

Figure 2 Coverage geo-relational model

As shown in Figure 3, the key concept is defining a Geo-Coordinate System element for NcML datasets. Hence, the NcML-CS model is the starting point for the GIS extension. Referring to the abstract model diagram, there are four main extension steps:

1 To generate geo-extent information for the NetCDF dataset; it is useful for online catalogue applications, for discovering, classifying and filtering datasets.
2 To generate geo-location information for the NetCDF dataset; it is useful to represent dataset values on referenced maps and overlay them on geo-referenced layers (e.g. themes).
3 To generate one or more coverages from the NetCDF dataset. Coverage RangeSet elements are generated from NetCDF variable elements. That is useful to express dataset components as logical geo-relational features, allowing their integration with other GIS features.
4 To link the NcML variable elements with the corresponding coverage range sets.

Figure 3 The Abstract Model of NcML-G

These extensions introduce the metadata needed to enable interoperability.

2.2 The general content model

The content model defines the objects introduced for the extension. It implements the abstract model and is expressed using the UML notation. The content model is based on the NcML-CS model [4] and the ISO 19123 [8] coverage model.

An NcML CoordinateSystem object can be specialized into a geographic coordinate system (i.e. a NcML-G:GeoCoordinateSystem object), which is characterized by extent metadata (i.e. a NcML-G:DomainExtent object). In turn, a GeoCoordinateSystem object can be specialized into either a referenced geographic coordinate system (i.e. a NcML-G:GeoReferencedCoordinateSystem object), characterized by location metadata (i.e. a NcML-G:DomainLocation object), or a coverage geographic coordinate system (i.e. a NcML-G:CoverageGeoCoordSystem object). A particular type of CoverageGeoCoordSystem object is the referenced CoverageGeoCoordSystem object (i.e. a NcML-G:CoverageGeoRefCoordSystem object). Figure 4 depicts the described object model; different colors refer to the different XML schemas in which the objects are encoded.

Figure 4 The NcML-G Coordinate System model

According to GIS semantics, a coverage associates a position within a spatial/temporal domain to a value of a defined data type; that is, a coverage is a function from a spatial/temporal domain to an attribute domain. The domain can be referenced or unreferenced. Figure 5 depicts the object model for NcML-G unreferenced coverages (only the yellow objects) and referenced ones (the entire schema).

Five object packages constitute the content model:

- GISExtension package: contains objects describing referenced and unreferenced geo-coordinate systems.
- GeoDomainExtent package: contains objects describing the spatio-temporal extent of the dataset.
- GeoDomainLocation package: contains objects describing the spatio-temporal coordinate reference system of the dataset.
- UnreferencedCoverage package: contains objects describing unreferenced coverages.
- ReferencedCoverage package: contains objects describing referenced coverages.

Figure 5 The NcML-G coverage model

2.3 The encoding model

The general content model was encoded into a semi-structured model (i.e. XML schema), introducing a GML-based set of elements. This encoding is called NcML-GML. Figure 6 depicts the encoding architecture.

Figure 6 The NcML-GML architecture

Three main XML schemas were introduced:

1 GeoCoordinateSystem.xsd; it specifies metadata for geo coordinate systems defined from an NcML Dataset.
2 UnreferencedCoverage.xsd; it specifies metadata for an unreferenced coverage associated with a geo coordinate system. It must be used as an alternative to ReferencedCoverage.xsd.
3 ReferencedCoverage.xsd; it specifies metadata for a referenced coverage associated with a geo coordinate system.

These schemas use elements defined in the following GML 3.0 schemas: feature.xsd; temporal.xsd; coordinatereferencesystems.xsd; grids.xsd; coverage.xsd.

Figures 6 and 7 show simplified views of the NcML-GML geocoordinatesystem and coveragegeorefcoordinatesystem schemas, respectively.

Figure 6 The NcML-GML geocoordinatesystem simplified schema

Figure 7 The NcML-GML coveragegeorefcoordinatesystem simplified schema

5 An XML Example
The following example shows the generation of a GIS coverage from a NetCDF dataset. The coverage has a geo-referenced grid geometry.

    <netcdf xmlns="http://www.ucar.edu/schemas/netcdf" xmlns:gml="http://www.opengis.net/gml".. >
      <dimension name="x" length="3"/>
      <dimension name="y" length="3"/>
      <variable name="x" type="double">
        <values separator=",">11.0,11.5,12.0</values>
      </variable>
      <variable name="y" type="double">
        <values separator=",">44.0,44.5,45.0</values>
      </variable>
      <variable name="temperature" type="double" shape="x,y">
        <values separator=",">237.6,258.7,260.2,276.3,270.4,269.8,271.1,270.4,268.6</values>
      </variable>
      <variable name="wv" type="double" shape="x,y">
        <values separator=",">4.6,4.7,5.2,5.3,5.4,6.8,6.1,6.4,6.6</values>
      </variable>
      <coveragegeorefcoordsystem name="geographicsystem">
        <coordinateaxisref ref="x"/>
        <coordinateaxisref ref="y"/>
        <geodomainextent description="epsg:4326">
          <spatialextent>
            <gml:envelope>... </gml:envelope>
          </spatialextent>
        </geodomainextent>
        <geodomainlocation>
          <gml:geographiccrs>
            <crsid>
              <code>4326</code>
              <codespace>epsg</codespace>
            </crsid>...
          </gml:geographiccrs>
        </geodomainlocation>
        <gridcoverage>
          <grid dimension="2">
            <limits>
              <gridenvelope>... </gridenvelope>
            </limits>
            <axisname>longitude</axisname>
            <axisname>latitude</axisname>
          </grid>
          <rangevalues>
            <rangeinfo>
              <values>
                <variable>temperature</variable>
                <variable>wv</variable>
              </values>
            </rangeinfo>
          </rangevalues>
        </gridcoverage>
      </coveragegeorefcoordsystem>
    </netcdf>

Note that the rangeinfo element contains a values sub-element which points to one or more NcML variable elements; this leverages the fact that NcML already manages value collections explicitly. It is also possible to use the GML ValueCollectionType elements instead.

Acknowledgements

This work has been partially funded by the THematic Real-time Environmental Distributed Data Services (THREDDS) project of the U.S. National Science Foundation (NSF).

References

[1] P. Cornillon, J. Gallagher, T. Sgouros. OPENDAP: Accessing Data in a Distributed, Heterogeneous Environment.
Data Science Journal, Volume 2, 5 November, pages 164-174, 2003.
[2] SSEC. McIDAS Learning Guide: ADDE. http://www.ssec.wisc.edu/mug/learn_guide/2002/adde.html
[3] Unidata Program Center - University Corporation for Atmospheric Research. NetCDF. http://www.unidata.ucar.edu/packages/netcdf/
[4] The NcML Working Group. The NetCDF Markup Language (NcML). http://www.unidata.ucar.edu/packages/netcdf/ncml/
[5] Unidata Program Center - University Corporation for Atmospheric Research. NetCDF Conventions. http://www.unidata.ucar.edu/packages/netcdf/conventions.html
[6] Open GIS Consortium, Inc. OpenGIS Geography Markup Language (GML) Implementation Specification version 3.00. OpenGIS project document OGC 02-023r4. 29 January 2003.
[7] M. Molenaar. Status and problems of geographical information systems. The necessity of a geo-information theory. ISPRS J. Photogramm. Remote Sensing, Elsevier Science Publishers B.V., pages 85-103, 1991.
[8] ISO 19123, Schema for coverage geometry and functions, ISO TC211 document N1038, 18 Jan 2001.