Distinguishing Instances and Evidence of Geographical Concepts for Geospatial Database Design

Boyan Brodaric and Mark Gahegan
GeoVISTA Center, Department of Geography, The Pennsylvania State University, University Park, PA 16802. Ph: +1-814-865-2612; Fax: +1-814-863-7643; Email: bmb184@psu.edu; mng1@psu.edu

Abstract. In many geoscientific disciplines, concepts are regularly discovered and modified, but the architecture of our geospatial information systems is primarily aimed at supporting static conceptual structures. This results in a semantic gap between our evolving understanding of these concepts and how they are represented in our systems. The research reported here provides better database support for geographical concepts that evolve with particular situations. To reduce the potential for schema change in such environments, we analyze the structure and function of situated geographical concepts and model the results directly in a UML schema. The developed schema explicitly contextualizes geographic information and concepts, enabling the extraction of contexts and interpretations from databases. This aids (1) the addition of empirical components to a geoscientific ontology, (2) the representation of context in geo-databases, and (3) the uncovering of implicit aspects of data, and the sharing of meaning, via the represented contexts. Prototype implementations that show promise for managing geoscientific ontologies and databases are also briefly discussed.

1. Introduction

Determining what geographic concepts exist, and how to represent and process them computationally, is a significant research thrust in GIScience. This thrust is leading to richer and more complete conceptual models in our systems, but it also requires us to extend the representations we employ for capturing knowledge about our domains and recording our understanding of them.

In particular, advanced representations must tackle the thorny issue of missing knowledge: much of the knowledge required to validly interpret information stored in GIS, and indeed in other information systems, is implicit rather than explicit (Rubenstein-Montano, 2000), depending on various tacitly agreed conventions to enable the communication of meaning between information producers and consumers. Common carriers of meaning are the names and definitions we give to concepts and categories, but these capture only a fragment of the meaning possessed by humans. Our representations must therefore become richer to reduce misunderstandings between producers and consumers. On the one hand, this involves explicitly representing more of the meaning currently carried by producers; on the other, it involves stimulating implicit meaning in information consumers.

The computer science field of ontologic engineering provides a significant approach to the richer representation of explicit meaning in computing systems. In ontologic frameworks, greater meaning is achieved by associating data with one or more logical theories consisting of concepts that are described, interrelated, and participate in specific axioms. Meaning is contextual in such frameworks, derived from the particular theory (ontology) being applied to the data. For example, a geographic region could be conceptualized from many perspectives (Fonseca et al., 2000), e.g. geological, soil science, engineering, or surveying. However, in addition to perspective, a geographical conceptualization might also depend on (1) motivation: why a certain perspective is chosen over another; (2) evolution: how and why a perspective changes; and (3) creation: how and why a perspective and its constituents are conceptualized in the first place. To account for these factors, contextual meaning could also include the specific situations involved; e.g. the conceptualization of a region may be tied to specific intentions, observations, locations, times, actions, background knowledge, etc. Although initially formulated as the contents of space-time segments (Barwise and Perry, 1983), situations have been broadly and loosely expanded to mean a coherent collection of influences deemed to be meaningful by information producers (Sowa, 2000). Aspects of these influences will no doubt remain implicit, locked inside information producers' mental models, but others might be explicitly represented to enhance context and thereby to stimulate similar knowledge in consumers. This enhanced notion of context fits naturally with a developmental view of concepts and theories: it adds the dimensions of evidence and justification to ontologies, complementing the largely definitional emphasis currently supported by ontologic frameworks (e.g. Fensel et al., 2000; Noy et al., 2000).

From a database viewpoint, situated context positions a concept within a strategic network of broadly defined influencing data, as well as within a network of described, related and logically defined concepts. Accordingly, we have two aims in this paper: first, to extend the representation of geographic concepts by including links to situational influences, which we broadly call evidence; and second, to develop a UML (Unified Modeling Language; Rumbaugh, 1999) representation of this extension for geoscience databases. The paper proceeds as follows: section 2 proposes some conceptual requirements for geoscientific databases; section 3 develops a conceptualization and UML representation of non-geographic concepts, which is extended to geographic concepts in section 4; section 5 briefly describes prototype implementation activities in geoscience; and we conclude in section 6.

2. Geoscientific Databases

A developmental and situated viewpoint on geoscientific ontology has significant consequences for geoscientific database design, in that database schemas founded on evolving concepts will themselves need to evolve, leading to maintenance and usage headaches (Roddick, 1995). Our solution to this problem is a general database schema design founded on meta-concepts, such as ontologies, models, concepts, etc., rather than on unstable concepts in the domain (Brodaric & Gahegan, 2001; Brodaric & Hastings, 2002). A database schema developed on this principle will have the benefit of being a dynamic repository and registry for multiple scientific knowledge components and their relations, and could include an enhanced, situated representation of concepts. This presupposes either that at least some geoscientific knowledge components are dynamic and contextualized, or that a meta-concept organization for database schemas is preferable for reasons of simplicity, practicality, etc. A dynamic and contextual account of geoscientific concepts involves: (1) discovery/evolution: a static view of geoscientific knowledge does not accommodate the learning or discovery of new knowledge, nor does it facilitate the multiple interpretation or re-interpretation of existing data, both of which are fundamental objectives of any science. Modeling an open system, such as geoscientific knowledge, with closed-world assumptions is inherently problematic (Frank, 1997); embedding such assumptions in fixed database schemas will inevitably lead to a program of continual schema adjustments, imprecision and heterogeneity, or to preset limits on the types of knowledge acceptable, escalating representational inaccuracy and hindering scientific creativity. (2) generality/specificity: concepts and theories are thought to range in generality, from broadly applicable to domain specific (Guarino, 1997; Rosch, 1978). This range of generality suggests that concept evolution might be more prevalent in the lower, more specific conceptual tiers. Conceptual change at the highest levels may be nonexistent, whereas at intermediate levels it may be infrequent and regarded as a paradigm shift (Kuhn, 1963), and instability in less general concepts may be seen as ongoing learning, such as the evolving knowledge of some region.
For example, the general concept of lake may remain relatively fixed within a perspective, but the concept of polluted lake may be situated and evolutionary, altering with changing definitions of pollution, with evolving lists of pollutants and developments in pollution measurement, and with fluctuating pollution levels linked to environmental conditions. (3) context/experience: concepts are increasingly recognized as being contextual and multidimensional. They can be organized around goals (Barsalou, 1985), theories (Murphy, 1993), and functions (Solomon, 2000), and they draw on historical and situational knowledge (Smith & Samuelson, 1997). A reliance on context and historical experience might explain, for example, the often diverse descriptions and conceptualizations of a geospatial region developed by geoscientists (e.g. Bie & Beckett, 1973; Brodaric & Gahegan, 2001), or why some people see a polluted lake and others do not; the basis for conceptual similarity, on the other hand, might lie in a common cognitive infrastructure and in shared situations and experience (Lakoff, 1987). The premise that conceptual instability entails schema change arises from the practice of founding schemas on concepts extracted from the domain at a specific level of abstraction. Of the five levels of abstraction identified by Brachman (1979), three are relevant here: the epistemological level contains concept-structuring rules and primitives such as tuples, relations, objects, classes, attributes, slots, etc.; the conceptual level contains the concepts identified in the domain, with their properties, relations, and constraints; and the linguistic level contains data and relations. These levels apply to database design in a top-down fashion. At the first level, epistemological frameworks are selected. At the second level, concepts are elicited from the domain and represented with an epistemological framework in three successive forms: (1) a technology-neutral conceptual schema, (2) a technology-aware logical schema, and (3) a technology-specific physical schema intended for a particular hardware/software system. The third level is the level of data, which resides in the physical system.

Thus, schemas developed upon unstable concepts identified at the conceptual level will inevitably be prone to change, as seems to be the case with many scientific databases (Tamzalit & Oussalah, 1999; 2000). In contrast to versioning mechanisms that focus on managing schema change (Roddick, 1995), we concentrate here on a broadly applicable conceptual schema design founded on general concepts that are presumably more stable. Such concepts might be drawn from an additional level, the ontologic level (Guarino, 1994; 1995), which increases the meaning of epistemological or conceptual elements by connecting them to broader conceptual-logical systems, i.e. top-level, domain/task, and application ontologies, ranging in generality from universal to increasingly specific. However, even though general ontologic concepts might be identified within a geoscientific domain, the nature of open systems and the knowledge discovery imperative, quite on their own and without reference to domain characteristics, argue for the need to model unexpected and variable domain relations and properties that cannot be fully predefined and that, when approximated, result in a complex, network-like schema structure that is difficult to use and maintain. In effect, it is difficult to ascribe global regularities of structure to domain objects and relations in open systems. To overcome these limitations, we use the primitives of the more abstract epistemological level as the basis for schema design.
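As a minimal sketch of this design principle, assuming SQLite through Python's standard sqlite3 module and hypothetical table names (concept, property), the block below stores concepts as data at the epistemological level. Recording the situated concept polluted lake, and later revising it, are then row inserts rather than schema changes. The property names and values are placeholders, not definitions taken from the paper.

```python
import sqlite3

# Hypothetical meta-level schema: concepts and their properties are stored as
# rows, not as tables, so new or revised domain concepts need no DDL.
DDL = """
CREATE TABLE concept (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE property (
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    name       TEXT NOT NULL,
    value      TEXT
);
"""


def table_names(conn):
    """Names of all tables currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)


def add_concept(conn, name, properties):
    """Record a concept and its properties as data; no schema change is issued."""
    cur = conn.execute("INSERT INTO concept (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO property (concept_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, key, value) for key, value in properties.items()],
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
before = table_names(conn)

add_concept(conn, "lake", {"function": "recreation"})
# A situated, evolving concept; the values are placeholders.
polluted = add_concept(conn, "polluted lake", {"pollutant list": "version 1"})
# Later revision of the concept (e.g. a new pollutant list) is just another row.
conn.execute(
    "INSERT INTO property (concept_id, name, value) VALUES (?, ?, ?)",
    (polluted, "pollutant list", "version 2"),
)

after = table_names(conn)
print(before == after)  # True: the schema itself never changed
```

In a conventional design where lake and polluted lake were each a table, the same revisions would call for CREATE TABLE or ALTER TABLE statements, which is the schema churn the text describes.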
We also suggest that many concepts inherent in scientific geospatial domains, such as the concepts for surveyed regions in geology, soils, ecology, etc., are especially driven by the contexts surrounding human observation and interpretation of geospatial data (Brodaric & Hastings, 2002; Brodaric & Gahegan, 2001), and would therefore benefit from this design. Specifically, we develop a technology-neutral UML conceptual schema for concept and data interaction, one that might be logically and physically adapted in databases or in other applications that represent concepts, such as geoscientific ontology systems. As part of raising the abstraction level for database schemas, we incorporate both the top-down ontologic approach, from concepts to data, and the bottom-up situated approach, from data to concepts. This bidirectional relationship between data and concepts contrasts with other unidirectional, top-down geospatial approaches that introduce spatial and/or temporal constructs at the epistemological or ontologic levels (Benslimane et al., 2000; Camara, 1994; DeOliveira, 1997; Fonseca et al., 2000; Hadzilacos & Tryfona, 1996; Kosters et al., 1997; Pullar & Stock, 1999; Renolen, 2000; Shekar et al., 1997; Smith et al., 1991). It also contrasts with non-geographical meta-representations in which the link between data and concepts is mainly unidirectional, either top-down (e.g. Gruber, 1993; Noy et al., 2000; Pepper, 2000; Tudhope et al., 2001) or bottom-up but not situated (e.g. Fayyad, 1996; Brachman et al., 1999; Wille, 1996).

3. Non-Geographical Concepts

General interest in concepts has foundations in two main traditions, the philosophical and the cognitive, which respectively emphasize logical and mental representations of concepts. In both traditions, concepts possess intension and extension: extension refers to the group of objects considered to exemplify the concept, whereas intension refers to the essential meaning encapsulated by the concept. In this section we investigate intension and extension to develop a workable database representation for conceptual structure and development.

3.1 Concept Structure

An important role of the intension is the specification of a concept's properties, including its attributes, functions, constraints, relations to other concepts, etc., and the provision of a classification mechanism for differentiating objects that are instances of the concept from those that are not. For example, the concept lake has properties such as name, size, shape, depth, a recreation function, a commercial function, etc., and it might be differentiated (possibly non-uniquely) from other water bodies based on specifications of size, function, etc. (after Smith & Mark, 1998). (This contrasts with the exemplar view, in which the concept merely provides identity, but no summary properties, for the objects that exemplify it (Murphy & Medin, 1985). We follow the former notion, in which concept intensions possess properties.) In a reciprocal relation to the intension, the extension groups together the instances that pass a concept's classification mechanism; e.g. the concept lake has as its extension the collection of all lakes in the world (Hampton & Dubois, 1993). The members of the extension generally reflect the properties of the intension, as in the case of a small, oblong, shallow, fish-farming lake.
The intension can thus be regarded as defining the entire possibility space for the members of the extension, whereas the extension denotes what has actually been encountered and assigned to the concept. This perspective is also widely used within the database and data modeling communities (Ullman, 1988; Rumbaugh et al., 1999). Some terminological differences exist between research communities: in the classical view a concept's extension is called a class (Sutcliffe, 1993) and a very general concept, such as Aristotle's substance, is called a category (Sowa, 2000, ch. 2), whereas in the cognitive realm a category refers to a concept's extension (Rosch, 1978, p. 30; cf. Sutcliffe, 1993). Furthermore, the process of determining the extension of a concept is referred to as classification in the classical sense, and as categorization in the cognitive sense. We will use the classical designation class to refer to the grouping of elements comprising a concept's extension, and the terms instantiation and classification to denote the process of connecting a class with the elements in its extension. We reserve the terms category, categorization and clustering for slightly different purposes, developed below. The term occurrence denotes an individual entity that might be placed into a class, including a geospatial object, while an instance is an occurrence that has been placed into a class.
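A minimal Python sketch of the reciprocal relation between intension and extension, assuming the intension can be reduced to a tuple of property names plus a classification rule. The size and depth cut-offs are hypothetical and are not offered as a definition of lake; the point is only that the extension is whatever subset of occurrences passes the rule, and those occurrences thereby become instances.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Occurrence:
    """An individual entity that might be placed into a class."""
    name: str
    area_km2: float
    depth_m: float


@dataclass
class Concept:
    """Intension: named properties plus a rule separating instances from non-instances."""
    name: str
    properties: tuple
    rule: Callable[[Occurrence], bool]

    def extension(self, occurrences):
        """The class: every occurrence that passes the classification rule."""
        return [o for o in occurrences if self.rule(o)]


# Hypothetical cut-offs; real lake definitions vary with perspective.
lake = Concept(
    name="lake",
    properties=("name", "size", "shape", "depth", "function"),
    rule=lambda o: o.area_km2 >= 0.01 and o.depth_m >= 1.0,
)

water_bodies = [Occurrence("pond A", 0.002, 0.5), Occurrence("lake B", 0.4, 3.0)]
print([o.name for o in lake.extension(water_bodies)])  # ['lake B']
```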

Figure 1 depicts a fragment of a database schema that shows the traditional relationships between a concept and its extension. The cognitive and philosophical traditions agree on aspects of the semantics of these relations: where there are distinct kinds of categories, the associated concepts will also be distinct (Medin et al., 2000), indicating that a particular extension applies to a single concept; and concepts need not have an extension but may be abstract, e.g. quality. Moreover, occurrences can be classified in multiple ways, as it is logically possible for one and the same list of objects to be in one-to-one correspondence with the extensions of substantially distinct concepts (Sutcliffe, 1993). Note that we do not model a concept's properties in Figure 1, as that is beyond our immediate purpose. We recognize that a concept's properties might be structured using frames, objects, slots, etc. (see Barsalou & Hale, 1993) without detracting from our aims.

[Figure 1: UML diagram relating Concept, Class, Occurrence and Instance through the intension, extension and classification / instantiation associations.]
Fig. 1. The traditional relations between a concept and its extension.

3.2 Concept Development

The terms instantiation and classification are traditionally used in subtly different ways. Both imply the addition of an occurrence to an existing class, creating an instance; but instantiation implies the creation of a new occurrence, such as object creation in object-oriented software engineering (Rumbaugh et al., 1999), whereas classification implies the placement of an existing occurrence into a class, such as in remote sensing when an image is classified by labeling its pixels. Instance development can be viewed as deductive, in that membership in a class is entailed when (occurrence) data pass the classification rule.
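The difference between classification and instantiation can be sketched in Python as follows. This is a minimal illustration with assumed names (ConceptClass, Occurrence) and a single hypothetical reflectance value and threshold standing in for a pixel's data: classify() tests an existing occurrence against the rule and adds it only if it passes, whereas instantiate() creates a new occurrence whose membership holds by construction.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable

_next_id = count(1)  # hypothetical source of occurrence identifiers


@dataclass
class Occurrence:
    data: dict
    oid: int = field(default_factory=lambda: next(_next_id))


@dataclass
class ConceptClass:
    """A concept's extension, with the classification rule taken from its intension."""
    concept: str
    rule: Callable[[dict], bool]
    instances: list = field(default_factory=list)

    def classify(self, occ: Occurrence) -> bool:
        """Classification: place an existing occurrence into the class if it passes the rule."""
        if self.rule(occ.data):
            self.instances.append(occ)
            return True
        return False

    def instantiate(self, data: dict) -> Occurrence:
        """Instantiation: create a new occurrence; membership holds by construction."""
        occ = Occurrence(data)
        self.instances.append(occ)
        return occ


# Hypothetical pixels with one reflectance value each and an assumed threshold.
water = ConceptClass("water", rule=lambda d: d["reflectance"] < 0.05)
pixels = [Occurrence({"reflectance": 0.02}), Occurrence({"reflectance": 0.30})]
print([water.classify(p) for p in pixels])  # [True, False]
water.instantiate({"reflectance": 0.03})
print(len(water.instances))  # 2
```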
We refer to each occurrence contributing to the development of an instance as evidence, and we denote the collection of evidence contributing to an instance's development as a situation. We also distinguish two senses of intension development, both of which can modify a concept's properties. In the first sense, which we call clustering, a collection of occurrences is segregated into groups and an intension is developed for each group. Clustering is only modestly innovative, in that intensions arise from various combinations of a fixed set of properties. In the second sense, which we call categorization, development is highly innovative: new properties are inferred from the data in combination with existing domain theory, leading to the identification of conceptually different or even novel concepts (Wisniewski, 1998). For example, in image analysis spectral classes are distinguished from information classes: spectral classes are data-driven, expressed in terms of the fixed properties of the image, whereas information classes are expressed in terms of some domain knowledge. Clustering thus corresponds to the development of spectral classes, and categorization to the development of information classes; e.g. clustering reveals a spectral class that might be categorized (interpreted) as a type of geologic structure, and in doing so might even modify our notion of that type of structure. Finally, the term category refers to the situation involved in intension development. We thus use situation in a dual sense, to refer to the evidential occurrences that lead to intension development, to instance development, or to both. A summary of our nomenclature is presented in Table 1.

Table 1. A summary of key terms

term            usage
concept         a unit of knowledge for understanding an aspect of the world (Murphy & Medin, 1985)
occurrence      an individual entity that can exemplify a concept
intension       the identity and properties of a concept
extension       the group of occurrences exemplifying the concept
class           a concept's extension
instance        an occurrence that has been placed into a class
situation       the group of occurrences used to develop or refine an instance, or a concept's identity or properties
evidence        an occurrence in a situation
category        a concept's situation: the group of occurrences used to develop or refine a concept's identity or properties
classification  placement of an existing occurrence into a class
instantiation   creation of a new occurrence and its placement into a class
clustering      development of an intension conceptually similar to its evidence
categorization  development of an intension conceptually distant from its evidence

4. Geographical Concepts

Geographical concepts are thought to differ from other concepts in many qualitative ways, including the spatio-temporal nature of the occurrences being categorized (Smith & Mark, 1998). Specifically, some occurrences in the geospatial domain are less tangible because they can be: (1) the product of social agreements and norms, such as geopolitical boundaries (Smith & Mark, 1998); (2) historical artifacts, such as past phenomena whose residual effects still exert causal influence on the present, e.g. prior volcanic activity; (3) interpreted objects, such as the objects discretized from continuous data; (4) sampled artifacts, which are only partially or indirectly detectable (by humans and/or machines), or detectable only at an inappropriate level of detail or scale; and (5) mereotopologically contingent, in that their identity depends on their parts and/or topologic relations (Smith & Mark, 1998). Such geospatial occurrences may therefore not be wholly observed or observable, but only indirectly apprehended. This indirection draws particular attention to: (1) evidence: the situation instigating the recognition of an occurrence or concept and facilitating its understanding; (2) inference: the process used to identify or evolve an occurrence, or to learn and revise a concept, in a situation; and (3) discovery: the fact that many occurrences and their concepts are unknown beforehand, e.g. geologists mapping new territory not only delineate new spatiotemporal regions but also revise or devise regional concepts. These emphases suggest a reliance on creativity in the recognition of situation-dependent geographical concepts and occurrences, and point to instantiation and categorization as being particularly relevant to geographical discovery. Figure 2 shows a simplified geologic mapping scenario, expressed in our data-representational terms, in which local concepts and instances are discovered and evolve. Each circle represents the same geographic area and contains various observed and interpreted features; the boxes contain concepts, and understanding of the area increases from left to right.

[Figure 2: four panels, A to D, each a circle depicting the same area with observations p1, p2, p3, boundaries b1, b2 and region r; concept boxes list Prior Concepts (rock type RT1, RT2; region R; boundary B) and Developed Concepts (local boundary B*; local region R*); arrows are labeled clustering, classification, categorization and instantiation.]
Fig. 2. An example scenario showing geoscientific concept and instance development.

A. Occurrences of point-located observations are recognized: the occurrences are clustered into two existing rock type concepts, which classify the occurrences into rock type instances; a boundary concept is categorized (inferred) from topologic and other relations observed amongst the occurrences.
B. The inferred boundary is instantiated from an existing boundary concept; the specific characteristics of this boundary are deemed representative of boundaries in the area, and a new local boundary concept is categorized.
C. The local boundary concept instantiates further instances from additional evidence.
D. A new region concept is categorized from prior region concepts, the boundary and rock type instances, and other data; finally, the region instance is deduced.

4.1 Database Design

The schema shown below in Figure 3 incorporates our terminological distinctions, and Figure 4 further illustrates it via a tabular representation of some data from Figure 2. Conceptual variation is modeled in the schema by dividing the notion of a concept into one object for identity, called Concept Identity, and another for state, called Concept. A similar approach is taken for occurrences. Consequently, each variation in concept intension or extension is viewed as a new Concept state, and variations in occurrence properties, such as those for spatial or temporal description, or classification, are likewise seen as distinct Occurrence states. Concepts can thus be seen as possessing a constant identity linked to multiple change-precipitated states consisting of classes, categories, and properties; likewise, occurrences with constant identity can possess multiple property states (Bonfatti & Pazzi, 1995; Hornsby & Egenhofer, 2000). We leave the problem of identity change for further work. The schema also enables the representation of types of situations, categories and evidence, which we do not elaborate here but also leave for future work.
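The identity/state split can be illustrated with a minimal Python sketch, assuming an append-only list of states per identity. The class names, the property names and the b1/b2 membership sequence (loosely following steps B and C above) are hypothetical. Revising a concept's intension or extension appends a new state, while the identity object stays the same.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptState:
    """One state of a concept: its intension (properties) and extension (members)."""
    properties: frozenset
    members: frozenset


@dataclass
class ConceptIdentity:
    """Constant identity linked to a history of change-precipitated states."""
    name: str
    states: list = field(default_factory=list)

    @property
    def current(self) -> ConceptState:
        return self.states[-1]

    def revise(self, properties=None, members=None) -> ConceptState:
        """Append a new state; parts not given carry over from the current state."""
        prev = self.states[-1] if self.states else ConceptState(frozenset(), frozenset())
        new = ConceptState(
            frozenset(properties) if properties is not None else prev.properties,
            frozenset(members) if members is not None else prev.members,
        )
        self.states.append(new)
        return new


b_local = ConceptIdentity("B* (X Formation Contact)")
# Step B (hypothetical properties): the local boundary concept is categorized from b1.
b_local.revise(properties={"attitude", "contact type"}, members={"b1"})
# Step C: a further instance from additional evidence changes only the extension.
b_local.revise(members={"b1", "b2"})
print(len(b_local.states), sorted(b_local.current.members))  # 2 ['b1', 'b2']
```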
[Figure 3: class diagram linking Concept Identity, Concept, Class, Instance, Category, Situation, Evidence, Occurrence and Occurrence Identity, with associations labeled intension, extension, classification / instantiation and clustering / categorization, and multiplicities (0..1, 1..1, 0..*, 1..*) on the links.]
Fig. 3. A data model representing the interaction between concepts, occurrences and supporting evidence. Relationships between concepts and between occurrences are not shown.
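The tables of Figure 4 can be loaded into a relational store and queried with ordinary joins. The sketch below assumes SQLite through Python's standard sqlite3 module; the table and column names (concept_class, instance_id and so on) are our own hypothetical labels rather than the authors' schema, and the identifiers are the Figure 4 values as we read them from the figure. One query follows the path occurrence, instance, class, concept; the other lists the evidence behind the category of B* (concept 4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout mirroring the columns of Figure 4.
conn.executescript("""
CREATE TABLE concept       (con INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concept_class (class_id INTEGER PRIMARY KEY, con INTEGER REFERENCES concept(con));
CREATE TABLE category      (con INTEGER REFERENCES concept(con), sit INTEGER);
CREATE TABLE situation     (sit INTEGER PRIMARY KEY);
CREATE TABLE occurrence    (occ INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE instance      (instance_id INTEGER PRIMARY KEY, class_id INTEGER, occ INTEGER, sit INTEGER);
CREATE TABLE evidence      (evi INTEGER PRIMARY KEY, sit INTEGER, occ INTEGER);
""")
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    (1, "R (Geologic Unit)"), (2, "R* (X Formation)"), (3, "B (Geologic Boundary)"),
    (4, "B* (X Formation Contact)"), (5, "RT (Rock Type)"),
    (6, "RT 1 (granite)"), (7, "RT 2 (sandstone)"),
])
# Classes 8..14 belong to concepts 1..7.
conn.executemany("INSERT INTO concept_class VALUES (?, ?)", [(c + 7, c) for c in range(1, 8)])
conn.executemany("INSERT INTO category VALUES (?, ?)", [(2, 31), (4, 32)])
conn.executemany("INSERT INTO situation VALUES (?)", [(s,) for s in range(28, 33)])
conn.executemany("INSERT INTO occurrence VALUES (?, ?)", [
    (16, "r"), (17, "b1"), (18, "b2"), (19, "p1"), (20, "p2"), (21, "p3"),
])
conn.executemany("INSERT INTO instance VALUES (?, ?, ?, ?)", [
    (22, 9, 16, 28), (23, 11, 17, 29), (24, 11, 18, 30),
    (25, 13, 19, None), (26, 14, 20, None), (27, 13, 21, None),
])
conn.executemany("INSERT INTO evidence VALUES (?, ?, ?)", [
    (33, 28, 17), (34, 28, 21), (35, 29, 19), (36, 29, 20),
    (37, 30, 21), (38, 31, 17), (39, 32, 17),
])

# Which concept classifies each occurrence? occurrence -> instance -> class -> concept
rows = conn.execute("""
    SELECT o.name, c.name
    FROM occurrence o
    JOIN instance i      ON i.occ = o.occ
    JOIN concept_class k ON k.class_id = i.class_id
    JOIN concept c       ON c.con = k.con
    ORDER BY o.occ
""").fetchall()

# Which occurrences are cited as evidence in the category of B* (concept 4)?
b_star_evidence = [name for (name,) in conn.execute("""
    SELECT o.name
    FROM category g
    JOIN evidence e   ON e.sit = g.sit
    JOIN occurrence o ON o.occ = e.occ
    WHERE g.con = 4
""")]

for occ, concept in rows:
    print(occ, "->", concept)
print("B* category evidence:", b_star_evidence)
```

Foreign keys are declared only for readability; SQLite does not enforce them unless enabled, which does not matter for these read-only queries.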

CONCEPT
Con  Concept
1    R (Geologic Unit)
2    R* (X Formation)
3    B (Geologic Boundary)
4    B* (X Formation Contact)
5    RT (Rock Type)
6    RT 1 (granite)
7    RT 2 (sandstone)

CLASS
Con  Class
1    8
2    9
3    10
4    11
5    12
6    13
7    14

CATEGORY
Con  Sit
2    31
4    32

INSTANCE
Class  Instance  Occ  Sit
9      22        16   28
11     23        17   29
11     24        18   30
13     25        19
14     26        20
13     27        21

SITUATION
Sit
28
29
30
31
32

EVIDENCE
Evi  Sit  Occ
33   28   17
34   28   21
35   29   19
36   29   20
37   30   21
38   31   17
39   32   17

OCCURRENCE
Occ  Name
16   r
17   b1
18   b2
19   p1
20   p2
21   p3

Fig. 4. A tabular representation of part of the schema shown in Figure 3, using some data from Figure 2. For illustration purposes only: e.g. the list of occurrences and the set of tables are incomplete, and the tables shown are not fully normalized.

4.2 Context

A particularly useful query suggested by the schema described above involves extracting from a database the context of a geographical concept or instance. Context is defined below as a recursive traversal of the evidential situations supporting a concept or instance. Let X denote a concept, o an occurrence, i an instance, and s a situation containing a set of evidence; then:

$$o.\mathrm{Context}() \rightarrow o \tag{1}$$
$$i.\mathrm{Context}() \rightarrow \big(X.i,\; X.i.o,\; X.\mathrm{Context}(),\; X.i.s.\mathrm{Context}()\big) \tag{2}$$
$$X.\mathrm{Context}() \rightarrow \big(X,\; X.s.\mathrm{Context}()\big) \tag{3}$$
$$s.\mathrm{Context}() \rightarrow \big(s,\; s_1.\mathrm{Context}(),\; \ldots,\; s_n.\mathrm{Context}()\big), \quad s_j \text{ the elements of } s \tag{4}$$

This recursive formulation results in a network of evidential situations expanding from an origin consisting of a concept or an occurrence. Such networks can be represented by a graph that serves to explain a geoscientific instance (after Voisard, 1999) or, additionally in our case, to contextualize a concept. However, such graphs quickly become very complex. A simplified context, called an interpretation, can reduce this complexity by limiting the context's scope to a single path through the graph; the various paths through the graph then designate multiple interpretations of the occurrence or concept (also after Voisard, 1999). An interpretation is defined herein as:

$$o.\mathrm{Interp}() \rightarrow o \tag{5}$$
$$i.\mathrm{Interp}() \rightarrow \big(X.i,\; X.i.o,\; X,\; X.s.\mathrm{Interp}() \text{ or } X.i.s.\mathrm{Interp}()\big) \tag{6}$$
$$X.\mathrm{Interp}() \rightarrow \big(X,\; X.s.\mathrm{Interp}()\big) \tag{7}$$
$$s.\mathrm{Interp}() \rightarrow \big(s,\; s_j.\mathrm{Interp}()\big), \quad s_j \text{ one element of } s \tag{8}$$

Figures 5 and 6 illustrate an interpretation of instance r and of concept R*, respectively, using data from Figure 4. Note that multimedia objects for reports and diagrams, photos, descriptions of motivations, actions and social interactions, etc., could also be included as evidence and hence be reflected in contexts and interpretations.

[Figure 5: interpretation path r, r.o, R*, r.s, r.s.p3, p3.o, RT 1.]
Fig. 5. An interpretation of region r.
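Equations (1) to (8) translate fairly directly into mutually recursive functions. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the Figure 4 data are keyed by occurrence name, path labels follow the dot notation of Figure 5, and cycle guards are added because the equations alone do not terminate on these data (B* is categorized from evidence b1, which is itself an instance of B*). For contexts, a shared visited set returns the reachable nodes once each in depth-first order rather than the nested structure the equations build; for interpretations, a path simply ends when its only continuations would revisit a node.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional, Set, Tuple

# Figure 4 data keyed by occurrence name (our simplification of the numeric ids).
# occurrence -> (concept classifying its instance, situation of the instance or None)
INSTANCES: Dict[str, Tuple[str, Optional[str]]] = {
    "r": ("R*", "sit28"), "b1": ("B*", "sit29"), "b2": ("B*", "sit30"),
    "p1": ("RT1", None), "p2": ("RT2", None), "p3": ("RT1", None),
}
CATEGORY: Dict[str, str] = {"R*": "sit31", "B*": "sit32"}  # concept -> category situation
EVIDENCE: Dict[str, List[str]] = {                          # situation -> evidence occurrences
    "sit28": ["b1", "p3"], "sit29": ["p1", "p2"], "sit30": ["p3"],
    "sit31": ["b1"], "sit32": ["b1"],
}


# Context, eqs. (1)-(4): all reachable nodes, each listed once, depth-first.
def instance_context(occ: str, seen: Set[str]) -> List[str]:
    if occ in seen:
        return []                     # cycle guard, not part of the equations
    seen.add(occ)
    if occ not in INSTANCES:
        return [occ]                  # eq. (1): a bare occurrence
    concept, sit = INSTANCES[occ]     # eq. (2): X.i, X.i.o, X.Context(), X.i.s.Context()
    out = [occ, f"{occ}.o"] + concept_context(concept, seen)
    return out + (situation_context(sit, seen) if sit else [])


def concept_context(concept: str, seen: Set[str]) -> List[str]:
    if concept in seen:
        return []
    seen.add(concept)                 # eq. (3): X, X.s.Context()
    sit = CATEGORY.get(concept)
    return [concept] + (situation_context(sit, seen) if sit else [])


def situation_context(sit: str, seen: Set[str]) -> List[str]:
    if sit in seen:
        return []
    seen.add(sit)                     # eq. (4): s and the context of every element s_j
    out = [sit]
    for occ in EVIDENCE.get(sit, []):
        out += instance_context(occ, seen)
    return out


# Interpretation, eqs. (5)-(8): one path per choice of branch; no node revisited on a path.
def interp_instance(occ: str, label: str, seen: FrozenSet[str]) -> Iterator[List[str]]:
    if occ not in INSTANCES:
        yield [label]                 # eq. (5)
        return
    concept, own_sit = INSTANCES[occ]
    head = [label, f"{occ}.o", concept]   # eq. (6): X.i, X.i.o, X, then X.s or X.i.s
    seen = seen | {occ}
    candidates = ((CATEGORY.get(concept), f"{concept}.s"), (own_sit, f"{occ}.s"))
    options = [(s, lbl) for s, lbl in candidates if s and s not in seen]
    if not options:
        yield head
    for sit, sit_label in options:
        for tail in interp_situation(sit, sit_label, seen):
            yield head + tail


def interp_situation(sit: str, label: str, seen: FrozenSet[str]) -> Iterator[List[str]]:
    seen = seen | {sit}               # eq. (8): s followed by ONE element s_j
    elements = [o for o in EVIDENCE.get(sit, []) if o not in seen]
    if not elements:
        yield [label]
    for occ in elements:
        for tail in interp_instance(occ, f"{label}.{occ}", seen):
            yield [label] + tail


def interp_concept(concept: str) -> Iterator[List[str]]:
    sit = CATEGORY.get(concept)       # eq. (7): X, X.s.Interp()
    if not sit:
        yield [concept]
        return
    for tail in interp_situation(sit, f"{concept}.s", frozenset()):
        yield [concept] + tail


ctx = instance_context("r", set())
paths_r = list(interp_instance("r", "r", frozenset()))
paths_rstar = list(interp_concept("R*"))
print(paths_r[-1])  # ['r', 'r.o', 'R*', 'r.s', 'r.s.p3', 'p3.o', 'RT1'] (Figure 5)
```

On these data the context of r reaches 18 nodes but never b2 or sit30, and r has seven interpretations, the last of which matches Figure 5. The cycle through B* and b1 gives a small, concrete view of the computability concern raised in the Conclusions.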

[Figure 6: interpretation path R*, r.s, r.s.b1, b1.o, B*, b1.s, b1.s.p2, p2.o, RT 2.]
Fig. 6. An interpretation of the R* concept.

5. Application

We are in the preliminary stages of implementing and testing the schema fragment developed above, mainly by investigating the semantic fit of the schema to the data. Aspects of the schema are also being implemented by some government agencies responsible for managing geoscientific databases, for example in the national geologic map databases of Canada (www.cgkn.net) and the USGS (http://ncgmp.usgs.gov/ngmdbproject/). An immediate concern in these databases is the structure and explication of the map unit concepts that label regions on thousands of maps. Equally important are the structure and explication of the science language used to describe those concepts and regions. Both of these elements are being compiled mainly from the legends of digital geologic maps; information about specific geoscientific occurrences is also significant, but is a secondary concern, as that information is often difficult to obtain. The map unit concepts and their descriptive language are used heterogeneously: both types of terms may be described differently in various geographic regions and even by different individuals, and different terms often have identical meanings. Solutions to this problem are progressing in both top-down and bottom-up modes: top-down, normative definitions are being developed for many of the terms (http://geology.usgs.gov/dm/steering/teams/language/charter.shtml); at the same time, bottom-up, existing terms are being entered into databases with a view to finding empirical regularities in their use and reconciling these with the definitions. The aim is to retain local terms and characteristics, as these apply directly to the maps, but to fit these terms into a uniform system for interoperability purposes. The notion is that such a system would grow and evolve with scientific advances and shifts. Consequently, the situated nature of the information itself, and the expected dynamic nature of the geoscientific ontology being envisioned, require a system that can provide contexts to facilitate the understanding of concepts and, eventually, of the map objects.

Implementation is proceeding using both relational and object-oriented database systems. Additional schema elements (not described here) are required to manage concept and occurrence properties, including spatial, temporal and other descriptions, as well as relations within concepts and within occurrences. The spatial descriptions in particular are implemented by linking to native spatial objects in commercial GIS for visualization and query purposes, with the remaining non-spatial parts maintained in the GIS or in databases external to it. Implementations exist in varying configurations and stages of progress with Arc8, GE Smallworld, and the web-enabled MapGuide and MapServer GIS software, as well as the Oracle and SQL Server databases. Initial results are positive and encourage continuation.

6. Conclusions

We have re-examined the notions of concept, class, and category in terms of geographical occurrences and the human activity of generating regions and regional taxonomies. We focus on concepts that evolve with specific situations and develop a framework for representing the situated components, which we call evidence. Specifically, we: (1) propose that geoscientific environments in which knowledge is situated, evolves or is regularly discovered would benefit from an abstract database schema modeled on epistemological primitives, such as concepts and occurrences, rather than on domain concepts. Schema evolution and complexity are consequently reduced, and multiple scientific views are accommodated in one structure.
However, such increased flexibility and abstraction requires more rigorous ontologic knowledge engineering, which may challenge practices in certain geoscientific disciplines, but which may also result in better semantics and improved interoperability of geoscientific databases. The full effects of this shift in abstraction in schema design are still largely undetermined. (2) distinguish and describe four functions integral to a data-representational notion of concepts, which enable the discovery and generation of such concepts and related instances: classification, clustering, instantiation, and categorization. (3) identify the notions of situation and evidence as integral elements of situated concepts and instances. This leads to a refinement of terminology: we denote the group of evidence supporting the development of the intension as the concept's category, and the group of instances exemplifying the concept as its class. This also contributes to the notion of a dynamic geoscientific ontology, one that evolves with empirical evidence or in relation to other real-world situations. (4) enhance the notion of geographical context as a layered set of relevant evidence associated with geographical concepts or instances. Users can use this capability to trace the development pattern of a concept or occurrence through a spiraling abstraction pathway, thereby perhaps stimulating implicit knowledge related to the concept or occurrence and improving the communication of meaning. (5) develop a database schema fragment to model the interaction between concepts, instances and their evidence, and discuss an early implementation of the schema in the context of developing geologic map databases and geologic ontologies.

This overall approach is potentially applicable in many domains where knowledge is regularly discovered and evolved. In geospatial terms, these ideas can be applied directly to maps classified by machines or humans, documenting practices and elements that were previously undocumented. Moreover, the design inherently provides structural support for some aspects of knowledge discovery, such as the visual exploration of data, where categories and concepts are regularly developed dynamically. It also shows promise for recording aspects of the geoscientific reasoning process, providing an electronic record for future scientific use, and may provide a unifying basis for approaches to geospatial error and uncertainty by explicating the linkage of intension and extension, a potential source of uncertainty (Freksa & Barkowsky, 1996). Computability of the recursive context structure is an ongoing concern. Future work entails more extensive testing with diverse data, developing better data capture tools to populate the richer structure proposed, extending the schema to include multiple ontologies, and formalizing many of the concepts presented herein.

Acknowledgements

We gratefully acknowledge the support of the Geological Survey of Canada and the U.S. Geological Survey, and we also thank Nadine Schuurman and the anonymous reviewers for their helpful comments.

References

Barsalou, L.W. (1985). Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11:629-654.
Barsalou, L.W., Hale, C.R. (1993). Components of conceptual representation: from feature lists to recursive frames. In: Van Mechelen, I., Hampton, J., Michalski, R.S., Theuns, P. (Eds.), Categories and Concepts: Theoretical Views and Inductive Data Analysis. Academic Press, New York, 97-144.
Barwise, J., Perry, J. (1983). Situations and Attitudes. MIT Press, Cambridge, MA.
Benslimane, D., Leclercq, E., Savonnet, M., Terrasse, M.N., Yetongnon, K. (2000). On the definition of generic multi-layered ontologies for urban applications. Computers, Environment and Urban Systems, 24:191-214.
Bie, S.W., Beckett, P.H.T. (1973). Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria, 29:189-202.
Bonfatti, F., Pazzi, L. (1995). Ontological foundations for state and identity within the object-oriented paradigm. International Journal of Human-Computer Studies, 43:891-906.

Brachman, R.J. (1979). On the epistemological status of semantic networks. In: Findler, N.V. (Ed.), Associative Networks. Academic Press, New York, 3-50.
Brachman, R.J., McGuinness, D.L., Patel-Schneider, P.F., Borgida, A. (1999). Reducing CLASSIC to practice: knowledge representation theory meets reality. Artificial Intelligence, 114:203-237.
Brodaric, B., Gahegan, M. (2001). Learning geoscience categories in-situ: implications for geographic knowledge representation. Proceedings of the Ninth ACM International Symposium on Advances in GIS, Atlanta, GA, Nov. 9-10, 2001. ACM Press, New York, 130-135.
Brodaric, B., Hastings, J. (2002). An object model for geologic map information. In: Advances in Spatial Data Handling. Springer-Verlag, New York.
Camara, G., Freitas, U.M., Souza, R., Casanova, M., Hemerly, A., Medeiros, C. (1994). A model to cultivate objects and manipulate fields. Proceedings, 2nd ACM Workshop on Advances in GIS, 20-28.
De Oliveira, J.L., Pires, F., Medeiros, C.B. (1997). An environment for modeling and design of geographic applications. GeoInformatica, 1:29-58.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, Fall 1996, 37-54.
Fensel, D., Horrocks, I., Van Harmelen, F., Decker, S., Erdmann, M., Klein, M. (2000). OIL in a nutshell. 12th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2000), Juan-les-Pins, France.
Fonseca, F.T., Egenhofer, M.J., Davis Jr., C.A., Borges, K.A.V. (2000). Ontologies and knowledge sharing in urban GIS. Computers, Environment and Urban Systems, 24(3):251-272.
Frank, A.U. (1997). Spatial ontology: a geographical point of view. In: Stock, O. (Ed.), Spatial and Temporal Reasoning. Kluwer, Dordrecht, 135-153.
Freksa, C., Barkowsky, T. (1996). Relations between spatial concepts and geographic objects. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic Objects with Indeterminate Boundaries. Taylor & Francis, London, 99-121.
Guarino, N. (1994). The ontological level. In: Casati, R., Smith, B., White, G. (Eds.), Philosophy and the Cognitive Sciences. Hölder-Pichler-Tempsky, Vienna, 443-456.
Guarino, N. (1995). Formal ontology, conceptual analysis and knowledge representation. International Journal of Human-Computer Studies, 43(5/6):625-640.
Guarino, N. (1997). Understanding, building, and using ontologies. International Journal of Human-Computer Studies, 46:293-310.
Gruber, T.R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5:199-220.
Hadzilacos, T., Tryfona, N. (1996). Logical data modeling for geographical applications. International Journal of Geographical Information Science, 10(2):179-203.
Hampton, J., Dubois, D. (1993). Psychological models of concepts: introduction. In: Van Mechelen, I., Hampton, J., Michalski, R.S., Theuns, P. (Eds.), Categories and Concepts: Theoretical Views and Inductive Data Analysis. Academic Press, New York, 11-34.
Hornsby, K., Egenhofer, M.J. (2000). Identity-based change: a foundation for spatio-temporal knowledge representation. International Journal of Geographical Information Science, 14(3):207-224.
Kosters, G., Pagel, B., Six, H. (1997). GIS application development with GeoOOA. International Journal of Geographical Information Science, 11(4):307-335.
Kuhn, T.S. (1962). The Structure of Scientific Revolutions. University of Chicago Press, Chicago.
Lakoff, G. (1987). Women, Fire and Dangerous Things. University of Chicago Press, Chicago.
Medin, D.L., Lynch, E.B., Solomon, K.O. (2000). Are there kinds of concepts? Annual Review of Psychology, 51:121-147.

Murphy, G.L. (1993). Theories and concept formation. In: Van Mechelen, I., Hampton, J., Michalski, R.S., Theuns, P. (Eds.), Categories and Concepts: Theoretical Views and Inductive Data Analysis. Academic Press, New York, 173-199.
Murphy, G.L., Medin, D.L. (1985). The role of theories in conceptual coherence. Psychological Review, 92(3):289-316.
Noy, N.F., Fergerson, R.W., Musen, M.A. (2000). The knowledge model of Protégé-2000: combining interoperability and flexibility. 12th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2000), Juan-les-Pins, France.
Pepper, S. (2000). The TAO of topic maps. In: Proceedings, XML Europe 2000, June 12-16, 2000, Paris, France. http://www.gca.org/papers/xmleurope2000/papers/s11-01.html.
Pullar, D., Stock, K. (1999). Geospatial modeling: a case study for a statewide land information strategy. In: Goodchild, M.F., Egenhofer, M.J., Fegeas, R., Kottman, C. (Eds.), Interoperating Geographic Information Systems. Kluwer, Boston, 181-194.
Renolen, A. (2000). Modeling the real world: conceptual modeling in spatiotemporal information system design. Transactions in GIS, 4(1):23-42.
Roddick, J.F. (1995). A survey of schema versioning issues for database systems. Information and Software Technology, 37(7):383-393.
Rosch, E. (1978). Principles of categorization. In: Rosch, E., Lloyd, B. (Eds.), Cognition and Categorization. Erlbaum, Hillsdale, 27-48.
Rubenstein-Montano, B. (2000). A survey of knowledge-based information systems for urban planning: moving towards knowledge management. Computers, Environment and Urban Systems, 24:155-172.
Rumbaugh, J., Jacobson, I., Booch, G. (1999). The Unified Modeling Language Reference Manual. Addison-Wesley, Reading, MA.
Shekhar, S., Coyle, M., Goyal, B., Liu, D., Sarkar, S. (1997). Data models in geographic information systems. Communications of the ACM, 40(4):103-111.
Smith, B., Mark, D.M. (1998). Ontology and geographic kinds. In: Poiker, T.K., Chrisman, N. (Eds.), Proceedings, 8th International Symposium on Spatial Data Handling, 308-320.
Smith, L.B., Samuelson, L.K. (1997). Perceiving and remembering: category stability, variability and development. In: Lamberts, K., Shanks, D. (Eds.), Knowledge, Concepts, and Categories. MIT Press, Cambridge, MA, 161-196.
Smith, T., Ramakrishnan, R., Voisard, A. (1991). An object-based data model and a deductive language for spatio-temporal database applications. In: Gambosi, G., Scholl, M., Widmayer, P. (Eds.), Proceedings of the Workshop of the BRA Esprit Project "GOODS". BRA Series, Springer-Verlag, Berlin.
Solomon, K.O., Medin, D.L., Lynch, E. (1999). Concepts do more than categorize. Trends in Cognitive Sciences, 3(3):99-104.
Sowa, J.F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, New York.
Sutcliffe, J.P. (1993). Concept, class, and category in the tradition of Aristotle. In: Van Mechelen, I., Hampton, J., Michalski, R.S., Theuns, P. (Eds.), Categories and Concepts: Theoretical Views and Inductive Data Analysis. Academic Press, New York, 35-66.
Tamzalit, D., Oussalah, C. (1999). Instances evolution vs classes evolution. In: Bench-Capon, T., Soda, G. (Eds.), Proceedings, Database and Expert Systems Applications: 10th International Conference, DEXA 99, Florence, Italy, August/September 1999, 16-25.
Tamzalit, D., Oussalah, C. (2000). Allowing conceptual emergence through instance evolutionary processes. Engineering Intelligent Systems for Electrical Engineering and Communications, 8(3):177-192.
Tudhope, D., Alani, H., Jones, C. (2001). Augmenting thesaurus relationships: possibilities for retrieval. Journal of Digital Information, 1(8).
Ullman, J.D. (1988). Principles of Database and Knowledge-Base Systems, Volume 1: Classical Database Systems. Computer Science Press, Rockville, MA.

Voisard, A. (1999). Abduction and deduction in geologic hypermaps. In: Güting, R.H., Papadias, D., Lochovsky, F. (Eds.), SSD'99, LNCS 1651. Springer-Verlag, Berlin, 311-329.
Wille, R. (1996). Conceptual structures of multicontexts. In: Eklund, P.W., Ellis, G., Mann, G. (Eds.), Conceptual Structures: Knowledge Representation as Interlingua, Proceedings, 4th International Conference on Conceptual Structures, ICCS '96, Sydney, Australia, August 1996, LNAI 1115. Springer, New York, 23-39.
Wisniewski, E.J. (1998). Property instantiation in conceptual combination. Memory & Cognition, 26(6):1330-1347.