Syntactic Patterns of Spatial Relations in Text

Similar documents
The Relevance of Spatial Relation Terms and Geographical Feature Types

A CARTOGRAPHIC DATA MODEL FOR BETTER GEOGRAPHICAL VISUALIZATION BASED ON KNOWLEDGE

Geospatial Semantics. Yingjie Hu. Geospatial Semantics

EXTRACTION AND VISUALIZATION OF GEOGRAPHICAL NAMES IN TEXT

UNCERTAINTY IN THE POPULATION GEOGRAPHIC INFORMATION SYSTEM

Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, China

Taxonomies of Building Objects towards Topographic and Thematic Geo-Ontologies

Intelligent GIS: Automatic generation of qualitative spatial information

GEO-INFORMATION (LAKE DATA) SERVICE BASED ON ONTOLOGY

PRELIMINARY STUDIES ON CONTOUR TREE-BASED TOPOGRAPHIC DATA MINING

UPDATING AND REFINEMENT OF NATIONAL 1: DEM. National Geomatics Center of China, Beijing

Construction of thesaurus in the field of car patent Tingting Mao 1, Xueqiang Lv 1, Kehui Liu 2

Geographic Analysis of Linguistically Encoded Movement Patterns A Contextualized Perspective

Joint International Mechanical, Electronic and Information Technology Conference (JIMET 2015)

2002 Journal of Software, )

SPATIAL ANALYSIS OF POPULATION DATA BASED ON GEOGRAPHIC INFORMATION SYSTEM

Curriculum Vitae. Dr. Danqing (Dana) Xiao. Ph.D. in Department of Geography, University of California Santa Barbara.

Spatial Role Labeling CS365 Course Project

EFFICIENT MODELLING OF HEIGHT DATUM BASED ON GIS

SOFTWARE ARCHITECTURE DESIGN OF GIS WEB SERVICE AGGREGATION BASED ON SERVICE GROUP

LECTURER: BURCU CAN Spring

A CARTO-LINGUISTIC PARADIGM TAKING A METHODOLOGICAL PERSPECTIVE

Alexander Klippel and Chris Weaver. GeoVISTA Center, Department of Geography The Pennsylvania State University, PA, USA

32 1 Vol. 32, No ACTA GEODAETICA et CARTO GRAPHICA SIN ICA Feb.,2003. The Model of Error Entropy of Area Unit in GIS

Digital Chart Cartography: Error and Quality control

CLRG Biocreative V

EXPLORATORY RESEARCH ON DYNAMIC VISUALIZATION BASED ON SPATIO-TEMPORAL DATABASE

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling

Uncertain Quadratic Minimum Spanning Tree Problem

V r : A new index to represent the variation rate of geomagnetic activity

ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature

Toponym Disambiguation using Ontology-based Semantic Similarity

THE DESIGN AND IMPLEMENTATION OF A WEB SERVICES-BASED APPLICATION FRAMEWORK FOR SEA SURFACE TEMPERATURE INFORMATION

Preliminary Research on Grassland Fineclassification

TALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing

GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE

Machine Learning for natural language processing

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n.

December 3, Dipartimento di Informatica, Università di Torino. Felicittà. Visualizing and Estimating Happiness in

10/17/04. Today s Main Points

Convex Hull-Based Metric Refinements for Topological Spatial Relations

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

A STUDY ON INCREMENTAL OBJECT-ORIENTED MODEL AND ITS SUPPORTING ENVIRONMENT FOR CARTOGRAPHIC GENERALIZATION IN MULTI-SCALE SPATIAL DATABASE

Event-based Spatio-temporal Database Design

Exploiting WordNet as Background Knowledge

Reassessing the conservation status of the giant panda using remote sensing

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

The Semantic Annotation Based on Mongolian Place Recognition

Extracting Spatial Relations from Document for Geographic Information Retrieval

Can Grid and TIN coexist?

Text Mining. March 3, March 3, / 49

AN OBJECT-ORIENTED DATA MODEL FOR DIGITAL CARTOGRAPHIC OBJECT

Relative adjacencies in spatial pseudo-partitions

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context.

Research on Computer Aided Design and GIS Conversion Method

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung

Spati-temporal Changes of NDVI and Their Relations with Precipitation and Temperature in Yangtze River Catchment from 1992 to 2001

Harmonizing Level of Detail in OpenStreetMap Based Maps

A semantic and language-based model of landscape scenes

An Ontology-based Framework for Modeling Movement on a Smart Campus

A General Framework for Conflation

An integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages

From Research Objects to Research Networks: Combining Spatial and Semantic Search

Yingjie Hu. Research Interests Geographic Information Science, Geospatial Semantics, Information Value, Data Mining, Spatial Data infrastructures

Unsupervised Vocabulary Induction

MECHANISM AND METHODS OF FUZZY GEOGRAPHICAL OBJECT MODELING

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Learning Features from Co-occurrences: A Theoretical Analysis

Sequences and Information

CORRELATION BETWEEN URBAN HEAT ISLAND EFFECT AND THE THERMAL INERTIA USING ASTER DATA IN BEIJING, CHINA

The Concept of Geographic Relevance

NATURAL LANGUAGE PROCESSING. Dr. G. Bharadwaja Kumar

Context-based Natural Language Processing for. GIS-based Vague Region Visualization

Uncertainty in Spatial Data Mining and Its Propagation

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012

An Introduction to Geographic Information System

PROBABILISTIC MATCHING OF MAP OBJECTS IN MULTI-SCALE SPACE

MAPPING THE CHINESE PAST GEOGRAPHY WITH GIS

A MODEL TO EVALUATE THE ENGINEERING GEOLOGY ON FROZEN GROUND FROM XIDATAN TO WUDAOLIANG ALONG THE QINGHAI-XIZANG HIGHWAY USING GIS

Machine Learning for Interpretation of Spatial Natural Language in terms of QSR

Nonlinear Controller Design of the Inverted Pendulum System based on Extended State Observer Limin Du, Fucheng Cao

Mappings For Cognitive Semantic Interoperability

Multiword Expression Identification with Tree Substitution Grammars

THE APPLICATION OF GREY SYSTEM THEORY TO EXCHANGE RATE PREDICTION IN THE POST-CRISIS ERA

WHAT YOU WILL LEARN TODAY

Lecture 1: Geospatial Data Models

Tomographic imaging of P wave velocity structure beneath the region around Beijing

Face Recognition Using Global Gabor Filter in Small Sample Case *

6th ICA Mountain Cartography Workshop Lenk/Switzerland. Cartographic Design Issues Utilizing Google Earth for Spatial Communication

Extracting and Analyzing Semantic Relatedness between Cities Using News Articles

POI Type Matching based on Culturally Different Datasets

Single alignment: Substitution Matrix. 16 march 2017

8/28/2011. Contents. Lecture 1: Introduction to GIS. Dr. Bo Wu Learning Outcomes. Map A Geographic Language.

Sequence Alignment Techniques and Their Uses

Linguistic-Valued Approximate Reasoning With Lattice Ordered Linguistic-Valued Credibility

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Lecture 15. Probabilistic Models on Graph

Research and Application of Sun Shadow Location Technology in Video Big Data

IBM Model 1 for Machine Translation

Transcription:

Syntactic Patterns of Spatial Relations in Text Shaonan Zhu, Xueying Zhang Key Laboratory of Virtual Geography Environment,Ministry of Education, Nanjing Normal University,Nanjing, China Abstract: Natural language is considered as the important resource of geospatial information including geospatial relations. Syntactic patterns represent linguistic characteristics of spatial relations described in natural language, which can be effectively used in spatial information retrieval, spatial query in GIS, cartographic language and spatial information retrieval. The previous research focused on induction syntactic patterns of spatial relations manually. In this paper, a machine learning approach is proposed, which is based on a annotation corpus marked by spatial relations. Firstly, the instances of the spatial relations in the corpus are extracted; secondly, sequence alignment algorithm is used to calculate the similarity between s of spatial relations; finally, the instances with higher similarity are generalized to generate the syntactic patterns of geospatial relations. Keywords: spatial relation; sequence alignment; syntactic pattern; text 1 Introduction Spatial relation described in natural language is much closer to human spatial cognition [1]. To understand and analyze how people express spatial relations is a meaningful issue in geographical information science [5]. National Geographic Information and Analysis Center (NCGIA) has implemented a study plan on spatial relations, since 1988. From a linguistics point of view, the description of spatial relations in natural language is mainly dependent on the spatial relation terms and syntactic structures. Spatial relations terms reflect the state of geographical entities and spatial relations. Syntactic structures delimit the semantic of spatial relations described in the specific pragmatic. Although the spatial relation terms, syntax and 1

semantics have strong fuzziness and uncertainty, there still exist some typical syntactic structures, i.e. syntactic patterns. Therefore, the identification of syntactic patterns of spatial relations is helpful for the understanding of spatial relations in natural language and their usage in GIS query, spatial reasoning and geographic information retrieval. The studies aiming at syntactic patterns mainly focus on the field of scene reconstruction from natural language. The Wordseye system using part of speech tagging and syntactic analysis, on the basis of the initial annotation and analysis, transforms syntactic parsing tree into a dependency structure, and expresses spatial interdependent relationship between the spatial objects. Reinbergerr extract the spatial relations with the syntactic structure similar to subject-verb-object and special field knowledge. In general, the syntactic patterns about spatial relations are manually summarized [4-7]. Corpus is the resource of linguistics research and information extraction. Therefore, a large-scale annotated corpus will provide a new way to automatically identify of syntactic patterns of spatial relations in text. This paper is based on a annotation corpus marked by spatial relations,analyses the instances from the corpus, uses the sequence alignment algorithm to calculate the similarity between instances of spatial relations, builds the similarity matrix, group instances of high similarity, generalized to generate the syntactic patterns of spatial relations. 2 The similarity matrix for instances of spatial relations A spatial relation can be abstracted as a binary relation between the two geographical entities, has apparent target and reference. In structure, a instance must contain two geographical entities with the order. This paper defines <left, GNE1, middle, GNE2, right> as the structure of the instance. GNE1 and GNE2 mean the target and reference in the instance of spatial relations (GNE: geographic named entity). Left, middle, right mean the context. In general, a description about the spatial relation is in a sentence, so the context of the description is usually in a sentence. The part of speech is the most basic language feature in the instance of spatial relations. On the other hand, the vocabulary of spatial relations has a strong indication about spatial relations. Geographic named entity, POS (part of speech), and the word which is in the vocabulary of spatial relations can express natural language characteristics of the instance of spatial relations. Figure 1 expresses the structural result of 2

the instances of spatial relations. For example, 阿里山位于嘉义市以东 (ALISHAN is east of JIAYI City) is a instance. After the annotation features are extracted and structured, the final feature sequence is <GNE><V><GNE><SIGNAL> Fig. 1. The instance of spatial relations In order to find the rules from spatial relations, the structure of the sentence which describes spatial relations must be researched. The words in the instances can be regarded as symbols. This paper uses sequence alignment algorithm which is widely applied in the field of biology to compute the similarity. A major theme of genomics is to compare DNA sequences and to find out the public part of the two sequences. Two ways are frequently used. The first one is local sequence alignment, which gets the similarity from local information. Smith-Waterman Algorithm is the classical algorithm. The other one is global sequence alignment, and Needleman-Wunsch Algorithm is the typical algorithm. This paper extends Needleman-Wunsch Algorithm to handle language unit in the sequence. For example, there are two sequences: GNE1 V GNE GNE2 and GNE1 V GNE2. Firstly, two-dimensional table must be generated. Secondly, two sequences with the same length can be built. Fig. 2. The process of sequence alignment Fig. 3. The result of sequence alignment Thirdly, the similarity between the sequences is quantified (see 1). 3

(1) SUM means the score of the whole sequence. If at the same position the language unit in the target sequence is the same with the reference sequence, the score is added one point, and called one match. LENGTH means the sequence length. SIM is the result. The score of the example which is expressed in Figure 3 is 0.75. Because the similarity should be computed between two instances, the matrix can be built. The similarity matrix for instances of spatial relations show the similarity of the whole sequence list. It is prepare for the generalization of syntactic patterns. 3 Generalization of syntactic patterns The sequence can be considered as a syntactic pattern, but it has not a representative. In fact, the instances of spatial relations can be considered as a sample set, then syntactic patterns are generalized according to the value of the similarity matrix. The procedure is defined as follows: (1) traverse the instances, and pick up the instance and the list which corresponds the instance in the similarity matrix; (2) get the most similar instances, the number of the instances is variable(1,2, n); (3) generalize the instances from step two. If the result contains language feature, go back to step two. Otherwise, go back to step one. (4) loop until all the instances are traversed in step one. The key point is how to define the generalization templates. When the two sequences are compared, the same kind of feature will be retained. For example, the target sequence is <GNE1><a/n/n/w/t/v/v/w/t/p><GNE2><n>, and table 1 shows the list which corresponds the instance in the similarity matrix. Tab.1 The sample of generalized syntactic patterns Reference Template Similarity <GNE1><m/n/c/w><GNE2><n> 0.71 <GNE1><v/a/n/f/w/c/SIGNAL/w/v/s/w/f/v/w/GN 0.70 E /w/w><gne2><n> 4

<GNE><SIGNAL/m/n/c/w/w/w><GNE><n> 0.70 <GNE><GNE><n> 0.67 The result of the above-mentioned is <GNE><n/w><GNE><n>, <GNE><GNE><n> 4 Experimental evaluation The syntactic patterns serve for the extraction of spatial relations in Chinese text with the rules. Traditionally, the rules are generalized manually. This paper gives a helpful method to rich the rules. How the text is handled with the rules is as follows. Firstly, the Chinese word segmentations in a sentence are got. At the same time the POS can be got. At last, the spatial signal word can be found by the spatial terms. So the syntactic patterns can be used to match the text which has been handled. That is why the patterns are important. This paper uses the annotated corpus called GeoCorpus. GeoCorpus gets data form encyclopedia of china which contains a full description of the geographic elements of the administrative unit, mountains, rivers, hills, plateaus, plains, basins, etc. This experiment selected 2355 instances from GeoCorpus. After the instances were structured and generalized, 5295 syntactic patterns were obtained, and someone repeated several times. Through statistical analysis, there were 996 syntactic patterns except the patterns were repeated. In 5295 syntactic patterns, there were 920 syntactic patterns which the number of repetitions is less than 10. 50 syntactic patterns which had the highest frequency accounted for 70% of the total. Tab.2 shows the syntactic patterns which are the top ten. Tab. 2 The syntactic patterns after generalization Syntactic patterns Frequency GNE1 SIGNAL GNE2 622 GNE1 GNE2 SIGNAL 403 GNE1 W GNE2 274 GNE1 GNE GNE2 197 GNE1 V GNE2 156 GNE1 GNE2 N 135 GNE1 GNE2 GNE 120 5

GNE1 GNE2 W 109 GNE1 W W GNE2 88 GNE1 SIGNAL GNE GNE2 86 The experimental results show that in Chinese text, spatial relations usually appear in a sentence; there is almost no qualifier before the former GNE in the syntactic patterns, some syntactic patterns is the part of other syntactic patterns. These Linguistic phenomena and rules help to further understand human cognition on spatial relations. 5 Conclusion Based on an annotation corpus, this paper proposes an approach to get the syntactic patterns with the help of the similarity matrix by sequence alignment algorithm. This method overcomes the shortcomings of manual induction, and finds out hidden rules about spatial relations. The future research should focus on how to extract the spatial relations in Chinese text by syntactic patterns, and extend their usage in GIS natural language query. 6 References 1. Du Shihong,Qin Qiming,Wang Qiao.The Spatial relations in GIS and their applications[j].earth Science Frontiers,2006,13(3):069-080. 2. Chen Jun, Zhao Renliang. Spatial relations in GIS: a survey on its key issues and research progress[j]. ACT A GEODAET ICA et CARTOGRAPHICA SINICA, 1999, 28( 2): 95-100. 3. DuShihong, Wang qiao, Li Zhijiang. Definitions of Natural Language Spatital Relations in GIS[J]. Geomatics and Information Science of Wuhan University, 2005, 30 ( 6) : 533-538. 4. Le xiaoqiu, Yang Chongjun, Yu Wenyang. Spatial Concept Extraction Based on Spatial Semantic Role in Natural Language[J]. Editorial Board of Geomatics and information Science of Wuhan university,2005, 30 (12) : 1100-1103. 5. Liu Yu, Gao Yong,Lin Baojia. Research on GIS Path Reconstruction Based on Constrained Chinese Language[J]. Journal of Remote Sensing,2004,8(4):323-330. 6. Ma Linbing,Gong Jianya. Research on Spatial Database Query Oriented Natural Language[J]. Computer Engineering and Applications, 2003 (22) : 16-19. 7. Zhang Xueying, Lv Guonian.Natural-language Spatial Relations and Their Applications in GIS[J].Geo-information science,2007,9(6):77-81. 6

8. Chen Xiaoying, Hu Yi, LU Ruzhan. Extraction of Entity Relation Templates from Text Collections[J].Computer Engineering, 2007,33(22):199-201,2007,9(6):77-81. 9. Zhuang Suxiang, Li Lei, Tan Yongmei. Research of Relation Pattern in Specific Domain[J]. Journal of Beijing University of Posts and Telecommunications,2006,19(5):79-83. 10. Xu zhongneng. Bioinformatics[M].Beijing:Tsinghua Unversity Press,2008. 11. Mark M D, Comas D, Egenhofer M J, et al. Evaluating and Refining Computational Models of Spatial Relations through Cross-linguistic Human-subjects Testing [C]. In: Frank A and Kuhn W. Spatial Information Theory A theoretical Basis for GIS, International Conference COSIT, Semmering, Austria, Lecture Notes in Computer Science, Berlin: Springer-Verlag,1995,988:553-568. 12. Egenhofer, M. J. (1987). Locational SQL: syntax extensions. Surveying Engineering Program, University of Maine. 13. Egenhofer, M. J. (1989). Spatial query languages. Department of Surveying Engineering, Ph.D dissertation: University of Maine. 14. Bowerman M. Learning How to Structure Space for Language: A Cross linguistic Perspective[J]. In: P. Bloom et al, Editors, Language and Space, Cambridge, MA: MIT Press. 1996, 385 436. 15. Bob Coyne,Richard Sproat. WordsEye: An Automatic Text-to-Scene Conversion System. Proceedings of the 28th annual conference on Computer graphics and interactive techniques,los Angeles, CA, USA, August 12-17, 2001[C], New York :ACM,2001:487-496. 16. Marie-Laure Reinberger. Automatic extraction of spatial relations. In Proceedings of the TEMA workshop, EPIA 2005, Portugal. 7