Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n.


University of Groningen

Geographically constrained information retrieval
Andogah, Geoffrey

Document Version: Publisher's PDF, also known as Version of record
Publication date: 2010

Summary

Eighteen percent of information seekers demand geographically intelligent information retrieval systems (Sanderson and Kohler, 2004). State-of-the-art information retrieval (IR) systems lack the geographical intelligence to answer geography-dependent queries effectively. Geographical information retrieval (GIR) is IR with geographical awareness, that is, with the capability to reason geographically. GIR seeks to retrieve documents using both geographical and non-geographical information. Geographical information is pervasive in both documents and user queries: explicitly in the names of places and in adjectives referring to places, and implicitly in the names of people and organisations and in adjectives referring to them. GIR systems are poised to mine this information and use it to satisfy geographically constrained information needs.

This dissertation seeks to answer the following research questions with respect to GIR:

(1) How can place names, geographical adjectives and the names of people be used to automatically determine the geographical coverage of documents, something we call geographic scope?
(2) How well do the automatically determined geographical scopes of documents compare to human assigned scopes?
(3) How best can the performance of scope resolution systems be compared?
(4) How effective is the document's geographical scope in aiding the resolution of toponyms (place names) contained in the document?
(5) How best can the performance of toponym resolution systems be compared?
(6) How effective is relevance feedback for geographically constrained information retrieval?
(7) How effective is a scope-controlled toponym selection scheme in the relevance feedback procedure?
(8) How can geographical scope and type information be incorporated into the document ranking procedure to prioritise documents by geography?
How can place names, geographical adjectives and names of people be used to automatically determine the geographical coverage of documents? How well do automatically determined geographical scopes of documents compare to human assigned scopes?

To answer the research questions above, two assumptions are made: (a) places of the same type, under the same administrative jurisdiction, or adjacent to each other are more likely to be mentioned in a given discourse unit; (b) VIPs (i.e., political leaders) in the same geographical region or at the same level of the leadership hierarchy tend to be mentioned together in a unit of discourse. These assumptions are used to detect the geographical scopes of documents, and to resolve cases in which the scopes are ambiguous. Geographical scopes are modelled as documents using a zone-indexing concept from information retrieval. The proposed schemes achieved a binary evaluation score of 79.1% (using place information) and 58.0% (using VIP information).

How effective is the document's geographical scope in aiding the resolution of toponym ambiguity?

Toponym resolution is a key component of a GIR system. The task of a toponym resolution system is to unambiguously assign to a toponym in a text a location on the surface of the earth, e.g., by means of longitude and latitude. The toponym resolution scheme proposed in this thesis exploits the geographical scopes assigned to documents to control the selection of candidate referents for a given toponym. Other features are also investigated, including place types (e.g., city, mountain), place classification (e.g., administrative, vegetative), population size, and the frequency of mention of non-ambiguous or resolved places. The proposed scheme performed robustly on news articles and summaries, surpassing the performance of the state-of-the-art systems (Leidner, 2007) and reducing error by 57.8%. On a human annotated dataset, the scheme achieved performance scores in the range of 70% to 80%.

How best can the performance of scope resolution systems be compared? How best can the performance of toponym resolution systems be compared?
Two things are needed to evaluate the performance of scope and toponym resolution systems: (a) gold-standard datasets consisting of reference gazetteers and document collections with human resolved scopes and toponyms; and (b) an evaluation metric to assess the correctness of system assigned scopes and toponyms against the gold-standard resolutions. The state-of-the-art evaluation metrics for scope and toponym resolution systems, e.g., precision and recall, are binary in nature. These metrics ignore finer discrepancies among the systems under evaluation. This thesis therefore sought to improve the sensitivity of the metrics. To measure finer differences among competing scope resolution systems, a new evaluation metric is proposed that integrates the ranked position of candidate scopes in its evaluation score. The proposed toponym resolution evaluation metric extends the binary metric scheme by incorporating the following features in its calculations: (a) the number of candidate places for a given reference (reflecting difficulty), (b) the number of regions that must be traversed from the system resolved referent to the correct gold-standard referent (reflecting proximity), and (c) the number of feature classes traversed from the system resolved referent type to the correct gold-standard referent type (reflecting ontological similarity).

How effective is a scope-controlled toponym selection scheme in a relevance feedback procedure?

The motivation for query expansion in IR is to enhance the user query by adding terms that are semantically related to, perhaps even synonymous with, the terms in the original query. The geographical query expansion scheme investigated in this work applies two kinds of relevance feedback strategies: one scheme adds place names found in the relevant documents directly, and the other adds place names belonging to the geographical scopes assigned to the relevant documents. The scope-controlled scheme selects place names to add to feedback queries according to: scope-based(toponyms) = {M S}, where M are the commonly occurring place names in the relevant documents, and S the commonly shared scopes among the relevant documents. The scope-controlled relevance feedback approach outperformed the default information retrieval system (Lucene IR system) by 9.5% on a residual document collection (i.e., a collection consisting of documents not used for query expansion).
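The scope-controlled selection can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that "commonly occurring" and "commonly shared" mean "present in at least `min_docs` relevant documents", and that a place name is kept only when a gazetteer places it inside one of the shared scopes. The function names, the `min_docs` threshold, the dictionary layout and the toy gazetteer are all hypothetical.

```python
from collections import Counter


def common_items(per_doc_items, min_docs):
    """Return the items that occur in at least `min_docs` documents."""
    counts = Counter()
    for items in per_doc_items:
        counts.update(set(items))  # count each item once per document
    return {item for item, n in counts.items() if n >= min_docs}


def scope_controlled_toponyms(relevant_docs, gazetteer, min_docs=2):
    """Select feedback place names (hypothetical reading of the scheme)."""
    # M: place names that commonly occur in the relevant documents.
    m = common_items((d["toponyms"] for d in relevant_docs), min_docs)
    # S: scopes commonly shared among the relevant documents.
    s = common_items((d["scopes"] for d in relevant_docs), min_docs)
    # Assumption: keep a place name from M only if its scope is in S.
    return sorted(t for t in m if gazetteer.get(t) in s)


# Toy data, invented for illustration only.
relevant_docs = [
    {"toponyms": ["Groningen", "Paris"], "scopes": ["Netherlands"]},
    {"toponyms": ["Groningen", "Utrecht", "Paris"], "scopes": ["Netherlands", "Europe"]},
    {"toponyms": ["Utrecht"], "scopes": ["Netherlands"]},
]
gazetteer = {"Groningen": "Netherlands", "Utrecht": "Netherlands", "Paris": "France"}

expansion_terms = scope_controlled_toponyms(relevant_docs, gazetteer)
print(expansion_terms)
```

In this toy run "Paris" occurs often enough to be in M, but it is dropped because its scope is not among the shared scopes, which is the filtering effect the scope-controlled scheme is meant to provide.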
How can geographical scope and feature type information be incorporated into the document ranking procedure so as to prioritise documents by geography?

The task of relevance ranking is to order a retrieved set of documents by relevance to the user's information needs, so that the more relevant documents are closer to the top of the list of documents returned. To integrate geographical scope and feature type information into the relevance ranking procedure, non-geographical relevance scores and geographical relevance scores (i.e., scores derived from geographical scope and from feature type) are combined linearly and, alternatively, using a weighted harmonic mean. The scheme based on the weighted harmonic mean outperformed the default information retrieval system by 8.9%.
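The two ways of combining scores can be sketched as follows. This is an illustration under stated assumptions, not the formulas used in the thesis: both scores are assumed to be normalised to the range 0 to 1, the geographical evidence is assumed to be collapsed into a single score, and the `weight` parameter and function names are hypothetical.

```python
def linear_combination(text_score, geo_score, weight=0.5):
    """Weighted arithmetic mean of the two relevance scores."""
    return weight * text_score + (1.0 - weight) * geo_score


def weighted_harmonic_mean(text_score, geo_score, weight=0.5):
    """Weighted harmonic mean; zero if either score is zero."""
    if text_score <= 0.0 or geo_score <= 0.0:
        return 0.0
    return 1.0 / (weight / text_score + (1.0 - weight) / geo_score)


def rank(docs, combine, weight=0.5):
    """Order (doc_id, text_score, geo_score) tuples by combined score."""
    scored = [(combine(t, g, weight), doc_id) for doc_id, t, g in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy scores, invented for illustration only.
docs = [("a", 0.9, 0.1), ("b", 0.5, 0.45)]
print(rank(docs, linear_combination))
print(rank(docs, weighted_harmonic_mean))
```

With these toy scores the linear combination ranks document "a" first (0.5 against 0.475), while the harmonic mean ranks "b" first, because the harmonic mean penalises a document whose geographical score is much lower than its textual score.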