Geographic Information Retrieval: Are we making progress? Ross Purves, University of Zurich

Outline
- Where I'm coming from: definitions, experiences and requirements for GIR
- Brief listing of one (of many possible) sets of challenges for GIR
- Progress and opportunities, with focus on why Geographic IR ≠ SomeDimension IR
- Interspersed with a personal selection of relevant, but perhaps less well known, papers

Starting points

Ray Larson: seminal work in GIR
"an applied research area that combines aspects of DBMS research, User Interface Research, GIS research, and Information Retrieval research, ... concerned with indexing, searching, retrieving and browsing of geo-referenced information sources, and the design of systems to accomplish these tasks effectively and efficiently." Larson et al. (1996)

Refining the definition
"GIR is therefore concerned with improving the quality of geographically specific information retrieval with a focus on access to unstructured documents such as those found on the Web". (Jones and Purves, 2008)

My perspectives on spatial search
- Work dating back to 2002 with Chris Jones, Mark Sanderson, Alistair Edwardes, Paul Clough, Curdin Derungs and others
- Two European projects and related research
  - SPIRIT: Enabling spatial search on internet documents
  - Tripod: Indexing images based on locations and associated geographies
- Co-chair (with Chris) of Workshop on Geographic Information Retrieval (8 editions so far)
- Working with linguists on language and space

SPIRIT: Spatially Aware Information Retrieval on the Internet
- Handled queries of the form <theme> <spatial relationship> <location>
- One of several early examples of complete systems (e.g. Jones et al., 2002; Chen et al., 2006; Lieberman et al., 2007)
- Not based on Local Directory data (cf. early examples of Local Search)
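The <theme> <spatial relationship> <location> query form can be illustrated with a minimal Python sketch. This is not SPIRIT's actual parser: the relation list, the `SpatialQuery` type and the `parse_query` function are all hypothetical names introduced here, and the approach (split at a known relation term) is only one simple possibility.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, deliberately tiny list of spatial relationship terms.
SPATIAL_RELATIONS = ["north of", "south of", "east of", "west of", "near", "in"]


class SpatialQuery(NamedTuple):
    theme: str
    relation: str
    location: str


def parse_query(text: str) -> Optional[SpatialQuery]:
    """Split a query at a known spatial relation term, trying longer terms first."""
    for relation in sorted(SPATIAL_RELATIONS, key=len, reverse=True):
        match = re.search(rf"\b{re.escape(relation)}\b", text, flags=re.IGNORECASE)
        if match:
            theme = text[:match.start()].strip()
            location = text[match.end():].strip()
            # Only accept the split if both a theme and a location remain.
            if theme and location:
                return SpatialQuery(theme, relation, location)
    return None  # no spatial relation found: treat as a plain thematic query


if __name__ == "__main__":
    print(parse_query("castles near Edinburgh"))
    print(parse_query("camping north of Santa Barbara"))
    print(parse_query("Zurich"))
```

A query with no recognisable relation term returns `None`, which a real system would presumably hand to an ordinary text search.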

Basic conceptual model of a SPIRIT-like system
- Often forgotten!!

Basic preconditions for GIR
GIR (in my view) becomes interesting when a few preconditions are met:
- Information needs complex, varied and often underspecified
- Large collections of unstructured documents (may or may not be thematically related)
- Simple binary (DBMS type) retrieval not effective
- Multiple geographic granularities

...and so to challenges for GIR, based on an editorial in IJGIS (Jones and Purves, 2008)

The challenges
- Detecting geographical references in the form of place names and associated spatial natural language qualifiers within text documents and in users' queries;
- disambiguating place names to determine which particular instance of a name is intended;
- geometric interpretation of the meaning of vague place names, such as "the Midlands", and of vague spatial language such as "near";
- indexing documents with respect to their geographic context as well as their non-spatial thematic content;
- ranking the relevance of documents with respect to geography as well as theme;
- developing effective user interfaces that help users to find what they want; and
- developing methods to evaluate the success of GIR.

Detecting geographical references
- Basic task underlying GIR: identifying candidate spatial referents in text
- Underpinned by NER methods, but often simple gazetteer lookup is key
- Queries (and social media) have very different properties to text documents
- Referents typically treated as part of a bag of words/points model
- Usually predicated on specific placename referents (Santa Barbara), as opposed to type referents (a beach)
- Language modelling approaches (cf. Vanessa's talk) overcome some of these problems but bring others
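A minimal Python sketch of the "simple gazetteer lookup" idea, assuming a toy in-memory gazetteer. `GAZETTEER`, `MAX_NAME_TOKENS` and `find_candidates` are hypothetical names introduced for illustration, and the greedy longest-match strategy is just one plausible choice.

```python
import re

# Hypothetical toy gazetteer: lower-cased place name -> candidate entries.
GAZETTEER = {
    "santa barbara": ["Santa Barbara, California, USA"],
    "dallas": ["Dallas, Texas, USA", "Dallas, Moray, Scotland"],
    "zurich": ["Zurich, Switzerland"],
}
MAX_NAME_TOKENS = 2  # longest name (in tokens) in the toy gazetteer


def find_candidates(text):
    """Return (surface form, token index, candidate entries) for each gazetteer hit.

    At each token position the longest matching name is preferred.
    """
    tokens = re.findall(r"\w+", text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            entries = GAZETTEER.get(surface.lower())
            if entries:
                found.append((surface, i, entries))
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no name starts at this token
    return found


if __name__ == "__main__":
    for hit in find_candidates("From Zurich to a beach near Santa Barbara"):
        print(hit)
```

As the slide notes, a lookup like this only finds specific placenames: "a beach" yields nothing, and an ambiguous name such as "Dallas" comes back with several candidates still to be disambiguated.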

Wolf, S.J., Henrich, A. and Blank, D. (2014) Characterization of Toponym Usages in Texts. 8th Workshop on Geographic Information Retrieval, Dallas, Texas.

...and disambiguating place names
- Very large proportions of candidate referents are ambiguous
- Humans deal with this very well
- Very simple methods (typically default sense) achieve very high precision, especially at city-level granularities
- Often assume random toponym distribution and focus on coarse granularities
- Sources such as Wikipedia (for co-occurrence) may increase unevenness of coverage (cf. Mark Graham)
- Gazetteer properties often unquestioningly accepted
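The "default sense" baseline can be sketched in a few lines of Python. The candidate table and its `prominence` scores are invented for illustration (a real system might derive them from population or feature type); `Candidate`, `CANDIDATES` and `default_sense` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    region: str
    prominence: float  # hypothetical score, e.g. derived from population


# Illustrative candidates only; the prominence values are made up.
CANDIDATES = {
    "paris": [Candidate("Paris", "France", 1.0), Candidate("Paris", "Texas, USA", 0.1)],
    "dallas": [Candidate("Dallas", "Texas, USA", 0.9), Candidate("Dallas", "Moray, Scotland", 0.05)],
}


def default_sense(toponym: str) -> Optional[Candidate]:
    """Pick the most prominent candidate for a name, ignoring all context."""
    options = CANDIDATES.get(toponym.lower(), [])
    return max(options, key=lambda c: c.prominence, default=None)


if __name__ == "__main__":
    print(default_sense("Paris"))
    print(default_sense("Atlantis"))
```

Because the choice ignores document context entirely, this baseline can only work where one sense dominates, which fits the slide's point that it does best at coarse, city-level granularities.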

Moncla, L., Renteria-Agualimpia, W., Nogueras-Iso, J. & Gaio, M. (2014) Geocoding for texts with fine-grain toponyms: an experiment on a geoparsed hiking descriptions corpus. In Proceedings of the ACM SIGSPATIAL GIS 2014, Dallas, Texas.

Vague place names and spatial language
- Recognises importance of vague spatial language and incompleteness of gazetteers
- Many studies have demonstrated possibilities of delineating (and more rarely identifying) vague place names (through co-occurrence and georeferenced spatial media)
- Vagueness and its implications vary with granularity and user need (often ignored)
- Reasoning in search typically discards vagueness (other than as a distance ranking measure)
- Many official, crisply bounded geometries are also used vaguely in natural language

Davies, C., Holt, I., Green, J., Harding, J. and Diamond, L. (2009) User needs and implications for modelling vague named places. Spatial Cognition & Computation 9 (3), 174-94.

Indexing
- Indexes fundamental to efficient search of both text and space
- In GIR, early experiments showed that simple approaches (e.g. separate thematic and spatial indexes) were adequate
- Recent work uses more complex ideas to combine dimensions, but advantages still unclear
- Index efficiency is possibly of less interest to most participants here, but effectiveness is also key:
  1. How should we represent documents for indexing?
  2. Should indexes be space primary (e.g. Vanessa's language models) or object primary (e.g. POI data)?
  3. How should query vs. document footprints be represented?
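The "separate thematic and spatial indexes" approach can be sketched in Python as an inverted index over terms plus a regular grid over point footprints, with results intersected at query time. The class name, the one-degree cell size and the point-footprint assumption are all choices made for this sketch, not details taken from the systems cited.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # grid cell size in degrees; an arbitrary choice for this sketch


def cell_of(lon, lat):
    """Map a coordinate to the integer grid cell containing it."""
    return (int(lon // CELL_SIZE), int(lat // CELL_SIZE))


class SeparateIndexes:
    """An inverted text index and a grid spatial index, queried independently."""

    def __init__(self):
        self.text_index = defaultdict(set)   # term -> doc ids
        self.space_index = defaultdict(set)  # grid cell -> doc ids

    def add(self, doc_id, text, footprints):
        for term in text.lower().split():
            self.text_index[term].add(doc_id)
        for lon, lat in footprints:
            self.space_index[cell_of(lon, lat)].add(doc_id)

    def search(self, terms, bbox):
        """Docs containing all terms with a footprint in a cell touching bbox.

        The spatial step is a coarse, cell-level filter; no exact
        point-in-box refinement is done here.
        """
        min_lon, min_lat, max_lon, max_lat = bbox
        term_sets = [self.text_index.get(t.lower(), set()) for t in terms]
        thematic = set.intersection(*term_sets) if term_sets else set()
        (cx0, cy0), (cx1, cy1) = cell_of(min_lon, min_lat), cell_of(max_lon, max_lat)
        spatial = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                spatial |= self.space_index.get((cx, cy), set())
        return thematic & spatial


if __name__ == "__main__":
    index = SeparateIndexes()
    # Approximate, illustrative coordinates (lon, lat).
    index.add(1, "castles and lochs", [(-3.2, 55.9)])
    index.add(2, "castles of the loire", [(0.7, 47.4)])
    index.add(3, "hotels", [(-3.2, 55.9)])
    print(index.search(["castles"], (-4.0, 55.0, -3.0, 56.0)))
```

Keeping the two indexes apart makes each one simple to build and replace; the cost is that the intersection happens only after both candidate sets have been retrieved.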

Chen, L., Cong, G., Jensen, C. S., & Wu, D. (2013). Spatial keyword query processing: an experimental evaluation. Proceedings of the VLDB Endowment, 6(3), 217-228.

Ranking (and relevance)
- Relevance is often reduced to a binary or ordinal quality in IR/GIR
- Notions of relevance are fundamental to similarity measures used in ranking
- Most approaches use relatively simple models; note that measures must also be tractable on large document collections
- First (simple) experiments suggest spatial diversity also important
- Relevance literature is, in my view, confused/confusing; very little is directly related to text documents (often focuses on POI search)
- Many ranking approaches use Euclidean distance w.r.t. some notional point
- Underlying geographic distribution of theme of interest often ignored
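A minimal Python sketch of the kind of simple ranking model described above: a text score combined with a score based on Euclidean distance to a notional query point. The exponential decay, the linear weighting and all parameter values are assumptions made for this example, not a model taken from the literature cited in the talk.

```python
import math


def spatial_score(doc_xy, query_xy, scale=10.0):
    """Score in (0, 1] that decays with Euclidean distance to the query point."""
    distance = math.dist(doc_xy, query_xy)
    return math.exp(-distance / scale)


def combined_score(text_score, doc_xy, query_xy, weight=0.5, scale=10.0):
    """Linear mix of thematic and spatial scores; weight=1.0 ignores geography."""
    return weight * text_score + (1.0 - weight) * spatial_score(doc_xy, query_xy, scale)


def rank(docs, query_xy, weight=0.5):
    """Sort (doc_id, text_score, (x, y)) tuples by combined score, best first."""
    return sorted(
        docs,
        key=lambda d: combined_score(d[1], d[2], query_xy, weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical documents with planar coordinates in arbitrary units.
    docs = [("far", 0.9, (100.0, 0.0)), ("near", 0.6, (1.0, 0.0))]
    print([d[0] for d in rank(docs, (0.0, 0.0))])
    print([d[0] for d in rank(docs, (0.0, 0.0), weight=1.0)])
```

Reducing each document to a single point and measuring straight-line distance is exactly the simplification the slide questions: it says nothing about spatial diversity or about how the theme itself is distributed in space.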

J. Tang, M. Sanderson. (2010) Evaluation and User Preference Study on Spatial Diversity. In Proceedings of the 32nd European Conference on IR Research on Advances in Information Retrieval (ECIR 2010)

User interfaces
- User interfaces central to query (re)formulation and results display
- Most mainstream examples still focus on points on maps, and basic cartographic issues (e.g. overplotting) are ignored
- Surprisingly limited crossover from the geovisualisation community (especially for search rather than exploration)
- Search strategies known to be important in search (cf. information foraging model from Stuart Card), yet typically we still adopt a "one size fits all" approach
- Large, complex corpora, crying out for effective visualization approaches which capture variability and richness

H. Samet, M. D. Adelfio, B. C. Fruin, M. D. Lieberman, J. Sankaranarayanan. 2013. PhotoStand: A Map Query Interface for a Database of News Photos. PVLDB, 6(12):1350-1353.

Evaluation
- Much evaluation in IR has focussed on system-centred comparative approaches (e.g. GeoCLEF)
- Pure text baselines often hard to beat: a function of corpora, query type, granularity and relevance judgements
- Limited, controlled access to query logs (cf. AOL debacle); however, work based on these has great potential for better understanding user needs (realistic evaluation)
- Increasing use of approaches based on crowdsourcing (e.g. CrowdFlower), but literature suggests evaluating specific geographic relevance is challenging, even with local knowledge
- Long tail queries are important: user-centred, qualitative approaches have great potential
- Still a need for community-wide cooperation and evaluation to allow meaningful comparison
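System-centred comparison of, say, a pure text baseline against a geographically aware run usually comes down to scoring ranked lists against relevance judgements. A minimal Python sketch using mean average precision with binary judgements follows; the function names and the toy runs are hypothetical, and binary relevance is an assumption of this sketch.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at the ranks where relevant docs appear."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of per-query average precision over all judged queries."""
    return sum(average_precision(runs.get(q, []), qrels[q]) for q in qrels) / len(qrels)


if __name__ == "__main__":
    # Hypothetical judgements and two hypothetical system runs.
    qrels = {"q1": {"a", "c"}, "q2": {"x"}}
    text_baseline = {"q1": ["b", "a", "c"], "q2": ["x", "y"]}
    geo_run = {"q1": ["a", "c", "b"], "q2": ["y", "x"]}
    print(mean_average_precision(text_baseline, qrels))
    print(mean_average_precision(geo_run, qrels))
```

A single averaged number like this hides which query types, granularities and judgement sets drive any difference between runs, which is the caveat the slide raises about hard-to-beat text baselines.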

Mandl, T., Carvalho, P., Di Nunzio, G. M., Gey, F., Larson, R. R., Santos, D., & Womser-Hacker, C. (2009). GeoCLEF 2008: the CLEF 2008 cross-language geographic information retrieval track overview. In Evaluating Systems for Multilingual and Multimodal Information Access (pp. 808-821). Springer Berlin Heidelberg.

Some closing remarks
- Unstructured text documents very rich, with great potential for answering geographic questions (cf. Chris Jones)
- Balance of research attention (social media vs. more traditional text) perhaps uneven
- Developing complete systems is complex and often neglected: need for more component sharing and reproducible research?
- GIR is necessarily interdisciplinary: great potential for more effective collaborations