The GapVis project and automated textual geoparsing

Google Ancient Places team, Edinburgh Language Technology Group
GeoBib Conference, Giessen, 4th-5th May 2015

1 Google Ancient Places and GapVis
- The GAP year
- How GapVis Works
- Evaluation
2 Language Technology Group
- Recent georeferencing projects
- Problems, issues, lessons learned

The GAP year
- Google Ancient Places: Google Digital Humanities award, 2010-11
- https://googleancientplaces.wordpress.com/
- Cross-disciplinary and multi-national team
  - humanities, classics, archaeology, natural language processing, graphical interface
  - England, Scotland, California
- Produced the GapVis utility for reading Classics

GapVis: nrabinowitz.github.com/gapvis/

Pleiades gazetteer of the Classical World: http://pleiades.stoa.org/

Pleiades+ etc
Alignment with Geonames (Pleiades+)
- More precise lat/long locations.
- Support for multiple names (incl. modern) for the same place, e.g. Halicarnassus / Bodrum.
At run time (Pleiades++)
- Modern names often used in translations, e.g. Egypt is not in Pleiades.
- Find alternative names for Egypt in Geonames, e.g. Aegyptus; search for these in Pleiades.
Pleiades team's plans
- Seeking funding from NEH to build a robust API, with a daily download option, cf. Geonames.
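
The run-time fallback described for Pleiades++ can be sketched roughly as follows. This is a toy illustration only: the two dictionaries stand in for the real gazetteers, the identifiers are invented placeholders, and `resolve_ancient_place` is a hypothetical helper, not part of GapVis or the Edinburgh Geoparser.

```python
# Toy stand-ins for the two gazetteers (assumption: real lookups would
# query Pleiades and Geonames; these identifiers are placeholders).
PLEIADES = {
    "aegyptus": "pleiades:example-aegyptus",
    "halicarnassus": "pleiades:example-halicarnassus",
}
GEONAMES_ALTERNATES = {
    "egypt": ["Aegyptus"],
    "bodrum": ["Halicarnassus"],
}


def resolve_ancient_place(name):
    """Look a name up in Pleiades, falling back to Geonames alternate names."""
    key = name.lower()
    if key in PLEIADES:
        return PLEIADES[key]
    # Not found directly: try each alternate name known to Geonames.
    for alternate in GEONAMES_ALTERNATES.get(key, []):
        match = PLEIADES.get(alternate.lower())
        if match is not None:
            return match
    return None


print(resolve_ancient_place("Egypt"))
print(resolve_ancient_place("Atlantis"))
```

Here "Egypt" is found only through its alternate name "Aegyptus", while a name absent from both tables yields `None`.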

Geoparsing engine
- Automatic detection of placenames in free text (English).
- Online RESTful API: http://edina.ac.uk/unlock/texts/
- Stand-alone toolkit from LTG coming soon: https://wp.ltg.ed.ac.uk/software/geoparser/

Step 1: Geotagging
- Named Entity Recognition: location mentions
- Also finds person, date/time expressions, organisation
- Rule-based, uses syntactic and semantic clues in context
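
To give a feel for what "rule-based, using clues in context" can mean, here is a deliberately tiny sketch. The cue word lists and the function `tag_entities` are invented for illustration; the actual Edinburgh Geoparser rule set is far richer and is not reproduced here.

```python
import re

# Hypothetical context cues (assumptions, not the Geoparser's real rules).
LOCATIVE_CUES = {"in", "at", "from", "to", "near", "towards"}
PERSON_CUES = {"king", "queen", "prince", "lord"}


def tag_entities(text):
    """Tag capitalised tokens using the word immediately before them."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tagged = []
    for previous, token in zip(tokens, tokens[1:]):
        if not token[0].isupper():
            continue
        cue = previous.lower()
        if cue in LOCATIVE_CUES:
            tagged.append((token, "location"))
        elif cue in PERSON_CUES:
            tagged.append((token, "person"))
    return tagged


sentence = "Xerxes marched from Sardis to Athens, and King Priam stayed in Troy."
print(tag_entities(sentence))
```

A single preceding word is a weak clue, which is why real systems combine many syntactic and semantic signals.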

Step 2: Georesolution
- Ground location mentions against one or more gazetteers
- Prefer populated places
- Prefer locations closer to others mentioned in document.
  - Document is about USA. Prefer Paris, Texas to Paris, France.
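
The two preferences above can be combined into a simple cost function, sketched below. The scoring scheme, the 500 km penalty for non-populated places, and the names `haversine_km` and `resolve` are all assumptions made for this example; the slides do not say how the Geoparser actually weights these heuristics.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(candidates, context_points, unpopulated_penalty_km=500.0):
    """Pick the candidate with the lowest cost.

    Cost = mean distance to other places in the document, plus an
    arbitrary penalty (an assumption) if the candidate is not populated.
    """
    def cost(candidate):
        if context_points:
            mean = sum(haversine_km(candidate["coords"], p)
                       for p in context_points) / len(context_points)
        else:
            mean = 0.0
        penalty = 0.0 if candidate["populated"] else unpopulated_penalty_km
        return mean + penalty

    return min(candidates, key=cost)


paris_candidates = [
    {"name": "Paris, France", "coords": (48.86, 2.35), "populated": True},
    {"name": "Paris, Texas", "coords": (33.66, -95.56), "populated": True},
]
# Other places mentioned in a document about the USA (Dallas, Houston).
usa_context = [(32.78, -96.80), (29.76, -95.37)]
print(resolve(paris_candidates, usa_context)["name"])
```

With Dallas and Houston as context, the Texan Paris is much closer on average, so it is chosen over Paris, France.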

Putting it all together: How GapVis Works
- classical texts from Google Books
- Edinburgh Geoparser
- Pleiades gazetteer of ancient places
- database of toponym URIs tied back to text snippets
- finding relationships between places
- putting it all together: GapVis
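
One plausible reading of "finding relationships between places" is counting how often two toponyms occur on the same page. The sketch below assumes that reading; the record layout, the `example:` URIs, and `co_occurrences` are hypothetical and do not describe the actual GapVis database schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mention records: a toponym URI tied back to a text snippet.
mentions = [
    {"page": 1, "uri": "example:athens", "snippet": "... sailed to Athens ..."},
    {"page": 1, "uri": "example:sparta", "snippet": "... envoys from Sparta ..."},
    {"page": 2, "uri": "example:athens", "snippet": "... the walls of Athens ..."},
    {"page": 2, "uri": "example:sparta", "snippet": "... Sparta refused ..."},
    {"page": 2, "uri": "example:thebes", "snippet": "... allies in Thebes ..."},
]


def co_occurrences(records):
    """Count, for each pair of place URIs, the pages where both appear."""
    places_by_page = {}
    for record in records:
        places_by_page.setdefault(record["page"], set()).add(record["uri"])
    counts = Counter()
    for uris in places_by_page.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(uris), 2):
            counts[pair] += 1
    return counts


for pair, count in co_occurrences(mentions).most_common():
    print(pair, count)
```

Pairs with high counts are candidates for "related places"; a real system would probably normalise by how often each place occurs overall.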

GapVis is still evolving
Forks and Re-use
- GAP2: the Geographic Annotation Platform
- Hestia2: http://enridaga.github.io/gapvis/gap2
  - Used in undergraduate teaching
  - Includes Greek original
Related projects: Pelagios, Perseus, Perseids
- Journey of the Hero: http://www.perseids.org/sites/joth
  - people network as well as places
  - hand-annotated by students in coursework

Perseids: Journey of the Hero

Evaluation
We like it, but is it good?
- Traditional NLP evaluation against Gold Standard
- Qualitative evaluation: use in coursework

Using a hand-annotated Gold Standard
Geotagging
- Precision: were tokens we tagged actually places?
- Recall: how many did we miss?
- F-score: harmonic mean of P and R
Georesolution: placenames correctly located?
Varies considerably across domains
- Geotagging: 59-82 F-score; Georesolution: 69-92%
Richard Tobin, Claire Grover, Kate Byrne, James Reid and Jo Walsh. (2010) Evaluation of georeferencing. In Proceedings of the 6th Workshop on Geographic Information Retrieval (GIR10), Zurich, Switzerland, Feb 2010.
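
The three geotagging measures can be computed as in the sketch below. It assumes mentions are compared as exact (token offset, text) pairs, which is only one of several possible matching criteria; the sample data is invented.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted mentions against a hand-annotated gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented example: (token offset, mention text) pairs.
system_output = {(0, "Athens"), (5, "Sparta"), (9, "Paris"), (12, "Priam")}
gold_standard = {(0, "Athens"), (5, "Sparta"), (9, "Paris"), (20, "Troy"), (25, "Thebes")}

p, r, f = precision_recall_f1(system_output, gold_standard)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this example three of the four tagged mentions are correct (P = 0.75) and three of the five gold mentions are found (R = 0.60).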

Evaluation in Coursework
GapVis used in 2 undergraduate Classics courses.
- hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-i
- hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-ii
- hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-iii
- Visualisations of time and space help
- You can ask different questions
- Digital doesn't necessarily mean easier
Adam Rabinowitz, University of Texas, Austin

Recent LTG georeferencing projects
- DEEP: Digital Exposure of English Placenames
- Trading Consequences: 19th Century commodity trading
- Palimpsest: an Edinburgh literature landscape
Natural Language Processing over historical and cultural documents, with Humanities partners

Palimpsest
- Edinburgh-related fiction from British Library, Hathi Trust, Project Gutenberg, NLS, etc
- 550 books
- Created a fine-grained gazetteer of Edinburgh
- Visualise geo-located snippets of text
- http://palimpsest.blogs.edina.ac.uk/

Palimpsest online interface: LitLong http://litlong.edina.ac.uk/

Problems, issues, lessons learned
- New domains, new problems: newswire, classics, fiction...
- OCR issues!!
- TEI does not enforce well-formed XML
- No standard document structure
- Only as good as the gazetteer
- Distinguishing Places from People: Paris, Priam, [Earl of] Montrose,...
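
Since input files labelled as TEI may turn out not to be well-formed XML, a cheap pre-flight check before geoparsing can save time. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are illustrative assumptions, and the check says nothing about whether a file is valid TEI.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<text><p>Paris in the spring</p></text>"
bad = "<text><p>Paris in the spring</text>"  # <p> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Files that fail this check can be routed to manual repair rather than silently producing truncated output.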

OCR issues: Around the World in Eighty Days

Assisted Curation: Edinburgh Geo-annotator

Conclusions
- Visualisations of space and time give new perspectives
- Tailoring for each new domain is required
- Automated NLP is fast, but not perfect
- Quality of copyright-free material often poor
- Often need specialist gazetteers
- Collaboration between different disciplines essential
