The GapVis project and automated textual geoparsing
Google Ancient Places team, Edinburgh Language Technology Group
GeoBib Conference, Giessen, 4th-5th May 2015
1 Google Ancient Places and GapVis
  The GAP year
  How GapVis Works
  Evaluation
2 Language Technology Group
  Recent georeferencing projects
  Problems, issues, lessons learned
The GAP year
Google Ancient Places: Google Digital Humanities award, 2010-11
https://googleancientplaces.wordpress.com/
Cross-disciplinary and multi-national team: humanities, classics, archaeology, natural language processing, graphical interface design; England, Scotland, California
Produced the GapVis utility for reading Classical texts
GapVis
nrabinowitz.github.com/gapvis/
Pleiades gazetteer of the Classical World
http://pleiades.stoa.org/
Pleiades+ etc
Alignment with Geonames (Pleiades+)
  More precise lat/long locations
  Support for multiple names (incl. modern) for the same place, e.g. Halicarnassus / Bodrum
At run time (Pleiades++)
  Translations often use modern names, e.g. Egypt, which is not in Pleiades
  Find alternative names for Egypt in Geonames, e.g. Aegyptus, then search for these in Pleiades
Pleiades team's plans
  Seeking NEH funding to build a robust API with a daily download option (cf. Geonames)
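The run-time Pleiades++ fallback described above can be sketched as a two-step dictionary lookup. This is a minimal illustration and not the GAP code. The two tables are hypothetical toy data standing in for Geonames alternate names and Pleiades place names, and the ID string is a made-up placeholder rather than a real Pleiades URI.

```python
# Hypothetical toy tables: real data would come from Geonames and Pleiades dumps.
GEONAMES_ALTERNATES = {
    "Egypt": ["Misr", "Aegyptus"],  # alternate names for a modern toponym
}
PLEIADES_NAMES = {
    "Aegyptus": "hypothetical-pleiades-id-aegyptus",  # placeholder, not a real ID
}


def resolve_with_alternates(name, pleiades, alternates):
    """Look `name` up in Pleiades; if absent, try its Geonames alternate names.

    Returns (matched_name, pleiades_id), or None if nothing matches.
    """
    if name in pleiades:
        return name, pleiades[name]
    # Fall back to alternate names, in the order the alternates table lists them.
    for alt in alternates.get(name, []):
        if alt in pleiades:
            return alt, pleiades[alt]
    return None


print(resolve_with_alternates("Egypt", PLEIADES_NAMES, GEONAMES_ALTERNATES))
```

In this example "Misr" is tried first and misses. "Aegyptus" then hits, so the modern name in the translation is grounded to the ancient Pleiades entry.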
Geoparsing engine
Automatic detection of placenames in free text (English)
Online RESTful API: http://edina.ac.uk/unlock/texts/
Stand-alone toolkit from LTG coming soon: https://wp.ltg.ed.ac.uk/software/geoparser/
Step 1: Geotagging
Named Entity Recognition of location mentions
Also finds person names, date/time expressions and organisations
Rule-based: uses syntactic and semantic clues in the surrounding context
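To show what a rule-based contextual clue looks like, here is a deliberately tiny tagger. It is not the Edinburgh Geoparser's grammar, which is far richer. It assumes two hypothetical cue lists. The first rule treats a capitalised word that follows a locative preposition as a location. The second stops a person title such as "King" from being tagged in that position.

```python
import re

# Hypothetical cue lists for illustration only.
LOCATIVE_PREPS = {"in", "at", "from", "to", "near", "towards"}
PERSON_TITLES = {"Mr", "Mrs", "King", "Queen", "Earl", "Lord"}

# Words, or single non-space punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^\sA-Za-z]")


def tag_locations(text):
    """Return capitalised tokens that a locative preposition directly precedes."""
    tokens = TOKEN_RE.findall(text)
    locations = []
    for i, tok in enumerate(tokens):
        # Skip the first token (no left context) and non-capitalised tokens.
        if i == 0 or not tok[0].isupper():
            continue
        # A title after "to" (e.g. "to King Priam") signals a person, not a place.
        if tok in PERSON_TITLES:
            continue
        if tokens[i - 1].lower() in LOCATIVE_PREPS:
            locations.append(tok)
    return locations


print(tag_locations("Herodotus sailed from Halicarnassus to Athens and spoke to King Priam."))
```

Rules like these are cheap and transparent. They also show why distinguishing places from people is hard: the tagger only works when the context happens to contain a cue.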
Step 2: Georesolution
Ground location mentions against one or more gazetteers
Prefer populated places
Prefer locations closer to other places mentioned in the document
E.g. if the document is about the USA, prefer Paris, Texas to Paris, France
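The two preferences above can be combined into a simple cost function. This is a sketch under stated assumptions, not the Geoparser's actual ranking. Each candidate's cost is the sum of great-circle distances to the nearest candidate of every other mention. Populated places then receive a flat bonus. The `Candidate` type and the 500 km `populated_bonus_km` weight are hypothetical choices, and the coordinates are approximate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    lat: float
    lon: float
    populated: bool  # hypothetical flag, e.g. derived from a gazetteer feature class


def haversine_km(a, b):
    """Great-circle distance between two candidates in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def resolve(mentions, populated_bonus_km=500.0):
    """mentions: toponym -> list of Candidate. Returns toponym -> chosen Candidate."""
    chosen = {}
    for toponym, cands in mentions.items():
        if not cands:
            continue  # nothing in the gazetteer for this mention

        def cost(c):
            # Distance to the nearest candidate of each *other* mention.
            dist = sum(
                min(haversine_km(c, o) for o in other_cands)
                for other, other_cands in mentions.items()
                if other != toponym and other_cands
            )
            return dist - (populated_bonus_km if c.populated else 0.0)

        chosen[toponym] = min(cands, key=cost)
    return chosen


mentions = {
    "Paris": [Candidate("Paris, France", 48.8566, 2.3522, True),
              Candidate("Paris, Texas", 33.6609, -95.5555, True)],
    "Dallas": [Candidate("Dallas, Texas", 32.7767, -96.7970, True)],
    "Houston": [Candidate("Houston, Texas", 29.7604, -95.3698, True)],
}
print(resolve(mentions)["Paris"].name)
```

Because the other mentions in this toy document are Texan cities, Paris, Texas has the far smaller summed distance and is chosen.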
Putting it all together: How GapVis Works
Classical texts from Google Books are processed by the Edinburgh Geoparser, using the Pleiades gazetteer of ancient places
The result is a database of toponym URIs tied back to text snippets, used for finding relationships between places
GapVis puts it all together
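A minimal sketch of the "database of toponym URIs tied back to text snippets" step follows. It assumes the geoparser output is a list of (URI, page, snippet) triples. It also assumes, hypothetically, that two places are related when they occur on the same page. That co-occurrence measure is an illustrative stand-in, not necessarily how GapVis computes relationships, and the URIs are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_index(resolved_mentions):
    """resolved_mentions: iterable of (uri, page, snippet) triples.

    Returns (snippets_by_uri, cooccurrence), where cooccurrence maps a sorted
    (uri_a, uri_b) pair to the number of pages on which both places occur.
    """
    snippets_by_uri = defaultdict(list)
    places_by_page = defaultdict(set)
    for uri, page, snippet in resolved_mentions:
        snippets_by_uri[uri].append((page, snippet))  # tie each URI back to its text
        places_by_page[page].add(uri)

    cooccurrence = defaultdict(int)
    for uris in places_by_page.values():
        for a, b in combinations(sorted(uris), 2):  # each unordered pair once per page
            cooccurrence[(a, b)] += 1
    return snippets_by_uri, cooccurrence


# Hypothetical URIs and snippets.
mentions = [
    ("uri:athens", 1, "...the Athenians..."),
    ("uri:sardis", 1, "...marched to Sardis..."),
    ("uri:athens", 2, "...returned to Athens..."),
    ("uri:sardis", 2, "...the satrap at Sardis..."),
    ("uri:babylon", 2, "...walls of Babylon..."),
]
snippets, cooc = build_index(mentions)
print(cooc[("uri:athens", "uri:sardis")])
```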
GapVis is still evolving
Forks and re-use
  GAP2: the Geographic Annotation Platform
  Hestia2: http://enridaga.github.io/gapvis/gap2
  Used in undergraduate teaching; includes the Greek original
Related projects: Pelagios, Perseus, Perseids
  Journey of the Hero: http://www.perseids.org/sites/joth
  Maps a network of people as well as places, hand-annotated by students in coursework
Perseids: Journey of the Hero
Evaluation
We like it, but is it good?
  Traditional NLP evaluation against a Gold Standard
  Qualitative evaluation: use in coursework
Using a hand-annotated Gold Standard
Geotagging
  Precision: were the tokens we tagged actually places?
  Recall: how many places did we miss?
  F-score: harmonic mean of P and R
Georesolution: were placenames correctly located?
Results vary considerably across domains: geotagging 59-82 F-score; georesolution 69-92%
Richard Tobin, Claire Grover, Kate Byrne, James Reid and Jo Walsh (2010). Evaluation of georeferencing. In Proceedings of the 6th Workshop on Geographic Information Retrieval (GIR10), Zurich, Switzerland, February 2010.
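The three geotagging measures can be made concrete with a few lines of code. This sketch assumes strict matching: a predicted place counts as correct only if its (start, end) character span exactly equals a Gold Standard span. It also scores georesolution as the share of shared spans whose gazetteer IDs agree. Both scoring choices are illustrative assumptions. The published evaluation may use different matching or distance criteria.

```python
def prf(predicted, gold):
    """Precision, recall and F-score over sets of (start, end) place spans."""
    tp = len(predicted & gold)  # spans tagged by the system that are really places
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # F-score: harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def resolution_accuracy(predicted_ids, gold_ids):
    """Share of spans found in both annotations whose gazetteer IDs agree."""
    shared = predicted_ids.keys() & gold_ids.keys()
    correct = sum(predicted_ids[s] == gold_ids[s] for s in shared)
    return correct / len(shared) if shared else 0.0


gold = {(0, 6), (10, 16), (20, 25), (30, 35)}
predicted = {(0, 6), (10, 16), (40, 45)}
print(prf(predicted, gold))  # P = 2/3, R = 1/2, F = 4/7
```

With two of three predictions correct and two of four gold places found, the harmonic mean (4/7, about 0.57) sits below the arithmetic mean of P and R. It penalises a system that is strong on only one of the two measures.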
Evaluation in Coursework
GapVis was used in 2 undergraduate Classics courses:
  hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-i
  hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-ii
  hestia.open.ac.uk/reading-herodotus-spatially-in-the-undergraduate-classroom-part-iii
Visualisations of time and space help
You can ask different questions
Digital doesn't necessarily mean easier
(Adam Rabinowitz, University of Texas, Austin)
Recent LTG georeferencing projects
  DEEP: Digital Exposure of English Placenames
  Trading Consequences: 19th-century commodity trading
  Palimpsest: an Edinburgh literature landscape
Natural Language Processing over historical and cultural documents, with Humanities partners
Palimpsest
Edinburgh-related fiction from the British Library, HathiTrust, Project Gutenberg, NLS, etc.: 550 books
Created a fine-grained gazetteer of Edinburgh
Visualises geo-located snippets of text
http://palimpsest.blogs.edina.ac.uk/
Palimpsest online interface: LitLong
http://litlong.edina.ac.uk/
Problems, issues, lessons learned
New domains, new problems: newswire, classics, fiction...
OCR issues!
TEI-encoded sources are not always well-formed XML
No standard document structure
Only as good as the gazetteer
Distinguishing places from people: Paris, Priam, [Earl of] Montrose, ...
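Because malformed input breaks an XML-based pipeline before any geoparsing starts, it pays to screen files first. Below is a minimal well-formedness check using Python's standard `xml.etree.ElementTree`. The `check_well_formed` helper is hypothetical. The check only confirms that the markup parses. It does not validate against the TEI schema, and it cannot fix the lack of a standard document structure.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # e.g. reports a mismatched tag with line and column
    return None


print(check_well_formed("<TEI><text><body><p>Edinburgh</p></body></text></TEI>"))  # None
print(check_well_formed("<TEI><p>Leith</TEI>"))  # error message: <p> is never closed
```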
OCR issues: Around the World in Eighty Days
Assisted Curation: Edinburgh Geo-annotator
Conclusions
Visualisations of space and time give new perspectives
Tailoring is required for each new domain
Automated NLP is fast, but not perfect
The quality of copyright-free material is often poor
Specialist gazetteers are often needed
Collaboration between different disciplines is essential