Inferring Location Names for Geographic Information Retrieval
Johannes Leveling and Sven Hartrumpf
Intelligent Information and Communication Systems (IICS), University of Hagen (FernUniversität in Hagen), Hagen, Germany

Abstract. For the participation of GIRSA at the GeoCLEF 2007 task, two innovative features were introduced to the geographic information retrieval (GIR) system: identification and normalization of location indicators, i.e. text segments from which a geographic scope can be inferred, and the application of techniques from question answering. In an extension of a previously performed experiment, the latter approach was combined with an approach using semantic networks for geographic retrieval. When using the topic title and description, the best performance was achieved by the combination of approaches (0.196 mean average precision, MAP); adding location names from the narrative part increased MAP further. Results indicate that 1) employing normalized location indicators improves MAP significantly and increases the number of relevant documents found; 2) additional location names from the narrative increase MAP and recall; and 3) the semantic network approach has a high initial precision and even adds some relevant documents which were previously not found. For the bilingual experiments, English queries were translated into German by the Promt machine translation web service. Performance for these experiments is generally lower. The baseline experiment (0.114 MAP) is clearly outperformed; the best performance is achieved by a setup using title, description, and narrative (0.209 MAP).

1 Introduction

In geographic information retrieval (GIR) on textual information, named entity recognition and classification play an important role in identifying location names.
GIR is concerned with facilitating geographically-aware retrieval of information, which typically results from identifying location names in the text and classifying them into geographic and non-geographic names. The main goal of this paper is to investigate whether GIR benefits from an approach which is not solely based on identifying proper nouns corresponding to location names. To this end, the system GIRSA (Geographic Information Retrieval by Semantic Annotation) was developed (footnote 1). GIRSA introduces the notion of location indicators and the application of question answering (QA) techniques to GIR. The system is evaluated on documents and topics for GeoCLEF 2007, the GIR task at CLEF 2007.

Footnote 1: The research described is part of the IRSAW project (Intelligent Information Retrieval on the Basis of a Semantically Annotated Web; LIS (2) Hagen, BIB 48 HGfu 02-01), which is funded by the DFG (Deutsche Forschungsgemeinschaft).

C. Peters et al. (Eds.): CLEF 2007, LNCS 5152, © Springer-Verlag Berlin Heidelberg 2008

Location Indicators

Location indicators are text segments from which the geographic scope of a document can be inferred. Important location indicator classes are shown in Table 1 (footnote 2).

Table 1. Definition of location indicator classes
Class: Definition; Example
location adjective: adjective derived from a location name; "irisch"/'Irish' for "Irland"/'Ireland'
demonym: name for inhabitants originating from a location; "Franzose"/'Frenchman' for "Frankreich"/'France'
location code: code for a location, including ISO code, postal and zip code; 'HU21' as the FIPS region code for 'Tolna County, Hungary'
location abbreviation: abbreviation or acronym for a location; "franz." for "französisch"/'French' (mapped to "Frankreich"/'France')
name variant: orthographic variant, exonym, or historic name; 'Cologne' for "Köln"
language: language name in a text; 'Portuguese' for Portuguese-speaking countries (mapped to 'Portugal', 'Angola', 'Cape Verde', 'East Timor', 'Mozambique', 'Brazil')
meta-information: document language, place of publication, place of birth for the author; such attributes can be explicitly given by Dublin Core elements or similar means or can be inferred from the document
unique entity: entity associated with a geographic location, including headquarters of an organization, persons, and buildings; 'Boeing' for 'Seattle, Washington'; 'Eiffel Tower' for 'Paris'
location name: name of a location, including full name and short form; "Republik Korea"/'Republic of Korea' for "Südkorea"/'South Korea'

Typically, location indicators are not part of gazetteers, e.g. the morphological and lexical knowledge for adjectives is missing completely. Distinct classes of location indicators contribute differently in assigning a geographic scope to a document; their importance depends on their usage and frequency in the corpus (e.g.
adjectives are generally frequent) and on the correctness of identifying them, because new ambiguities may be introduced (e.g. the ISO code for Tuvalu (TV) is also the abbreviation for television).

For identification and normalization of location indicators, tokens are mapped to base forms and looked up in a knowledge base. The knowledge base contains pairs of a location indicator and a normalized location name. This knowledge base was created by collecting raw material from web sources and dictionaries (including Wikipedia and an official list of state names (footnote 3)), which was then transformed into a machine-readable form, manually extended, and checked.

Footnote 2: German examples are double-quoted, while English examples are single-quoted.

Location indicators are normalized to location names on different levels of linguistic analysis in GIRSA. Normalization consists of several stages. First, morphological variations are identified and inflectional endings are removed, reducing location indicators to their base form. In addition, multi-word names are recognized and represented as a single term ("Roten Meer(e)s"/'Red Sea's' → "Rote Meer"/'Red Sea'). In the next step, location indicators are normalized, e.g. abbreviations and acronyms are expanded and then mapped to a synset representative, e.g. equivalent location names containing diacritical marks or their equivalent non-accented characters are represented by an element of the name synset (e.g., Québec → Quebec). Finally, prefixes indicating compass directions are separated from the name, which makes it possible to retrieve documents with more specific location names if a more general one was used in the query. Thus, a search for "Deutschland"/'Germany' will also return documents containing the phrase "Norddeutschland"/'Northern Germany' (exception: "Südafrika"/'South Africa').

We performed first experiments with semantic representation matching for GIR at GeoCLEF 2005 [1]. GIR-InSicht is derived from the deep QA system InSicht [2] and matches reduced semantic networks (SNs) of the topic description (or topic title) to the SNs of sentences from the document collection. This process is quite strict and proceeds sentence by sentence (footnote 4). Before matching starts, the query SN is allowed to be split into parts at specific semantic relations, e.g. at a loc relation (location of a situation or object) of the MultiNet formalism (multilayered extended semantic networks; [3]), to increase recall while not losing too much precision.
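The term-level normalization stages described earlier in this section (knowledge base lookup, mapping accented names to a non-accented representative, and separating compass prefixes) can be illustrated with a small Python sketch. It is not GIRSA's implementation: the knowledge base entries, the set of known locations, and all function names are hypothetical and chosen only to mirror the examples in the text, and tokens are assumed to be reduced to base forms already.

```python
import unicodedata

# Hypothetical toy knowledge base: base form of a location indicator
# mapped to a normalized location name (assumption, not GIRSA's data).
KNOWLEDGE_BASE = {
    "irisch": "irland",        # location adjective
    "franzose": "frankreich",  # demonym
    "franz.": "frankreich",    # location abbreviation
    "cologne": "köln",         # name variant
}

# Hypothetical list of location names that may follow a compass prefix.
KNOWN_LOCATIONS = {"deutschland", "europa", "afrika"}
COMPASS_PREFIXES = ("nord", "süd", "ost", "west")
# Names that start with a compass prefix but must stay intact.
PREFIX_EXCEPTIONS = {"südafrika"}


def strip_diacritics(text):
    """Map accented characters to their non-accented equivalents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_compass_prefix(name):
    """Separate a leading compass direction from a known location name."""
    lowered = name.lower()
    if lowered in PREFIX_EXCEPTIONS:
        return None, lowered
    for prefix in COMPASS_PREFIXES:
        rest = lowered[len(prefix):]
        if lowered.startswith(prefix) and rest in KNOWN_LOCATIONS:
            return prefix, rest
    return None, lowered


def normalize_indicator(token):
    """Return the index terms for one token that is already in base form."""
    lowered = token.lower()
    if lowered in KNOWLEDGE_BASE:
        return [KNOWLEDGE_BASE[lowered]]
    prefix, rest = split_compass_prefix(lowered)
    return [rest] if prefix is None else [prefix, rest]
```

With these toy tables, a document term "Norddeutschland" is indexed under both the compass prefix and "deutschland", so a query for "Deutschland" can match it, while "Südafrika" is kept as a single term.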
For GeoCLEF 2007, query decomposition was implemented, i.e. a query can be decomposed into two queries. First, a geographic subquery about the geographic part of the original query is derived and answered by the QA system InSicht. These geographic answers are then integrated into the original query on the SN level (thereby avoiding the complicated or problematic integration on the surface level), yielding one or more revised queries. For example, the query 'Whiskey production on the Scottish Islands' (57-GC) leads to the geographic subquery 'Name Scottish islands'. GIR-InSicht also decomposes the alternative query SNs derived by inferential query expansion. In the above example, this results in the subquery 'Name islands in Scotland'. InSicht answers the subqueries on the SNs of the GeoCLEF document collection and the German Wikipedia. For the above subqueries, it correctly delivered islands like Iona and Islay, which in turn lead to revised query SNs which can be paraphrased as 'Whiskey production on Iona' and 'Whiskey production on Islay'. Note that the revised queries are processed only as alternatives to the original query.

Footnote 3: Staatennamen.pdf

Footnote 4: But documents can also be found if the information is distributed across several sentences, because a coreference resolver processed the SN representation for all documents.

Another decomposition strategy produces questions aiming at meronymy knowledge based on the geographic type of a location, e.g. for a country C in the original query a subquery like 'Name cities in C' is generated, whose results are integrated into the original query SN, yielding several revised queries. This strategy led to interesting questions like 'Which country/region/city is located in the Himalaya?' (69-GC). In total, both decomposition strategies led to 80 different subqueries for the 25 topics.

After the title and description of a topic have been processed independently, GIR-InSicht combines the results. If a document occurs in both the title results and the description results, the higher score is taken for the combination.

The semantic matching approach is completely independent of the main approach in GIRSA. Some of the functionality of the main approach is also realized in the matching approach, e.g. some of the location indicator classes described above are also exploited in GIR-InSicht (adjectives; demonyms for regions and countries). These location indicators are not normalized, but the query SN is extended by many alternative SNs that are in part derived by symbolic inference rules using the semantic knowledge about location indicators. In contrast, the main approach exploits this information on the level of terms.

There has been little research on the role of normalization of location names, inferring locations from textual clues, and applying QA to GIR. Nagel [4] describes the manual construction of a place name ontology containing 17,000 geographic entities as a prerequisite for analyzing German sentences. He states that in German, toponyms have a simple inflectional morphology, but a complex (idiosyncratic) derivational morphology. Buscaldi, Rosso et al.
[5] investigate the semi-automatic creation of a geographic ontology, using resources like Wikipedia, WordNet, and gazetteers. Li, Wang et al. [6] introduce the concept of implicit locations, i.e. locations which are not explicitly mentioned in a text. The only case explored is that of locations which are closely related to other locations. Our own previous work on GIR includes experiments with documents and queries represented as SNs [1], and experiments dealing with linguistic phenomena, such as identifying metonymic location names to increase precision in GIR [7]. Metonymy recognition was not included in GIRSA because we focused on investigating means to increase recall.

3 Experimental Setup

GIRSA is evaluated on the data from GeoCLEF 2007, containing 25 topics with a title, a short description, and a narrative part. As for previous GIR experiments on GeoCLEF data [1], documents were indexed with a database management system supporting standard relevance ranking (tf-idf IR model). Documents are preprocessed as follows to produce different indexes:

1. S: As in traditional IR, all words in the document text (including location names) are stemmed, using a snowball stemmer for German.
2. SL: Location indicators are identified and normalized to a base form of a location name.
3. SLD: In addition, document words are decompounded. German decompounding follows the frequency-based approach described in [8].
4. O: Documents and queries are represented as SNs and GIR is seen as a form of QA.

Table 2. Frequencies of selected location indicator classes
Columns: Class, # Documents, # Locations, # Unique locations
Rows: demonym, location abbreviation, location adjective, location name, all

Table 3. Results for different retrieval experiments on German GeoCLEF 2007 data
Parameter columns: Run ID, query language, index, fields. Result columns: rel ret, MAP, P@5, P@10, P@20.

Run ID       query language  index  fields
FUHtd1de     DE              S      TD
FUHtd2de     DE              SL     TD
FUHtd3de     DE              SLD    TD
FUHtdn4de    DE              SL     TDN
FUHtdn5de    DE              SLD    TDN
FUHtd6de     DE              SLD/O  TD
GIR-InSicht  DE              O      TD
FUHtd1en     EN              S      TD
FUHtd2en     EN              SL     TD
FUHtd3en     EN              SLD    TD
FUHtdn4en    EN              SL     TDN
FUHtdn5en    EN              SLD    TDN

Typical location indicator classes were selected for normalization in documents and queries. Their frequencies are shown in Table 2. Queries and documents are processed in the same way. The title and short description were used for creating a query. GeoCLEF topics contain a narrative part describing documents which are to be assessed as relevant. Instead of employing a large gazetteer containing location names as a knowledge base for query expansion, additional location names were extracted from the narrative part of the topic. For the bilingual (English-German) experiments, the queries were translated using the Promt web service for machine translation. Query processing then follows the setup for monolingual German experiments.
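The query construction described above (terms from the topic title and description, optionally extended by location names taken from the narrative) might look roughly like the following Python sketch. The set of known locations, the function names, and the sample narrative text are assumptions made for illustration; the paper does not specify how GIRSA extracts the names, and stemming or decompounding of query terms is left out.

```python
import re

# Hypothetical stand-in for the location knowledge base (assumption).
KNOWN_LOCATIONS = {"scotland", "norway", "iceland", "alps", "europe"}


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_locations(narrative):
    """Return known location names in order of first occurrence."""
    found = []
    for token in tokenize(narrative):
        if token in KNOWN_LOCATIONS and token not in found:
            found.append(token)
    return found


def build_query(topic, fields="TD"):
    """Build a bag-of-words query from the topic fields T, D and optionally N."""
    terms = []
    if "T" in fields:
        terms += tokenize(topic["title"])
    if "D" in fields:
        terms += tokenize(topic["description"])
    if "N" in fields:
        # Only location names are taken from the narrative, not all of its words.
        for location in extract_locations(topic["narrative"]):
            if location not in terms:
                terms.append(location)
    return terms


# Invented topic text for illustration; only the three location names
# are taken from the discussion of topic 55-GC.
topic = {
    "title": "Deaths caused by avalanches",
    "description": "Find documents about avalanche deaths",
    "narrative": "Relevant documents mention Scotland, Norway or Iceland.",
}
td_query = build_query(topic, "TD")
tdn_query = build_query(topic, "TDN")
```

Here `tdn_query` contains the same terms as `td_query` plus the three location names, which corresponds to the TD and TDN field settings of the runs.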
Values of three parameters were changed in the experiments, namely the query language (German: DE; English: EN), the index type (stemming only: S; identification of locations, not stemmed: SL; decomposition of German compounds: SLD; based on SNs: O; hybrid: SLD/O), and the query fields used (combinations of title T, description D, and locations from narrative N). Parameters and results for the GIR experiments are shown in Table 3. The table shows relevant and retrieved documents (rel ret), MAP, and precision at five, ten, and twenty documents. In total, 904 documents were assessed as relevant for the 25 topics. For the run FUHtd6de, results from GIR-InSicht were merged with results from the experiment FUHtd3de in a straightforward way, using the maximum score. (Run IDs indicate which parameters and topic language were used.)

4 Results and Discussion

Identifying and indexing normalized location indicators, decompounding, and adding location names from the narrative part improves performance significantly (paired Student's t-test, P=0.0008), i.e. another 120 relevant documents are found and MAP increases from run FUHtd1de to run FUHtdn5de. Decompounding German nouns seems to have different effects on precision and recall (FUHtd2de vs. FUHtd3de and FUHtdn4de vs. FUHtdn5de). More relevant documents are retrieved without decompounding, but initial precision is higher with decompounding.

The topic 'Deaths caused by avalanches occurring in Europe, but not in the Alps' (55-GC) contains a negation in the topic title and description. However, adding location names from the narrative part of the topic ('Scotland', 'Norway', 'Iceland') did not notably improve precision for this topic (0.005 MAP in FUHtd3de).

A small analysis of results found by GIR-InSicht in comparison with the main GIR system revealed that GIR-InSicht retrieved documents for ten topics and returned relevant documents for seven topics.
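The merging step from the experimental setup (run FUHtd6de keeps, for each document, the maximum of the two scores) and the measures reported in Table 3 can be sketched in Python as follows. The document identifiers, scores, and relevance judgments are invented, and the sketch assumes that the scores of both runs are on a comparable scale, which the paper does not discuss.

```python
def merge_max(run_a, run_b):
    """Merge two runs (doc_id -> score), keeping the maximum score per document."""
    merged = dict(run_a)
    for doc_id, score in run_b.items():
        merged[doc_id] = max(score, merged.get(doc_id, score))
    # Rank by descending score; ties are broken by document id.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


def average_precision(ranked_ids, relevant):
    """Average of the precision values at the ranks of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Invented scores for one topic (hypothetical data).
ir_run = {"doc1": 0.42, "doc2": 0.30}
sn_run = {"doc2": 0.90, "doc3": 0.55}
ranking = [doc_id for doc_id, _ in merge_max(ir_run, sn_run)]
ap = average_precision(ranking, {"doc1", "doc2"})
```

MAP is then the mean of `average_precision` over all topics. In this toy example the merged ranking is doc2, doc3, doc1, so a document found by only one of the two systems still enters the combined result list.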
The GIR-InSicht approach contributes three additional relevant documents to the combination (FUHtd6de).

For the topic 'Crime near St. Andrews' (52-GC), zero relevant documents were retrieved in all experiments. Several topics had a high negative difference to the median average precision, i.e. their performance was lower. These topics include "Schäden durch sauren Regen in Nordeuropa" ('Damage from acid rain in northern Europe', 54-GC), "Beratungen der Andengemeinschaft" ('Meetings of the Andean Community of Nations', 59-GC), and "Todesfälle im Himalaya" ('Death on the Himalaya', 69-GC). The following causes for the comparatively low performance were identified:

- The German decompounding was problematic with respect to location indicators, i.e. location indicator normalization was not applied to the constituents of German compounds (e.g. "Andengemeinschaft" is correctly split into "Anden"/'Andes' and "Gemeinschaft"/'community', but "Anden" is not identified as a location name for topic 59-GC).
- Several terms were incorrectly stemmed, although they were base forms or proper nouns (e.g. "Regen"/'rain' → "reg" and "Anden"/'Andes' → "and" for topics 54-GC and 59-GC, respectively).
- Decompounding led in some cases to terms with a very high frequency, causing a thematic shift in the retrieved documents (e.g. "Todesfälle"/'cases of death' was split into "Tod"/'death' and "Fall"/'case' for topics 55-GC and 69-GC).
- In several cases, a focused query expansion might have improved performance, e.g. 'Scandinavia' may have been a good term for query expansion in topic 54-GC, but GIRSA's main approach did not use query expansion for GeoCLEF 2007.

Results for the bilingual (English-German) experiments are generally lower. As for German, all other experiments outperform the baseline (0.114 MAP). The best performance is achieved by an experiment using topic title, description, and location names from the narrative (0.209 MAP). In comparison with results for the monolingual German experiments, the performance drop lies between 4.2% (first experiment) and 27.1% (fifth experiment).

The narrative part of a topic contains a detailed description of which documents are to be assessed as relevant (and which are not), including additional location names. Extracting location names from the narrative (instead of looking up additional location names in large gazetteers) and adding them to the query notably improves performance. This result is seemingly in contrast to some results from GeoCLEF 2006, where it was found that additional query terms (from gazetteers) degrade performance. A possible explanation is that in this experiment, only a few location names were added (3.16 location names on average for 15 of the 25 topics, with a maximum of 13 additional location names). When using a gazetteer, one has to decide which terms are the most useful ones in query expansion.
If this decision is based on the importance of a location, a semantic shift in the results may occur, which degrades performance. In contrast, selecting terms from the narrative part increases the chance of expanding a query with relevant terms only.

5 Conclusion and Outlook

In GIRSA, location indicators were introduced as text segments from which location names can be inferred. Results of the GIR experiments show that MAP is higher when using location indicators instead of geographic proper nouns to represent the geographic scope of a document. This broader approach to identifying the geographic scope of a document benefits system performance because proper nouns or location names alone do not imply the geographic scope of a document.

The hybrid approach for GIR proved successful, and even a few additional relevant documents were found in the combined run. As GIR-InSicht originates from a deep (read: semantic) QA approach, it returns documents with a high initial precision, which may prove useful in combination with a geographic blind feedback strategy. GIR-InSicht performs worse than the IR baseline, because only 102 documents were retrieved for 10 of the 25 topics. However, more than half (56 documents) turned out to be relevant.

Several improvements are planned for GIRSA. These include using estimates for the importance (weight) of different location indicators, possibly depending on the context (e.g. 'Danish coast' indicates 'Denmark', but 'German shepherd' does not indicate 'Germany'), and augmenting the location name identification with a part-of-speech tagger and a named entity recognizer. Furthermore, the QA methods provide a useful mapping from natural language questions to gazetteer entry points. For example, the expression 'Scottish Islands' is typically not a name of a gazetteer entry, while the geographic subquery answers 'Iona' and 'Islay' typically are. In the future, a tighter coupling between the QA and IR components is planned, exploiting these subquery answers in the IR methods of GIRSA. (Note that this reverses the standard order of processing known from QA: in GIRSA, QA methods are employed to improve performance before subsequent IR phases.) Finally, we plan to investigate the combination of means to increase precision (e.g. recognizing metonymic location names) with means to increase recall (e.g. recognizing and normalizing location indicators).

References

1. Leveling, J., Hartrumpf, S., Veiel, D.: Using semantic networks for geographic information retrieval. In: Peters, C., Gey, F.C., Gonzalo, J., Müller, H., Jones, G.J.F., Kluck, M., Magnini, B., de Rijke, M., Giampiccolo, D. (eds.) CLEF. LNCS, vol. 4022. Springer, Heidelberg (2006)
2. Hartrumpf, S., Leveling, J.: Interpretation and normalization of temporal expressions for question answering. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF. LNCS, vol. 4730. Springer, Heidelberg (2007)
3. Helbig, H.: Knowledge Representation and the Semantics of Natural Language. Springer, Berlin (2006)
4. Nagel, S.: An ontology of German place names. Corela: Cognition, Représentation, Langage. Le traitement lexicographique des noms propres (2005)
5. Buscaldi, D., Rosso, P., Garcia, P.P.: Inferring geographical ontologies from multiple resources for geographical information retrieval. In: Proceedings of GIR 2006, Seattle, USA (2006)
6. Li, Z., Wang, C., Xie, X., Wang, X., Ma, W.Y.: Indexing implicit locations for geographical information retrieval. In: Proceedings of GIR 2006, Seattle, USA (2006)
7. Leveling, J., Hartrumpf, S.: On metonymy recognition for GIR. In: Proceedings of GIR 2006, Seattle, USA (2006)
8. Chen, A.: Cross-language retrieval experiments at CLEF. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF. LNCS, vol. 2785. Springer, Heidelberg (2003)
The Alignment of Formal, Structured and Unstructured Process Descriptions Josep Carmona Thomas Chatain Luis delicado Farbod Taymouri Boudewijn van Dongen Han van der Aa Lluís Padró Josep Sànchez-Ferreres
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationEstonian Place Names in the National Information System and the Place Names Register *
UNITED NATIONS Working Paper GROUP OF EXPERTS ON No. 62 GEOGRAPHICAL NAMES Twenty-fifth session Nairobi, 5 12 May 2009 Item 10 of the provisional agenda Activities relating to the Working Group on Toponymic
More informationLearning Textual Entailment using SVMs and String Similarity Measures
Learning Textual Entailment using SVMs and String Similarity Measures Prodromos Malakasiotis and Ion Androutsopoulos Department of Informatics Athens University of Economics and Business Patision 76, GR-104
More informationVariation of geospatial thinking in answering geography questions based on topographic maps
Variation of geospatial thinking in answering geography questions based on topographic maps Yoshiki Wakabayashi*, Yuri Matsui** * Tokyo Metropolitan University ** Itabashi-ku, Tokyo Abstract. This study
More informationSo Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks
So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks Andreas Spitz Johanna Geiß Michael Gertz Institute of Computer Science, Heidelberg University Im
More informationMachine Learning for Interpretation of Spatial Natural Language in terms of QSR
Machine Learning for Interpretation of Spatial Natural Language in terms of QSR Parisa Kordjamshidi 1, Joana Hois 2, Martijn van Otterlo 1, and Marie-Francine Moens 1 1 Katholieke Universiteit Leuven,
More informationNLU: Semantic parsing
NLU: Semantic parsing Adam Lopez slide credits: Chris Dyer, Nathan Schneider March 30, 2018 School of Informatics University of Edinburgh alopez@inf.ed.ac.uk Recall: meaning representations Sam likes Casey
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationPrinciples of IR. Hacettepe University Department of Information Management DOK 324: Principles of IR
Principles of IR Hacettepe University Department of Information Management DOK 324: Principles of IR Some Slides taken from: Ray Larson Geographic IR Overview What is Geographic Information Retrieval?
More informationUnited Nations, UNGEGN, and support for national geographical names standardization programmes
Philippines, 2018 United Nations, UNGEGN, and support for national geographical names standardization programmes Helen Kerfoot, UNGEGN Cecille Blake, UNGEGN Secretariat What is important to know? Background
More informationEvaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018
Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence
More informationFROM QUERIES TO TOP-K RESULTS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
FROM QUERIES TO TOP-K RESULTS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More informationConcept Based Representations for Ranking in Geographic Information Retrieval
Concept Based Representations for Ranking in Geographic Information Retrieval Maya Carrillo 1,2,Esaú Villatoro-Tello 1, Aurelio López-López 1, Chris Eliasmith 3, Luis Villaseñor-Pineda 1, and Manuel Montes-y-Gómez
More informationNatural Language Processing. Topics in Information Retrieval. Updated 5/10
Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background
More informationCIDOC-CRM Method: A Standardisation View. Haridimos Kondylakis, Martin Doerr, Dimitris Plexousakis,
The CIDOC CRM CIDOC-CRM Method: A Standardisation View Haridimos Kondylakis, Martin Doerr, Dimitris Plexousakis, Center for Cultural Informatics, Institute of Computer Science Foundation for Research and
More informationCLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview
CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1(B), Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1, and Kirk Roberts 4 1 Tulane University,
More informationOntology-Based News Recommendation
Ontology-Based News Recommendation Wouter IJntema Frank Goossen Flavius Frasincar Frederik Hogenboom Erasmus University Rotterdam, the Netherlands frasincar@ese.eur.nl Outline Introduction Hermes: News
More informationISO Plant Hardiness Zones Data Product Specification
ISO 19131 Plant Hardiness Zones Data Product Specification Revision: A Page 1 of 12 Data specification: Plant Hardiness Zones - Table of Contents - 1. OVERVIEW...3 1.1. Informal description...3 1.2. Data
More informationTnT Part of Speech Tagger
TnT Part of Speech Tagger By Thorsten Brants Presented By Arghya Roy Chaudhuri Kevin Patel Satyam July 29, 2014 1 / 31 Outline 1 Why Then? Why Now? 2 Underlying Model Other technicalities 3 Evaluation
More informationTowards Collaborative Information Retrieval
Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,
More informationCan Vector Space Bases Model Context?
Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval
More informationEconomic and Social Council
United Nations Economic and Social Council Distr.: General 23 May 2012 Original: English E/CONF.101/100 Tenth United Nations Conference on the Standardization of Geographical Names New York, 31 July 9
More informationUnited Nations, UNGEGN, and support for national geographical names standardization programmes
Brazil, 2017 United Nations, UNGEGN, and support for national geographical names standardization programmes Helen Kerfoot, Cecille Blake UNGEGN What is important to know? Background on UNGEGN Aims of the
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationPrediction of Citations for Academic Papers
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationDISTRIBUTIONAL SEMANTICS
COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.
More informationA Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series
A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series Teemu Leskinen National Land Survey of Finland Abstract. The Geographic Names Register
More informationHelen Kerfoot. Former Chair, UNGEGN / Emeritus Scientist, Natural Resources Canada
Geographic names authorities, standardization and international cooperation Helen Kerfoot Former Chair, UNGEGN / Emeritus Scientist, Natural Resources Canada BGN at 100 years In recognition of international
More informationUniversities of Leeds, Sheffield and York
promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/4559/
More information19.2 Geographic Names Register General The Geographic Names Register of the National Land Survey is the authoritative geographic names data
Section 7 Technical issues web services Chapter 19 A Data Repository for Named Places and their Standardised Names Integrated with the Production of National Map Series Teemu Leskinen (National Land Survey
More informationFrom Research Objects to Research Networks: Combining Spatial and Semantic Search
From Research Objects to Research Networks: Combining Spatial and Semantic Search Sara Lafia 1 and Lisa Staehli 2 1 Department of Geography, UCSB, Santa Barbara, CA, USA 2 Institute of Cartography and
More informationEffectiveness of complex index terms in information retrieval
Effectiveness of complex index terms in information retrieval Tokunaga Takenobu, Ogibayasi Hironori and Tanaka Hozumi Department of Computer Science Tokyo Institute of Technology Abstract This paper explores
More informationPIRLS 2016 INTERNATIONAL RESULTS IN READING
Exhibit 2.3: Low International Benchmark (400) Exhibit 2.3 presents the description of fourth grade students achievement at the Low International Benchmark primarily based on results from the PIRLS Literacy
More informationKey Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context.
Marinos Kavouras & Margarita Kokla Department of Rural and Surveying Engineering National Technical University of Athens 9, H. Polytechniou Str., 157 80 Zografos Campus, Athens - Greece Tel: 30+1+772-2731/2637,
More informationCreating a Definitive Place Name Gazetteer for Scotland. Bruce M. Gittings
Creating a Definitive Place Name Gazetteer for Scotland Bruce M. Gittings School of GeoSciences, University of Edinburgh, Edinburgh, UK bruce@ed.ac.uk Place-names represent a fundamental geographical identifier,
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationGEOGRAPHICAL NAMES AS PART OF THE GLOBAL, REGIONAL AND NATIONAL SPATIAL DATA INFRASTRUCTURES
GEOGRAPHICAL NAMES AS PART OF THE GLOBAL, REGIONAL AND NATIONAL SPATIAL DATA INFRASTRUCTURES Željko HEĆIMOVIĆ, Željka JAKIR, Zvonko ŠTEFAN, Danijela KUKIĆ zeljko.hecimovic@cgi.hr, zeljka.jakir@cgi.hr,
More informationPart A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )
Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds
More informationCUNI at the CLEF ehealth 2015 Task 2
CUNI at the CLEF ehealth 2015 Task 2 Shadi Saleh, Feraena Bibyna, and Pavel Pecina Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics, Czech Republic
More informationA Pairwise Document Analysis Approach for Monolingual Plagiarism Detection
A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection Introuction Plagiarism: Unauthorize use of Text, coe, iea, Plagiarism etection research area has receive increasing attention
More informationThe list of Polish geographical names of the world. Maciej Zych
The list of Polish geographical names of the world Maciej Zych 21st Session of the East Central and South-East Europe Division of the United Nations Group of Experts on Geographical Names Ljubljana, 26th
More informationSpatializing time in a history text corpus
Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2014 Spatializing time in a history text corpus Bruggmann, André; Fabrikant,
More informationHidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.
, I. Toy Markov, I. February 17, 2017 1 / 39 Outline, I. Toy Markov 1 Toy 2 3 Markov 2 / 39 , I. Toy Markov A good stack of examples, as large as possible, is indispensable for a thorough understanding
More informationAutomated Geoparsing of Paris Street Names in 19th Century Novels
Automated Geoparsing of Paris Street Names in 19th Century Novels L. Moncla, M. Gaio, T. Joliveau, and Y-F. Le Lay L. Moncla ludovic.moncla@ecole-navale.fr GeoHumanities 17 L. Moncla GeoHumanities 17 2/22
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationExploiting WordNet as Background Knowledge
Exploiting WordNet as Background Knowledge Chantal Reynaud, Brigitte Safar LRI, Université Paris-Sud, Bât. G, INRIA Futurs Parc Club Orsay-Université - 2-4 rue Jacques Monod, F-91893 Orsay, France {chantal.reynaud,
More informationConceptual Modeling in the Environmental Domain
Appeared in: 15 th IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, Berlin, 1997 Conceptual Modeling in the Environmental Domain Ulrich Heller, Peter Struss Dept. of Computer
More informationLOUISIANA STUDENT STANDARDS FOR SOCIAL STUDIES THAT CORRELATE WITH A FIELD TRIP TO DESTREHAN PLANTATION KINDERGARTEN
LOUISIANA STUDENT STANDARDS FOR SOCIAL STUDIES THAT CORRELATE WITH A FIELD TRIP TO DESTREHAN PLANTATION KINDERGARTEN Standard 2 Historical Thinking Skills Students distinguish between events, people, and
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More information27. THESE SENTENCES CERTAINLY LOOK DIFFERENT
27 HESE SENENCES CERAINLY LOOK DIEREN comparing expressions versus comparing sentences a motivating example: sentences that LOOK different; but, in a very important way, are the same Whereas the = sign
More informationImproved Decipherment of Homophonic Ciphers
Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,
More informationINTRODUCTION TO LOGIC. Propositional Logic. Examples of syntactic claims
Introduction INTRODUCTION TO LOGIC 2 Syntax and Semantics of Propositional Logic Volker Halbach In what follows I look at some formal languages that are much simpler than English and define validity of
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 4: Probabilistic Retrieval Models April 29, 2010 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig
More informationCLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview
CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1, Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1 and Kirk Roberts 4 1 Tulane University
More informationYear 34 B2 Geography - Continents and Oceans 2018 Key Skills to be covered: Taken from Level 3 Taken from Level 4
Key Skills to be covered: Taken from Level 3 Taken from Level 4 Geographical Enquiry: I ask, Which PHYSICAL features does this place have? I ask, Which HUMAN features does this place have? I give reasons
More informationPart I: Web Structure Mining Chapter 1: Information Retrieval and Web Search
Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim
More informationDepartment of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling
Department of Computer Science and Engineering Indian Institute of Technology, Kanpur CS 365 Artificial Intelligence Project Report Spatial Role Labeling Submitted by Satvik Gupta (12633) and Garvit Pahal
More information