Inferring Location Names for Geographic Information Retrieval

Johannes Leveling and Sven Hartrumpf
Intelligent Information and Communication Systems (IICS), University of Hagen (FernUniversität in Hagen), Hagen, Germany

Abstract. For the participation of GIRSA at the GeoCLEF 2007 task, two innovative features were introduced to the geographic information retrieval (GIR) system: identification and normalization of location indicators, i.e. text segments from which a geographic scope can be inferred, and the application of techniques from question answering. In an extension of a previously performed experiment, the latter approach was combined with an approach using semantic networks for geographic retrieval. When using the topic title and description, the best performance was achieved by the combination of approaches (0.196 mean average precision, MAP); adding location names from the narrative part increased MAP further. Results indicate that 1) employing normalized location indicators improves MAP significantly and increases the number of relevant documents found; 2) additional location names from the narrative increase MAP and recall; and 3) the semantic network approach has a high initial precision and even adds some relevant documents which were previously not found. For the bilingual experiments, English queries were translated into German by the Promt machine translation web service. Performance for these experiments is generally lower. The baseline experiment (0.114 MAP) is clearly outperformed; the best performance is achieved by a setup using title, description, and narrative (0.209 MAP).

1 Introduction

In geographic information retrieval (GIR) on textual information, named entity recognition and classification play an important role in identifying location names.
GIR is concerned with facilitating geographically-aware retrieval of information, which typically results from identifying location names in the text and classifying them into geographic and non-geographic names. The main goal of this paper is to investigate whether GIR benefits from an approach which is not solely based on identifying proper nouns corresponding to location names. To this end, the system GIRSA (Geographic Information Retrieval by Semantic Annotation)¹ was developed. GIRSA introduces the notion of location indicators and the application of question answering (QA) techniques to GIR. The system is evaluated on documents and topics for GeoCLEF 2007, the GIR task at CLEF 2007.

¹ The research described is part of the IRSAW project (Intelligent Information Retrieval on the Basis of a Semantically Annotated Web; LIS (2) Hagen, BIB 48 HGfu 02-01), which is funded by the DFG (Deutsche Forschungsgemeinschaft).

C. Peters et al. (Eds.): CLEF 2007, LNCS 5152, © Springer-Verlag Berlin Heidelberg 2008

2 Location Indicators

Location indicators are text segments from which the geographic scope of a document can be inferred. Important classes of location indicators are shown in Table 1.² Typically, location indicators are not part of gazetteers, e.g. the morphological and lexical knowledge for adjectives is missing completely. Distinct classes of location indicators contribute differently to assigning a geographic scope to a document; their importance depends on their usage and frequency in the corpus (e.g. adjectives are generally frequent) and on how reliably they can be identified, because new ambiguities may be introduced (e.g. the ISO code for Tuvalu, 'TV', is also the abbreviation for television).

² German examples are double-quoted, while English examples are single-quoted.

Table 1. Definition of location indicator classes

Class | Definition; Example
location adjective | adjective derived from a location name; "irisch"/'Irish' for "Irland"/'Ireland'
demonym | name for inhabitants originating from a location; "Franzose"/'Frenchman' for "Frankreich"/'France'
location code | code for a location, including ISO code, postal and zip code; 'HU21' as the FIPS region code for 'Tolna County, Hungary'
location abbreviation | abbreviation or acronym for a location; "franz." for "französisch"/'French' (mapped to "Frankreich"/'France')
name variant | orthographic variant, exonym, or historic name; 'Cologne' for "Köln"
language | language name in a text; 'Portuguese' for Portuguese-speaking countries (mapped to 'Portugal', 'Angola', 'Cape Verde', 'East Timor', 'Mozambique', 'Brazil')
meta-information | document language, place of publication, place of birth for the author; such attributes can be explicitly given by Dublin Core elements or similar means or can be inferred from the document
unique entity | entity associated with a geographic location, including headquarters of an organization, persons, and buildings; 'Boeing' for 'Seattle, Washington'; 'Eiffel Tower' for 'Paris'
location name | name of a location, including full name and short form; "Republik Korea"/'Republic of Korea' for "Südkorea"/'South Korea'

For identification and normalization of location indicators, tokens are mapped to base forms and looked up in a knowledge base. The knowledge base contains pairs of a location indicator and a normalized location name. It was created by collecting raw material from web sources and dictionaries (including Wikipedia and an official list of state names³), which was then transformed into a machine-readable form, manually extended, and checked.

Location indicators are normalized to location names on different levels of linguistic analysis in GIRSA. Normalization consists of several stages. First, morphological variations are identified and inflectional endings are removed, reducing location indicators to their base form. In addition, multi-word names are recognized and represented as a single term ("Roten Meer(e)s"/'Red Sea's' → "Rote Meer"/'Red Sea'). In the next step, location indicators are normalized: abbreviations and acronyms are expanded, and the result is mapped to a synset representative, e.g. equivalent location names containing diacritical marks or their non-accented equivalents are represented by one element of the name synset ("Québec" → "Quebec"). Finally, prefixes indicating compass directions are separated from the name, which makes it possible to retrieve documents with more specific location names if a more general one was used in the query. Thus, a search for "Deutschland"/'Germany' will also return documents containing "Norddeutschland"/'Northern Germany' (exception: "Südafrika"/'South Africa').

We performed first experiments with semantic representation matching for GIR at GeoCLEF 2005 [1]. GIR-InSicht is derived from the deep QA system InSicht [2] and matches reduced semantic networks (SNs) of the topic description (or topic title) to the SNs of sentences from the document collection. This process is quite strict and proceeds sentence by sentence.⁴ Before matching starts, the query SN may be split into parts at specific semantic relations, e.g. at a loc relation (location of a situation or object) of the MultiNet formalism (multilayered extended semantic networks; [3]), to increase recall while not losing too much precision.
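The normalization stages for location indicators described in this section (knowledge-base lookup, multi-word names, diacritics, and compass-direction prefixes) can be illustrated with a minimal Python sketch. It is not GIRSA's implementation: the tiny knowledge base, the function names `fold`, `split_compass`, and `normalize`, and the choice to emit the compass prefix as a separate term are assumptions made for illustration, and the sketch assumes that inflectional endings have already been removed from the tokens.

```python
import unicodedata


def fold(text):
    """Lowercase and strip diacritics so that 'Québec' and 'Quebec' share one key."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical miniature knowledge base: (location indicator, normalized location name).
_PAIRS = [
    ("irisch", "Irland"),            # location adjective
    ("Franzose", "Frankreich"),      # demonym
    ("franz.", "Frankreich"),        # location abbreviation
    ("Cologne", "Köln"),             # name variant
    ("Republik Korea", "Südkorea"),  # multi-word full name
    ("Québec", "Québec"),
    ("Deutschland", "Deutschland"),
    ("Afrika", "Afrika"),
    ("Südafrika", "Südafrika"),
]
KB = {fold(indicator): name for indicator, name in _PAIRS}

COMPASS_PREFIXES = tuple(fold(p) for p in ("nord", "süd", "ost", "west"))
NO_SPLIT = {fold("Südafrika")}  # exception named in the text: never split this name


def split_compass(token):
    """Separate a compass prefix from a known location name, if there is one."""
    key = fold(token)
    if key not in NO_SPLIT:
        for prefix in COMPASS_PREFIXES:
            rest = key[len(prefix):]
            if key.startswith(prefix) and rest in KB:
                return [prefix, KB[rest]]
    return [token]


def normalize(tokens, max_len=3):
    """Map tokens to normalized location names, trying the longest multi-word match first."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = fold(" ".join(tokens[i:i + n]))
            if key in KB:
                out.append(KB[key])
                i += n
                break
        else:  # no knowledge-base entry starts at this token
            out.extend(split_compass(tokens[i]))
            i += 1
    return out


print(normalize(["Norddeutschland"]))    # ['nord', 'Deutschland']
print(normalize(["Republik", "Korea"]))  # ['Südkorea']
print(normalize(["Südafrika"]))          # ['Südafrika']
```

Because the general name ("Deutschland") ends up as an index term of its own, a query for the general name can match a document that only mentions the more specific compound, which is the effect described above.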
For GeoCLEF 2007, query decomposition was implemented, i.e. a query can be decomposed into two queries. First, a geographic subquery about the geographic part of the original query is derived and answered by the QA system InSicht. These geographic answers are integrated into the original query on the SN level (thereby avoiding the complicated or problematic integration on the surface level), yielding one or more revised queries. For example, the query 'Whiskey production on the Scottish Islands' (57-GC) leads to the geographic subquery 'Name Scottish islands'. GIR-InSicht also decomposes the alternative query SNs derived by inferential query expansion. In the above example, this results in the subquery 'Name islands in Scotland'. InSicht answers the subqueries on the SNs of the GeoCLEF document collection and the German Wikipedia. For the above subqueries, it correctly delivered islands like 'Iona' and 'Islay', which in turn lead to revised query SNs which can be paraphrased as 'Whiskey production on Iona' and 'Whiskey production on Islay'. Note that the revised queries are processed only as alternatives to the original query.

³ Staatennamen.pdf
⁴ But documents can also be found if the information is distributed across several sentences because a coreference resolver processed the SN representation for all documents.

Another decomposition strategy produces questions aiming at meronymy knowledge based on the geographic type of a location, e.g. for a country C in the original query a subquery like 'Name cities in C' is generated, whose results are integrated into the original query SN, yielding several revised queries. This strategy led to interesting questions like 'Which country/region/city is located in the Himalaya?' (GC-69). In total, both decomposition strategies led to 80 different subqueries for the 25 topics.

After the title and description of a topic have been processed independently, GIR-InSicht combines the results. If a document occurs in both the title results and the description results, the higher score is taken for the combination.

The semantic matching approach is completely independent of the main approach in GIRSA. Some of the functionality of the main approach is also realized in the matching approach, e.g. some of the location indicator classes described above are also exploited in GIR-InSicht (adjectives; demonyms for regions and countries). These location indicators are not normalized, but the query SN is extended by many alternative SNs that are in part derived by symbolic inference rules using the semantic knowledge about location indicators. In contrast, the main approach exploits this information on the level of terms.

There has been little research on the role of normalization of location names, inferring locations from textual clues, and applying QA to GIR. Nagel [4] describes the manual construction of a place name ontology containing 17,000 geographic entities as a prerequisite for analyzing German sentences. He states that in German, toponyms have a simple inflectional morphology, but a complex (idiosyncratic) derivational morphology. Buscaldi, Rosso et al. [5] investigate the semi-automatic creation of a geographic ontology, using resources like Wikipedia, WordNet, and gazetteers. Li, Wang et al. [6] introduce the concept of implicit locations, i.e. locations which are not explicitly mentioned in a text. The only case they explore is that of locations which are closely related to other locations. Our own previous work on GIR includes experiments with documents and queries represented as SNs [1], and experiments dealing with linguistic phenomena, such as identifying metonymic location names to increase precision in GIR [7]. Metonymy recognition was not included in GIRSA because we focused on investigating means to increase recall.

3 Experimental Setup

GIRSA is evaluated on the data from GeoCLEF 2007, containing 25 topics with a title, a short description, and a narrative part. As for previous GIR experiments on GeoCLEF data [1], documents were indexed with a database management system supporting standard relevance ranking (tf-idf IR model). Documents are preprocessed as follows to produce different indexes:

1. S: As in traditional IR, all words in the document text (including location names) are stemmed, using a snowball stemmer for German.
2. SL: Location indicators are identified and normalized to a base form of a location name.
3. SLD: In addition, document words are decompounded. German decompounding follows the frequency-based approach described in [8].
4. O: Documents and queries are represented as SNs and GIR is seen as a form of QA.

Typical location indicator classes were selected for normalization in documents and queries. Their frequencies are shown in Table 2. Queries and documents are processed in the same way.

Table 2. Frequencies of selected location indicator classes (columns: Class, # Documents, # Locations, # Unique locations; rows: demonym, location abbreviation, location adjective, location name, all)

Table 3. Results for different retrieval experiments on German GeoCLEF 2007 data (parameter columns: query language, index, fields; result columns: rel ret, MAP, P@5, P@10, P@20)

Run ID | Query language | Index | Fields
FUHtd1de | DE | S | TD
FUHtd2de | DE | SL | TD
FUHtd3de | DE | SLD | TD
FUHtdn4de | DE | SL | TDN
FUHtdn5de | DE | SLD | TDN
FUHtd6de | DE | SLD/O | TD
GIR-InSicht | DE | O | TD
FUHtd1en | EN | S | TD
FUHtd2en | EN | SL | TD
FUHtd3en | EN | SLD | TD
FUHtdn4en | EN | SL | TDN
FUHtdn5en | EN | SLD | TDN

The title and short description were used for creating a query. GeoCLEF topics contain a narrative part describing documents which are to be assessed as relevant. Instead of employing a large gazetteer containing location names as a knowledge base for query expansion, additional location names were extracted from the narrative part of the topic.

For the bilingual (English-German) experiments, the queries were translated using the Promt web service for machine translation. Query processing then follows the setup for monolingual German experiments.
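The difference between a TD query (title and description) and a TDN query (plus location names taken from the narrative) can be sketched in a few lines of Python. This is an illustrative sketch only: the topic title is the one quoted for topic 55-GC, but the description text, the narrative text, the `KNOWN_LOCATIONS` set, and the function names are invented for the example, and GIRSA's actual extraction relies on its location indicator knowledge base rather than on a plain word list.

```python
import re

# Hypothetical stand-in for a knowledge base of location names (lowercased).
KNOWN_LOCATIONS = {"scotland", "norway", "iceland", "europe", "alps"}


def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def build_query(title, description, narrative=None):
    """Return query terms from title and description (TD).

    If a narrative is given (TDN), only location names found in it are added;
    the remaining narrative words are ignored.
    """
    terms = []
    for token in tokenize(title) + tokenize(description):
        if token not in terms:
            terms.append(token)
    if narrative is not None:
        for token in tokenize(narrative):
            if token in KNOWN_LOCATIONS and token not in terms:
                terms.append(token)
    return terms


TITLE = "Deaths caused by avalanches occurring in Europe, but not in the Alps"
DESCRIPTION = "Find reports on avalanche deaths in Europe outside the Alps"       # invented
NARRATIVE = "Relevant documents report avalanche deaths in Scotland, Norway, or Iceland."  # invented

td_terms = build_query(TITLE, DESCRIPTION)
tdn_terms = build_query(TITLE, DESCRIPTION, NARRATIVE)
print(tdn_terms[len(td_terms):])  # ['scotland', 'norway', 'iceland']
```

Restricting the expansion to location names keeps the number of added terms small, which matters for the comparison with gazetteer-based expansion discussed in the results.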

Values of three parameters were changed in the experiments: the query language (German: DE; English: EN), the index type (stemming only: S; identification of locations, not stemmed: SL; decomposition of German compounds: SLD; based on SNs: O; hybrid: SLD/O), and the query fields used (combinations of title T, description D, and locations from narrative N). Parameters and results for the GIR experiments are shown in Table 3. The table shows the number of relevant documents retrieved (rel ret), MAP, and precision at five, ten, and twenty documents. In total, 904 documents were assessed as relevant for the 25 topics. For the run FUHtd6de, results from GIR-InSicht were merged with results from the experiment FUHtd3de in a straightforward way, using the maximum score. (Run IDs indicate which parameters and topic language were used.)

4 Results and Discussion

Identifying and indexing normalized location indicators, decompounding, and adding location names from the narrative part improve performance significantly (paired Student's t-test, P = 0.0008), i.e. another 120 relevant documents are found and MAP increases from FUHtd1de to FUHtdn5de. Decompounding German nouns seems to have different effects on precision and recall (FUHtd2de vs. FUHtd3de and FUHtdn4de vs. FUHtdn5de). More relevant documents are retrieved without decompounding, but initial precision is higher with decompounding.

The topic 'Deaths caused by avalanches occurring in Europe, but not in the Alps' (55-GC) contains a negation in the topic title and description. However, adding location names from the narrative part of the topic ('Scotland', 'Norway', 'Iceland') did not notably improve precision for this topic (MAP was 0.005 in FUHtd3de).

A small analysis of the results found by GIR-InSicht in comparison with the main GIR system revealed that GIR-InSicht retrieved documents for ten topics and returned relevant documents for seven topics.
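The maximum-score combination used for the hybrid run FUHtd6de can be sketched as follows. The function name `merge_max`, the document IDs, and the scores are hypothetical, and the sketch assumes that the scores of both systems are already on a comparable scale, which the description of the merge does not specify.

```python
def merge_max(run_a, run_b):
    """Merge two {doc_id: score} result sets, keeping the higher score per document."""
    merged = dict(run_a)
    for doc_id, score in run_b.items():
        merged[doc_id] = max(score, merged.get(doc_id, score))
    # Rank by descending score; ties are broken by document ID for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for one topic from the IR run and the SN-based run.
ir_run = {"doc1": 0.9, "doc2": 0.4}
sn_run = {"doc2": 0.7, "doc3": 0.2}

print(merge_max(ir_run, sn_run))
# [('doc1', 0.9), ('doc2', 0.7), ('doc3', 0.2)]
```

A document found by only one system keeps its score, so documents that only the semantic network approach retrieves can still enter the combined ranking.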
GIR-InSicht contributes three additional relevant documents to the combination (FUHtd6de).

For the topic 'Crime near St. Andrews' (52-GC), zero relevant documents were retrieved in all experiments. Several topics had a high negative difference to the median average precision, i.e. their performance was lower. These topics include "Schäden durch sauren Regen in Nordeuropa" ('Damage from acid rain in northern Europe', 54-GC), "Beratungen der Andengemeinschaft" ('Meetings of the Andean Community of Nations', 59-GC), and "Todesfälle im Himalaya" ('Death on the Himalaya', 69-GC). The following causes for the comparatively low performance were identified:

- The German decompounding was problematic with respect to location indicators, i.e. location indicator normalization was not applied to the constituents of German compounds (e.g. "Andengemeinschaft" is correctly split into "Anden"/'Andes' and "Gemeinschaft"/'community', but "Anden" is not identified as a location name for topic 59-GC).
- Several terms were incorrectly stemmed, although they were base forms or proper nouns (e.g. "Regen"/'rain' → "reg" and "Anden"/'Andes' → "and" for topics 54-GC and 59-GC, respectively).
- Decompounding led in some cases to terms with a very high frequency, causing a thematic shift in the retrieved documents (e.g. "Todesfälle"/'cases of death' was split into "Tod"/'death' and "Fall"/'case' for topics 55-GC and 69-GC).
- In several cases, a focused query expansion might have improved performance, e.g. 'Scandinavia' may have been a good term for query expansion in topic 54-GC, but GIRSA's main approach did not use query expansion for GeoCLEF 2007.

Results for the bilingual (English-German) experiments are generally lower. As for German, all other experiments outperform the baseline (0.114 MAP). The best performance is achieved by an experiment using topic title, description, and location names from the narrative (0.209 MAP). In comparison with results for the monolingual German experiments, the performance drop lies between 4.2% (first experiment) and 27.1% (fifth experiment).

The narrative part of a topic contains a detailed description of which documents are to be assessed as relevant (and which are not), including additional location names. Extracting location names from the narrative (instead of looking up additional location names in large gazetteers) and adding them to the query notably improves performance. This result is seemingly in contrast to some results from GeoCLEF 2006, where it was found that additional query terms (from gazetteers) degrade performance. A possible explanation is that in this experiment, only a few location names were added (3.16 location names on average for 15 of the 25 topics, with a maximum of 13 additional location names). When using a gazetteer, one has to decide which terms are the most useful ones in query expansion.
If this decision is based on the importance of a location, a semantic shift in the results may occur, which degrades performance. In contrast, selecting terms from the narrative part increases the chance of expanding a query with relevant terms only.

5 Conclusion and Outlook

In GIRSA, location indicators were introduced as text segments from which location names can be inferred. Results of the GIR experiments show that MAP is higher when using location indicators instead of geographic proper nouns to represent the geographic scope of a document. This broader approach to identifying the geographic scope of a document benefits system performance because proper nouns or location names alone do not determine the geographic scope of a document.

The hybrid approach for GIR proved successful, and even a few additional relevant documents were found in the combined run. As GIR-InSicht originates from a deep (read: semantic) QA approach, it returns documents with a high initial precision, which may prove useful in combination with a geographic blind feedback strategy. GIR-InSicht performs worse than the IR baseline because only 102 documents were retrieved for 10 of the 25 topics. However, more than half (56 documents) turned out to be relevant.

Several improvements are planned for GIRSA. These include using estimates for the importance (weight) of different location indicators, possibly depending on the context (e.g. 'Danish coast' indicates 'Denmark', but 'German shepherd' does not indicate 'Germany'), and augmenting the location name identification with a part-of-speech tagger and a named entity recognizer. Furthermore, the QA methods provide a useful mapping from natural language questions to gazetteer entry points. For example, the expression 'Scottish Islands' is typically not the name of a gazetteer entry, while the geographic subquery answers 'Iona' and 'Islay' typically are. In the future, a tighter coupling between the QA and IR components is planned, exploiting these subquery answers in the IR methods of GIRSA. (Note that this reverses the standard order of processing known from QA: in GIRSA, QA methods are employed to improve performance before subsequent IR phases.) Finally, we plan to investigate the combination of means to increase precision (e.g. recognizing metonymic location names) with means to increase recall (e.g. recognizing and normalizing location indicators).

References

1. Leveling, J., Hartrumpf, S., Veiel, D.: Using semantic networks for geographic information retrieval. In: Peters, C., Gey, F.C., Gonzalo, J., Müller, H., Jones, G.J.F., Kluck, M., Magnini, B., de Rijke, M., Giampiccolo, D. (eds.) CLEF 2005. LNCS, vol. 4022. Springer, Heidelberg (2006)
2. Hartrumpf, S., Leveling, J.: Interpretation and normalization of temporal expressions for question answering. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730. Springer, Heidelberg (2007)
3. Helbig, H.: Knowledge Representation and the Semantics of Natural Language. Springer, Berlin (2006)
4. Nagel, S.: An ontology of German place names. Corela: Cognition, Représentation, Langage. Le traitement lexicographique des noms propres (2005)
5. Buscaldi, D., Rosso, P., Garcia, P.P.: Inferring geographical ontologies from multiple resources for geographical information retrieval. In: Proceedings of GIR 2006, Seattle, USA (2006)
6. Li, Z., Wang, C., Xie, X., Wang, X., Ma, W.Y.: Indexing implicit locations for geographical information retrieval. In: Proceedings of GIR 2006, Seattle, USA (2006)
7. Leveling, J., Hartrumpf, S.: On metonymy recognition for GIR. In: Proceedings of GIR 2006, Seattle, USA (2006)
8. Chen, A.: Cross-language retrieval experiments at CLEF 2002. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2002. LNCS, vol. 2785. Springer, Heidelberg (2003)


More information

Collaborative NLP-aided ontology modelling

Collaborative NLP-aided ontology modelling Collaborative NLP-aided ontology modelling Chiara Ghidini ghidini@fbk.eu Marco Rospocher rospocher@fbk.eu International Winter School on Language and Data/Knowledge Technologies TrentoRISE Trento, 24 th

More information

Spatial Role Labeling CS365 Course Project

Spatial Role Labeling CS365 Course Project Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important

More information

Database integration ti

Database integration ti The Project EuroGeoNames Created by Dutch- and German-speaking Division (DGSD) of United Nations Group of Experts on Geographical Names (UNGEGN) Database integration ti Database source scale Database updating

More information

Maja Popović Humboldt University of Berlin Berlin, Germany 2 CHRF and WORDF scores

Maja Popović Humboldt University of Berlin Berlin, Germany 2 CHRF and WORDF scores CHRF deconstructed: β parameters and n-gram weights Maja Popović Humboldt University of Berlin Berlin, Germany maja.popovic@hu-berlin.de Abstract Character n-gram F-score (CHRF) is shown to correlate very

More information

MultifacetedToponymRecognitionforStreamingNews

MultifacetedToponymRecognitionforStreamingNews MultifacetedToponymRecognitionforStreamingNews Michael D. Lieberman Hanan Samet Center for Automation Research, Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Chapter 13 Text Classification and Naïve Bayes Dell Zhang Birkbeck, University of London Motivation Relevance Feedback revisited The user marks a number of documents

More information

Lecture 1b: Text, terms, and bags of words

Lecture 1b: Text, terms, and bags of words Lecture 1b: Text, terms, and bags of words Trevor Cohn (based on slides by William Webber) COMP90042, 2015, Semester 1 Corpus, document, term Body of text referred to as corpus Corpus regarded as a collection

More information

Geospatial Semantics. Yingjie Hu. Geospatial Semantics

Geospatial Semantics. Yingjie Hu. Geospatial Semantics Outline What is geospatial? Why do we need it? Existing researches. Conclusions. What is geospatial? Semantics The meaning of expressions Syntax How you express the meaning E.g. I love GIS What is geospatial?

More information

Topic Models and Applications to Short Documents

Topic Models and Applications to Short Documents Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text

More information

The Alignment of Formal, Structured and Unstructured Process Descriptions. Josep Carmona

The Alignment of Formal, Structured and Unstructured Process Descriptions. Josep Carmona The Alignment of Formal, Structured and Unstructured Process Descriptions Josep Carmona Thomas Chatain Luis delicado Farbod Taymouri Boudewijn van Dongen Han van der Aa Lluís Padró Josep Sànchez-Ferreres

More information

Cross-Lingual Language Modeling for Automatic Speech Recogntion

Cross-Lingual Language Modeling for Automatic Speech Recogntion GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The

More information

Estonian Place Names in the National Information System and the Place Names Register *

Estonian Place Names in the National Information System and the Place Names Register * UNITED NATIONS Working Paper GROUP OF EXPERTS ON No. 62 GEOGRAPHICAL NAMES Twenty-fifth session Nairobi, 5 12 May 2009 Item 10 of the provisional agenda Activities relating to the Working Group on Toponymic

More information

Learning Textual Entailment using SVMs and String Similarity Measures

Learning Textual Entailment using SVMs and String Similarity Measures Learning Textual Entailment using SVMs and String Similarity Measures Prodromos Malakasiotis and Ion Androutsopoulos Department of Informatics Athens University of Economics and Business Patision 76, GR-104

More information

Variation of geospatial thinking in answering geography questions based on topographic maps

Variation of geospatial thinking in answering geography questions based on topographic maps Variation of geospatial thinking in answering geography questions based on topographic maps Yoshiki Wakabayashi*, Yuri Matsui** * Tokyo Metropolitan University ** Itabashi-ku, Tokyo Abstract. This study

More information

So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks

So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks Andreas Spitz Johanna Geiß Michael Gertz Institute of Computer Science, Heidelberg University Im

More information

Machine Learning for Interpretation of Spatial Natural Language in terms of QSR

Machine Learning for Interpretation of Spatial Natural Language in terms of QSR Machine Learning for Interpretation of Spatial Natural Language in terms of QSR Parisa Kordjamshidi 1, Joana Hois 2, Martijn van Otterlo 1, and Marie-Francine Moens 1 1 Katholieke Universiteit Leuven,

More information

NLU: Semantic parsing

NLU: Semantic parsing NLU: Semantic parsing Adam Lopez slide credits: Chris Dyer, Nathan Schneider March 30, 2018 School of Informatics University of Edinburgh alopez@inf.ed.ac.uk Recall: meaning representations Sam likes Casey

More information

Latent Dirichlet Allocation Introduction/Overview

Latent Dirichlet Allocation Introduction/Overview Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models

More information

Principles of IR. Hacettepe University Department of Information Management DOK 324: Principles of IR

Principles of IR. Hacettepe University Department of Information Management DOK 324: Principles of IR Principles of IR Hacettepe University Department of Information Management DOK 324: Principles of IR Some Slides taken from: Ray Larson Geographic IR Overview What is Geographic Information Retrieval?

More information

United Nations, UNGEGN, and support for national geographical names standardization programmes

United Nations, UNGEGN, and support for national geographical names standardization programmes Philippines, 2018 United Nations, UNGEGN, and support for national geographical names standardization programmes Helen Kerfoot, UNGEGN Cecille Blake, UNGEGN Secretariat What is important to know? Background

More information

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018 Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence

More information

FROM QUERIES TO TOP-K RESULTS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

FROM QUERIES TO TOP-K RESULTS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS FROM QUERIES TO TOP-K RESULTS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link

More information

Latent Variable Models in NLP

Latent Variable Models in NLP Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable

More information

Concept Based Representations for Ranking in Geographic Information Retrieval

Concept Based Representations for Ranking in Geographic Information Retrieval Concept Based Representations for Ranking in Geographic Information Retrieval Maya Carrillo 1,2,Esaú Villatoro-Tello 1, Aurelio López-López 1, Chris Eliasmith 3, Luis Villaseñor-Pineda 1, and Manuel Montes-y-Gómez

More information

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

Natural Language Processing. Topics in Information Retrieval. Updated 5/10 Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background

More information

CIDOC-CRM Method: A Standardisation View. Haridimos Kondylakis, Martin Doerr, Dimitris Plexousakis,

CIDOC-CRM Method: A Standardisation View. Haridimos Kondylakis, Martin Doerr, Dimitris Plexousakis, The CIDOC CRM CIDOC-CRM Method: A Standardisation View Haridimos Kondylakis, Martin Doerr, Dimitris Plexousakis, Center for Cultural Informatics, Institute of Computer Science Foundation for Research and

More information

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1(B), Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1, and Kirk Roberts 4 1 Tulane University,

More information

Ontology-Based News Recommendation

Ontology-Based News Recommendation Ontology-Based News Recommendation Wouter IJntema Frank Goossen Flavius Frasincar Frederik Hogenboom Erasmus University Rotterdam, the Netherlands frasincar@ese.eur.nl Outline Introduction Hermes: News

More information

ISO Plant Hardiness Zones Data Product Specification

ISO Plant Hardiness Zones Data Product Specification ISO 19131 Plant Hardiness Zones Data Product Specification Revision: A Page 1 of 12 Data specification: Plant Hardiness Zones - Table of Contents - 1. OVERVIEW...3 1.1. Informal description...3 1.2. Data

More information

TnT Part of Speech Tagger

TnT Part of Speech Tagger TnT Part of Speech Tagger By Thorsten Brants Presented By Arghya Roy Chaudhuri Kevin Patel Satyam July 29, 2014 1 / 31 Outline 1 Why Then? Why Now? 2 Underlying Model Other technicalities 3 Evaluation

More information

Towards Collaborative Information Retrieval

Towards Collaborative Information Retrieval Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,

More information

Can Vector Space Bases Model Context?

Can Vector Space Bases Model Context? Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval

More information

Economic and Social Council

Economic and Social Council United Nations Economic and Social Council Distr.: General 23 May 2012 Original: English E/CONF.101/100 Tenth United Nations Conference on the Standardization of Geographical Names New York, 31 July 9

More information

United Nations, UNGEGN, and support for national geographical names standardization programmes

United Nations, UNGEGN, and support for national geographical names standardization programmes Brazil, 2017 United Nations, UNGEGN, and support for national geographical names standardization programmes Helen Kerfoot, Cecille Blake UNGEGN What is important to know? Background on UNGEGN Aims of the

More information

10/17/04. Today s Main Points

10/17/04. Today s Main Points Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

DISTRIBUTIONAL SEMANTICS

DISTRIBUTIONAL SEMANTICS COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.

More information

A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series

A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series Teemu Leskinen National Land Survey of Finland Abstract. The Geographic Names Register

More information

Helen Kerfoot. Former Chair, UNGEGN / Emeritus Scientist, Natural Resources Canada

Helen Kerfoot. Former Chair, UNGEGN / Emeritus Scientist, Natural Resources Canada Geographic names authorities, standardization and international cooperation Helen Kerfoot Former Chair, UNGEGN / Emeritus Scientist, Natural Resources Canada BGN at 100 years In recognition of international

More information

Universities of Leeds, Sheffield and York

Universities of Leeds, Sheffield and York promoting access to White Rose research papers Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/4559/

More information

19.2 Geographic Names Register General The Geographic Names Register of the National Land Survey is the authoritative geographic names data

19.2 Geographic Names Register General The Geographic Names Register of the National Land Survey is the authoritative geographic names data Section 7 Technical issues web services Chapter 19 A Data Repository for Named Places and their Standardised Names Integrated with the Production of National Map Series Teemu Leskinen (National Land Survey

More information

From Research Objects to Research Networks: Combining Spatial and Semantic Search

From Research Objects to Research Networks: Combining Spatial and Semantic Search From Research Objects to Research Networks: Combining Spatial and Semantic Search Sara Lafia 1 and Lisa Staehli 2 1 Department of Geography, UCSB, Santa Barbara, CA, USA 2 Institute of Cartography and

More information

Effectiveness of complex index terms in information retrieval

Effectiveness of complex index terms in information retrieval Effectiveness of complex index terms in information retrieval Tokunaga Takenobu, Ogibayasi Hironori and Tanaka Hozumi Department of Computer Science Tokyo Institute of Technology Abstract This paper explores

More information

PIRLS 2016 INTERNATIONAL RESULTS IN READING

PIRLS 2016 INTERNATIONAL RESULTS IN READING Exhibit 2.3: Low International Benchmark (400) Exhibit 2.3 presents the description of fourth grade students achievement at the Low International Benchmark primarily based on results from the PIRLS Literacy

More information

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context.

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context. Marinos Kavouras & Margarita Kokla Department of Rural and Surveying Engineering National Technical University of Athens 9, H. Polytechniou Str., 157 80 Zografos Campus, Athens - Greece Tel: 30+1+772-2731/2637,

More information

Creating a Definitive Place Name Gazetteer for Scotland. Bruce M. Gittings

Creating a Definitive Place Name Gazetteer for Scotland. Bruce M. Gittings Creating a Definitive Place Name Gazetteer for Scotland Bruce M. Gittings School of GeoSciences, University of Edinburgh, Edinburgh, UK bruce@ed.ac.uk Place-names represent a fundamental geographical identifier,

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

GEOGRAPHICAL NAMES AS PART OF THE GLOBAL, REGIONAL AND NATIONAL SPATIAL DATA INFRASTRUCTURES

GEOGRAPHICAL NAMES AS PART OF THE GLOBAL, REGIONAL AND NATIONAL SPATIAL DATA INFRASTRUCTURES GEOGRAPHICAL NAMES AS PART OF THE GLOBAL, REGIONAL AND NATIONAL SPATIAL DATA INFRASTRUCTURES Željko HEĆIMOVIĆ, Željka JAKIR, Zvonko ŠTEFAN, Danijela KUKIĆ zeljko.hecimovic@cgi.hr, zeljka.jakir@cgi.hr,

More information

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 ) Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds

More information

CUNI at the CLEF ehealth 2015 Task 2

CUNI at the CLEF ehealth 2015 Task 2 CUNI at the CLEF ehealth 2015 Task 2 Shadi Saleh, Feraena Bibyna, and Pavel Pecina Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics, Czech Republic

More information

A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection

A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection Introuction Plagiarism: Unauthorize use of Text, coe, iea, Plagiarism etection research area has receive increasing attention

More information

The list of Polish geographical names of the world. Maciej Zych

The list of Polish geographical names of the world. Maciej Zych The list of Polish geographical names of the world Maciej Zych 21st Session of the East Central and South-East Europe Division of the United Nations Group of Experts on Geographical Names Ljubljana, 26th

More information

Spatializing time in a history text corpus

Spatializing time in a history text corpus Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2014 Spatializing time in a history text corpus Bruggmann, André; Fabrikant,

More information

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models. , I. Toy Markov, I. February 17, 2017 1 / 39 Outline, I. Toy Markov 1 Toy 2 3 Markov 2 / 39 , I. Toy Markov A good stack of examples, as large as possible, is indispensable for a thorough understanding

More information

Automated Geoparsing of Paris Street Names in 19th Century Novels

Automated Geoparsing of Paris Street Names in 19th Century Novels Automated Geoparsing of Paris Street Names in 19th Century Novels L. Moncla, M. Gaio, T. Joliveau, and Y-F. Le Lay L. Moncla ludovic.moncla@ecole-navale.fr GeoHumanities 17 L. Moncla GeoHumanities 17 2/22

More information

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR

More information

Exploiting WordNet as Background Knowledge

Exploiting WordNet as Background Knowledge Exploiting WordNet as Background Knowledge Chantal Reynaud, Brigitte Safar LRI, Université Paris-Sud, Bât. G, INRIA Futurs Parc Club Orsay-Université - 2-4 rue Jacques Monod, F-91893 Orsay, France {chantal.reynaud,

More information

Conceptual Modeling in the Environmental Domain

Conceptual Modeling in the Environmental Domain Appeared in: 15 th IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, Berlin, 1997 Conceptual Modeling in the Environmental Domain Ulrich Heller, Peter Struss Dept. of Computer

More information

LOUISIANA STUDENT STANDARDS FOR SOCIAL STUDIES THAT CORRELATE WITH A FIELD TRIP TO DESTREHAN PLANTATION KINDERGARTEN

LOUISIANA STUDENT STANDARDS FOR SOCIAL STUDIES THAT CORRELATE WITH A FIELD TRIP TO DESTREHAN PLANTATION KINDERGARTEN LOUISIANA STUDENT STANDARDS FOR SOCIAL STUDIES THAT CORRELATE WITH A FIELD TRIP TO DESTREHAN PLANTATION KINDERGARTEN Standard 2 Historical Thinking Skills Students distinguish between events, people, and

More information

Ranked Retrieval (2)

Ranked Retrieval (2) Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF

More information

27. THESE SENTENCES CERTAINLY LOOK DIFFERENT

27. THESE SENTENCES CERTAINLY LOOK DIFFERENT 27 HESE SENENCES CERAINLY LOOK DIEREN comparing expressions versus comparing sentences a motivating example: sentences that LOOK different; but, in a very important way, are the same Whereas the = sign

More information

Improved Decipherment of Homophonic Ciphers

Improved Decipherment of Homophonic Ciphers Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,

More information

INTRODUCTION TO LOGIC. Propositional Logic. Examples of syntactic claims

INTRODUCTION TO LOGIC. Propositional Logic. Examples of syntactic claims Introduction INTRODUCTION TO LOGIC 2 Syntax and Semantics of Propositional Logic Volker Halbach In what follows I look at some formal languages that are much simpler than English and define validity of

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 4: Probabilistic Retrieval Models April 29, 2010 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig

More information

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1, Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1 and Kirk Roberts 4 1 Tulane University

More information

Year 34 B2 Geography - Continents and Oceans 2018 Key Skills to be covered: Taken from Level 3 Taken from Level 4

Year 34 B2 Geography - Continents and Oceans 2018 Key Skills to be covered: Taken from Level 3 Taken from Level 4 Key Skills to be covered: Taken from Level 3 Taken from Level 4 Geographical Enquiry: I ask, Which PHYSICAL features does this place have? I ask, Which HUMAN features does this place have? I give reasons

More information

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim

More information

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling Department of Computer Science and Engineering Indian Institute of Technology, Kanpur CS 365 Artificial Intelligence Project Report Spatial Role Labeling Submitted by Satvik Gupta (12633) and Garvit Pahal

More information