A Comparison of Approaches for Geospatial Entity Extraction from Wikipedia
Daryl Woodward, Jeremy Witmer, and Jugal Kalita
University of Colorado, Colorado Springs
Computer Science Department
1420 Austin Bluffs Pkwy
Colorado Springs, CO

Abstract

We target in this paper the challenge of extracting geospatial data from the article text of the English Wikipedia. We present the results of a Hidden Markov Model (HMM) based approach to identifying location-related named entities in our corpus of Wikipedia articles, which are primarily about battles and wars because of their high geospatial content. The HMM-based NER process drives a geocoding and resolution process whose goal is to determine the correct coordinates for each place name (often referred to as grounding). We compare our results to those of a previously developed data structure and algorithm for disambiguating place names that can have multiple coordinates. We demonstrate an overall f-measure of 79.63% in identifying and geocoding place names. Finally, we compare the results of the HMM-driven process to earlier work using a Support Vector Machine.

I. Introduction

The amount of user-generated, unstructured content on the Internet increases significantly every day. Consequently, techniques that automatically extract information from unstructured text are becoming increasingly important. A significant number of Internet queries target geospatial data, making this a good area of study; this paper therefore focuses on approaches to extracting geospatial information from unstructured text. While Named Entity Recognition (NER) has seen much progress in recent years, NER for location names is only the first step in extracting geospatial information. The extracted place names must then be geocoded to at least a latitude and longitude coordinate pair to allow visualization, geospatial search, and location-based information retrieval of text. We refer to this process as geocoding and disambiguation.
It is also often referred to as grounding.

Our work in this paper has a very specific focus: to maximize the efficiency of the initial geospatial NER that feeds our geocoding and disambiguation process. Increasing the accuracy of the raw NE data will improve our final results. Specifically, a more refined list of geospatial named entities (place names) extracted from unstructured text documents such as Wikipedia articles will aid in geocoding ambiguous names to the correct geospatial coordinates.

Our research in this area is motivated by a number of factors. The ongoing development of the Geografikos package aims at a software system that creates a database of Wikipedia articles in which each article has an associated structured set of geospatial entities extracted from the article. This database will allow geospatial querying, information retrieval, and geovisualization to be applied to the Wikipedia articles. Further, we intend to open-source the Geografikos software on completion.

Section II of the paper discusses relevant background research and introduces the LingPipe library. Section III describes our process for choosing training and test data. Section IV summarizes our method for generating the most accurate results. Section V briefly recaps the geocoding and disambiguation process. Section VI discusses our results and compares them to those of the SVM-based process used by Witmer and Kalita [1]. Finally, we conclude with a discussion of possibilities for future research.

II. Background Research

Named entity recognition refers to the extraction of words and strings of text within documents that represent discrete concepts, such as names and locations. The term Named Entity Recognition describes the operations in
natural language processing that seek to extract the names of persons, organizations, locations, other proper nouns, and numeric terms such as percentages and dollar amounts. The term Named Entity was defined at MUC-6, the Sixth Message Understanding Conference, sponsored by DARPA in 1996 [2]. The NE recognition task was further defined, and extended to language independence, by the shared tasks of the 2002 and 2003 Conference on Natural Language Learning (CoNLL).

Numerous approaches have been tried since MUC-6 to increase performance in NER, including Hidden Markov Models, Conditional Random Fields, Maximum Entropy models, Neural Networks, and Support Vector Machines (SVMs). Dakka and Cucerzan demonstrated an SVM that achieves an f-measure of … for LOC entities in Wikipedia articles, and an f-measure of … across all NE classes [3]. Although research into text classification and NER has found that SVMs perform well on NER tasks, HMMs can produce similar results with minimal training.

HMMs have also shown excellent results on their own. Klein et al. demonstrated that a character-level HMM can identify English and German named entities with f-measures of … and …, respectively, for LOC entities in testing data [4]. In [5], Zhou and Su evaluated an HMM and an HMM-based chunk tagger on the MUC-6 and MUC-7 English NE tasks, achieving f-measures of … and 0.941, respectively.

To compare against the SVM-based approach used by Witmer in [1], we chose the HMM implemented by the LingPipe library for NER, which, had it participated in the CoNLL 2002 NER task, would have tied for fourth place with an f-measure of …. LingPipe was also a simple choice: it is a fully developed Java package that integrated easily into our existing code. LingPipe describes itself as a suite of Java libraries for the linguistic analysis of human language, providing tools for information extraction and data mining.

III. Training and Test Corpus Generation

A. Training

We downloaded a number of previously tagged data sets to provide the training material for the HMM, along with other resources that required hand-tagging. We narrowed our training data to:
- the CoNLL 2003 shared task dataset on multi-language NE tagging, containing tagged named entities for PER, LOC, and ORG in English;
- the CoNLL 2004 shared task dataset on Semantic Role Labeling, tagged for English LOC NEs; and
- hand-tagged articles from the English Wikipedia, downloaded June 18, ….

The CoNLL datasets were chosen for their high quality and because all of their NEs had already been tagged. We combined the CoNLL data with articles from the English Wikipedia in which all LOC NEs were hand-tagged; these articles focused on battles and wars, which have high frequencies of geospatial entities. We then trained an HMM on various combinations of the listed corpora and ran short test runs with each resulting model. Including the hand-tagged data had the greatest effect on the results: on average, training corpora that included it scored about 1.7% higher in f-measure than those that did not. Our final corpus consisted of the CoNLL 2003 and 2004 datasets and the hand-tagged Wikipedia data. All named entity tags were preserved in the CoNLL datasets, but only locations were tagged in the Wikipedia articles.

To put into perspective the degree of accuracy required in this NER task, we gathered simple statistics from the nine hand-tagged articles used for training. Only 2,191 of 61,708 total words (3.55%) were part of a location string. Because geospatial entities occur so sparsely, adding these location-weighted, subject-similar articles significantly improved results. In the CoNLL 2003 dataset, 7,893 of the 32,588 NEs (in 169,032 total words across 947 articles) were locations.
The CoNLL 2004 data did not contain article separators, but 3,347 of its 16,308 NEs (in 176,920 total words) were locations. Only about half of the locations in the hand-tagged articles were absent from the CoNLL datasets, so the nine hand-tagged articles added only 1,168 genuinely new locations.

B. Testing

For primary testing, 21 Wikipedia articles (171,232 total words) were selected from the 90 articles processed by the SVM in [1]. These specific articles were chosen because they served as primary examples in Witmer's previous work. After preprocessing, they were found to vary in length and location frequency while remaining suitable for statistical analysis; in particular, none contained so few locations that including it in the statistics without some form of normalization would have an imbalanced impact on the results. The test articles shared the topic of historical battles and wars with the training articles. The articles used
for training made up only about 15% of the final training set. Currently, this set of Wikipedia articles is the only corpus chosen for testing. In the future, we may expand our corpus to include news articles or other informational online resources that are also expected to contain geospatial content.

IV. Method

LingPipe offers various result formats along with different named entity recognizers, which vary in accuracy and efficiency. We chose the CharLMRescoringChunker because it is described as the most accurate, but also the slowest, chunker. This trade-off suits the Geografikos package, since the geospatial information associated with each article is processed only once. The NER process is also significantly faster than the geocoding and disambiguation process, so NER speed was not an issue.

Table I is an example of LingPipe's Confidence Named Entity Chunking, which returns a list of the most confident results, including the string, where it can be found, what type of chunk it may be, and the probability that the string is correctly typed. The four chunk types it can be trained to identify are PER, ORG, LOC, and MISC; text not identified as one of these is labeled O. Based on a manual review of sentences such as this one, we predicted that a confidence threshold of 1.1 would provide the best balance between false positives and correct identifications. This threshold was set as a parameter within the Geografikos package, which applies it as it processes the results returned by the HMM. Tests were conducted with thresholds from 1.0 to 1.5 in increments of 0.1. Precision in the final results increased with the threshold value, while recall varied inversely with precision. The highest f-measure was achieved with a threshold of 1.1, which coincided with our initial prediction.

V. Geospatial Named Entity Resolution

The HMM extracted a set of candidate geospatial NEs from the article text.
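How an HMM arrives at candidate strings like those in Table I can be illustrated with a toy tagger. The sketch below is an assumption-laden stand-in, not LingPipe's method: it is a token-level, two-state (O/LOC) HMM trained by counting over a hypothetical three-sentence corpus with add-one smoothing and decoded with the Viterbi algorithm, after which maximal runs of LOC tokens are joined into candidate place names. LingPipe's CharLMRescoringChunker works at the character level and is not reproduced here; `train_hmm`, `viterbi`, `candidate_strings`, and the training sentences are all illustrative.

```python
import math
from collections import Counter, defaultdict

# Toy token-level HMM with two states. Illustrative only; this is NOT the
# character-level model used by LingPipe's CharLMRescoringChunker.
TAGS = ("O", "LOC")


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate add-alpha smoothed log-probabilities from sentences given as
    lists of (token, tag) pairs. Returns (log_start, log_trans, log_emit)."""
    start, trans, emit, vocab = Counter(), defaultdict(Counter), defaultdict(Counter), set()
    for sentence in tagged_sentences:
        prev = None
        for token, tag in sentence:
            word = token.lower()
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    n_words = len(vocab) + 1  # one extra outcome reserves mass for unseen words

    def smoothed(counter, key, n_outcomes):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    log_start = {t: smoothed(start, t, len(TAGS)) for t in TAGS}
    log_trans = {p: {t: smoothed(trans[p], t, len(TAGS)) for t in TAGS} for p in TAGS}

    def log_emit(tag, token):
        return smoothed(emit[tag], token.lower(), n_words)

    return log_start, log_trans, log_emit


def viterbi(tokens, log_start, log_trans, log_emit):
    """Return the most probable tag sequence for tokens."""
    if not tokens:
        return []
    best = [{t: log_start[t] + log_emit(t, tokens[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + log_trans[p][t])
            best[i][t] = best[i - 1][prev] + log_trans[prev][t] + log_emit(t, tokens[i])
            back[i][t] = prev
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[i][path[-1]])
    return path[::-1]


def candidate_strings(tokens, tags):
    """Join maximal runs of LOC-tagged tokens into candidate place names."""
    candidates, run = [], []
    for token, tag in zip(tokens, tags):
        if tag == "LOC":
            run.append(token)
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


# Hypothetical hand-tagged training sentences.
TRAIN = [
    [("Lee", "O"), ("retreated", "O"), ("to", "O"), ("Fredericksburg", "LOC")],
    [("The", "O"), ("army", "O"), ("crossed", "O"), ("the", "O"),
     ("North", "LOC"), ("Anna", "LOC"), ("River", "LOC")],
    [("Grant", "O"), ("marched", "O"), ("to", "O"), ("Richmond", "LOC")],
]
model = train_hmm(TRAIN)
tokens = ["Burnside", "marched", "to", "Fredericksburg"]
tags = viterbi(tokens, *model)
print(tags, candidate_strings(tokens, tags))  # ['O', 'O', 'O', 'LOC'] ['Fredericksburg']
```

Here Fredericksburg is tagged LOC largely because that word was tagged LOC in training; the point is the shape of the computation (counting, smoothing, decoding, grouping), not its accuracy.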
For each candidate string, the second objective was to decide whether it was a geospatial NE and to determine the correct (latitude, longitude), or (φ, λ), coordinate pair for the place name in the context of the article. To resolve a candidate NE, we looked it up using Google Geocoder. If the entity reference resolved to a single geospatial location, no further action was required. Otherwise, a data structure and a rule-driven algorithm, together with the context of the place name in the article, were used to decide the correct spatial location for the place name.

Our disambiguation task is close to word sense disambiguation as defined by Cucerzan [6], except that we consider the geospatial context and domain instead of the lexical context and domain. We refer to this as geospatial entity resolution; it has also been referred to as grounding a place name [7]. Sehgal et al. demonstrate good results for geospatial entity resolution using both spatial (coordinate) and nonspatial (lexical) features of the geospatial entities [8]. Zong et al. demonstrated a rule-based method for place name assignment, achieving a precision of 88.6% in disambiguating place names in the United States from the Digital Library for Earth System Education (DLESE) metadata [9]. Our process, which uses an HMM for NER and then resolves geospatial coordinates through a geocoder as a second step, is related to the work of Martins et al. [10]; for disambiguation, however, we drew on the work of Witmer in [1], using a novel location tree data structure and algorithm to disambiguate and geocode the place names.

A. Google Geocoder

We used Google Geocoder as the gazetteer and geocoder for simplicity, as much research has already been done in this area; [11] and [12] provide excellent overviews of existing technology. Google Geocoder provides a simple REST-based interface that can be queried over HTTP, returning data in a variety of formats.
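The resolution step described at the start of this section (zero, one, or several placemarks per query) can be sketched as follows. This is a minimal sketch under stated assumptions: the gazetteer is a hypothetical in-memory dictionary with approximate coordinates standing in for Google Geocoder, no real geocoding API is called, and the several-placemark case uses a simple majority vote over regions already grounded in the article rather than the location tree algorithm of [1]. `Placemark`, `CachedLookup`, and `resolve` are illustrative names; the lookup is memoized because the same place name often recurs within an article.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Placemark:
    name: str
    region: str
    lat: float
    lon: float


class CachedLookup:
    """Memoizes a gazetteer lookup so each distinct place name reaches the
    backend only once."""

    def __init__(self, backend: Callable[[str], List[Placemark]]):
        self._backend = backend
        self._cache: Dict[str, List[Placemark]] = {}
        self.backend_calls = 0

    def __call__(self, name: str) -> List[Placemark]:
        if name not in self._cache:
            self.backend_calls += 1
            self._cache[name] = self._backend(name)
        return self._cache[name]


def resolve(name: str, context_regions: Sequence[str],
            lookup: Callable[[str], List[Placemark]]) -> Optional[Placemark]:
    """Ground one candidate NE.

    0 placemarks  -> not treated as a geospatial NE (returns None)
    1 placemark   -> accepted with no further work
    2+ placemarks -> the placemark whose region occurs most often among places
                     already grounded in the article (a simple stand-in, NOT
                     the location tree algorithm of [1])
    """
    placemarks = lookup(name)
    if not placemarks:
        return None
    if len(placemarks) == 1:
        return placemarks[0]
    votes = Counter(context_regions)
    # max() returns the first maximal element, so ties keep the backend's order.
    return max(placemarks, key=lambda p: votes[p.region])


# Hypothetical gazetteer standing in for the geocoder; coordinates approximate.
GAZETTEER = {
    "Fredericksburg": [Placemark("Fredericksburg", "Texas", 30.27, -98.87),
                       Placemark("Fredericksburg", "Virginia", 38.30, -77.46)],
    "Richmond": [Placemark("Richmond", "Virginia", 37.54, -77.44)],
}
lookup = CachedLookup(lambda name: GAZETTEER.get(name, []))
already_grounded = ["Virginia", "Virginia", "Maryland"]

for name in ["Richmond", "Fredericksburg", "Lee", "Fredericksburg"]:
    place = resolve(name, already_grounded, lookup)
    print(name, "->", place.region if place else None)
print("backend calls:", lookup.backend_calls)
```

Because the cache sits in front of the backend, the repeated query for Fredericksburg is served locally, so the four lookups cost three backend calls; the query for Lee returns no placemarks and is rejected as a place name.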
Google Geocoder's REST architecture allows developers to manage their user-client interfaces however they wish, as long as their client-server interaction stays consistent. It also makes it possible to add layers to a process, allowing many lower-level layers to share caches and reduce server interaction. For each geospatial NE string submitted as a query, Google Geocoder returns zero or more placemarks.

VI. Results

In this section, we compare the overall results of the HMM-driven NER and disambiguation with those of the SVM-driven NER and disambiguation presented in [13] and [1]. Table II compares the final results of the HMM with those of the SVM; these results are based on the processing of 21 Wikipedia articles. The Resolved Results in Table II show the performance of our package in correctly identifying location strings and geocoding the locations. The NER Results show the accuracy of
the two NERs before any further processing. A string identified by the NER process counts as correct only if it exactly matches a hand-tagged ground-truth named entity. Geocoding succeeds when the string is correctly resolved to a single location in the context of the document.

Table I. LingPipe output for the sentence: "Lee at first anticipated that he would fight Burnside northwest of Fredericksburg and that it might be necessary to drop back behind the North Anna River."

Rank  Conf  Span        Type  Phrase
…     …     (67, 81)    LOC   Fredericksburg
…     …     (137, 154)  LOC   North Anna River
…     …     (143, 154)  LOC   Anna River
…     …     (137, 142)  LOC   North
…     …     (148, 154)  LOC   River
…     …     (137, 147)  ORG   North Anna
…     …     (137, 147)  LOC   North Anna
…     …     (0, 3)      PER   Lee

Table II. Resolved Geospatial NE Results

                      Precision  Recall  F-Measure
HMM NER Results       …          …       …
SVM NER Results       …          …       …
HMM Resolved Results  …          …       …
SVM Resolved Results  …          …       …

Although the NER results look significantly lower when using the HMM, it should be noted that the median for the collected data was …. A handful of articles with foreign names, such as the article on the Korean War, brought down the average, with f-measures of only around 10%. This is most likely because our training data contained few foreign names, so the HMM had trouble recognizing these strings as LOC named entities.

Figure 1 shows a more detailed breakdown of the precision, recall, and f-measure for a subset of the articles processed; it shows the same trend as Table III. The Geografikos package generally extracted locations with higher performance from articles that contain a higher percentage of unambiguous locations. The lowest-scoring articles, shown as the leftmost three articles in Figure 1, all reference engagements in the American Civil War. Table III, from [14], shows that North and Central America have a much larger percentage of ambiguous place names than other parts of the world.
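The exact-match scoring used for the NER Results, together with the confidence-threshold sweep described in Section IV, can be sketched as follows. The spans are taken from Table I, and the gold set assumes that Fredericksburg and North Anna River are the hand-tagged locations in that sentence; the confidence scores and per-article f-measures are invented for illustration, so the threshold this toy picks says nothing about the real data. Scores are assumed to be oriented so that a higher threshold keeps fewer, more confident candidates. `prf`, `exact_match_prf`, `Candidate`, and `sweep` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, median


def prf(n_correct, n_predicted, n_gold):
    """Precision, recall, and f-measure from raw counts."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if n_correct else 0.0
    return precision, recall, f


def exact_match_prf(predicted, gold):
    """A predicted (start, end, type) span counts as correct only if it exactly
    matches a hand-tagged ground-truth span."""
    predicted, gold = set(predicted), set(gold)
    return prf(len(predicted & gold), len(predicted), len(gold))


@dataclass(frozen=True)
class Candidate:
    span: tuple   # (start, end, type), as in Table I
    score: float  # hypothetical confidence; higher means more confident


def sweep(candidates, gold, thresholds):
    """Score the candidates kept at each threshold and return the threshold
    with the highest f-measure plus the full (precision, recall, f) table."""
    table = {
        t: exact_match_prf([c.span for c in candidates if c.score >= t], gold)
        for t in thresholds
    }
    return max(table, key=lambda t: table[t][2]), table


# Spans from Table I; gold labels and scores are assumptions for illustration.
GOLD = {(67, 81, "LOC"), (137, 154, "LOC")}
CANDIDATES = [
    Candidate((67, 81, "LOC"), 1.45),    # Fredericksburg
    Candidate((137, 154, "LOC"), 1.25),  # North Anna River
    Candidate((143, 154, "LOC"), 1.15),  # Anna River
    Candidate((137, 142, "LOC"), 1.05),  # North
]
THRESHOLDS = [round(1.0 + 0.1 * i, 1) for i in range(6)]  # 1.0 .. 1.5

best, table = sweep(CANDIDATES, GOLD, THRESHOLDS)
for t in THRESHOLDS:
    p, r, f = table[t]
    print(f"threshold {t:.1f}: P={p:.2f} R={r:.2f} F={f:.2f}")
print("best threshold:", best)

# Invented per-article f-measures: two outliers drag the mean below the median.
PER_ARTICLE_F = [0.10, 0.12, 0.70, 0.75, 0.80]
print(f"mean={mean(PER_ARTICLE_F):.3f} median={median(PER_ARTICLE_F):.3f}")
```

The last two lines mirror the observation above: a couple of very low-scoring articles pull the mean well below the median, which is why both statistics are worth reporting.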
The highest-scoring articles (the rightmost three in Figure 1) focus on engagements on other continents.

Table III. Places With Multiple Names and Names Applied to More Than One Place in the Getty Thesaurus of Geographic Names

Continent                 % places with multiple names  % names with multiple places
North & Central America   …                             …
Oceania                   …                             …
South America             …                             …
Asia                      …                             …
Africa                    …                             …
Europe                    …                             …

A. Analysis

The SVM used in [1] performed independently of article length and the frequency of place names. This independence did not carry over to the HMM. Although the two methods share the same disambiguation process, the initial phase of named entity extraction is the main influence on the final results and is the focus of this paper. The SVM was tuned to produce a very high recall by extracting a large number of potential NEs. All of these names were then fed into Google Geocoder, which identified actual places. The geocoding process counteracted the large amount of over-extraction in the initial phase and protected precision by accepting only potential NEs that geocoded successfully. However, this generated significantly more traffic to the geocoder than the HMM-based NER process does. The HMM approach aimed to reduce the number of geocoder queries while maintaining overall performance. Table IV shows the decrease in the number of candidate location NEs extracted by the HMM relative to the SVM for some of the articles in the test corpus, along with the number of these NEs that were successfully geocoded and disambiguated. The HMM identified significantly fewer potential NEs in the initial phase, resulting in generally lower recall but
higher precision in the final geocoding and disambiguation process. The results for a selection of articles are shown in Table IV. Although the HMM often extracted fewer than half as many potential NEs as the SVM, the final results were similar. The HMM performs better than the SVM on longer articles and worse on shorter articles. The f-measures for some of these articles appear in Figure 1, which shows the HMM's three lowest- and three highest-scoring articles side by side. For the HMM, the lowest-scoring articles were those with the fewest potential NEs in the text, and the highest-scoring had the most.

Figure 1. Lowest 3 and Highest 3 Scoring Articles of HMM

Overall, the HMM-driven process showed an f-measure 2.4% lower than the SVM-driven process on the same testing corpus of Wikipedia articles. However, the HMM-driven process offers several improvements that balance out this difference. First, generating the HMM model took under four minutes, whereas training the SVM took about 36 hours on a similar system. This vast decrease in training time allows much greater flexibility in changing and expanding the training corpus to tune the model for better performance. Second, the HMM-driven process reduced the number of candidate NEs by over 50% in most cases, reducing the time spent in the geocoding and disambiguation phase. For both the SVM- and HMM-driven approaches, most processing time is spent in the geocoding and disambiguation phase, so streamlining the NER phase yields a correspondingly large reduction in the time spent processing each article.

VII. Conclusions and Future Work

By continuing to enhance the efficiency of the Geografikos package, we both increase the value of its output and make it more suitable for heavy public use over the Internet. This also supports our ultimate goal of making the code open source.
We envision a number of uses for this package in the search and visualization of Wikipedia articles. With geospatial information attached, searches for Wikipedia articles could be filtered by geographic area in two steps: first, a standard free-text search on Wikipedia for articles about the topic; second, a filter restricting that list to articles with appropriate locations. Reversing this paradigm, the location data provided by the Geografikos package could also enable location-centric search: a user wanting information on a particular region of the world could select that location on a map interface and be shown the articles that reference it, with excerpts of the text concerning the region.

Furthermore, this database of locations could enable the visualization of Wikipedia articles through a geospatial interface. For instance, consider an interface that would allow a user to select a Wikipedia article and then present the user with a map of all the locations from the article. Each location on the map would be clickable and would provide the sentences or paragraph around that NE in the text. Imagine entering World War II into this interface and being presented with a map of Europe, Africa, and the Pacific theater, with all the locations from the article marked and clickable. This kind of visualization would
be an excellent teaching tool, and could reveal implicit information and relationships that are not apparent from the text of the article. Applied to other corpora, this kind of extraction could also be very useful in finding geospatial trends and relationships. For instance, consider a database of textual disease outbreak reports or world news articles. The Geografikos package could extract all the locations for graphical presentation on a map, making trends much easier to find. With additional work, the geospatial data extracted by the Geografikos package could be combined with temporal information to allow geographic and temporal refinement. While many such visualization tools already exist, they are driven from structured databases rather than free-text document sets.

Our contribution in this paper is to demonstrate improvements to the process originally laid out in [1]. This process extracts location names from a given text and grounds them to a single, disambiguated geospatial entity. Applying an HMM increases the flexibility and speed of the NER phase of the overall process. With the data structure and algorithm for resolving ambiguous geospatial NEs based on article context, associating a structured list of geospatial entities with a free-text corpus opens up possibilities for increased capability in geospatial information retrieval.

Table IV. Hand Tagged Articles - Potential Location NEs

Article            HMM Extracted     SVM Extracted     HMM        SVM        HMM     SVM
                   Potential NEs /   Potential NEs /   Precision  Precision  Recall  Recall
                   Grounded NEs      Grounded NEs
Chancellorsville   119/47            566/…             …          …          …       …
Gettysburg         327/…             …/…               …          …          …       …
Korean War         625/…             …/…               …          …          …       …
War of …           …/…               …/…               …          …          …       …
World War 2        668/…             …/…               …          …          …       …

Credits

The work reported in this paper is partially supported by the NSF Research Experience for Undergraduates Grant ARRA::CNS

References

[1] J. Witmer and J. Kalita, "Extracting geospatial entities from Wikipedia," IEEE International Conference on Semantic Computing.
[2] R. Grishman and B. Sundheim, "Message Understanding Conference-6: A brief history," in ICCL. ACL, 1996.
[3] W. Dakka and S. Cucerzan, "Augmenting Wikipedia with named entity tags," IJCNLP, 2008.
[4] D. Klein, J. Smarr, H. Nguyen, and C. D. Manning, "Named entity recognition with character-level models," in Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL. Morristown, NJ, USA: Association for Computational Linguistics, 2003.
[5] G. Zhou and J. Su, "Named entity recognition using an HMM-based chunk tagger," in Proc. 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002).
[6] S. Cucerzan, "Large-scale named entity disambiguation based on Wikipedia data," EMNLP.
[7] J. Leidner, G. Sinclair, and B. Webber, "Grounding spatial named entities for information extraction and question answering," in Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, 2003.
[8] V. Sehgal, L. Getoor, and P. Viechnicki, "Entity resolution in geospatial data integration," in ACM Int. Sym. on Advances in GIS. ACM, 2006.
[9] W. Zong, D. Wu, A. Sun, E. Lim, and D. Goh, "On assigning place names to geography related web pages," in ACM/IEEE-CS Joint Conf. on Digital Libraries. ACM, 2005.
[10] B. Martins, H. Manguinhas, and J. Borbinha, "Extracting and Exploring the Geo-Temporal Semantics of Textual Resources," in IEEE ICSC, 2008.
[11] Ø. Vestavik, "Geographic Information Retrieval: An Overview."
[12] T. D'Roza and G. Bilchev, "An Overview of Location-Based Services," BT Technology Journal, vol. 21, no. 1.
[13] J. Witmer and J. Kalita, "Mining Wikipedia Article Clusters for Geospatial Entities and Relationships," Papers from the AAAI Spring Symposium, Technical Report SS-09-08.
[14] D. Smith and G. Crane, "Disambiguating geographic names in a historical digital library," Lecture Notes in Computer Science.
More informationTuning as Linear Regression
Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical
More informationMining coreference relations between formulas and text using Wikipedia
Mining coreference relations between formulas and text using Wikipedia Minh Nghiem Quoc 1, Keisuke Yokoi 2, Yuichiroh Matsubayashi 3 Akiko Aizawa 1 2 3 1 Department of Informatics, The Graduate University
More informationExtraction of Spatio-Temporal data about Historical events from text documents
Faculty of Environmental Sciences, Department of Geosciences, Chair of Geoinformatics Extraction of Spatio-Temporal data about Historical events from text documents Case Study: German-Herero war of resistance
More informationIntroduction to ArcGIS Server Development
Introduction to ArcGIS Server Development Kevin Deege,, Rob Burke, Kelly Hutchins, and Sathya Prasad ESRI Developer Summit 2008 1 Schedule Introduction to ArcGIS Server Rob and Kevin Questions Break 2:15
More informationLarge Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.
Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, 2006 Dr. Overview Brief introduction Chemical Structure Recognition (chemocr) Manual conversion
More informationThe GapVis project and automated textual geoparsing
The GapVis project and automated textual geoparsing Google Ancient Places team, Edinburgh Language Technology Group GeoBib Conference, Giessen, 4th-5th May 2015 1 1 Google Ancient Places and GapVis The
More informationAnomaly Detection for the CERN Large Hadron Collider injection magnets
Anomaly Detection for the CERN Large Hadron Collider injection magnets Armin Halilovic KU Leuven - Department of Computer Science In cooperation with CERN 2018-07-27 0 Outline 1 Context 2 Data 3 Preprocessing
More informationFrom Research Objects to Research Networks: Combining Spatial and Semantic Search
From Research Objects to Research Networks: Combining Spatial and Semantic Search Sara Lafia 1 and Lisa Staehli 2 1 Department of Geography, UCSB, Santa Barbara, CA, USA 2 Institute of Cartography and
More informationToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database
ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database Dina Vishnyakova 1,2, 4, *, Julien Gobeill 1,3,4, Emilie Pasche 1,2,3,4 and Patrick Ruch
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationSpatial Role Labeling CS365 Course Project
Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important
More informationGOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE
GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE Abstract SHI Lihong 1 LI Haiyong 1,2 LIU Jiping 1 LI Bin 1 1 Chinese Academy Surveying and Mapping, Beijing, China, 100039 2 Liaoning
More informationAn integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages
An integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages Hao Lin, Longping Hu, Yingjie Hu, Jianping Wu, Bailang Yu* Key Laboratory of Geographic Information Science, Ministry
More informationInternet Engineering Jacek Mazurkiewicz, PhD
Internet Engineering Jacek Mazurkiewicz, PhD Softcomputing Part 11: SoftComputing Used for Big Data Problems Agenda Climate Changes Prediction System Based on Weather Big Data Visualisation Natural Language
More informationPredictive analysis on Multivariate, Time Series datasets using Shapelets
1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,
More informationDM-Group Meeting. Subhodip Biswas 10/16/2014
DM-Group Meeting Subhodip Biswas 10/16/2014 Papers to be discussed 1. Crowdsourcing Land Use Maps via Twitter Vanessa Frias-Martinez and Enrique Frias-Martinez in KDD 2014 2. Tracking Climate Change Opinions
More informationHow a Media Organization Tackles the. Challenge Opportunity. Digital Gazetteer Workshop December 8, 2006
A Case-Study-In-Process: How a Media Organization Tackles the Georeferencing Challenge Opportunity Digital Gazetteer Workshop December 8, 2006 to increase and diffuse geographic knowledge 1888 to 2006:
More informationLearning Features from Co-occurrences: A Theoretical Analysis
Learning Features from Co-occurrences: A Theoretical Analysis Yanpeng Li IBM T. J. Watson Research Center Yorktown Heights, New York 10598 liyanpeng.lyp@gmail.com Abstract Representing a word by its co-occurrences
More informationResolutions from the Tenth United Nations Conference on the Standardization of Geographical Names, 2012, New York*
UNITED NATIONS GROUP OF EXPERTS ON GEOGRAPHICAL NAMES Twenty-eighth session New York, 28 April 2 May 2014 GEGN/28/9 English Resolutions from the Tenth United Nations Conference on the Standardization of
More informationVisualizing Energy Usage and Consumption of the World
Visualizing Energy Usage and Consumption of the World William Liew Abstract As the world becomes more and more environmentally aware, a simple layman s visualization of worldwide energy use is needed to
More informationToponym Disambiguation by Arborescent Relationships
Journal of Computer Science 6 (6): 653-659, 2010 ISSN 1549-3636 2010 Science Publications Toponym Disambiguation by Arborescent Relationships Imene Bensalem and Mohamed-Khireddine Kholladi Department of
More informationText Analytics (Text Mining)
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS
More informationClick Prediction and Preference Ranking of RSS Feeds
Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS
More informationAspect Term Extraction with History Attention and Selective Transformation 1
Aspect Term Extraction with History Attention and Selective Transformation 1 Xin Li 1, Lidong Bing 2, Piji Li 1, Wai Lam 1, Zhimou Yang 3 Presenter: Lin Ma 2 1 The Chinese University of Hong Kong 2 Tencent
More informationAnalysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie
Library and Information Services in Astronomy III ASP Conference Series, Vol. 153, 1998 U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez (eds.) Analysis of Evolutionary Trends in Astronomical
More informationfile://q:\report1\greenatlasfinalreportindex.html
Page 1 of 8 Quick Links WATER MANAGEMENT INTERNSHIP USDA HIS GRANT FUNDED FINAL PROJECT REPORT SUBMITTED BY MELISSA QUINTANA 11/07/07-03/24/08 Summary Provided is an assessment of my accomplishments for
More informationCLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview
CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1(B), Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1, and Kirk Roberts 4 1 Tulane University,
More informationExploring Class Discussions from a Massive Open Online Course (MOOC) on Cartography
Forthcoming in: Vondrakova, A., Brus, J., and Vozenilek, V. (Eds.) (2015) Modern Trends in Cartography, Selected Papers of CARTOCON 2014, Lecture Notes in Geoinformation and Cartography, Springer-Verlag.
More informationVisualizing Uncertainty: How to Use the Fuzzy Data of 550 Medieval Texts?
Visualizing Uncertainty: How to Use the Fuzzy Data of 550 Medieval Texts? Stefan Jänicke, Institut für Informatik, Universität Leipzig David Joseph Wrisley, Department of English, American University of
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationHYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH
HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi
More informationRoad & Railway Network Density Dataset at 1 km over the Belt and Road and Surround Region
Journal of Global Change Data & Discovery. 2017, 1(4): 402-407 DOI:10.3974/geodp.2017.04.03 www.geodoi.ac.cn 2017 GCdataPR Global Change Research Data Publishing & Repository Road & Railway Network Density
More informationINDOT Office of Traffic Safety
Intro to GIS Spatial Analysis INDOT Office of Traffic Safety Intro to GIS Spatial Analysis INDOT Office of Traffic Safety Kevin Knoke Section 130 Program Manager Highway Engineer II Registered Professional
More informationSpatial Decision Tree: A Novel Approach to Land-Cover Classification
Spatial Decision Tree: A Novel Approach to Land-Cover Classification Zhe Jiang 1, Shashi Shekhar 1, Xun Zhou 1, Joseph Knight 2, Jennifer Corcoran 2 1 Department of Computer Science & Engineering 2 Department
More information2013 AND 2025 THE FUTURE OF GIS
THE FUTURE OF GIS 2013 AND 2025 What is the state of geospatial computing today? What are the issues today? Unresolved problems What will geospatial computing be like in 2025? What issues will be of concern
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationDATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON
DATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON LEAD CONSULTANT, INFOSYS LIMITED SEZ Survey No. 41 (pt) 50 (pt), Singapore Township PO, Ghatkesar Mandal, Hyderabad, Telengana 500088 Word Limit of the
More informationDP Project Development Pvt. Ltd.
Dear Sir/Madam, Greetings!!! Thanks for contacting DP Project Development for your training requirement. DP Project Development is leading professional training provider in GIS technologies and GIS application
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationUC Berkeley International Conference on GIScience Short Paper Proceedings
UC Berkeley International Conference on GIScience Title Deriving Locational Reference through Implicit Information Retrieval Permalink https://escholarship.org/uc/item/0tn5s4v9 Journal International Conference
More informationNR402 GIS Applications in Natural Resources
NR402 GIS Applications in Natural Resources Lesson 1 Introduction to GIS Eva Strand, University of Idaho Map of the Pacific Northwest from http://www.or.blm.gov/gis/ Welcome to NR402 GIS Applications in
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationA Bayesian Model of Diachronic Meaning Change
A Bayesian Model of Diachronic Meaning Change Lea Frermann and Mirella Lapata Institute for Language, Cognition, and Computation School of Informatics The University of Edinburgh lea@frermann.de www.frermann.de
More informationA Web-based Geo-resolution Annotation and Evaluation Tool
A Web-based Geo-resolution Annotation and Evaluation Tool Beatrice Alex, Kate Byrne, Claire Grover and Richard Tobin School of Informatics University of Edinburgh {balex,kbyrne3,grover,richard}@inf.ed.ac.uk
More informationA Prototype of a Web Mapping System Architecture for the Arctic Region
A Prototype of a Web Mapping System Architecture for the Arctic Region Han-Fang Tsai 1, Chih-Yuan Huang 2, and Steve Liang 3 GeoSensorWeb Laboratory, Department of Geomatics Engineering, University of
More informationDeep Learning for NLP Part 2
Deep Learning for NLP Part 2 CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) 2 Part 1.3: The Basics Word Representations The
More informationCMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss
CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss Jeffrey Flanigan Chris Dyer Noah A. Smith Jaime Carbonell School of Computer Science, Carnegie Mellon University, Pittsburgh,
More informationA Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs
A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs Pascal Kelm Communication Systems Group Technische Universität Berlin Germany kelm@nue.tu-berlin.de
More informationDISTRIBUTIONAL SEMANTICS
COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationFAO GAEZ Data Portal
FAO GAEZ Data Portal www.fao.org/nr/gaez Renato Cumani Environment Officer Land and Water Division Natural Resources Management and Environment Department Food and Agriculture Organization of the UN October
More informationHMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation
HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation Denis Turdakov and Dmitry Lizorkin Institute for System Programming of the Russian Academy of Sciences, 25 Solzhenitsina
More informationDeveloping Geo-temporal Context from Implicit Sources with Geovisual Analytics
Developing Geo-temporal Context from Implicit Sources with Geovisual Analytics Brian Tomaszewski 1 1 The Pennsylvania State University, Department of Geography, 302 Walker Building University Park, PA,
More informationWEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS
WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS Claus Rinner University of Muenster, Germany Piotr Jankowski San Diego State University, USA Keywords: geographic information
More informationYour web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore
Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore Activityengage MAPPING W O RL D HERITAGE Where are sites of significant
More informationTEMPLATE FOR CMaP PROJECT
TEMPLATE FOR CMaP PROJECT Project Title: Native Utah Plants Created by: Anna Davis Class: Box Elder 2008 Project Description Community Issue or Problem Selected -How project evolved? Community Partner(s)
More informationComparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin) Mitra Shahabi 1 Department of Language and Culture University of Aveiro Aveiro, 3800-356, Portugal mitra.shahabi@ua.pt Abstract An attempt
More informationgeographic patterns and processes are captured and represented using computer technologies
Proposed Certificate in Geographic Information Science Department of Geographical and Sustainability Sciences Submitted: November 9, 2016 Geographic information systems (GIS) capture the complex spatial
More informationEnhancing the Curation of Botanical Data Using Text Analysis Tools
Enhancing the Curation of Botanical Data Using Text Analysis Tools Clare Llewellyn 1,ClareGrover 1, Jon Oberlander 1,andElspethHaston 2 1 University of Edinburgh, Edinburgh, United Kingdom C.A.Llewellyn@sms.ed.ac.uk,
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationSpatio-Textual Indexing for Geographical Search on the Web
Spatio-Textual Indexing for Geographical Search on the Web Subodh Vaid 1, Christopher B. Jones 1, Hideo Joho 2 and Mark Sanderson 2 1 School of Computer Science, Cardiff University, UK email: {c.b.jones,
More informationDiscovery and Access of Geospatial Resources using the Geoportal Extension. Marten Hogeweg Geoportal Extension Product Manager
Discovery and Access of Geospatial Resources using the Geoportal Extension Marten Hogeweg Geoportal Extension Product Manager DISCOVERY AND ACCESS USING THE GEOPORTAL EXTENSION Geospatial Data Is Very
More informationA Hidden Markov Model for Alphabet-Soup Word Recognition
A Hidden Markov Model for Alphabet-Soup Word Recognition Shaolei Feng Nicholas R. Howe R. Manmatha Dept. of Computer Science University of Massachusetts Amherst, MA-01003 slfeng@cs.umass.edu Dept. of Computer
More informationCrime Analyst Extension. Christine Charles
Crime Analyst Extension Christine Charles ccharles@esricanada.com Agenda Why use Crime Analyst? Overview Tools Demo Interoperability With our old software it could take a police officer up to forty minutes
More informationIntroduction to Spatial Big Data Analytics. Zhe Jiang Office: SEC 3435
Introduction to Spatial Big Data Analytics Zhe Jiang zjiang@cs.ua.edu Office: SEC 3435 1 What is Big Data? Examples Internet data (images from the web) Earth observation data (nasa.gov) wikimedia.org www.me.mtu.edu
More informationMACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING
MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING Outline Some Sample NLP Task [Noah Smith] Structured Prediction For NLP Structured Prediction Methods Conditional Random Fields Structured Perceptron Discussion
More information