A Comparison of Approaches for Geospatial Entity Extraction from Wikipedia

Daryl Woodward, Jeremy Witmer, and Jugal Kalita
University of Colorado, Colorado Springs
Computer Science Department
1420 Austin Bluffs Pkwy, Colorado Springs, CO

Abstract

We target in this paper the challenge of extracting geospatial data from the article text of the English Wikipedia. We present the results of a Hidden Markov Model (HMM) based approach to identifying location-related named entities in our corpus of Wikipedia articles, which are primarily about battles and wars because of their high geospatial content. The HMM NER process drives a geocoding and resolution process whose goal is to determine the correct coordinates for each place name (often referred to as grounding). We compare our results to a previously developed data structure and algorithm for disambiguating place names that can have multiple coordinates. We demonstrate an overall f-measure of 79.63% in identifying and geocoding place names. Finally, we compare the results of the HMM-driven process to earlier work using a Support Vector Machine.

I. Introduction

The amount of user-generated, unstructured content on the Internet increases significantly every day. Consequently, techniques for automatically extracting information from unstructured text become increasingly important. A significant number of queries on the Internet target geospatial data, making this a good area of study. Therefore, this paper focuses on approaches to extracting geospatial information from unstructured text.

While Named Entity Recognition (NER) has seen much progress in recent years, NER for location names is only the first step in extracting geospatial information. The extracted place names must then be geocoded to at least a latitude and longitude coordinate pair to allow visualization, geospatial search, and information retrieval of text based on locations. We refer to this process as geocoding and disambiguation.
It is also often referred to as grounding.

Our work in this paper has a very specific focus: to maximize the efficiency of the initial geospatial NER that feeds our geocoding and disambiguation process. Increasing the accuracy of the raw NE data will improve our final results. Specifically, a more refined list of geospatial named entities (place names) from unstructured text documents such as Wikipedia articles will aid in geocoding ambiguous names to the correct geospatial coordinates.

Our research in this area is motivated by a number of factors. The goal of the ongoing development of the Geografikos package is a software system that creates a database of Wikipedia articles in which each article has an associated structured set of geospatial entities extracted from the article. This database will allow geospatial querying, information retrieval, and geovisualization to be applied to the Wikipedia articles. Further, we wish to open-source the Geografikos software on completion.

Section II of the paper discusses relevant background research and information on the LingPipe library. Section III focuses on our process for choosing training data. Section IV summarizes our method for generating the most accurate results. Section V briefly recaps the geocoding and disambiguation process. Section VI discusses our results and compares them to those of the SVM used for the same process by Witmer and Kalita [1]. Finally, we conclude with a discussion of possibilities for future research.

II. Background Research

Named entity recognition refers to the extraction of words and strings of text within documents that represent discrete concepts, such as names and locations. The term Named Entity Recognition describes the operations in natural language processing that seek to extract the names of persons, organizations, locations, other proper nouns, and numeric terms such as percentages and dollar amounts. The term Named Entity was defined by the MUC-6, sponsored by DARPA in 1996 [2]. The NE recognition task was further defined, and expanded to language independence, by the Conference on Natural Language Learning shared tasks for the 2002 and 2003 conferences.

Numerous approaches have been tried since MUC-6 to increase performance in NER, including Hidden Markov Models, Conditional Random Fields, Maximum Entropy models, Neural Networks, and Support Vector Machines (SVM). Dakka and Cucerzan demonstrated an SVM that achieves an f-measure of for LOC entities in Wikipedia articles, and an f-measure of across all NE classes [3]. Although research into text classification and NER has found that SVMs provide good performance on NER tasks, HMMs can produce similar results with minimal training.

Hidden Markov Models (HMMs) have also shown excellent results. Klein et al. demonstrated that a character-level HMM can identify both English and German named entities with an f-measure of and for LOC entities in testing data, respectively [4]. In [5], Zhou and Su evaluated an HMM and HMM-based chunk tagger on the MUC-6 and MUC-7 English NE tasks, achieving f-measures of and 0.941, respectively.

To compare to the SVM-based approach used by Witmer in [1], we chose to use the HMM implemented by the LingPipe library for NER which, had it participated in the CoNLL 2002 NER Task, would have been tied for fourth place with an f-measure of For simplicity, LingPipe is a fully developed Java package that integrated easily into our existing code. LingPipe identifies itself as a suite of Java libraries for the linguistic analysis of human language, providing tools for information extraction and data mining.

III. Training and Test Corpus Generation

A. Training

We downloaded a number of previously tagged data sets to provide the training material for the HMM, along with other resources that required hand-tagging. We narrowed our training to include:

- CoNLL 2003 shared task dataset on multi-language NE tagging, containing tagged named entities for PER, LOC, and ORG in English
- CoNLL 2004 shared task dataset on Semantic Role Labeling, tagged for English LOC NEs
- Hand-tagged articles from the English Wikipedia, downloaded June 18,

The CoNLL datasets were chosen for their high quality and because they had been previously tagged for all NEs. We combined the CoNLL data with articles from the English Wikipedia in which all LOC NEs were hand-tagged. The articles focused on battles and wars with high frequencies of geospatial entities. An HMM was then generated for various combinations of the listed corpora, and short simulations were conducted for each of the models. The inclusion of the hand-tagged data proved to have the greatest effect on the results. Training corpora that included the hand-tagged data scored, on average, about 1.7% higher in f-measure than corpora that did not.

Our final corpus comprised the CoNLL 2003 and 2004 datasets and the hand-tagged data from Wikipedia. All named entity tags were preserved from the CoNLL datasets, but only locations were tagged in the Wikipedia articles. To put into perspective the degree of accuracy required in this NER task, simple statistics were gathered from the nine hand-tagged articles used for training. Only 2,191 of 61,708 total words were part of a location string (3.55%). Because these geospatial entities occur so sparsely, the addition of these location-weighted and subject-similar articles significantly improved results. Based on our analysis of the CoNLL 2003 dataset, 7,893 of 32,588 NEs of 169,032 total words in 947 articles were locations.
The CoNLL 2004 data did not contain article separators, but 3,347 of 16,308 NEs of 176,920 total words were locations. Only about half of the locations in the hand-tagged articles did not already appear in the CoNLL datasets, so only 1,168 new locations were truly added by including the nine hand-tagged articles.

B. Testing

For primary testing, 21 Wikipedia articles (171,232 total words) were selected from the list of 90 articles processed by the SVM in [1]. These specific articles were chosen because they were used as primary examples in Witmer's previous work. The articles were preprocessed and were determined to have a variety of lengths and location frequencies while remaining suitable for statistical analysis. In particular, none had so few locations that including it in the statistics without some form of standardization would have had an imbalanced impact on the results. The test articles shared the topic of historical battles and wars with the training articles. The articles used for training made up only about 15% of the final training set. Currently, this set of Wikipedia articles is the only corpus chosen for testing. In the future, we may expand our corpus to include news articles or other informational online resources that are also expected to contain geospatial content.

IV. Method

LingPipe offers various formats for results, along with different Named Entity Recognizers that vary in accuracy and efficiency. We chose the CharLMRescoringChunker, as it is described as the most accurate, but also the slowest, chunker. This was well suited to the Geografikos package, since the geospatial information associated with each article is processed only once per article. The NER process is also significantly faster than the geocoding and disambiguation process, so NER speed was not an issue.

Table I is an example of LingPipe's Confidence Named Entity Chunking, which returns a list of the most confident results, including the string, where it can be found, what type of chunk it may be, and the probability that the string is correctly typed. The four types of chunks it can be trained to identify are PER, ORG, LOC, and MISC, while text not identified as one of these chunks is labeled O. Based on a manual review of sentences such as these, we predicted that setting a threshold of 1.1 for the confidence would provide the best balance between false positives and correct identification. This threshold was set as a parameter within the Geografikos package as it processes results returned by the HMM. Tests were conducted with thresholds from 1.0 to 1.5 in 0.1 increments. A direct correlation emerged between the threshold value and the precision of the final results, and an inverse correlation emerged between precision and recall. The highest f-measure was achieved with a threshold of 1.1, which coincided with our initial prediction.

V. Geospatial Named Entity Resolution

The HMM extracted a set of candidate geospatial NEs from the article text.
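
This candidate set is produced by the confidence thresholding described in Section IV, which might be sketched as follows. This is a minimal Python illustration rather than the Geografikos or LingPipe code: the `Chunk` record, the `filter_candidates` and `sweep` functions, and the sample confidence scores are all hypothetical. The direction of the comparison (keep a chunk when its score is at or above the threshold) is an assumption, chosen to be consistent with the reported rise in precision as the threshold increases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One candidate chunk (hypothetical record, not a LingPipe class)."""
    start: int    # character offset where the chunk begins
    end: int      # character offset just past the chunk
    label: str    # PER, ORG, LOC, or MISC
    phrase: str
    conf: float   # confidence score; the values below are made up


def filter_candidates(chunks, threshold=1.1, wanted_label="LOC"):
    """Keep chunks of the wanted type whose score meets the threshold."""
    return [c for c in chunks
            if c.label == wanted_label and c.conf >= threshold]


def sweep(chunks, thresholds=(1.0, 1.1, 1.2, 1.3, 1.4, 1.5)):
    """Count the surviving candidates at each threshold in the tested range."""
    return {t: len(filter_candidates(chunks, t)) for t in thresholds}


# Spans and phrases follow Table I; the scores are illustrative only.
sample = [
    Chunk(67, 81, "LOC", "Fredericksburg", 1.45),
    Chunk(137, 154, "LOC", "North Anna River", 1.20),
    Chunk(137, 147, "ORG", "North Anna", 1.05),
    Chunk(148, 154, "LOC", "River", 0.40),
    Chunk(0, 3, "PER", "Lee", 1.30),
]

kept = filter_candidates(sample)
print([c.phrase for c in kept])
print(sweep(sample))
```

Raising the threshold shrinks the candidate list, which is the mechanism behind the precision and recall trade-off reported above: fewer low-confidence strings reach the geocoder, at the cost of dropping some true locations.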
For each candidate string, the second objective was to decide whether it was a geospatial NE, and to determine the correct (latitude, longitude), or (φ, λ), coordinate pair for the place name in the context of the article.

To resolve the candidate NE, a lookup was made using Google Geocoder. If the entity reference resolved to a single geospatial location, no further action was required. Otherwise, the context of the place name in the article, a data structure, and a rule-driven algorithm were used to decide the correct spatial location for the place name.

Our disambiguation task is close to that of word sense disambiguation, as defined by Cucerzan [6], except that we consider the geospatial context and domain instead of the lexical context and domain. We refer to this as geospatial entity resolution. It has also been referred to as grounding a place name [7]. Sehgal et al. demonstrate good results for geospatial entity resolution using both spatial (coordinate) and non-spatial (lexical) features of the geospatial entities [8]. Zong et al. demonstrated a rule-based method for place name assignment, achieving a precision of 88.6% on disambiguating place names in the United States from the Digital Library for Earth System Education (DLESE) metadata [9].

While our approach is related to the work done by Martins et al. in [10], in that it uses an HMM for NER and resolves geospatial coordinates through a geocoder as a second step, we drew on the work in this area done by Witmer in [1], using a novel location tree data structure and algorithm to disambiguate and geocode the place names.

A. Google Geocoder

We used Google Geocoder as the gazetteer and geocoder for simplicity, as much research has already been done in this area. [11] and [12] provided an excellent overview of existing technology in this area. Google Geocoder provides a simple REST-based interface that can be queried over HTTP, returning data in a variety of formats.
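
The resolution logic described in this section can be outlined in Python as follows. This is an illustrative sketch only. The `lookup` function stands in for a call to a geocoding service and is stubbed with a small in-memory table whose coordinates are rough, illustrative values; no real Google Geocoder request format or response schema is assumed. The nearest-to-centroid rule used for ambiguous names is a hypothetical placeholder for the idea of using article context, not the location tree algorithm of [1].

```python
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)

# Stub gazetteer standing in for a geocoding service (illustrative values).
FAKE_GAZETTEER: Dict[str, List[Coordinate]] = {
    "Fredericksburg": [(38.30, -77.46), (30.27, -98.87)],  # two candidates
    "North Anna River": [(37.90, -77.50)],                 # unambiguous
}


def lookup(name: str) -> List[Coordinate]:
    """Return zero or more candidate coordinates for a place name."""
    return FAKE_GAZETTEER.get(name, [])


def resolve_all(names, geocode=lookup) -> Dict[str, Coordinate]:
    """Resolve candidate names to one coordinate each, where possible.

    - No result: the string is treated as not being a geospatial NE.
    - One result: accepted as is.
    - Several results: pick the candidate closest to the centroid of the
      unambiguous places in the same article (placeholder heuristic).
    """
    results = {name: geocode(name) for name in names}
    resolved = {n: p[0] for n, p in results.items() if len(p) == 1}
    if not resolved:
        return resolved  # no context available to disambiguate with

    lat0 = sum(lat for lat, _ in resolved.values()) / len(resolved)
    lon0 = sum(lon for _, lon in resolved.values()) / len(resolved)

    def distance_sq(c: Coordinate) -> float:
        # Squared distance in degree space; crude, but enough for a sketch.
        return (c[0] - lat0) ** 2 + (c[1] - lon0) ** 2

    for name, placemarks in results.items():
        if len(placemarks) > 1:
            resolved[name] = min(placemarks, key=distance_sq)
    return resolved


article_candidates = ["Fredericksburg", "North Anna River", "Burnside"]
grounded = resolve_all(article_candidates)
print(grounded)
```

In this toy run, "Burnside" returns no result and is dropped, "North Anna River" is accepted directly, and the ambiguous "Fredericksburg" is assigned to the candidate nearest the already resolved place.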
This system architecture allows developers to manage their user-client interfaces however they wish, as long as their client-server interaction stays consistent. It also makes it possible to add layers to a process, allowing many lower-level layers to share caches to reduce server interaction. For each geospatial NE string submitted as a query, Google Geocoder returns zero or more placemarks as a result.

VI. Results

In this section, we compare the overall results from the HMM-driven NER and disambiguation with the SVM-driven NER and disambiguation presented in [13] and [1]. Table II compares the final results of the HMM with those of the SVM. These results are based on the processing of 21 articles from Wikipedia. The Resolved Results in Table II specifically show the performance of our package in correctly identifying location strings and geocoding the locations. The NER Results show the accuracy of the two NERs before any further processing. A string correctly identified by the NER process is one that exactly matched a hand-tagged ground truth named entity. Geocoding success means taking the string and correctly resolving it to a single location in the context of the document.

Table I. LingPipe output for the sentence: Lee at first anticipated that he would fight Burnside northwest of Fredericksburg and that it might be necessary to drop back behind the North Anna River.

Rank  Conf  Span        Type  Phrase
            (67, 81)    LOC   Fredericksburg
            (137, 154)  LOC   North Anna River
            (143, 154)  LOC   Anna River
            (137, 142)  LOC   North
            (148, 154)  LOC   River
            (137, 147)  ORG   North Anna
            (137, 147)  LOC   North Anna
            (0, 3)      PER   Lee

Table II. Resolved Geospatial NE Results

                      Precision  Recall  F-Measure
HMM NER Results
SVM NER Results
HMM Resolved Results
SVM Resolved Results

Although the NER results look significantly lower when using the HMM, it should be noted that the median for the collected data was A handful of articles with foreign names, such as the article on the Korean War, brought down the average results with f-measures of only around 10%. This is most likely because our training data contained a limited number of foreign names, and the HMM had trouble recognizing these strings as LOC named entities.

Figure 1 shows a more detailed breakdown of the precision, recall, and f-measure for a subset of the articles processed. The results in this chart show the same trend shown by Table III. The Geografikos package was generally able to extract locations with higher performance from articles that contain a higher percentage of non-ambiguous locations. The lowest scoring articles, portrayed as the leftmost three articles in Figure 1, all reference engagements in the American Civil War. Table III, from [14], shows that North and Central America have a much larger percentage of ambiguous place names than other parts of the world.
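
The exact-match scoring behind the precision, recall, and f-measure figures in this section can be illustrated with a short Python sketch. The `score` function and the toy entity sets are hypothetical; only the definitions of precision, recall, and f-measure (the harmonic mean of the two) are standard. Entities are compared here as (start, end, phrase) triples, which is an assumption about how an exact match is decided.

```python
def score(predicted, gold):
    """Return (precision, recall, f_measure) under exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hand-tagged ground truth for the sentence in Table I (toy example).
gold = {(67, 81, "Fredericksburg"), (137, 154, "North Anna River")}

# A hypothetical system output: one exact match and one partial match.
predicted = {(67, 81, "Fredericksburg"), (137, 147, "North Anna")}

p, r, f = score(predicted, gold)
print(p, r, f)
```

Under exact matching, the partial hit "North Anna" earns no credit: it counts both as a false positive and as a missed "North Anna River", so precision and recall each fall to one half in this example.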
The highest scoring articles (the rightmost three in Figure 1) focus on engagements on other continents.

Table III. Places With Multiple Names and Names Applied to More Than One Place in the Getty Thesaurus of Geographic Names

Continent                 % places with multiple names   % names with multiple places
North & Central America
Oceania
South America
Asia
Africa
Europe

A. Analysis

The SVM used in [1] performed independently of article length and the frequency of place names. This independence did not carry over to the HMM. Although the two methods share the same disambiguation process, the initial phase of named entity extraction is the main influence on final results and the focus of this paper. The SVM was tuned to produce a very high recall by extracting a large number of potential NEs. Ultimately, all of these names were fed into Google Geocoder, which identified actual places. The geocoding process counteracted the large amount of extra extraction from the initial phase and protected the precision by accepting only potential NEs that successfully geocoded. However, this generated significantly more traffic to the geocoder than the HMM-based NER process. The HMM focused on decreasing the number of geocoder queries while maintaining overall performance.

Table IV shows the decrease in the number of candidate location NEs extracted by the HMM relative to the SVM for some of the articles in the test corpus. It also shows the number of these NEs that successfully geocoded and were disambiguated. The HMM identified significantly fewer potential NEs in the initial phase, resulting in generally lower recall but higher precision in the final geocoding and disambiguation process. The results for a selection of articles are shown in Table IV. Although the HMM often extracted less than half as many potential NEs as the SVM, the final results came out similar. The HMM demonstrates better performance than the SVM on the longer articles, and worse performance on shorter articles. The f-measures for some of these articles are pictured in Figure 1, in which the three lowest and three highest scoring articles of the HMM's results are shown side by side. For the HMM, the lowest scoring articles were those with the fewest potential NEs in the text, and the highest scoring had the most potential NEs.

Figure 1. Lowest 3 and Highest 3 Scoring Articles of HMM

Overall, the HMM-driven process showed an f-measure 2.4% lower than the SVM-driven process on the same testing corpus of Wikipedia articles. However, the HMM-driven process demonstrated several overall improvements that balance out these results. First, the time required to generate the model for the HMM was under four minutes, while it took about 36 hours to train the SVM on a similar system. The vast decrease in training time for the HMM allows much greater flexibility in changing and expanding the training corpus to adjust the model for greater performance. Second, the HMM-driven process reduced the number of candidate NEs by over 50% in most cases, reducing the time spent on the geocoding and disambiguation phase. For both the SVM-driven and HMM-driven approaches, most of the processing time is spent in the geocoding and disambiguation phase, so streamlining the NER phase multiplies the decrease in time spent processing each article.

VII. Conclusions and Future Work

By continuing to enhance the efficiency of the Geografikos package, we both increase the value of the output results and make the package more suitable for heavy public use over the Internet. It also supports our ultimate goal of making this code open source.
We envision a number of uses for this package in the search and visualization of Wikipedia articles. With the geospatial-specific information, searches for Wikipedia articles could be filtered by geographic area through a search in two steps. The first would be a standard free text search on Wikipedia for articles about the topic. That list could then be further filtered to those articles with appropriate locations. Reversing this paradigm, the location data provided by the Geografikos package could also allow location-centric search. If a user wanted information on a particular region of the world, they could select that location on a map interface, and articles that reference it could be displayed with some excerpts of the text concerning the region.

Furthermore, this database of locations could enable the visualization of Wikipedia articles through a geospatial interface. For instance, consider an interface that would allow a user to select a Wikipedia article and then present the user with a map of all the locations from the article. Each location on the map would be clickable and would provide the sentences or paragraph around that NE from the text. Imagine putting World War II into this interface and being presented with a map of Europe, Africa, and the Pacific theater, with all the locations from the article marked and clickable. This kind of visualization would be an excellent teaching tool, and could possibly reveal implicit information and relationships that are not apparent from the text of the article.

Table IV. Hand Tagged Articles - Potential Location NEs

Article           HMM Extracted Potential NEs / Grounded NEs   SVM Extracted Potential NEs / Grounded NEs   HMM Precision   SVM Precision   HMM Recall   SVM Recall
Chancellorsville  119/47                                       566/
Gettysburg        327/                                         /
Korean War        625/                                         /
War of            /                                            /
World War 2       668/                                         /

Applied to other corpora, this kind of information could also be very useful in finding geospatial trends and relationships. For instance, consider a database of textual disease outbreak reports or world news articles. The Geografikos package could extract all the locations and present them graphically on a map, allowing trends to be found much more easily. With additional work, the geospatial data extracted by the Geografikos package could be combined with temporal information to allow geographic and temporal refinement. While many of these visualization tools already exist, they are driven from structured databases, not from free text document sets.

Our contribution in this paper is demonstrating improvements to the process originally laid out in [1]. This process extracts location names from a given text and grounds them to a single, disambiguated geospatial entity. By applying an HMM to our process, we increase the flexibility and speed of the NER phase of the overall process. With the data structure and algorithm for resolution of ambiguous geospatial NEs based on article context, we open up possibilities for increased capability in geospatial information retrieval, provided by associating a structured list of geospatial entities with a free text corpus.

Credits

The work reported in this paper is partially supported by the NSF Research Experience for Undergraduates Grant ARRA::CNS

References

[1] J. Witmer and J. Kalita, Extracting geospatial entities from Wikipedia, IEEE International Conference on Semantic Computing, pp ,
[2] R. Grishman and B. Sundheim, Message understanding conference-6: A brief history, in ICCL. ACL, 1996, pp
[3] W. Dakka and S. Cucerzan, Augmenting Wikipedia with named entity tags, IJCNLP, 2008.
[4] D. Klein, J. Smarr, H. Nguyen, and C. D. Manning, Named entity recognition with character-level models, in Proceedings of the seventh conference on Natural language learning at HLT-NAACL Morristown, NJ, USA: Association for Computational Linguistics, 2003, pp
[5] G. Zhou and J. Su, Named entity recognition using an HMM-based chunk tagger, in Proc. 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002),
[6] S. Cucerzan, Large-scale named entity disambiguation based on Wikipedia data, EMNLP,
[7] J. Leidner, G. Sinclair, and B. Webber, Grounding spatial named entities for information extraction and question answering, in Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references, 2003, pp
[8] V. Sehgal, L. Getoor, and P. Viechnicki, Entity resolution in geospatial data integration, in ACM int. sym. on Advances in GIS. ACM, 2006, pp
[9] W. Zong, D. Wu, A. Sun, E. Lim, and D. Goh, On assigning place names to geography related web pages, in ACM/IEEE-CS joint conf. on Digital libraries. ACM, 2005, pp
[10] B. Martins, H. Manguinhas, and J. Borbinha, Extracting and Exploring the Geo-Temporal Semantics of Textual Resources, in IEEE ICSC, 2008, pp
[11] Ø. Vestavik, Geographic Information Retrieval: An Overview,
[12] T. D'Roza and G. Bilchev, An Overview of Location-Based Services, BT Technology Journal, vol. 21, no. 1, pp ,
[13] J. Witmer and J. Kalita, Mining Wikipedia Article Clusters for Geospatial Entities and Relationships, Papers from the AAAI Spring Symposium: Technical Report SS-09-08,
[14] D. Smith and G. Crane, Disambiguating geographic names in a historical digital library, Lecture Notes in CS, pp ,


More information

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1, Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1 and Kirk Roberts 4 1 Tulane University

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Classification: Maximum Entropy Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 24 Introduction Classification = supervised

More information

Variation of geospatial thinking in answering geography questions based on topographic maps

Variation of geospatial thinking in answering geography questions based on topographic maps Variation of geospatial thinking in answering geography questions based on topographic maps Yoshiki Wakabayashi*, Yuri Matsui** * Tokyo Metropolitan University ** Itabashi-ku, Tokyo Abstract. This study

More information

Lecture 18 April 26, 2012

Lecture 18 April 26, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and

More information

Predicting New Search-Query Cluster Volume

Predicting New Search-Query Cluster Volume Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Text Mining. March 3, March 3, / 49

Text Mining. March 3, March 3, / 49 Text Mining March 3, 2017 March 3, 2017 1 / 49 Outline Language Identification Tokenisation Part-Of-Speech (POS) tagging Hidden Markov Models - Sequential Taggers Viterbi Algorithm March 3, 2017 2 / 49

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Tuning as Linear Regression

Tuning as Linear Regression Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical

More information

Mining coreference relations between formulas and text using Wikipedia

Mining coreference relations between formulas and text using Wikipedia Mining coreference relations between formulas and text using Wikipedia Minh Nghiem Quoc 1, Keisuke Yokoi 2, Yuichiroh Matsubayashi 3 Akiko Aizawa 1 2 3 1 Department of Informatics, The Graduate University

More information

Extraction of Spatio-Temporal data about Historical events from text documents

Extraction of Spatio-Temporal data about Historical events from text documents Faculty of Environmental Sciences, Department of Geosciences, Chair of Geoinformatics Extraction of Spatio-Temporal data about Historical events from text documents Case Study: German-Herero war of resistance

More information

Introduction to ArcGIS Server Development

Introduction to ArcGIS Server Development Introduction to ArcGIS Server Development Kevin Deege,, Rob Burke, Kelly Hutchins, and Sathya Prasad ESRI Developer Summit 2008 1 Schedule Introduction to ArcGIS Server Rob and Kevin Questions Break 2:15

More information

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr. Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, 2006 Dr. Overview Brief introduction Chemical Structure Recognition (chemocr) Manual conversion

More information

The GapVis project and automated textual geoparsing

The GapVis project and automated textual geoparsing The GapVis project and automated textual geoparsing Google Ancient Places team, Edinburgh Language Technology Group GeoBib Conference, Giessen, 4th-5th May 2015 1 1 Google Ancient Places and GapVis The

More information

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Anomaly Detection for the CERN Large Hadron Collider injection magnets Anomaly Detection for the CERN Large Hadron Collider injection magnets Armin Halilovic KU Leuven - Department of Computer Science In cooperation with CERN 2018-07-27 0 Outline 1 Context 2 Data 3 Preprocessing

More information

From Research Objects to Research Networks: Combining Spatial and Semantic Search

From Research Objects to Research Networks: Combining Spatial and Semantic Search From Research Objects to Research Networks: Combining Spatial and Semantic Search Sara Lafia 1 and Lisa Staehli 2 1 Department of Geography, UCSB, Santa Barbara, CA, USA 2 Institute of Cartography and

More information

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database Dina Vishnyakova 1,2, 4, *, Julien Gobeill 1,3,4, Emilie Pasche 1,2,3,4 and Patrick Ruch

More information

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers

More information

Spatial Role Labeling CS365 Course Project

Spatial Role Labeling CS365 Course Project Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important

More information

GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE

GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE Abstract SHI Lihong 1 LI Haiyong 1,2 LIU Jiping 1 LI Bin 1 1 Chinese Academy Surveying and Mapping, Beijing, China, 100039 2 Liaoning

More information

An integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages

An integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages An integrated Framework for Retrieving and Analyzing Geographic Information in Web Pages Hao Lin, Longping Hu, Yingjie Hu, Jianping Wu, Bailang Yu* Key Laboratory of Geographic Information Science, Ministry

More information

Internet Engineering Jacek Mazurkiewicz, PhD

Internet Engineering Jacek Mazurkiewicz, PhD Internet Engineering Jacek Mazurkiewicz, PhD Softcomputing Part 11: SoftComputing Used for Big Data Problems Agenda Climate Changes Prediction System Based on Weather Big Data Visualisation Natural Language

More information

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Predictive analysis on Multivariate, Time Series datasets using Shapelets 1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,

More information

DM-Group Meeting. Subhodip Biswas 10/16/2014

DM-Group Meeting. Subhodip Biswas 10/16/2014 DM-Group Meeting Subhodip Biswas 10/16/2014 Papers to be discussed 1. Crowdsourcing Land Use Maps via Twitter Vanessa Frias-Martinez and Enrique Frias-Martinez in KDD 2014 2. Tracking Climate Change Opinions

More information

How a Media Organization Tackles the. Challenge Opportunity. Digital Gazetteer Workshop December 8, 2006

How a Media Organization Tackles the. Challenge Opportunity. Digital Gazetteer Workshop December 8, 2006 A Case-Study-In-Process: How a Media Organization Tackles the Georeferencing Challenge Opportunity Digital Gazetteer Workshop December 8, 2006 to increase and diffuse geographic knowledge 1888 to 2006:

More information

Learning Features from Co-occurrences: A Theoretical Analysis

Learning Features from Co-occurrences: A Theoretical Analysis Learning Features from Co-occurrences: A Theoretical Analysis Yanpeng Li IBM T. J. Watson Research Center Yorktown Heights, New York 10598 liyanpeng.lyp@gmail.com Abstract Representing a word by its co-occurrences

More information

Resolutions from the Tenth United Nations Conference on the Standardization of Geographical Names, 2012, New York*

Resolutions from the Tenth United Nations Conference on the Standardization of Geographical Names, 2012, New York* UNITED NATIONS GROUP OF EXPERTS ON GEOGRAPHICAL NAMES Twenty-eighth session New York, 28 April 2 May 2014 GEGN/28/9 English Resolutions from the Tenth United Nations Conference on the Standardization of

More information

Visualizing Energy Usage and Consumption of the World

Visualizing Energy Usage and Consumption of the World Visualizing Energy Usage and Consumption of the World William Liew Abstract As the world becomes more and more environmentally aware, a simple layman s visualization of worldwide energy use is needed to

More information

Toponym Disambiguation by Arborescent Relationships

Toponym Disambiguation by Arborescent Relationships Journal of Computer Science 6 (6): 653-659, 2010 ISSN 1549-3636 2010 Science Publications Toponym Disambiguation by Arborescent Relationships Imene Bensalem and Mohamed-Khireddine Kholladi Department of

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS

More information

Click Prediction and Preference Ranking of RSS Feeds

Click Prediction and Preference Ranking of RSS Feeds Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS

More information

Aspect Term Extraction with History Attention and Selective Transformation 1

Aspect Term Extraction with History Attention and Selective Transformation 1 Aspect Term Extraction with History Attention and Selective Transformation 1 Xin Li 1, Lidong Bing 2, Piji Li 1, Wai Lam 1, Zhimou Yang 3 Presenter: Lin Ma 2 1 The Chinese University of Hong Kong 2 Tencent

More information

Analysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie

Analysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie Library and Information Services in Astronomy III ASP Conference Series, Vol. 153, 1998 U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez (eds.) Analysis of Evolutionary Trends in Astronomical

More information

file://q:\report1\greenatlasfinalreportindex.html

file://q:\report1\greenatlasfinalreportindex.html Page 1 of 8 Quick Links WATER MANAGEMENT INTERNSHIP USDA HIS GRANT FUNDED FINAL PROJECT REPORT SUBMITTED BY MELISSA QUINTANA 11/07/07-03/24/08 Summary Provided is an assessment of my accomplishments for

More information

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview Parisa Kordjamshidi 1(B), Taher Rahgooy 1, Marie-Francine Moens 2, James Pustejovsky 3, Umar Manzoor 1, and Kirk Roberts 4 1 Tulane University,

More information

Exploring Class Discussions from a Massive Open Online Course (MOOC) on Cartography

Exploring Class Discussions from a Massive Open Online Course (MOOC) on Cartography Forthcoming in: Vondrakova, A., Brus, J., and Vozenilek, V. (Eds.) (2015) Modern Trends in Cartography, Selected Papers of CARTOCON 2014, Lecture Notes in Geoinformation and Cartography, Springer-Verlag.

More information

Visualizing Uncertainty: How to Use the Fuzzy Data of 550 Medieval Texts?

Visualizing Uncertainty: How to Use the Fuzzy Data of 550 Medieval Texts? Visualizing Uncertainty: How to Use the Fuzzy Data of 550 Medieval Texts? Stefan Jänicke, Institut für Informatik, Universität Leipzig David Joseph Wrisley, Department of English, American University of

More information

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Road & Railway Network Density Dataset at 1 km over the Belt and Road and Surround Region

Road & Railway Network Density Dataset at 1 km over the Belt and Road and Surround Region Journal of Global Change Data & Discovery. 2017, 1(4): 402-407 DOI:10.3974/geodp.2017.04.03 www.geodoi.ac.cn 2017 GCdataPR Global Change Research Data Publishing & Repository Road & Railway Network Density

More information

INDOT Office of Traffic Safety

INDOT Office of Traffic Safety Intro to GIS Spatial Analysis INDOT Office of Traffic Safety Intro to GIS Spatial Analysis INDOT Office of Traffic Safety Kevin Knoke Section 130 Program Manager Highway Engineer II Registered Professional

More information

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

Spatial Decision Tree: A Novel Approach to Land-Cover Classification Spatial Decision Tree: A Novel Approach to Land-Cover Classification Zhe Jiang 1, Shashi Shekhar 1, Xun Zhou 1, Joseph Knight 2, Jennifer Corcoran 2 1 Department of Computer Science & Engineering 2 Department

More information

2013 AND 2025 THE FUTURE OF GIS

2013 AND 2025 THE FUTURE OF GIS THE FUTURE OF GIS 2013 AND 2025 What is the state of geospatial computing today? What are the issues today? Unresolved problems What will geospatial computing be like in 2025? What issues will be of concern

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

DATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON

DATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON DATA SCIENCE SIMPLIFIED USING ARCGIS API FOR PYTHON LEAD CONSULTANT, INFOSYS LIMITED SEZ Survey No. 41 (pt) 50 (pt), Singapore Township PO, Ghatkesar Mandal, Hyderabad, Telengana 500088 Word Limit of the

More information

DP Project Development Pvt. Ltd.

DP Project Development Pvt. Ltd. Dear Sir/Madam, Greetings!!! Thanks for contacting DP Project Development for your training requirement. DP Project Development is leading professional training provider in GIS technologies and GIS application

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

UC Berkeley International Conference on GIScience Short Paper Proceedings

UC Berkeley International Conference on GIScience Short Paper Proceedings UC Berkeley International Conference on GIScience Title Deriving Locational Reference through Implicit Information Retrieval Permalink https://escholarship.org/uc/item/0tn5s4v9 Journal International Conference

More information

NR402 GIS Applications in Natural Resources

NR402 GIS Applications in Natural Resources NR402 GIS Applications in Natural Resources Lesson 1 Introduction to GIS Eva Strand, University of Idaho Map of the Pacific Northwest from http://www.or.blm.gov/gis/ Welcome to NR402 GIS Applications in

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

A Bayesian Model of Diachronic Meaning Change

A Bayesian Model of Diachronic Meaning Change A Bayesian Model of Diachronic Meaning Change Lea Frermann and Mirella Lapata Institute for Language, Cognition, and Computation School of Informatics The University of Edinburgh lea@frermann.de www.frermann.de

More information

A Web-based Geo-resolution Annotation and Evaluation Tool

A Web-based Geo-resolution Annotation and Evaluation Tool A Web-based Geo-resolution Annotation and Evaluation Tool Beatrice Alex, Kate Byrne, Claire Grover and Richard Tobin School of Informatics University of Edinburgh {balex,kbyrne3,grover,richard}@inf.ed.ac.uk

More information

A Prototype of a Web Mapping System Architecture for the Arctic Region

A Prototype of a Web Mapping System Architecture for the Arctic Region A Prototype of a Web Mapping System Architecture for the Arctic Region Han-Fang Tsai 1, Chih-Yuan Huang 2, and Steve Liang 3 GeoSensorWeb Laboratory, Department of Geomatics Engineering, University of

More information

Deep Learning for NLP Part 2

Deep Learning for NLP Part 2 Deep Learning for NLP Part 2 CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) 2 Part 1.3: The Basics Word Representations The

More information

CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss

CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss Jeffrey Flanigan Chris Dyer Noah A. Smith Jaime Carbonell School of Computer Science, Carnegie Mellon University, Pittsburgh,

More information

A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs

A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs Pascal Kelm Communication Systems Group Technische Universität Berlin Germany kelm@nue.tu-berlin.de

More information

DISTRIBUTIONAL SEMANTICS

DISTRIBUTIONAL SEMANTICS COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.

More information

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima. http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT

More information

FAO GAEZ Data Portal

FAO GAEZ Data Portal FAO GAEZ Data Portal www.fao.org/nr/gaez Renato Cumani Environment Officer Land and Water Division Natural Resources Management and Environment Department Food and Agriculture Organization of the UN October

More information

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation Denis Turdakov and Dmitry Lizorkin Institute for System Programming of the Russian Academy of Sciences, 25 Solzhenitsina

More information

Developing Geo-temporal Context from Implicit Sources with Geovisual Analytics

Developing Geo-temporal Context from Implicit Sources with Geovisual Analytics Developing Geo-temporal Context from Implicit Sources with Geovisual Analytics Brian Tomaszewski 1 1 The Pennsylvania State University, Department of Geography, 302 Walker Building University Park, PA,

More information

WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS

WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS Claus Rinner University of Muenster, Germany Piotr Jankowski San Diego State University, USA Keywords: geographic information

More information

Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore

Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore Activityengage MAPPING W O RL D HERITAGE Where are sites of significant

More information

TEMPLATE FOR CMaP PROJECT

TEMPLATE FOR CMaP PROJECT TEMPLATE FOR CMaP PROJECT Project Title: Native Utah Plants Created by: Anna Davis Class: Box Elder 2008 Project Description Community Issue or Problem Selected -How project evolved? Community Partner(s)

More information

Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)

Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin) Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin) Mitra Shahabi 1 Department of Language and Culture University of Aveiro Aveiro, 3800-356, Portugal mitra.shahabi@ua.pt Abstract An attempt

More information

geographic patterns and processes are captured and represented using computer technologies

geographic patterns and processes are captured and represented using computer technologies Proposed Certificate in Geographic Information Science Department of Geographical and Sustainability Sciences Submitted: November 9, 2016 Geographic information systems (GIS) capture the complex spatial

More information

Enhancing the Curation of Botanical Data Using Text Analysis Tools

Enhancing the Curation of Botanical Data Using Text Analysis Tools Enhancing the Curation of Botanical Data Using Text Analysis Tools Clare Llewellyn 1,ClareGrover 1, Jon Oberlander 1,andElspethHaston 2 1 University of Edinburgh, Edinburgh, United Kingdom C.A.Llewellyn@sms.ed.ac.uk,

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

Spatio-Textual Indexing for Geographical Search on the Web

Spatio-Textual Indexing for Geographical Search on the Web Spatio-Textual Indexing for Geographical Search on the Web Subodh Vaid 1, Christopher B. Jones 1, Hideo Joho 2 and Mark Sanderson 2 1 School of Computer Science, Cardiff University, UK email: {c.b.jones,

More information

Discovery and Access of Geospatial Resources using the Geoportal Extension. Marten Hogeweg Geoportal Extension Product Manager

Discovery and Access of Geospatial Resources using the Geoportal Extension. Marten Hogeweg Geoportal Extension Product Manager Discovery and Access of Geospatial Resources using the Geoportal Extension Marten Hogeweg Geoportal Extension Product Manager DISCOVERY AND ACCESS USING THE GEOPORTAL EXTENSION Geospatial Data Is Very

More information

A Hidden Markov Model for Alphabet-Soup Word Recognition

A Hidden Markov Model for Alphabet-Soup Word Recognition A Hidden Markov Model for Alphabet-Soup Word Recognition Shaolei Feng Nicholas R. Howe R. Manmatha Dept. of Computer Science University of Massachusetts Amherst, MA-01003 slfeng@cs.umass.edu Dept. of Computer

More information

Crime Analyst Extension. Christine Charles

Crime Analyst Extension. Christine Charles Crime Analyst Extension Christine Charles ccharles@esricanada.com Agenda Why use Crime Analyst? Overview Tools Demo Interoperability With our old software it could take a police officer up to forty minutes

More information

Introduction to Spatial Big Data Analytics. Zhe Jiang Office: SEC 3435

Introduction to Spatial Big Data Analytics. Zhe Jiang Office: SEC 3435 Introduction to Spatial Big Data Analytics Zhe Jiang zjiang@cs.ua.edu Office: SEC 3435 1 What is Big Data? Examples Internet data (images from the web) Earth observation data (nasa.gov) wikimedia.org www.me.mtu.edu

More information

MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING

MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING MACHINE LEARNING FOR NATURAL LANGUAGE PROCESSING Outline Some Sample NLP Task [Noah Smith] Structured Prediction For NLP Structured Prediction Methods Conditional Random Fields Structured Perceptron Discussion

More information