
Department of Computer Science and Engineering
Indian Institute of Technology, Kanpur

CS 365 Artificial Intelligence Project Report

Spatial Role Labeling

Submitted by Satvik Gupta (12633) and Garvit Pahal (12264)
Under the guidance of Dr. Amitabha Mukerjee

Abstract

A key task in natural language processing today is describing the spatial relations between objects and their locations. As defined by the SemEval challenge, spatial role labelling involves the extraction of spatial roles and the relations between them. In this project, prepositions used in a spatial sense are identified, and the associated landmarks and trajectors are extracted. Additionally, all words are classified into the categories location, organization, person, date, time, or none. The Stanford Parser and Named Entity Recognizer, along with other learning techniques, are used to produce good results on the CLEF corpus.

Contents

1 Introduction
  1.1 Problem Definition
2 Motivation
3 Previous Work
4 Dataset
5 Methodology
  5.1 Part 1: Spatial Indicator Extraction
  5.2 Part 2: Trajector and Landmark Extraction
  5.3 Part 3: Named Entity Recognition
6 Results
  6.1 Spatial Role Labelling
  6.2 Named Entity Recognition
7 Limitations
8 Conclusion
9 Future Work
10 Software Used

1 Introduction

1.1 Problem Definition

Spatial role labelling is the task of automatically labelling words and phrases in natural language text with a set of spatial roles, such as trajector, landmark, spatial indicator, distance, and direction. In this task, we classify spatial prepositions as spatial indicators and then use them to extract the corresponding trajectors and landmarks. The SemEval 2012 Challenge defines the spatial roles as follows:

Trajector: a spatial role label assigned to a word or phrase that denotes the central object of a spatial scene, or an object that moves. Example: ... [the book] on the desk ...

Landmark: a spatial role label assigned to a word or phrase that denotes a secondary object of a spatial scene, with respect to which a spatial relation (as between two objects in space) can be established. Example: ... the book on [the desk] ...

Spatial Indicator: a spatial role label assigned to a word or phrase that signals a spatial relation between the objects (trajectors and landmarks) of a spatial scene. Example: ... the book [on] the desk ...

In general, a spatial indicator can be a preposition, verb, noun, adjective, and so on. In this work, we focus only on prepositions as spatial indicators.

Additionally, all the words in a sentence are classified into categories such as location, organization, person, date, time, or none using the Stanford Named Entity Recognizer. These labels may be used to improve spatial role labelling by restricting the search for trajectors and landmarks.

The task can thus be divided into three parts. The first part identifies the spatial indicators; this involves finding prepositions that are used in a spatial sense. In the second part, we extract the trajectors and landmarks for each preposition identified as a spatial indicator. The last part classifies all the words in a sentence using the Stanford NER. A sketch of the target output representation is given below.
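To make the target concrete, here is a minimal illustrative sketch in Python of the kind of triple the system produces for each spatial indicator. The SpatialRelation class and its field names are our own, not part of the SemEval specification:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialRelation:
    """One extracted spatial relation, anchored on a spatial indicator."""
    trajector: str            # central / moving object of the scene
    indicator: str            # the preposition signalling the relation
    landmark: Optional[str]   # secondary object; may be absent

# For "The book is on the desk." the desired output is:
relation = SpatialRelation(trajector="book", indicator="on", landmark="desk")
print(relation)
```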

2 Motivation

Spatial role labelling is a key task for many natural language processing applications that require information about the locations of objects referenced in text, or the relations between them in space. For example, the phrase "the book is on the desk" contains information about the location of the object book with respect to another object desk; this information may be used by a robot trying to pick up the book, or by software performing text-to-scene conversion. Applications of spatial role labelling include controlling a robot through audio or textual instructions, navigation, traffic management, text-to-scene conversion, and question answering systems.

3 Previous Work

SpatialML (Mani et al., 2008): This work focuses on geographical information rather than general spatial relations.

Pustejovsky and Moszkowicz, 2009: Here, each verb is classified as a spatial verb or a temporal verb, and both classes have many subclasses.

O. Stock, 1997: This work also involved both spatial and temporal reasoning.

All these previous methods are either very specific and application-dependent, or they try to extract both spatial and temporal information at the same time. Also, none of them focuses on prepositions, which are the most expressive in terms of the spatial information they can provide.

4 Dataset

The following datasets are used in this project:

SemEval 2012 dataset: The data we obtained from the SemEval 2012 challenge is a subset of the IAPR TC-12 Benchmark, which contains images taken by tourists with descriptions in different languages.

TPP dataset: This dataset is from a SemEval-2007 task and is used for preposition disambiguation.

Stanford CoreNLP: The tools used in this project include the Stanford Parser and the Stanford Named Entity Recognizer. These tools are trained on the CoNLL, MUC-7, and ACE datasets.

5 Methodology

The task is divided into three parts:

5.1 Part 1: Spatial Indicator Extraction

The steps involved in this part are:

Part of Speech Tagging: Each word in a sentence is tagged with its part of speech. Python's nltk library is used to carry out POS tagging.

Dependency Parsing: Based on the structure of the sentence and the POS tag attached to each word, a dependency graph is constructed using the Stanford Parser. For example (from http://nlp.stanford.edu/software/stanforddependencies.shtml), the sentence

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

has the dependencies:

nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)

A sketch of these two steps in Python is given below.
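A minimal sketch of how these two steps might be wired up, not the report's actual code: POS tagging uses nltk directly, and dependency parsing is approximated here through nltk's CoreNLPDependencyParser, which assumes a Stanford CoreNLP server is running locally on port 9000:

```python
import nltk
from nltk.parse.corenlp import CoreNLPDependencyParser

sentence = "Bills on ports and immigration were submitted by Senator Brownback."

# Step 1: POS tagging with nltk (requires the 'punkt' and
# 'averaged_perceptron_tagger' data packages to be downloaded first).
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)  # [('Bills', 'NNS'), ('on', 'IN'), ...]

# Step 2: dependency parsing. This assumes a CoreNLP server was started
# separately, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
parser = CoreNLPDependencyParser(url="http://localhost:9000")
parse, = parser.parse(tokens)
for governor, relation, dependent in parse.triples():
    # governor and dependent are (word, POS-tag) pairs
    print(relation, governor, dependent)
```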

Figure 1: Dependency graph

Preposition Disambiguation: In this step, the feature set of each preposition is extracted using the dependency graph. The entities on which the preposition depends are named HEAD 1, and the entities that depend on the preposition are named HEAD 2.

Spatial Sense Calculation: Here, the probability of a preposition being a spatial indicator is estimated. Based on the number of occurrences of the preposition in a spatial sense and in a non-spatial sense in the TPP dataset, the probability of the preposition being a spatial indicator is calculated.

Identification of Spatial Indicators: The feature set of the preposition (which includes the POS tags and the features extracted in the previous steps) is then used to classify whether the preposition is a spatial indicator or not, using a Naive Bayes classifier. A sketch of these last two steps is given below.
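The following is a minimal sketch of the spatial-sense prior and the Naive Bayes step using nltk. The feature names and the toy TPP counts are invented for illustration and are not the report's actual data:

```python
import nltk

# Hypothetical (spatial, non-spatial) sense counts from the TPP data.
tpp_counts = {"on": (120, 80), "under": (95, 5), "of": (10, 190)}

def spatial_sense_prob(prep):
    """P(spatial | preposition), estimated from TPP sense counts."""
    spatial, non_spatial = tpp_counts.get(prep, (1, 1))
    return spatial / (spatial + non_spatial)

def features(prep, head1, head2):
    """Feature dictionary for one preposition occurrence; head1/head2 are
    the (word, POS) pairs found via the dependency graph."""
    return {
        "prep": prep,
        "head1_pos": head1[1],
        "head2_pos": head2[1],
        "spatial_prior": round(spatial_sense_prob(prep), 1),  # bucketed
    }

# Toy training set of (feature dict, label) pairs; the real one is built
# from the SemEval-2012 annotations.
train = [
    (features("on", ("book", "NN"), ("desk", "NN")), True),
    (features("on", ("he", "PRP"), ("time", "NN")), False),
    (features("under", ("cat", "NN"), ("table", "NN")), True),
]
classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("on", ("clothes", "NNS"), ("wire", "NN"))))
```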

5.2 Part 2: Trajector and Landmark Extraction

For every spatial indicator in the sentence, we consider every pair of words as a candidate (trajector, landmark) pair and extract features for both; each feature set is tagged 1 if it is correctly labelled as a trajector or landmark, and 0 otherwise.

Trajector Extraction: The feature set used to identify the trajector in each sentence consists of the word, its root word, and its part-of-speech tag.

Landmark Extraction: The feature set used to identify the landmark in each sentence consists of the word, its part-of-speech tag, and whether the word lies to the left or to the right of the spatial indicator. A sketch of the candidate-pair generation follows.
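A minimal sketch of how candidate pairs and their feature sets could be generated; the function and feature names are our own, not the report's:

```python
def candidate_pairs(tokens, pos_tags, indicator_idx):
    """Enumerate every (trajector, landmark) candidate pair around a
    spatial indicator and build the feature sets described above."""
    candidates = []
    noun_idxs = [i for i, tag in enumerate(pos_tags)
                 if tag.startswith("NN") and i != indicator_idx]
    for ti in noun_idxs:
        for li in noun_idxs:
            if ti == li:
                continue
            traj_feats = {
                "word": tokens[ti],
                "pos": pos_tags[ti],
                # the root word would come from a lemmatizer in the real pipeline
            }
            land_feats = {
                "word": tokens[li],
                "pos": pos_tags[li],
                "left_of_indicator": li < indicator_idx,
            }
            candidates.append((traj_feats, land_feats))
    return candidates

tokens = ["The", "book", "is", "on", "the", "desk"]
tags = ["DT", "NN", "VBZ", "IN", "DT", "NN"]
for traj, land in candidate_pairs(tokens, tags, indicator_idx=3):
    print(traj["word"], "->", land["word"])
```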

5.3 Part 3: Named Entity Recognition

This part involves classifying all the words in a sentence into the following categories:

Location
Organization
Person
Date
Time
None of the above

This classification can later be used to refine and improve the extraction of trajectors and landmarks.

Stanford NER library and CRF

The Named Entity Recognition was carried out using the Stanford NER library, which uses Conditional Random Field (CRF) models for classification. Conditional random fields, as described in [1], are a class of statistical modelling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to neighbouring samples, a CRF can take context into account; it is an undirected graphical model, or Markov random field. Let $x_{1:N}$ be the observations (e.g., the words of a document) and $z_{1:N}$ the labels (e.g., tags). A linear-chain conditional random field then defines the conditional probability

$$p(z_{1:N} \mid x_{1:N}) = \frac{1}{Z} \exp\left( \sum_{n=1}^{N} \sum_{i=1}^{F} \lambda_i \, f_i(z_{n-1}, z_n, x_{1:N}) \right),$$

where $Z$ is the normalization factor that makes the probabilities sum to 1, the outer sum runs over the word positions $n = 1, \dots, N$, and the inner sum runs over the $F$ weighted features. The scalar $\lambda_i$ is the weight of feature $f_i(\cdot)$. Examples of feature functions are

$$f_i(z_{n-1}, z_n, x_{1:N}) = \begin{cases} 1 & \text{if } z_n = \text{PERSON and } x_n = \text{John}, \\ 0 & \text{otherwise}, \end{cases}$$

$$f_i(z_{n-1}, z_n, x_{1:N}) = \begin{cases} 1 & \text{if } z_n = \text{PERSON and } x_{n+1} = \text{said}, \\ 0 & \text{otherwise}. \end{cases}$$

The motivation for choosing a CRF over an HMM is that in an HMM, in order to predict $z_n$, we only know $z_{1:n-1}$ and $x_{1:n}$, whereas in a CRF we have knowledge of $z_{1:n-1}$ and the entire observation sequence $x_{1:N}$; the second feature above, for instance, uses $x_{n+1}$ to predict $z_n$. A CRF can also weight its features, so that in a sentence like "John said", the probability of John being labelled PERSON increases. A small numerical sketch of this probability computation is given below.
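The following toy sketch evaluates the equation above directly. The partition function $Z$ is computed by brute-force enumeration over all label sequences, which is fine at this size; real implementations such as the one inside Stanford NER use dynamic programming. The features and weights are invented for illustration:

```python
import itertools
import math

words = ["John", "said", "hello"]
labels = ["PERSON", "O"]

def f1(z_prev, z, x, n):
    # fires when z_n = PERSON and x_n = "John"
    return 1.0 if z == "PERSON" and x[n] == "John" else 0.0

def f2(z_prev, z, x, n):
    # fires when z_n = PERSON and the next word is "said"
    return 1.0 if z == "PERSON" and n + 1 < len(x) and x[n + 1] == "said" else 0.0

def f3(z_prev, z, x, n):
    # mild bias towards the O label, so PERSON needs evidence to win
    return 1.0 if z == "O" else 0.0

features = [f1, f2, f3]
weights = [1.5, 2.0, 0.5]   # the lambda_i of the equation above

def score(z_seq, x):
    """Unnormalized log-score: sum over positions n and features i of
    lambda_i * f_i(z_{n-1}, z_n, x)."""
    total = 0.0
    for n, z in enumerate(z_seq):
        z_prev = z_seq[n - 1] if n > 0 else None
        total += sum(w * f(z_prev, z, x, n)
                     for w, f in zip(weights, features))
    return total

# Partition function Z and the most probable labelling.
all_seqs = list(itertools.product(labels, repeat=len(words)))
Z = sum(math.exp(score(seq, words)) for seq in all_seqs)
best = max(all_seqs, key=lambda seq: score(seq, words))
print(best, math.exp(score(best, words)) / Z)
# -> ('PERSON', 'O', 'O') with its conditional probability
```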

6 Results

6.1 Spatial Role Labelling

Some example results:

The [cat] TRAJECTOR is [under] SPATIAL_INDICATOR the [table] LANDMARK.
The [pencil] TRAJECTOR_1 is [on] SPATIAL_INDICATOR_1 the [bed] LANDMARK_1 and the [bed] TRAJECTOR_2 is [under] SPATIAL_INDICATOR_2 the [roof] LANDMARK_2.
[Clothes] TRAJECTOR are hanging [on] SPATIAL_INDICATOR the [wire] LANDMARK.

Some examples with incorrect results:

[He] TRAJECTOR is [on] SPATIAL_INDICATOR [time] LANDMARK.
[Cake] TRAJECTOR is not [on] SPATIAL_INDICATOR the [list] LANDMARK.

The accuracy of the extraction of the spatial roles is as follows:

Spatial Role        Accuracy   Current Best
Spatial Indicator   83%        88%
Trajector           76%        84%
Landmark            79%        91%

6.2 Named Entity Recognition

Some example results:

[Google] ORGANIZATION [Inc.] ORGANIZATION is located in [California] LOCATION.
I went to [IIT] ORGANIZATION in the [morning] TIME.
Today is 22nd [January] DATE.

The performance statistics are available on the Stanford NER website [2].

7 Limitations

These are some of the limitations of the proposed method:

Only prepositions can be labelled as spatial indicators, which is not always the case; a verb or a noun can also carry spatial information.

Some words, like "time" or "letter", can never be landmarks; similarly, there are words that can never be trajectors. This information is not utilized, so in sentences like "Be on time", time is classified as a landmark.

Many prepositions are multi-word; these are not handled by our method.

While extracting trajectors and landmarks, context is not taken into account; only their relation to the spatial indicator is a factor. Techniques that take context into account (for example, conditional random field models) have performed better in these cases.

8 Conclusion

In this project, we have presented a method for spatial language understanding and named entity recognition. These techniques can be combined to further refine and improve the task of spatial role labelling. Much of the method's success stems from the fact that prepositions provide a substantial amount of spatial information; the prepositions, together with the words related to them, give us the spatial information we extract. Several ways of extending and improving the method have also been documented. Improvements in this area would greatly help systems requiring spatial information extraction, such as robotic systems.

9 Future Work

Some of the improvements that can be made to this project are as follows:

The feature set for spatial indicator classification can be improved by using word2vec instead of the current model, which only gives a static probability of a preposition being a spatial indicator based on the TPP dataset.

Currently, only prepositions are used as spatial indicators. The model can be extended to include other parts of speech as spatial indicators.

The named entity recognition output can be taken into account to improve the prediction of trajectors and landmarks.

Instead of just extracting the spatial roles (trajector, landmark, and spatial indicator), the relations between them, such as path, direction, distance, and motion, can also be extracted.

A method can be developed to incorporate multi-word prepositions, for example "in front of" and "at the back of", as spatial indicators.

The trajector and landmark extraction can be improved by incorporating a linear-chain conditional random field, as some of the recent work in this area suggests.

10 Software Used

Python nltk library
Stanford Parser
Stanford NER library
Code adapted from: https://code.google.com/p/pln-pmt-pract/
Datasets used: TPP, CoNLL, MUC-7, ACE, and the SemEval 2012 dataset

References

[1] P. Kordjamshidi et al., "Spatial Role Labeling: Task Definition and Annotation Scheme," in Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC), 2010. http://www.researchgate.net/publication/220745904_Spatial_Role_Labeling_Task_Definition_and_Annotation_Scheme

[2] P. Kordjamshidi et al., "Learning to Interpret Spatial Natural Language in terms of Qualitative Spatial Relations," in Representing Space in Cognition: Interrelations of Behavior, Language, and Formal Models, series Explorations in Language and Space, Oxford University Press, 2013, pp. 115-146. https://lirias.kuleuven.be/bitstream/123456789/351891/1/KordjamshidiHoisOtterloMoens.pdf

[3] SemEval-2013 Task 3: Spatial Role Labeling. http://www.cs.york.ac.uk/semeval-2013/task3/

[4] P. Kordjamshidi et al., "Spatial Role Labeling: Towards Extraction of Spatial Relations from Natural Language," ACM Transactions on Speech and Language Processing, 8(3): article 4, 36 pages, 2011. https://lirias.kuleuven.be/bitstream/123456789/321540/1/Kordjamshidietal-ACM-TSLP-2011.pdf

[5] Stanford Named Entity Recognizer. http://nlp.stanford.edu/software/crf-ner.shtml