CIKM '18, October 22-26, 2018, Torino, Italy


1.1 Entity Typing by Structured Features

Existing systems [16, 19, 27] cast entity typing as a single-instance multi-label classification problem. For each entity e, they represent the entity with a set of features f_e extracted from KBs, and then infer the types of entity e from its feature set f_e. To keep the solutions general, they often use the structured features of a KB (such as attributes, attribute-value pairs and tags) instead of features extracted from free text, such as part-of-speech tags or dependency parsing results [16]. As a result, their effectiveness depends largely on the quality of the structured features. Unfortunately, the structured features are far from perfect. Firstly, many entities have no or only limited structured features, which makes type inference from structured features impossible. Current KBs (such as DBpedia [10], Yago [22] and Freebase [2]) are mainly crafted from online encyclopedia websites such as Wikipedia. According to Catriple [12], only 44.2% of articles in Wikipedia contain infobox information. Also, in Baidu Baike 2, the largest Chinese online encyclopedia website, almost 32% (nearly 3 million) of entities lack infobox and category information altogether [27]. Secondly, entities may belong to multiple types, but their structured features often capture only the most typical one, which prevents the inference of a complete set of types. For example, from the structured features in the infobox on the right-hand side of Figure 1, which is a snippet of Donald Trump's Wikipedia page, we can only infer that Donald Trump belongs to the type President. We have no way to infer other types such as businessman or writer.

Figure 1: Information about Donald Trump. The left-hand and right-hand sides are the abstract and infobox information in Wikipedia, respectively, and the upper side is the existing type information in KBs (Types: Person, Politician, President, Businessman, Writer, ...).

1.2 Entity Typing from Corpus

In this paper, we argue that structured features are not enough for entity typing, especially for the complete inference of specific types. We notice that an entity in a knowledge base has both structured and unstructured features (such as description texts). Moreover, in most cases, description texts contain a richer set of information than structured features. For example, from the left-hand side of Figure 1, which is Donald Trump's abstract text, we can infer from the sentence "Trump took over running his family's real estate business in 1971" that Donald Trump also belongs to the type businessman. We can also infer from the sentence "He co-authored several books" that he could also be associated with the type writer.

In this paper, we focus on using the text of an entity to infer its types. The advantages of this approach are twofold. Firstly, it is orthogonal to structured-feature-based typing solutions and can be used to complement them. When an entity has sparse or less specific structured features, a corpus-based solution is the only real alternative. Even for entities with rich structured features, the results of our solution can help complete and enrich the types found from structured features. Secondly, the features from the corpus, or results derived from the corpus, can be used in an aggregation model (such as multi-view learning [23]) to achieve a better result. By integrating the information in both the structured and unstructured parts of the KB, we are more likely to achieve a better result for entity typing.

1.2.1 Naïve Solutions. Many solutions are available for mention typing in sentences; we therefore first review a Naïve solution built on them, which unfortunately still has weaknesses, and then propose our own solution. There is a significant difference between our problem setting and mention typing: an entity in a KB usually has multiple mentions in texts, while mention typing usually focuses on typing a single mention. A very conservative estimate from in-text links on Wikipedia pages in the latest version of DBpedia 3 is that more than 5 million entities have at least 2 mentions in the Wikipedia corpus. Therefore, to reuse existing mention typing solutions, we still need an appropriate strategy to aggregate the results derived from different mentions. The Naïve solution is simply the union of the types found for each mention. We illustrate this procedure in Example 1.1 and in Figure 2: the mentions "Donald Trump" in S1 ("Donald Trump is the 45th and current President of the United States"), "Trump" in S2 ("Trump started his career at his father's real estate development company") and "Trump" in S3 ("Trump Tower is a 58-story, 664-foot-high (202 m) mixed-use skyscraper") are typed as {person, politician, president}, {person, businessman, actor} and {place, building} respectively, so the Naïve method returns the union {person, politician, president, businessman, actor, place, building} as the types of the KB entity Donald Trump.

Figure 2: A Naïve method. Red labels within braces are errors.

However, the Naïve method still has limited performance for the following reasons. Firstly, mention typing is not a trivial problem, and the current mention typing solutions still have limited performance. There are currently two categories of mention typing solutions. The first relies on hand-crafted features [11, 19, 30], which requires a great deal of human effort; examples of such features include Head, POS and Word features.

2 http://baike.baidu.com/
3 http://downloads.dbpedia.org/2016-10/core-i18n/en/nif_text_links_en.ttl.bz2
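To make the Naïve aggregation step concrete, here is a minimal Python sketch; `type_mentions` is a hypothetical stand-in for any off-the-shelf mention typing model, and the toy lookup at the bottom simply mirrors the S1-S3 example above.

```python
from typing import Callable, List, Set, Tuple

# A mention is a (surface form, sentence id/context) pair, e.g. ("Trump", "S2").
Mention = Tuple[str, str]

def naive_entity_types(mentions: List[Mention],
                       type_mentions: Callable[[Mention], Set[str]]) -> Set[str]:
    """Naïve aggregation: type each mention independently and return the union.

    Errors made on a single mention (e.g. {place, building} for the S3 mention
    of "Trump") propagate directly into the entity's type set.
    """
    entity_types: Set[str] = set()
    for mention in mentions:
        entity_types |= type_mentions(mention)
    return entity_types

# Toy mention typer reproducing the Figure 2 example (hand-coded for illustration only).
def toy_typer(mention: Mention) -> Set[str]:
    lookup = {"S1": {"person", "politician", "president"},
              "S2": {"person", "businessman", "actor"},
              "S3": {"place", "building"}}
    return lookup[mention[1]]

print(naive_entity_types([("Donald Trump", "S1"), ("Trump", "S2"), ("Trump", "S3")], toy_typer))
# -> {person, politician, president, businessman, actor, place, building}
```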

Problem statement (notation). E denotes the set of entities and T = {t_1, ..., t_N} the set of candidate types, with N = |T|. Each entity e ∈ E has a description text D_e, from which we extract its mentions M_e = {m_1, m_2, ..., m_k}, each paired with its sentence context as <m_i, c_i>. Mention typing maps a mention m_k to a distribution over the types,

P_{m_k} = \{P(t_1 \mid m_k), P(t_2 \mid m_k), \ldots, P(t_N \mid m_k)\}

where P(t_i | m_k) is the probability that mention m_k has type t_i ∈ T. The goal is to infer the type set T_e of entity e from {P_{m_1}, P_{m_2}, ..., P_{m_k}}.

[Framework figure: in the offline stage, the text corpus and the existing entity types in the KB are used for training data construction and constraints generation; in the online stage, entity mention discovery extracts mention-context pairs (m_1, c_1), (m_2, c_2), ..., (m_k, c_k) from the entity's description D_e, mention typing produces the distributions P_{m_1}, P_{m_2}, ..., P_{m_k}, and type fusion combines them into the entity's type set T_e.]
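A minimal sketch of the online stage under the assumptions above; `discover_mentions`, `type_mention` and `fuse_types` are hypothetical placeholders for the three components (entity mention discovery, mention typing, type fusion), not functions from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

MentionContext = Tuple[str, str]      # (m_i, c_i): mention surface form and its sentence context
TypeDistribution = Dict[str, float]   # P_{m_i}: P(t | m_i) for every candidate type t

def online_entity_typing(
    description_text: str,                                        # D_e
    discover_mentions: Callable[[str], List[MentionContext]],     # entity mention discovery
    type_mention: Callable[[MentionContext], TypeDistribution],   # mention typing model
    fuse_types: Callable[[List[TypeDistribution]], Set[str]],     # type fusion
) -> Set[str]:
    """Online stage: D_e -> (m_1, c_1)...(m_k, c_k) -> P_{m_1}...P_{m_k} -> T_e."""
    mention_contexts = discover_mentions(description_text)
    distributions = [type_mention(mc) for mc in mention_contexts]
    return fuse_types(distributions)
```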

Each mention m is represented by three word sequences: its left context [c_S, ..., c_1], the mention words [m_1, ..., m_N] and its right context [c_1, ..., c_S], where c_i is the i-th context word, m_i is the i-th mention word, S is the context window size and N is the maximum mention length (we use S = 10 and N = 5). Each of the three sequences is encoded with an LSTM:

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)
j_t = \tanh(W_{xj} x_t + W_{hj} h_{t-1} + b_j)
c_t = c_{t-1} \odot f_t + i_t \odot j_t
h_t = \tanh(c_t) \odot o_t

where \sigma is the sigmoid function; i, f, o and c denote the input gate, forget gate, output gate and cell state; j is the candidate cell input; and the W and b terms are the weight matrices and bias vectors. The final hidden states h_1, h_2 and h_3 of the three LSTMs are concatenated and fed to a dense output layer \sigma(W_o [h_1, h_2, h_3] + b_o), where W_o and b_o are the parameters of the dense layer and h_i is the representation of the i-th part. The network is trained with the multi-label cross-entropy loss

Loss = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \right]

where N is the number of training mentions, M is the number of types, y_{ij} indicates whether mention i has type j, and \hat{y}_{ij} is the predicted probability that mention i has type j.
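As a concrete illustration, here is a small tf.keras sketch of a network with this shape (three LSTM encoders over left context, mention and right context, a concatenation, and a sigmoid dense layer trained with the multi-label cross-entropy above); the vocabulary, embedding, hidden and type-set sizes are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

S, N = 10, 5                          # context window size and max mention length, as in the text
VOCAB, EMB, HID = 50_000, 100, 128    # illustrative sizes (assumptions)
NUM_TYPES = 500                       # number of candidate types (assumption)

def build_mention_typer() -> Model:
    left = layers.Input(shape=(S,), name="left_context")    # [c_S ... c_1]
    ment = layers.Input(shape=(N,), name="mention")         # [m_1 ... m_N]
    right = layers.Input(shape=(S,), name="right_context")  # [c_1 ... c_S]

    emb = layers.Embedding(VOCAB, EMB, mask_zero=True)      # shared word embeddings
    h1 = layers.LSTM(HID)(emb(left))                         # final hidden state h_1
    h2 = layers.LSTM(HID)(emb(ment))                         # final hidden state h_2
    h3 = layers.LSTM(HID)(emb(right))                        # final hidden state h_3

    concat = layers.Concatenate()([h1, h2, h3])                  # [h_1, h_2, h_3]
    out = layers.Dense(NUM_TYPES, activation="sigmoid")(concat)  # sigma(W_o [h_1, h_2, h_3] + b_o)

    model = Model([left, ment, right], out)
    # Per-type binary cross-entropy matches the multi-label loss above.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```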

[Figure: architecture of the mention typing network. The left context, mention and right context are embedded and encoded, their representations are joined in a concatenation layer, and a dense layer produces the output.]

Type fusion is formulated as an integer linear program over binary indicator variables. For an entity e and a type t, x_{e,t} = 1 if e is assigned type t, and 0 otherwise. The objective rewards types whose mention-level scores exceed a threshold \theta:

\max \sum_{t \in T} \sum_{m \in M_e} \left( P(t \mid m) - \theta \right) x_{e,t}

where \sum_{m \in M_e} P(t \mid m) aggregates the evidence for type t over all mentions of e and \theta is the score threshold (we set \theta = 0.5). Two kinds of constraints between types are derived from the existing type assignments in the KB. First, mutual exclusion: we compute

PMI(t_1, t_2) = \frac{P(t_1, t_2)}{P(t_1) \, P(t_2)}

where P(t_1, t_2) is the probability that types t_1 and t_2 co-occur in an entity's type set and P(t_1), P(t_2) are their marginal probabilities; if this value falls below a threshold \alpha, the pair is marked mutually exclusive, denoted ME(t_1, t_2). Second, subsumption: IsA(t_1, t_2) states that t_1 is a subtype of t_2. The resulting constraints are

\forall \, ME(t_1, t_2): \; x_{e,t_1} + x_{e,t_2} \le 1
\forall \, IsA(t_1, t_2): \; x_{e,t_1} - x_{e,t_2} \le 0
\forall \, t: \; x_{e,t} \in \{0, 1\}

i.e. mutually exclusive types cannot both be selected, and selecting a subtype forces its supertype. For example, for an entity e with mentions M_e = {m_1, m_2, m_3}, a type t is favoured by the objective only when its scores P(t | m_1), P(t | m_2) and P(t | m_3) exceed \theta on average.
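To make the fusion step concrete, here is a small sketch of this ILP using the PuLP solver; the mutual-exclusion pairs, subtype pairs and per-mention scores are illustrative inputs, and the default θ = 0.5 follows the text.

```python
from typing import Dict, List, Set, Tuple
import pulp

def fuse_types(
    distributions: List[Dict[str, float]],        # one dict P(t | m) per mention of the entity
    mutually_exclusive: Set[Tuple[str, str]],     # ME(t1, t2) pairs
    subtype_of: Set[Tuple[str, str]],             # IsA(t1, t2): t1 is a subtype of t2
    theta: float = 0.5,
) -> Set[str]:
    """Type fusion as an ILP: maximize sum_t sum_m (P(t|m) - theta) * x_t under type constraints."""
    types = sorted({t for dist in distributions for t in dist})
    prob = pulp.LpProblem("type_fusion", pulp.LpMaximize)
    x = {t: pulp.LpVariable(f"x_{t}", cat=pulp.LpBinary) for t in types}

    # Objective: reward types whose mention-level scores exceed the threshold theta.
    prob += pulp.lpSum((dist.get(t, 0.0) - theta) * x[t]
                       for dist in distributions for t in types)
    # Mutual exclusion: x_{t1} + x_{t2} <= 1.
    for t1, t2 in mutually_exclusive:
        if t1 in x and t2 in x:
            prob += x[t1] + x[t2] <= 1
    # Subsumption: selecting a subtype forces its supertype, x_{t1} - x_{t2} <= 0.
    for t1, t2 in subtype_of:
        if t1 in x and t2 in x:
            prob += x[t1] - x[t2] <= 0

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {t for t in types if x[t].value() == 1}
```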

Let T be the set of entities with ground-truth types, P the set of entities for which types are predicted, and t_e and p_e the ground-truth and predicted type sets of an entity e. We report three standard measures. Strict:

precision = \frac{\sum_{e \in P \cap T} \delta(t_e = p_e)}{|P|}, \quad recall = \frac{\sum_{e \in P \cap T} \delta(t_e = p_e)}{|T|}

Macro-averaged:

precision = \frac{1}{|P|} \sum_{e \in P} \frac{|p_e \cap t_e|}{|p_e|}, \quad recall = \frac{1}{|T|} \sum_{e \in T} \frac{|p_e \cap t_e|}{|t_e|}

Micro-averaged:

precision = \frac{\sum_{e \in P} |t_e \cap p_e|}{\sum_{e \in P} |p_e|}, \quad recall = \frac{\sum_{e \in T} |t_e \cap p_e|}{\sum_{e \in T} |t_e|}

In each setting,

f_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}
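A small helper computing the three measures as reconstructed above; `gold` and `pred` map entities to their ground-truth and predicted type sets (a sketch, not tied to any particular dataset format).

```python
from typing import Dict, Set, Tuple

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

def evaluate(gold: Dict[str, Set[str]],
             pred: Dict[str, Set[str]]) -> Dict[str, Tuple[float, float, float]]:
    """Strict, macro- and micro-averaged (precision, recall, f1) for entity typing.

    gold: entity -> ground-truth type set t_e (the set T of typed entities)
    pred: entity -> predicted type set p_e  (the set P of predicted entities)
    """
    both = set(gold) & set(pred)

    # Strict: an entity counts only if its predicted set equals the gold set exactly.
    exact = sum(1 for e in both if pred[e] == gold[e])
    strict_p, strict_r = exact / max(len(pred), 1), exact / max(len(gold), 1)

    # Macro: average per-entity precision / recall.
    macro_p = sum(len(pred[e] & gold[e]) / len(pred[e])
                  for e in both if pred[e]) / max(len(pred), 1)
    macro_r = sum(len(pred[e] & gold[e]) / len(gold[e])
                  for e in both if gold[e]) / max(len(gold), 1)

    # Micro: pool the type decisions over all entities.
    overlap = sum(len(pred[e] & gold[e]) for e in both)
    micro_p = overlap / max(sum(len(ts) for ts in pred.values()), 1)
    micro_r = overlap / max(sum(len(ts) for ts in gold.values()), 1)

    return {"strict": (strict_p, strict_r, f1(strict_p, strict_r)),
            "macro": (macro_p, macro_r, f1(macro_p, macro_r)),
            "micro": (micro_p, micro_r, f1(micro_p, micro_r))}
```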

In our experiments, the threshold θ is set to 0.5.
