CORE: Context-Aware Open Relation Extraction with Factorization Machines. Fabio Petroni

Similar documents
CORE: Context-Aware Open Relation Extraction with Factorization Machines

A Convolutional Neural Network-based

Embedding-Based Techniques MATRICES, TENSORS, AND NEURAL NETWORKS

Information Extraction from Text

Web-Mining Agents. Multi-Relational Latent Semantic Analysis. Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Topic Models and Applications to Short Documents

Decoupled Collaborative Ranking

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Topics in Natural Language Processing

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

ANLP Lecture 22 Lexical Semantics with Dense Vectors

Fast Logistic Regression for Text Categorization with Variable-Length N-grams

Click Prediction and Preference Ranking of RSS Feeds

1 [15 points] Frequent Itemsets Generation With Map-Reduce

Generative Models for Discrete Data

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing

The Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation

Unsupervised Rank Aggregation with Distance-Based Models

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Distributed ML for DOSNs: giving power back to users

PROBABILISTIC KNOWLEDGE GRAPH CONSTRUCTION: COMPOSITIONAL AND INCREMENTAL APPROACHES. Dongwoo Kim with Lexing Xie and Cheng Soon Ong CIKM 2016

CME323 Distributed Algorithms and Optimization. GloVe on Spark. Alex Adamson SUNet ID: aadamson. June 6, 2016

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

From Sentiment Analysis to Preference Aggregation

Dimension Reduction (PCA, ICA, CCA, FLD,

Natural Language Processing

Joint Emotion Analysis via Multi-task Gaussian Processes

7.1 Sampling Error The Need for Sampling Distributions

Machine Learning for natural language processing

Data Mining and Matrices

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

What s Cooking? Predicting Cuisines from Recipe Ingredients

Learning Features from Co-occurrences: A Theoretical Analysis

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer

Pattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes

Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Linear models: the perceptron and closest centroid algorithms. D = {(x_i, y_i)}_{i=1}^n, x_i ∈ R^d. 9/3/13. Preliminaries. Chapter 1, 7.

Bayesian Linear Regression [DRAFT - In Progress]

Word Embeddings 2 - Class Discussions

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

Brief Introduction of Machine Learning Techniques for Content Analysis

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017

Prediction of Citations for Academic Papers

Basics of Multivariate Modelling and Data Analysis

TE4-2 30th Fuzzy System Symposium (Kochi, September 1-3, 2014). Advertising Slogan Selection System Using Review Comments on the Web. 2 Masafumi Hagiwara

Using Both Latent and Supervised Shared Topics for Multitask Learning

Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io

Probabilistic Latent Semantic Analysis

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward

Content-based Recommendation

Multimedia analysis and retrieval

Sparse Stochastic Inference for Latent Dirichlet Allocation

CS60021: Scalable Data Mining. Large Scale Machine Learning

Laconic: Label Consistency for Image Categorization

Interpretable Latent Variable Models

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics

The Perceptron. Volker Tresp Summer 2016

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1

CS 6375 Machine Learning

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

A Three-Way Model for Collective Learning on Multi-Relational Data

A Discriminative Model for Semantics-to-String Translation

CLRG Biocreative V

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing

Latent Variable Models in NLP

What s so Hard about Natural Language Understanding?

Multi-theme Sentiment Analysis using Quantified Contextual

Predicting New Search-Query Cluster Volume

CIKM 18, October 22-26, 2018, Torino, Italy

Tuning as Linear Regression

Logistic Regression & Neural Networks

Multi-Task Structured Prediction for Entity Analysis: Search Based Learning Algorithms

Introduction to Course

Text Mining for Economics and Finance Unsupervised Learning

Expectation Maximization, and Learning from Partly Unobserved Data (part 2)

Named Entity Recognition using Maximum Entropy Model SEEM5680

International Research Journal of Engineering and Technology (IRJET) e-ISSN: Volume: 03 Issue: 11 Nov p-ISSN:

Language as a Stochastic Process

Natural Language Processing

A Probabilistic Model for Canonicalizing Named Entity Mentions. Dani Yogatama Yanchuan Sim Noah A. Smith

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Applied Natural Language Processing

Machine Learning Basics

Text Mining. March 3, March 3, / 49

Introduction to Machine Learning

IBPS RRB Office Assistant Prelims A. 1 B. 2 C. 3. D. no one A. R B. S C.Q D. U

Lab 12: Structured Prediction

Hybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media

arXiv:v2 [cs.CL] 1 Jan 2019

Clustering with k-means and Gaussian mixture distributions

Transcription:

CORE: Context-Aware Open Relation Extraction with Factorization Machines. Fabio Petroni, Luciano Del Corro, Rainer Gemulla. EMNLP 2015, September 17-21, 2015, Lisbon, Portugal.

Open relation extraction. Open relation extraction is the task of extracting new facts for a potentially unbounded set of relations from various sources: natural language text and knowledge bases.

Input data: facts from natural language text. An open information extractor extracts all facts in the text. For example, from the sentence "Enrico Fermi was a professor in theoretical physics at Sapienza University of Rome." it extracts the surface fact "professor at"(Fermi, Sapienza), where "professor at" is the surface relation and (Fermi, Sapienza) is the tuple of subject and object.

Input data: facts from knowledge bases. A KB fact such as employee(Fermi, Sapienza), with KB relation employee, is connected to surface facts such as "professor at"(Fermi, Sapienza) by linking the entities mentioned in the natural language text, e.g., with a string-match heuristic.

Taxonomy of relation extraction techniques:
- Closed relation extraction (in-KB): distant supervision, which targets a set of predefined relations.
- Open relation extraction:
  - relation clustering (out-of-KB): a "black and white" approach
  - tensor completion with latent factor models (in-KB): RESCAL (Nickel et al., 2011), PTF (Drumond et al., 2012); limited scalability with the number of relations, large prediction space
  - matrix completion with latent factor models (in-KB and out-of-KB): NFE (Riedel et al., 2013), CORE (Petroni et al., 2015); restricted prediction space

Matrix completion for open relation extraction. The input data form a tuples × relations matrix (1 = observed fact); born in, professor at, and mayor of are surface relations, employee is a KB relation:

                    born in   professor at   mayor of   employee
(Caesar, Rome)         1
(Fermi, Rome)          1
(Fermi, Sapienza)                   1                       1
(de Blasio, NY)                                   1

Matrix completion for open relation extraction. The unobserved entries (?) form the prediction space:

                    born in   professor at   mayor of   employee
(Caesar, Rome)         1           ?             ?          ?
(Fermi, Rome)          1           ?             ?          ?
(Fermi, Sapienza)      ?           1             ?          1
(de Blasio, NY)        ?           ?             1          ?

Matrix factorization. We learn latent semantic representations of tuples and relations: each tuple and each relation gets a latent factor vector, and the score of a fact is the dot product of the two. These latent representations are then leveraged to predict new facts. For example, (Fermi, Sapienza) and "professor at" might both load on a dimension "related with science" and against one "related with sport", e.g., vectors (0.8, -0.5) and (0.9, -0.3); in real applications, however, latent factors are uninterpretable.
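As a minimal sketch (not the authors' code), the scoring idea is just a dot product between latent vectors; the names and values below are illustrative only, taken from the toy example above:

```python
import numpy as np

# Toy 2-dimensional latent factors (illustrative; real factors are
# learned and, as noted above, usually uninterpretable).
tuple_factors = {("Fermi", "Sapienza"): np.array([0.8, -0.5])}
relation_factors = {"professor at": np.array([0.9, -0.3])}

# The score of a fact is the dot product of the relation's and the
# tuple's latent factor vectors; a higher score means more likely true.
score = tuple_factors[("Fermi", "Sapienza")].dot(
    relation_factors["professor at"])
print(score)  # 0.8*0.9 + (-0.5)*(-0.3) = 0.87
```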

Matrix factorization. CORE integrates contextual information into such models to improve prediction performance.

Contextual information. The surface relation in "join"(Peloso, Modest Mouse), extracted from "Tom Peloso joined Modest Mouse to record their fifth studio album.", is unspecific. Contextual information characterizes the fact: a named entity recognizer labels each entity with a coarse-grained type (person, organization), and article topic words (record, album) describe the setting.

Contextual information. For the same example, the open question is: how can contextual information be incorporated within the model?

CORE - latent representations of variables. CORE associates a latent factor vector f_v with each variable v ∈ V: tuples, e.g., (Peloso, Modest Mouse); relations, e.g., join; entities, e.g., Peloso and Modest Mouse; and context, e.g., the entity types person and organization and the words record and album.

CORE - modeling facts. CORE models the input data as a matrix in which each row corresponds to a fact x and each column to a variable v. Columns are grouped according to the type of the variables (surface and KB relations, tuples, entities, and the context groups tuple types and tuple topics); in each row, the values of each column group sum up to unity. For the running example:

Fact                                  relations         tuples             entities                   tuple types             tuple topics
x1 = "born in"(Caesar, Rome)          born in: 1        (Caesar,Rome): 1   Caesar: 0.5, Rome: 0.5     person/location: 1      history: 1
x2 = "professor at"(Fermi, Sapienza)  professor at: 1   (Fermi,Sap.): 1    Fermi: 0.5, Sapienza: 0.5  person/organization: 1  physics: 1
x3 = "born in"(Fermi, Rome)           born in: 1        (Fermi,Rome): 1    Fermi: 0.5, Rome: 0.5      person/location: 1      physics: 0.6, history: 0.4
x4 = employee(Fermi, Sapienza)        employee: 1       (Fermi,Sap.): 1    Fermi: 0.5, Sapienza: 0.5  person/organization: 1  physics: 1
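A small sketch of how one such row could be assembled, assuming the grouping above; make_row and its arguments are hypothetical helpers, not the authors' implementation:

```python
# Build a sparse row x: within each variable group the weights sum to one.
def make_row(relation, tup, type_pair, topic_counts):
    x = {("rel", relation): 1.0,        # one active relation
         ("tuple", tup): 1.0,           # one active tuple
         ("types", type_pair): 1.0}     # tuple type pair, e.g. person/location
    for entity in tup:                  # entities share unit weight
        x[("ent", entity)] = 1.0 / len(tup)
    total = float(sum(topic_counts.values()))
    for topic, count in topic_counts.items():
        x[("topic", topic)] = count / total   # normalized topic weights
    return x

# Row x3 of the example: "born in"(Fermi, Rome) with topics 0.6 / 0.4.
x3 = make_row("born in", ("Fermi", "Rome"), ("person", "location"),
              {"physics": 3, "history": 2})
```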

CORE - modeling context (same matrix as above). CORE aggregates and normalizes contextual information by tuple, because:
- a fact can be observed multiple times with different context
- there is no context for new facts (never observed in the input)
This approach allows us to provide comprehensive contextual information for both observed and unobserved facts.

CORE - factorization model (same matrix as above). CORE uses factorization machines as the underlying framework and associates a score s(x) with each fact x:

s(x) = \sum_{v_1 \in V} \sum_{v_2 \in V \setminus \{v_1\}} x_{v_1} x_{v_2} f_{v_1}^T f_{v_2}

i.e., a weighted sum of pairwise interactions of latent factor vectors.
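In code, the score could look as follows (a sketch; x is a sparse row mapping variables to weights and f maps variables to their latent factor vectors). The double sum above ranges over ordered pairs of distinct variables, which admits the usual linear-time factorization-machine shortcut:

```python
import numpy as np

def s(x, f):
    # s(x) = sum_{v1 in V} sum_{v2 in V \ {v1}} x_v1 x_v2 f_v1^T f_v2
    active = [v for v, w in x.items() if w != 0.0]
    return sum(x[v1] * x[v2] * f[v1].dot(f[v2])
               for v1 in active for v2 in active if v1 != v2)

def s_fast(x, f):
    # Equivalent O(|x|*k) form: ||sum_v x_v f_v||^2 - sum_v x_v^2 ||f_v||^2
    total = sum(w * f[v] for v, w in x.items())
    return total.dot(total) - sum(w * w * f[v].dot(f[v])
                                  for v, w in x.items())
```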

CORE - prediction. Goal: produce a ranked list of tuples for each relation, where rank reflects the likelihood that the corresponding fact is true. To generate this ranked list (see the sketch below):
- fix a relation r
- retrieve all tuples t such that the fact r(t) is not observed
- add tuple context
- rank the unobserved facts by their scores
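A sketch of this loop, assuming a hypothetical score(r, t) that assembles the feature row for r(t) together with its tuple context and applies s:

```python
def rank_tuples(r, all_tuples, observed, score, k=100):
    # Keep only tuples t for which the fact r(t) is not observed,
    # then rank these candidate facts by their scores.
    candidates = [t for t in all_tuples if (r, t) not in observed]
    return sorted(candidates, key=lambda t: score(r, t), reverse=True)[:k]
```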

CORE - parameter estimation. Parameters: Θ = {b_v, f_v | v ∈ V}. All our observations are positive; there is no negative training data. CORE therefore uses Bayesian personalized ranking under an open-world assumption: each observed fact x, e.g., "professor at"(Fermi, Sapienza) with tuple context physics, person, organization, location, is paired with sampled negative evidence x⁻, e.g., "professor at"(Caesar, Rome) with tuple context history, person, location. This pairwise approach asserts that x is more likely to be true than x⁻ and maximizes

\sum_{(x, x^-)} \ln \sigma(s(x) - s(x^-)) + r(\Theta)

via stochastic gradient ascent, where σ is the logistic function and r(Θ) is a regularization term.
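One plausible (simplified) stochastic gradient ascent step on this objective, assuming a hypothetical grad_s(x) that returns the gradient of s(x) with respect to each active parameter; it uses the identity d/dd ln σ(d) = 1 − σ(d):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(x_pos, x_neg, s, grad_s, theta, lr=0.05, reg=0.01):
    # Update weight: 1 - sigma(s(x) - s(x-)), the derivative of the
    # log-sigmoid objective with respect to the score difference.
    delta = 1.0 - sigmoid(s(x_pos) - s(x_neg))
    for v, g in grad_s(x_pos).items():     # push the observed fact up
        theta[v] += lr * (delta * g - reg * theta[v])
    for v, g in grad_s(x_neg).items():     # push the sampled negative down
        theta[v] += lr * (-delta * g - reg * theta[v])
```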

Experiments - dataset. 440k surface facts extracted from the New York Times corpus and 15k facts from Freebase, with entity mentions linked using string matching. Contextual information (the letters m, t, w indicate which contextual information is considered):
- m, article metadata: news desk (e.g., foreign desk), descriptors (e.g., finances), online section (e.g., sports), section (e.g., a, d), publication year
- t, entity types: person, organization, location, miscellaneous
- w, bag-of-words: the words of the sentences where the fact has been extracted

Experiments - methodology. To keep experiments feasible, we consider a subsample of 10k tuples, 19 Freebase relations, and 10 surface relations. For each relation and method:
- we rank the tuple subsample
- we consider the top-100 predictions and label them manually
Evaluation metrics: number of true facts and MAP (quality of the ranking). Methods:
- PTF, a tensor factorization method
- NFE, a matrix completion method (context-agnostic)
- CORE, which uses relations, tuples, and entities as variables
- CORE+m, +t, +w, +mt, +mtw, which add the corresponding contextual information
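For reference, one common way to compute the average precision of a labeled top-k list (a sketch; the paper may differ in details such as the normalization constant):

```python
def average_precision(labels):
    # labels: 0/1 relevance flags of the ranked predictions, best first.
    hits, total = 0, 0.0
    for rank, correct in enumerate(labels, start=1):
        if correct:
            hits += 1
            total += hits / rank        # precision at this cut-off
    return total / hits if hits else 0.0

print(average_precision([1, 0, 1, 1]))  # (1/1 + 2/3 + 3/4) / 3 = 0.806
```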

Results - Freebase relations. Each cell gives the number of true facts among the top-100 predictions, with MAP@100 in parentheses.

Relation                  |   # | PTF       | NFE       | CORE      | CORE+m    | CORE+t    | CORE+w    | CORE+mt   | CORE+mtw
person/company            | 208 | 70 (0.47) | 92 (0.81) | 91 (0.83) | 90 (0.84) | 91 (0.87) | 92 (0.87) | 95 (0.93) | 96 (0.94)
person/place of birth     | 117 |  1 (0.0)  | 92 (0.9)  | 90 (0.88) | 92 (0.9)  | 92 (0.9)  | 89 (0.87) | 93 (0.9)  | 92 (0.9)
location/containedby      | 102 |  7 (0.0)  | 63 (0.47) | 62 (0.47) | 63 (0.46) | 61 (0.47) | 61 (0.44) | 62 (0.49) | 68 (0.55)
parent/child              |  88 |  9 (0.01) | 64 (0.6)  | 64 (0.56) | 64 (0.59) | 64 (0.62) | 64 (0.57) | 67 (0.67) | 68 (0.63)
person/place of death     |  71 |  1 (0.0)  | 67 (0.93) | 67 (0.92) | 69 (0.94) | 67 (0.93) | 67 (0.92) | 69 (0.94) | 67 (0.92)
person/parents            |  67 | 20 (0.1)  | 51 (0.64) | 52 (0.62) | 51 (0.61) | 49 (0.64) | 47 (0.6)  | 53 (0.67) | 53 (0.65)
author/works written      |  65 | 24 (0.08) | 45 (0.59) | 49 (0.62) | 51 (0.69) | 50 (0.68) | 50 (0.68) | 51 (0.7)  | 52 (0.67)
person/nationality        |  61 | 21 (0.08) | 25 (0.19) | 27 (0.17) | 28 (0.2)  | 26 (0.2)  | 29 (0.19) | 27 (0.18) | 27 (0.21)
neighbor./neighborhood of |  39 |  3 (0.0)  | 24 (0.44) | 23 (0.45) | 26 (0.5)  | 27 (0.47) | 27 (0.49) | 30 (0.51) | 30 (0.52)
...
Average MAP@100           |     | 0.09      | 0.46      | 0.47      | 0.49      | 0.47      | 0.49      | 0.49      | 0.51
Weighted Average MAP@100  |     | 0.14      | 0.64      | 0.64      | 0.66      | 0.67      | 0.66      | 0.70      | 0.70

Results - surface relations. Same layout as above.

Relation                 |   # | PTF       | NFE       | CORE      | CORE+m    | CORE+t    | CORE+w    | CORE+mt   | CORE+mtw
head                     | 162 | 34 (0.18) | 80 (0.66) | 83 (0.66) | 82 (0.63) | 76 (0.57) | 77 (0.57) | 83 (0.69) | 88 (0.73)
scientist                | 144 | 44 (0.17) | 76 (0.6)  | 74 (0.55) | 73 (0.56) | 74 (0.6)  | 73 (0.59) | 78 (0.66) | 78 (0.69)
base                     | 133 | 10 (0.01) | 85 (0.71) | 86 (0.71) | 86 (0.78) | 88 (0.79) | 85 (0.75) | 83 (0.76) | 89 (0.8)
visit                    | 118 |  4 (0.0)  | 73 (0.6)  | 75 (0.61) | 76 (0.64) | 80 (0.68) | 74 (0.64) | 75 (0.66) | 82 (0.74)
attend                   |  92 | 11 (0.02) | 65 (0.58) | 64 (0.59) | 65 (0.63) | 62 (0.6)  | 66 (0.63) | 62 (0.58) | 69 (0.64)
adviser                  |  56 |  2 (0.0)  | 42 (0.56) | 47 (0.58) | 44 (0.58) | 43 (0.59) | 45 (0.63) | 43 (0.53) | 44 (0.63)
criticize                |  40 |  5 (0.0)  | 31 (0.66) | 33 (0.62) | 33 (0.7)  | 33 (0.67) | 33 (0.61) | 35 (0.69) | 37 (0.69)
support                  |  33 |  3 (0.0)  | 19 (0.27) | 22 (0.28) | 18 (0.21) | 19 (0.28) | 22 (0.27) | 23 (0.27) | 21 (0.27)
praise                   |   5 |  0 (0.0)  |  2 (0.0)  |  2 (0.01) |  4 (0.03) |  3 (0.01) |  3 (0.02) |  5 (0.03) |  2 (0.01)
vote                     |   3 |  2 (0.01) |  3 (0.63) |  3 (0.63) |  3 (0.32) |  3 (0.49) |  3 (0.51) |  3 (0.59) |  3 (0.64)
Average MAP@100          |     | 0.04      | 0.53      | 0.53      | 0.51      | 0.53      | 0.53      | 0.55      | 0.59
Weighted Average MAP@100 |     | 0.08      | 0.62      | 0.61      | 0.63      | 0.63      | 0.61      | 0.65      | 0.70

Anecdotal results.

author(x,y), ranked list of tuples:
1 (Winston Groom, Forrest Gump)
2 (D. M. Thomas, White Hotel)
3 (Roger Rosenblatt, Life Itself)
4 (Edmund White, Skinned Alive)
5 (Peter Manso, Brando: The Biography)
Similar relations: 0.98 "reviews x by y"(x,y), 0.97 "book by"(x,y), 0.95 "author of"(x,y), 0.95 "'s novel"(x,y), 0.95 "'s book"(x,y)

scientist at(x,y), ranked list of tuples:
1 (Riordan Roett, Johns Hopkins University)
2 (Dr. R. M. Roberts, University of Missouri)
3 (Linda Mayes, Yale University)
4 (Daniel T. Jones, Cardiff Business School)
5 (Russell Ross, University of Iowa)
Similar relations: 0.87 "scientist"(x,y), 0.84 "scientist with"(x,y), 0.80 "professor at"(x,y), 0.79 "scientist for"(x,y), 0.78 "neuroscientist at"(x,y)

Semantic similarity of relations is one aspect of our model; similar relations are treated differently in different contexts.

Conclusion. CORE is a matrix factorization model for open relation extraction that incorporates contextual information, based on factorization machines and the open-world assumption. The model is extensible: additional contextual information can be integrated when available. The experimental study suggests that exploiting context can significantly improve prediction performance. Source code, datasets, and supporting material are available at https://github.com/fabiopetroni/core

Thank you! Questions? Fabio Petroni, Sapienza University of Rome, Italy. Current position: PhD Student in Engineering in Computer Science. Research interests: data mining, machine learning, big data. petroni@dis.uniroma1.it