Modeling Topics and Knowledge Bases with Embeddings


Modeling Topics and Knowledge Bases with Embeddings
Dat Quoc Nguyen and Mark Johnson
Department of Computing, Macquarie University, Sydney, Australia
December 2016

Vector representations/embeddings of words
- One-hot representation: high-dimensional and sparse vector
- Vocabulary size: N
- N-dimensional vector filled with 0s, except for a 1 at the position associated with the word's index (see the sketch below)
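As a minimal illustration (not part of the original slides), the following Python sketch builds a one-hot vector for a word given a toy vocabulary; the vocabulary and words are made up for the example.

```python
import numpy as np

def one_hot(word, vocab):
    """Return the N-dimensional one-hot vector for `word`, given a word-to-index map."""
    vec = np.zeros(len(vocab))   # N zeros
    vec[vocab[word]] = 1.0       # single 1 at the word's index
    return vec

# Toy vocabulary (illustrative only); real vocabularies have N on the order of 100,000.
vocab = {"bank": 0, "finance": 1, "money": 2, "transaction": 3}
print(one_hot("finance", vocab))  # [0. 1. 0. 0.]
```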

Vector representations/embeddings of words
- Deep learning evolution: most neural network toolkits do not play well with one-hot representations
- Dense vector: low-dimensional distributed vector representation
- Vector size k ≪ N, for example: k = 100, vocabulary size N = 100,000

Vector representations/embeddings of words
- Unsupervised learning models have been proposed to learn low-dimensional vectors of words efficiently, e.g. the Word2Vec Skip-gram model (see the sketch below)
- Word embeddings learned from large external corpora capture various aspects of word meanings
- We can possibly assign topics (i.e. labels) to clusters of words with similar meanings, e.g. Topic 1 for {banking, bank, transaction, finance, money laundering}
- Can we incorporate word embeddings into topic models?
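A minimal sketch of training Skip-gram embeddings with the gensim library (an assumption of this example, not the toolkit used in the talk); it assumes gensim >= 4.0, where the dimensionality parameter is `vector_size`. The toy corpus is made up for illustration.

```python
from gensim.models import Word2Vec

# Toy corpus for illustration; in practice the external corpus is large (e.g. Wikipedia).
sentences = [
    ["the", "bank", "approved", "the", "transaction"],
    ["finance", "ministers", "discussed", "money", "laundering"],
]

# sg=1 selects the Skip-gram architecture; vector_size is the embedding dimension k.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

vec_bank = model.wv["bank"]                    # 100-dimensional dense vector
print(model.wv.most_similar("bank", topn=3))   # nearest neighbours by cosine similarity
```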

Improving topic models with word embeddings
- Topic models take a corpus of documents as input, and:
  - Learn a set of latent topics for the corpus
  - Infer document-to-topic and topic-to-word distributions from the co-occurrence of words within documents
- If the corpus is small and/or the documents are short, the topics will be noisy due to the limited word co-occurrence information
- IDEA: Use word embeddings learned on a large external corpus to improve the topic-to-word distributions in a topic model

Improving topic models with word embeddings
- Each word w is associated with a pre-trained word embedding ω_w
- Each topic t is associated with a topic embedding τ_t
- We define a latent feature topic-to-word distribution CatE over words:
  CatE(w | t) = exp(ω_w · τ_t) / Σ_{w' ∈ V} exp(ω_{w'} · τ_t)
- Optimize the log-loss to learn τ_t (a minimal sketch of CatE follows below)
- Our new topic models mix the CatE distribution with a multinomial distribution over words
- This combines information from a large, general corpus (via the CatE distribution) and a smaller but more specific corpus (via the multinomial distribution)
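A minimal sketch (not the authors' released implementation) of the CatE topic-to-word distribution: a softmax over the vocabulary, scored by dot products between word embeddings and a topic embedding. The toy dimensions are assumptions for the example.

```python
import numpy as np

def cat_e(topic_emb, word_embs):
    """CatE(w | t) = exp(omega_w . tau_t) / sum_{w' in V} exp(omega_{w'} . tau_t).

    word_embs: (|V|, k) matrix of pre-trained word embeddings omega_w.
    topic_emb: (k,) topic embedding tau_t.
    Returns a length-|V| probability vector over the vocabulary.
    """
    scores = word_embs @ topic_emb   # dot product omega_w . tau_t for every word w
    scores -= scores.max()           # numerical stability before exponentiation
    probs = np.exp(scores)
    return probs / probs.sum()

# Toy numbers: vocabulary of 5 words, embedding size k = 4.
rng = np.random.default_rng(0)
word_embs = rng.normal(size=(5, 4))
topic_emb = rng.normal(size=4)
print(cat_e(topic_emb, word_embs))   # sums to 1 over the vocabulary
```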

Improving topic models with word embeddings
- Combine Latent Dirichlet Allocation (LDA) and the Dirichlet Multinomial Mixture (DMM) with the word embeddings: LF-LDA & LF-DMM (see the mixture sketch below)
- [Graphical model plate diagrams shown for LDA, LF-LDA, DMM and LF-DMM]
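As a hedged illustration of how the two components can be mixed, the sketch below combines the embedding-based CatE distribution with a corpus-estimated multinomial under an assumed mixture weight `lam`; the actual Gibbs sampling machinery of LF-LDA/LF-DMM is not reproduced here, and the numbers are toy values.

```python
import numpy as np

def mixed_topic_word_dist(cat_e_probs, multinomial_probs, lam=0.6):
    """Sketch: P(w | t) = lam * CatE(w | t) + (1 - lam) * Mult(w | t).

    `lam` is an assumed mixture hyperparameter for this example.
    """
    return lam * cat_e_probs + (1.0 - lam) * multinomial_probs

# Toy 5-word vocabulary: CatE comes from the large external corpus (via embeddings),
# the multinomial from word counts on the small target corpus.
cat_e_probs = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
multinomial_probs = np.array([0.10, 0.50, 0.20, 0.10, 0.10])
print(mixed_topic_word_dist(cat_e_probs, multinomial_probs))
```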

Improving topic models with word embeddings

Example topics (top words) learned by DMM vs. LF-DMM:

Topic 1 (DMM):    japan nuclear u.s. crisis plant china libya radiation u.n. vote korea europe government election deal
Topic 1 (LF-DMM): japan nuclear u.s. plant quake radiation earthquake tsunami nuke crisis disaster power oil japanese plants
Topic 3 (DMM):    u.s. oil japan prices stocks sales profit fed rise growth wall street china fall shares
Topic 3 (LF-DMM): prices sales oil u.s. profit stocks japan rise gas growth shares price profits rises earnings
Topic 4 (DMM):    egypt china u.s. mubarak bin libya laden france bahrain air report rights court u.n. war
Topic 4 (LF-DMM): libya egypt iran mideast opposition protests leader syria u.n. tunisia chief protesters mubarak crackdown bahrain

- Topic coherence: significant improvement of topic coherence scores on all models and experimental corpora
- 5+% absolute improvements in clustering and classification evaluation scores on the small or short datasets
- No reliable difference between pre-trained Word2Vec and GloVe vectors

Vector representations/embeddings: distance results
- v_king − v_queen ≈ v_man − v_woman
- v_Vietnam − v_Hanoi ≈ v_Japan − v_Tokyo ≈ v_Germany − v_Berlin ≈ v_r for some relationship r, say "is capital of":
  - v_Hanoi + v_"is capital of" ≈ v_Vietnam
  - v_Tokyo + v_"is capital of" ≈ v_Japan
  - v_Berlin + v_"is capital of" ≈ v_Germany
- Triple (head entity, relation, tail entity): v_h + v_r ≈ v_t, i.e. ‖v_h + v_r − v_t‖_{ℓ1/ℓ2} ≈ 0 (see the sketch below)
- Link prediction in knowledge bases?
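An illustrative check (toy vectors, not real embeddings) of the translation intuition behind this slide: for a true triple (h, r, t), v_h + v_r should lie close to v_t, so the norm of the difference should be near zero.

```python
import numpy as np

def translation_score(v_h, v_r, v_t, norm=1):
    """||v_h + v_r - v_t|| under the l1 (norm=1) or l2 (norm=2) norm; smaller is better."""
    return np.linalg.norm(v_h + v_r - v_t, ord=norm)

# Made-up 3-dimensional vectors purely for illustration.
v_hanoi = np.array([0.1, 0.9, 0.2])
v_vietnam = np.array([0.6, 1.0, 0.7])
v_is_capital_of = np.array([0.5, 0.1, 0.5])

print(translation_score(v_hanoi, v_is_capital_of, v_vietnam))  # close to 0 for a true triple
```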

STransE: a new embedding model for link prediction
- Knowledge bases (KBs) of real-world triple facts (head entity, relation, tail entity) are useful resources for NLP tasks
- Issue: large KBs are still far from complete
- So it is useful to perform link prediction in KBs, i.e. knowledge base completion: predict which triples not in a knowledge base are likely to be true
- Embedding models for link prediction in KBs:
  - Associate entities and/or relations with dense feature vectors or matrices
  - Obtain SOTA performance and generalize to large KBs

STransE: a new embedding model for link prediction
- The TransE model (Bordes et al., 2013) represents each relation r by a translation vector v_r, which is chosen so that ‖v_h + v_r − v_t‖_{ℓ1/ℓ2} ≈ 0
  - Good for 1-to-1 relationships, e.g. "is capital of"
  - Not good for 1-to-Many, Many-to-1 and Many-to-Many relationships, e.g. "gender"
- STransE: a novel embedding model of entities and relationships in KBs
  - Our STransE represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector
  - STransE chooses matrices W_{r,1} and W_{r,2}, and vector v_r so that ‖W_{r,1} v_h + v_r − W_{r,2} v_t‖_{ℓ1/ℓ2} ≈ 0
  - Optimize a margin-based objective function to learn the vectors and matrices (see the sketch below)
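A sketch (not the released implementation) of the TransE and STransE score functions described on this slide, plus a generic margin-based ranking loss of the kind the slide mentions; lower scores mean a triple is more plausible, and the dimensions and margin value are assumptions for the example.

```python
import numpy as np

def transe_score(v_h, v_r, v_t, norm=1):
    """TransE: ||v_h + v_r - v_t|| under the l1 or l2 norm."""
    return np.linalg.norm(v_h + v_r - v_t, ord=norm)

def stranse_score(v_h, v_r, v_t, W_r1, W_r2, norm=1):
    """STransE: ||W_{r,1} v_h + v_r - W_{r,2} v_t|| under the l1 or l2 norm."""
    return np.linalg.norm(W_r1 @ v_h + v_r - W_r2 @ v_t, ord=norm)

def margin_loss(score_true, score_corrupt, margin=1.0):
    """Generic margin-based ranking loss: push true triples at least `margin` below corrupted ones."""
    return max(0.0, margin + score_true - score_corrupt)

# Toy dimensions: entity/relation vectors of size k = 4.
rng = np.random.default_rng(1)
k = 4
v_h, v_r, v_t = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)
W_r1, W_r2 = rng.normal(size=(k, k)), rng.normal(size=(k, k))

print(transe_score(v_h, v_r, v_t), stranse_score(v_h, v_r, v_t, W_r1, W_r2))
```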

STransE results for link prediction in KBs
- We conducted experiments on two benchmark datasets, WN18 and FB15k (Bordes et al., 2013):

  Dataset   #E       #R      #Train    #Valid   #Test
  WN18      40,943   18      141,442   5,000    5,000
  FB15k     14,951   1,345   483,142   50,000   59,071

- Link prediction task: predict h given (?, r, t) or predict t given (h, r, ?), where ? denotes the missing element
- Evaluation metrics: mean rank (MR) and Hits@10 (H10) -- see the sketch below

  Method     WN18 MR   WN18 H10   FB15k MR   FB15k H10
  TransE     251       89.2       125        47.1
  TransH     303       86.7       87         64.4
  TransR     225       92.0       77         68.7
  CTransR    218       92.3       75         70.2
  KG2E       348       93.2       59         74.0
  TransD     212       92.2       91         77.3
  TATEC      -         -          58         76.7
  STransE    206       93.4       69         79.7
  rTransE    -         -          50         76.2
  PTransE    -         -          58         84.6

- Find a new relation-path based embedding model in our CoNLL paper!
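A short sketch of the two evaluation metrics named on this slide: mean rank (MR) of the correct entity among all candidates, and Hits@10, the proportion of test triples whose correct entity is ranked in the top 10. The rank values are made-up inputs for illustration.

```python
import numpy as np

def mean_rank(ranks):
    """Average rank of the correct entity over all test triples (lower is better)."""
    return float(np.mean(ranks))

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k (higher is better)."""
    return float(np.mean(np.asarray(ranks) <= k))

# Example: ranks of the correct entity for five hypothetical test triples.
ranks = [1, 3, 12, 7, 250]
print(mean_rank(ranks), hits_at_k(ranks, k=10))   # 54.6 0.6
```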

STransE results for search personalization
- When two users search using the same keywords, they are often looking for different information (i.e. the difference is due to the users' interests)
- Personalized search customizes results based on the user's search history (i.e. submitted queries and clicked documents)
- Let (q, u, d) represent a (query, user, document) triple
- Represent the user u by two matrices W_{u,1} and W_{u,2} and a vector v_u, which represents the user's topical interests, so that ‖W_{u,1} v_q + v_u − W_{u,2} v_d‖_{ℓ1/ℓ2} ≈ 0
- v_d and v_q are pre-determined by employing the LDA topic model
- Optimize a margin-based objective function to learn the user embeddings and matrices (see the sketch below)

  Metric   Search Eng.   L2R SP           STransE          TransE
  MRR      0.559         0.631 (+12.9%)   0.656 (+17.3%)   0.645 (+15.4%)
  P@1      0.385         0.452 (+17.4%)   0.501 (+30.3%)   0.481 (+24.9%)
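A sketch (toy data, not the paper's system) of scoring candidate documents for a user with the STransE-style personalization function from this slide: query and document vectors stand in for LDA topic proportions, and documents with lower scores rank higher.

```python
import numpy as np

def personalized_score(v_q, v_u, v_d, W_u1, W_u2, norm=1):
    """||W_{u,1} v_q + v_u - W_{u,2} v_d||; v_q and v_d play the role of LDA topic vectors."""
    return np.linalg.norm(W_u1 @ v_q + v_u - W_u2 @ v_d, ord=norm)

# Toy setup: 5 LDA topics, random user matrices and vector (assumed values for illustration).
rng = np.random.default_rng(2)
n_topics = 5
v_q, v_u = rng.dirichlet(np.ones(n_topics)), rng.normal(size=n_topics)
W_u1, W_u2 = rng.normal(size=(n_topics, n_topics)), rng.normal(size=(n_topics, n_topics))

# Re-rank three candidate documents (toy topic vectors) for this user.
docs = [rng.dirichlet(np.ones(n_topics)) for _ in range(3)]
scores = [personalized_score(v_q, v_u, v_d, W_u1, W_u2) for v_d in docs]
print(sorted(range(3), key=lambda i: scores[i]))   # document indices, best first
```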

Conclusions
- Latent feature vector representations induced from large external corpora can be used to improve topic modeling on smaller datasets
- Our new embedding model STransE for link prediction in KBs: ‖W_{r,1} v_h + v_r − W_{r,2} v_t‖_{ℓ1/ℓ2}
- Applied to a search personalization task, STransE helps to significantly improve ranking quality
- Code: https://github.com/datquocnguyen

Thank you for your attention!

References:
- Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson. Improving Topic Models with Latent Feature Word Representations. Transactions of the ACL, 2015.
- Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of NAACL-HLT, 2016.
- Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. Neighborhood Mixture Model for Knowledge Base Completion. In Proceedings of CoNLL, 2016.
- Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song and Alistair Willis. Search Personalization with Embeddings. In Proceedings of ECIR, 2017, to appear.