ParaGraphE: A Library for Parallel Knowledge Graph Embedding


Xiao-Fan Niu, Wu-Jun Li
National Key Laboratory for Novel Software Technology
Department of Computer Science and Technology, Nanjing University, China
niuxf@lamda.nju.edu.cn, liwujun@nju.edu.cn

arXiv:1703.05614v3 [cs.AI] 5 Apr 2017

April 6, 2017

Abstract

Knowledge graph embedding aims to translate a knowledge graph into numerical representations by transforming its entities and relations into continuous low-dimensional vectors. Recently, many methods [1, 5, 3, 2, 6] have been proposed for this problem, but their existing single-thread implementations are time-consuming for large-scale knowledge graphs. Here, we design a unified parallel framework to parallelize these methods, which achieves a significant time reduction without affecting accuracy. We name our framework ParaGraphE, and it provides a library for parallel knowledge graph embedding. The source code can be downloaded from https://github.com/libble/libble-multithread/tree/master/paragraphe.

1 Introduction

Knowledge graphs are widely used for representing knowledge in artificial intelligence. A knowledge graph contains a set of entities, denoted E, and a set of relations, denoted R. Each fact in a knowledge graph is represented as a triple (head, label, tail), typically written (h, r, t) with h, t ∈ E and r ∈ R, meaning that h has relation r to t. Knowledge graphs usually suffer from incompleteness, and a main task of knowledge graph research is to complete them by predicting potential relations from the existing observed triples. Traditional methods based on logic and symbols are neither tractable nor robust enough for large-scale knowledge graphs. Recently, knowledge graph embedding has been introduced to solve this problem: it encodes entities and relations into continuous low-dimensional vectors, so that reasoning can be performed with simple linear-algebra operations.
Some representative knowledge graph embedding methods include TransE [1], TransH [5], TransR [3], TransD [2] and ManifoldE [6]. Although these methods have achieved promising performance on knowledge graph completion, their existing single-thread implementations are time-consuming for large-scale knowledge graphs. Here, we design a unified parallel framework to parallelize these methods, which achieves a significant time reduction without affecting accuracy. We name our framework ParaGraphE, and it provides a library for parallel knowledge graph embedding. The source code can be downloaded from https://github.com/libble/libble-multithread/tree/master/paragraphe.
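To make the translation-based embedding idea concrete, here is a minimal sketch of how a method such as TransE [1] scores a golden triple versus a corrupted one. This is illustrative NumPy code, not taken from the ParaGraphE source; the toy vectors are invented.

```python
import numpy as np

def transe_score(h_vec, r_vec, t_vec, norm=2):
    """TransE score: distance between h + r and t (lower = more plausible)."""
    return np.linalg.norm(h_vec + r_vec - t_vec, ord=norm)

def margin_loss(pos_score, neg_score, gamma=1.0):
    """Margin-based hinge term for one (golden, corrupted) pair."""
    return max(0.0, pos_score + gamma - neg_score)

# Toy 3-dimensional embeddings: the golden triple satisfies h + r ≈ t,
# while the corrupted tail t_neg does not.
h = np.array([0.2, 0.1, 0.0])
r = np.array([0.3, -0.1, 0.5])
t = np.array([0.5, 0.0, 0.5])       # golden tail
t_neg = np.array([-0.8, 0.9, 0.1])  # corrupted tail

pos = transe_score(h, r, t)      # ≈ 0 for the golden triple
neg = transe_score(h, r, t_neg)  # much larger for the corrupted one
print(pos < neg)                 # True
```

Training pushes the score of golden triples below that of corrupted ones by at least the margin γ, which is exactly the hinge term in the rank loss of Section 2.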

2 Implemented Methods

We first introduce the unified framework of our parallel implementations, and then point out the detailed differences between the methods.

2.1 Framework

All the methods mentioned above try to minimize a margin-based rank loss, which can be formulated as follows:

    L = Σ_{(h,r,t)∈S} Σ_{(h',r,t')∈S'} [s(h, r, t) + γ − s(h', r, t')]_+    (1)
    s.t. some constraints,

where S is the set of golden (positive) triples, S' is a set of corrupted (negative) triples, usually constructed by replacing the h or t of a golden triple (h, r, t) with another entity, γ is the margin, [x]_+ = max(0, x), and s(h, r, t) is the score function of triple (h, r, t), which is defined differently in different methods. The unified parallel learning framework of ParaGraphE is shown in Algorithm 1.

Algorithm 1 ParaGraphE
Initialization: p threads, embeddings of each entity and relation
for epoch = 1 to num_epoches do
    for each thread do
        i = 1
        repeat
            sample a golden triple (h, r, t)
            construct a corrupted triple (h', r, t')
            if s(h, r, t) + γ − s(h', r, t') > 0 then
                calculate the gradient of the embeddings of h, t, h', t', r
                subtract the gradient from the corresponding embeddings
            end if
            normalize h, t, h', t', r according to the constraints
            i = i + 1
        until i == number of triples a thread handles in an epoch
    end for
end for
save the embeddings into files

ParaGraphE is implemented based on the lock-free strategies in [4, 7].

2.2 Implemented Methods

Any method with a margin-based rank loss similar to (1) can be easily implemented in ParaGraphE. Here, we implement several representative knowledge graph embedding methods, including TransE [1], TransH [5], TransR [3], TransD [2] and ManifoldE [6]. All these methods can be formulated with the loss in (1); the differences between them lie only in the score functions and the constraints.

TransE [1]:

embeddings: k-dimensional vectors for both entities and relations. Let us denote the embeddings of h, r, t as h, r, t.
score function: s(h, r, t) = ‖h + r − t‖. Either the L1-norm or the L2-norm can be employed.
constraints: ‖h‖₂ = 1, ‖t‖₂ = 1, ‖r‖₂ = 1.

TransH [5]:
embeddings: k-dimensional vectors h, t for entities h, t. A normal vector w_r and a translation vector d_r, both in R^k, are associated with relation r.
score function: s(h, r, t) = ‖(h − w_r⊤h·w_r) + d_r − (t − w_r⊤t·w_r)‖.
constraints: ‖h‖₂ = 1, ‖t‖₂ = 1, ‖w_r‖₂ = 1, ‖d_r‖₂ = 1, and |w_r⊤d_r| ≤ ε. Here, ε is a hyper-parameter.

TransR [3]:
embeddings: k-dimensional vectors h, t for entities h, t. A translation vector r ∈ R^d and a projection matrix M_r ∈ R^{d×k} are associated with relation r.
score function: s(h, r, t) = ‖M_r h + r − M_r t‖.
constraints: ‖h‖₂ = 1, ‖t‖₂ = 1, ‖r‖₂ = 1, ‖M_r h‖₂ ≤ 1, ‖M_r t‖₂ ≤ 1.

TransD [2]:
embeddings: two vectors are associated with each entity and relation, i.e., {h, h_p ∈ R^k} for entity h, {t, t_p ∈ R^k} for entity t, and {r, r_p ∈ R^d} for relation r. The subscript p denotes projection vectors.
score function: s(h, r, t) = ‖(h + h_p⊤h·r_p) + r − (t + t_p⊤t·r_p)‖.
constraints: ‖h‖₂ = 1, ‖t‖₂ = 1, ‖r‖₂ = 1, ‖h + h_p⊤h·r_p‖₂ ≤ 1, ‖t + t_p⊤t·r_p‖₂ ≤ 1.

SphereE [6]:
embeddings: k-dimensional vectors for both entities and relations, i.e., h, r, t ∈ R^k for h, r, t.
score function: s(h, r, t) = ‖M(h, r, t) − D_r²‖², where M(h, r, t) = ‖h + r − t‖₂² and D_r is a relation-specific parameter.
constraints: ‖h‖₂ = 1, ‖t‖₂ = 1, ‖r‖₂ = 1.

Note that SphereE is an implementation of the sphere setting of ManifoldE [6]. For more details about these methods, please refer to the original papers.

3 Experiment Results

We use 10 threads to run the experiments on two widely used knowledge graph datasets, WN18 and FB15k. Table 1 lists the statistics of these two datasets. The evaluation metrics are those commonly adopted in existing knowledge graph embedding papers. They are:
mr: the mean rank of the golden entity.

mrr: the mean reciprocal rank.
hits@10: the proportion of test triples whose golden entity is ranked within the top 10.
hits@1: the proportion of test triples whose golden entity is ranked first.

Table 1: Datasets used in the experiments.

Dataset  #Rel   #Ent   #Train  #Valid  #Test
WN18       18  40943   141442    5000   5000
FB15k    1345  14951   483142   50000  59071

The experiment results are shown in Table 2 and Table 3. Here, "raw" denotes metrics calculated on all corrupted triples, and "filter" denotes metrics calculated on corrupted triples excluding those already existing in the knowledge graph.

Table 2: Experiment results on WN18.

                      raw                          filter
method   time(s)   mr    mrr   hits@1  hits@10   mr    mrr   hits@1  hits@10
TransE     22.6   216   0.269  0.052   0.713    204   0.346  0.080   0.830
TransH     18.8   230   0.318  0.110   0.729    217   0.411  0.163   0.853
TransR    165.0   224   0.396  0.202   0.774    211   0.544  0.320   0.920
TransD    144.2   196   0.371  0.168   0.762    181   0.510  0.274   0.909
SphereE    47.1   607   0.579  0.455   0.807    593   0.871  0.819   0.941

Table 3: Experiment results on FB15k.

                      raw                          filter
method   time(s)   mr    mrr   hits@1  hits@10   mr    mrr   hits@1  hits@10
TransE     61.3   235   0.162  0.073   0.344    120   0.253  0.145   0.468
TransH     89.1   222   0.188  0.086   0.405     87   0.326  0.199   0.579
TransR    226.7   220   0.204  0.098   0.436     75   0.378  0.242   0.640
TransD    149.7   203   0.204  0.096   0.439     66   0.379  0.241   0.650
SphereE   147.8   214   0.252  0.125   0.513    117   0.464  0.331   0.699

All the accuracy results are very close to those reported in the original papers, except for SphereE on FB15k, where the original paper only reports a result obtained with a polynomial kernel. Our implementations are also much faster than those in the original papers.

Figure 1 shows the change of the epoch loss when running TransR on FB15k with ParaGraphE, setting #threads = 1, 2, 5, 8, 10. Here, the epoch loss is the sum of the rank-based hinge loss over the triples sampled in an epoch. We can see that adopting multi-threaded learning does not change the convergence of the training procedure, and the other embedding methods exhibit similar behavior.
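For illustration, all four metrics above can be computed from the 1-based ranks of the golden entities with a few lines of Python. This is a sketch only; the rank list here is made up, not taken from the experiments.

```python
def ranking_metrics(ranks, k=10):
    """Compute mean rank (mr), mean reciprocal rank (mrr) and hits@k
    from a list of 1-based ranks of the golden entities."""
    n = len(ranks)
    mr = sum(ranks) / n
    mrr = sum(1.0 / rk for rk in ranks) / n
    hits = sum(1 for rk in ranks if rk <= k) / n
    return mr, mrr, hits

# Hypothetical ranks of the golden entities for five test triples.
ranks = [1, 2, 4, 10, 33]
mr, mrr, hits10 = ranking_metrics(ranks, k=10)
print(mr, round(mrr, 3), hits10)  # 10.0 0.376 0.8
```

The same rank list yields hits@1 by calling `ranking_metrics(ranks, k=1)`; the "raw" and "filter" variants differ only in which candidate entities are allowed to outrank the golden one.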
Figure 2 shows the speedup of training TransR on both datasets with ParaGraphE, running 500 epochs. ParaGraphE achieves around 8× speedup with 10 threads (cores) on both datasets.
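The lock-free shared-memory scheme of Algorithm 1 can be sketched as follows. This is an illustrative Python/NumPy rendering with a TransE score on invented toy data, not the library's actual implementation; Python threads are used only to mirror the Hogwild-style updates of [4], in which all threads write to the shared embeddings without locks.

```python
import threading
import numpy as np

# Toy knowledge graph: entity/relation ids and golden triples (h, r, t).
num_ent, num_rel, dim, gamma, lr = 5, 2, 8, 1.0, 0.01
triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 4)]

# Shared embeddings, updated by all threads without locks (Hogwild-style).
E = np.random.default_rng(0).normal(size=(num_ent, dim))
R = np.random.default_rng(1).normal(size=(num_rel, dim))

def normalize(M, idx):
    M[idx] /= np.linalg.norm(M[idx]) + 1e-12  # enforce unit-norm constraint

for i in range(num_ent):
    normalize(E, i)
for j in range(num_rel):
    normalize(R, j)

def worker(seed, steps):
    rng = np.random.default_rng(seed)  # per-thread sampler
    for _ in range(steps):
        h, r, t = triples[rng.integers(len(triples))]  # sample golden triple
        t_neg = int(rng.integers(num_ent))             # corrupt the tail
        d_pos = E[h] + R[r] - E[t]
        d_neg = E[h] + R[r] - E[t_neg]
        # Margin-based hinge: update only when the corrupted triple
        # scores within the margin of the golden one.
        if np.linalg.norm(d_pos) + gamma - np.linalg.norm(d_neg) > 0:
            g_pos = d_pos / (np.linalg.norm(d_pos) + 1e-12)  # grad of L2 score
            g_neg = d_neg / (np.linalg.norm(d_neg) + 1e-12)
            E[h] -= lr * (g_pos - g_neg)
            R[r] -= lr * (g_pos - g_neg)
            E[t] += lr * g_pos
            E[t_neg] -= lr * g_neg
        for i in (h, t, t_neg):  # project back onto the constraints
            normalize(E, i)
        normalize(R, r)

threads = [threading.Thread(target=worker, args=(seed, 200)) for seed in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

# Final sequential projection onto the constraints after all threads join.
for i in range(num_ent):
    normalize(E, i)
for j in range(num_rel):
    normalize(R, j)
```

Because the threads never lock the embedding tables, occasional overlapping writes are simply tolerated; as the convergence results above suggest, such sparse conflicts do not prevent the training procedure from converging.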

(a) epoch loss vs. time  (b) epoch loss vs. #epochs
Figure 1: Epoch loss change when training TransR on FB15k.

(a) WN18  (b) FB15k
Figure 2: The speedup of training TransR on WN18 and FB15k.

4 Conclusion

We have designed a unified framework called ParaGraphE to parallelize knowledge graph embedding methods. Our implementations of several existing methods achieve a significant time reduction without affecting accuracy. In future work, we will implement other knowledge graph embedding methods within the ParaGraphE framework. Moreover, ParaGraphE is actually general enough to be applied to other kinds of graphs besides knowledge graphs, which we will also evaluate empirically in future work.

References

[1] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, pages 2787–2795, 2013.

[2] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In ACL, pages 687–696, 2015.
[3] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, pages 2181–2187, 2015.
[4] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, pages 693–701, 2011.
[5] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In AAAI, pages 1112–1119, 2014.
[6] Han Xiao, Minlie Huang, Hao Yu, and Xiaoyan Zhu. From one point to a manifold: Knowledge graph embedding for precise link prediction. In IJCAI, pages 1315–1321, 2016.
[7] Shen-Yi Zhao, Gong-Duo Zhang, and Wu-Jun Li. Lock-free optimization for non-convex problems. In AAAI, pages 2935–2941, 2017.