A Three-Way Model for Collective Learning on Multi-Relational Data

A Three-Way Model for Collective Learning on Multi-Relational Data. 28th International Conference on Machine Learning (ICML 2011). Maximilian Nickel (1), Volker Tresp (2), Hans-Peter Kriegel (1). (1) Ludwig-Maximilians-Universität, Munich; (2) Siemens AG, Corporate Technology, Munich. June 30th, 2011.

Outline: 1. Introduction 2. RESCAL 3. Experiments 4. Summary

Introduction: Multi-Relational Data. Multi-relational data is part of many important fields of application, such as computational biology, social networks, the Semantic Web, the Linked Data cloud, and many more.

Introduction: Motivation to Use Tensors for Relational Learning. Why tensors? Modelling simplicity: multiple binary relations can be expressed straightforwardly as a three-way tensor. No structure learning: it is not necessary to have information about independent variables, knowledge bases, etc., or to infer it from the data. Expected performance: relational domains are high-dimensional and sparse, a setting where factorization methods have shown very good results. Problem: tensor factorizations like CANDECOMP/PARAFAC (CP) or Tucker cannot perform collective learning, and DEDICOM imposes unreasonable constraints for relational learning. (For an excellent review of tensors, see Kolda and Bader, 2009.)

Introduction: Modelling and Terminology. Binary relations are modelled as a tensor: two modes of the tensor refer to the entities, and one mode refers to the relations. An entry of the tensor is 1 when the relation between the two entities exists and 0 otherwise. We use the RDF formalism to model relations as (subject, predicate, object) triples.
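The construction just described can be sketched in a few lines of NumPy. The triples and names below are illustrative stand-ins (loosely echoing the talk's presidents example), not data from the talk:

```python
import numpy as np

# Illustrative (subject, predicate, object) triples; all names are hypothetical.
triples = [("Bill", "partyOf", "PartyX"),
           ("Al", "vicePresidentOf", "Bill"),
           ("John", "partyOf", "PartyX")]

entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
relations = sorted({t[1] for t in triples})
e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {p: k for k, p in enumerate(relations)}

# One n x n adjacency matrix (frontal slice X_k) per relation:
# X[k, i, j] = 1 iff relation k holds between subject i and object j.
X = np.zeros((len(relations), len(entities), len(entities)))
for s, p, o in triples:
    X[r_idx[p], e_idx[s], e_idx[o]] = 1.0
```

Note that both entity modes share one index set, so an entity keeps the same index whether it appears as subject or object.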

RESCAL: Tensor Factorization. RESCAL takes the inherent structure of dyadic relational data into account by employing the tensor factorization

X_k ≈ A R_k A^T

where A is an n × r matrix representing the global entity-to-latent-component space, and R_k is an asymmetric r × r matrix that specifies the interaction of the latent components for the k-th predicate.

RESCAL: Solving Canonical Relational Learning Tasks. Link prediction: to predict the existence of a relation between two entities, it is sufficient to look at the rank-reduced reconstruction of the appropriate slice, A R_k A^T. Collective classification: can be cast as a link-prediction problem by including the classes as entities and adding a classOf relation; alternatively, standard classification algorithms can be applied to the entities' latent-component representation A. Link-based clustering: since the entities' latent-component representation is computed by considering all relations, link-based clustering can be done by clustering the entities in the latent-component space A.
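As a minimal illustration of the link-prediction rule: the score for a candidate triple (i, k, j) is the bilinear form a_i^T R_k a_j, and scoring all pairs at once is exactly the reconstructed slice. A and R_k below are random stand-ins for fitted factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2
A = rng.standard_normal((n, r))    # entity latent-component matrix (stand-in)
R_k = rng.standard_normal((r, r))  # interaction matrix for relation k (stand-in)

def score(i, j):
    # Predicted confidence that relation k holds between entities i and j.
    return A[i] @ R_k @ A[j]

# All pairwise scores for relation k at once: the rank-reduced slice A R_k A^T.
scores = A @ R_k @ A.T
assert np.isclose(scores[1, 3], score(1, 3))
```

Thresholding or ranking these scores yields the predicted links; clustering would instead run, e.g., k-means directly on the rows of A.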

RESCAL: Computing the Factorization. To compute the factorization, we solve the optimization problem

min_{A, R_k} loss(A, R_k) + reg(A, R_k)

where the loss function is

loss(A, R_k) = 1/2 Σ_k ‖X_k − A R_k A^T‖_F²

and the regularization term is

reg(A, R_k) = 1/2 λ (‖A‖_F² + Σ_k ‖R_k‖_F²)

An efficient alternating least-squares algorithm based on ASALSAN (Bader et al., 2007) is used.
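A compact, unoptimized sketch of the alternating least-squares scheme, written from the objective above rather than from the reference code: A is updated by a regularized least-squares step with the current R_k fixed, and each R_k by ridge regression on vec(X_k) ≈ (A ⊗ A) vec(R_k). Forming the Kronecker product is exact but costly; efficient implementations avoid it.

```python
import numpy as np

def rescal_als(X, r, lam=0.1, n_iter=50, seed=0):
    """Sketch of RESCAL alternating least squares (not the reference code).

    X -- list of (n x n) frontal slices X_k, one per relation
    r -- rank of the factorization
    Returns A (n x r) and a list of (r x r) interaction matrices R_k.
    """
    rng = np.random.default_rng(seed)
    n = X[0].shape[0]
    A = rng.standard_normal((n, r))
    R = [rng.standard_normal((r, r)) for _ in X]
    for _ in range(n_iter):
        # A-update: regularized least squares, holding A fixed on the far side.
        num = sum(Xk @ A @ Rk.T + Xk.T @ A @ Rk for Xk, Rk in zip(X, R))
        AtA = A.T @ A
        den = sum(Rk @ AtA @ Rk.T + Rk.T @ AtA @ Rk for Rk in R)
        A = num @ np.linalg.inv(den + lam * np.eye(r))
        # R_k-update: ridge regression on vec(X_k) = (A kron A) vec(R_k);
        # with NumPy's row-major ravel this identity holds directly.
        Z = np.kron(A, A)                      # (n^2 x r^2)
        G = Z.T @ Z + lam * np.eye(r * r)
        for k, Xk in enumerate(X):
            R[k] = np.linalg.solve(G, Z.T @ Xk.ravel()).reshape(r, r)
    return A, R
```

On a small synthetic tensor generated from planted factors this drives the reconstruction error down quickly; for sparse real data the products involving X_k would use sparse matrices.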

RESCAL: Collective Learning Example. Task: predict the party membership of US presidents and vice presidents (the example involves the entities Bill, John, Al, Lyndon and Party X). It is helpful to consider the element-wise version of the loss function f:

f(A, R_k) = 1/2 Σ_{i,j,k} (X_ijk − a_i^T R_k a_j)²

Because each entity i has a single latent representation a_i that is used in both the subject and the object position, evidence from one relation (e.g. who is vice president of whom) can propagate to predictions in another (party membership). The slide animation contrasts this with a factorization that uses separate subject and object representations a_i and b_j, in which this coupling is lost.
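The element-wise loss above is just an expanded form of the slice-wise Frobenius-norm loss; a quick numerical check on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 3, 4, 2
X = rng.integers(0, 2, size=(m, n, n)).astype(float)  # random binary tensor
A = rng.standard_normal((n, r))
R = rng.standard_normal((m, r, r))

# Slice-wise Frobenius form: 1/2 * sum_k ||X_k - A R_k A^T||_F^2
f_frob = 0.5 * sum(np.sum((X[k] - A @ R[k] @ A.T) ** 2) for k in range(m))

# Element-wise form from the slide: 1/2 * sum_{i,j,k} (X_kij - a_i^T R_k a_j)^2
f_elem = 0.5 * sum((X[k, i, j] - A[i] @ R[k] @ A[j]) ** 2
                   for k in range(m) for i in range(n) for j in range(n))
assert np.isclose(f_frob, f_elem)
```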

RESCAL: Collective Learning with RESCAL. Collective learning is performed via the entities' latent-component representation. An important aspect of the model is that each entity has a unique latent-component representation, regardless of whether it occurs as a subject or an object.

Experiments: Predicting the Membership of US (Vice) Presidents. Task: predict the party membership of US presidents and vice presidents. The data contains no information other than the party memberships and who is (vice) president of whom. [Bar chart: AUC for predicting party membership, comparing Random, CP, DEDICOM, SUNS, SUNS+AG and RESCAL.]

Experiments: Comparison to State-of-the-Art Approaches. Task: perform link prediction on the IRM datasets Kinships, UMLS and Nations. Comparison to MRC (Kok & Domingos, 2007), IRM (Kemp et al., 2007) and BCTF (Sutskever et al., 2009), as well as to CP and DEDICOM. [Three bar charts: AUC on Kinships, UMLS and Nations for CP, DEDICOM, BCTF, IRM, MRC and RESCAL; BCTF is not evaluated on Nations.]

Experiments: Runtime and Implementation. The RESCAL-ALS algorithm features very fast training times. The implementation uses only standard matrix operations.

Table: Average runtime (in seconds) to compute a rank-r factorization with RESCAL.

Dataset    Entities  Relations  r=10  r=20  r=40
Kinships   104       26         1.1   3.7   51.2
Nations    125       57         1.7   5.3   54.4
UMLS       135       49         2.6   4.9   72.3
Cora       2497      7          364   348   680

Summary. RESCAL is a tensor-based relational learning approach capable of collective learning. The collective learning mechanism works through information propagation via the entities' latent-component representations. It shows good performance compared to current state-of-the-art relational learning approaches, with fast training times and a simple implementation. Code available at http://www.cip.ifi.lmu.de/~nickel. Thank you!