Embedding-Based Techniques MATRICES, TENSORS, AND NEURAL NETWORKS


Probabilistic Models: Downsides

Limitation to logical relations:
- Representation restricted by manual design. Clustering? Asymmetric implications?
- Information flows only through the specified relations.
- Difficult to generalize to unseen entities/relations.

Computational complexity of algorithms:
- Complexity depends on the explicit dimensionality; often NP-hard in the size of the data.
- More rules mean more expensive inference; query-time inference is sometimes NP-hard.
- Not trivial to parallelize or to use GPUs.

Embeddings
- Everything is represented as dense vectors, learned from data; can capture many relations.
- Complexity depends on the latent dimensions; learning uses stochastic gradients and back-propagation.
- Querying is often cheap, and training is GPU-parallelism friendly.

Two Related Tasks
- Relation Extraction: from surface patterns in text to relations in the graph.
- Graph Completion: predicting missing links in the knowledge graph.

Relation Extraction From Text
Sentence: "John was born in Liverpool, to Julia and Alfred Lennon."
Surface patterns: "was born in", "was born to ... and", "born in ..., to".
Extracted relations: birthplace(John Lennon, Liverpool), childOf(John Lennon, Julia Lennon), childOf(John Lennon, Alfred Lennon), along with livedIn(Julia Lennon, Liverpool) and livedIn(Alfred Lennon, Liverpool).

Distant Supervision
Target fact: birthplace(John Lennon, Liverpool), supported by the pattern "was born in".
- No direct supervision gives us this information.
- Supervised learning: too expensive to label sentences.
- Rule-based extraction: too much variety in language.
- Both only work for a small set of relations, i.e. tens, not hundreds.
Caution: Barack Obama and Honolulu co-occur with patterns such as "is native to", "was born in", "visited", and "met the senator from" — only some of which express birthplace.

Relation Extraction as a Matrix
Sentence: "John was born in Liverpool, to Julia and Alfred Lennon."
Rows are entity pairs — (John Lennon, Liverpool), (John Lennon, Julia Lennon), (John Lennon, Alfred Lennon), (Julia Lennon, Alfred Lennon), (Barack Obama, Hawaii), (Barack Obama, Michelle Obama) — and columns are surface patterns and KB relations. A 1 marks an observed cell; a "?" marks a cell the model must predict.
Universal Schema, Riedel et al., NAACL (2013)

Matrix Factorization
The n × m matrix X (entity pairs × relations) is approximated by the product of an n × k matrix of pair embeddings and a k × m matrix of relation embeddings, with k much smaller than n and m. A cell such as bornIn(John, Liverpool) is scored by the dot product of its pair vector and relation vector.
Universal Schema, Riedel et al., NAACL (2013)
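The factorization above can be sketched in a few lines of numpy. This is a toy illustration, not the tutorial's implementation: the sizes, the random initialization scale, and the sigmoid squashing are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pairs, n_relations, k = 6, 4, 8   # toy sizes; k is the latent dimension

# One k-dimensional vector per entity pair and per relation/surface pattern.
P = rng.normal(scale=0.1, size=(n_pairs, k))      # entity-pair embeddings
R = rng.normal(scale=0.1, size=(n_relations, k))  # relation embeddings

def score(pair_idx, rel_idx):
    """Reconstructed cell X[pair, rel]: dot product squashed to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(P[pair_idx] @ R[rel_idx])))

# The full n x m matrix is approximated by the rank-k product P @ R.T
X_hat = 1.0 / (1.0 + np.exp(-(P @ R.T)))
```

Every cell — including unobserved "?" cells — gets a score this way, which is what lets the model predict relations that were never seen for a pair.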

Training: Stochastic Updates
- Pick an observed cell R(i, j): update the pair embedding p_i and the relation embedding r_j so that the score of R(i, j) goes up.
- Pick any random cell R'(x, y) and assume it is negative: update p_x and r_y so that the score of R'(x, y) goes down.
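The two-step update above can be sketched as a stochastic gradient loop with random negative sampling. The observed cells, learning rate, and step count are made-up toy values; this is a minimal sketch of the idea, not the paper's training code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, n_relations, k, lr = 20, 5, 8, 0.1

P = rng.normal(scale=0.1, size=(n_pairs, k))   # entity-pair embeddings
R = rng.normal(scale=0.1, size=(n_relations, k))  # relation embeddings

# Hypothetical observed cells: (pair index, relation index) with value 1.
observed = [(0, 0), (0, 1), (3, 2), (7, 0)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # Pick an observed cell R(i, j): push its score up (ascend log sigma).
    i, j = observed[rng.integers(len(observed))]
    g = 1.0 - sigmoid(P[i] @ R[j])
    P[i], R[j] = P[i] + lr * g * R[j], R[j] + lr * g * P[i]

    # Pick any random cell, assume it is negative: push its score down.
    x, y = rng.integers(n_pairs), rng.integers(n_relations)
    g = -sigmoid(P[x] @ R[y])
    P[x], R[y] = P[x] + lr * g * R[y], R[y] + lr * g * P[x]
```

After training, observed cells score close to 1 while a typical random cell scores low, which is exactly the separation the updates optimize for.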

Relation Embeddings

Embeddings ~ Logical Relations

Relation embeddings, w:
- Similar embeddings for two relations indicate they are paraphrases: "is married to", spouseOf(x,y), /person/spouse.
- One embedding can be contained by another: w(topEmployeeOf) contained in w(employeeOf) captures topEmployeeOf(x,y) implies employeeOf(x,y).
- Logical patterns can be captured without needing to specify them!

Entity pair embeddings, v:
- Similar entity pairs indicate similar relations between them.
- Entity pairs may stand in multiple, independent relations, e.g. foundedBy and employeeOf.

From Sebastian Riedel

Similar Embeddings
The patterns "X own percentage of Y" and "X buy stake in Y" receive similar embeddings because they co-occur with the same entity pairs, e.g. (Time, Inc., Amer. Tel. and Comm.), (Campeau, Federated Dept Stores). The model successfully predicts "Volvo owns percentage of Scania A.B." from "Volvo bought a stake in Scania A.B."
From Sebastian Riedel

Implications
From cells such as "historian at"(R. Freeman, Harvard) and "professor at"(Kevin Boyle, Ohio State), the model learns an asymmetric entailment:
PER historian at UNIV implies PER professor at UNIV,
but not the converse: PER professor at UNIV does not imply PER historian at UNIV.
From Sebastian Riedel

Two Related Tasks
Turning from Relation Extraction to the second task: Graph Completion.

Graph Completion
Starting graph (from extraction): birthplace(John Lennon, Liverpool), childOf(John Lennon, Julia Lennon), childOf(John Lennon, Alfred Lennon), livedIn(Julia Lennon, Liverpool), livedIn(Alfred Lennon, Liverpool).
Completion infers missing edges, e.g. spouse(Julia Lennon, Alfred Lennon) and spouse(Alfred Lennon, Julia Lennon).

Tensor Formulation of Knowledge Graphs
The graph is a third-order tensor of shape E × R × E: entry (e1, r, e2) is 1 if the triple is in the graph. Does an unseen triple exist?

Factorize that Tensor
The E × R × E tensor is factorized into k-dimensional embeddings for the entities and relations, and each triple is scored as
S(r(a, b)) = f(v_r, v_a, v_b)

Many Different Factorizations

CANDECOMP/PARAFAC (CP) decomposition:
S(r(a, b)) = sum_k R_{r,k} e_{a,k} e_{b,k}

Tucker2 and RESCAL decompositions (R_r is a k × k matrix):
S(r(a, b)) = (R_r e_a) · e_b

Model E (two relation vectors, one per argument slot; not a tensor factorization per se):
S(r(a, b)) = R_{r,1} · e_a + R_{r,2} · e_b

Holographic Embeddings (⋆ is circular correlation):
S(r(a, b)) = R_r · (e_a ⋆ e_b)

HolE: Nickel et al., AAAI (2016); Model E: Riedel et al., NAACL (2013); RESCAL: Nickel et al., WWW (2012); CP: Harshman (1970); Tucker2: Tucker (1966)
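The four scoring functions above differ only in how the relation parameters interact with the two entity vectors; a numpy sketch makes that concrete. The dimensions and random vectors are toy assumptions, and HolE's circular correlation is computed via the FFT identity.

```python
import numpy as np

k = 8
rng = np.random.default_rng(2)
e_a, e_b = rng.normal(size=k), rng.normal(size=k)   # entity embeddings

def score_cp(r_vec, e_a, e_b):
    """CP: relation is a k-vector; sum of elementwise three-way products."""
    return float(np.sum(r_vec * e_a * e_b))

def score_rescal(R_mat, e_a, e_b):
    """RESCAL / Tucker2: relation is a k x k matrix, bilinear score."""
    return float((R_mat @ e_a) @ e_b)

def score_model_e(r1, r2, e_a, e_b):
    """Model E: one relation vector per argument slot, scores added."""
    return float(r1 @ e_a + r2 @ e_b)

def score_hole(r_vec, e_a, e_b):
    """HolE: circular correlation of entity vectors, dotted with r."""
    corr = np.fft.ifft(np.conj(np.fft.fft(e_a)) * np.fft.fft(e_b)).real
    return float(r_vec @ corr)
```

Note the capacity trade-off visible in the signatures: RESCAL spends k² parameters per relation, CP and HolE spend k, and Model E spends 2k but never multiplies the two entity vectors together.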

Translation Embeddings
Example: birthplace as a translation from Barack Obama to Honolulu, and from John Lennon to Liverpool.

TransE (relation is a translation vector; lower distance is better):
S(r(a, b)) = ||e_a + R_r − e_b||₂²

TransH (project entities onto the relation-specific hyperplane w_r first):
e⊥_a = e_a − (w_rᵀ e_a) w_r
S(r(a, b)) = ||e⊥_a + R_r − e⊥_b||₂²

TransR (map entities into a relation-specific space with M_r):
S(r(a, b)) = ||e_a M_r + R_r − e_b M_r||₂²

TransE: Bordes et al., NIPS (2013); TransH: Wang et al., AAAI (2014); TransR: Lin et al., AAAI (2015)
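The three translation distances can be written directly from the formulas above. This is a minimal sketch with made-up toy vectors; real models additionally normalize embeddings and train with a margin loss over negative triples.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 8
e_a, e_b, r = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)

def dist_transe(e_a, r, e_b):
    """TransE: ||e_a + r - e_b||^2; small when r translates a onto b."""
    d = e_a + r - e_b
    return float(d @ d)

def dist_transh(e_a, r, e_b, w_r):
    """TransH: translate within the hyperplane with (unit) normal w_r."""
    w = w_r / np.linalg.norm(w_r)
    pa = e_a - (w @ e_a) * w   # projection of e_a onto the hyperplane
    pb = e_b - (w @ e_b) * w
    d = pa + r - pb
    return float(d @ d)

def dist_transr(e_a, r, e_b, M_r):
    """TransR: map entities into the relation's own space with M_r."""
    d = M_r @ e_a + r - M_r @ e_b
    return float(d @ d)
```

A sanity check on the geometry: setting r = e_b − e_a makes the TransE distance exactly zero, and TransR with M_r as the identity reduces to TransE.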

Parameter Estimation
- Observed cell (e1, r, e2): increase its score S(r(a, b)).
- Unobserved cell, assumed negative: decrease its score S(r'(x, y)).

Matrix vs Tensor Factorization

Matrix factorization (vectors for each entity pair):
- Can only predict for entity pairs that appear in text together.
- No sharing between occurrences of the same entity in different entity pairs.

Tensor factorization (vectors for each entity):
- Assumes relations are low-rank, but many are not (spouse: you can have only ~1).
- Cannot learn pair-specific information.

What They Can, and Can't, Do
- Red: relations deterministically implied by the Black (observed) ones — need a pair-specific embedding; only matrix factorization generalizes.
- Green: relations that require estimating entity types — need entity-specific embeddings; tensor factorization generalizes, matrix factorization doesn't.
- Blue: relations implied jointly by Red and Green — nothing works much better than random.
From Singh et al., VSM (2015), http://sameersingh.org/files/papers/mftf-vsm15.pdf

Joint Extraction + Completion
A joint model combines Relation Extraction (surface patterns from text) and Graph Completion (links in the graph) in a single factorization.

Compositional Neural Models
So far, we learn an independent vector for each entity, surface pattern, and relation — but learning vectors independently ignores composition.

Composition in surface patterns — every surface pattern is not unique:
- Synonymy: "A is B's spouse." vs "A is married to B."
- Can the representation learn this?

Composition in relation paths — every path is not unique:
- Explicit: A parent B, B parent C implies A grandparent C
- Inverse: "X is Y's parent." means "Y is one of X's children."
- Implicit: X bornInCity Y, Y cityInState Z implies X bornInState Z
- Can the representation capture this?

Composing Dependency Paths
The patterns "was born to", "'s parents are", and the inverse relation \parentsOf mean similar things — even if one of them never appears in the training data — and we don't need linked data to know it. Instead of learning one vector per pattern, use a neural network over the words of the pattern to produce its embedding from text.
Verga et al. (2016), https://arxiv.org/pdf/1511.06396v2.pdf

Composing Relational Paths
Example path: Microsoft —isBasedIn→ Seattle —stateLocatedIn→ Washington —countryLocatedIn→ USA. A recurrent network composes the relation embeddings step by step along the path (producing intermediates like stateBasedIn) to predict the composite relation countryBasedIn(Microsoft, USA).
Neelakantan et al. (2015), http://www.aaai.org/ocs/index.php/sss/sss15/paper/viewfile/10254/10032
Lin et al., EMNLP (2015), https://arxiv.org/pdf/1506.00379.pdf
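The step-by-step composition can be sketched with a toy recurrent cell. Everything here is an assumption for illustration — the relation names, random embeddings, and the simple tanh RNN stand in for the trained networks in the cited papers.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 8

# Hypothetical embeddings for the relations along the path
# Microsoft -> Seattle -> Washington -> USA.
rel_path = [rng.normal(size=k) for _ in ("isBasedIn", "stateLocatedIn",
                                         "countryLocatedIn")]

# Toy RNN cell: h_t = tanh(W h_{t-1} + V r_t); the final state h_T is
# the embedding of the whole path.
W = rng.normal(scale=0.1, size=(k, k))
V = rng.normal(scale=0.1, size=(k, k))

def compose_path(relations):
    h = np.zeros(k)
    for r in relations:          # consume one relation edge per step
        h = np.tanh(W @ h + V @ r)
    return h

path_vec = compose_path(rel_path)
# The composed vector can then be compared (e.g. by dot product) against
# a candidate relation embedding such as countryBasedIn.
```

Training pushes compose_path(isBasedIn, stateLocatedIn, countryLocatedIn) close to the embedding of countryBasedIn, so unseen paths generalize through shared relation vectors.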

Review: Embedding Techniques
- Two related tasks: Relation Extraction from text, and Graph (or Link) Completion.
- Relation Extraction: matrix factorization approaches.
- Graph Completion: tensor factorization approaches.
- Compositional neural models: compose over dependency paths and relation paths.

Tutorial Overview
Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction (including using embeddings in MLNs)
Part 4: Critical Analysis