Analogical Inference for Multi-Relational Embeddings


1 Analogical Inference for Multi-Relational Embeddings
Hanxiao Liu, Yuexin Wu, Yiming Yang
Carnegie Mellon University
August 8, 2017

2 Task Description
Multi-Relational Embeddings: finding latent representations of entities and relations. Useful for knowledge base completion (by discovering missing facts), etc.
Novel Contribution: instead of traditional rule-based AI, we impose analogical structures in the learning of entity/relation embeddings.

3 Why Analogy? (a toy example)
Figure: Solar System (blue) vs. Atomic System (red). Nodes: sun, planets, mass, nucleus, electrons, charge. Edge labels: surrounded by, attract, made of, scale down.
Knowing the relational structure in one system helps us understand the other system by analogy.

4 Basic Formulation
Denote by vector $v_e$ the embedding of entity $e$, and by matrix $W_r$ the embedding of relation $r$.
Assume all valid subject-relation-object $(s, r, o)$ triples approximately satisfy
$v_s^\top W_r \approx v_o^\top$ (1)
Define the scoring function of any $(s, r, o)$ triple as
$\phi(s, r, o) = \langle v_s^\top W_r, v_o \rangle = v_s^\top W_r v_o$ (2)
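The scoring function in (2) can be sketched in a few lines of plain Python. The toy vectors and the coordinate-swapping matrix below are hypothetical values chosen for illustration, not embeddings from the talk; the helper names `vec_mat`, `dot`, and `score` are likewise assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, W: Matrix) -> Vector:
    """Row vector times matrix: returns v^T W."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def dot(u: Vector, v: Vector) -> float:
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def score(v_s: Vector, W_r: Matrix, v_o: Vector) -> float:
    """Bilinear score phi(s, r, o) = <v_s^T W_r, v_o> = v_s^T W_r v_o."""
    return dot(vec_mat(v_s, W_r), v_o)


# Hypothetical toy values: W_r swaps the two coordinates.
v_s = [1.0, 2.0]
W_r = [[0.0, 1.0],
       [1.0, 0.0]]

v_o_match = vec_mat(v_s, W_r)   # satisfies v_s^T W_r = v_o^T exactly
v_o_other = [-2.0, -1.0]        # points the opposite way

score_match = score(v_s, W_r, v_o_match)
score_other = score(v_s, W_r, v_o_other)
print(score_match, score_other)
```

An object embedding that matches $v_s^\top W_r$ receives a high score, while one pointing the opposite way receives a low score, which is the behaviour relation (1) asks for.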

5 Real Normal Matrices as Desirable
The family of matrices satisfying:
$W_r^\top W_r = W_r W_r^\top$ (3)
Special cases:
1. Symmetric matrices: $\phi(s, r, o) = \phi(o, r, s)$. E.g. "is identical".
2. Skew-symmetric matrices: $\phi(s, r, o) = -\phi(o, r, s)$. E.g. "is parent of".
3. Orthogonal matrices: useful if $r$ is a bijection (one-to-one mapping).
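A small numerical check of the special cases above, as a sketch with hypothetical 2x2 matrices: symmetric, skew-symmetric, and rotation (orthogonal) matrices all satisfy (3), a shear matrix does not, and the score symmetries follow from the bilinear score in (2).

```python
import math


def transpose(W):
    return [list(col) for col in zip(*W)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_normal(W, tol=1e-9):
    """Check W^T W == W W^T entrywise, up to a tolerance."""
    left, right = matmul(transpose(W), W), matmul(W, transpose(W))
    return all(abs(a - b) <= tol
               for row_l, row_r in zip(left, right)
               for a, b in zip(row_l, row_r))


def score(v_s, W, v_o):
    """Bilinear score v_s^T W v_o."""
    return sum(v_s[i] * W[i][j] * v_o[j]
               for i in range(len(v_s)) for j in range(len(v_o)))


# Hypothetical example matrices.
symmetric = [[2.0, 1.0], [1.0, 3.0]]
skew = [[0.0, 1.0], [-1.0, 0.0]]
theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
shear = [[1.0, 1.0], [0.0, 1.0]]   # not normal

v_s, v_o = [1.0, 2.0], [3.0, -1.0]

print(is_normal(symmetric), is_normal(skew), is_normal(rotation), is_normal(shear))
print(score(v_s, symmetric, v_o), score(v_o, symmetric, v_s))   # equal
print(score(v_s, skew, v_o), score(v_o, skew, v_s))             # opposite signs
```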

6 Commutative Matrices as Necessary
Observation: analogical structures often imply parallelograms, e.g., "man is to king as woman is to queen".
Or, in abstract notation: "a is to b as c is to d".
Diagram: a parallelogram with edges $a \xrightarrow{r} b$, $a \xrightarrow{r'} c$, $b \xrightarrow{r'} d$, $c \xrightarrow{r} d$.
Given the parallelogram, if we know $a \xrightarrow{r} b$ and $a \xrightarrow{r'} c$, then $c \xrightarrow{r} d$ and $b \xrightarrow{r'} d$ can be inferred by symmetry.

7 Commutative Matrices as Necessary (cont'd)
Mathematically, the necessary condition for having an analogical structure is the commutativity of relations:
$r \circ r' = r' \circ r$ (4)
Diagram: the same parallelogram with edges $a \xrightarrow{r} b$, $a \xrightarrow{r'} c$, $b \xrightarrow{r'} d$, $c \xrightarrow{r} d$.
Equivalently, we want the following constraint:
$W_r W_{r'} = W_{r'} W_r$ (5)
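The role of commutativity can be illustrated with a sketch that walks both paths around the parallelogram. It assumes the exact form of relation (1), so that applying a relation means multiplying a row vector by its matrix; the two rotation matrices (which commute) and the rotation/shear pair (which do not) are hypothetical examples.

```python
import math


def vec_mat(v, W):
    """Row vector times matrix: returns v^T W."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol for a, b in zip(u, v))


v_a = [1.0, 0.0]

# Commuting pair: two planar rotations.
W_r, W_rp = rotation(math.pi / 6), rotation(math.pi / 4)
v_b = vec_mat(v_a, W_r)          # a --r--> b
v_c = vec_mat(v_a, W_rp)         # a --r'--> c
d_via_b = vec_mat(v_b, W_rp)     # b --r'--> d
d_via_c = vec_mat(v_c, W_r)      # c --r--> d
commuting_closes = close(d_via_b, d_via_c)

# Non-commuting pair: a 90-degree rotation and a shear.
rot90, shear = rotation(math.pi / 2), [[1.0, 1.0], [0.0, 1.0]]
d1 = vec_mat(vec_mat(v_a, rot90), shear)
d2 = vec_mat(vec_mat(v_a, shear), rot90)
noncommuting_closes = close(d1, d2)

print(commuting_closes, noncommuting_closes)
```

When the two matrices commute, both paths end at the same point $d$, so the parallelogram closes; with the non-commuting pair the two paths end at different points.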

8 Optimization: Straightforward Formulation
Notation: label $y = +1$ for positive examples and $-1$ otherwise; data distribution $\mathcal{D}$; loss function $\ell$.
$\min_{v, W} \; \mathbb{E}_{s,r,o,y \sim \mathcal{D}} \; \ell(\phi_{v,W}(s, r, o), y)$ (6)
s.t. $W_r W_r^\top = W_r^\top W_r \quad \forall r$ (7)
$W_r W_{r'} = W_{r'} W_r \quad \forall r, r'$ (8)
(7) follows the definition of normal matrices. (8) is for the commutative property.
This optimization problem is expensive because (i) the $W_r$'s are fully dense matrices and (ii) there is a large number of equality constraints.

9 Optimization: Complexity Reduced Version
A solution $(v, W)$ of the previous problem can be exactly recovered from a solution $(v', W')$ of the following problem:
$\min_{v', W'} \; \mathbb{E}_{s,r,o,y \sim \mathcal{D}} \; \ell(\phi_{v',W'}(s, r, o), y)$ (9)
Most notably, we show that any $W'_r$ must be block-diagonal with the diagonal block sizes bounded by 2, giving $O(m)$ free parameters in the $m \times m$ matrix.
We now have an unconstrained optimization instead, which is efficiently solved using SGD without projection.
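A sketch of how a block-diagonal relation matrix can be scored in O(m) time without ever building the dense matrix. The parameter layout is an assumption made for illustration: `diag` holds the 1x1 blocks, and each 2x2 block is taken to have the form [[a, -b], [b, a]] and is stored as a pair `(a, b)`. The slide itself only states that block sizes are bounded by 2.

```python
import math


def score_almost_diagonal(v_s, diag, blocks, v_o):
    """Score v_s^T W v_o in O(m), where W has len(diag) scalar blocks
    followed by 2x2 blocks [[a, -b], [b, a]] (assumed layout)."""
    k = len(diag)
    total = sum(s * d * o for s, d, o in zip(v_s[:k], diag, v_o[:k]))
    for i, (a, b) in enumerate(blocks):
        p = k + 2 * i
        s1, s2 = v_s[p], v_s[p + 1]
        o1, o2 = v_o[p], v_o[p + 1]
        total += a * (s1 * o1 + s2 * o2) + b * (s2 * o1 - s1 * o2)
    return total


def to_dense(diag, blocks):
    """Build the equivalent dense m x m matrix, for checking only."""
    m = len(diag) + 2 * len(blocks)
    W = [[0.0] * m for _ in range(m)]
    for i, d in enumerate(diag):
        W[i][i] = d
    for i, (a, b) in enumerate(blocks):
        p = len(diag) + 2 * i
        W[p][p], W[p][p + 1] = a, -b
        W[p + 1][p], W[p + 1][p + 1] = b, a
    return W


def score_dense(v_s, W, v_o):
    """Reference O(m^2) bilinear score."""
    return sum(v_s[i] * W[i][j] * v_o[j]
               for i in range(len(v_s)) for j in range(len(v_o)))


# Hypothetical toy parameters with m = 4.
diag, blocks = [2.0, -1.0], [(0.5, 1.5)]
v_s, v_o = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.0]

fast = score_almost_diagonal(v_s, diag, blocks, v_o)
slow = score_dense(v_s, to_dense(diag, blocks), v_o)
print(fast, slow, math.isclose(fast, slow))
```

The fast path touches each coordinate a constant number of times, which is where the O(m) parameter and time cost comes from.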

10 A Unified View of Existing Work
We explain the strong empirical performance of
DistMult (Yang et al., ICLR 2015)
ComplEx (Trouillon et al., ICML 2016)
HolE (Nickel et al., AAAI 2016)
by showing that they implicitly impose analogical structures and are restricted cases of ours.

11 Connections to Existing Work
Multiplicative Embeddings (DistMult)
$\phi(s, r, o) = \langle v_s, v_r, v_o \rangle$ (10)
where $v_s, v_r, v_o \in \mathbb{R}^m, \; \forall s, r, o$ (11)
DistMult embeddings of size $m$ can be fully recovered by ANALOGY embeddings of size $m$.
Intuition: $v_r$ can be viewed as a diagonal matrix $W_r \overset{\text{def}}{=} \mathrm{diag}(v_r)$. Diagonal matrices always commute.
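The DistMult intuition can be checked numerically. This is a sketch with hypothetical toy vectors: the trilinear product in (10) equals the bilinear score with $W_r = \mathrm{diag}(v_r)$, and two diagonal matrices commute.

```python
def trilinear(v_s, v_r, v_o):
    """DistMult score <v_s, v_r, v_o> = sum_k v_s[k] * v_r[k] * v_o[k]."""
    return sum(s * r * o for s, r, o in zip(v_s, v_r, v_o))


def diag(v):
    """Diagonal matrix with v on the diagonal."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def bilinear(v_s, W, v_o):
    """Bilinear score v_s^T W v_o."""
    return sum(v_s[i] * W[i][j] * v_o[j]
               for i in range(len(v_s)) for j in range(len(v_o)))


# Hypothetical toy embeddings.
v_s, v_o = [1.0, 2.0, 3.0], [4.0, 0.0, -1.0]
v_r, v_r2 = [0.5, -1.0, 2.0], [3.0, 1.0, -2.0]

distmult_score = trilinear(v_s, v_r, v_o)
analogy_score = bilinear(v_s, diag(v_r), v_o)
diagonals_commute = matmul(diag(v_r), diag(v_r2)) == matmul(diag(v_r2), diag(v_r))

print(distmult_score, analogy_score, diagonals_commute)
```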

12 Connections to Existing Work
Complex Embeddings (ComplEx)
$\phi(s, r, o) = \Re\left(\langle v_s, v_r, \overline{v_o} \rangle\right)$ (12)
where $v_s, v_r, v_o \in \mathbb{C}^m, \; \forall s, r, o$ (13)
ComplEx embeddings of size $m$ can be fully recovered by ANALOGY embeddings of size $2m$.
Intuition: there exists a bijection between any $a + bj \in \mathbb{C}$ and $\begin{pmatrix} a & -b \\ b & a \end{pmatrix} \in \mathbb{R}^{2 \times 2}$.
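A sketch of the bijection between complex numbers and 2x2 real blocks, using Python's built-in complex type and hypothetical toy embeddings. One detail is an assumption of this sketch rather than something stated on the slide: with the row-vector convention $v_s^\top W_r$ of (2), the block that reproduces the ComplEx score is the transpose of [[a, -b], [b, a]], which is the same family of matrices with the sign of b flipped.

```python
import math


def to_block(z):
    """Map a + bj to the real 2x2 matrix [[a, -b], [b, a]]."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def complex_score(v_s, v_r, v_o):
    """ComplEx score Re(<v_s, v_r, conj(v_o)>)."""
    return sum(s * r * o.conjugate() for s, r, o in zip(v_s, v_r, v_o)).real


def real_score(v_s, v_r, v_o):
    """Same score using real vectors of size 2m and 2x2 real blocks."""
    total = 0.0
    for s, r, o in zip(v_s, v_r, v_o):
        B = to_block(r)
        # [s.real, s.imag] times B^T, i.e. the real and imaginary parts of s * r.
        sr_real = s.real * B[0][0] + s.imag * B[0][1]
        sr_imag = s.real * B[1][0] + s.imag * B[1][1]
        total += sr_real * o.real + sr_imag * o.imag
    return total


# The map preserves multiplication: block(z * w) == block(z) @ block(w).
z, w = 1 + 2j, 2 - 1j
product_preserved = to_block(z * w) == matmul2(to_block(z), to_block(w))

# Hypothetical toy embeddings with m = 2 complex coordinates.
v_s = [1 + 2j, 0.5 - 1j]
v_r = [2 - 1j, -1 + 0.5j]
v_o = [3 + 1j, 1 + 1j]

c_score = complex_score(v_s, v_r, v_o)
r_score = real_score(v_s, v_r, v_o)
print(product_preserved, c_score, r_score)
```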

13 Connections to Existing Work
Holographic Embeddings (HolE)
$\phi(s, r, o) = \langle v_r, v_s \star v_o \rangle$ (14)
where $v_s, v_r, v_o \in \mathbb{R}^m, \; \forall s, r, o$ (15)
and $\star$ denotes circular correlation.
HolE embeddings can be equivalently obtained via
$\phi(s, r, o) = \Re\left(\langle v_s, v_r, \overline{v_o} \rangle\right)$ (16)
where $v_s, v_r, v_o \in \mathrm{FFT}(\mathbb{R}^m) \subset \mathbb{C}^m, \; \forall s, r, o$ (17)
Hence HolE is a restricted case of ComplEx and ANALOGY.
Intuition: circular convolution can be converted into an element-wise product after a Fourier transform.
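The equivalence between (14) and (16) can be verified on a toy example. This sketch uses a naive discrete Fourier transform built on the standard `cmath` module and hypothetical real embeddings. With the unnormalized DFT used here the two scores differ by a constant factor 1/m, which the code applies explicitly; the slide's form (16) leaves that scale implicit.

```python
import cmath
import math


def dft(x):
    """Naive unnormalized DFT: F(x)_k = sum_n x_n * exp(-2*pi*j*k*n/m)."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
            for k in range(m)]


def circular_correlation(a, b):
    """(a star b)_k = sum_i a_i * b_{(i + k) mod m}."""
    m = len(a)
    return [sum(a[i] * b[(i + k) % m] for i in range(m)) for k in range(m)]


def hole_score(v_s, v_r, v_o):
    """HolE score <v_r, v_s star v_o> on real embeddings."""
    return sum(r * c for r, c in zip(v_r, circular_correlation(v_s, v_o)))


def fourier_score(v_s, v_r, v_o):
    """ComplEx-style score on the Fourier transforms, rescaled by 1/m."""
    f_s, f_r, f_o = dft(v_s), dft(v_r), dft(v_o)
    total = sum(s * r * o.conjugate() for s, r, o in zip(f_s, f_r, f_o))
    return total.real / len(v_s)


# Hypothetical toy real embeddings with m = 3.
v_s, v_r, v_o = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [4.0, 0.0, -1.0]

direct = hole_score(v_s, v_r, v_o)
via_fourier = fourier_score(v_s, v_r, v_o)
print(direct, via_fourier)
```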

14 Experiments
Implementation Details
Use logistic loss:
$\ell(\phi(s, r, o), y) = -\log \sigma(y \, \phi(s, r, o))$ (18)
Optimization: Asynchronous AdaGrad (HogWild!)
For each valid $(s, r, o)$, generate negative examples $(s', r, o)$, $(s, r', o)$, $(s, r, o')$ by corrupting $s$, $r$, $o$.
Evaluation: Hits@k and Mean Reciprocal Rank (MRR)
Benchmark datasets: FreeBase-15K and WordNet-18.
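The training loss, the negative sampling step, and the ranking metrics above can be sketched as follows. The function names, the uniform sampling of replacements, and the toy entity and relation names are assumptions made for illustration; the slide does not say how corrupted elements are drawn or whether corrupted triples that happen to be valid are filtered out during training.

```python
import math
import random


def logistic_loss(score, y):
    """-log sigmoid(y * score) = log(1 + exp(-y * score)), computed stably."""
    z = y * score
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def corrupt(triple, entities, relations, rng):
    """Return three negatives, corrupting s, r, and o one at a time.

    Assumption: replacements are drawn uniformly from the other candidates.
    """
    s, r, o = triple
    s_neg = rng.choice([e for e in entities if e != s])
    r_neg = rng.choice([x for x in relations if x != r])
    o_neg = rng.choice([e for e in entities if e != o])
    return [(s_neg, r, o), (s, r_neg, o), (s, r, o_neg)]


def mrr_and_hits(ranks, k):
    """Mean reciprocal rank and Hits@k from 1-based ranks of the true answers."""
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = sum(1 for rank in ranks if rank <= k) / len(ranks)
    return mrr, hits


# Hypothetical toy vocabulary.
entities = ["sun", "planets", "nucleus", "electrons"]
relations = ["surrounded_by", "attracts"]
rng = random.Random(0)

negatives = corrupt(("sun", "surrounded_by", "planets"), entities, relations, rng)
loss_at_zero = logistic_loss(0.0, +1)
mrr, hits_at_3 = mrr_and_hits([1, 2, 4], k=3)
print(negatives, loss_at_zero, mrr, hits_at_3)
```

A positive triple contributes `logistic_loss(score, +1)` and each negative contributes `logistic_loss(score, -1)`, matching the labels $y = \pm 1$ used in the optimization problem.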

15 Results (filt.)
Table columns: Models, WN18, FB15K.
Models: Unstructured, RESCAL, NTN, SME, SE, LFM, TransH, TransE, TransR, TKRL (73.4), RTransE (76.2), TransD, CTransR, KG2E, STransE, DistMult, TransSparse, PTransE-MUL (77.7), PTransE-RNN (82.2), PTransE-ADD (84.6), NLF (+external data), ComplEx, HolE, Our ANALOGY.

16 Results & MRR
Table columns, for each of WN18 and FB15K: MRR (filt.), MRR (raw), (filt.), (filt.).
Models: RESCAL, TransE, DistMult, HolE, ComplEx, Our ANALOGY.

17 Scalability
The algorithm scales linearly with the embedding size.
Figure: CPU run time per epoch (secs) of ANALOGY on FB15K and WN, plotted against embedding size and against number of threads.
Intuition: $O(m)$ for almost-diagonal matrices instead of $O(m^2)$ for dense matrices.

18 Conclusion
Contributions:
A new framework that explicitly exploits analogy in a differentiable manner.
A fast algorithm with linear scalability.
A unified view of several representative works.
Future work: other applications where analogies might be useful (Machine Translation, Image Captioning, etc.).

19 Poster #51 Code: Thank You!


Analogical Inference for Multi-relational Embeddings
Hanxiao Liu, Yuexin Wu, Yiming Yang
arXiv:1705.02426v2 [cs.LG] 6 Jul 2017
Abstract
Large-scale multi-relational embedding refers