BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS. Pauli Miettinen, TML, 27 September 2013



BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS
Boolean decompositions are like normal decompositions, except that:
- the input is a binary matrix or tensor,
- the factors are binary,
- the arithmetic is Boolean (so reconstructions are binary), and
- the error measure is (usually) the Hamming distance (L1).

BOOLEAN ARITHMETIC
An idempotent, anti-negative semiring ({0,1}, ∨, ∧): like normal arithmetic, but addition is defined as 1 + 1 = 1.
A Boolean matrix is a binary (0/1) matrix endowed with Boolean arithmetic.
The Boolean matrix product is defined as (A ∘ B)_ij = ⋁_{l=1}^R a_il b_lj.
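As a minimal sketch (not from the slides), the Boolean matrix product can be computed in NumPy by taking the ordinary product and thresholding; `boolean_matmul` is a hypothetical helper name:

```python
import numpy as np

def boolean_matmul(A, B):
    """Boolean matrix product: (A ∘ B)_ij = OR_l (A_il AND B_lj)."""
    A = np.asarray(A, dtype=int)
    B = np.asarray(B, dtype=int)
    # Ordinary matrix product followed by thresholding: a positive entry
    # means at least one l with A_il = B_lj = 1, so the Boolean OR is 1.
    return (A @ B > 0).astype(int)

A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])
print(boolean_matmul(A, B))  # [[1 1], [1 1]] — note 1 + 1 = 1
```

The thresholding trick works because Boolean addition only collapses counts greater than one down to one; it does not change which entries are nonzero.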

WHY BOOLEAN ARITHMETIC?
Boolean decompositions find a different type of structure than decompositions under normal arithmetic: not better, not worse, just different.
- Normal decomposition: a value is the sum of the values from the rank-1 components.
- Boolean decomposition: a value is 1 if any rank-1 component has a 1 in that location.

WHY BOOLEAN, CONT'D
Boolean arithmetic can be interpreted as set operations.
[Figure: a 3-by-3 Boolean matrix over rows 1, 2, 3 and columns A, B, C, shown equivalently as a diagram of overlapping sets.]

EXAMPLE
Real e-mail analysis with topics such as discrete math, programming, and the Internet.
[Figure: a 3-by-3 Boolean matrix over persons A, B, C factored as the Boolean product of two binary factor matrices.]

RESULTS ON BOOLEAN MATRIX FACTORIZATION
- Computing the Boolean rank is NP-hard, and as hard to approximate as the minimum chromatic number.
- The minimum-error decomposition is NP-hard, and hard to approximate in both the additive and the multiplicative sense.
- Given A and B, finding C such that B ∘ C is close to A is hard even to approximate, so alternating updates are hard!

SOME MORE RESULTS
- The Boolean rank can be logarithmic in the real rank.
- Sparse matrices have sparse (exact) factorizations.
- The rank of the decomposition can be selected automatically using the MDL principle.
- A planted rank-1 matrix can be recovered under XOR noise (under certain assumptions).

SOME ALGORITHMS
- Alternating least squares: proposed in the psychometric literature in the early 1980s.
- Asso [M. et al. 2006 & 2008]: builds candidate factors based on the correlation matrix and greedily selects among them.
- Panda [Lucchese et al. 2010]: expands monochromatic core patterns (tiles) based on an MDL-esque rule.
- Various tiling algorithms: do not allow expressing a 0 in the data as a 1 in the factorization (no false positives).
- Binary factorizations: normal algebra but binary factors.

SOME APPLICATIONS
- Explorative data analysis: psychometrics, role mining, pattern mining
- Bipartite community detection
- Binary matrix completion (but requires {0, 1, ?} data)
- Co-clustering-style applications

RANK-1 (BOOLEAN) TENSORS
A rank-1 matrix is the outer product of two vectors, X = a b^T; a rank-1 3-way tensor is the outer product of three vectors, X = a ⊗ b ⊗ c, i.e. x_ijk = a_i b_j c_k.
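A small sketch (not from the slides) of building a binary rank-1 tensor as an outer product in NumPy:

```python
import numpy as np

# A rank-1 3-way tensor is the outer product of three vectors:
# x_ijk = a_i * b_j * c_k (for binary vectors this is also the AND).
a = np.array([1, 0, 1])
b = np.array([1, 1, 0])
c = np.array([0, 1])
X = np.einsum('i,j,k->ijk', a, b, c)  # shape (3, 3, 2), binary entries
print(X[0, 0, 1])  # 1, since a_0 = b_0 = c_1 = 1
```

For binary vectors the ordinary outer product and the Boolean one coincide, since no entry of a rank-1 component can exceed 1.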

THE BOOLEAN CP TENSOR DECOMPOSITION
X ≈ (a_1 ⊗ b_1 ⊗ c_1) ∨ (a_2 ⊗ b_2 ⊗ c_2) ∨ … ∨ (a_R ⊗ b_R ⊗ c_R), i.e. x_ijk ≈ ⋁_{r=1}^R a_ir b_jr c_kr.

THE BOOLEAN CP TENSOR DECOMPOSITION
Equivalently, with factor matrices A, B, and C whose columns are the vectors a_r, b_r, and c_r: x_ijk ≈ ⋁_{r=1}^R a_ir b_jr c_kr.
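A sketch (not from the slides) of reconstructing a tensor from binary CP factor matrices; `boolean_cp_reconstruct` is a hypothetical helper name:

```python
import numpy as np

def boolean_cp_reconstruct(A, B, C):
    """x_ijk = OR_r (a_ir AND b_jr AND c_kr) from binary factor matrices."""
    # Sum the rank-1 outer products, then threshold: a positive count
    # means at least one rank-1 component covers the cell.
    T = np.einsum('ir,jr,kr->ijk', A, B, C)
    return (T > 0).astype(int)

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (4, 2))
B = rng.integers(0, 2, (5, 2))
C = rng.integers(0, 2, (3, 2))
X = boolean_cp_reconstruct(A, B, C)
print(X.shape)  # (4, 5, 3)
```

Each column triple (a_r, b_r, c_r) contributes one rank-1 binary tensor; the Boolean sum is the element-wise OR over the R components.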

FREQUENT TRI-ITEMSET MINING
Rank-1 N-way binary tensors define an N-way itemset; in particular, rank-1 binary matrices define an itemset.
In itemset mining the induced sub-tensor must be full of 1s; here, the itemsets can have holes.
Boolean CP decomposition = lossy N-way tiling.

BOOLEAN TENSOR RANK
The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic:
X = (a_1 ⊗ b_1 ⊗ c_1) ∨ (a_2 ⊗ b_2 ⊗ c_2) ∨ … ∨ (a_R ⊗ b_R ⊗ c_R).

SOME RESULTS ON RANKS
Normal tensor rank:
- NP-hard to compute
- the rank of an n-by-m-by-k tensor can exceed min{n, m, k}, but is at most min{nm, nk, mk}
Boolean tensor rank:
- NP-hard to compute
- the rank of an n-by-m-by-k tensor can exceed min{n, m, k}, but is at most min{nm, nk, mk}

SPARSITY
- A binary N-way tensor X of Boolean tensor rank R has a Boolean rank-R CP decomposition with factor matrices A_1, A_2, …, A_N such that Σ_i |A_i| ≤ N|X|, where |·| counts the number of 1s.
- A binary matrix X of Boolean rank R has a Boolean rank-R decomposition A ∘ B such that |A| + |B| ≤ 2|X|.
Both results are existential only, and extend to approximate decompositions.

SIMPLE ALGORITHM
We can use the typical alternating algorithm with Boolean algebra, updating one factor at a time via the matricizations
X_(1) = A (C ⊙ B)^T, X_(2) = B (C ⊙ A)^T, X_(3) = C (B ⊙ A)^T,
where ⊙ is the Khatri–Rao product; each update is a Boolean matrix factorization applied to a matricization with the other factors fixed.
- Finding the optimal projection is NP-hard even to approximate.
- Good initial values are needed due to multiple local minima.
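The matricization identity can be verified numerically. A sketch (not from the slides), assuming the convention that the mode-1 matricization orders columns with the mode-2 index varying fastest; `bool_mm` and `khatri_rao` are hypothetical helper names:

```python
import numpy as np

def bool_mm(P, Q):
    """Boolean matrix product via threshold."""
    return (P.astype(int) @ Q.astype(int) > 0).astype(int)

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    return np.einsum('kr,jr->kjr', C, B).reshape(-1, C.shape[1])

rng = np.random.default_rng(1)
A = rng.integers(0, 2, (4, 3))
B = rng.integers(0, 2, (5, 3))
C = rng.integers(0, 2, (6, 3))

# Boolean CP tensor and its mode-1 matricization X_(1):
# column index = k * 5 + j, i.e. the mode-2 index j varies fastest.
X = (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)
X1 = X.transpose(0, 2, 1).reshape(4, -1)

# The identity X_(1) = A ∘ (C ⊙ B)^T holds under Boolean arithmetic.
print(np.array_equal(X1, bool_mm(A, khatri_rao(C, B).T)))  # True
```

This is why each alternating step reduces to an ordinary Boolean matrix factorization problem with one factor fixed.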

THE BOOLEAN TUCKER TENSOR DECOMPOSITION
With a binary core tensor G and binary factor matrices A, B, and C: x_ijk ≈ ⋁_{p=1}^P ⋁_{q=1}^Q ⋁_{r=1}^R g_pqr a_ip b_jq c_kr.
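A sketch (not from the slides) of the Boolean Tucker reconstruction; it also checks the later claim that CP is Tucker with a hyper-diagonal core. `boolean_tucker_reconstruct` is a hypothetical helper name:

```python
import numpy as np

def boolean_tucker_reconstruct(G, A, B, C):
    """x_ijk = OR_{p,q,r} (g_pqr AND a_ip AND b_jq AND c_kr)."""
    T = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
    return (T > 0).astype(int)

# With a hyper-diagonal core, Boolean Tucker reduces to Boolean CP.
R = 2
G = np.zeros((R, R, R), dtype=int)
G[np.arange(R), np.arange(R), np.arange(R)] = 1
A = np.array([[1, 0], [1, 1], [0, 1]])
B = np.array([[1, 1], [0, 1]])
C = np.array([[1, 0], [0, 1]])
X = boolean_tucker_reconstruct(G, A, B, C)
cp = (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)
print(np.array_equal(X, cp))  # True
```

Note the triple OR over the core: every nonzero core cell g_pqr links one column of each factor matrix, which is what gives the core its global effect.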

THE SIMPLE ALGORITHM WITH TUCKER
x_ijk ≈ ⋁_{p=1}^P ⋁_{q=1}^Q ⋁_{r=1}^R g_pqr a_ip b_jq c_kr
- The core tensor has global effects, so updates are hard.
- The factors are not orthogonal.
- If we assume the core tensor is small, we can afford more time per element.
- In the Boolean case, many changes make no difference.

WALK'N'MERGE: A MORE SCALABLE ALGORITHM
Idea: for an exact decomposition, we could find all N-way tiles and then select the ones we need among them.
Problem: for approximate decompositions, there might not be any big tiles, so we need to find tiles with holes, i.e. dense rank-1 subtensors.

TENSORS AS GRAPHS
Create a graph from the tensor: each 1 in the tensor is one vertex, with an edge between two vertices if they differ in at most one coordinate.
Idea: if two vertices are in the same all-1s rank-1 N-way subtensor, they are at most N steps from each other, so small-diameter subgraphs correspond to dense rank-1 subtensors.
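A toy sketch (not the scalable construction from the paper, which avoids materializing all edge pairs) of building this graph from a small binary tensor; `tensor_to_graph` is a hypothetical helper name:

```python
import numpy as np
from itertools import combinations

def tensor_to_graph(X):
    """Vertices: coordinates of the 1s in X.
    Edges: pairs of distinct vertices that differ in exactly one coordinate.
    Quadratic in the number of 1s -- fine for illustration only."""
    vertices = list(zip(*np.nonzero(X)))
    edges = [
        (u, v)
        for u, v in combinations(vertices, 2)
        if sum(a != b for a, b in zip(u, v)) <= 1
    ]
    return vertices, edges

X = np.zeros((2, 2, 2), dtype=int)
X[0, 0, 0] = X[0, 0, 1] = X[1, 0, 0] = 1
verts, edges = tensor_to_graph(X)
print(len(verts), len(edges))  # 3 2
```

Here (0,0,0) is adjacent to both (0,0,1) and (1,0,0), but those two differ in two coordinates and so are two steps apart, as the N-step bound predicts for N = 3.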

EXAMPLE
[Figure: the frontal slices of a small binary tensor and the corresponding graph, with one vertex per 1: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,4,1), (1,4,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2), (2,3,2), (3,1,2), (3,3,1).]

RANDOM WALKS
We can identify the small-diameter subgraphs using random walks: if many (short) random walks re-visit the same nodes often, those nodes lie on a small-diameter subgraph.
Problem: the random walks might return many overlapping dense areas and miss the smallest rank-1 subtensors.

MERGE
We can exhaustively look for all small (e.g. 2-by-2-by-2) all-1s subtensors outside the already-found dense subtensors, and then merge partially overlapping rank-1 subtensors whenever the resulting subtensor is dense enough.
Result: a Boolean CP decomposition of some rank.
The false-positive rate is controlled by the density threshold, the false-negative rate by the exhaustive search.

MDL STRIKES AGAIN
We have a decomposition of some rank, but what would be a good rank?
Normally it is pre-defined by the user (but how does she know?).
The MDL principle: the best model to describe your data is the one that describes it with the least number of bits.
We can use MDL to choose the rank.

HOW DO YOU COUNT THE BITS?
MDL asks for an exact representation of the data. In the case of Boolean CP, we represent the tensor X with the factor matrices and the error tensor E (the positions where the reconstruction differs from X). The bit strings representing these are encoded to compute the description length.
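A toy sketch of a two-part description length. This is not the encoding used by Walk'n'Merge; it assumes a naive code (count of 1s, then which position set they occupy), and `code_length` and `total_description_length` are hypothetical helper names:

```python
import math
import numpy as np

def code_length(M):
    """Naive code for a binary array: first the number of 1s (one of
    cells + 1 possibilities), then which of the C(cells, ones) position
    sets they occupy. A toy stand-in for real MDL encodings."""
    cells, ones = M.size, int(M.sum())
    return math.log2(cells + 1) + math.log2(math.comb(cells, ones))

def total_description_length(X, A, B, C):
    """L(model) + L(error): bits for the factor matrices plus bits for
    the error tensor E = X XOR reconstruction."""
    recon = (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)
    E = np.bitwise_xor(X, recon)
    return code_length(A) + code_length(B) + code_length(C) + code_length(E)
```

To choose the rank, one would sweep R, fit a rank-R decomposition for each, and keep the R minimizing the total description length: higher rank shifts bits from E into the factors, lower rank does the opposite.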

WHY MDL AND THE TUCKER DECOMPOSITION?
MDL balances accuracy and complexity:
- high rank: more bits in the factor matrices, fewer in the error tensor;
- small rank: fewer bits in the factor matrices, more in the error tensor.
If one mode uses the same factor multiple times, CP contains it multiple times; the Tucker decomposition needs to store that factor only once.

FROM CP TO TUCKER WITH MDL
CP is Tucker with a hyper-diagonal core tensor.
If we can remove a repeated column from a factor matrix and adjust the core accordingly, our encoding is more efficient.
Algorithm: try merging similar factors and see if that reduces the encoding length.

APPLICATION: FACT DISCOVERY
Input: (noun phrase, verbal phrase, noun phrase) triples, non-disambiguated, e.g. from OpenIE.
Goal: find the facts ((entity, relation, entity) triples) underlying the observed data, together with the mappings from surface forms to entities and relations.

CONNECTION TO BOOLEAN TENSORS
We should observe a triple (np_1, vp, np_2) if there exists at least one fact (e_1, r, e_2) such that np_1 is a surface form of e_1, vp is a surface form of r, and np_2 is a surface form of e_2.

CONNECTION TO BOOLEAN TENSORS
What we want is a Boolean Tucker3 decomposition: the core tensor contains the facts, and the factors contain the mappings from entities and relations to surface forms: x_ijk ≈ ⋁_{p=1}^P ⋁_{q=1}^Q ⋁_{r=1}^R g_pqr a_ip b_jq c_kr.

PROS & CONS
Pros:
- Naturally sparse core tensor (the core will be huge, so it must be sparse)
- Natural interpretation
Cons:
- No levels of certainty: a fact either holds or it does not
- Can only handle binary data

EXAMPLE RESULT
From a 39,500-by-8,000-by-21,000 tensor with 804,000 non-zeros:
Subject: claude de lorimier, de lorimier, louis, jean-baptiste
Relation: was born, [[det]] born in
Object: borough of lachine, villa st. pierre, lachine quebec

CONCLUSIONS
Boolean factorizations are more combinatorial in flavour, with interpretations as sets or graphs.
Boolean matrix factorization is computationally harder than most normal factorizations, though with tensors the difference is not as big.

FUTURE DIRECTIONS
- When should one apply Boolean factorizations? More education is needed.
- Better algorithms & implementations: I've been asking for this for 7 years now.

Thank You!