BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS. Pauli Miettinen TML September 2013

1 BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS Pauli Miettinen TML September 2013

2 BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS Boolean decompositions are like normal decompositions, except that: the input is a binary matrix or tensor; the factors are binary; the arithmetic is Boolean (so reconstructions are binary); the error measure is (usually) the Hamming distance (L1).

3 BOOLEAN ARITHMETIC Idempotent, anti-negative semiring $(\{0,1\}, \lor, \land)$. Like normal arithmetic, but addition is defined so that 1 + 1 = 1. A Boolean matrix is a binary (0/1) matrix endowed with Boolean arithmetic. The Boolean matrix product is defined as $(B \circ C)_{ij} = \bigvee_{l=1}^{R} b_{il} c_{lj}$.
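
A minimal sketch of the Boolean matrix product in Python with NumPy, compared with the ordinary product. The function name `boolean_product` and the two small factor matrices are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is OR over l of (B[i, l] AND C[l, j])."""
    B = np.asarray(B, dtype=bool)
    C = np.asarray(C, dtype=bool)
    # Count the matching 1s with an ordinary integer product, then threshold,
    # which is the same as using 1 + 1 = 1.
    return (B.astype(int) @ C.astype(int)) > 0

# Hypothetical example factors.
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])
C = np.array([[1, 1, 0],
              [0, 1, 1]])

normal = B @ C                    # ordinary arithmetic: the centre entry is 2
boolean = boolean_product(B, C)   # Boolean arithmetic: the centre entry is 1
print(normal)
print(boolean.astype(int))
```

The two results differ only where more than one rank-1 component covers the same cell.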

4 WHY BOOLEAN ARITHMETIC? Boolean decompositions find a different type of structure than decompositions under normal arithmetic: not better, not worse, just different. Normal decomposition: a value is the sum of the values from the rank-1 components. Boolean decomposition: a value is 1 if any rank-1 component has a 1 in that location.

5 WHY BOOLEAN CONT'D Boolean arithmetic can be interpreted as set operations. [Figure: diagram with labels A, B, C] Pauli Miettinen 24 September 2012

6 EXAMPLE [Figure: binary matrix equation with the labels Real analysis, Discr. math., Programming, Contacts, Internet and A, B, C]

7 RESULTS ON BOOLEAN MATRIX FACTORIZATION Computing the Boolean rank is NP-hard, and as hard to approximate as the minimum chromatic number. Minimum-error decomposition is NP-hard, and hard to approximate in both the additive and the multiplicative sense. Given A and B, finding C such that $B \circ C$ is close to A is hard even to approximate. Alternating updates are hard!

8 SOME MORE RESULTS The Boolean rank can be a logarithm of the real rank. Sparse matrices have sparse (exact) factorizations. The rank of the decomposition can be selected automatically using the MDL principle. A planted rank-1 matrix can be recovered under XOR noise (under certain assumptions).

9 SOME ALGORITHMS Alternating least-squares: proposed in the psychometric literature in the early 1980s. Asso [M. et al & 2008]: builds candidate factors based on a correlation matrix and greedily selects them. Panda [Lucchese et al. 2010]: expands monochromatic core patterns (tiles) based on an MDL-esque rule. Various tiling algorithms: do not allow expressing a 0 in the data as a 1 in the factorization (false positives). Binary factorizations: normal algebra but binary factors.

10 SOME APPLICATIONS Explorative data analysis; psychometrics; role mining; pattern mining; bipartite community detection; binary matrix completion (but requires {0, 1, ?} data); co-clustering-y applications.

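11 RANK-1 (BOOLEAN) TENSORS [Figure: a rank-1 matrix X as the outer product of the vectors a and b, and a rank-1 3-way tensor X as the outer product of the vectors a, b and c]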

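12 THE BOOLEAN CP TENSOR DECOMPOSITION [Figure: X approximated by the Boolean sum of R rank-1 tensors with factor vectors $a_r$, $b_r$, $c_r$, $r = 1, \dots, R$] $x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$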

13 THE BOOLEAN CP TENSOR DECOMPOSITION [Figure: X approximated using the factor matrices A, B and C] $x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
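
The element-wise definition of the Boolean CP decomposition can be sketched as follows. This is only an illustration: the function name `boolean_cp` and the small factor matrices are assumptions made for the example.

```python
import numpy as np

def boolean_cp(A, B, C):
    """Boolean CP reconstruction: x_ijk = OR over r of (a_ir AND b_jr AND c_kr)."""
    counts = np.einsum('ir,jr,kr->ijk',
                       A.astype(int), B.astype(int), C.astype(int))
    return counts > 0  # any covering rank-1 component makes the cell 1

# Hypothetical factor matrices with R = 2 columns each.
A = np.array([[1, 0], [1, 1], [0, 1]], dtype=bool)  # 3 x 2
B = np.array([[1, 0], [1, 1]], dtype=bool)          # 2 x 2
C = np.array([[1, 1], [0, 1]], dtype=bool)          # 2 x 2

X = boolean_cp(A, B, C)
print(X.astype(int))
```

The two rank-1 components cover four cells each and share the cell (1, 1, 0), so the reconstruction has seven 1s rather than eight.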

14 FREQUENT TRI-ITEMSET MINING Rank-1 N-way binary tensors define an N-way itemset; in particular, rank-1 binary matrices define an itemset. In itemset mining the induced sub-tensor must be full of 1s; here, the itemsets can have holes. Boolean CP decomposition = lossy N-way tiling.

15 BOOLEAN TENSOR RANK The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic. [Figure: X written exactly as the Boolean sum of R rank-1 tensors with factor vectors $a_r$, $b_r$, $c_r$, $r = 1, \dots, R$]

16 SOME RESULTS ON RANKS The normal tensor rank is NP-hard to compute. The normal tensor rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}. The Boolean tensor rank is NP-hard to compute. The Boolean tensor rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}.
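
One way to see the upper bound is a trivial exact decomposition that uses one rank-1 tensor per non-empty fiber. The sketch below does this along the third mode, which needs at most nm components; repeating the construction along the other modes gives nk or mk. The function name `trivial_boolean_cp` and the random test tensor are assumptions for illustration.

```python
import numpy as np

def trivial_boolean_cp(X):
    """Exact Boolean CP with one rank-1 tensor per non-empty fiber X[i, j, :]."""
    n, m, k = X.shape
    pairs = [(i, j) for i in range(n) for j in range(m) if X[i, j, :].any()]
    R = len(pairs)
    A = np.zeros((n, R), dtype=bool)
    B = np.zeros((m, R), dtype=bool)
    C = np.zeros((k, R), dtype=bool)
    for r, (i, j) in enumerate(pairs):
        A[i, r] = True          # indicator of row i
        B[j, r] = True          # indicator of column j
        C[:, r] = X[i, j, :]    # the fiber itself
    return A, B, C

rng = np.random.default_rng(0)
X = rng.random((3, 4, 5)) < 0.4   # hypothetical binary tensor
A, B, C = trivial_boolean_cp(X)
recon = np.einsum('ir,jr,kr->ijk',
                  A.astype(int), B.astype(int), C.astype(int)) > 0
print(A.shape[1], "components, exact:", np.array_equal(recon, X))
```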

17 SPARSITY A binary N-way tensor X of Boolean tensor rank R has a Boolean rank-R CP decomposition with factor matrices $A_1, A_2, \dots, A_N$ such that $\sum_i |A_i| \le N |X|$, where $|\cdot|$ denotes the number of 1s. A binary matrix X of Boolean rank R with $|X|$ 1s has a Boolean rank-R decomposition $A \circ B$ such that $|A| + |B| \le 2 |X|$. Both results are existential only and extend to approximate decompositions.

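18 SIMPLE ALGORITHM We can use a typical alternating algorithm with Boolean algebra, based on the matricizations $X_{(1)} = A \circ (C \odot B)^T$, $X_{(2)} = B \circ (C \odot A)^T$, $X_{(3)} = C \circ (B \odot A)^T$, where $\circ$ is the Boolean matrix product and $\odot$ is the Khatri-Rao product. Finding the optimal projection is NP-hard even to approximate. Good initial values are needed due to multiple local minima; they are obtained by applying Boolean matrix factorization to the matricizations.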
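
The mode-1 identity can be checked numerically. The sketch below assumes the common unfolding convention in which the second index varies fastest along the columns of the mode-1 matricization; the helper names and the random factors are illustrative, not from the slides.

```python
import numpy as np

def boolean_product(P, Q):
    """Boolean matrix product via an integer product and a threshold."""
    return (P.astype(int) @ Q.astype(int)) > 0

def khatri_rao(C, B):
    """Column-wise Kronecker product: row k * J + j of column r is c_kr AND b_jr."""
    K, R = C.shape
    J, _ = B.shape
    prod = np.einsum('kr,jr->kjr', C.astype(int), B.astype(int))
    return prod.reshape(K * J, R) > 0

def unfold_mode1(X):
    """Mode-1 matricization with column index k * J + j."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

rng = np.random.default_rng(0)
A = rng.random((4, 2)) < 0.5   # hypothetical binary factors, R = 2
B = rng.random((3, 2)) < 0.5
C = rng.random((5, 2)) < 0.5

# Boolean CP reconstruction of the tensor from its factors.
X = np.einsum('ir,jr,kr->ijk',
              A.astype(int), B.astype(int), C.astype(int)) > 0

X1 = boolean_product(A, khatri_rao(C, B).T)
print(np.array_equal(X1, unfold_mode1(X)))
```

With the other factors fixed, updating A is then a Boolean matrix problem on the unfolded tensor, which is the hard projection step the slide refers to.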

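19 THE BOOLEAN TUCKER TENSOR DECOMPOSITION [Figure: X approximated using the core tensor G and the factor matrices A, B and C] $x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$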
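
A minimal sketch of the Boolean Tucker reconstruction, written directly from the element-wise formula. The function name `boolean_tucker`, the core and the factor matrices are assumptions chosen for the example.

```python
import numpy as np

def boolean_tucker(G, A, B, C):
    """x_ijk = OR over (p, q, r) of (g_pqr AND a_ip AND b_jq AND c_kr)."""
    counts = np.einsum('pqr,ip,jq,kr->ijk',
                       G.astype(int), A.astype(int),
                       B.astype(int), C.astype(int))
    return counts > 0

# Hypothetical 2 x 2 x 2 core with two 1s; both use column 0 of B.
G = np.zeros((2, 2, 2), dtype=bool)
G[0, 0, 0] = True
G[1, 0, 1] = True

A = np.array([[1, 0], [0, 1], [1, 1]], dtype=bool)  # 3 x P
B = np.array([[1, 0], [1, 1]], dtype=bool)          # 2 x Q
C = np.array([[1, 0], [0, 1]], dtype=bool)          # 2 x R

X = boolean_tucker(G, A, B, C)
print(X.astype(int))
```

Each 1 in the core selects one column from each factor matrix, so a single factor column can take part in several rank-1 components.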

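20 THE SIMPLE ALGORITHM WITH TUCKER The core tensor has global effects: updates are hard, and the factors are not orthogonal. Assume the core tensor is small: we can afford more time per element. In the Boolean case many changes make no difference. [Figure: X approximated using the core tensor G and the factor matrices A, B and C] $x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$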

21 WALK'N'MERGE: MORE SCALABLE ALGORITHM Idea: for exact decomposition, we could find all N-way tiles; then we only need to find the ones we need among them. Problem: for approximate decompositions, there might not be any big tiles; we need to find tiles with holes, i.e. dense rank-1 subtensors.

22 TENSORS AS GRAPHS Create a graph from the tensor. Each 1 in the tensor is one vertex in the graph, with an edge between two vertices if they differ in at most one coordinate. Idea: if two vertices are in the same all-1s rank-1 N-way subtensor, they are at most N steps from each other. Small-diameter subgraphs correspond to dense rank-1 subtensors.

23 EXAMPLE [Figure: example tensor and its graph, with vertices labelled 1,1,1; 1,1,2; 1,2,1; 1,2,2; 1,4,1; 1,4,2; 2,1,1; 2,1,2; 2,2,1; 2,2,2; 2,3,2; 3,1,2; 3,3,1]
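
Treating the thirteen coordinate triples of this example as the positions of the 1s (an assumption about the lost figure), the graph construction can be sketched in plain Python. The helper names are illustrative.

```python
from collections import deque
from itertools import combinations

# Positions of the 1s, taken from the example (1-based coordinates).
ones = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 4, 1), (1, 4, 2),
        (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2), (2, 3, 2), (3, 1, 2),
        (3, 3, 1)]

def differ_in_one(u, v):
    """True if two distinct positions differ in exactly one coordinate."""
    return sum(x != y for x, y in zip(u, v)) == 1

# One vertex per 1; an edge when the positions differ in one coordinate.
adjacency = {v: set() for v in ones}
for u, v in combinations(ones, 2):
    if differ_in_one(u, v):
        adjacency[u].add(v)
        adjacency[v].add(u)

def distance(source, target):
    """Number of steps on a shortest path (BFS), or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return seen[u]
        for w in adjacency[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return None

# (1,1,1) and (2,2,2) lie in the all-1s 2-by-2-by-2 block, so they are
# at most N = 3 steps apart.
print(distance((1, 1, 1), (2, 2, 2)))
```

In this data the vertex (3,3,1) has no neighbours at all, so no short walk can reach it from the dense block.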

24 RANDOM WALKS We can identify the small-diameter subgraphs by random walks: if many (short) random walks re-visit the same nodes often, they're on a small-diameter subgraph. Problem: the random walks might return many overlapping dense areas and miss the smallest rank-1 decompositions.

25 MERGE We can exhaustively look for all small (e.g. 2-by-2-by-2) all-1s sub-tensors outside the already-found dense subtensors. We can now merge all partially overlapping rank-1 subtensors if the resulting subtensor is dense enough. Result: a Boolean CP-decomposition of some rank. The false positive rate is controlled by the density, the false negative rate by the exhaustive search.
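
The merging rule can be sketched as follows. This is a simplified illustration, not the published algorithm: blocks are represented as one index set per mode, the merged block is the mode-wise union, and the names `overlaps`, `block_density`, `try_merge` and the threshold value are assumptions.

```python
import numpy as np

def overlaps(block1, block2):
    """Two blocks share at least one cell if their index sets meet in every mode."""
    return all(s1 & s2 for s1, s2 in zip(block1, block2))

def block_density(X, block):
    """Fraction of 1s inside the sub-tensor spanned by the block's index sets."""
    rows, cols, tubes = (sorted(s) for s in block)
    return X[np.ix_(rows, cols, tubes)].mean()

def try_merge(X, block1, block2, threshold):
    """Return the merged block if the blocks overlap and the result is dense enough."""
    if not overlaps(block1, block2):
        return None
    merged = tuple(s1 | s2 for s1, s2 in zip(block1, block2))
    return merged if block_density(X, merged) >= threshold else None

# Hypothetical data: a 3-by-2-by-2 region with one hole, plus a 2-by-2-by-2 cube.
X = np.zeros((5, 5, 5), dtype=bool)
X[0:3, 0:2, 0:2] = True
X[2, 0, 0] = False
X[1:3, 1:3, 1:3] = True

b1 = ({0, 1}, {0, 1}, {0, 1})
b2 = ({1, 2}, {0, 1}, {0, 1})
b3 = ({1, 2}, {1, 2}, {1, 2})

print(try_merge(X, b1, b2, threshold=0.8))  # merged: density 11/12
print(try_merge(X, b1, b3, threshold=0.8))  # rejected: density 17/27
```

Lowering the threshold accepts more merges and therefore more false positives, which is the trade-off the slide describes.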

26 MDL STRIKES AGAIN We have a decomposition with some rank, but what would be a good rank? Normally it is pre-defined by the user (but how does she know?). MDL principle: the best model to describe your data is the one that does it with the least number of bits. We can use MDL to choose the rank.

27 HOW DO YOU COUNT THE BITS? MDL asks for an exact representation of the data. In the case of Boolean CP, we represent the tensor X with the factor matrices and the error tensor E. The bit-strings representing these are encoded to compute the description length.
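
To make the bit counting concrete, here is a rough sketch with a deliberately naive encoding, which is an assumption and not the encoding used by the author: each binary array is described by its number of 1s followed by the index of their arrangement. The error tensor is taken to be the element-wise XOR of X and the reconstruction.

```python
import numpy as np
from math import comb, log2

def bits_for_binary_array(arr):
    """Naive code: send the number of 1s, then which of the C(n, k) layouts it is."""
    n, k = arr.size, int(arr.sum())
    return log2(n + 1) + log2(comb(n, k))

def description_length(X, A, B, C):
    """Bits for the factor matrices plus bits for the error tensor E."""
    recon = np.einsum('ir,jr,kr->ijk',
                      A.astype(int), B.astype(int), C.astype(int)) > 0
    E = np.logical_xor(X, recon)
    model_bits = sum(bits_for_binary_array(F) for F in (A, B, C))
    return model_bits + bits_for_binary_array(E)

# Hypothetical data: a 6-by-6-by-6 tensor holding one 3-by-3-by-3 block of 1s.
a = np.zeros((6, 1), dtype=bool)
a[:3, 0] = True
X = np.einsum('ir,jr,kr->ijk', a.astype(int), a.astype(int), a.astype(int)) > 0

empty = np.zeros((6, 1), dtype=bool)   # model that explains nothing
double = np.hstack([a, a])             # model with a redundant second component

lengths = {
    'empty': description_length(X, empty, empty, empty),
    'rank 1': description_length(X, a, a, a),
    'rank 2': description_length(X, double, double, double),
}
print(min(lengths, key=lengths.get), lengths)
```

The empty model pays for every 1 in the error tensor, the redundant model pays for extra factor columns, and the exact rank-1 model is shortest under this encoding.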

28 WHY MDL AND TUCKER DECOMPOSITION Balance between accuracy and complexity. High rank: more bits in the factor matrices, fewer in the error tensor. Small rank: fewer bits in the factor matrices, more in the error tensor. If one mode uses the same factor multiple times, CP contains it multiple times; the Tucker decomposition needs to have that factor only once.

29 FROM CP TO TUCKER WITH MDL CP is Tucker with a hyper-diagonal core tensor. If we can remove a repeated column from a factor matrix and adjust the core accordingly, our encoding is more efficient. Algorithm: try merging similar factors and see if that reduces the encoding length.
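
The core adjustment can be sketched for the simplest case, where two columns of a factor matrix are exactly equal, so the merge is lossless. Merging merely similar columns and checking the encoding length, as the slide proposes, is not covered here. All function names and the example factors are assumptions.

```python
import numpy as np

def boolean_tucker(G, A, B, C):
    """x_ijk = OR over (p, q, r) of (g_pqr AND a_ip AND b_jq AND c_kr)."""
    counts = np.einsum('pqr,ip,jq,kr->ijk', G.astype(int), A.astype(int),
                       B.astype(int), C.astype(int))
    return counts > 0

def cp_core(R):
    """Hyper-diagonal core: a rank-R CP decomposition written as Tucker."""
    G = np.zeros((R, R, R), dtype=bool)
    idx = np.arange(R)
    G[idx, idx, idx] = True
    return G

def merge_equal_columns(F, G, mode):
    """Keep one copy of each distinct column of F and OR the matching core slices."""
    index_of = {}   # column contents -> new column index
    mapping = []    # old column index -> new column index
    for p in range(F.shape[1]):
        key = tuple(F[:, p].tolist())
        index_of.setdefault(key, len(index_of))
        mapping.append(index_of[key])
    F_new = np.zeros((F.shape[0], len(index_of)), dtype=bool)
    G_moved = np.moveaxis(G, mode, 0)
    G_new = np.zeros((len(index_of),) + G_moved.shape[1:], dtype=bool)
    for p, new_p in enumerate(mapping):
        F_new[:, new_p] = F[:, p]
        G_new[new_p] |= G_moved[p]
    return F_new, np.moveaxis(G_new, 0, mode)

# Hypothetical rank-3 CP factors; columns 0 and 1 of A are identical.
A = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=bool)
B = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=bool)
C = np.array([[1, 0, 1], [0, 1, 0]], dtype=bool)

G = cp_core(3)
A2, G2 = merge_equal_columns(A, G, mode=0)
same = np.array_equal(boolean_tucker(G, A, B, C), boolean_tucker(G2, A2, B, C))
print(A2.shape, G2.shape, same)
```

After the merge the core is no longer hyper-diagonal, but the first factor matrix has one column fewer and the reconstruction is unchanged.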

30 APPLICATION: FACT DISCOVERY Input: (noun phrase, verbal phrase, noun phrase) triples, non-disambiguated, e.g. from OpenIE. Goal: find the facts, i.e. the (entity, relation, entity) triples underlying the observed data, and the mappings from surface forms to entities and relations.

31 CONNECTION TO BOOLEAN TENSORS We should see an $(np_1, vp, np_2)$ triple if there exists at least one fact $(e_1, r, e_2)$ such that $np_1$ is the surface form of $e_1$, $vp$ is the surface form of $r$, and $np_2$ is the surface form of $e_2$.

32 CONNECTION TO BOOLEAN TENSORS What we want is a Boolean Tucker3 decomposition. The core tensor contains the facts; the factors contain the mappings from entities and relations to surface forms. $x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$

33 PROS & CONS Pros: naturally sparse core tensor (the core will be huge, so it must be sparse); natural interpretation. Cons: no levels of certainty (a fact either is or is not); can only handle binary data.

34 EXAMPLE RESULT Subject: claude de lorimier, de lorimier, louis, jean-baptiste. Relation: was born, [[det]] born in. Object: borough of lachine, villa st. pierre, lachine quebec. 39,500-by-8,000-by-21,000 tensor with non-zeros

35 CONCLUSIONS Boolean factorizations are more combinatorial in their flavour, with interpretations as sets or graphs. Boolean matrix factorization is computationally harder than most normal factorizations; with tensors the difference is not so big.

36 FUTURE DIRECTIONS When should one apply Boolean factorizations? More education is needed. Better algorithms & implementations: I've been asking for this for 7 years now. Thank You!

BOOLEAN TENSOR FACTORIZATIONS. Pauli Miettinen 14 December 2011

BOOLEAN TENSOR FACTORIZATIONS. Pauli Miettinen 14 December 2011 BOOLEAN TENSOR FACTORIZATIONS Pauli Miettinen 14 December 2011 BACKGROUND: TENSORS AND TENSOR FACTORIZATIONS X BACKGROUND: TENSORS AND TENSOR FACTORIZATIONS C X A B BACKGROUND: BOOLEAN MATRIX FACTORIZATIONS

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,

More information

Walk n Merge: A Scalable Algorithm for Boolean Tensor Factorization

Walk n Merge: A Scalable Algorithm for Boolean Tensor Factorization Walk n Merge: A Scalable Algorithm for Boolean Tensor Factorization Dóra Erdős Boston University Boston, MA, USA dora.erdos@bu.edu Pauli Miettinen Max-Planck-Institut für Informatik Saarbrücken, Germany

More information

Matrix Factorizations over Non-Conventional Algebras for Data Mining. Pauli Miettinen 28 April 2015

Matrix Factorizations over Non-Conventional Algebras for Data Mining. Pauli Miettinen 28 April 2015 Matrix Factorizations over Non-Conventional Algebras for Data Mining Pauli Miettinen 28 April 2015 Chapter 1. A Bit of Background Data long-haired well-known male Data long-haired well-known male ( ) 1

More information

Clustering Boolean Tensors

Clustering Boolean Tensors Data Mining and Knowledge Discovery manuscript No. (will be inserted by the editor) Clustering Boolean Tensors Saskia Metzler Pauli Miettinen the date of receipt and acceptance should be inserted later

More information

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into

More information

Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft

Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft Manuscript Number: Title: Summarizing transactional databases with overlapped hyperrectangles, theories and algorithms Article

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 05 Semi-Discrete Decomposition Rainer Gemulla, Pauli Miettinen May 16, 2013 Outline 1 Hunting the Bump 2 Semi-Discrete Decomposition 3 The Algorithm 4 Applications SDD alone SVD

More information

1.6: Solutions 17. Solution to exercise 1.6 (p.13).

1.6: Solutions 17. Solution to exercise 1.6 (p.13). 1.6: Solutions 17 A slightly more careful answer (short of explicit computation) goes as follows. Taking the approximation for ( N K) to the next order, we find: ( N N/2 ) 2 N 1 2πN/4. (1.40) This approximation

More information

Handling Noise in Boolean Matrix Factorization

Handling Noise in Boolean Matrix Factorization Handling Noise in Boolean Matrix Factorization Radim Belohlavek, Martin Trnecka DEPARTMENT OF COMPUTER SCIENCE PALACKÝ UNIVERSITY OLOMOUC 26th International Joint Conference on Artificial Intelligence

More information

Clustering Boolean Tensors

Clustering Boolean Tensors Clustering Boolean Tensors Saskia Metzler and Pauli Miettinen Max Planck Institut für Informatik Saarbrücken, Germany {saskia.metzler, pauli.miettinen}@mpi-inf.mpg.de Abstract. Tensor factorizations are

More information

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make

More information

Computability and Complexity Theory: An Introduction

Computability and Complexity Theory: An Introduction Computability and Complexity Theory: An Introduction meena@imsc.res.in http://www.imsc.res.in/ meena IMI-IISc, 20 July 2006 p. 1 Understanding Computation Kinds of questions we seek answers to: Is a given

More information

Fast and Scalable Distributed Boolean Tensor Factorization

Fast and Scalable Distributed Boolean Tensor Factorization Fast and Scalable Distributed Boolean Tensor Factorization Namyong Park Seoul National University Email: namyong.park@snu.ac.kr Sejoon Oh Seoul National University Email: ohhenrie@snu.ac.kr U Kang Seoul

More information

Quantum Algorithms for Finding Constant-sized Sub-hypergraphs

Quantum Algorithms for Finding Constant-sized Sub-hypergraphs Quantum Algorithms for Finding Constant-sized Sub-hypergraphs Seiichiro Tani (Joint work with François Le Gall and Harumichi Nishimura) NTT Communication Science Labs., NTT Corporation, Japan. The 20th

More information

Interesting Patterns. Jilles Vreeken. 15 May 2015

Interesting Patterns. Jilles Vreeken. 15 May 2015 Interesting Patterns Jilles Vreeken 15 May 2015 Questions of the Day What is interestingness? what is a pattern? and how can we mine interesting patterns? What is a pattern? Data Pattern y = x - 1 What

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization

CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization Tim Roughgarden February 28, 2017 1 Preamble This lecture fulfills a promise made back in Lecture #1,

More information

Information Retrieval

Information Retrieval Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

On the Exponent of the All Pairs Shortest Path Problem

On the Exponent of the All Pairs Shortest Path Problem On the Exponent of the All Pairs Shortest Path Problem Noga Alon Department of Mathematics Sackler Faculty of Exact Sciences Tel Aviv University Zvi Galil Department of Computer Science Sackler Faculty

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds

More information

Matrices, Vector Spaces, and Information Retrieval

Matrices, Vector Spaces, and Information Retrieval Matrices, Vector Spaces, and Information Authors: M. W. Berry and Z. Drmac and E. R. Jessup SIAM 1999: Society for Industrial and Applied Mathematics Speaker: Mattia Parigiani 1 Introduction Large volumes

More information

Complexity (Pre Lecture)

Complexity (Pre Lecture) Complexity (Pre Lecture) Dr. Neil T. Dantam CSCI-561, Colorado School of Mines Fall 2018 Dantam (Mines CSCI-561) Complexity (Pre Lecture) Fall 2018 1 / 70 Why? What can we always compute efficiently? What

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Matrix factorization models for patterns beyond blocks. Pauli Miettinen 18 February 2016

Matrix factorization models for patterns beyond blocks. Pauli Miettinen 18 February 2016 Matrix factorization models for patterns beyond blocks 18 February 2016 What does a matrix factorization do?? A = U V T 2 For SVD that s easy! 3 Inner-product interpretation Element (AB) ij is the inner

More information

Value-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser

Value-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser Value-Ordering and Discrepancies Maintaining Arc Consistency (MAC) Achieve (generalised) arc consistency (AC3, etc). If we have a domain wipeout, backtrack. If all domains have one value, we re done. Pick

More information

CS60021: Scalable Data Mining. Dimensionality Reduction

CS60021: Scalable Data Mining. Dimensionality Reduction J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

9. Distance measures. 9.1 Classical information measures. Head Tail. How similar/close are two probability distributions? Trace distance.

9. Distance measures. 9.1 Classical information measures. Head Tail. How similar/close are two probability distributions? Trace distance. 9. Distance measures 9.1 Classical information measures How similar/close are two probability distributions? Trace distance Fidelity Example: Flipping two coins, one fair one biased Head Tail Trace distance

More information

From Non-Negative Matrix Factorization to Deep Learning

From Non-Negative Matrix Factorization to Deep Learning The Math!! From Non-Negative Matrix Factorization to Deep Learning Intuitions and some Math too! luissarmento@gmailcom https://wwwlinkedincom/in/luissarmento/ October 18, 2017 The Math!! Introduction Disclaimer

More information

Nonnegative Matrix Factorization. Data Mining

Nonnegative Matrix Factorization. Data Mining The Nonnegative Matrix Factorization in Data Mining Amy Langville langvillea@cofc.edu Mathematics Department College of Charleston Charleston, SC Yahoo! Research 10/18/2005 Outline Part 1: Historical Developments

More information

Rank Determination for Low-Rank Data Completion

Rank Determination for Low-Rank Data Completion Journal of Machine Learning Research 18 017) 1-9 Submitted 7/17; Revised 8/17; Published 9/17 Rank Determination for Low-Rank Data Completion Morteza Ashraphijuo Columbia University New York, NY 1007,

More information

Siegel s Theorem, Edge Coloring, and a Holant Dichotomy

Siegel s Theorem, Edge Coloring, and a Holant Dichotomy Siegel s Theorem, Edge Coloring, and a Holant Dichotomy Jin-Yi Cai (University of Wisconsin-Madison) Joint with: Heng Guo and Tyson Williams (University of Wisconsin-Madison) 1 / 117 2 / 117 Theorem (Siegel

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

Summarizing Transactional Databases with Overlapped Hyperrectangles

Summarizing Transactional Databases with Overlapped Hyperrectangles Noname manuscript No. (will be inserted by the editor) Summarizing Transactional Databases with Overlapped Hyperrectangles Yang Xiang Ruoming Jin David Fuhry Feodor F. Dragan Abstract Transactional data

More information

Lecture 4. Tensor-Related Singular Value Decompositions. Charles F. Van Loan

Lecture 4. Tensor-Related Singular Value Decompositions. Charles F. Van Loan From Matrix to Tensor: The Transition to Numerical Multilinear Algebra Lecture 4. Tensor-Related Singular Value Decompositions Charles F. Van Loan Cornell University The Gene Golub SIAM Summer School 2010

More information

Three right directions and three wrong directions for tensor research

Three right directions and three wrong directions for tensor research Three right directions and three wrong directions for tensor research Michael W. Mahoney Stanford University ( For more info, see: http:// cs.stanford.edu/people/mmahoney/ or Google on Michael Mahoney

More information

Problem (INFORMAL). Given a dynamic graph, find a set of possibly overlapping temporal subgraphs to concisely describe the given dynamic graph in a

Problem (INFORMAL). Given a dynamic graph, find a set of possibly overlapping temporal subgraphs to concisely describe the given dynamic graph in a Outlines TimeCrunch: Interpretable Dynamic Graph Summarization by Neil Shah et. al. (KDD 2015) From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics by Linyun

More information

Lecture 13: Spectral Graph Theory

Lecture 13: Spectral Graph Theory CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

Polynomial-time Reductions

Polynomial-time Reductions Polynomial-time Reductions Disclaimer: Many denitions in these slides should be taken as the intuitive meaning, as the precise meaning of some of the terms are hard to pin down without introducing the

More information

Non-convex Robust PCA: Provable Bounds

Non-convex Robust PCA: Provable Bounds Non-convex Robust PCA: Provable Bounds Anima Anandkumar U.C. Irvine Joint work with Praneeth Netrapalli, U.N. Niranjan, Prateek Jain and Sujay Sanghavi. Learning with Big Data High Dimensional Regime Missing

More information

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition Wing-Kin (Ken) Ma 2017 2018 Term 2 Department of Electronic Engineering The Chinese University

More information

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization. Outline Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European

More information

CSE 20 Discrete Math. Algebraic Rules for Propositional Formulas. Summer, July 11 (Day 2) Number Systems/Computer Arithmetic Predicate Logic

CSE 20 Discrete Math. Algebraic Rules for Propositional Formulas. Summer, July 11 (Day 2) Number Systems/Computer Arithmetic Predicate Logic CSE 20 Discrete Math Algebraic Rules for Propositional Formulas Equivalences between propositional formulas (similar to algebraic equivalences): Associative Summer, 2006 July 11 (Day 2) Number Systems/Computer

More information

Stat 315c: Introduction

Stat 315c: Introduction Stat 315c: Introduction Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Stat 315c: Introduction 1 / 14 Stat 315c Analysis of Transposable Data Usual Statistics Setup there s Y (we ll

More information

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms A simple example Two ideal clusters, with two points each Spectral clustering Lecture 2 Spectral clustering algorithms 4 2 3 A = Ideally permuted Ideal affinities 2 Indicator vectors Each cluster has an

More information

Lecture 5: Web Searching using the SVD

Lecture 5: Web Searching using the SVD Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially

More information

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 14: Random Walks, Local Graph Clustering, Linear Programming Lecturer: Shayan Oveis Gharan 3/01/17 Scribe: Laura Vonessen Disclaimer: These

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

PRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM

PRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM PRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM ALEX FINK 1. Introduction and background Consider the discrete conditional independence model M given by {X 1 X 2 X 3, X 1 X 3 X 2 }. The intersection axiom

More information

1 Searching the World Wide Web

1 Searching the World Wide Web Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Lecture 6: Expander Codes

Lecture 6: Expander Codes CS369E: Expanders May 2 & 9, 2005 Lecturer: Prahladh Harsha Lecture 6: Expander Codes Scribe: Hovav Shacham In today s lecture, we will discuss the application of expander graphs to error-correcting codes.

More information

Introduction to Tensors. 8 May 2014

Introduction to Tensors. 8 May 2014 Introduction to Tensors 8 May 2014 Introduction to Tensors What is a tensor? Basic Operations CP Decompositions and Tensor Rank Matricization and Computing the CP Dear Tullio,! I admire the elegance of

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

CSCI3390-Lecture 18: Why is the P =?NP Problem Such a Big Deal?

CSCI3390-Lecture 18: Why is the P =?NP Problem Such a Big Deal? CSCI3390-Lecture 18: Why is the P =?NP Problem Such a Big Deal? The conjecture that P is different from NP made its way on to several lists of the most important unsolved problems in Mathematics (never

More information

Structural Evaluation of AES and Chosen-Key Distinguisher of 9-round AES-128

Structural Evaluation of AES and Chosen-Key Distinguisher of 9-round AES-128 Structural Evaluation of AES and Chosen-Key Distinguisher of 9-round AES-128 Pierre-Alain Fouque 1 Jérémy Jean 2 Thomas Peyrin 3 1 Université de Rennes 1, France 2 École Normale Supérieure, France 3 Nanyang

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

9 Searching the Internet with the SVD

9 Searching the Internet with the SVD 9 Searching the Internet with the SVD 9.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

13 Searching the Web with the SVD

13 Searching the Web with the SVD 13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

Functional Maps ( ) Dr. Emanuele Rodolà Room , Informatik IX

Functional Maps ( ) Dr. Emanuele Rodolà Room , Informatik IX Functional Maps (12.06.2014) Dr. Emanuele Rodolà rodola@in.tum.de Room 02.09.058, Informatik IX Seminar «LP relaxation for elastic shape matching» Fabian Stark Wednesday, June 18th 14:00 Room 02.09.023

More information

INFO 4300 / CS4300 Information Retrieval. IR 9: Linear Algebra Review

INFO 4300 / CS4300 Information Retrieval. IR 9: Linear Algebra Review INFO 4300 / CS4300 Information Retrieval IR 9: Linear Algebra Review Paul Ginsparg Cornell University, Ithaca, NY 24 Sep 2009 1/ 23 Overview 1 Recap 2 Matrix basics 3 Matrix Decompositions 4 Discussion

More information

NP-Complete Problems. Complexity Class P. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar..

NP-Complete Problems. Complexity Class P. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar.. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar.. Complexity Class P NP-Complete Problems Abstract Problems. An abstract problem Q is a binary relation on sets I of input instances

More information

Descriptive Data Summarization

Descriptive data summarization gives the general characteristics of the data and identifies the presence of noise or outliers, which is useful for successful data cleaning

LAGRANGE MULTIPLIERS

MATH 195, SECTION 59 (VIPUL NAIK). Corresponding material in the book: Section 14.8. What students should definitely get: The Lagrange multiplier condition (one constraint, two constraints

Graph limits Graph convergence Approximate asymptotic properties of large graphs Extremal combinatorics/computer science : flag algebra method, proper

Jacob Cooper, Dan Král, Taísa Martins. University of Warwick. Monash University - Discrete Maths Research Group

A Sparse QS-Decomposition for Large Sparse Linear System of Equations

Wujian Peng 1 and Biswa N. Datta 2. 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com. 2 Department

Lecture Notes Introduction to Cluster Algebra

Ivan C.H. Ip. Update: May 16, 2017. 5 Review of Root Systems. In this section, let us have a brief introduction to root systems and the finite Lie type classification

Algorithms for sparse analysis Lecture I: Background on sparse approximation

Anna C. Gilbert, Department of Mathematics, University of Michigan. Tutorial on sparse approximations and algorithms. Compress data

Shortest paths with negative lengths

Chapter 8. In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no

CS 301: Complexity of Algorithms (Term I 2008) Alex Tiskin Harald Räcke. Hamiltonian Cycle. 8.5 Sequencing Problems. Directed Hamiltonian Cycle

8.5 Sequencing Problems. Basic genres. Packing problems: SET-PACKING, INDEPENDENT SET. Covering problems: SET-COVER, VERTEX-COVER. Constraint satisfaction problems: SAT, 3-SAT. Sequencing problems: HAMILTONIAN-CYCLE,

Part A. $P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_M \mid w_1 w_2 \cdots w_{M-1})$ and $P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2) \cdots P(w_M \mid w_{M-1})$

1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds

P, NP, NP-Complete, and NP-hard

Zhenjiang Li, 21/09/2011. Outline: Algorithm time complexity; P and NP problems; NP-Complete and NP-Hard problems. Algorithm time complexity. Outline: What is this course

arxiv: v1 [cs.ds] 18 Mar 2011

Incremental dimension reduction of tensors with random index. Fredrik Sandin and Blerim Emruli, EISLAB, Luleå University of Technology, 971 87 Luleå, Sweden. Magnus Sahlgren, Gavagai AB, Skånegatan 97, 116

Advanced topic: Space complexity

CSCI 3130 Formal Languages and Automata Theory. Siu On CHAN, Chinese University of Hong Kong, Fall 2016. Review: time complexity. We have looked at how long it takes to

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6

GENE H GOLUB. Issues with Floating-point Arithmetic. We conclude our discussion of floating-point arithmetic by highlighting two issues that frequently

My favorite application using eigenvalues: Eigenvalues and the Graham-Pollak Theorem

Michael Tait, Winter 2013. Abstract: The famous Graham-Pollak Theorem states that one needs at least n − 1 complete bipartite

Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments

François Le Gall, The University of Tokyo. Technical version available at arxiv:1407.0085 [quant-ph]. Background. Triangle finding

The Complexity of Change

JAN VAN DEN HEUVEL. UQ, Brisbane, 26 July 2016. Department of Mathematics, London School of Economics and Political Science. A classical puzzle: the 15-Puzzle

R ij = 2. Using all of these facts together, you can solve problem number 9.

Help for Homework Problem #9. Let G(V,E) be any undirected graph. We want to calculate the travel time across the graph. Think of each edge as one resistor of 1 Ohm. Say we have two nodes: i and j. Let the

HOMEWORK #2 - MATH 3260

ASSIGNED: JANUARY 3, 3 DUE: FEBRUARY 1, AT :3PM. 1) a) Give, by listing the sequence of vertices, 4 Hamiltonian cycles in K_9 no two of which have an edge in common. Solution: Here is

Antonina Kolokolova Memorial University of Newfoundland

Understanding limits of proof techniques: Diagonalization; Algebrization (limits of arithmetization technique); Natural proofs; Power

Finding a Heaviest Triangle is not Harder than Matrix Multiplication

Artur Czumaj, Department of Computer Science, New Jersey Institute of Technology, aczumaj@acm.org. Andrzej Lingas, Department of Computer

Nonnegative Matrices I

Daisuke Oyama, Topics in Economic Theory, September 26, 2017. References: J. L. Stuart, Digraphs and Matrices, in Handbook of Linear Algebra, Chapter 29, 2006. R. A. Brualdi and H. J.

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

A Priori Algorithm for Association Rule Learning. Association

Quantum Geometric Algebra

ANPA 22: Quantum Geometric Algebra. ANPA Conference, Cambridge, UK, by Dr. Douglas J. Matzke, matzke@ieee.org, Aug 15-18, 22. Abstract

Sparsity of Matrix Canonical Forms. Xingzhi Zhan East China Normal University

Xingzhi Zhan, zhan@math.ecnu.edu.cn, East China Normal University. I. Extremal sparsity of the companion matrix of a polynomial. Joint work with Chao Ma. The companion matrix

Spectral Generative Models for Graphs

David White and Richard C. Wilson, Department of Computer Science, University of York, Heslington, York, UK, wilson@cs.york.ac.uk. Abstract: Generative models are well known

Chapter 5-2: Clustering

Jilles Vreeken. Revision 1, November 20th, typos fixed: dendrogram. Revision 2, December 10th, clarified: we do consider a point x as a member of its own ε-neighborhood. 12 Nov 2015

SVD, PCA & Preprocessing

Chapter 1. Part 2: Pre-processing and selecting the rank. Pre-processing: Skillicorn chapter 3.1. Why pre-process? Consider a matrix of weather data: monthly temperatures in degrees

Dichotomy Theorems for Counting Problems. Jin-Yi Cai University of Wisconsin, Madison. Xi Chen Columbia University. Pinyan Lu Microsoft Research Asia

Counting Problems. Valiant defined the class #P, and

Limitations of Algorithm Power

Objectives. We now move into the third and final major theme for this course. 1. Tools for analyzing algorithms. 2. Design strategies for designing algorithms. 3. Identifying

The P versus NP Problem. Ker-I Ko. Stony Brook, New York

P =? NP. One of the seven Millennium Problems. The youngest one. A folklore question? Has hundreds of equivalent forms. Informal Definitions. P : Computational

Algorithms. NP -Complete Problems. Dong Kyue Kim Hanyang University

dqkim@hanyang.ac.kr. The Class P. Definition 13.2, Polynomially bounded: An algorithm is said to be polynomially bounded if its worst-case

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

Chapter 14. Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Shuyang Ling, Courant Institute of Mathematical Sciences, NYU, Aug 13, 2018. Joint

Combining geometry and combinatorics

A unified approach to sparse signal recovery. Anna C. Gilbert, University of Michigan. Joint work with R. Berinde (MIT), P. Indyk (MIT), H. Karloff (AT&T), M. Strauss