Similarity Search. The Binary Branch Distance. Nikolaus Augsten.


nikolaus.augsten@sbg.ac.at
Dept. of Computer Sciences, University of Salzburg, http://dbresearch.uni-salzburg.at
Version January 11, 2017. Wintersemester 2016/2017

Outline
1 Binary Branch Distance
  - Binary Representation of Trees
  - Binary Branches
  - Lower Bound for the Edit Distance
  - Complexity

Binary Branch Distance: Binary Representation of Trees

Binary Tree
In a binary tree each node has at most two children; the left child and the right child are distinguished: a node can have a right child without having a left child.
Notation: $T_B = (N, E_l, E_r)$ denotes a binary tree, where
- $N$ are the nodes of the binary tree,
- $E_l$ and $E_r$ are the edges to the left and right children, respectively.
Full binary tree: a binary tree in which each node has exactly zero or two children.
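A minimal Python sketch of the notation $T_B = (N, E_l, E_r)$, with edges as (parent, child) pairs. The helper `is_full` and the example trees are hypothetical; the helper assumes its input is already a valid binary tree and only checks the full-binary-tree condition.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child)


def is_full(nodes: Set[Hashable], e_left: Set[Edge], e_right: Set[Edge]) -> bool:
    """True if every node of the binary tree (N, E_l, E_r) has zero or two children.

    Assumes (nodes, e_left, e_right) is already a valid binary tree.
    """
    has_left = {parent for parent, _ in e_left}
    has_right = {parent for parent, _ in e_right}
    # With at most one left and one right child per node, a node has two
    # children iff it has both, and zero children iff it has neither.
    return all((v in has_left) == (v in has_right) for v in nodes)


# Hypothetical example trees (not the ones from the slides).
full = ({"a", "b", "c", "d", "e"}, {("a", "b"), ("b", "d")}, {("a", "c"), ("b", "e")})
right_only = ({"a", "b"}, set(), {("a", "b")})  # right child without a left child
print(is_full(*full), is_full(*right_only))  # True False
```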

Example: Binary Tree
Two different binary trees $T_B = (N, E_l, E_r)$ over the same node set $N = \{a, b, c, d, e, f, g\}$:
- $T_{B1}$ has four left edges (one of them leads to f) and two right edges (one of them leads to g);
- $T_{B2}$ has three left edges (one of them leads to f) and three right edges (one of them leads to g).
[Figure: the binary trees T_B1 and T_B2, and an example of a full binary tree]

Binary Representation of a Tree
Binary tree transformation:
(i) link all neighboring siblings in the tree with edges;
(ii) delete all parent-child edges except the edge to the first child.
The transformation preserves all structure information, so the original tree can be reconstructed from the binary tree:
- a left edge represents a parent-child relationship (from a node to its first child) in the original tree,
- a right edge represents a right-sibling relationship in the original tree.
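The two transformation steps and their inverse can be sketched in Python as follows. The encoding of an ordered tree as `(label, [children])`, the `BNode` class, and the function names are assumptions of this sketch, not part of the slides. The recursion is fine for small trees; very deep trees would need an iterative version.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Tree = Tuple[str, list]  # ordered labeled tree: (label, [child trees])


@dataclass
class BNode:
    label: str
    left: Optional["BNode"] = None   # first child in the original tree
    right: Optional["BNode"] = None  # next (right) sibling in the original tree


def to_binary(tree: Tree) -> BNode:
    """Link neighboring siblings, keep only the edge to the first child."""
    label, children = tree
    node = BNode(label)
    prev = None
    for child in children:
        b = to_binary(child)
        if prev is None:
            node.left = b   # the only parent-child edge that is kept
        else:
            prev.right = b  # new edge between neighboring siblings
        prev = b
    return node


def from_binary(node: Optional[BNode]) -> List[Tree]:
    """Inverse transformation: the trees rooted at node and its right siblings."""
    forest = []
    while node is not None:
        forest.append((node.label, from_binary(node.left)))
        node = node.right
    return forest


t = ("a", [("b", []), ("c", [("d", []), ("e", [])]), ("f", [])])
assert from_binary(to_binary(t)) == [t]  # no structure information is lost
```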

Example: Binary Tree Transformation
Represent tree T as a binary tree.
[Figure: tree T and the binary representation of T]

Normalized Binary Tree Representation
We extend the binary tree with null nodes as follows:
- a null node for each missing left child of a non-null node,
- a null node for each missing right child of a non-null node.
Note: leaves of the binary tree get two null children.
The resulting normalized binary representation is a full binary tree:
- all non-null nodes have two children,
- all leaves are null nodes (and all null nodes are leaves).
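A sketch of the normalization, assuming the same `(label, [children])` encoding and representing a null node as `None` and a non-null node as a `(label, left, right)` tuple (both are our choices). It builds B(T) directly from T: the left child is the first child, the right child is the next sibling, and every missing child becomes a null node.

```python
from typing import Optional, Tuple

Tree = Tuple[str, list]  # ordered labeled tree: (label, [child trees])
# Normalized binary tree: a non-null node is (label, left, right); None is a null node.
Norm = Optional[tuple]


def normalize(tree: Tree) -> Norm:
    """B(T): first child as left child, next sibling as right child,
    and a null node (None) for every missing left or right child."""
    def build(siblings: list, i: int) -> Norm:
        if i == len(siblings):
            return None  # missing child -> null node
        label, children = siblings[i]
        return (label, build(children, 0), build(siblings, i + 1))
    return build([tree], 0)


def count_nodes(b: Norm) -> Tuple[int, int]:
    """(number of non-null nodes, number of null nodes) of a normalized tree."""
    if b is None:
        return 0, 1
    _, left, right = b
    n_left, z_left = count_nodes(left)
    n_right, z_right = count_nodes(right)
    return 1 + n_left + n_right, z_left + z_right


t = ("a", [("b", []), ("c", [("d", []), ("e", [])]), ("f", [])])
print(count_nodes(normalize(t)))  # (6, 7)
```

Because B(T) is a full binary tree whose leaves are exactly the null nodes, a tree with n nodes always gets n + 1 null nodes, which `count_nodes` confirms for the six-node example.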

Example: Normalized Binary Tree
Transforming T to the normalized binary tree B(T).
[Figure: tree T and its normalized binary tree B(T)]

Binary Branch Distance: Binary Branches

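Binary Branch
A binary branch BiB(v) is a subtree of the normalized binary tree B(T) consisting of a non-null node v and its two children.
Example: if v has left child $v_l$ and right child $v_r$ in B(T), then $BiB(v) = (\{v, v_l, v_r\}, \{(v, v_l)\}, \{(v, v_r)\})$. If both children of a node w in B(T) are null nodes, then $BiB(w) = (\{w, \varepsilon_1, \varepsilon_2\}, \{(w, \varepsilon_1)\}, \{(w, \varepsilon_2)\})$.
Although the two null nodes have identical labels ($\varepsilon$), they are different nodes. We emphasize this by showing their IDs in subscript.
[Figure: the two example binary branches]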
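Since the left child of v in B(T) is v's first child in T and the right child is v's next sibling, the labels of every binary branch can be read off T in a single traversal without materializing B(T). A minimal sketch under an assumed `(label, [children])` encoding; `None` stands for the null label, and only labels are kept, so the result corresponds to the serialized form of the binary branches rather than to node identities.

```python
from typing import List, Optional, Tuple

Tree = Tuple[str, list]                            # (label, [child trees])
Branch = Tuple[str, Optional[str], Optional[str]]  # labels of v, left child, right child


def binary_branches(tree: Tree) -> List[Branch]:
    """Label triples of BiB(v) for every node v, in preorder.

    In B(T) the left child of v is v's first child and the right child is
    v's next sibling; None stands for a null node.
    """
    out: List[Branch] = []

    def visit(siblings: list) -> None:
        for i, (label, children) in enumerate(siblings):
            left = children[0][0] if children else None
            right = siblings[i + 1][0] if i + 1 < len(siblings) else None
            out.append((label, left, right))
            visit(children)

    visit([tree])
    return out


print(binary_branches(("a", [("b", []), ("c", [("d", [])])])))
# [('a', 'b', None), ('b', None, 'c'), ('c', 'd', None), ('d', None, None)]
```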

Binary Branches of Trees and Datasets
Binary branches can be serialized as strings: $BiB(v) = (\{v, v_l, v_r\}, \{(v, v_l)\}, \{(v, v_r)\}) \mapsto \lambda(v)\lambda(v_l)\lambda(v_r)$.
We can sort these strings ($\varepsilon > \lambda(v)$ for all non-null nodes v, i.e., the null label sorts after every other label).
Binary branch sets:
- BiB(T) is the set of all binary branches of B(T);
- $BiB(S) = \bigcup_{T \in S} BiB(T)$ is the set of all binary branches of dataset S;
- $BiB_{sort}(S)$ is the vector of sorted serialized strings of BiB(S), with each distinct string listed once.
Note:
- nodes are unique in the tree, thus binary branches are unique;
- labels are not unique, thus the serialized binary branches are not unique.
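Serialization and sorting can be sketched as follows. Instead of concatenated strings, each serialization is kept as a tuple of labels (our choice: it avoids ambiguities such as "ab"+"c" versus "a"+"bc" with multi-character labels), and the sort key places the null label (`None`) after every real label. Duplicates collapse, so each distinct serialization appears once. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Branch = Tuple[str, Optional[str], Optional[str]]  # None = null label


def sort_key(branch: Branch) -> tuple:
    """Compare label by label; the null label sorts after every real label."""
    return tuple((1, "") if label is None else (0, label) for label in branch)


def bib_sort(dataset: Iterable[Iterable[Branch]]) -> List[Branch]:
    """BiB_sort(S): distinct serialized binary branches of all trees in S, sorted."""
    distinct = {branch for tree_branches in dataset for branch in tree_branches}
    return sorted(distinct, key=sort_key)


t1 = [("a", "b", None), ("b", None, None)]
t2 = [("a", "b", None), ("a", None, None)]
print(bib_sort([t1, t2]))
# [('a', 'b', None), ('a', None, None), ('b', None, None)]
```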

Example: Binary Branches of Trees and Datasets
[Figure: trees T1 and T2; node IDs are shown in subscript]
Binary branches of the nodes with IDs 1 (in T1) and 4 (in T2), where $n_i$ denotes the node with ID i:
$BiB(n_1) = (\{n_1, n_2, n_3\}, \{(n_1, n_2)\}, \{(n_1, n_3)\})$
$BiB(n_4) = (\{n_4, n_5, n_6\}, \{(n_4, n_5)\}, \{(n_4, n_6)\})$
The serializations of $BiB(n_1)$ and $BiB(n_4)$ are identical.
The sorted vector of serialized strings of BiB(S), where S = {T1, T2}, is $BiB_{sort}(S) = (s_1, \ldots, s_{10})$, consisting of ten distinct serialized strings.

Binary Branch Vector
The binary branch vector BBV(T) is a representation of the binary branch set BiB(T).
Construction of the binary branch vector BBV(T):
- compute $BiB_{sort}(S)$ (serialize and sort BiB(S));
- $s_i$ is the i-th serialized binary branch in sort order ($s_i = BiB_{sort}(S)[i]$);
- BBV(T)[i] is the number of binary branches in B(T) that serialize to $s_i$.
Note: BBV(T)[i] is zero if no binary branch in BiB(T) serializes to $s_i$.
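A sketch of the construction of BBV(T) by binary search in the sorted vector, using Python's standard `bisect` module. Positions are 0-based here, while the slides count from 1. Branches are label triples with `None` as the null label, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

Branch = Tuple[str, Optional[str], Optional[str]]  # None = null label


def sort_key(branch: Branch) -> tuple:
    # Same order as BiB_sort(S): the null label sorts after every real label.
    return tuple((1, "") if label is None else (0, label) for label in branch)


def bbv(tree_branches: Sequence[Branch], bib_sorted: Sequence[Branch]) -> List[int]:
    """BBV(T)[i] = number of binary branches of T that serialize to s_i."""
    keys = [sort_key(s) for s in bib_sorted]  # sorted, so binary search applies
    vector = [0] * len(keys)
    for branch in tree_branches:
        key = sort_key(branch)
        i = bisect_left(keys, key)  # position of s_i in BiB_sort(S)
        if i == len(keys) or keys[i] != key:
            raise ValueError(f"{branch!r} does not occur in BiB_sort(S)")
        vector[i] += 1
    return vector


s = [("a", "b", None), ("a", None, None), ("b", None, None)]  # a BiB_sort(S)
print(bbv([("a", "b", None), ("b", None, None)], s))  # [1, 0, 1]
print(bbv([("a", "b", None), ("a", None, None)], s))  # [1, 1, 0]
```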

Example: Binary Branch Vectors
S = {T1, T2} is the dataset, $BiB_{sort}(S)$ is the vector of sorted serialized strings of BiB(S), and $BBV(T_i)$ is the binary branch vector of $T_i$.
[Figure: trees T1 and T2]
The binary branch vectors, indexed by the position i of the serialized string in $BiB_{sort}(S)$, are:
i          1  2  3  4  5  6  7  8  9  10
BBV(T1)    1  1  0  1  0  2  0  0  2  1
BBV(T2)    1  0  1  0  1  2  1  1  0  2

Binary Branch Distance: Lower Bound for the Edit Distance

Binary Branch Distance [YKT05]
Definition (Binary Branch Distance). Let $BBV(T) = (b_1, \ldots, b_k)$ and $BBV(T') = (b'_1, \ldots, b'_k)$ be the binary branch vectors of trees T and T', respectively. The binary branch distance of T and T' is
$$\delta_B(T, T') = \sum_{i=1}^{k} |b_i - b'_i|.$$
Intuition: we count the binary branches that do not match between the two trees.
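The definition is an L1 distance between two vectors indexed by the same $BiB_{sort}(S)$. A minimal sketch with a hypothetical function name; the two vectors are BBV(T1) and BBV(T2) from the binary branch vector example.

```python
from typing import Sequence


def binary_branch_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """delta_B(T, T') = sum_i |b_i - b'_i| over vectors built on the same BiB_sort(S)."""
    if len(u) != len(v):
        raise ValueError("both vectors must be built over the same BiB_sort(S)")
    return sum(abs(x - y) for x, y in zip(u, v))


bbv_t1 = [1, 1, 0, 1, 0, 2, 0, 0, 2, 1]
bbv_t2 = [1, 0, 1, 0, 1, 2, 1, 1, 0, 2]
print(binary_branch_distance(bbv_t1, bbv_t2))  # 9
```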

Example: Binary Branch Distance
We compute the binary branch distance between T1 and T2.
[Figure: trees T1 and T2]

Example: Binary Branch Distance
The normalized binary tree representations are:
[Figure: B(T1) and B(T2)]

Example: Binary Branch Distance
The binary branch vectors of T1 and T2 are:
i          1  2  3  4  5  6  7  8  9  10
BBV(T1)    1  1  0  1  0  2  0  0  2  1
BBV(T2)    1  0  1  0  1  2  1  1  0  2
The binary branch distance is
$$\delta_B(T_1, T_2) = \sum_{i=1}^{10} |b_{1,i} - b_{2,i}| = |1-1| + |1-0| + |0-1| + |1-0| + |0-1| + |2-2| + |0-1| + |0-1| + |2-0| + |1-2| = 9,$$
where $b_{1,i}$ and $b_{2,i}$ are the i-th dimensions of the vectors $BBV(T_1)$ and $BBV(T_2)$, respectively.

Lower Bound Theorem
Theorem (Lower Bound). Let T and T' be two trees. If the tree edit distance between T and T' is $\delta_t(T, T')$, then the binary branch distance between them satisfies
$$\delta_B(T, T') \le 5 \cdot \delta_t(T, T').$$
Proof (sketch; full proof in [YKT05]). Each node v appears in at most two binary branches (its own and that of its parent in B(T)).
- Rename: renaming a node causes at most two binary branches in each tree to mismatch; the sum is 4.
- A similar rationale applies to insert and its complementary operation delete (at most 5 binary branches mismatch).

Proof Sketch: Illustration for Rename
Transform T1 to T2 by renaming one node to x.
[Figure: trees T1, T2 and binary trees B(T1), B(T2)]
- Two binary branches exist only in B(T1).
- Two binary branches, both containing x, exist only in B(T2).
- $\delta_t(T_1, T_2) = 1$ (1 rename)
- $\delta_B(T_1, T_2) = 4$ (4 binary branches differ)

Proof Sketch: Illustration for Insert
Transform T1 to T2 by inserting a new node x: ins(x, …, 2, 3).
[Figure: trees T1, T2 and binary trees B(T1), B(T2)]
- Two binary branches exist only in B(T1).
- Three binary branches exist only in B(T2).
- $\delta_t(T_1, T_2) = 1$ (1 insertion)
- $\delta_B(T_1, T_2) = 5$ (5 binary branches differ)

Proof Sketch
In general it can be shown that:
- rename changes at most 4 binary branches;
- insert changes at most 5 binary branches;
- delete changes at most 5 binary branches.
Each edit operation changes at most 5 binary branches. Since $\delta_B$ is an L1 distance between count vectors, it satisfies the triangle inequality; summing over an edit sequence of $\delta_t(T, T')$ operations gives $\delta_B(T, T') \le 5 \cdot \delta_t(T, T')$.
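The per-operation bounds can be checked on small inputs. The sketch below is not a proof: it computes $\delta_B$ from multiset counts (equivalent to comparing BBVs built over a common $BiB_{sort}(S)$) for one hypothetical tree, one rename, and one insertion, and reproduces the counts 4 and 5 from the illustrations. Trees are encoded as `(label, [children])` and `None` is the null label; both are assumptions of this sketch.

```python
from collections import Counter
from typing import Tuple

Tree = Tuple[str, list]  # (label, [child trees])


def branches(tree: Tree) -> Counter:
    """Multiset of binary branch label triples (v, first child, next sibling); None = null."""
    out: Counter = Counter()

    def visit(siblings):
        for i, (label, children) in enumerate(siblings):
            first = children[0][0] if children else None
            nxt = siblings[i + 1][0] if i + 1 < len(siblings) else None
            out[(label, first, nxt)] += 1
            visit(children)

    visit([tree])
    return out


def delta_b(t1: Tree, t2: Tree) -> int:
    """Binary branch distance via multiset counts (equivalent to the BBV form)."""
    c1, c2 = branches(t1), branches(t2)
    return sum(abs(c1[k] - c2[k]) for k in c1.keys() | c2.keys())


t1 = ("a", [("b", []), ("c", []), ("d", []), ("e", [])])
renamed = ("a", [("b", []), ("x", []), ("d", []), ("e", [])])           # rename c to x
inserted = ("a", [("b", []), ("x", [("c", []), ("d", [])]), ("e", [])])  # new x adopts c, d

print(delta_b(t1, renamed), delta_b(t1, inserted), delta_b(inserted, t1))  # 4 5 5
```

The last call is the delete case: removing x from `inserted` is the complementary operation of the insertion and gives the same 5.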

Binary Branch Distance: Complexity

Complexity: Binary Branch Distance
Compute the distance between two trees of size O(n), where S = {T1, T2} and n = max{|T1|, |T2|}.
Construction of the binary branch vectors BBV(T1) and BBV(T2):
1. BiB(S): compute the binary branches of T1 and T2: O(n) time and space (traverse T1 and T2).
2. $BiB_{sort}(S)$: sort the serialized binary branches of BiB(S): O(n log n) time and O(n) space.
3. Construct BBV(T1) and BBV(T2):
   (a) traverse all binary branches: O(n) time and space;
   (b) for each binary branch, find its position i in $BiB_{sort}(S)$: O(n log n) time (binary search in $BiB_{sort}(S)$ for n binary branches);
   (c) increment BBV(T)[i]: O(1).
Computing the distance: the two binary branch vectors are of size O(n), so computing the distance takes O(n) time (subtracting two binary branch vectors).
The overall complexity is O(n log n) time and O(n) space.
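An end-to-end sketch that follows steps 1 to 3 and annotates each with its cost. Labels are wrapped as `(0, label)` and the null label is `(1, "")`, an encoding we chose so that plain tuple comparison sorts the null label after every real label; Python's `sorted` provides the O(n log n) sort and `bisect_left` the binary search. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Tree = Tuple[str, list]  # ordered labeled tree: (label, [child trees])
NULL = (1, "")           # null label; sorts after every real label (0, label)


def branches(tree: Tree) -> List[tuple]:
    """Step 1: one binary branch per node, O(n) time and space."""
    out = []

    def visit(siblings):
        for i, (label, children) in enumerate(siblings):
            first = (0, children[0][0]) if children else NULL
            nxt = (0, siblings[i + 1][0]) if i + 1 < len(siblings) else NULL
            out.append(((0, label), first, nxt))
            visit(children)

    visit([tree])
    return out


def bb_distance(t1: Tree, t2: Tree) -> int:
    b1, b2 = branches(t1), branches(t2)
    # Step 2: BiB_sort(S), distinct serializations in sorted order: O(n log n).
    bib_sorted = sorted(set(b1) | set(b2))

    def bbv(bs: List[tuple]) -> List[int]:
        # Step 3: one binary search (O(log n)) and one O(1) increment per branch.
        vector = [0] * len(bib_sorted)
        for b in bs:
            vector[bisect_left(bib_sorted, b)] += 1
        return vector

    v1, v2 = bbv(b1), bbv(b2)
    # Distance: a single pass over two vectors of length O(n).
    return sum(abs(x - y) for x, y in zip(v1, v2))


t1 = ("a", [("b", []), ("c", []), ("d", []), ("e", [])])
print(bb_distance(t1, ("a", [("b", []), ("x", []), ("d", []), ("e", [])])))  # 4
```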

Improving the Time Complexity with a Hash Function
Note: improvement using a hash function:
- we assume a hash function that maps the O(n) binary branches to O(n) buckets without collision;
- we do not sort BiB(S);
- position i in the vector BBV(T) is computed using the hash function;
- O(n) time (instead of O(n log n)) and O(n) space.
In the following we assume the sort algorithm with O(n log n) runtime.
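A sketch of the hash-based variant using a Python `dict` as the hash table. Note the difference to the assumption above: a `dict` does not guarantee collision-free buckets but resolves collisions internally, so the O(n) bound holds in expectation rather than in the worst case. The tree encoding and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tree = Tuple[str, list]  # (label, [child trees])


def branches(tree: Tree) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Label triples (v, first child, next sibling) of all binary branches; None = null."""
    out = []

    def visit(siblings):
        for i, (label, children) in enumerate(siblings):
            first = children[0][0] if children else None
            nxt = siblings[i + 1][0] if i + 1 < len(siblings) else None
            out.append((label, first, nxt))
            visit(children)

    visit([tree])
    return out


def bb_distance_hashed(t1: Tree, t2: Tree) -> int:
    b1, b2 = branches(t1), branches(t2)
    # The dict assigns each distinct serialization a vector position; no sorting.
    position: Dict[tuple, int] = {}
    for b in b1 + b2:
        position.setdefault(b, len(position))
    v1, v2 = [0] * len(position), [0] * len(position)
    for b in b1:
        v1[position[b]] += 1
    for b in b2:
        v2[position[b]] += 1
    return sum(abs(x - y) for x, y in zip(v1, v2))
```

Positions follow the order of first occurrence instead of sort order; the distance is unaffected because both vectors share the same positions.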

Complexity for Similarity Joins
Join two sets with N trees each (tree size: n):
Compute binary branch vectors (BBVs): $O(Nn \log(Nn))$ time, $O(N^2 n)$ space
- BBVs are of size O(Nn)
- time: sort O(Nn) binary branches / O(Nn) binary searches in $BiB_{sort}(S)$
- space: O(N) BBVs must be stored
Compute distances: $O(N^3 n)$ time
- computing the distance between two trees has O(Nn) time complexity (subtracting two binary branch vectors)
- $O(N^2)$ distance computations are required
Overall complexity: $O(N^3 n + Nn \log n)$ time (see note) and $O(N^2 n)$ space
Note: $O(N^3 n + Nn \log(Nn)) = O(N^3 n + Nn \log N + Nn \log n) = O(N^3 n + Nn \log n)$, since $Nn \log N \le N^3 n$.
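A sketch of the join pipeline with dense BBVs over one shared index of all 2N trees, following the cost breakdown above. Two choices are ours: vector positions come from a hash table (the improvement noted earlier) instead of sorting, and, to illustrate how the lower bound is used, pairs with $\delta_B > 5\tau$ are discarded, since by the theorem they have $\delta_t > \tau$. The threshold `tau`, the tree encoding, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, list]  # (label, [child trees])


def branches(tree: Tree) -> List[tuple]:
    """Label triples (v, first child, next sibling) of all binary branches; None = null."""
    out = []

    def visit(siblings):
        for i, (label, children) in enumerate(siblings):
            first = children[0][0] if children else None
            nxt = siblings[i + 1][0] if i + 1 < len(siblings) else None
            out.append((label, first, nxt))
            visit(children)

    visit([tree])
    return out


def bbv_join(left: List[Tree], right: List[Tree], tau: int) -> List[Tuple[int, int, int]]:
    """Pairs (i, j, delta_B) that the lower bound cannot rule out for delta_t <= tau."""
    all_branches = [branches(t) for t in left + right]
    # One shared index over BiB(S) for all 2N trees: vectors of length O(Nn).
    position: Dict[tuple, int] = {}
    for bs in all_branches:
        for b in bs:
            position.setdefault(b, len(position))
    vectors = []
    for bs in all_branches:  # O(N) vectors of size O(Nn): O(N^2 n) space
        v = [0] * len(position)
        for b in bs:
            v[position[b]] += 1
        vectors.append(v)
    result = []
    for i in range(len(left)):       # O(N^2) distance computations,
        for j in range(len(right)):  # each O(Nn) on the dense vectors
            vi, vj = vectors[i], vectors[len(left) + j]
            d = sum(abs(x - y) for x, y in zip(vi, vj))
            if d <= 5 * tau:  # delta_B > 5 * tau would imply delta_t > tau
                result.append((i, j, d))
    return result
```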

[YKT05] Rui Yang, Panos Kalnis, and Anthony K. H. Tung. Similarity evaluation on tree-structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 754-765, Baltimore, Maryland, USA, June 2005. ACM Press.