Similarity Search
The Binary Branch Distance

Nikolaus Augsten
nikolaus.augsten@sbg.ac.at
Dept. of Computer Sciences, University of Salzburg
http://dbresearch.uni-salzburg.at

Version January 11, 2017
Wintersemester 2016/2017

Augsten (Univ. Salzburg) | Similarity Search | Wintersemester 2016/2017

Outline

1 Binary Branch Distance
    Binary Representation of a Tree
    Binary Branches
    Lower Bound for the Edit Distance
    Complexity

Binary Branch Distance: Binary Representation of a Tree

Binary Tree

In a binary tree
  - each node has at most two children;
  - left child and right child are distinguished: a node can have a right child without having a left child.

Notation: $T_B = (N, E_l, E_r)$
  - $T_B$ denotes a binary tree
  - $N$ are the nodes of the binary tree
  - $E_l$ and $E_r$ are the edges to the left and right children, respectively

Full binary tree: a binary tree where each node has exactly zero or two children.
Example: Binary Tree

Two different binary trees, written as $T_B = (N, E_l, E_r)$:
  - $T_{B1}$: 7 nodes, 4 left edges, 2 right edges
  - $T_{B2}$: 7 nodes, 3 left edges, 3 right edges

[Figure: the binary trees T_B1 and T_B2, and an example of a full binary tree; node labels lost in extraction]

Binary Representation of a Tree

Binary tree transformation:
  (i) link all neighboring siblings in a tree with edges
  (ii) delete all parent-child edges except the edge to the first child

The transformation maintains label information and structure information.

The original tree can be reconstructed from the binary tree:
  - a left edge represents a parent-child relationship in the original tree
  - a right edge represents a right-sibling relationship in the original tree

Example: Binary Tree Transformation

[Figure: a tree and its binary tree transformation]

Normalized Binary Tree Representation

Represent a tree T as a binary tree (the binary representation of T).

We extend the binary tree with null nodes as follows:
  - a null node for each missing left child of a non-null node
  - a null node for each missing right child of a non-null node

Note: Leaf nodes get two null children.

The resulting normalized binary representation is a full binary tree:
  - all non-null nodes have two children
  - all leaves are null nodes (and all null nodes are leaves)
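The transformation and the null-node extension described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input tree is given as nested `(label, children)` tuples, and the names `BinNode`, `normalize`, `is_null`, and `count_nodes` are hypothetical helpers, not part of the lecture material. Null nodes carry the label ε.

```python
from dataclasses import dataclass
from typing import Optional

EPSILON = "ε"  # label assumed for null nodes


@dataclass
class BinNode:
    label: str
    left: Optional["BinNode"] = None   # first child in the original tree
    right: Optional["BinNode"] = None  # right sibling in the original tree


def normalize(tree, right_siblings=()):
    """Build the normalized binary representation of an ordered tree.

    tree is a (label, children) pair; right_siblings holds the siblings
    to the right of tree, in order.
    """
    label, children = tree
    # Left edge: parent to first child, or a null node if there is none.
    left = normalize(children[0], children[1:]) if children else BinNode(EPSILON)
    # Right edge: node to its right sibling, or a null node if there is none.
    right = (normalize(right_siblings[0], right_siblings[1:])
             if right_siblings else BinNode(EPSILON))
    return BinNode(label, left, right)


def is_null(node):
    """Null nodes are exactly the leaves of the normalized binary tree."""
    return node.left is None and node.right is None


def count_nodes(node):
    """Return (number of non-null nodes, number of null nodes)."""
    if is_null(node):
        return (0, 1)
    l, r = count_nodes(node.left), count_nodes(node.right)
    return (1 + l[0] + r[0], l[1] + r[1])


# Hypothetical example tree: root a with the two leaf children b and c.
root = normalize(("a", [("b", []), ("c", [])]))
print(count_nodes(root))
```

For a tree with n nodes the result has n non-null nodes and n + 1 null nodes, as expected for a full binary tree. The recursion follows sibling chains, so very wide or deep trees would need an iterative variant.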
Example: Normalized Binary Tree

Transforming T to the normalized binary tree B(T):

[Figure: the tree T and its normalized binary tree B(T)]

Binary Branch Distance: Binary Branches

Binary Branch

A binary branch BiB(v) is a subtree of the normalized binary tree B(T) consisting of a non-null node v and its two children.

Example:

[Two example binary branches; node labels lost in extraction. In the second example both children are null nodes, $\varepsilon_1$ and $\varepsilon_2$.] (1)

(1) Although the two null nodes have identical labels ($\varepsilon$), they are different nodes. We emphasize this by showing their IDs in subscript.

Binary branches can be serialized as strings:
  - $BiB(v) = (\{v, a, b\}, \{(v, a)\}, \{(v, b)\}) \rightarrow \lambda(v) \circ \lambda(a) \circ \lambda(b)$
  - we can sort these strings ($\varepsilon > \lambda(v)$ for all non-null nodes v)

Binary Branches of Trees and Datasets

Binary branch sets:
  - BiB(T) is the set of all binary branches of B(T)
  - $BiB(S) = \bigcup_{T \in S} BiB(T)$ is the set of all binary branches of dataset S
  - $BiB_{sort}(S)$ is the vector of sorted serialized strings of BiB(S)

Note:
  - nodes are unique in the tree, thus binary branches are unique
  - labels are not unique, thus the serialized binary branches are not unique
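A minimal Python sketch of the serialization and sorting described above, assuming trees are given as nested `(label, children)` tuples. The helper names `binary_branches`, `sort_key`, and `bib_sort` and the two small trees are hypothetical. The sketch relies on the fact that in B(T) the left child of a node is its first child and the right child is its right sibling, so the null nodes never need to be materialized.

```python
EPSILON = "ε"  # label of null nodes


def binary_branches(tree, right_sibling=None):
    """Return the serialized binary branches of a tree as label triples.

    Each triple is (label of v, label of left child, label of right child)
    in the normalized binary tree. Duplicates are kept, because distinct
    binary branches may serialize to the same string.
    """
    label, children = tree
    left = children[0][0] if children else EPSILON
    right = right_sibling[0] if right_sibling is not None else EPSILON
    branches = [(label, left, right)]
    for i, child in enumerate(children):
        sibling = children[i + 1] if i + 1 < len(children) else None
        branches.extend(binary_branches(child, sibling))
    return branches


def sort_key(branch):
    """Order triples so that ε sorts after every non-null label."""
    return tuple((lab == EPSILON, lab) for lab in branch)


def bib_sort(dataset):
    """Sorted vector of the distinct serialized binary branches of a dataset."""
    distinct = {b for tree in dataset for b in binary_branches(tree)}
    return sorted(distinct, key=sort_key)


# Hypothetical trees: t1 = a(b, c), t2 = a(b).
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", [])])
for branch in bib_sort([t1, t2]):
    print("".join(branch))
```

Every non-null node contributes exactly one binary branch, so a tree with n nodes yields n serialized strings.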
Example: Binary Branches of Trees and Datasets

[Figure: trees T_1 and T_2 with node IDs in subscript; node labels lost in extraction]

The binary branches rooted at the nodes with IDs 1 and 4 are different binary branches: the first consists of the nodes with IDs 1, 2, 3, the second of the nodes with IDs 4, 5, 6.

The serialization of both binary branches is identical.

Sorted vector of serialized strings of BiB(S), where $S = \{T_1, T_2\}$: $BiB_{sort}(S)$ has 10 entries [strings lost in extraction].

Binary Branch Vector

The binary branch vector BBV(T) is a representation of the binary branch set BiB(T).

Construction of the binary branch vector BBV(T):
  - compute $BiB_{sort}(S)$ (serialize and sort BiB(S))
  - $b_i$ is the i-th serialized binary branch in sort order ($b_i = BiB_{sort}(S)[i]$)
  - $BBV(T)[i]$ is the number of binary branches in B(T) that serialize to $b_i$

Note: $BBV(T)[i]$ is zero if $b_i$ does not appear in BiB(T).

Example: Binary Branch Vectors

  - $S = \{T_1, T_2\}$ is the data set
  - $BiB_{sort}(S)$ is the vector of sorted serialized strings of BiB(S)
  - $BBV(T_i)$ is the binary branch vector of $T_i$

The binary branch vectors are (the serialized strings of $BiB_{sort}(S)$ are lost in extraction):

   i   BBV(T_1)[i]   BBV(T_2)[i]
   1       1             1
   2       1             0
   3       0             1
   4       1             0
   5       0             1
   6       2             2
   7       0             1
   8       0             1
   9       2             0
  10       1             2
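The construction steps can be sketched as follows, assuming each tree is already given as a list of serialized binary branches. Here the strings are represented as label triples, the function names `bib_sort_from_branches` and `bbv` are hypothetical, and the two branch lists are made-up data rather than the trees of the example. Python's default ordering is used for sorting, which places ε after ASCII labels.

```python
from bisect import bisect_left


def bib_sort_from_branches(branch_lists):
    """Sorted vector of the distinct serialized binary branches of a dataset."""
    return sorted(set().union(*branch_lists))


def bbv(branches, sorted_strings):
    """Binary branch vector: entry i counts the branches serializing to b_i."""
    vector = [0] * len(sorted_strings)
    for b in branches:
        # Binary search for the position of b in the sorted vector. Every
        # branch of a dataset tree is guaranteed to occur in sorted_strings.
        i = bisect_left(sorted_strings, b)
        vector[i] += 1
    return vector


# Hypothetical serialized branches of two small trees, a(b, c) and a(b).
branches_t1 = [("a", "b", "ε"), ("b", "ε", "c"), ("c", "ε", "ε")]
branches_t2 = [("a", "b", "ε"), ("b", "ε", "ε")]

strings = bib_sort_from_branches([branches_t1, branches_t2])
print(bbv(branches_t1, strings))
print(bbv(branches_t2, strings))
```

Both vectors have one dimension per distinct serialized string of the whole dataset, so entries are zero for strings that occur only in the other tree.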
Binary Branch Distance: Lower Bound for the Edit Distance

Binary Branch Distance [YKT05]

Definition (Binary Branch Distance)
Let $BBV(T) = (b_1, \ldots, b_k)$ and $BBV(T') = (b'_1, \ldots, b'_k)$ be binary branch vectors of trees T and T', respectively. The binary branch distance of T and T' is

$\delta_B(T, T') = \sum_{i=1}^{k} |b_i - b'_i|$.

Intuition: We count the binary branches that do not match between the two trees.

Example: Binary Branch Distance

We compute the binary branch distance between $T_1$ and $T_2$:

[Figure: trees T_1 and T_2]

The normalized binary tree representations are:

[Figure: B(T_1) and B(T_2)]

The binary branch vectors of $T_1$ and $T_2$ are:

  BBV(T_1) = (1, 1, 0, 1, 0, 2, 0, 0, 2, 1)
  BBV(T_2) = (1, 0, 1, 0, 1, 2, 1, 1, 0, 2)

The binary branch distance is

$\delta_B(T_1, T_2) = \sum_{i=1}^{10} |b_{1,i} - b_{2,i}| = |1-1| + |1-0| + |0-1| + |1-0| + |0-1| + |2-2| + |0-1| + |0-1| + |2-0| + |1-2| = 9$,

where $b_{1,i}$ and $b_{2,i}$ are the i-th dimension of the vectors $BBV(T_1)$ and $BBV(T_2)$, respectively.
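The definition amounts to the L1 distance between the two vectors. A short Python sketch that recomputes the example, using the two binary branch vectors from the slides; the function name `binary_branch_distance` is an assumption made for illustration.

```python
def binary_branch_distance(bbv1, bbv2):
    """Sum of absolute differences between two binary branch vectors."""
    if len(bbv1) != len(bbv2):
        raise ValueError("both vectors must be built over the same BiB_sort(S)")
    return sum(abs(x - y) for x, y in zip(bbv1, bbv2))


# Binary branch vectors of T_1 and T_2 from the example.
bbv_t1 = [1, 1, 0, 1, 0, 2, 0, 0, 2, 1]
bbv_t2 = [1, 0, 1, 0, 1, 2, 1, 1, 0, 2]

print(binary_branch_distance(bbv_t1, bbv_t2))  # prints 9
```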
Lower Bound Theorem

Theorem (Lower Bound)
Let T and T' be two trees. If the tree edit distance between T and T' is $\delta_t(T, T')$, then the binary branch distance between them satisfies

$\delta_B(T, T') \le 5 \cdot \delta_t(T, T')$.

Proof (Sketch; full proof in [YKT05]).
  - Each node v appears in at most two binary branches.
  - Rename: Renaming a node causes at most two binary branches in each tree to mismatch. The sum is 4.
  - A similar rationale holds for insert and its complementary operation delete (at most 5 binary branches mismatch).

Proof Sketch: Illustration for Rename

Transform $T_1$ to $T_2$ by renaming one node to x.

[Figure: binary trees B(T_1) and B(T_2); node labels lost in extraction]

  - The two binary branches that contain the renamed node exist only in $B(T_1)$
  - The two binary branches that contain x exist only in $B(T_2)$
  - $\delta_t(T_1, T_2) = 1$ (1 rename)
  - $\delta_B(T_1, T_2) = 4$ (4 binary branches different)

Proof Sketch: Illustration for Insert

Transform $T_1$ to $T_2$ by inserting a node x that adopts the children at positions 2 to 3 of its parent.

[Figure: binary trees B(T_1) and B(T_2); node labels lost in extraction]

  - Two binary branches exist only in $B(T_1)$
  - Three binary branches exist only in $B(T_2)$
  - $\delta_t(T_1, T_2) = 1$ (1 insertion)
  - $\delta_B(T_1, T_2) = 5$ (5 binary branches different)

Proof Sketch

In general it can be shown that
  - Rename changes at most 4 binary branches
  - Insert changes at most 5 binary branches
  - Delete changes at most 5 binary branches

Each edit operation changes at most 5 binary branches, thus

$\delta_B(T, T') \le 5 \cdot \delta_t(T, T')$.
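The rename and insert cases can be checked on small hypothetical trees. The sketch below assumes trees as nested `(label, children)` tuples and counts serialized binary branches in a multiset. The trees, their labels, and the helper names `branch_counts` and `bdist` are illustrative assumptions, not the trees from the slides.

```python
from collections import Counter

EPSILON = "ε"  # label of null nodes


def branch_counts(tree, right_sibling=None):
    """Multiset of serialized binary branches (label triples) of a tree."""
    label, children = tree
    left = children[0][0] if children else EPSILON
    right = right_sibling[0] if right_sibling is not None else EPSILON
    counts = Counter([(label, left, right)])
    for i, child in enumerate(children):
        sibling = children[i + 1] if i + 1 < len(children) else None
        counts += branch_counts(child, sibling)
    return counts


def bdist(t1, t2):
    """Binary branch distance as the size of the multiset symmetric difference."""
    c1, c2 = branch_counts(t1), branch_counts(t2)
    return sum(((c1 - c2) + (c2 - c1)).values())


def leaf(label):
    return (label, [])


# Rename: c becomes x (tree edit distance 1).
before_rename = ("a", [leaf("b"), leaf("c"), leaf("d")])
after_rename = ("a", [leaf("b"), leaf("x"), leaf("d")])

# Insert: x is inserted under a and adopts the children c and d (distance 1).
before_insert = ("a", [leaf("b"), leaf("c"), leaf("d"), leaf("e")])
after_insert = ("a", [leaf("b"), ("x", [leaf("c"), leaf("d")]), leaf("e")])

print(bdist(before_rename, after_rename))  # prints 4
print(bdist(before_insert, after_insert))  # prints 5
```

Both values stay within the bound of 5 times the edit distance. These particular trees reach the per-operation maxima of 4 and 5; other edits may change fewer binary branches.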
Binary Branch Distance: Complexity

Complexity: Binary Branch Distance

Compute the distance between two trees of size O(n): ($S = \{T_1, T_2\}$, $n = \max\{|T_1|, |T_2|\}$)

Construction of the binary branch vectors $BBV(T_1)$ and $BBV(T_2)$:
  1. BiB(S): compute the binary branches of $T_1$ and $T_2$: O(n) time and space (traverse $T_1$ and $T_2$)
  2. $BiB_{sort}(S)$: sort the serialized binary branches of BiB(S): O(n log n) time and O(n) space
  3. construct $BBV(T_1)$ and $BBV(T_2)$:
     (a) traverse all binary branches: O(n) time and space
     (b) for each binary branch find position i in $BiB_{sort}(S)$: O(n log n) time (binary search in $BiB_{sort}(S)$ for n binary branches)
     (c) $BBV(T)[i]$ is incremented: O(1)

Computing the distance:
  - the two binary branch vectors are of size O(n)
  - computing the distance has time complexity O(n) (subtracting two binary branch vectors)

The overall complexity is O(n log n) time and O(n) space.

Improving the Time Complexity with a Hash Function

Note: Improvement using a hash function:
  - we assume a hash function that maps the O(n) binary branches to O(n) buckets without collision
  - we do not sort BiB(S)
  - position i in the vector BBV(T) is computed using the hash function
  - O(n) time (instead of O(n log n)) and O(n) space

In the following we assume the sort algorithm with O(n log n) runtime.

Complexity for Similarity Joins

Join two sets with N trees each (tree size: n):

Compute Binary Branch Vectors (BBVs): $O(Nn \log(Nn))$ time, $O(N^2 n)$ space
  - BBVs are of size O(Nn)
  - time: sort O(Nn) binary branches / O(Nn) binary searches in BBVs
  - space: O(N) BBVs must be stored

Compute Distances: $O(N^3 n)$ time
  - computing the distance between two trees has O(Nn) time complexity (subtracting two binary branch vectors)
  - $O(N^2)$ distance computations required

Overall Complexity: $O(N^3 n + Nn \log n)$ (2) time and $O(N^2 n)$ space

(2) $O(N^3 n + Nn \log(Nn)) = O(N^3 n + Nn \log N + Nn \log n) = O(N^3 n + Nn \log n)$
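The hash-function variant noted above can be approximated with a Python dictionary, which gives expected (not guaranteed collision-free) constant-time lookups. This is a sketch under that assumption: the function name `bbvs_hashed` and the branch lists are hypothetical, and vector positions follow first occurrence rather than sort order, which leaves the distance unchanged.

```python
def bbvs_hashed(branch_lists):
    """Build binary branch vectors without sorting, via a hash-based index."""
    # Assign each distinct serialized branch the next free position.
    index = {}
    for branches in branch_lists:
        for b in branches:
            index.setdefault(b, len(index))
    # Count the branches of each tree at their hashed positions.
    vectors = []
    for branches in branch_lists:
        vector = [0] * len(index)
        for b in branches:
            vector[index[b]] += 1
        vectors.append(vector)
    return vectors


# Hypothetical serialized branches of two small trees, a(b, c) and a(b).
branches_t1 = [("a", "b", "ε"), ("b", "ε", "c"), ("c", "ε", "ε")]
branches_t2 = [("a", "b", "ε"), ("b", "ε", "ε")]

v1, v2 = bbvs_hashed([branches_t1, branches_t2])
distance = sum(abs(x - y) for x, y in zip(v1, v2))
print(v1, v2, distance)
```

With expected O(1) dictionary operations, building both vectors and subtracting them takes expected O(n) time for two trees of size O(n).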
References

[YKT05] Rui Yang, Panos Kalnis, and Anthony K. H. Tung. Similarity evaluation on tree-structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 754-765, Baltimore, Maryland, USA, June 2005. ACM Press.