Periodic string comparison
Alexander Tiskin
Department of Computer Science, University of Warwick
http://www.dcs.warwick.ac.uk/~tiskin
Alexander Tiskin (Warwick): Periodic string comparison
1 Introduction
2 Semi-local string comparison
3 The seaweed algorithm
4 Conclusions and future work
Introduction
String matching: finding an exact pattern in a string.
String comparison: finding similar patterns in two strings. (Also known as approximate string matching; no relation to approximation algorithms!)
Applications: computational biology, image recognition, ...
Standard types of string comparison:
  global: whole string vs whole string
  local: substrings vs substrings
Main focus of this work:
  semi-local: whole string vs substrings; prefixes vs suffixes
Main tool: implicit unit-Monge matrices
Introduction: Terminology and notation
Integers: $\ldots, -2, -1, 0, 1, 2, \ldots$
Odd half-integers: $\ldots, -\tfrac52, -\tfrac32, -\tfrac12, \tfrac12, \tfrac32, \tfrac52, \ldots$
We consider finite and infinite integer matrices over integer and odd half-integer indices. For simplicity, the index range will usually be ignored.
A permutation matrix is a 0/1 matrix with exactly one nonzero per row and per column.
Introduction: Terminology and notation
Given a matrix $D$, its distribution matrix is
$D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$.
In other words, $D^\Sigma(i, j)$ is the sum of all $D(\hat\imath, \hat\jmath)$, where $(\hat\imath, \hat\jmath)$ is dominated by $(i, j)$.
Given a matrix $E$, its density matrix is
$E^\square(\hat\imath, \hat\jmath) = E(\hat\imath^-, \hat\jmath^+) - E(\hat\imath^-, \hat\jmath^-) - E(\hat\imath^+, \hat\jmath^+) + E(\hat\imath^+, \hat\jmath^-)$, where $\hat\imath^\pm = \hat\imath \pm \tfrac12$.
$D^\Sigma$, $E$ are over integers; $D$, $E^\square$ are over odd half-integers.
$(D^\Sigma)^\square = D$ for all $D$.
A matrix $E$ is simple if $(E^\square)^\Sigma = E$.
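The two definitions above can be checked on a small case. The following is a minimal sketch, assuming a square matrix stored as a list of lists in which entry `D[r][c]` stands for the half-integer index pair (r + 1/2, c + 1/2); the function names `distribution` and `density` are illustrative and not taken from the slides.

```python
def distribution(D):
    """Distribution matrix of an n x n matrix D.

    D[r][c] stands for the entry at half-integer indices (r + 1/2, c + 1/2).
    The result S has integer indices 0..n, with
    S[i][j] = sum of D[r][c] over r >= i and c < j.
    """
    n = len(D)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    # Rows are filled bottom-up and columns left to right, so every term
    # on the right-hand side is already known (2D prefix-sum recurrence).
    for i in range(n - 1, -1, -1):
        for j in range(1, n + 1):
            S[i][j] = S[i + 1][j] + S[i][j - 1] - S[i + 1][j - 1] + D[i][j - 1]
    return S


def density(E):
    """Density matrix of an (n + 1) x (n + 1) matrix E over integer indices."""
    n = len(E) - 1
    return [[E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c]
             for c in range(n)] for r in range(n)]


P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
S = distribution(P)
print(S[0])             # first row of the distribution matrix
print(density(S) == P)  # the density of the distribution recovers P
```

The round trip `density(distribution(P)) == P` is the identity $(D^\Sigma)^\square = D$ on this example.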
Introduction: Terminology and notation
A matrix $E$ is Monge if $E^\square$ is nonnegative.
Intuition: border-to-border distances in a (weighted) planar graph.
A matrix $E$ is unit-Monge if $E^\square$ is a permutation matrix.
Intuition: border-to-border distances in a grid-like graph.
Introduction: Terminology and notation
Figure: a permutation matrix $P$, its distribution matrix $P^\Sigma$, and the identity $(P^\Sigma)^\square = P$.
Introduction: Implicit unit-Monge matrices
Implicit $P^\Sigma$: a range tree on the nonzeros of $P$ [Bentley: 1980]
  binary search tree by i-coordinate
  under every node, a binary search tree by j-coordinate
Every node of the range tree represents a canonical range (rectangular region) and stores its nonzero count.
Overall, $O(n \log n)$ canonical ranges are non-empty.
The range tree supports dominance counting queries: how many nonzeros are dominated by a given point?
A query is answered by decomposing the query range into $O(\log^2 n)$ disjoint canonical ranges.
Total size $O(n \log n)$, query time $O(\log^2 n)$.
There are asymptotically more efficient (but less practical) data structures.
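As an illustration of dominance counting within the stated bounds, here is a minimal sketch. It uses a merge-sort tree (a segment tree over rows whose nodes hold sorted column lists), which is a simplified stand-in for the range tree named on the slide, not the slide's own structure. The class name `DominanceCounter` and the encoding of the permutation as a list `perm` with `perm[r] = c` are assumptions made for the example.

```python
from bisect import bisect_left


class DominanceCounter:
    """Counts nonzeros (r, perm[r]) of a permutation matrix with r >= i and perm[r] < j."""

    def __init__(self, perm):
        self.n = len(perm)
        size = 1
        while size < max(self.n, 1):
            size *= 2
        self.size = size
        # tree[node] holds the sorted column indices of the rows covered by node.
        self.tree = [[] for _ in range(2 * size)]
        for r, c in enumerate(perm):
            self.tree[size + r] = [c]
        for node in range(size - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def count(self, i, j):
        """Value of the distribution matrix at integer point (i, j)."""
        total = 0
        lo, hi = self.size + i, self.size + self.n  # half-open row range [i, n)
        while lo < hi:
            if lo & 1:
                total += bisect_left(self.tree[lo], j)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], j)
            lo //= 2
            hi //= 2
        return total


counter = DominanceCounter([1, 0, 2])
print(counter.count(0, 3), counter.count(1, 2))
```

Each query touches O(log n) nodes and performs one binary search in each, which gives the O(log^2 n) query time and O(n log n) total size quoted above.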
Introduction: Matrix $\odot$-multiplication
Matrix $\odot$-multiplication (a.k.a. distance, (min, +) or tropical multiplication):
$A \odot B = C$, where $C(i, k) = \min_j \bigl( A(i, j) + B(j, k) \bigr)$.
Matrix classes closed under $\odot$-multiplication:
  general numerical (integer, real) matrices
  Monge matrices
  simple unit-Monge matrices
Simple unit-Monge matrices of size n form an aperiodic monoid (i.e. a monoid as far as possible from a group) under $\odot$-multiplication.
We call it the seaweed monoid $T_n$.
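The definition of the (min, +) product translates directly into code. This is a minimal cubic-time sketch for explicit matrices given as lists of lists; the function name `minplus` is illustrative.

```python
def minplus(A, B):
    """(min, +) product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    inner = len(B)
    cols = len(B[0])
    return [[min(A[i][j] + B[j][k] for j in range(inner)) for k in range(cols)]
            for i in range(len(A))]


A = [[0, 3],
     [2, 0]]
B = [[0, 1],
     [5, 0]]
print(minplus(A, B))
```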
Introduction: Matrix $\odot$-multiplication
Simple unit-Monge matrices: $\odot$-multiplication = seaweed composition.
For permutation matrices $P_A$, $P_B$, $P_C$: $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$.
Figure: the seaweed diagrams of $P_A$ and $P_B$ are stacked and composed into the seaweed diagram of $P_C$.
Introduction: Matrix $\odot$-multiplication
Seaweeds: similar to braids, generated by strand crossings.
Unlike in braids, all seaweed crossings are
  level (not underpass/overpass)
  idempotent, i.e. two seaweeds can cross at most once
Seaweed composition: associative, no inverse (a crossing cannot be cancelled).
Identity: $1 \odot x = x$, no seaweeds crossing.
Zero: $0 \odot x = 0$, all seaweeds crossing.
In matrix terms, 1 and 0 are the distribution matrices of the permutation matrices with no seaweeds crossing and with all seaweeds crossing, respectively.
Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_n$:
  $n!$ elements (permutations of size n)
  $n - 1$ generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
  $g_i^2 = g_i$ for all $i$ (idempotence)
  $g_i g_j = g_j g_i$ for $j - i > 1$ (far commutativity)
  $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$ (braid relations)
Computation: a confluent rewriting system can be obtained by software (Semigroupe, GAP).
Generalisation: Coxeter monoids (subgroup monoids in groups) [Tsaranov: 90]
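The relations listed above can be checked by brute force for small n. The sketch below is an assumption-laden illustration, not the fast algorithm: an element is encoded as a tuple `perm` with a nonzero at (r, perm[r]), the generator $g_i$ is taken to be the adjacent transposition of positions i - 1 and i, and composition is computed explicitly as the density of the (min, +) product of distribution matrices. All function names are hypothetical.

```python
def distribution(perm):
    """S[i][j] = number of nonzeros (r, perm[r]) with r >= i and perm[r] < j."""
    n = len(perm)
    return [[sum(1 for r in range(i, n) if perm[r] < j) for j in range(n + 1)]
            for i in range(n + 1)]


def minplus(A, B):
    size = len(A)
    return [[min(A[i][j] + B[j][k] for j in range(size)) for k in range(size)]
            for i in range(size)]


def compose(p, q):
    """Seaweed composition of two permutations via explicit matrices."""
    C = minplus(distribution(p), distribution(q))
    n = len(p)
    result = []
    for r in range(n):
        # Density of C in row r; for unit-Monge C it contains exactly one 1.
        row = [C[r][c + 1] - C[r][c] - C[r + 1][c + 1] + C[r + 1][c] for c in range(n)]
        result.append(row.index(1))
    return tuple(result)


def generator(n, i):
    """Elementary crossing g_i (1 <= i <= n - 1), as an adjacent transposition."""
    perm = list(range(n))
    perm[i - 1], perm[i] = perm[i], perm[i - 1]
    return tuple(perm)


def monoid(n):
    """All elements reachable from the identity by composing with generators."""
    identity = tuple(range(n))
    gens = [generator(n, i) for i in range(1, n)]
    seen, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


g1, g2 = generator(3, 1), generator(3, 2)
print(compose(g1, g1) == g1)                                    # idempotence
print(compose(compose(g1, g2), g1) == compose(compose(g2, g1), g2))  # braid relation
print(len(monoid(3)), len(monoid(4)))                           # 3! and 4! elements
```

In this encoding the identity permutation acts as 1 and the reversal permutation as 0.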
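Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_3$
Generators: $g_1$, $g_2$ (shown together with the identity 1).
Other elements: three further products of the generators, the last of which is 0.
Figure: seaweed diagrams of the elements of $T_3$ and the rewriting rules that reduce words to them.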
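Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_4$
Generators: $g_1$, $g_2$, $g_3$ (shown together with the identity 1).
Other elements: 20 further products of the generators, the last of which is 0.
Figure: seaweed diagrams of the elements of $T_4$ and the rewriting rules that reduce words to them.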
Introduction: Matrix $\odot$-multiplication
The implicit matrix $\odot$-multiplication problem: given permutation matrices $P_A$, $P_B$, compute $P_C$ such that $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$.
Matrix $\odot$-multiplication: running time
  matrix type                               time
  general                                   $O(n^3)$       standard
  Monge                                     $O(n^2)$       by [Aggarwal+: 1987]
  implicit simple unit-Monge ($P^\Sigma$)   $O(n^{1.5})$   [T: 2006]
                                            $O(n \log n)$  [T: NEW]
Introduction: Matrix $\odot$-multiplication
Figure: $P_A$ and $P_B$ with the unknown product $P_C$. $P_A$ is split into $P_{A,lo}$, $P_{A,hi}$ and $P_B$ into $P_{B,lo}$, $P_{B,hi}$; the results of the two subproblems are overlaid as $P_{C,lo} + P_{C,hi}$.
Introduction: Matrix $\odot$-multiplication
Implicit matrix $\odot$-multiplication: the algorithm
$P_C^\Sigma(i, k) = \min_j \bigl( P_A^\Sigma(i, j) + P_B^\Sigma(j, k) \bigr)$
Divide-and-conquer on the range of j.
Divide $P_A$ horizontally, $P_B$ vertically; two subproblems of effective size n/2:
  $P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$
  $P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
Conquer: most (but not all!) nonzeros of $P_{C,lo}$, $P_{C,hi}$ appear in $P_C$.
Missing nonzeros can be obtained in time O(n) using the Monge property.
Overall time $O(n \log n)$.
Introduction: Matrix $\odot$-multiplication
Figure: the overlay $P_{C,lo} + P_{C,hi}$ of the two subproblem results, and the final product $P_C$.
Semi-local string comparison: Semi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size σ.
Distinguish contiguous substrings and not necessarily contiguous subsequences.
Special cases of substring: prefix, suffix.
Notation: strings a, b of length m, n respectively.
Assume where necessary: m ≤ n; m, n reasonably close.
The longest common subsequence (LCS) score:
  length of the longest string that is a subsequence of both a and b
  equivalently, alignment score, where score(match) = 1 and score(mismatch) = 0
Semi-local string comparison: Semi-local LCS and edit distance
The LCS problem: give the LCS score for a vs b.
LCS: running time
  $O(mn)$                              [Wagner, Fischer: 1974]
  $O(mn / \log n)$                     σ = O(1)   [Masek, Paterson: 1980] [Crochemore+: 2003]
  $O(mn (\log\log n)^2 / \log n)$      [Paterson, Dancik: 1994]
Semi-local string comparison: Semi-local LCS and edit distance
LCS on the alignment graph (directed, acyclic).
Figure: the alignment graph of the example strings a and b; blue edges have score 0, red edges have score 1.
LCS = highest-score corner-to-corner path.
Semi-local string comparison: Semi-local LCS and edit distance
LCS: dynamic programming (DP) algorithm [Wagner, Fischer: 1974]
  Sweep the alignment graph, respecting node dependencies.
  Running time $O(mn)$.
LCS: micro-block DP algorithm [Masek, Paterson: 1980]
  Sweep the alignment graph in square blocks, respecting block dependencies.
  Block size: $t = O(\log n)$.
  Block interface: O(t) inputs/outputs, each of size O(log σ).
  Use a precomputed mapping of all possible input/output combinations.
  Running time $O(mn / \log n)$ when σ = O(1), even on a log-cost RAM.
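The plain DP sweep can be sketched in a few lines. This is a minimal illustration of the O(mn) algorithm only (the micro-block variant is not shown); the function name `lcs_score` is an assumption.

```python
def lcs_score(a, b):
    """LCS score of strings a and b by row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)          # scores for the previous prefix of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)            # match: diagonal edge of score 1
            else:
                cur.append(max(prev[j], cur[j - 1]))   # mismatch: best of the two neighbours
        prev = cur
    return prev[-1]


print(lcs_score("ABCBDAB", "BDCABA"))
```

Only two rows of the table are kept, so the sketch uses O(n) memory while still visiting all mn cells.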
Semi-local string comparison: Semi-local LCS and edit distance
The semi-local LCS problem: give the (implicit) matrix of $O(m^2 + n^2)$ LCS scores:
  string-substring LCS: string a vs every substring of b
  prefix-suffix LCS: every prefix of a vs every suffix of b
  symmetrically, substring-string and suffix-prefix LCS
The three-way semi-local LCS problem: give the (implicit) matrix of $O(n^2)$ LCS scores:
  string-substring, prefix-suffix, suffix-prefix LCS
  no substring-string LCS
  suitable for m ≪ n
Cf.: dynamic programming gives prefix-prefix LCS.
Semi-local string comparison: Semi-local LCS and edit distance
Semi-local LCS on the alignment graph.
Figure: the alignment graph of the example strings, with an LCS of a against a substring of b highlighted; blue edges have score 0, red edges have score 1.
Semi-local LCS = all highest-score border-to-border paths (string-substring = top-to-bottom, etc.)
Semi-local string comparison: Semi-local LCS and edit distance
The LCS problem is a special case of the alignment score problem with weighted matches, mismatches and gaps.
  LCS score: $w_{match} = 1$, $w_{mismatch} = w_{gap} = 0$
  Levenshtein score: $w_{match} = 2$, $w_{mismatch} = 1$, $w_{gap} = 0$
An alignment score is rational if $w_{match}$, $w_{mismatch}$, $w_{gap}$ are rational; it reduces to the LCS score by a constant-factor blow-up of the alignment graph.
The semi-local alignment score problem: string-substring, prefix-suffix, substring-string, suffix-prefix alignment scores.
Edit distance: minimum cost to transform a into b by weighted character edits (insertion, deletion, substitution).
The semi-local edit distance problem: the semi-local alignment score problem with $w_{match} = 0$, and with $w_{mismatch}$, $w_{gap}$ given by the substitution cost $w_{sub}$ and the insertion/deletion cost $w_{indel}$ respectively.
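The weighted alignment score can be computed by the same DP sweep as the LCS score. The sketch below assumes that each gap weight is charged once per unaligned character; the function name `alignment_score` is illustrative. Under that assumption, an alignment with x matches, y mismatches and g gapped characters satisfies 2(x + y) + g = m + n, so the Levenshtein score 2x + y equals m + n minus the unit-cost edit distance, which the example uses as a sanity check.

```python
def alignment_score(a, b, w_match, w_mismatch, w_gap):
    """Maximum alignment score of a vs b; w_gap is charged per unaligned character."""
    prev = [j * w_gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * w_gap]
        for j, y in enumerate(b, 1):
            diagonal = prev[j - 1] + (w_match if x == y else w_mismatch)
            cur.append(max(diagonal, prev[j] + w_gap, cur[j - 1] + w_gap))
        prev = cur
    return prev[-1]


a, b = "kitten", "sitting"
print(alignment_score(a, b, 1, 0, 0))   # LCS score
print(alignment_score(a, b, 2, 1, 0))   # Levenshtein score: len(a) + len(b) - edit distance
```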
Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix A(i, j), for i, j = 0, ..., 13 (rows i, columns j):
   0   1   2   3   4   5   6   6   7   8   8   8   8   8
  -1   0   1   2   3   4   5   5   6   7   7   7   7   7
  -2  -1   0   1   2   3   4   4   5   6   6   6   6   7
  -3  -2  -1   0   1   2   3   3   4   5   5   6   6   7
  -4  -3  -2  -1   0   1   2   2   3   4   4   5   5   6
  -5  -4  -3  -2  -1   0   1   2   3   4   4   5   5   6
  -6  -5  -4  -3  -2  -1   0   1   2   3   3   4   4   5
  -7  -6  -5  -4  -3  -2  -1   0   1   2   2   3   3   4
  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   3   4
  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4
 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3
 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2
 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1
 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0
For the example strings a and b: A(0, 13) = LCS(a, b) = 8.
For the substring b' of b between positions 4 and 11: A(4, 11) = LCS(a, b') = 5.
A(i, j) = j - i if i > j.
Semi-local string comparison: Highest-score matrices
Semi-local LCS: output representation
  size              query time
  $O(n^2)$          $O(1)$          trivial
  $O(m^{1/2} n)$    $O(\log n)$     string-substring [Alves+: 2003]
  $O(n)$            $O(n)$          string-substring [Alves+: 2005]
  $O(n \log n)$     $O(\log^2 n)$   [T: 2006]
Semi-local LCS: running time
  $O(mn^2)$                           naive
  $O(mn)$                             string-substring [Schmidt: 1998]; string-substring [Alves+: 2005]
  $O(mn)$                             [T: 2006]
  $O(mn / \log^{0.5} n)$              [T: 2006]
  $O(mn (\log\log n)^2 / \log n)$     [T: 2007]
Semi-local string comparison: Highest-score matrices
A: the semi-local LCS score matrix for a vs b.
A(i, j): the number of matched characters for a vs a substring of b.
Q(i, j) = j - i - A(i, j): the number of unmatched characters.
Properties of matrix Q:
  Q is simple unit-Monge
  therefore, $Q = P^\Sigma$ for some permutation matrix $P$
$P = Q^\square = -A^\square$ is an implicit representation of A.
Range tree for P: memory $O(n \log n)$, query time $O(\log^2 n)$.
Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix (figure: the same matrix with its structure highlighted).
Adjacent entries differ by 0 (blue) or by 1 (red).
Green marks the positions where P(i, j) = 1.
$A(i, j) = j - i - P^\Sigma(i, j)$
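The relation between A, Q and P can be reproduced by brute force. In the sketch below the two strings are assumptions chosen for illustration (they are not taken from the slides), picked so that the computed scores agree with the entries A(0, 13) = 8 and A(4, 11) = 5 quoted there. The function names are hypothetical, `P[r][c]` stands for the half-integer index pair (r + 1/2, c + 1/2), and only the string-substring part of the matrix is built.

```python
def lcs_score(a, b):
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def string_substring_matrix(a, b):
    """A[i][j] = LCS(a, b[i:j]) for i <= j, and j - i for i > j (naive construction)."""
    n = len(b)
    return [[lcs_score(a, b[i:j]) if i <= j else j - i for j in range(n + 1)]
            for i in range(n + 1)]


def seaweed_matrix(A):
    """P = density of Q, where Q(i, j) = j - i - A(i, j)."""
    n = len(A) - 1
    Q = [[j - i - A[i][j] for j in range(n + 1)] for i in range(n + 1)]
    return [[Q[r][c + 1] - Q[r][c] - Q[r + 1][c + 1] + Q[r + 1][c]
             for c in range(n)] for r in range(n)]


def dominance_sum(P, i, j):
    """Distribution matrix entry: nonzeros with row >= i and column < j."""
    return sum(P[r][c] for r in range(i, len(P)) for c in range(j))


a, b = "baabcbca", "baabcabcabaca"   # assumed example strings
A = string_substring_matrix(a, b)
P = seaweed_matrix(A)
print(A[0][13], A[4][11])
print(11 - 4 - dominance_sum(P, 4, 11))  # same value as A[4][11], recovered from P alone
```

Restricted to this index range, P has at most one nonzero per row and per column; the remaining nonzeros of the full permutation matrix belong to the prefix-suffix, suffix-prefix and substring-string parts.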
Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix
Figure: the nonzeros of P for the example strings a and b, with the substring b' highlighted.
$A(4, 11) = LCS(a, b') = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
P gives an implicit representation of A.
Semi-local string comparison: Highest-score matrices
The seaweeds in the alignment graph
Figure: the alignment graph of the example strings with the seaweeds drawn across it.
P(i, j) = 1 corresponds to a seaweed from (top, i) to (bottom, j).
Also define top-to-right, left-to-right and left-to-bottom seaweeds.
This gives a complete border-to-border graph-theoretic matching.
The seaweed algorithm: The seaweed algorithm
Semi-local LCS: the seaweed algorithm [T: 2006]
Iterate over the alignment graph, tracing seaweeds.
Pick cells in any order, respecting dependencies.
In every cell, the two entering seaweeds
  cross, if the cell is a mismatch and they have not crossed before
  bend otherwise
Running time $O(mn)$.
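The cell rule above can be sketched as follows. The encoding is an assumption made for this illustration: seaweeds are numbered by their starting point, going up the left border and then along the top border, so that "have crossed before" becomes a comparison of two integers; a bend sends the seaweed entering from the left downwards and the one entering from the top rightwards. The function names are hypothetical, and only string-substring queries are shown.

```python
def seaweed_kernel(a, b):
    """Trace seaweeds through the m x n alignment graph, row by row.

    Returns (left, top): left[i] is the seaweed leaving the right border in
    row i, top[j] is the seaweed leaving the bottom border in column j.
    """
    m, n = len(a), len(b)
    left = [m - 1 - i for i in range(m)]   # seaweeds starting on the left border
    top = [m + j for j in range(n)]        # seaweeds starting on the top border
    for i in range(m):
        for j in range(n):
            h, v = left[i], top[j]         # entering from the left / from the top
            if a[i] == b[j] or h > v:      # match, or the pair has crossed before
                left[i], top[j] = v, h     # bend
            # otherwise the seaweeds cross and both arrays stay unchanged
    return left, top


def string_substring_lcs(top, m, i, j):
    """LCS(a, b[i:j]) from the seaweeds that leave through the bottom border."""
    # Count seaweeds that start on the top border at column >= i
    # and end on the bottom border at column < j.
    dominated = sum(1 for c in range(i, j) if top[c] >= m + i)
    return (j - i) - dominated


a, b = "ab", "ba"
left, top = seaweed_kernel(a, b)
print(string_substring_lcs(top, len(a), 0, 2))
```

The query here scans the bottom border directly in O(n) time; a dominance-counting structure over the same data would answer it in polylogarithmic time.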
The seaweed algorithm: The micro-block seaweed algorithm
Semi-local LCS: the micro-block seaweed algorithm [T: 2007]
Iterate over the alignment graph in blocks, tracing seaweeds.
Use a precomputed mapping of all possible block inputs to outputs.
Block size: $t = O(\log n / \log\log n)$.
Block interface: O(t) values (input characters and seaweeds), each of size $O(\log n)$, but can be compressed to $O(\log\log n)$ by a recursive scheme.
Running time $O\bigl(\tfrac{m}{t} \cdot \tfrac{n}{t} \cdot t \log\log n\bigr) = O\bigl(mn (\log\log n)^2 / \log n\bigr)$, even on a log-cost RAM.
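The running time follows by substituting the block size into the per-block cost. A short derivation, assuming t is chosen proportional to log n / log log n (so that the stated upper bound on t is attained up to a constant factor):

```latex
\[
\frac{m}{t}\cdot\frac{n}{t}\cdot t\log\log n
  \;=\; \frac{mn\,\log\log n}{t},
\qquad
t = \Theta\!\left(\frac{\log n}{\log\log n}\right)
\;\Longrightarrow\;
\frac{mn\,\log\log n}{t}
  \;=\; O\!\left(\frac{mn\,(\log\log n)^2}{\log n}\right).
\]
```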
The seaweed algorithm: Cyclic LCS
The cyclic LCS problem: give the maximum LCS score for a vs all cyclic rotations of b.
Cyclic LCS: running time
  $O(mn^2 / \log n)$                  naive
  $O(mn \log m)$                      [Maes: 1990]
  $O(mn)$                             [Bunke, Bühler: 1993; Landau+: 1998; Schmidt: 1998]
  $O(mn (\log\log n)^2 / \log n)$     [T: 2007]
Cyclic LCS: the algorithm
  Run the micro-block seaweed algorithm on a vs bb, time $O(mn (\log\log n)^2 / \log n)$.
  Make n string-substring LCS queries, time negligible.
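The reduction rests on the fact that every cyclic rotation of b is a length-n substring of bb. The sketch below shows only that reduction, with each string-substring query answered naively by a fresh DP rather than by the seaweed structure; the function names are illustrative.

```python
def lcs_score(a, b):
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def cyclic_lcs(a, b):
    """Maximum LCS score of a against all cyclic rotations of b."""
    n = len(b)
    doubled = b + b
    # The rotation starting at position i is the substring doubled[i : i + n].
    return max((lcs_score(a, doubled[i:i + n]) for i in range(n)), default=0)


print(cyclic_lcs("abc", "cab"))
```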
The seaweed algorithm: Longest repeating subsequence
The longest repeating subsequence problem: find the longest subsequence of a that is a square (a repetition of two identical strings).
Motivated by tandem repeats in a genome.
Longest repeating subsequence: running time
  $O(n^3)$                             naive
  $O(n^2)$                             [Kosowski: 2004]
  $O(n^2 (\log\log n)^2 / \log n)$     [T: 2007]
Longest repeating subsequence: the algorithm
  Run the micro-block seaweed algorithm on a vs a, time $O(mn (\log\log n)^2 / \log n)$.
  Make n - 1 suffix-prefix LCS queries, time negligible.
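A square subsequence splits the string at some position into a prefix and a suffix that share a common subsequence, which is why n - 1 prefix-against-suffix queries suffice. The sketch below shows this reduction with each query answered by a fresh DP (so it corresponds to the naive cubic bound, not to the seaweed-based one); the function names are illustrative.

```python
def lcs_score(a, b):
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_square_subsequence(a):
    """Length of the longest subsequence of a of the form ww."""
    # Try every split point; the two halves of the square lie on either side of it.
    best = max((lcs_score(a[:i], a[i:]) for i in range(1, len(a))), default=0)
    return 2 * best


print(longest_square_subsequence("abcabc"))
```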
The seaweed algorithm: Approximate matching
The approximate pattern matching problem: give the substring closest to a by alignment score, starting at each position in b.
Assume a rational alignment score.
Approximate pattern matching: running time
  $O(mn)$                             [Sellers: 1980]
  $O(mn / \log n)$                    σ = O(1)   via [Masek, Paterson: 1980]
  $O(mn (\log\log n)^2 / \log n)$     via [Paterson, Dancik: 1994]
The seaweed algorithm: Approximate matching
Approximate pattern matching: the algorithm
Run the micro-block seaweed algorithm on a vs b under the given alignment score, in time $O(mn (\log\log n)^2 / \log n)$.
The implicit semi-local edit score matrix:
  is an anti-Monge matrix
  approximate pattern matching corresponds to finding its row minima
Row minima in O(n) element queries [Aggarwal+: 1987].
Each query takes time $O(\log^2 n)$ using the range tree representation; the combined query time is negligible.
Overall running time is dominated by the micro-block seaweed algorithm, same as [Paterson, Dancik: 1994].
The seaweed algorithm: The periodic seaweed algorithm
The periodic string-substring LCS problem: give (implicit) LCS scores for a vs each substring of $b = \ldots uuu \ldots = u^{\pm\infty}$.
Let u be of length p; we may assume that every character of a occurs in u.
The tandem LCS problem: give the LCS score for a vs $b = u^k$.
We have n = kp; we may assume k < m (otherwise the LCS score is m).
Tandem LCS: running time
  $O(mkp)$         naive
  $O(m(k + p))$    [Landau, Ziv-Ukelson: 2001]
  $O(mp)$          [NEW]
The seaweed algorithm: The periodic seaweed algorithm
Periodic string-substring LCS: the periodic seaweed algorithm
Iterate over the alignment graph, tracing seaweeds row by row.
In every row, start from a match cell and move rightwards, wrapping around at the graph edge.
In every cell, the two entering seaweeds
  cross, if the cell is a mismatch and they have not crossed before
  bend otherwise
At the right edge of the graph, wrap around back to the left edge.
Running time $O(mp)$: the alignment graph of a against one period u has mp cells.
Querying the string-substring LCS score (including tandem LCS): count each nonzero dominated by the query point with the appropriate multiplicity, either directly or via a range tree.
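The row-by-row procedure above can be sketched for the tandem LCS query. The encoding is an assumption made for this illustration: columns of the infinite periodic graph are numbered by integers, `top[c]` holds the starting column of the seaweed that currently occupies column c (for 0 <= c < p), and every other seaweed is a translate of one of these by a multiple of p. Characters of a that do not occur in u are skipped, since they cannot contribute to any LCS. The function names are hypothetical.

```python
def periodic_seaweeds(a, u):
    """Trace seaweeds of a against the infinite periodic string with period u."""
    p = len(u)
    letters = set(u)
    top = list(range(p))
    for x in a:
        if x not in letters:
            continue                      # row without matches: drop the character
        j0 = u.index(x)                   # start the row at a match cell
        h = top[j0]                       # at a match the top seaweed turns right
        for c in range(j0 + 1, j0 + p):   # unwrapped columns of one full period
            shift = p if c >= p else 0    # columns past the edge belong to the next period
            v = top[c % p] + shift
            if u[c % p] == x or h > v:    # match, or the pair has crossed before: bend
                top[c % p] = h - shift
                h = v
        top[j0] = h - p                   # back at the match cell, one period later
    return top


def tandem_lcs(a, u, k):
    """LCS score of a against u repeated k times."""
    p = len(u)
    top = periodic_seaweeds(a, u)
    # A seaweed ending in column c + t*p starts in column top[c] + t*p; count the
    # translates t with start >= 0 and end < k*p.
    dominated = sum(max(0, k + (s // p)) for s in top)
    return k * p - dominated


print(tandem_lcs("aab", "ab", 2))
```

The tracing loop visits each of the mp cells once, and the same `top` array serves every value of k, which is where the saving over the naive O(mkp) computation comes from.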
The seaweed algorithm: The periodic seaweed algorithm
The tandem alignment problem: give the substring closest to a by alignment score among certain substrings of $b = u^{\pm\infty}$:
  global: substrings of the form $u^k$ across all k
  cyclic: substrings of length kp across all k
  local: substrings of any length
Tandem alignment: running time
  $O(m^2 p)$        all      naive
  $O(mp)$           global   [Myers, Miller: 1989]
  $O(mp \log p)$    cyclic   [Benson: 2005]
  $O(mp)$           cyclic   [NEW]
  $O(mp)$           local    [Myers, Miller: 1989]
The seaweed algorithm: The periodic seaweed algorithm
Cyclic tandem alignment: the algorithm
Run the periodic seaweed algorithm (under the given alignment score), time $O(mp)$.
For each k in [1 : m]:
  solve tandem LCS (under the given alignment score) for a against $u^k$
  obtain p successive string-substring alignment scores by incremental score updating, each in time O(1)
Running time $O(mp)$.
Conclusions and future work
Semi-local LCS problem:
  representation by implicit unit-Monge matrices
  generalisation to rational alignment scores
  open: real alignment scores?
The seaweed and micro-block seaweed algorithms:
  simple algorithm for semi-local LCS
  semi-local LCS in time o(mn) via micro-blocks
  improvements on related problems
The periodic seaweed algorithm:
  straightforward extension of the seaweed algorithm
  periodic semi-local LCS in time O(mp)
  natural applications
  open: o(mp) via micro-blocks?