Periodic string comparison

Transcription:

Periodic string comparison
Alexander Tiskin
Department of Computer Science, University of Warwick
http://www.dcs.warwick.ac.uk/~tiskin
Alexander Tiskin (Warwick) Periodic string comparison 1 / 51

1 Introduction
2 Semi-local string comparison
3 The seaweed algorithm
4 Conclusions and future work

Introduction
String matching: finding an exact pattern in a string.
String comparison: finding similar patterns in two strings. (Also known as approximate string matching; no relation to approximation algorithms!)
Applications: computational biology, image recognition, ...
Standard types of string comparison:
  global: whole string vs whole string
  local: substrings vs substrings
Main focus of this work:
  semi-local: whole string vs substrings; prefixes vs suffixes
Main tool: implicit unit-Monge matrices

Introduction: Terminology and notation
Integers: $\ldots, -2, -1, 0, 1, 2, \ldots$
Odd half-integers: $\ldots, -\frac52, -\frac32, -\frac12, \frac12, \frac32, \frac52, \ldots$
We consider finite and infinite integer matrices over integer and odd half-integer indices. For simplicity, the index range will usually be ignored.
A permutation matrix is a 0/1 matrix with exactly one nonzero per row and per column.

Introduction: Terminology and notation
Given matrix $D$, its distribution matrix is
$D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$
In other words, $D^\Sigma(i, j)$ is the sum of all $D(\hat\imath, \hat\jmath)$, where $(\hat\imath, \hat\jmath)$ is dominated by $(i, j)$.
Given matrix $E$, its density matrix is
$E^\square(\hat\imath, \hat\jmath) = E(\hat\imath^-, \hat\jmath^+) - E(\hat\imath^-, \hat\jmath^-) - E(\hat\imath^+, \hat\jmath^+) + E(\hat\imath^+, \hat\jmath^-)$
where $\hat\imath^\pm = \hat\imath \pm \frac12$; $D^\Sigma$, $E$ are over integers; $D$, $E^\square$ are over odd half-integers.
$(D^\Sigma)^\square = D$ for all $D$.
Matrix $E$ is simple, if $(E^\square)^\Sigma = E$.

Introduction: Terminology and notation
Matrix $E$ is Monge, if $E^\square$ is nonnegative.
  Intuition: border-to-border distances in a (weighted) planar graph.
Matrix $E$ is unit-Monge, if $E^\square$ is a permutation matrix.
  Intuition: border-to-border distances in a grid-like graph.

Introduction: Terminology and notation
[Figure: a permutation matrix $P$, its distribution matrix $P^\Sigma$, and $(P^\Sigma)^\square = P$]

Introduction: Implicit unit-Monge matrices
Implicit $P^\Sigma$: range tree on the nonzeros of $P$ [Bentley: 1980]
  binary search tree by i-coordinate
  under every node, binary search tree by j-coordinate

Introduction: Implicit unit-Monge matrices
Implicit $P^\Sigma$ (contd.)
Every node of the range tree represents a canonical range (rectangular region), and stores its nonzero count.
Overall, at most $n \log n$ canonical ranges are non-empty.
The range tree supports dominance counting queries: how many nonzeros are dominated by a given point?
A query is answered by decomposing the query range into at most $\log^2 n$ disjoint canonical ranges.
Total size $O(n \log n)$, query time $O(\log^2 n)$.
There are asymptotically more efficient (but less practical) data structures.
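A dominance counting query can be sketched in a few lines of Python. This is a minimal sketch, assuming a merge-sort tree as a simplified stand-in for the range tree named on the slide; it is not Bentley's structure itself. The class name `DominanceCounter` and the 0-based encoding `perm[r] = c` for a nonzero at $(r + \frac12, c + \frac12)$ are illustrative assumptions.

```python
from bisect import bisect_left


class DominanceCounter:
    """Counts nonzeros of a permutation matrix dominated by a point.

    Assumed encoding: perm[r] = c means P(r + 1/2, c + 1/2) = 1.
    """

    def __init__(self, perm):
        self.n = len(perm)
        self.size = 1
        while self.size < max(self.n, 1):
            self.size *= 2
        # tree[v] holds the sorted column indices of the rows covered by node v
        self.tree = [[] for _ in range(2 * self.size)]
        for r, c in enumerate(perm):
            self.tree[self.size + r] = [c]
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = sorted(self.tree[2 * v] + self.tree[2 * v + 1])

    def count(self, i, j):
        """P^Sigma(i, j): number of nonzeros with row >= i and column < j."""
        lo, hi = max(i, 0) + self.size, self.n + self.size  # rows [i, n)
        total = 0
        while lo < hi:
            if lo & 1:
                total += bisect_left(self.tree[lo], j)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], j)
            lo //= 2
            hi //= 2
        return total
```

Each query touches O(log n) nodes and performs one binary search per node, which matches the O(log^2 n) query time quoted above.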

Introduction: Matrix $\odot$-multiplication
Matrix $\odot$-multiplication (a.k.a. distance, (min, +) or tropical multiplication):
$A \odot B = C$, where $C(i, k) = \min_j \bigl(A(i, j) + B(j, k)\bigr)$
Matrix classes closed under $\odot$-multiplication:
  general numerical (integer, real) matrices
  Monge matrices
  simple unit-Monge matrices
Simple unit-Monge matrices of size $n$ form an aperiodic monoid (i.e. a monoid as far as possible from a group) under $\odot$-multiplication.
We call it the seaweed monoid $T_n$.
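The definition of the (min, +) product translates directly into code. A minimal Python sketch of the cubic-time product on explicit matrices follows; the function name `min_plus` and the list-of-lists representation are illustrative assumptions.

```python
def min_plus(A, B):
    """Distance (min, +) product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    inner = len(B)
    return [
        [min(A[i][j] + B[j][k] for j in range(inner)) for k in range(len(B[0]))]
        for i in range(len(A))
    ]
```

For example, `min_plus([[0, 1], [2, 0]], [[0, 5], [1, 0]])` evaluates to `[[0, 1], [1, 0]]`.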

Introduction: Matrix $\odot$-multiplication
Simple unit-Monge matrices: $\odot$-multiplication = seaweed composition
$P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$
[Figure: the seaweed diagrams of $P_A$ and $P_B$ stacked and composed into the seaweed diagram of $P_C$]

Introduction: Matrix $\odot$-multiplication
Seaweeds: similar to braids, generated by strand crossings.
Unlike in braids, all seaweed crossings are:
  level (not underpass/overpass)
  idempotent, i.e. two seaweeds can cross at most once
Seaweed composition: associative, no inverse (a crossing cannot be cancelled).
Identity: $1 \odot x = x$, no seaweeds crossing.
Zero: $0 \odot x = 0$, all seaweeds crossing.
[Matrix display: 1 and 0 each written as the distribution matrix $(\cdot)^\Sigma$ of a permutation matrix; entries not recoverable]

Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_n$:
  $n!$ elements (permutations of size $n$)
  $n - 1$ generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
Relations:
  $g_i^2 = g_i$ for all $i$ (idempotence)
  $g_i g_j = g_j g_i$ for $j - i > 1$ (far commutativity)
  $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$ (braid relations)
Computation: a confluent rewriting system can be obtained by software (Semigroupe, GAP).
Generalisation: Coxeter monoids (subgroup monoids in groups) [Tsaranov: 90]
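The presentation above can be checked on small cases. The following Python sketch assumes one standard way to realise such a monoid: generators act on permutations, and an elementary crossing is applied only if the two adjacent strands have not crossed yet. The names `cross`, `apply_word` and `elements` are illustrative, not taken from the slides.

```python
def cross(perm, i):
    """Apply elementary crossing g_i (0-based) to strands at positions i, i+1.

    The strands swap only if they are still in their original relative order,
    so a second application of the same crossing changes nothing.
    """
    if perm[i] < perm[i + 1]:
        p = list(perm)
        p[i], p[i + 1] = p[i + 1], p[i]
        return tuple(p)
    return perm


def apply_word(perm, word):
    """Apply a sequence of elementary crossings, left to right."""
    for i in word:
        perm = cross(perm, i)
    return perm


def elements(n):
    """All elements reachable from the identity (no crossings)."""
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        nxt = []
        for p in frontier:
            for i in range(n - 1):
                q = cross(p, i)
                if q not in seen:
                    seen.add(q)
                    nxt.append(q)
        frontier = nxt
    return seen
```

Under this assumption `len(elements(4))` is 24 = 4!, and the fully reversed tuple plays the role of the zero: no crossing changes it.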

Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_3$
Generators: 1, $g_1$, $g_2$. Other elements: three, the last of which is 0.
[Seaweed diagrams of the elements and of the rewriting system are not recoverable]

Introduction: Matrix $\odot$-multiplication
The seaweed monoid $T_4$
Generators: 1, $g_1$, $g_2$, $g_3$. Other elements: twenty, the last of which is 0.
[Seaweed diagrams of the elements and of the rewriting system are not recoverable]

Introduction: Matrix $\odot$-multiplication
The implicit matrix $\odot$-multiplication problem:
Given permutation matrices $P_A$, $P_B$, compute $P_C$, such that $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$.
Matrix $\odot$-multiplication: running time

  matrix type                                   time
  general                                       $O(n^3)$        standard
  Monge                                         $O(n^2)$        by [Aggarwal+: 1987]
  implicit simple unit-Monge ($P^\Sigma$)       $O(n^{1.5})$    [T: 2006]
                                                $O(n \log n)$   [T: NEW]
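For small inputs the problem statement can be checked by brute force: expand both permutation matrices into their distribution matrices, take the (min, +) product, and read the permutation back from the density matrix. The Python sketch below does exactly that in cubic time; it is not the O(n log n) algorithm. The function names and the encoding `perm[r] = c` for a nonzero at $(r + \frac12, c + \frac12)$ are illustrative assumptions.

```python
from itertools import permutations


def distribution(perm):
    """P^Sigma(i, j) = number of nonzeros with row >= i and column < j."""
    n = len(perm)
    return [
        [sum(1 for r in range(i, n) if perm[r] < j) for j in range(n + 1)]
        for i in range(n + 1)
    ]


def min_plus(A, B):
    """Distance product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    return [
        [min(A[i][j] + B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
        for i in range(len(A))
    ]


def density(E):
    """Recover the permutation whose distribution matrix is E."""
    n = len(E) - 1
    perm = []
    for r in range(n):
        cols = [
            c for c in range(n)
            if E[r][c + 1] - E[r][c] - E[r + 1][c + 1] + E[r + 1][c] == 1
        ]
        if len(cols) != 1:
            raise ValueError("not the distribution matrix of a permutation")
        perm.append(cols[0])
    return perm


def multiply(pa, pb):
    """Brute-force implicit product of two permutations of equal size."""
    return density(min_plus(distribution(pa), distribution(pb)))


def check_closure(n):
    """True if every product of two size-n permutations is a permutation."""
    perms = [list(p) for p in permutations(range(n))]
    return all(sorted(multiply(p, q)) == list(range(n)) for p in perms for q in perms)
```

With this encoding the identity permutation acts as the identity element, and `multiply([1, 0], [1, 0])` returns `[1, 0]`, illustrating idempotence of a single crossing.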

Introduction: Matrix $\odot$-multiplication
[Figure sequence: $P_A$ and $P_B$ with unknown $P_C$; then $P_A$ split into $P_{A,lo}$, $P_{A,hi}$ and $P_B$ split into $P_{B,lo}$, $P_{B,hi}$, giving $P_{C,lo} + P_{C,hi}$]

Introduction: Matrix $\odot$-multiplication
Implicit matrix $\odot$-multiplication: the algorithm
$P_C^\Sigma(i, k) = \min_j \bigl(P_A^\Sigma(i, j) + P_B^\Sigma(j, k)\bigr)$
Divide-and-conquer on the range of $j$.
Divide $P_A$ horizontally, $P_B$ vertically; two subproblems of effective size $n/2$:
  $P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$
  $P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
Conquer: most (but not all!) nonzeros of $P_{C,lo}$, $P_{C,hi}$ appear in $P_C$.
Missing nonzeros can be obtained in time $O(n)$ using the Monge property.
Overall time $O(n \log n)$.

Introduction: Matrix $\odot$-multiplication
[Figure sequence: $P_{A,lo}$, $P_{A,hi}$, $P_{B,lo}$, $P_{B,hi}$ with $P_{C,lo} + P_{C,hi}$, then with the final $P_C$]


Semi-local string comparison: Semi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size $\sigma$.
Distinguish contiguous substrings and not necessarily contiguous subsequences.
Special cases of substring: prefix, suffix.
Notation: strings $a$, $b$ of length $m$, $n$ respectively.
Assume where necessary: $m \le n$; $m$, $n$ reasonably close.
The longest common subsequence (LCS) score:
  length of the longest string that is a subsequence of both $a$ and $b$
  equivalently, alignment score, where score(match) = 1 and score(mismatch) = 0

Semi-local string comparison: Semi-local LCS and edit distance
The LCS problem: give the LCS score for $a$ vs $b$.
LCS: running time

  $O(mn)$                                   [Wagner, Fischer: 1974]
  $O(mn / \log n)$, $\sigma = O(1)$         [Masek, Paterson: 1980] [Crochemore+: 2003]
  $O(mn (\log \log n)^2 / \log n)$          [Paterson, Dancik: 1994]

Semi-local string comparison: Semi-local LCS and edit distance
LCS on the alignment graph (directed, acyclic)
[Figure: alignment graph for the example strings $a$ and $b$; blue edges have score 0, red edges have score 1]
LCS = highest-score corner-to-corner path

Semi-local string comparison: Semi-local LCS and edit distance
LCS: dynamic programming (DP) algorithm [Wagner, Fischer: 1974]
  Sweep the alignment graph, respecting node dependencies.
  Running time $O(mn)$.
LCS: micro-block DP algorithm [Masek, Paterson: 1980]
  Sweep the alignment graph in square blocks, respecting block dependencies.
  Block size: $t = O(\log n)$.
  Block interface: $O(t)$ inputs/outputs, each of size $O(\log \sigma)$.
  Use a precomputed mapping of all possible input/output combinations.
  Running time $O(mn / \log n)$ when $\sigma = O(1)$, even on log-cost RAM.

Semi-local string comparison: Semi-local LCS and edit distance
The semi-local LCS problem: give the (implicit) matrix of $O(m^2 + n^2)$ LCS scores:
  string-substring LCS: string $a$ vs every substring of $b$
  prefix-suffix LCS: every prefix of $a$ vs every suffix of $b$
  symmetrically, substring-string and suffix-prefix LCS
The three-way semi-local LCS problem: give the (implicit) matrix of $O(n^2)$ LCS scores:
  string-substring, prefix-suffix, suffix-prefix LCS
  no substring-string LCS
  Suitable for $m$ [symbol missing] $n$
Cf.: dynamic programming gives prefix-prefix LCS.

Semi-local string comparison: Semi-local LCS and edit distance
Semi-local LCS on the alignment graph
[Figure: alignment graph for the example strings, with a substring of $b$ highlighted; blue edges have score 0, red edges have score 1]
Semi-local LCS = all highest-score border-to-border paths (string-substring = top-to-bottom, etc.)

Semi-local string comparison: Semi-local LCS and edit distance
The LCS problem is a special case of the alignment score problem with weighted matches, mismatches and gaps.
  LCS score: $w_{match} = 1$, $w_{mismatch} = w_{gap} = 0$
  Levenshtein score: $w_{match} = 2$, $w_{mismatch} = 1$, $w_{gap} = 0$
An alignment score is rational, if $w_{match}$, $w_{mismatch}$, $w_{gap}$ are rational; it reduces to LCS score by a constant-factor blow-up of the alignment graph.
The semi-local alignment score problem: string-substring, prefix-suffix, substring-string, suffix-prefix alignment scores.
Edit distance: minimum cost to transform $a$ into $b$ by weighted character edits (insertion, deletion, substitution).
The semi-local edit distance problem: semi-local alignment score problem with $w_{match} = 0$, $w_{mismatch} = -w_{sub}$, $w_{gap} = -w_{indel}$.
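The weighted alignment score can be illustrated with the classical quadratic-time DP. The Python sketch below is a plain global alignment, not a semi-local algorithm; the function name `alignment_score` and the convention that `w_gap` is charged per gapped character are assumptions made for the example.

```python
def alignment_score(a, b, w_match, w_mismatch, w_gap):
    """Maximum global alignment score of a vs b, O(mn) time, O(n) space."""
    m, n = len(a), len(b)
    prev = [j * w_gap for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i * w_gap] + [0] * n
        for j in range(1, n + 1):
            diag = prev[j - 1] + (w_match if a[i - 1] == b[j - 1] else w_mismatch)
            cur[j] = max(diag, prev[j] + w_gap, cur[j - 1] + w_gap)
        prev = cur
    return prev[n]


def lcs_score(a, b):
    return alignment_score(a, b, 1, 0, 0)


def levenshtein_score(a, b):
    return alignment_score(a, b, 2, 1, 0)
```

With unit costs, an alignment with k matches, s mismatches and g gapped characters satisfies m + n = 2k + 2s + g, so the Levenshtein score equals m + n minus the usual Levenshtein distance. For "kitten" vs "sitting" the distance is 3, so the score is 6 + 7 - 3 = 10.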

Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix $A$ for the example strings $a$ and $b$ (rows $i = 0, \ldots, 13$; columns $j = 0, \ldots, 13$):

    0   1   2   3   4   5   6   6   7   8   8   8   8   8
   -1   0   1   2   3   4   5   5   6   7   7   7   7   7
   -2  -1   0   1   2   3   4   4   5   6   6   6   6   7
   -3  -2  -1   0   1   2   3   3   4   5   5   6   6   7
   -4  -3  -2  -1   0   1   2   2   3   4   4   5   5   6
   -5  -4  -3  -2  -1   0   1   2   3   4   4   5   5   6
   -6  -5  -4  -3  -2  -1   0   1   2   3   3   4   4   5
   -7  -6  -5  -4  -3  -2  -1   0   1   2   2   3   3   4
   -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   3   4
   -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4
  -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3
  -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2
  -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1
  -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0

$A(0, 13) = LCS(a, b) = 8$
$A(4, 11) = 5$: the LCS score of $a$ vs the substring of $b$ between positions 4 and 11
$A(i, j) = j - i$ if $i > j$

Semi-local string comparison: Highest-score matrices
Semi-local LCS: output representation and running time

  size                query time
  $O(n^2)$            $O(1)$           trivial
  $O(m^{1/2} n)$      $O(\log n)$      string-substring [Alves+: 2003]
  $O(n)$              $O(n)$           string-substring [Alves+: 2005]
  $O(n \log n)$       $O(\log^2 n)$    [T: 2006]

  running time
  $O(mn^2)$                            naive
  $O(mn)$                              string-substring [Schmidt: 1998]
                                       string-substring [Alves+: 2005]
  $O(mn)$                              [T: 2006]
  $O(mn / \log^{0.5} n)$               [T: 2006]
  $O(mn (\log \log n)^2 / \log n)$     [T: 2007]

Semi-local string comparison: Highest-score matrices
$A$: the semi-local LCS score matrix for $a$ vs $b$.
$A(i, j)$: the number of matched characters for $a$ vs a substring of $b$.
$Q(i, j) = j - i - A(i, j)$: the number of unmatched characters.
Properties of matrix $Q$:
  $Q$ is simple unit-Monge
  therefore, $Q = P^\Sigma$ for some permutation matrix $P$
$P = Q^\square = -A^\square$ is an implicit representation of $A$.
Range tree for $P$: memory $O(n \log n)$, query time $O(\log^2 n)$.
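The relation between $A$, $Q$ and $P$ can be observed on small inputs by brute force. The Python sketch below builds the string-substring part of $A$ over indices 0..n with a plain LCS routine, sets $A(i, j) = j - i$ for $i > j$, and takes $P = -A^\square$. On this restricted index range the nonzeros of $P$ are expected to form a sub-permutation (at most one per row and column), since some seaweeds leave through the sides. All names and the test strings are illustrative assumptions.

```python
def lcs(a, b):
    """Plain O(mn) LCS score."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score_matrix(a, b):
    """A[i][j] = LCS(a, b[i:j]) for i <= j, and j - i for i > j."""
    n = len(b)
    return [
        [lcs(a, b[i:j]) if i <= j else j - i for j in range(n + 1)]
        for i in range(n + 1)
    ]


def implicit_P(A):
    """P[r][c] stands for P(r + 1/2, c + 1/2) = -A^square at that point."""
    n = len(A) - 1
    return [
        [-(A[r][c + 1] - A[r][c] - A[r + 1][c + 1] + A[r + 1][c]) for c in range(n)]
        for r in range(n)
    ]


def P_sigma(P, i, j):
    """Number of nonzeros of P with row index >= i and column index < j."""
    return sum(P[r][c] for r in range(i, len(P)) for c in range(j))
```

Every score is then recovered as `A[i][j] == j - i - P_sigma(P, i, j)`, which is the identity $A(i, j) = j - i - P^\Sigma(i, j)$ in code.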

Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix (same matrix, colour-coded)
  blue: difference 0
  red: difference 1
  green: $P(i, j) = 1$
$A(4, 11) = 5$
$A(i, j) = j - i$ if $i > j$
$A(i, j) = j - i - P^\Sigma(i, j)$

Semi-local string comparison: Highest-score matrices
The semi-local LCS score matrix; the seaweeds in the alignment graph
[Figure: the nonzeros of $P$, then the corresponding seaweeds drawn in the alignment graph of the example strings]
$A(4, 11) = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
$P$ gives an implicit representation of $A$.
$P(i, j) = 1$ corresponds to a seaweed (top, $i$) to (bottom, $j$).
Also define top-to-right, left-to-right, left-to-bottom seaweeds.
This gives a complete border-to-border graph-theoretic matching.


The seaweed algorithm: The seaweed algorithm
[Animation frames: seaweeds traced cell by cell through the alignment graph]
Semi-local LCS: the seaweed algorithm [T: 2006]
Iterate over the alignment graph, tracing seaweeds.
Pick cells in any order, respecting dependencies.
In every cell, the two entering seaweeds:
  cross, if mismatch and they have not crossed before
  bend otherwise
Running time $O(mn)$.
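The cell rule above can be sketched directly. In the Python sketch below, strands entering from the left border are numbered bottom to top and strands entering from the top border continue the numbering left to right; with this numbering (an assumption of the sketch, as are all function names) two strands meeting in a cell have crossed before exactly when the one arriving from the left carries the larger number. Only top-to-bottom seaweeds are read off, which is enough for string-substring scores.

```python
def seaweed_bottom(a, b):
    """Return v, where v[j] is the number of the strand leaving the bottom
    border in column j. A strand numbered m + c entered at the top in column c.
    """
    m, n = len(a), len(b)
    h = [m - 1 - i for i in range(m)]  # strand currently travelling along row i
    v = [m + j for j in range(n)]      # strand currently travelling down column j
    for i in range(m):
        for j in range(n):
            # Bend on a match, or if the two strands have already crossed;
            # a bend hands the row strand to the column and vice versa.
            if a[i] == b[j] or h[i] > v[j]:
                h[i], v[j] = v[j], h[i]
            # Otherwise the strands cross and both keep their direction.
    return v


def string_substring_lcs(m, v, i, j):
    """LCS(a, b[i:j]) = (j - i) minus the number of seaweeds that enter at the
    top in a column >= i and leave at the bottom in a column < j."""
    return (j - i) - sum(1 for col in range(i, j) if v[col] >= m + i)


def lcs_dp(a, b):
    """Reference O(mn) LCS score, used only for cross-checking."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for k, y in enumerate(b, 1):
            cur.append(prev[k - 1] + 1 if x == y else max(prev[k], cur[k - 1]))
        prev = cur
    return prev[-1]
```

After one O(mn) pass, any string-substring score can be read from `v`; the linear scan in `string_substring_lcs` is the naive replacement for a range-tree query.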

The seaweed algorithm: The micro-block seaweed algorithm
[Animation frames: seaweeds traced block by block through the alignment graph]
Semi-local LCS: the micro-block seaweed algorithm [T: 2007]
Iterate over the alignment graph in blocks, tracing seaweeds.
Use a precomputed mapping of all possible block inputs to outputs.
Block size: $t = O(\log n / \log \log n)$.
Block interface: $O(t)$ values (input characters and seaweeds), each of size $O(\log n)$, but can be compressed to $O(\log \log n)$ by a recursive scheme.
Running time $O\bigl(\frac{m}{t} \cdot \frac{n}{t} \cdot t \log \log n\bigr) = O\bigl(mn (\log \log n)^2 / \log n\bigr)$, even on log-cost RAM.

The seaweed algorithm: Cyclic LCS
The cyclic LCS problem: give the maximum LCS score for $a$ vs all cyclic rotations of $b$.
Cyclic LCS: running time

  $O(mn^2 / \log n)$                   naive
  $O(mn \log m)$                       [Maes: 1990]
  $O(mn)$                              [Bunke, Bühler: 1993; Landau+: 1998; Schmidt: 1998]
  $O(mn (\log \log n)^2 / \log n)$     [T: 2007]

Cyclic LCS: the algorithm
  Run the micro-block seaweed algorithm on $a$ vs $bb$, time $O(mn (\log \log n)^2 / \log n)$.
  Make $n$ string-substring LCS queries, time negligible.
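The two steps can be sketched with the plain (cell-by-cell) seaweed pass in place of the micro-block version, and with linear-time scans in place of range-tree queries; both substitutions are simplifications made for this sketch. Every rotation of $b$ is a length-$n$ window of $bb$, so $n$ string-substring queries suffice. Function names are illustrative.

```python
def seaweed_bottom(a, b):
    """v[j] = number of the strand leaving the bottom border in column j;
    strand m + c entered at the top in column c."""
    m, n = len(a), len(b)
    h = [m - 1 - i for i in range(m)]
    v = [m + j for j in range(n)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j] or h[i] > v[j]:  # match, or already crossed: bend
                h[i], v[j] = v[j], h[i]
    return v


def cyclic_lcs(a, b):
    """Maximum LCS score of a vs all cyclic rotations of b."""
    m, n = len(a), len(b)
    if n == 0:
        return 0
    v = seaweed_bottom(a, b + b)
    best = 0
    for i in range(n):  # window b2[i : i + n] is the rotation starting at i
        unmatched = sum(1 for col in range(i, i + n) if v[col] >= m + i)
        best = max(best, n - unmatched)
    return best
```

For instance, `cyclic_lcs("ab", "ba")` is 2, because the rotation "ab" of "ba" matches `a` completely.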

The seaweed algorithm: Longest repeating subsequence
The longest repeating subsequence problem: find the longest subsequence of $a$ that is a square (a repetition of two identical strings).
Motivated by tandem repeats in a genome.
Longest repeating subsequence: running time

  $O(n^3)$                              naive
  $O(n^2)$                              [Kosowski: 2004]
  $O(n^2 (\log \log n)^2 / \log n)$     [T: 2007]

Longest repeating subsequence: the algorithm
  Run the micro-block seaweed algorithm on $a$ vs $a$, time $O(mn (\log \log n)^2 / \log n)$.
  Make $n - 1$ suffix-prefix LCS queries, time negligible.
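The naive cubic bound in the table corresponds to trying every split point: a square subsequence $ww$ places one copy of $w$ before the split and one after it, so $w$ is a common subsequence of the two parts. A minimal Python sketch of this baseline (not of the seaweed-based algorithm; names are illustrative):

```python
def lcs(a, b):
    """Plain O(mn) LCS score."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_square_subsequence_length(a):
    """Length of the longest subsequence of a of the form ww, in O(n^3) time."""
    return max((2 * lcs(a[:k], a[k:]) for k in range(1, len(a))), default=0)
```

For "abcab" the answer is 4, realised by the square "abab".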

Approximate matching
The approximate pattern matching problem: give the substring closest to $a$ by alignment score, starting at each position in $b$.
Assume rational alignment score.
Approximate pattern matching: running time
  $O(mn)$  [Sellers: 1980]
  $O\left(\frac{mn}{\log n}\right)$  $\sigma = O(1)$, via [Masek, Paterson: 1980]
  $O\left(\frac{mn (\log\log n)^2}{\log n}\right)$  via [Paterson, Dancik: 1994]
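
The $O(mn)$ baseline can be sketched with a Sellers-style dynamic programme. Two assumptions are made here that are not in the slides: the score is unit-cost edit distance (minimised) rather than a general rational alignment score, and the "starting at each position" variant is obtained by running the usual "ending at each position" recurrence on the reversed strings. Function names are illustrative.

```python
def best_match_ending_at(a: str, b: str) -> list[int]:
    """out[j] = min edit distance between a and any substring of b ending at j.

    Positions j range over 0..len(b); j = 0 allows only the empty substring.
    """
    m = len(a)
    col = list(range(m + 1))   # column j = 0: a[:i] against the empty string
    out = [col[m]]
    for y in b:
        new = [0]              # the empty pattern prefix matches for free anywhere
        for i in range(1, m + 1):
            new.append(min(
                col[i - 1] + (a[i - 1] != y),  # match or substitute a[i-1] with y
                col[i] + 1,                    # y is in the substring but unmatched
                new[i - 1] + 1,                # a[i-1] is left unmatched
            ))
        col = new
        out.append(col[m])
    return out


def best_match_starting_at(a: str, b: str) -> list[int]:
    """out[s] = min edit distance between a and any substring of b starting at s."""
    # A substring of b starting at s is a substring of reversed b ending at len(b) - s.
    return best_match_ending_at(a[::-1], b[::-1])[::-1]
```

For example, `best_match_starting_at("abc", "xxabcxx")` has value 0 at index 2, where an exact occurrence of the pattern starts.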

Approximate pattern matching: the algorithm
Run the micro-block seaweed algorithm on $a$ vs $b$ under the given alignment score, in time $O\left(\frac{mn (\log\log n)^2}{\log n}\right)$.
The implicit semi-local edit score matrix is an anti-Monge matrix; approximate pattern matching corresponds to finding its row minima.
Row minima in $O(n)$ element queries [Aggarwal+: 1987].
Each query in time $O(\log^2 n)$ using the range tree representation; combined query time negligible.
Overall running time dominated by the block seaweed algorithm, same as [Paterson, Dancik: 1994].

The periodic seaweed algorithm
The periodic string-substring LCS problem: give (implicit) LCS scores for $a$ vs each substring of $b = \ldots uuu \ldots = u^{\pm\infty}$.
Let $u$ be of length $p$; may assume that every character of $a$ occurs in $u$.
The tandem LCS problem: give the LCS score for $a$ vs $b = u^k$.
We have $n = kp$; may assume $k \le m$ (otherwise the LCS score is $m$).
Tandem LCS: running time
  $O(mkp)$  naive
  $O(m(k + p))$  [Landau, Ziv-Ukelson: 2001]
  $O(mp)$  [NEW]
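
The naive $O(mkp)$ bound corresponds to building $u^k$ explicitly and running the classical LCS dynamic programme. The sketch below does exactly that (it is not the periodic seaweed algorithm) and can be used to observe the remark that the score reaches $m$ once $k \ge m$, provided every character of $a$ occurs in $u$. The function names are illustrative.

```python
def lcs_score(a: str, b: str) -> int:
    """Classical dynamic programme: LCS score of a and b in O(|a||b|) time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def tandem_lcs_naive(a: str, u: str, k: int) -> int:
    """LCS score of a against the tandem repeat u^k, in O(m * k * p) time."""
    return lcs_score(a, u * k)
```

With `a = "cba"` and `u = "abc"`, the scores for `k = 1, 2, 3` are 1, 2 and 3: each extra copy of `u` allows one more character of `a` to be matched until the score reaches $m = 3$.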

[Figure: the periodic seaweed algorithm, step-by-step illustration]

Periodic string-substring LCS: the periodic seaweed algorithm
Iterate over the alignment graph, tracing seaweeds row by row.
In every row, start from a match cell and move rightwards, wrapping around at the graph edge.
In every cell, the two entering seaweeds:
  cross, if the cell is a mismatch and they have not crossed before
  bend otherwise
At the right edge of the graph, wrap around back to the left edge.
Running time $O(mn)$.
Querying a string-substring LCS score (including tandem LCS): count each nonzero dominated by the query point with the appropriate multiplicity, either directly or via a range tree.

The tandem alignment problem: give the substring closest to $a$ by alignment score among certain substrings of $b = u^{\pm\infty}$:
  global: substrings of the form $u^k$ across all $k$
  cyclic: substrings of length $kp$ across all $k$
  local: substrings of any length
Tandem alignment: running time
  $O(m^2 p)$  all  naive
  $O(mp)$  global  [Myers, Miller: 1989]
  $O(mp \log p)$  cyclic  [Benson: 2005]
  $O(mp)$  cyclic  [NEW]
  $O(mp)$  local  [Myers, Miller: 1989]

Cyclic tandem alignment: the algorithm
Run the periodic seaweed algorithm (under the given alignment score), time $O(np)$.
For each $k \in [1 : m]$:
  solve tandem LCS (under the given alignment score) for $a$ against $u^k$
  obtain $p$ successive string-substring alignment scores by incremental score updating, each in time $O(1)$
Running time $O(mp)$.

1 Introduction
2 Semi-local string comparison
3 The seaweed algorithm
4 Conclusions and future work

Conclusions and future work
Semi-local LCS problem:
  representation by implicit unit-Monge matrices
  generalisation to rational alignment scores
  open: real alignment scores?
The seaweed and micro-block seaweed algorithms:
  simple algorithm for semi-local LCS
  semi-local LCS in time $o(mn)$ via micro-blocks
  improvements on related problems
The periodic seaweed algorithm:
  straightforward extension of the seaweed algorithm
  periodic semi-local LCS in time $O(mp)$
  natural applications
  open: $o(mp)$ via micro-blocks?