Ukkonen's suffix tree construction algorithm

Size: px
Start display at page:

Download "Ukkonen's suffix tree construction algorithm"

Transcription

1 Ukkonen's suffix tree construction algorithm aba$ $ab aba$ $ab a ba $ 3 $ $ab a ba $ $ $ String Algorithms; Nov

2 Motivation Yet another suffix tree construction algorithm... Why?

3 Motivation Yet another suffix tree construction algorithm... Why? An online algorithm, i.e. one we can update with new sequences

4 Sketch of algorithm Recall McCreight s approach: For i = 1.. n+1, build compressed trie of {x[..n]$ i} Ukkonen s approach: For i = 1.. n+1, build compressed trie of {$ i} Compressed trie of all suffixes of prefix x[1..i]$ of x$ A suffix tree except for leaf property

5 McCreight's algorithm x=aba T 1 : {x$[..n] 1 } aba$ 1 T 2 : {x$[..n] 2 } T 3 : {x$[..n] 3 } $a b 2 2 $ab aba$ a ba $ 3 1 $ 1 T 4 : {x$[..n] 4 } 2 $ab a ba $ $ 4 $ 3 1

6 Ukkonen's algorithm x=aba T 1 : {x$[..1]} a 1 T 2 : {x$[..2]} b ab 2 1 T 3 : {x$[..3]} 2 aba ba 1 Note: no node for x[3..3] = a T 4 : {x$[..n]} 2 $ab $ 4 a ba$ $ 1 3

7 Tasks in iteration i In iteration i we must Update each to x[..i+1] Add string (special case of above) Leaf: Inner node: Edge:

8 First attempt... Obvious algorithm: For i=1,...,n+1: for =1,...,i: find append Running time O(n 3 ) Need lots of tricks to get O(n)!

9 Updating leaves for free If we label leaves with (k, ) denoting k to the current i, updating a leaf is automatic Leaf: Inner node: Edge:

10 Updating existing strings is free If x[..i+1] is already in the tree, the update is automatic Leaf: Inner node: Edge:

11 Real operations If we can recognize the free operations, we need only deal with the remaining Leaf: Inner node: Edge:

12 Lemma Let denote suffix of x[1..i] a)if >1 is a leaf node in T i, then so is -1 b)if, from <i, there is a path in T i that begins with a, then there is a path in T i from +1 beginning with a

13 Lemma Let denote suffix of x[1..i] a)if >1 is a leaf node in T i, then so is -1 b)if, Once from an <i, edge, there always is a path an in edge T i that begins with a, then there is a path in T i from +1 beginning with a

14 Lemma Let denote suffix of x[1..i] a)if Once >1 is a leaf, node always in been T i, then a leaf so is -1 b)if, from <i, there is a path in T i that begins with a, then there is a path in T i from +1 beginning with a

15 Proof of lemma a) If >1 is a leaf node in T i, then so is -1 Assume -1 is not a leaf. Then there exists k<-1 such that: x[k..i] x[-1..i] x[k..i] x[-1..i] -1 k Then -1 k k+1 thus: x[k+1..i] x[k+1..i]

16 Proof of lemma b) If, from <i, there is a path in T i that begins with a, then there is a path in T i from +1 beginning with a Assume is followed by a, then there exists k< such that: k a thus: +1 k+1 a Hence +1 is followed by a.

17 Corollary In iteration i, there exist indices L and R such that: All suffixes L are leaves All suffixes R are already in the trie I II III L R

18 Consequence of corollary I and III are free operations I II III L R

19 Updated algorithm Implicitly handling free operations: For i=1,...,n+1: for = L,..., R : find append L in iteration i is the last leaf inserted in iteration i-1 (all smaller indices are already leaves) R in iteration i is the first index where x[..i+1] is already in the trie (all larger indices are already in the trie)

20 Suffixes in II are made into leaves Whenever L < < R, is made a leaf:

21 Suffixes in II are made into leaves Whenever L < < R, is made a leaf: Once is a leaf, it will be in I and never in II again

22 Time to go from II to I We handle in II or implicitly in III time 2n: 2 2 Index in II n+1 Path length is 2n Iterations n+1

23 Updated running time Only 2n of these: For i=1,...,n+1: for = L,..., R : find append Constant time Running time is 2n*T(find )

24 Updated running time Only 2n of these: For i=1,...,n+1: for = L,..., R : find append Constant time? Constant time Running time is 2n*T(find ) We ust have to deal with T(find ) in O(1) No worries!

25 Using fastscan and s(-) When searching for, it is already in the trie We can use fastscan for the search T(find ) in O(d) where d is the (node-)depth of If we keep suffix links, s(-), in the tree we can use these as shortcuts

26 Suffix links Invariant: All inner nodes have suffix links

27 Suffix links Invariant: All inner nodes have suffix links Ensuring the invariant: We only insert inner nodes when adding leaves Whenever we insert a new node, for some <i, we also find or insert x[+1..i], and can update s() := x[+1..i] If we insert x[i..i], then s(x[i..i]) :=

28 Finding x[+1..i] from parent() s s(parent()) x[+1..i] Starting from here (initial is L and we can keep a pointer to that node between iterations) Using fastscan here

29 Bound on fastscan Time usage by fastscan is bounded by n for the maximal (node-)depth in the trie plus total decrease of (node-)depth Decrease in depth: Moving to parent(): 1 Moving to s(parent()): max 1 Restarting at L :?

30 Bound on fastscan Time usage by fastscan is bounded by n for the maximal (node-)depth in the trie plus total decrease of (node-)depth Decrease in depth: Moving to parent(): 1 Moving to s(parent()): max 1 Restarting at L :? parent() s s(parent()) x[+1..i] Using fastscan here

31 Hacking the suffix link When searching for x[ L +1..i], update s(x[ L..i]) to point to the nearest ancestor of x[ L +1..i] parent( L ) s s(parent( L )) L x[ L +1..i] Nearest ancestor of x[ L +1..i]

32 Hacking the suffix link When searching for x[ L +1..i], update s(x[ L..i]) to point to the nearest ancestor of x[ L +1..i] parent( L ) s s(parent( L )) L x[ L +1..i] Restart in constant time Nearest ancestor of x[ L +1..i]

33 Iterations Searching time Vertical steps are paid for by the previous horizontal step (free restarting) Horizontal steps are total fastscan bounded by O(n) Runtime O(n) Index in II 2 2 n+1 n+1

34 Summary We have seen a new suffix tree construction algorithm Ukkonen s algorithm is an online algorithm: As long as no suffix is a prefix of another, the intermediate trees are suffix trees Generalized suffix trees can be built one string at a time

McCreight's suffix tree construction algorithm

McCreight's suffix tree construction algorithm McCreight's suffix tree construction algorithm b 2 $baa $ 5 $ $ba 6 3 a b 4 $ aab$ 1 Motivation Recall: the suffix tree is an extremely useful data structure with space usage and construction time in O(n).

More information

SUFFIX TREE. SYNONYMS Compact suffix trie

SUFFIX TREE. SYNONYMS Compact suffix trie SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix

More information

Enhancing Active Automata Learning by a User Log Based Metric

Enhancing Active Automata Learning by a User Log Based Metric Master Thesis Computing Science Radboud University Enhancing Active Automata Learning by a User Log Based Metric Author Petra van den Bos First Supervisor prof. dr. Frits W. Vaandrager Second Supervisor

More information

Space-Efficient Construction Algorithm for Circular Suffix Tree

Space-Efficient Construction Algorithm for Circular Suffix Tree Space-Efficient Construction Algorithm for Circular Suffix Tree Wing-Kai Hon, Tsung-Han Ku, Rahul Shah and Sharma Thankachan CPM2013 1 Outline Preliminaries and Motivation Circular Suffix Tree Our Indexes

More information

Fast String Kernels. Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200

Fast String Kernels. Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200 Fast String Kernels Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200 Alex.Smola@anu.edu.au joint work with S.V.N. Vishwanathan Slides (soon) available

More information

Converting SLP to LZ78 in almost Linear Time

Converting SLP to LZ78 in almost Linear Time CPM 2013 Converting SLP to LZ78 in almost Linear Time Hideo Bannai 1, Paweł Gawrychowski 2, Shunsuke Inenaga 1, Masayuki Takeda 1 1. Kyushu University 2. Max-Planck-Institut für Informatik Recompress SLP

More information

Analysis of tree edit distance algorithms

Analysis of tree edit distance algorithms Analysis of tree edit distance algorithms Serge Dulucq 1 and Hélène Touzet 1 LaBRI - Universit Bordeaux I 33 0 Talence cedex, France Serge.Dulucq@labri.fr LIFL - Universit Lille 1 9 6 Villeneuve d Ascq

More information

CS4311 Design and Analysis of Algorithms. Analysis of Union-By-Rank with Path Compression

CS4311 Design and Analysis of Algorithms. Analysis of Union-By-Rank with Path Compression CS4311 Design and Analysis of Algorithms Tutorial: Analysis of Union-By-Rank with Path Compression 1 About this lecture Introduce an Ackermann-like function which grows even faster than Ackermann Use it

More information

String Indexing for Patterns with Wildcards

String Indexing for Patterns with Wildcards MASTER S THESIS String Indexing for Patterns with Wildcards Hjalte Wedel Vildhøj and Søren Vind Technical University of Denmark August 8, 2011 Abstract We consider the problem of indexing a string t of

More information

Information Theory with Applications, Math6397 Lecture Notes from September 30, 2014 taken by Ilknur Telkes

Information Theory with Applications, Math6397 Lecture Notes from September 30, 2014 taken by Ilknur Telkes Information Theory with Applications, Math6397 Lecture Notes from September 3, 24 taken by Ilknur Telkes Last Time Kraft inequality (sep.or) prefix code Shannon Fano code Bound for average code-word length

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

Multiple Choice Tries and Distributed Hash Tables

Multiple Choice Tries and Distributed Hash Tables Multiple Choice Tries and Distributed Hash Tables Luc Devroye and Gabor Lugosi and Gahyun Park and W. Szpankowski January 3, 2007 McGill University, Montreal, Canada U. Pompeu Fabra, Barcelona, Spain U.

More information

Integer Sorting on the word-ram

Integer Sorting on the word-ram Integer Sorting on the word-rm Uri Zwick Tel viv University May 2015 Last updated: June 30, 2015 Integer sorting Memory is composed of w-bit words. rithmetical, logical and shift operations on w-bit words

More information

Online Sorted Range Reporting and Approximating the Mode

Online Sorted Range Reporting and Approximating the Mode Online Sorted Range Reporting and Approximating the Mode Mark Greve Progress Report Department of Computer Science Aarhus University Denmark January 4, 2010 Supervisor: Gerth Stølting Brodal Online Sorted

More information

Computing Connected Components Given a graph G = (V; E) compute the connected components of G. Connected-Components(G) 1 for each vertex v 2 V [G] 2 d

Computing Connected Components Given a graph G = (V; E) compute the connected components of G. Connected-Components(G) 1 for each vertex v 2 V [G] 2 d Data Structures for Disjoint Sets Maintain a Dynamic collection of disjoint sets. Each set has a unique representative (an arbitrary member of the set). x. Make-Set(x) - Create a new set with one member

More information

Information Theory and Statistics Lecture 2: Source coding

Information Theory and Statistics Lecture 2: Source coding Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection

More information

Outline. Complexity Theory. Example. Sketch of a log-space TM for palindromes. Log-space computations. Example VU , SS 2018

Outline. Complexity Theory. Example. Sketch of a log-space TM for palindromes. Log-space computations. Example VU , SS 2018 Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 3. Logarithmic Space Reinhard Pichler Institute of Logic and Computation DBAI Group TU Wien 3. Logarithmic Space 3.1 Computational

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

Disjoint-Set Forests

Disjoint-Set Forests Disjoint-Set Forests Thanks for Showing Up! Outline for Today Incremental Connectivity Maintaining connectivity as edges are added to a graph. Disjoint-Set Forests A simple data structure for incremental

More information

CS 161: Design and Analysis of Algorithms

CS 161: Design and Analysis of Algorithms CS 161: Design and Analysis of Algorithms Greedy Algorithms 3: Minimum Spanning Trees/Scheduling Disjoint Sets, continued Analysis of Kruskal s Algorithm Interval Scheduling Disjoint Sets, Continued Each

More information

Asymptotic redundancy and prolixity

Asymptotic redundancy and prolixity Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends

More information

Compressed Index for Dynamic Text

Compressed Index for Dynamic Text Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution

More information

COS597D: Information Theory in Computer Science October 19, Lecture 10

COS597D: Information Theory in Computer Science October 19, Lecture 10 COS597D: Information Theory in Computer Science October 9, 20 Lecture 0 Lecturer: Mark Braverman Scribe: Andrej Risteski Kolmogorov Complexity In the previous lectures, we became acquainted with the concept

More information

CS281A/Stat241A Lecture 19

CS281A/Stat241A Lecture 19 CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723

More information

Cardinality Networks: a Theoretical and Empirical Study

Cardinality Networks: a Theoretical and Empirical Study Constraints manuscript No. (will be inserted by the editor) Cardinality Networks: a Theoretical and Empirical Study Roberto Asín, Robert Nieuwenhuis, Albert Oliveras, Enric Rodríguez-Carbonell Received:

More information

Monday, April 29, 13. Tandem repeats

Monday, April 29, 13. Tandem repeats Tandem repeats Tandem repeats A string w is a tandem repeat (or square) if: A tandem repeat αα is primitive if and only if α is primitive A string w = α k for k = 3,4,... is (by some) called a tandem array

More information

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Tao Jiang, Ming Li, Brendan Lucier September 26, 2005 Abstract In this paper we study the Kolmogorov Complexity of a

More information

arxiv: v2 [cs.ds] 8 Apr 2016

arxiv: v2 [cs.ds] 8 Apr 2016 Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl

More information

25. Minimum Spanning Trees

25. Minimum Spanning Trees 695 25. Minimum Spanning Trees Motivation, Greedy, Algorithm Kruskal, General Rules, ADT Union-Find, Algorithm Jarnik, Prim, Dijkstra, Fibonacci Heaps [Ottman/Widmayer, Kap. 9.6, 6.2, 6.1, Cormen et al,

More information

25. Minimum Spanning Trees

25. Minimum Spanning Trees Problem Given: Undirected, weighted, connected graph G = (V, E, c). 5. Minimum Spanning Trees Motivation, Greedy, Algorithm Kruskal, General Rules, ADT Union-Find, Algorithm Jarnik, Prim, Dijkstra, Fibonacci

More information

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to (To appear in ALGORITHMICA) On{line construction of sux trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN{00014 University of Helsinki,

More information

Dynamic Ordered Sets with Exponential Search Trees

Dynamic Ordered Sets with Exponential Search Trees Dynamic Ordered Sets with Exponential Search Trees Arne Andersson Computing Science Department Information Technology, Uppsala University Box 311, SE - 751 05 Uppsala, Sweden arnea@csd.uu.se http://www.csd.uu.se/

More information

CMPT 710/407 - Complexity Theory Lecture 4: Complexity Classes, Completeness, Linear Speedup, and Hierarchy Theorems

CMPT 710/407 - Complexity Theory Lecture 4: Complexity Classes, Completeness, Linear Speedup, and Hierarchy Theorems CMPT 710/407 - Complexity Theory Lecture 4: Complexity Classes, Completeness, Linear Speedup, and Hierarchy Theorems Valentine Kabanets September 13, 2007 1 Complexity Classes Unless explicitly stated,

More information

Online Learning Summer School Copenhagen 2015 Lecture 1

Online Learning Summer School Copenhagen 2015 Lecture 1 Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture

More information

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties Outline Approximation: Theory and Algorithms Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 3 March 13, 2009 2 3 Nikolaus Augsten (DIS) Approximation: Theory and

More information

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VF Files On-line MatBio 18 Solon P. Pissis and Ahmad Retha King s ollege London 02-Aug-2018 Solon P. Pissis and Ahmad Retha

More information

Fault-Tolerant Consensus

Fault-Tolerant Consensus Fault-Tolerant Consensus CS556 - Panagiota Fatourou 1 Assumptions Consensus Denote by f the maximum number of processes that may fail. We call the system f-resilient Description of the Problem Each process

More information

Chapter 3 Deterministic planning

Chapter 3 Deterministic planning Chapter 3 Deterministic planning In this chapter we describe a number of algorithms for solving the historically most important and most basic type of planning problem. Two rather strong simplifying assumptions

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Philip Bille 1, Rolf Fagerberg 2, and Inge Li Gørtz 3 1 IT University of Copenhagen. Rued Langgaards

More information

9 Union Find. 9 Union Find. List Implementation. 9 Union Find. Union Find Data Structure P: Maintains a partition of disjoint sets over elements.

9 Union Find. 9 Union Find. List Implementation. 9 Union Find. Union Find Data Structure P: Maintains a partition of disjoint sets over elements. Union Find Data Structure P: Maintains a partition of disjoint sets over elements. P. makeset(x): Given an element x, adds x to the data-structure and creates a singleton set that contains only this element.

More information

Learning Regular Sets

Learning Regular Sets Learning Regular Sets Author: Dana Angluin Presented by: M. Andreína Francisco Department of Computer Science Uppsala University February 3, 2014 Minimally Adequate Teachers A Minimally Adequate Teacher

More information

2. A tableau algorithm for ALC with TBoxes, number restrictions, and inverse roles. Extend ALC-tableau algorithm from first session with

2. A tableau algorithm for ALC with TBoxes, number restrictions, and inverse roles. Extend ALC-tableau algorithm from first session with 2. A tableau algorithm for ALC with TBoxes, number restrictions, and inverse roles Extend ALC-tableau algorithm from first session with 1 general TBoxes 2 inverse roles 3 number restrictions Goal: Design

More information

k-distinct In- and Out-Branchings in Digraphs

k-distinct In- and Out-Branchings in Digraphs k-distinct In- and Out-Branchings in Digraphs Gregory Gutin 1, Felix Reidl 2, and Magnus Wahlström 1 arxiv:1612.03607v2 [cs.ds] 21 Apr 2017 1 Royal Holloway, University of London, UK 2 North Carolina State

More information

Lecture 18 April 26, 2012

Lecture 18 April 26, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and

More information

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal

More information

Foundations of

Foundations of 91.304 Foundations of (Theoretical) Computer Science Chapter 3 Lecture Notes (Section 3.2: Variants of Turing Machines) David Martin dm@cs.uml.edu With some modifications by Prof. Karen Daniels, Fall 2012

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

Small-Space Dictionary Matching (Dissertation Proposal)

Small-Space Dictionary Matching (Dissertation Proposal) Small-Space Dictionary Matching (Dissertation Proposal) Graduate Center of CUNY 1/24/2012 Problem Definition Dictionary Matching Input: Dictionary D = P 1,P 2,...,P d containing d patterns. Text T of length

More information

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application Examples of Regular Expressions 1. 0 10, L(0 10 ) = {w w contains exactly a single 1} 2. Σ 1Σ, L(Σ 1Σ ) = {w w contains at least one 1} 3. Σ 001Σ, L(Σ 001Σ ) = {w w contains the string 001 as a substring}

More information

CS1800: Mathematical Induction. Professor Kevin Gold

CS1800: Mathematical Induction. Professor Kevin Gold CS1800: Mathematical Induction Professor Kevin Gold Induction: Used to Prove Patterns Just Keep Going For an algorithm, we may want to prove that it just keeps working, no matter how big the input size

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits better: 2.52 bits/char 74%*2 +26%*4:

More information

Exact and Approximate Equilibria for Optimal Group Network Formation

Exact and Approximate Equilibria for Optimal Group Network Formation Exact and Approximate Equilibria for Optimal Group Network Formation Elliot Anshelevich and Bugra Caskurlu Computer Science Department, RPI, 110 8th Street, Troy, NY 12180 {eanshel,caskub}@cs.rpi.edu Abstract.

More information

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Lecture 16 Agenda for the lecture Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Variable-length source codes with error 16.1 Error-free coding schemes 16.1.1 The Shannon-Fano-Elias

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

LABELED CAUSETS IN DISCRETE QUANTUM GRAVITY

LABELED CAUSETS IN DISCRETE QUANTUM GRAVITY LABELED CAUSETS IN DISCRETE QUANTUM GRAVITY S. Gudder Department of Mathematics University of Denver Denver, Colorado 80208, U.S.A. sgudder@du.edu Abstract We point out that labeled causets have a much

More information

On-Line Construction of Suffix Trees I. E. Ukkonen z

On-Line Construction of Suffix Trees I. E. Ukkonen z .. t Algorithmica (1995) 14:249-260 Algorithmica 9 1995 Springer-Verlag New York Inc. On-Line Construction of Suffix Trees I E. Ukkonen z Abstract. An on-line algorithm is presented for constructing the

More information

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression Sergio De Agostino Sapienza University di Rome Parallel Systems A parallel random access machine (PRAM)

More information

10-704: Information Processing and Learning Fall Lecture 10: Oct 3

10-704: Information Processing and Learning Fall Lecture 10: Oct 3 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Theory of Computation

Theory of Computation Theory of Computation (Feodor F. Dragan) Department of Computer Science Kent State University Spring, 2018 Theory of Computation, Feodor F. Dragan, Kent State University 1 Before we go into details, what

More information

The null-pointers in a binary search tree are replaced by pointers to special null-vertices, that do not carry any object-data

The null-pointers in a binary search tree are replaced by pointers to special null-vertices, that do not carry any object-data Definition 1 A red black tree is a balanced binary search tree in which each internal node has two children. Each internal node has a color, such that 1. The root is black. 2. All leaf nodes are black.

More information

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Nathan Lindzey, Ross M. McConnell Colorado State University, Fort Collins CO 80521, USA Abstract. Tucker characterized

More information

arxiv: v2 [cs.ds] 16 Mar 2015

arxiv: v2 [cs.ds] 16 Mar 2015 Longest common substrings with k mismatches Tomas Flouri 1, Emanuele Giaquinta 2, Kassian Kobert 1, and Esko Ukkonen 3 arxiv:1409.1694v2 [cs.ds] 16 Mar 2015 1 Heidelberg Institute for Theoretical Studies,

More information

Chapter 8: Recursion. March 10, 2008

Chapter 8: Recursion. March 10, 2008 Chapter 8: Recursion March 10, 2008 Outline 1 8.1 Recursively Defined Sequences 2 8.2 Solving Recurrence Relations by Iteration 3 8.4 General Recursive Definitions Recursively Defined Sequences As mentioned

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

Temporal logics and explicit-state model checking. Pierre Wolper Université de Liège

Temporal logics and explicit-state model checking. Pierre Wolper Université de Liège Temporal logics and explicit-state model checking Pierre Wolper Université de Liège 1 Topics to be covered Introducing explicit-state model checking Finite automata on infinite words Temporal Logics and

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

Lecture notes for Advanced Graph Algorithms : Verification of Minimum Spanning Trees

Lecture notes for Advanced Graph Algorithms : Verification of Minimum Spanning Trees Lecture notes for Advanced Graph Algorithms : Verification of Minimum Spanning Trees Lecturer: Uri Zwick November 18, 2009 Abstract We present a deterministic linear time algorithm for the Tree Path Maxima

More information

An O(N) Semi-Predictive Universal Encoder via the BWT

An O(N) Semi-Predictive Universal Encoder via the BWT An O(N) Semi-Predictive Universal Encoder via the BWT Dror Baron and Yoram Bresler Abstract We provide an O(N) algorithm for a non-sequential semi-predictive encoder whose pointwise redundancy with respect

More information

Quadratic Expressions and Equations

Quadratic Expressions and Equations Unit 5 Quadratic Expressions and Equations 1/9/2017 2/8/2017 Name: By the end of this unit, you will be able to Add, subtract, and multiply polynomials Solve equations involving the products of monomials

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion : An Efficient Matching-Based Method for Error-Tolerant Autocompletion Dong Deng Guoliang Li He Wen H. V. Jagadish Jianhua Feng Department of Computer Science, Tsinghua National Laboratory for Information

More information

Jumping Sequences. Steve Butler 1. (joint work with Ron Graham and Nan Zang) University of California Los Angelese

Jumping Sequences. Steve Butler 1. (joint work with Ron Graham and Nan Zang) University of California Los Angelese Jumping Sequences Steve Butler 1 (joint work with Ron Graham and Nan Zang) 1 Department of Mathematics University of California Los Angelese www.math.ucla.edu/~butler UCSD Combinatorics Seminar 14 October

More information

Self-Stabilizing Silent Disjunction in an Anonymous Network

Self-Stabilizing Silent Disjunction in an Anonymous Network Self-Stabilizing Silent Disjunction in an Anonymous Network Ajoy K. Datta 1, Stéphane Devismes 2, and Lawrence L. Larmore 1 1 Department of Computer Science, University of Nevada Las Vegas, USA 2 VERIMAG

More information

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Binary Search Introduction Problem Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Strategy 1: Random Search Randomly select a page until the page containing

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 410 (2009) 2759 2766 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Note Computing the longest topological

More information

Algorithms Design & Analysis. String matching

Algorithms Design & Analysis. String matching Algorithms Design & Analysis String matching Greedy algorithm Recap 2 Today s topics KM algorithm Suffix tree Approximate string matching 3 String Matching roblem Given a text string T of length n and

More information

Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures

Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures Premaster Course Algorithms 1 Chapter 3: Elementary Data Structures Christian Scheideler SS 2018 23.04.2018 Chapter 3 1 Overview Basic data structures Search structures (successor searching) Dictionaries

More information

Hierarchical Overlap Graph

Hierarchical Overlap Graph Hierarchical Overlap Graph B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arxiv:1802.04632 2018 B. Cazaux & E. Rivals 1 / 29 Overlap Graph for a set of words Consider the set P := {abaa,

More information

New Bounds and Extended Relations Between Prefix Arrays, Border Arrays, Undirected Graphs, and Indeterminate Strings

New Bounds and Extended Relations Between Prefix Arrays, Border Arrays, Undirected Graphs, and Indeterminate Strings New Bounds and Extended Relations Between Prefix Arrays, Border Arrays, Undirected Graphs, and Indeterminate Strings F. Blanchet-Sadri Michelle Bodnar Benjamin De Winkle 3 November 6, 4 Abstract We extend

More information

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd 4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd Data Compression Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we

More information

Notes on Learning Probabilistic Automata

Notes on Learning Probabilistic Automata Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1999 Notes on Learning Probabilistic Automata Alberto Apostolico Report Number: 99-028 Apostolico,

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

21. Dynamic Programming III. FPTAS [Ottman/Widmayer, Kap. 7.2, 7.3, Cormen et al, Kap. 15,35.5]

21. Dynamic Programming III. FPTAS [Ottman/Widmayer, Kap. 7.2, 7.3, Cormen et al, Kap. 15,35.5] 575 21. Dynamic Programming III FPTAS [Ottman/Widmayer, Kap. 7.2, 7.3, Cormen et al, Kap. 15,35.5] Approximation 576 Let ε (0, 1) given. Let I opt an optimal selection. No try to find a valid selection

More information

A practical introduction to active automata learning

A practical introduction to active automata learning A practical introduction to active automata learning Bernhard Steffen, Falk Howar, Maik Merten TU Dortmund SFM2011 Maik Merten, learning technology 1 Overview Motivation Introduction to active automata

More information

Breadth-First Search of Graphs

Breadth-First Search of Graphs Breadth-First Search of Graphs Analysis of Algorithms Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Applications of Breadth-First Search of Graphs a) Single Source

More information

Inference and Representation

Inference and Representation Inference and Representation David Sontag New York University Lecture 5, Sept. 30, 2014 David Sontag (NYU) Inference and Representation Lecture 5, Sept. 30, 2014 1 / 16 Today s lecture 1 Running-time of

More information

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Notes for Comp 497 (Comp 454) Week 10 4/5/05 Notes for Comp 497 (Comp 454) Week 10 4/5/05 Today look at the last two chapters in Part II. Cohen presents some results concerning context-free languages (CFL) and regular languages (RL) also some decidability

More information

ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION

ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION PAOLO FERRAGINA, IGOR NITTO, AND ROSSANO VENTURINI Abstract. One of the most famous and investigated lossless data-compression schemes is the one introduced

More information

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Krishnendu Chatterjee Rasmus Ibsen-Jensen Andreas Pavlogiannis IST Austria Abstract. We consider graphs with n nodes together

More information

A Dichotomy for Regular Expression Membership Testing

A Dichotomy for Regular Expression Membership Testing 58th Annual IEEE Symposium on Foundations of Computer Science A Dichotomy for Regular Expression Membership Testing Karl Bringmann Max Planck Institute for Informatics Saarbrücken Informatics Campus Saarbrücken,

More information

Lecture 4 : Adaptive source coding algorithms

Lecture 4 : Adaptive source coding algorithms Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv

More information

HENNING FERNAU Fachbereich IV, Abteilung Informatik, Universität Trier, D Trier, Germany

HENNING FERNAU Fachbereich IV, Abteilung Informatik, Universität Trier, D Trier, Germany International Journal of Foundations of Computer Science c World Scientific Publishing Company PROGRAMMMED GRAMMARS WITH RULE QUEUES HENNING FERNAU Fachbereich IV, Abteilung Informatik, Universität Trier,

More information

The Impossibility of Certain Types of Carmichael Numbers

The Impossibility of Certain Types of Carmichael Numbers The Impossibility of Certain Types of Carmichael Numbers Thomas Wright Abstract This paper proves that if a Carmichael number is composed of primes p i, then the LCM of the p i 1 s can never be of the

More information

A Tight Lower Bound for Determinization of Transition Labeled Büchi Automata

A Tight Lower Bound for Determinization of Transition Labeled Büchi Automata A Tight Lower Bound for Determinization of Transition Labeled Büchi Automata Thomas Colcombet, Konrad Zdanowski CNRS JAF28, Fontainebleau June 18, 2009 Finite Automata A finite automaton is a tuple A =

More information

On weakly intersecting pairs of sets

On weakly intersecting pairs of sets Zoltán Király 1 Zoltán L. Nagy 1 Dömötör Pálvölgyi 1,2 Mirkó Visontai 3 1 Eötvös University Budapest 2 Ecole Polytechnique Fédérale de Lausanne 3 University of Pennsylvania July 7, 2010/LPCA Outline Intersecting

More information

Skriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT)

Skriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT) Skriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT) Disclaimer Students attending my lectures are often astonished that I present the material in a much livelier form than in this script.

More information

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Priority Queue / Heap Stores (key,data) pairs (like dictionary) But, different set of operations: Initialize-Heap: creates new empty heap

More information

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1

More information

INF 4130 / /8-2014

INF 4130 / /8-2014 INF 4130 / 9135 26/8-2014 Mandatory assignments («Oblig-1», «-2», and «-3»): All three must be approved Deadlines around: 25. sept, 25. oct, and 15. nov Other courses on similar themes: INF-MAT 3370 INF-MAT

More information