McCreight's suffix tree construction algorithm
|
|
- Bridget Daniels
- 6 years ago
- Views:
Transcription
1 McCreight's suffix tree construction algorithm b 2 $baa $ 5 $ $ba 6 3 a b 4 $ aab$ 1
2 Motivation Recall: the suffix tree is an extremely useful data structure with space usage and construction time in O(n). Today we see algorithms for constructing a suffix tree in time O(n).
3 Suffix trees A suffix tree of a sequence, x, is a compressed trie of all suffixes of the sequence x$. x=abaab 1: abaab$ 2: baab$ 3: aab$ 4: ab$ 5: b$ 6: $ 2 $baa $ 5 b $ $ba 6 3 a b 4 $ aab$ 1
4 McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, T i, where... is a trie of sequences x$[1..n+1], x$[2..n+1],..., x$[i..n+1] b a b aab$ 1 2$baa a b aab$ 1 2 $baa b $ba a b 3 aab$ 1 T 1 T 2 T 3
5 McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, T i, where... is a trie of sequences x$[1..n+1], x$[2..n+1],..., x$[i..n+1] For i=n+1, T i is the suffix tree for x. b 2 $baa $ 5 $ $ba 6 3 a b 4 $ aab$ 1
6 McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, T i, where... is a trie of sequences x$[1..n+1], x$[2..n+1],..., x$[i..n+1] For i=n+1, T i is the suffix tree for x. The essential trick is being clever in how we insert x[i..n]$ into T i so we don't spend O(n 2 ) all in all.
7 Terminology A common prefix of x and y is a string, p, that is a prefix of both: x: y: p: The longest common prefix, p=lcp(x,y), is a prefix such that: x[ p +1] y[ p +1] x: y: p: a b a b
8 LCP and LCA x[i..n]: x[j..n]: LCP: For suffixes of x, x[i..n], x[j..n], their longest common prefix is their lowest common ancestor in the suffix tree: a b a b i a b LCP(x[i..n],x[j..n]) j LCA(i,j)
9 Head and tail Let head(i) denote the longest LCP of x[i..n]$ and x[j..n]$ for all j < i Let tail(i) be the string such that x[i..n]$=head(i)tail(i) Iteration i in McCreight s algorithm consist of finding (or inserting) the node for head(i), and appending tail(i) head(i) tail(i) i
10 The Trick The trick in McCreight's algorithm is a clever way of finding head(i)
11 Lemma Let head(i) = x[i..i+h]. Then x[i+1..i+h] is a prefix of head(i+1) i i+h head(i) tail(i) head(i+1) tail(i+1) i+1 0
12 Proof Trivial for h=0 (head(i) empty), so assume h>0: Let head(i) = ay: a By def. exists j<i such that LCP(i,j)=ay Thus suffix j+1 and i+1 share prefix y Thus y is a prefix of LCP(i+1,j+1) Thus y is a prefix of head(i+1) i+1 x[i..n]: x[j..n]: a a j+1
13 Suffix link Define s(u) = if u=, and v if u=av As a pointer from x[i..k] to x[i+1..k]: x[i..k] x[i+1..k] s i i+1 (ex 5.2.3: if u is a node, so is s(u))
14 Corollary of Lemma s(head(i)) is a prefix of head(i+1) Thus: s(head(i)) is an ancestor of head(i+1) head(i) s s(head(i)) i head(i+1) i+1 s(head(i)) can be used as a shortcut!
15 Slowscan and fastscan Slowscan: if we do not know if string y is in T i, we must search character by character Fastscan: if we do know that y is in x, we can jump directly from node to node At node u at (path-)depth d, follow the edge with label starting with y[d] Continue until we reach the end of y y[1] On a node (if y is in T i ) Or on an edge (if y is a prefix of a string in T i ) y[d 2 ] y[d 1 ] y[d 3 ]
16 Sketch of McCreight's algorithm Begin with the tree T 1 : For i=1,...,n, build tree T i+1 satisfying: T i+1 is a compressed trie for x[j..n]$, j i+1 All non-terminal nodes (with the possible exception of head(i)) have a suffix link s(-) Each iteration must: Add node i+1 Potentially add head(i+1) Add tail(i+1) Add suffix link head(i) s(head(i)) 1
17 At the beginning of iteration i... parent(head(i)) u v s s(u) s(parent(head(i))) s(head(i))=w head(i) Let head(i)=uv and parent(head(i))=u and w=s(u)v=s(head(i)) By the invariant, s(parent(head(i))) and the suffix link exists; by the lemma, w is an ancestor of head(i+1)
18 Steps in iteration i... parent(head(i)) u v s s(u) s(parent(head(i))) s(head(i))=w head(i) head(i+1) Move quickly to w, then search for head(i+1) starting there
19 Observe: w is in T i head(i) is a prefix of x[j..n] for some j < i Thus w is a prefix of x[j+1..n] for some j < i i.e w is a prefix of some suffix j i i.e. w is in T i Consequently: we can search for w from s(u) using fastscan! u s(u) v s s(head(i))=w
20 If w is a node Update s(head(i)) := w Then search for head(i+1) using slowscan
21 If w is on an edge If w is not a node, then all suffix j<i with prefix w agree on the next letter By definition of head(i) there is j<i such that suffix x[i..n] and x[j..n] differs after head(i) x[i+1..n] must also disagree at that character Thus head(i+1) must be w Add node w, update head(i):=w and set head(i+1)=w
22 McCreight's algorithm Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1
23 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 abaab$ 1
24 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=1 head(1)= tail(1)=abaab$ abaab$ 1
25 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=1 2 head(2)= tail(2)=baab$ $baab abaab$ 1
26 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=2 2 head(2)= tail(2)=baab$ $baab 3 head(3)=a tail(3)=ab$ $ba a baab$ 1
27 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 2 head(3)=a tail(3)=ab$ $baab 3 $ba a baab$ 1
28 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 2 head(3)=a tail(3)=ab$ $baab 3 $ba a head(i) u v = a baab$ 1
29 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 v[2.. v ]= i=3 w 2 head(3)=a tail(3)=ab$ $baab 3 $ba a head(i) u v = a baab$ 1
30 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 w 2 head(3)=a tail(3)=ab$ $baab 3 $ba a b aab$ head(i) head(i+1) 1
31 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 w 2 head(3)=a tail(3)=ab$ $baab 3 $ba a b aab$ head(i) head(i+1) 1
32 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 w 2 head(3)=a tail(3)=ab$ $baab 3 $ba a b aab$ head(i) head(i+1) 1
33 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=3 head(3)=a tail(3)=ab$ $baab $ba a 2 3 $ head(i+1) 4 b aab$ 1
34 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=4 2 head(4)=ab tail(4)=$ $baab 3 head(i) a a $ba 4 b aab$ $ u v=b 1
35 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=4 2 w head(4)=ab tail(4)=$ s(u) $baa b 3 head(i) a a $ba 4 b aab$ $ u v=b 1
36 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=4 2 w head(i+1) head(4)=ab tail(4)=$ $baa b 3 head(i) a a $ba 4 b aab$ $ 1
37 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=4 2 w head(i+1) head(4)=ab tail(4)=$ $baa b 3 head(i) a a $ba 4 b aab$ $ 1
38 Example: x = abaab Construct tree for x[1..n] for i = 1 to n do if head(i)= then ((( slowscan(,s(tail(i head(i+1) = add i+1 and head(i+1) as node if necessary continue (( label(u,head(i u = parent(head(i)) ; v = ( fastscan(s(u),v if u then w = ([ fastscan(,v[2.. v else w = if w is an edge then add a node for w head(i+1) = w else if w is a node then (( slowscan(w,tail(i head(i+1) = add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=4 head(i+1) 2 head(4)=ab tail(4)=$ $baa b 5 3 head(i) a a $ba 4 b aab$ $ 1
39 Done Nothing new for i=5, inserting $... 2 $baa 5 b $ $ $ba 6 3 a b aab$ $ 4 1
40 Correctness Correctness follows from the invariant: At iteration i we have a trie of all suffixes j < i. After the final iteration we have a trie of all suffixes of x$, i.e. we have the suffix tree of x.
41 Time and space usage Everything but searching is constant time per suffix, so the running time is O(n + slowscan + fastscan ). We are not using more space than time, so the space usage is the same.
42 Slowscan time usage We use slowscan to find head(i+1) from w=s(head(i)), i.e. time head(i+1) - head(i) +1 for iteration i head(i) tail(i) head(i+1) tail(i+1) A telescoping sum Sum = head(n+1) - head(1) +n Equal to n since head(n+1)=head(1)= = head(i+1) - head(i) +1
43 Fastscan time usage Fastscan uses time proportional to the number of nodes it process Define d(v) as the (node-)depth of node v Fastscan increases the node depth Following parent and suffix pointers decreases the node depth Time usage of fastscan is bounded by the total depth-increase (amortized analysis)
44 Proposition d(v) d(s(v))+1 Proof: For any ancestor u of v, s(u) is an ancestor of s(v) Except for the empty prefix and the single letter prefix of v, the s(u) s are different u v s(u) s(v)
45 Corollary In each step, before calling fastscan, we decrease the depth by at most 2: d(u) = d(head(i))-1; d(w) d(u)-1 u w head(i) The total decrease is thus 2n
46 Time usage of fastscan The time usage of fastscan is bounded by n plus the total decrease of depth, i.e. the time usage of fastscan is O(n)
47 Summary We iteratively build tries of suffixes of x. Using suffix links and fastscan we can quickly find where to insert the next suffix in our current trie. By amortized analysis, the total running time becomes linear.
48 Ukkonen's suffix tree construction algorithm aba$ $ababa$ $ab 3 a ba$ $ $ $ab a ba$ $
49 Motivation Yet another suffix tree construction algorithm... Why?
50 Motivation Yet another suffix tree construction algorithm... Why? An online algorithm, i.e. one we can update with new sequences
51 Sketch of algorithm Recall McCreight s approach: For i = 1.. n+1, build compressed trie of {x[j..n]$ j i} Ukkonen s approach: For i = 1.. n+1, build compressed trie of {x$[j..i] j i} Compressed trie of all suffixes of prefix x$[1..i] of x$ A suffix tree except for leaf property
52 McCreight's algorithm x=aba T 1 : {x$[j..n] j 1 } T 2 : {x$[j..n] j 2 } 2 aba$ $ab 1 aba$ 1 T 3 : {x$[j..n] j 3 } $ab a ba$ $ T 4 : {x$[j..n] j 4 } 2 $ab $ 4 a ba$ $ 1 3
53 Ukkonen's algorithm x=aba T 1 : {x$[j..1]} a 1 T 2 : {x$[j..2]} b ab 2 1 T 3 : {x$[j..3]} ab a ba 2 1 Note: no node for x[3..3] = a T 4 : {x$[j..n]} 2 $ab $ 4 a ba$ $ 1 3
54 Tasks in iteration i In iteration i we must Update each to x[j..i+1] Add string x[i+1] (special case of above) Leaf: Inner node: Edge: j x[i+1] x[i+1] x[i+1] j x[i+1] x[i+1] j x[i+1] x[i+1] j
55 First attempt... Obvious algorithm: For i=1,...,n+1: for j=1,...,i: find append x[i+1]
56 First attempt... Obvious algorithm: For i=1,...,n+1: for j=1,...,i: find append x[i+1] Running time O(n 3 ) Need lots of tricks to get O(n)!
57 Updating leaves for free If we label leaves with (k, ) denoting k to the current i, updating a leaf is automatic Leaf: Inner node: Edge: x[i+1] x[i+1] j x[i+1] j x[i+1] x[i+1] j x[i+1] x[i+1] j
58 Updating existing strings is free If x[j..i+1] is already in the tree, the update is automatic Leaf: Inner node: Edge: x[i+1] x[i+1] j x[i+1] j x[i+1] x[i+1] j x[i+1] x[i+1] j
59 Real operations If we can recognize the free operations, we need only deal with the remaining Leaf: Inner node: Edge: x[i+1] x[i+1] j x[i+1] j x[i+1] x[i+1] j x[i+1] x[i+1] j
60 Lemma Let j denote suffix of x[1..i] a)if j>1 is a leaf node in T i, then so is j-1 b)if, from j<i, there is a path in T i that begins with a, then there is a path in T i from j+1 beginning with a
61 Lemma Let j denote suffix of x[1..i] a)if j>1 is a leaf node in T i, then so is j-1 b)if, from j<i, there is a path in T i that begins with a, then Once there an edge, is a path always in T an i from edge j+1 beginning with a
62 Lemma Let j denote suffix of x[1..i] a)if j>1 is a leaf node in T i, then so is j-1 b)if, Once from a j<i, leaf, there always is a path been in a T leaf i that begins with a, then there is a path in T i from j+1 beginning with a
63 Lemma Let j denote suffix of x[1..i] a)if j>1 is a leaf node in T i, then so is j-1 b)if, from j<i, there is a path in T i that begins with a, then there is a path in T i from j+1 beginning with a
64 Proof of lemma a) If j>1 is a leaf node in T i, then so is j-1 Assume j-1 is not a leaf. Then there exists k<j-1 such that: x[k..i] x[j-1..i] x[k..i] x[j-1..i] j-1 k Then j j-1 k k+1 thus: x[k+1..i] x[k+1..i]
65 Lemma Let j denote suffix of x[1..i] a)if j>1 is a leaf node in T i, then so is j-1 b)if, from j<i, there is a path in T i that begins with a, then there is a path in T i from j+1 beginning with a
66 Proof of lemma b) If, from j<i, there is a path in T i that begins with a, then there is a path in T i from j+1 beginning with a Assume j is followed by a, then there exists k<j such that: j k a thus: j+1 k+1 a Hence j+1 is followed by a.
67 Corollary In iteration i, there exist indices j L and j R such that: All suffixes j j L are leaves All suffixes j j R are already in the trie I II III j L j R
68 Consequence of corollary I and III are free operations x[i+1] x[i+1] j x[i+1] j x[i+1] x[i+1] j x[i+1] x[i+1] j I II III j L j R
69 Updated algorithm Implicitly handling free operations: For i=1,...,n+1: for j=j L,...,j R : find append x[i+1]
70 Updated algorithm Implicitly handling free operations: For i=1,...,n+1: for j=j L,...,j R : find append x[i+1] How do we find j L and j R?
71 Updated algorithm Implicitly handling free operations: For i=1,...,n+1: for j=j L,...,j R : find append x[i+1] j L in iteration i is the last leaf inserted in iteration i-1 (all smaller indices are already leaves) j R in iteration i is the first index where x[j..i+1] is already in the trie (all larger indices are already in the trie)
72 Suffixes in II are made into leaves Whenever j L < j < j R, j is made a leaf: x[i+1] j x[i+1] j
73 Suffixes in II are made into leaves Whenever j L < j < j R, j is made a leaf: x[i+1] j x[i+1] j Once j is a leaf, it will be in I and never in II again
74 Time to go from II to I We handle j in II or implicitly in III time 2n: 2 2 Index j in II n+1 Path length is 2n Iterations n+1
75 Updated running time Only 2n of these: For i=1,...,n+1: for j=j L,...,j R : find append x[i+1] Constant time Running time is 2n*T(find )
76 Updated running time Only 2n of these: For i=1,...,n+1: for j=j L,...,j R : find append x[i+1] Constant time? Constant time Running time is 2n*T(find ) ( O(1 We just have to deal with T(find ) in
77 What do we know about? Is it already in the tree? Will that help us?
78 Using fastscan and s(-) When searching for, it is already in the trie We can use fastscan for the search T(find ) in O(d) where d is the (node-)depth of If we keep suffix links, s(-), in the tree we can use these as shortcuts
79 Suffix links Invariant: All inner nodes have suffix links
80 Suffix links Invariant: All inner nodes have suffix links Ensuring the invariant? For i=1,...,n+1: for j=j L,...,j R : find append x[i+1]
81 Suffix links Invariant: All inner nodes have suffix links Ensuring the invariant: We only insert inner nodes when adding leaves j Whenever we insert a new node, for some j<i, we also find or insert x[j+1..i], and can update s() := x[j+1..i] If we insert x[i..i], then s(x[i..i]) := ε
82 Finding x[j+1..i] from parent(j) j s s(parent(j)) x[j+1..i] Starting from here (initial j is j L and we can keep a pointer to that node between iterations) Using fastscan here
83 Bound on fastscan Time usage by fastscan is bounded by n for the maximal (node-)depth in the trie plus total decrease of (node-)depth Decrease in depth: Moving to parent(j): 1 Moving to s(parent(j)): max 1 Restarting at j L :?
84 Bound on fastscan Time usage by fastscan is bounded by n for the maximal (node-)depth in the trie plus total decrease of (node-)depth Decrease in depth: Moving to parent(j): 1 Moving to s(parent(j)): max 1 Restarting at j L :? parent(j) j s s(parent(j)) x[j+1..i] Using fastscan here
85 Hacking the suffix link When searching for x[j L +1..i], update s(x[j L..i]) to point to the nearest ancestor of x[j L +1..i] parent(j L ) s s(parent(j L )) j L x[j L +1..i] Nearest ancestor of x[j L +1..i]
86 Hacking the suffix link When searching for x[j L +1..i], update s(x[j L..i]) to point to the nearest ancestor of x[j L +1..i] parent(j L ) s s(parent(j L )) j L x[j L +1..i] Restart in constant time Nearest ancestor of x[j L +1..i]
87 Iterations Searching time Vertical steps are paid for by the previous horizontal step (free restarting) Horizontal steps are total fastscan bounded by O(n) Runtime O(n) Index j in II 2 2 n+1 n+1
88 Summary We have seen a new suffix tree construction algorithm Ukkonen s algorithm is an online algorithm: As long as no suffix is a prefix of another, the intermediate trees are suffix trees Generalized suffix trees can be built one string at a time
Ukkonen's suffix tree construction algorithm
Ukkonen's suffix tree construction algorithm aba$ $ab aba$ 2 2 1 1 $ab a ba $ 3 $ $ab a ba $ $ $ 1 2 4 1 String Algorithms; Nov 15 2007 Motivation Yet another suffix tree construction algorithm... Why?
More informationTheoretical Computer Science
Theoretical Computer Science Zdeněk Sawa Department of Computer Science, FEI, Technical University of Ostrava 17. listopadu 15, Ostrava-Poruba 708 33 Czech republic September 22, 2017 Z. Sawa (TU Ostrava)
More informationSpace-Efficient Construction Algorithm for Circular Suffix Tree
Space-Efficient Construction Algorithm for Circular Suffix Tree Wing-Kai Hon, Tsung-Han Ku, Rahul Shah and Sharma Thankachan CPM2013 1 Outline Preliminaries and Motivation Circular Suffix Tree Our Indexes
More informationSUFFIX TREE. SYNONYMS Compact suffix trie
SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix
More informationConverting SLP to LZ78 in almost Linear Time
CPM 2013 Converting SLP to LZ78 in almost Linear Time Hideo Bannai 1, Paweł Gawrychowski 2, Shunsuke Inenaga 1, Masayuki Takeda 1 1. Kyushu University 2. Max-Planck-Institut für Informatik Recompress SLP
More informationHow to write proofs - II
How to write proofs - II Problem (Ullman and Hopcroft, Page 104, Exercise 4.8). Let G be the grammar Prove that S ab ba A a as baa B b bs abb L(G) = { x {a, b} x contains equal number of a s and b s }.
More information5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:
Grammar Automata and Languages Grammar Prof. Mohamed Hamada oftware Engineering Lab. The University of Aizu Japan Regular Grammar Context-free Grammar Context-sensitive Grammar Left-linear Grammar right-linear
More informationLecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code
Lecture 16 Agenda for the lecture Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Variable-length source codes with error 16.1 Error-free coding schemes 16.1.1 The Shannon-Fano-Elias
More informationInteger Sorting on the word-ram
Integer Sorting on the word-rm Uri Zwick Tel viv University May 2015 Last updated: June 30, 2015 Integer sorting Memory is composed of w-bit words. rithmetical, logical and shift operations on w-bit words
More informationDisjoint-Set Forests
Disjoint-Set Forests Thanks for Showing Up! Outline for Today Incremental Connectivity Maintaining connectivity as edges are added to a graph. Disjoint-Set Forests A simple data structure for incremental
More informationDynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.
Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal
More informationOnline Computation of Abelian Runs
Online Computation of Abelian Runs Gabriele Fici 1, Thierry Lecroq 2, Arnaud Lefebvre 2, and Élise Prieur-Gaston2 1 Dipartimento di Matematica e Informatica, Università di Palermo, Italy Gabriele.Fici@unipa.it
More informationEnhancing Active Automata Learning by a User Log Based Metric
Master Thesis Computing Science Radboud University Enhancing Active Automata Learning by a User Log Based Metric Author Petra van den Bos First Supervisor prof. dr. Frits W. Vaandrager Second Supervisor
More informationInformation Theory with Applications, Math6397 Lecture Notes from September 30, 2014 taken by Ilknur Telkes
Information Theory with Applications, Math6397 Lecture Notes from September 3, 24 taken by Ilknur Telkes Last Time Kraft inequality (sep.or) prefix code Shannon Fano code Bound for average code-word length
More informationMining Emerging Substrings
Mining Emerging Substrings Sarah Chan Ben Kao C.L. Yip Michael Tang Department of Computer Science and Information Systems The University of Hong Kong {wyschan, kao, clyip, fmtang}@csis.hku.hk Abstract.
More informationString Indexing for Patterns with Wildcards
MASTER S THESIS String Indexing for Patterns with Wildcards Hjalte Wedel Vildhøj and Søren Vind Technical University of Denmark August 8, 2011 Abstract We consider the problem of indexing a string t of
More informationFast String Kernels. Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200
Fast String Kernels Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200 Alex.Smola@anu.edu.au joint work with S.V.N. Vishwanathan Slides (soon) available
More informationUse subword reversing to constructing examples of ordered groups.
Subword Reversing and Ordered Groups Patrick Dehornoy Laboratoire de Mathématiques Nicolas Oresme Université de Caen Use subword reversing to constructing examples of ordered groups. Abstract Subword Reversing
More informationEfficient Sequential Algorithms, Comp309
Efficient Sequential Algorithms, Comp309 University of Liverpool 2010 2011 Module Organiser, Igor Potapov Part 2: Pattern Matching References: T. H. Cormen, C. E. Leiserson, R. L. Rivest Introduction to
More informationSpecial Factors and Suffix and Factor Automata
Special Factors and Suffix and Factor Automata LIAFA, Paris 5 November 2010 Finite Words Let Σ be a finite alphabet, e.g. Σ = {a, n, b, c}. A word over Σ is finite concatenation of symbols of Σ, that is,
More informationTransducers for bidirectional decoding of prefix codes
Transducers for bidirectional decoding of prefix codes Laura Giambruno a,1, Sabrina Mantaci a,1 a Dipartimento di Matematica ed Applicazioni - Università di Palermo - Italy Abstract We construct a transducer
More informationCOS597D: Information Theory in Computer Science October 19, Lecture 10
COS597D: Information Theory in Computer Science October 9, 20 Lecture 0 Lecturer: Mark Braverman Scribe: Andrej Risteski Kolmogorov Complexity In the previous lectures, we became acquainted with the concept
More informationNotes on Learning Probabilistic Automata
Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1999 Notes on Learning Probabilistic Automata Alberto Apostolico Report Number: 99-028 Apostolico,
More informationCompressed Index for Dynamic Text
Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution
More informationDynamic Ordered Sets with Exponential Search Trees
Dynamic Ordered Sets with Exponential Search Trees Arne Andersson Computing Science Department Information Technology, Uppsala University Box 311, SE - 751 05 Uppsala, Sweden arnea@csd.uu.se http://www.csd.uu.se/
More informationChapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 4 Greedy Algorithms Slides by Kevin Wayne. Copyright Pearson-Addison Wesley. All rights reserved. 4 4.1 Interval Scheduling Interval Scheduling Interval scheduling. Job j starts at s j and finishes
More informationLecture 4 : Adaptive source coding algorithms
Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv
More informationSkriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT)
Skriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT) Disclaimer Students attending my lectures are often astonished that I present the material in a much livelier form than in this script.
More informationarxiv: v2 [cs.ds] 8 Apr 2016
Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl
More informationJumping Sequences. Steve Butler 1. (joint work with Ron Graham and Nan Zang) University of California Los Angelese
Jumping Sequences Steve Butler 1 (joint work with Ron Graham and Nan Zang) 1 Department of Mathematics University of California Los Angelese www.math.ucla.edu/~butler UCSD Combinatorics Seminar 14 October
More informationBreadth-First Search of Graphs
Breadth-First Search of Graphs Analysis of Algorithms Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Applications of Breadth-First Search of Graphs a) Single Source
More informationCombinatorics on Finite Words and Data Structures
Combinatorics on Finite Words and Data Structures Dipartimento di Informatica ed Applicazioni Università di Salerno (Italy) Laboratoire I3S - Université de Nice-Sophia Antipolis 13 March 2009 Combinatorics
More informationSturmian Words, Sturmian Trees and Sturmian Graphs
Sturmian Words, Sturmian Trees and Sturmian Graphs A Survey of Some Recent Results Jean Berstel Institut Gaspard-Monge, Université Paris-Est CAI 2007, Thessaloniki Jean Berstel (IGM) Survey on Sturm CAI
More informationTemporal logics and explicit-state model checking. Pierre Wolper Université de Liège
Temporal logics and explicit-state model checking Pierre Wolper Université de Liège 1 Topics to be covered Introducing explicit-state model checking Finite automata on infinite words Temporal Logics and
More informationMonday, April 29, 13. Tandem repeats
Tandem repeats Tandem repeats A string w is a tandem repeat (or square) if: A tandem repeat αα is primitive if and only if α is primitive A string w = α k for k = 3,4,... is (by some) called a tandem array
More informationCS 161: Design and Analysis of Algorithms
CS 161: Design and Analysis of Algorithms Greedy Algorithms 3: Minimum Spanning Trees/Scheduling Disjoint Sets, continued Analysis of Kruskal s Algorithm Interval Scheduling Disjoint Sets, Continued Each
More informationLecture 18 April 26, 2012
6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and
More informationComputing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome
Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression Sergio De Agostino Sapienza University di Rome Parallel Systems A parallel random access machine (PRAM)
More information4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd
4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd Data Compression Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we
More informationTighter Bounds for the Sum of Irreducible LCP Values
Tighter Bounds for the Sum of Irreducible LCP Values Juha Kärkkäinen 1 Dominik Kempa 1 Marcin Piątkowski 2 1 University of Helsinki 2 Nicolaus Copernicus University Outline 1 Cyclic words 2 Irreducible
More informationBurrows-Wheeler Transforms in Linear Time and Linear Bits
Burrows-Wheeler Transforms in Linear Time and Linear Bits Russ Cox (following Hon, Sadakane, Sung, Farach, and others) 18.417 Final Project BWT in Linear Time and Linear Bits Three main parts to the result.
More informationarxiv: v1 [cs.ds] 9 Apr 2018
From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract
More informationAnalysis of tree edit distance algorithms
Analysis of tree edit distance algorithms Serge Dulucq 1 and Hélène Touzet 1 LaBRI - Universit Bordeaux I 33 0 Talence cedex, France Serge.Dulucq@labri.fr LIFL - Universit Lille 1 9 6 Villeneuve d Ascq
More informationUNIT II REGULAR LANGUAGES
1 UNIT II REGULAR LANGUAGES Introduction: A regular expression is a way of describing a regular language. The various operations are closure, union and concatenation. We can also find the equivalent regular
More informationOnline Sorted Range Reporting and Approximating the Mode
Online Sorted Range Reporting and Approximating the Mode Mark Greve Progress Report Department of Computer Science Aarhus University Denmark January 4, 2010 Supervisor: Gerth Stølting Brodal Online Sorted
More informationAlgorithms Design & Analysis. String matching
Algorithms Design & Analysis String matching Greedy algorithm Recap 2 Today s topics KM algorithm Suffix tree Approximate string matching 3 String Matching roblem Given a text string T of length n and
More informationCS5371 Theory of Computation. Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA)
CS5371 Theory of Computation Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA) Objectives Introduce the Pumping Lemma for CFL Show that some languages are non- CFL Discuss the DPDA, which
More informationCardinality Networks: a Theoretical and Empirical Study
Constraints manuscript No. (will be inserted by the editor) Cardinality Networks: a Theoretical and Empirical Study Roberto Asín, Robert Nieuwenhuis, Albert Oliveras, Enric Rodríguez-Carbonell Received:
More informationProblem Session 5 (CFGs) Talk about the building blocks of CFGs: S 0S 1S ε - everything. S 0S0 1S1 A - waw R. S 0S0 0S1 1S0 1S1 A - xay, where x = y.
CSE2001, Fall 2006 1 Problem Session 5 (CFGs) Talk about the building blocks of CFGs: S 0S 1S ε - everything. S 0S0 1S1 A - waw R. S 0S0 0S1 1S0 1S1 A - xay, where x = y. S 00S1 A - xay, where x = 2 y.
More informationNotes on Logarithmic Lower Bounds in the Cell Probe Model
Notes on Logarithmic Lower Bounds in the Cell Probe Model Kevin Zatloukal November 10, 2010 1 Overview Paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.)
More informationOn the Number of Distinct Squares
Frantisek (Franya) Franek Advanced Optimization Laboratory Department of Computing and Software McMaster University, Hamilton, Ontario, Canada Invited talk - Prague Stringology Conference 2014 Outline
More informationarxiv: v2 [cs.ds] 3 Oct 2017
Orthogonal Vectors Indexing Isaac Goldstein 1, Moshe Lewenstein 1, and Ely Porat 1 1 Bar-Ilan University, Ramat Gan, Israel {goldshi,moshe,porately}@cs.biu.ac.il arxiv:1710.00586v2 [cs.ds] 3 Oct 2017 Abstract
More informationReversal Distance for Strings with Duplicates: Linear Time Approximation using Hitting Set
Reversal Distance for Strings with Duplicates: Linear Time Approximation using Hitting Set Petr Kolman Charles University in Prague Faculty of Mathematics and Physics Department of Applied Mathematics
More informationFault-Tolerant Consensus
Fault-Tolerant Consensus CS556 - Panagiota Fatourou 1 Assumptions Consensus Denote by f the maximum number of processes that may fail. We call the system f-resilient Description of the Problem Each process
More informationTheory of Computation
Theory of Computation (Feodor F. Dragan) Department of Computer Science Kent State University Spring, 2018 Theory of Computation, Feodor F. Dragan, Kent State University 1 Before we go into details, what
More informationON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION
ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION PAOLO FERRAGINA, IGOR NITTO, AND ROSSANO VENTURINI Abstract. One of the most famous and investigated lossless data-compression schemes is the one introduced
More informationString Search. 6th September 2018
String Search 6th September 2018 Search for a given (short) string in a long string Search problems have become more important lately The amount of stored digital information grows steadily (rapidly?)
More informationComputability and Complexity
Computability and Complexity Rewriting Systems and Chomsky Grammars CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard
More informationk-distinct In- and Out-Branchings in Digraphs
k-distinct In- and Out-Branchings in Digraphs Gregory Gutin 1, Felix Reidl 2, and Magnus Wahlström 1 arxiv:1612.03607v2 [cs.ds] 21 Apr 2017 1 Royal Holloway, University of London, UK 2 North Carolina State
More informationAn on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to
(To appear in ALGORITHMICA) On{line construction of sux trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN{00014 University of Helsinki,
More informationSkriptum VL Text-Indexierung Sommersemester 2010 Johannes Fischer (KIT)
1 Recommended Reading Skriptum VL Text-Indexierung Sommersemester 2010 Johannes Fischer (KIT) D. Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997. M. Crochemore,
More informationFalse. They are the same language.
CS 3100 Models of Computation Fall 2010 Notes 8, Posted online: September 16, 2010 These problems will be helpful for Midterm-1. More solutions will be worked out. The midterm exam itself won t be this
More informationBefore We Start. The Pumping Lemma. Languages. Context Free Languages. Plan for today. Now our picture looks like. Any questions?
Before We Start The Pumping Lemma Any questions? The Lemma & Decision/ Languages Future Exam Question What is a language? What is a class of languages? Context Free Languages Context Free Languages(CFL)
More informationLecture notes for Advanced Graph Algorithms : Verification of Minimum Spanning Trees
Lecture notes for Advanced Graph Algorithms : Verification of Minimum Spanning Trees Lecturer: Uri Zwick November 18, 2009 Abstract We present a deterministic linear time algorithm for the Tree Path Maxima
More informationChapter 8: Recursion. March 10, 2008
Chapter 8: Recursion March 10, 2008 Outline 1 8.1 Recursively Defined Sequences 2 8.2 Solving Recurrence Relations by Iteration 3 8.4 General Recursive Definitions Recursively Defined Sequences As mentioned
More informationChapter 3 Deterministic planning
Chapter 3 Deterministic planning In this chapter we describe a number of algorithms for solving the historically most important and most basic type of planning problem. Two rather strong simplifying assumptions
More informationSmall-Space Dictionary Matching (Dissertation Proposal)
Small-Space Dictionary Matching (Dissertation Proposal) Graduate Center of CUNY 1/24/2012 Problem Definition Dictionary Matching Input: Dictionary D = P 1,P 2,...,P d containing d patterns. Text T of length
More informationOptimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs
Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Krishnendu Chatterjee Rasmus Ibsen-Jensen Andreas Pavlogiannis IST Austria Abstract. We consider graphs with n nodes together
More informationCISC4090: Theory of Computation
CISC4090: Theory of Computation Chapter 2 Context-Free Languages Courtesy of Prof. Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Spring, 2014 Overview In Chapter
More informationSingle Source Shortest Paths
CMPS 00 Fall 017 Single Source Shortest Paths Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk Paths in graphs Consider a digraph G = (V, E) with an edge-weight
More informationFinite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017
Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017 Lecture 4 Ana Bove March 24th 2017 Structural induction; Concepts of automata theory. Overview of today s lecture: Recap: Formal Proofs
More informationInformation Theory and Statistics Lecture 2: Source coding
Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection
More informationMultiple Choice Tries and Distributed Hash Tables
Multiple Choice Tries and Distributed Hash Tables Luc Devroye and Gabor Lugosi and Gahyun Park and W. Szpankowski January 3, 2007 McGill University, Montreal, Canada U. Pompeu Fabra, Barcelona, Spain U.
More information6. DYNAMIC PROGRAMMING II
6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford algorithm distance vector protocols negative cycles in a digraph Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison
More informationINF 4130 / /8-2014
INF 4130 / 9135 26/8-2014 Mandatory assignments («Oblig-1», «-2», and «-3»): All three must be approved Deadlines around: 25. sept, 25. oct, and 15. nov Other courses on similar themes: INF-MAT 3370 INF-MAT
More informationINF 4130 / /8-2017
INF 4130 / 9135 28/8-2017 Algorithms, efficiency, and complexity Problem classes Problems can be divided into sets (classes). Problem classes are defined by the type of algorithm that can (or cannot) solve
More informationFinite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018
Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 4 Ana Bove March 23rd 2018 Recap: Formal Proofs How formal should a proof be? Depends on its purpose...... but should be convincing......
More informationDecentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication
Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication Stavros Tripakis Abstract We introduce problems of decentralized control with communication, where we explicitly
More informationLinear Classifiers (Kernels)
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer Exam Dates & Course Conclusion There are 2 Exam dates: Feb 20 th March
More informationVarieties of Regularities in Weighted Sequences
Varieties of Regularities in Weighted Sequences Hui Zhang 1, Qing Guo 2 and Costas S. Iliopoulos 3 1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023,
More informationOn the Average Complexity of Brzozowski s Algorithm for Deterministic Automata with a Small Number of Final States
On the Average Complexity of Brzozowski s Algorithm for Deterministic Automata with a Small Number of Final States Sven De Felice 1 and Cyril Nicaud 2 1 LIAFA, Université Paris Diderot - Paris 7 & CNRS
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationString Range Matching
String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings
More informationThe Impossibility of Certain Types of Carmichael Numbers
The Impossibility of Certain Types of Carmichael Numbers Thomas Wright Abstract This paper proves that if a Carmichael number is composed of primes p i, then the LCM of the p i 1 s can never be of the
More informationCS5371 Theory of Computation. Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL)
CS5371 Theory of Computation Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL) Objectives Introduce Pumping Lemma for CFL Apply Pumping Lemma to show that some languages are non-cfl Pumping Lemma
More informationTree sets. Reinhard Diestel
1 Tree sets Reinhard Diestel Abstract We study an abstract notion of tree structure which generalizes treedecompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked
More informationLinear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs
Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Nathan Lindzey, Ross M. McConnell Colorado State University, Fort Collins CO 80521, USA Abstract. Tucker characterized
More informationComputing Connected Components Given a graph G = (V; E) compute the connected components of G. Connected-Components(G) 1 for each vertex v 2 V [G] 2 d
Data Structures for Disjoint Sets Maintain a Dynamic collection of disjoint sets. Each set has a unique representative (an arbitrary member of the set). x. Make-Set(x) - Create a new set with one member
More informationNotes for Comp 497 (454) Week 10
Notes for Comp 497 (454) Week 10 Today we look at the last two chapters in Part II. Cohen presents some results concerning the two categories of language we have seen so far: Regular languages (RL). Context-free
More informationA Dichotomy for Regular Expression Membership Testing
58th Annual IEEE Symposium on Foundations of Computer Science A Dichotomy for Regular Expression Membership Testing Karl Bringmann Max Planck Institute for Informatics Saarbrücken Informatics Campus Saarbrücken,
More information{a, b, c} {a, b} {a, c} {b, c} {a}
Section 4.3 Order Relations A binary relation is an partial order if it transitive and antisymmetric. If R is a partial order over the set S, we also say, S is a partially ordered set or S is a poset.
More informationCS4311 Design and Analysis of Algorithms. Analysis of Union-By-Rank with Path Compression
CS4311 Design and Analysis of Algorithms Tutorial: Analysis of Union-By-Rank with Path Compression 1 About this lecture Introduce an Ackermann-like function which grows even faster than Ackermann Use it
More informationSORTING SUFFIXES OF TWO-PATTERN STRINGS.
International Journal of Foundations of Computer Science c World Scientific Publishing Company SORTING SUFFIXES OF TWO-PATTERN STRINGS. FRANTISEK FRANEK and WILLIAM F. SMYTH Algorithms Research Group,
More informationGreedy. Outline CS141. Stefano Lonardi, UCR 1. Activity selection Fractional knapsack Huffman encoding Later:
October 5, 017 Greedy Chapters 5 of Dasgupta et al. 1 Activity selection Fractional knapsack Huffman encoding Later: Outline Dijkstra (single source shortest path) Prim and Kruskal (minimum spanning tree)
More informationLecture 3: String Matching
COMP36111: Advanced Algorithms I Lecture 3: String Matching Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2017 18 Outline The string matching problem The Rabin-Karp algorithm The Knuth-Morris-Pratt
More informationFoundations of
91.304 Foundations of (Theoretical) Computer Science Chapter 3 Lecture Notes (Section 3.2: Variants of Turing Machines) David Martin dm@cs.uml.edu With some modifications by Prof. Karen Daniels, Fall 2012
More information0.1 Minimizing Finite State Machines
.. MINIMIZING FINITE STATE MACHINES. Minimizing Finite State Machines Here we discuss the problem of minimizing the number of states of a DFA. We will show that for any regular language L, thereisaunique
More informationWorst case analysis for a general class of on-line lot-sizing heuristics
Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University
More informationLexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.
Lexical Analysis Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University http://compilers.cs.uni-saarland.de Compiler Construction Core Course 2017 Saarland University Today
More informationTuring Machines Part II
Turing Machines Part II Hello Hello Condensed Slide Slide Readers! Readers! This This lecture lecture is is almost almost entirely entirely animations that that show show how how each each Turing Turing
More information