Suffix Trays and Suffix Trists: Structures for Faster Text Indexing

Richard Cole, Tsvi Kopelowitz, Moshe Lewenstein
arXiv:1311.1762v1 [cs.DS] 7 Nov 2013

Results from this paper have appeared as an extended abstract in ICALP 2006.
Department of Computer Science, Courant Institute, NYU. This work was supported in part by NSF grant CCF-1217989.
Dept. of Computer Science, Bar-Ilan U., 52900 Ramat-Gan, Israel. This research was supported by BSF grant (#2010437) and GIF grant (#1147/2011).

Abstract

Suffix trees and suffix arrays are two of the most widely used data structures for text indexing. Each uses linear space and can be constructed in linear time for polynomially sized alphabets. However, when it comes to answering queries with worst-case deterministic time bounds, the former does so in O(m log |Σ|) time, where m is the query size and |Σ| is the alphabet size, and the latter does so in O(m + log n) time, where n is the text size. If one wants to output all appearances of the query, an additive cost of O(occ) time is sufficient, where occ is the size of the output. Notice that it is possible to obtain a worst-case, deterministic query time of O(m) but at the cost of super-linear construction time or space usage.

We propose a novel way of combining the two into, what we call, a suffix tray. The space and construction time remain linear and the query time improves to O(m + log |Σ|) for integer alphabets from a linear range, i.e. Σ ⊂ {1, ..., cn}, for an arbitrary constant c. The construction and query are deterministic. Here also an additive O(occ) time is sufficient if one desires to output all appearances of the query.

We also consider the online version of indexing, where the text arrives online, one character at a time, and indexing queries are answered in tandem. In this variant we create a cross between a suffix tree and a suffix list (a dynamic variant of suffix array) to be called a suffix trist; it supports queries in O(m + log |Σ|) time. The suffix trist also uses linear space. Furthermore, if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is worst-case deterministic f(n, |Σ|) (n is the size of the current text), then one can further update the suffix trist in O(f(n, |Σ|) + log |Σ|) time. The best currently known worst-case deterministic bound for f(n, |Σ|) is O(log n) time.

1 Introduction

Indexing is one of the most important paradigms in searching. The idea is to preprocess a text and construct a mechanism that will later provide answers to queries of the form "does a pattern P occur in the text" in time proportional to the size of the pattern rather than the text. The suffix tree [7, 16, 20, 21] and suffix array [10, 11, 12, 15] have proven to be invaluable data structures for indexing.

Both suffix trees and suffix arrays use O(n) space, where n is the text length. In fact, for alphabets from a polynomially sized range, both can be constructed in linear time, see [7, 10, 11, 12].

The query time is slightly different in the two data structures. Namely, using suffix trees queries are answered in O(m log |Σ| + occ) time, where m is the length of the query, Σ is the alphabet, |Σ| is the alphabet size and occ is the number of occurrences of the query. Using suffix arrays the time is O(m + log n + occ). In [5] it was shown that the search time of O(m + log n + occ) is possible also on suffix trees. For the rest of this paper we assume that we are only interested in one occurrence of the pattern in the text, and note that we can find all of the occurrences of the pattern with another additive occ cost in the query time.

The differences in the running times follow from the different ways queries are answered. In a suffix tree, queries are answered by traversing the tree from the root. At each node one needs to know how to continue the traversal and one needs to decide between at most |Σ| options which are sorted, which explains the O(log |Σ|) factor. In suffix arrays one performs a binary search on all suffixes (hence the log n factor) and uses longest common prefix (LCP) queries to quickly decide whether the pattern needs to be compared to a specific suffix (see [15] for full details).

It is straightforward to construct a data structure that will yield optimal O(m) time to answer queries. This can be done by putting a |Σ|-length array at every node of the suffix tree. Hence, when traversing the suffix tree with the query we will spend constant time at each node. However, the size of this structure is O(n|Σ|). Also, notice that this method assumes Σ is comprised of {1, 2, ..., |Σ| − 1} ∪ {$} (see footnote 1) as we need to be able to random access locations in the array based on the current character, and so we need our alphabet to be proper indices.
While this can be overcome using renaming schemes, it will add an extra O(m log |Σ|) time to the query process, as one would need to rename each of the characters in the pattern.

Footnote 1: Note that the special character $ is a special delimiter which only appears at the end of the text and is considered to be lexicographically larger than all of the other integers in Σ.

Another variant of suffix trees that will yield optimal O(m) time to answer queries is as follows. Construct a suffix tree which maintains a static dictionary at each node, h_v : Σ_v → E_v, where E_v is the set of edges exiting v and Σ_v is the set of first characters on E_v edges. We set h_v(σ) = x, where x is the edge associated with the first character σ. This scheme can be built in linear randomized time and linear space. If one desires determinism, as we do, the downside is that the construction of the deterministic static dictionary takes, in the best (and complicated) case, O(n min{log |Σ|, log log n}) preprocessing time [18].

The question of interest here is whether one can deterministically construct an indexing data structure using O(n) space and time that will answer queries faster than suffix arrays and suffix trees. We indeed propose to do so with the Suffix Tray, a new data structure that extracts the advantages of suffix trees and suffix arrays by combining their structures. This yields an O(m + log |Σ|) query time. However, our solution uses some |Σ|-length arrays to allow for quick navigation, and as such it seems that we are also confined to using integer alphabets. However, this can be improved to alphabets Σ ⊂ {1, ..., cn}, for arbitrary constant c, since the text can be sorted and a renaming array A of length cn can be maintained saving the rank (relative to Σ) of the element, i.e. A[σ] is the rank of σ in Σ and A[i] is 0 if there is no character i in T. For each pattern character evaluated, a constant time lookup in A gives the rank of the character.

We also consider the natural extension to the online case, that is the scenario where online updates of the text are allowed. In other words, given an indexing structure supporting indexing queries on S, we would also like to support extensions of the text to aS, where a ∈ Σ. We assume that the text is given in reverse, i.e. from the last character towards the beginning. So, an indexing structure of our desire when representing S will support extensions to aS where a ∈ Σ. We call the change of S to aS a text extension. The reverse assumption that we use is not strict, as most indexing structures can handle online texts that are reversed (e.g. instead of a suffix tree one can construct a prefix tree and answer the queries in reverse. Likewise, a prefix array can be constructed instead of a suffix array).

Online constructions of indexing structures have been suggested previously. McCreight's suffix tree algorithm [16] was the first online construction. It was a reverse construction (in the sense mentioned above). Ukkonen's algorithm [20] was the first online algorithm that was not reversed. In both these algorithms text extensions take O(log |Σ|) amortized time, but in the worst case a text extension could take Ω(n log |Σ|) time. In [3] an online suffix tree construction (under the reverse assumption) was proposed with O(log n) worst-case time per text extension. In all of these constructions a full suffix tree is constructed and hence queries are answered in O(m log |Σ|) time. An online variant of suffix arrays was also proposed in [3] with O(log n) worst-case time per text extension and O(m + log n) time for answering queries. Similar results can be obtained by using the results in [9].

The problem we deal with in the second part of the paper is how to build an indexing structure that supports both text extensions (to the beginning of the text) and supports fast(er) indexing.
We will show that if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is f(n, |Σ|) (n is the size of the current text), then we can construct an online linear-space data-structure for indexing that supports indexing queries in O(m + log |Σ|) time, where the cost of adding a character is O(f(n, |Σ|) + log |Σ|) time. We will call this data structure the Suffix Trist.

As mentioned in the previous paragraph the best currently known worst-case deterministic bound for f(n, |Σ|) is O(log n) time [3]. Breslauer and Italiano [4] have improved this result for the case that |Σ| is o(log n). Specifically, they show how to support text extensions in deterministic O(|Σ| + log log n) time. In [13] an indexing data structure was shown for which text extensions can be implemented in O(log log n + log log |Σ|) expected time. However, the construction there is randomized and does not suit our needs.

2 Suffix Trees, Suffix Arrays and Suffix Intervals

Consider a text S of length n and let S_1, ..., S_n be the suffixes of S. Two classical data structures for indexing are the suffix tree and the suffix array. It is assumed that the reader is familiar with the suffix tree. Let S_{i_1}, ..., S_{i_n} be the lexicographic ordering of the suffixes. The suffix array of S is defined to be SA(S) = <i_1, ..., i_n>, i.e. the indices of the lexicographic ordering of the suffixes. Location j of the suffix array is sometimes referred to as the location of S_{i_j} (instead of the location of i_j).
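To make the suffix array definition and the binary-search query from the Introduction concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, the construction is a naive sort rather than one of the linear-time algorithms cited above, and the search compares substrings directly, so it runs in O(m log n) time rather than the O(m + log n) that the LCP-based search of [15] achieves.

```python
def suffix_array(s):
    """SA(S) = <i_1, ..., i_n>: 1-based start positions of the suffixes
    of s, listed in lexicographic order (naive sort, not linear time)."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def find_occurrence(s, sa, p):
    """Return one 1-based position where p occurs in s, or None.

    Plain binary search over the sorted suffixes; each step compares
    the pattern with the length-|p| prefix of one suffix.
    """
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1               # back to a 0-based offset
        if s[start:start + m] < p:
            lo = mid + 1                  # pattern lies to the right
        else:
            hi = mid                      # pattern lies here or to the left
    if lo < len(sa):
        start = sa[lo] - 1
        if s[start:start + m] == p:
            return sa[lo]
    return None
```

For the text "banana" the sketch yields the suffix array <6, 4, 2, 1, 5, 3>, and a query for "ana" returns position 4.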

[Figure 1: A suffix tree with suffix array at leaves.]

Let ST(S) and SA(S) denote the suffix tree and suffix array of S, respectively. As with all suffix tree constructions to date, assume that every node in a suffix tree maintains its children in lexicographic order. Therefore, the leaves ordered by an inorder traversal correspond to the suffixes in lexicographic order, which is also the order maintained in the suffix array. Hence, one can view the suffix tree as a tree over the suffix array. See Figure 1.

We further comment on the connection between suffix arrays and suffix trees. For strings R and R' say that R <_L R' if R is lexicographically smaller than R'. leaf(S_i) denotes the leaf corresponding to S_i in ST(S), the suffix tree of S. Define L(v) to be the location of S_i in the suffix array where leaf(S_i) is the leftmost leaf of the subtree of v. Notice that since the children of a node in a suffix tree are assumed to be maintained in lexicographic order, it follows that for all S_j such that leaf(S_j) is a descendant of v, S_i ≤_L S_j. Likewise, define R(v) to be the location of S_i in the suffix array where leaf(S_i) is the rightmost leaf of the subtree of v. Therefore, for all S_j such that leaf(S_j) is a descendant of v, S_i ≥_L S_j. Hence, the interval [L(v), R(v)] is an interval of the suffix array which contains exactly all the suffixes S_j for which leaf(S_j) is a descendant of v. Notice that interval representations have already been discussed in [1].

Moreover, under the assumption that the children of a node in a suffix tree are maintained in lexicographic ordering one can state the following.

Observation 1 Let S be a string and ST(S) its suffix tree. Let v be a node in ST(S) and let v_1, ..., v_r be its children in lexicographic order. Let 1 ≤ i ≤ j ≤ r, and let [L(v_i), R(v_j)] be an interval of the suffix array. Then k ∈ [L(v_i), R(v_j)] if and only if leaf(S_k) is in one of the subtrees rooted at v_i, ..., v_j.

This leads to the following concept.

Definition 1 Let S be a string and {S_{i_1}, ..., S_{i_n}} be the lexicographic ordering of its suffixes. The interval [j, k] = {i_j, ..., i_k}, for j ≤ k, is called a suffix interval.
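The interval [L(v), R(v)] of a node can be illustrated without building a suffix tree, by identifying a node v with the string spelled on the path from the root to v. The following Python sketch rests on that assumption; the names are hypothetical, and the linear scan is chosen for clarity rather than efficiency.

```python
def suffix_array(s):
    """1-based suffix start positions in lexicographic order (naive)."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def suffix_interval(s, sa, path_label):
    """Return (L, R), the 1-based suffix array locations of the leftmost
    and rightmost suffix that starts with path_label, or None if no
    suffix does. These suffixes are exactly the leaves below the node
    whose path from the root spells path_label."""
    locations = [
        loc
        for loc, i in enumerate(sa, start=1)
        if s.startswith(path_label, i - 1)
    ]
    if not locations:
        return None
    # Suffixes sharing a prefix are contiguous in the suffix array,
    # so the first and last hit delimit the whole interval.
    return locations[0], locations[-1]
```

With s = "banana", the suffixes starting with "an" occupy locations 2 and 3 of the suffix array, so the sketch reports the suffix interval [2, 3].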

Obviously, suffix intervals are intervals of the suffix array. Notice that, as mentioned above, for a node v in a suffix tree, [L(v), R(v)] is a suffix interval and the interval is called v's suffix interval. Also, by Observation 1, for v's children v_1, ..., v_r and for any 1 ≤ i ≤ j ≤ r, we have that [L(v_i), R(v_j)] is a suffix interval and this interval is called the (i, j)-suffix interval.

3 Suffix Trays

The suffix tray is now introduced. The suffix tray will use the concept of suffix intervals from the previous section which, as has been seen, forms a connection between the nodes of the suffix trees and intervals in the suffix array.

For suffix trays, special nodes are created, which correspond to suffix intervals. These nodes are called suffix interval nodes. Notice that these nodes are not suffix tree nodes. Part of the suffix tray will be a suffix array. Each suffix interval node can be viewed as a node that maintains the endpoints of the interval within the complete suffix array.

Secondly, the idea of the space-inefficient O(n|Σ|) suffix tree solution mentioned in the introduction is used. We construct |Σ|-length arrays for a selected subset of nodes, a subset that contains no more than n/|Σ| nodes, which retains the O(n) space bound. To choose this selected subset of nodes we define the following.

Definition 2 Let S be a string over alphabet Σ. A node u in ST(S) is called a σ-node if the number of leaves in the subtree of ST(S) rooted at u is at least |Σ|. A σ-node u is called a branching-σ-node if at least two of u's children in ST(S) are σ-nodes, and is called a σ-leaf if all its children in ST(S) are not σ-nodes.

See Figure 2 for an illustration of the different node types.

[Figure 2: A suffix tree with different node types (legend: σ-leaf, non-branching-σ-node, branching-σ-node).]
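Definition 2 can be exercised on any rooted tree. The Python sketch below is a hedged illustration: the tree encoding (an internal node is a list of children, anything else is a leaf) and the function names are assumptions made for the example, not part of the paper, and leaf counts are recomputed on every call rather than stored.

```python
def count_leaves(node):
    """Number of leaves below node; a non-list value is a single leaf."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


def classify(node, sigma):
    """Classify a node according to Definition 2, where sigma = |Σ|.

    Returns one of: 'non-sigma', 'sigma-leaf', 'branching-sigma',
    'non-branching-sigma'.
    """
    if count_leaves(node) < sigma:
        return "non-sigma"
    children = node if isinstance(node, list) else []
    sigma_children = sum(1 for c in children if count_leaves(c) >= sigma)
    if sigma_children == 0:
        return "sigma-leaf"            # no child is a σ-node
    if sigma_children >= 2:
        return "branching-sigma"       # at least two σ-node children
    return "non-branching-sigma"       # exactly one σ-node child
```

For instance, with |Σ| = 3 the tree [[1, 2, 3], [4, 5, 6], 7] has a branching-σ-node at its root, and each of its first two children is a σ-leaf.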

The following property of branching-σ-nodes is crucial to our result.

Lemma 1 Let S be a string of size n over an alphabet Σ and let ST(S) be its suffix tree. The number of branching-σ-nodes in ST(S) is O(n/|Σ|).

Proof: The number of σ-leaves is at most n/|Σ| because (1) they each have at least |Σ| leaves in their subtree and (2) their subtrees are disjoint. Let T be the tree induced by the σ-nodes and contracted onto the branching-σ-nodes and σ-leaves only. Then T is a tree with at most n/|Σ| leaves and with every internal node having at least 2 children. Hence, the lemma follows.

This implies that one can afford to construct arrays at every branching-σ-node which will be used for answering queries quickly, as shall be seen in Section 3.2.

3.1 Suffix Tray Construction

A suffix tray is constructed from a suffix tree as follows. The suffix tray contains all the σ-nodes of the suffix tree. Some suffix interval nodes are also added to the suffix tray as children of σ-nodes. Here is how each σ-node is converted from the suffix tree to the suffix tray.

• σ-leaf u: u becomes a suffix interval node with suffix interval [L(u), R(u)].
• non-leaf σ-node u: Let u_1, ..., u_r be u's children in the suffix tree and u_{l_1}, ..., u_{l_x} be the subset of children that are σ-nodes. Then u will be in the suffix tray and its children will be interleaving suffix interval nodes and σ-nodes, i.e. (1, l_1 − 1)-suffix interval node, u_{l_1}, (l_1 + 1, l_2 − 1)-suffix interval node, u_{l_2}, ..., u_{l_x}, (l_x + 1, r)-suffix interval node.

At each branching-σ-node u in the suffix tray we construct an array of size |Σ|, denoted by A_u, that contains the following data. For every child v of u that is a σ-node, location τ in A_u, where τ is the first character on the edge (u, v), points to v. The rest of the locations in A_u point to the appropriate suffix interval node, or to a NIL pointer if no such suffix interval exists.

At each σ-node u which is not a branching-σ-node and not a σ-leaf, i.e. it has exactly one child v which is a σ-node, store the first character τ on the edge (u, v), which is called the separating character, together with two pointers to its two interleaving suffix intervals.
See Figure 3 for an example of a suffix tray.

The suffix tray is now claimed to be of linear size.

Lemma 2 Let S be a string of size n. Then the size of the suffix tray for S is O(n).

Proof: The suffix array is clearly of size O(n) and the number of suffix interval nodes is bounded by the number of nodes in ST(S). Also, for each non-branching-σ-node the auxiliary information is of constant size. The auxiliary information held in each branching-σ-node is of size O(|Σ|). By Lemma 1 there are O(n/|Σ|) branching-σ-nodes. Hence, the overall size is O(n).

Obviously, given a suffix tree and suffix array, a suffix tray can be constructed in linear time (using depth-first searches and standard techniques). Since both suffix arrays and suffix trees can be constructed in linear time for alphabets from a polynomially sized range [7, 10, 11, 12], so can suffix trays.
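The counting behind Lemmas 1 and 2 can be spelled out in three steps. The symbols ℓ (the number of σ-leaves) and b (the number of branching-σ-nodes) are introduced here only for this illustration; the middle step uses the standard fact that a tree in which every internal node has at least two children has fewer internal nodes than leaves.

```latex
\begin{align*}
\ell \cdot |\Sigma| &\le n
  \quad\Longrightarrow\quad \ell \le \frac{n}{|\Sigma|}
  && \text{(disjoint subtrees, each with at least $|\Sigma|$ leaves)} \\
b &\le \max(\ell - 1,\, 0) \le \frac{n}{|\Sigma|}
  && \text{(every internal node of $T$ has at least 2 children)} \\
b \cdot |\Sigma| &\le \frac{n}{|\Sigma|} \cdot |\Sigma| = n
  && \text{(total size of the arrays $A_u$)}
\end{align*}
```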

[Figure 3: A suffix tray (on running example) with chunks of suffix array at the bottom (legend: σ-leaf, non-branching-σ-node, branching-σ-node).]

3.2 Navigating on Index Queries

Suffix trays' main goal is answering index queries. Now we explain how to do so.

Upon receiving a query P = p_1...p_m, we traverse the suffix tray from the root. Say that we have already traversed the suffix tray with p_1...p_{i−1} and need to continue with p_i...p_m. At each branching-σ-node u, access array A_u at location p_i in order to determine which suffix tray node to navigate to. Obviously, since this is an array lookup this takes constant time. For other σ-nodes that are not σ-leaves and not branching-σ-nodes, compare p_i with the separator character τ. Recall that these nodes have only one child v that is a σ-node. Hence, the children of u in the suffix tray are (1) a suffix interval node to the left of v, say u's left interval, (2) v, and (3) a suffix interval node to the right of v, say u's right interval. If p_i < τ, navigate to u's left interval. If p_i > τ, navigate to u's right interval. If p_i = τ, navigate to the only child of u that is a σ-node. If u is a σ-leaf then we search within u's suffix interval.

To search within a suffix interval [j, k], the suffix array search is applied beginning with boundaries [j, k]. (Footnote 2: This can easily be done with the RMQ data structure. However, the original result of Manber and Myers [15] is slightly more rigid. The more expansive view is described in [14].) The time to search in this structure is O(m + log I), where I is the interval size. Hence, the following lemma is helpful.

Lemma 3 Every suffix interval in a suffix tray is of size O(|Σ|^2).

Proof: Consider an (i, j)-suffix interval, i.e. the interval [L(v_i), R(v_j)] which stems from a node v with children v_1, ..., v_r. Notice that, by Observation 1, the (i, j)-suffix interval contains the suffixes which are represented by leaves in the subtrees of v_i, ..., v_j. However, v_i, ..., v_j are not σ-nodes (by suffix tray construction). Hence, each subtree of those nodes contains at most |Σ| − 1 leaves. Since j − i + 1 ≤ |Σ|, the overall size of the (i, j)-suffix interval is O(|Σ|^2).
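The per-node navigation rule of Section 3.2 can be sketched as a single dispatch function. This is a simplified illustration under stated assumptions: nodes are plain dictionaries with hypothetical field names ("kind", "A", "tau", "left", "child", "right"), characters are small integers so they can index A_u directly, and matching the remaining characters along an edge label is omitted.

```python
def navigate(node, ch):
    """Return the suffix tray child to continue with when the next
    pattern character is ch, following the rules of Section 3.2.

    None plays the role of the NIL pointer (no matching child).
    """
    kind = node["kind"]
    if kind == "branching":
        # A_u has one cell per character, so this is an O(1) lookup.
        return node["A"][ch]
    if kind == "non-branching":
        tau = node["tau"]              # the separating character
        if ch < tau:
            return node["left"]        # u's left suffix interval
        if ch > tau:
            return node["right"]       # u's right suffix interval
        return node["child"]           # the only σ-node child
    # σ-leaves and suffix interval nodes have no children to descend
    # into; the caller searches their suffix interval instead.
    raise ValueError("no navigation step at a suffix interval node")
```

Once the traversal reaches a suffix interval node, the query continues with a suffix array search restricted to that interval, which by Lemma 3 costs O(m + log |Σ|).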

A suffix interval [L(v), R(v)] is maintained only for σ-leaves. As none of the children of v are σ-nodes this is a special case of the (i, j)-suffix interval.

By the discussion above and Lemma 3 the running time for answering an indexing query is O(m + log |Σ|). In summary we have proven the following.

Theorem 1 Let S be a length n string over an alphabet Σ ⊂ {1, ..., cn}. The suffix tray of S is (1) of size O(n), (2) can be constructed in O(n) time, and (3) supports indexing queries (of size m) in O(m + log |Σ|) time.

4 Suffix Trists - The Online Scenario

In this section the problem of how to build an indexing structure that supports both text extensions and supports fast(er) indexing is addressed. It is shown that if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is f(n, |Σ|) (n is the size of the current text), then one can construct an online linear-space data-structure for indexing that supports indexing queries in O(m + log |Σ|) time, where the cost of adding a character is O(f(n, |Σ|) + log |Σ|) time.

As text extensions occur, the online linear-space suffix-tree construction is treated as a suffix-tree oracle; it performs the appropriate updates to the suffix tree as a result of the text extension, and maintains the online suffix tree as a result. Specifically, the best known current construction supports text extensions in O(log n) time (see [3]). When |Σ| is small one can obtain an improved O(|Σ| + log log n) time [4].

We propose an online version of the Suffix Tray data structure which we call a Suffix Trist (a cross between a suffix tree and an enhanced linked list). The suffix trist intends to imitate the suffix tray. The σ-nodes and branching-σ-nodes are still used in the suffix tree part of the suffix trist, and the method for answering indexing queries is similar. However, new issues arise in the online model:

1. Suffix arrays are static data structures and, hence, do not support insertions of new suffixes.
2. The arrays A_v, for the branching-σ-nodes, need to be initialized and made dynamic.
3. The status of nodes changes as time progresses (non-σ-nodes become σ-nodes, and σ-nodes become branching-σ-nodes).

We will show in Section 4.1 how to overcome (1) by using an alternative data structure. We will have many of these alternative data structures, each representing a suffix interval. In Section 4.2 we show how to find the correct alternative data structure (among those representing the different suffix intervals) when implementing text extensions. In Section 4.3 we propose an alternative way of handling the array A_v of branching-σ-nodes. Section 5 is dedicated to solving (3) and is a direct worst-case solution.

4.1 Balanced Indexing Structures for Suffix Trists

Since we mimic the Suffix Tray with the Suffix Trist in an online (dynamic) manner we need to show how to replace the (static) suffix arrays with dynamic data structures. On the other hand, we also want to make sure that the new data structures can answer queries efficiently in the worst case.

The two data structures to be used are: (1) a suffix list, which is a doubly linked list of the lexicographically ordered suffix collection, and (2) a balanced indexing structure, BIS for short, which is a balanced binary search tree over the lexicographically ordered suffix collection, fully described by Amir et al. [3]. The BIS augments the suffix list with the dynamic order maintenance data structure of Dietz and Sleator [6] (see [13] for a simplification). The BIS contains a suffix list within. Moreover, it contains the longest common prefix (LCP) data structure from [8] in order to allow indexing queries to be answered efficiently.

A BIS supports a text extension in O(log n) time and indexing queries in O(m + log n) time (see [3] and the next paragraph for a synopsis). The log n term in the indexing query time follows from the height of the tree.

In a nutshell, a BIS is defined for a string S and supports text extensions to aS and indexing queries. We give an outline of text extensions (queries are similar) from [3]. We insert the new suffix aS into the BIS by following the binary search tree from the root to the appropriate location of aS in the tree, as one would for a standard binary search. However, a straightforward comparison may take Ω(|S|) time as aS and the string represented by a node may be of length Ω(|S|). To reduce the time to O(1) an LCP (longest common prefix) data structure is used on all suffixes. So, at a node representing suffix S' we compare aS to S' by first comparing their first character. If they are not equal then the walk down the binary search tree continues according to the comparison. If they are equal we use the LCP data structure to compare S and S'', where S' = aS''. Notice that we compare S and S'' and not aS and S' because aS is new and not yet in the LCP data structure. Once aS is in place in the binary search tree one adds it to the LCP data structure. This is done using a method very similar to the results in [8] and is out of scope of this paper. For more details see [3].

The above suggests replacing a suffix array of the Suffix Tray with a BIS in the Suffix Trist, thereby creating a separate BIS for every suffix interval.
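The comparison step in the outline of text extensions above (first characters, then the known relative order of the two shorter suffixes) can be illustrated with a deliberately simplified Python sketch. It is not the BIS of [3]: a sorted Python list stands in for the balanced tree, and `list.index` stands in for the constant-time order queries that the actual structure obtains from its order-maintenance and LCP components. The class and its members are hypothetical names.

```python
class OnlineSuffixList:
    """Keeps all suffixes of a text that grows by prepending characters.

    Suffix id k denotes the suffix of length k; id 0 is the empty suffix.
    """

    def __init__(self):
        self.rev = []       # characters in arrival order (text reversed)
        self.order = [0]    # suffix ids in lexicographic order

    def suffix(self, k):
        """Materialise suffix k as a string (used only for checking)."""
        return "".join(reversed(self.rev[:k]))

    def _less(self, a, p, j):
        """Is a + suffix(p) smaller than the stored suffix j (j >= 1)?"""
        first = self.rev[j - 1]          # first character of suffix j
        if a != first:
            return a < first
        # Equal first characters: fall back on the known order of the
        # two shorter suffixes, which are both already stored.
        return self.order.index(p) < self.order.index(j - 1)

    def extend(self, a):
        """Text extension S -> aS: insert the new suffix aS in order."""
        p = len(self.rev)                # id of the old whole text S
        self.rev.append(a)
        lo, hi = 1, len(self.order)      # slot 0 is the empty suffix
        while lo < hi:
            mid = (lo + hi) // 2
            if self._less(a, p, self.order[mid]):
                hi = mid
            else:
                lo = mid + 1
        self.order.insert(lo, p + 1)
```

Feeding the characters of "banana" from last to first leaves the suffix ids in the order 0, 1, 3, 5, 6, 2, 4, i.e. the empty suffix followed by a, ana, anana, banana, na, nana.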
Since the suffix intervals are of size O(|Σ|^2) the search time in the small BISs will be O(m + log |Σ|).

However, while the search is within the desired time bounds, one has to be more careful during the execution of a text extension. There are two issues that need to be considered. The first is that we assume that we are given an online suffix tree construction. Hence, the suffix tree is updated with a new node representing the new suffix of aS. However, one needs to find the appropriate BIS into which the new node needs to be inserted. This is discussed in Section 4.2.

The second issue is that each BIS contains only some of the suffixes, whereas the search of the location of aS in the BIS relies on the underlying LCP data structure, described above, which requires that all the suffixes of S be in the BIS (which is almost always not the case). However, this may be solved by letting the LCP data structure work on all suffixes in all BISs. To make this work we need to show how to insert the new suffix aS (the text extension) into the data structure. We observe that once we find the location of aS in the current BIS this will be its location (not only in the current BIS but also) among all suffixes in all BISs. This is true because the BISs represent suffix intervals (i.e. contiguous suffixes).

4.2 Inserting New Nodes into BISs

When a text extension from S to aS is performed, the suffix tree is updated to represent the new text aS (by our suffix-tree oracle). Specifically, a new leaf, corresponding to the new suffix, is added to the suffix tree, and perhaps one internal node is also added. If such an internal node is inserted, then that node is the parent of the new leaf and this happens in the event that the new suffix diverges from an edge (in the suffix tree of S) at a location where no node previously existed. In this case an edge needs to be broken into two and the internal node is added at that point.

We now show how to update the suffix trist using the output of the online suffix tree oracle. The steps are (1) finding the correct BIS into which the new suffix is to be inserted and (2) performing the insertion of the new suffix into this BIS. Of course, this may change the status of internal nodes, which are handled in Section 5. The focus is on solving (1) while mentioning that (2) can be solved by the text extension method of the BIS whose synopsis was given in Section 4.1.

The following useful observation is immediately derived from the definition of suffix trists.

Observation 2 For a node u in a suffix tree, if u is not a σ-node, then all of the leaves in u's subtree are in the same BIS.

For every node u in the suffix tree it will be useful to maintain a pointer leaf(u) to some leaf in u's subtree. This invariant can easily be maintained under text extensions since an internal node is always created together with a new leaf, and this leaf will always be a leaf of the subtree of the internal node.

In order to find the correct BIS in which the new node is to be inserted we first consider the situation in the suffix tree. We consider the following two cases.

(a) The new leaf u in the suffix tree (representing the new suffix) is inserted as a child of an already existing internal node v.
(b) The new leaf u in the suffix tree is inserted as a child of a new internal node v.

First, consider the case where the new leaf u in the suffix tree is inserted as a child of an already existing internal node v. If v is either a σ-leaf or not a σ-node, then from Observation 2 it is known that leaf(v) and u need to be in the same BIS. So, we may move to the BIS in which leaf(v) is positioned and traverse up from leaf(v) to the root of the BIS (in O(log |Σ|) time).
If v is a branching-σ-node, then the BIS is found in O(log |Σ|) time from A_v (we show how to do this in Section 4.3). Otherwise, v is a σ-node which is not a σ-leaf and not a branching-σ-node. In this case the correct BIS of the two possible BISs can be found by examining the separating character maintained in v.

Next, consider the case where the new leaf u in the suffix tree is inserted as a child of a new internal node v. Let w be v's parent, and let w' be v's other child (not u). We consider this case in two steps. First we insert v and then we insert u. After we show how to update the suffix trist to include v, u can be added as explained above. We need to determine the status of v. Obviously, v cannot be a branching-σ-node. Moreover, the number of leaves in v's subtree is the same as the number of leaves in the subtree rooted at w' (as u is currently being ignored). So, if w' is not a σ-node, v is not a σ-node, and otherwise, v is a non-branching-σ-node with a separating character that is the first character of the label of edge (v, w'). The entire process takes O(log |Σ|) time, as required.
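The status rule for a newly created internal node can be illustrated with a small sketch. It is only an illustration under assumptions that are not in the text: the `Node` class, its fields and the function `split_edge` are hypothetical names, a σ-node is taken to be a node with at least |Σ| leaves in its subtree, and the new leaf u is ignored, as in the first of the two steps described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical suffix-tree node carrying only what the status rule needs."""
    leaves: int = 1                      # number of leaves in the subtree
    children: Dict[str, "Node"] = field(default_factory=dict)  # first edge char -> child
    separator: Optional[str] = None      # separating character of a non-branching sigma-node


def is_sigma(node: Node, sigma: int) -> bool:
    # Assumption: a sigma-node has at least |Sigma| leaves in its subtree.
    return node.leaves >= sigma


def split_edge(w: Node, edge_char: str, lower_char: str, sigma: int) -> Node:
    """Break the edge from w to its child w' with a new internal node v.

    edge_char  : first character of the old edge (w, w')
    lower_char : first character of the label of the new edge (v, w')
    The new leaf u is ignored here; it is added afterwards as a separate step.
    """
    w_prime = w.children[edge_char]
    # v has the same number of leaves as w', since u is not counted yet.
    v = Node(leaves=w_prime.leaves, children={lower_char: w_prime})
    w.children[edge_char] = v
    # v can never be branching: w' is its only child at this point.
    if is_sigma(w_prime, sigma):
        v.separator = lower_char         # v is a non-branching sigma-node
    return v
```

With |Σ| = 4, splitting the edge above a child with 5 leaves yields a non-branching-σ-node whose separating character is the first character of the edge to that child, while splitting the edge above a child with 2 leaves yields a node that is not a σ-node.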

4.3 The Branching-σ-Node Array in a Dynamic Setting

The array A_v for a branching-σ-node v has to be adapted to work for the dynamic setting. First of all, it will be necessary to initialize the array when a node becomes a branching-σ-node (which naively takes O(|Σ|) time). The other challenge is that the values of the array might be changing. Most notably, a suffix interval, represented by a BIS, might contain a node that changes to a σ-node, thereby dividing the suffix interval. This requires updates to all the array locations of characters represented by the suffix interval. We will discuss these issues in more depth in Section 5.4. However, we present upfront the change in data structure format from that of the Suffix Tray. This will be useful for the discussion throughout Section 5.

The array A_v in the (static) suffix tray contains a pointer for each character to its appropriate suffix interval. Specifically, pointers to the same suffix interval appear in all of the array cells that go to that suffix interval. This is fine in the static case but may be costly for text extensions, since splitting a BIS can cause many pointers to need to be changed. Hence, we change our strategy and let the array cell representing a character a point to the edge in the suffix tree leaving v with a. During a traversal we implement the pointer to the appropriate BIS from character a by accessing the edge from A_v at the location representing a. Say this edge leads to node u. Then from leaf(u) we can access the appropriate BIS. Notice that while we will no longer need suffix interval nodes, we will still refer to them as if they exist, while keeping in mind that accessing the appropriate BIS is done through the appropriate edge (in a manner to be shown later) without the actual suffix interval node. The advantage is that, when a BIS representing a suffix interval is split, nothing in the array needs to be changed (only the new σ-node needs to be marked). We point out that the procedure we will describe to find the appropriate BIS will work regardless of the changing BISs.

We solve the problem of initializing A_v as follows.
Since we are in a dynamic environment, nodes will change status to become branching-σ-nodes as the text grows, thereby requiring new arrays. We discuss this later in Section 5.4. However, we do point out that it is helpful to maintain (throughout the algorithm) a balanced search tree at each node v of the suffix tree, which we denote BST_v, containing the characters which exit the node (not all characters) with pointers to the edges exiting v with the corresponding characters. We will use this data structure during the time A_v is being constructed. It will be shown later that this will not affect the query time.

4.4 Search Outline

We outline the procedure of the search for a pattern in a Suffix Trist. However, we point out that it is an outline, and the reader is advised to reread the outline after reading Section 5.

Search
1. σ-leaf v: Go to the BIS that represents the suffixes in the subtree of v in the suffix tree.
2. Non-branching-σ-node v: Compare τ, the separating character, with ρ, the current character of the search. If ρ = τ then go to the child v_i where (v, v_i) is marked as starting with τ. If ρ < τ then go to the left BIS, and if ρ > τ then go to the right BIS.

3. Branching-σ-node v:
   (a) If A_v exists then go to A_v(ρ), leading to the edge e = (v, u) exiting v with ρ. Check the text on the edge for a match. If u is a σ-node then continue the search in the suffix tree. If not, then go to leaf(u), move to the BIS containing leaf(u) and find the root of the BIS by going up the binary search tree of that BIS. Continue the search for the pattern within the BIS (as described in Section 4.1).
   (b) If A_v is still under construction, compare ρ with the first character on each of the edges leading to v_i and v_j (the first two children that became σ-nodes and caused v to become a branching-σ-node). If ρ matches one of them, go to v_i or v_j accordingly, checking for a match on the way. If not, use BST_v to find ρ and then repeat the procedure described when A_v exists.

5 When a Node Changes Status

At this stage of our description of the suffix trist we have assumed that we have an online suffix tree oracle. We have also described how to maintain the BISs that replace the suffix array intervals of the Suffix Tray. Finally, we have shown how to find the correct BIS into which a new suffix is inserted during a text extension, and how to execute the insertion. We are left with the task of maintaining the auxiliary data on the nodes of the suffix tree, namely maintaining the data for σ-nodes and branching-σ-nodes, and knowing when a σ-node is a σ-leaf. Of course, the status of a node may change since the text is now arriving online.

In Section 5.1 we begin by describing how to detect when a node reaches the status of a σ-node. In Section 5.2 we describe what needs to be updated when a node becomes a σ-node. In Section 5.3 we describe what happens when a σ-leaf loses its leaf status. In Section 5.4 we describe the updates necessary when a σ-node becomes a branching-σ-node.

5.1 Detecting a New σ-node

Let u be a new σ-node and let v be its parent. Just before u becomes a σ-node, (1) v must have already been a σ-node and (2) u ∈ {v_i, ..., v_j} and is associated with an (i, j)-suffix interval represented by a suffix interval node w that is a child of v in the suffix trist.
Hence, one will be able to detect when a new σ-node is created by maintaining counters for each of the (suffix tree) nodes v_i, ..., v_j to count the number of leaves in their subtrees (in the suffix tree). These counters are maintained on the suffix tree edges, and the counter of each v_k is indexed by the first character on the edge (v, v_k). These counters only need to be maintained for nodes which are not σ-nodes and have a parent which is a σ-node. Thus, maintaining the counters can be done as follows. When a new leaf is added into a given BIS of the suffix trist at a suffix interval node w of the BIS, where v is the parent of w, the counter of v_k in the BIS needs to be increased, where v_k is the one node (of the nodes v_i, ..., v_j of the suffix interval of the BIS) which is an ancestor of the new leaf. The node v_k can be found in O(log |Σ|) time by using BST_v with the search key being the character that appears at location |label(v)| + 1 in the text; this character can be found in constant time with direct access. Then, the counter of v_k is incremented.

Notice that when a new internal node was inserted into the suffix tree as described in Section 4, it is possible that the newly inserted internal node is now one of the nodes v_i, ..., v_j for an (i, j)-suffix interval. In such a case, when the new node is inserted, it copies the number of leaves in its subtree from its only child (as was explained in Section 4, the newly inserted leaf is initially ignored and then treated as an independent insertion), as that child previously maintained the number of leaves in its subtree. Furthermore, from now on only the size of the subtree of the newly inserted node is updated, and not the size of its child's subtree. Finally, when a node becomes a σ-node for the first time, the counters of all of its children will need to be updated. This is explained in further detail in Section 5.2.

5.2 Updating the New σ-node

Let u be the new σ-node (which is, of course, a σ-leaf) and let v be its parent. As discussed in the previous subsection, u ∈ {v_i, ..., v_j}, where v_i, ..., v_j are children of v (in the suffix tree), and just before u became a σ-node there was a suffix interval node w that was an (i, j)-suffix interval with a BIS representing it. Updating the new σ-node will require two things. First, the BIS is split into 3 parts: two new BISs and the new σ-leaf that separates them. Second, for the new σ-leaf the separating character is added (easy) and a new set of counters for the children of u is created (more complicated).

The first goal will be to split the BIS that has just been updated into three: the nodes corresponding to suffixes in u's subtree, the nodes corresponding to suffixes that are lexicographically smaller than the suffixes in u's subtree, and the nodes corresponding to suffixes that are lexicographically larger than the suffixes in u's subtree. As is well known, for a given value x, splitting a BST (a balanced search tree) into two BSTs at value x can be implemented in O(h) time, where h is the height of the BST (see Sections 4.1 and 4.2 in [19]).
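A three-way split of this kind can be sketched with any search tree that supports splitting along a single root-to-leaf path. The sketch below uses a treap, which is a swapped-in randomized structure (expected rather than worst-case logarithmic height) and not the balanced trees of [19]. Integer keys stand in for the lexicographic ranks of suffixes, all auxiliary information is omitted, and the names `Treap`, `split`, `merge`, `build` and `split_three` are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple


class Treap:
    """Treap node: BST order on key, heap order on a random priority."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.prio = random.random()
        self.left: Optional["Treap"] = None
        self.right: Optional["Treap"] = None


def split(t: Optional[Treap], x: int) -> Tuple[Optional[Treap], Optional[Treap]]:
    """Split t into (keys < x, keys >= x), following one root-to-leaf path."""
    if t is None:
        return None, None
    if t.key < x:
        a, b = split(t.right, x)
        t.right = a                      # t keeps the part of its right subtree below x
        return t, b
    a, b = split(t.left, x)
    t.left = b                           # t keeps the part of its left subtree at or above x
    return a, t


def merge(a: Optional[Treap], b: Optional[Treap]) -> Optional[Treap]:
    """Merge two treaps, assuming every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b


def build(keys: Iterable[int]) -> Optional[Treap]:
    t: Optional[Treap] = None
    for k in sorted(keys):
        t = merge(t, Treap(k))
    return t


def split_three(t: Optional[Treap], lo: int, hi: int):
    """Return (keys < lo, lo <= keys <= hi, keys > hi) for integer keys."""
    smaller, rest = split(t, lo)
    middle, larger = split(rest, hi + 1)
    return smaller, middle, larger


def keys_of(t: Optional[Treap]) -> List[int]:
    """In-order listing of the keys, used only to inspect the result."""
    return [] if t is None else keys_of(t.left) + [t.key] + keys_of(t.right)
```

Here `lo` and `hi` play the role of the leftmost and rightmost leaves of the subtree of u. Each of the two calls to `split` touches only one root-to-leaf path, which is where the O(h) bound for splitting comes from.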
The same is true for BISs (although some straightforward technicalities are necessary to handle the auxiliary information). Since the height of a BIS is O(log |Σ|), one can split a BIS into two BISs in O(log |Σ|) time, and by finding the suffixes (nodes in the BIS) that correspond to the rightmost and leftmost leaves of the subtree of u, one can split the BIS into the three desired parts in O(log |Σ|) time. Fortunately, one can find the two nodes in the BIS in O(log |Σ|) time by accessing the BIS directly through leaf(u), and then walking up and down the BIS.

Recall that we need to maintain counters for nodes in the suffix tree which are not σ-nodes and have a parent which is a σ-node. So we need to initialize counters for all of the children of u in the suffix tree. Denote these children of u by u_1, ..., u_k. First notice that the number of suffixes in the subtree of u_i can be computed in O(log |Σ|) time by a traversal in the BIS using classical methods of binary search trees. It is now shown that there is enough time to initialize all the counters of u_1, ..., u_k before one of them becomes a σ-node, while still maintaining the O(log |Σ|) time bound per update. Specifically, the counters will be updated during the first k insertions into the BIS of u (following the event of u becoming a σ-node). At each insertion two of the counters are updated. What is required is for the counters to be completely updated prior to the next time they will be used, i.e., in time to detect a new σ-node occurring in the subtree of u. The following lemma is precisely what is needed.

Lemma 4 Let u be a node in the suffix tree, and let u_1, ..., u_k be u's children (in the suffix tree). Say u has just become a σ-node. Then at this time, the number of leaves in each of the subtrees of u's children is at most |Σ| - k + 1.

Proof: Assume by contradiction that this is not the case. Specifically, assume that a child u_i has at least |Σ| - k + 2 leaves in its subtree at this time. Clearly, the number of leaves in each of the subtrees is at least one. So the total number of leaves in the subtrees of u_1, ..., u_k is at least (|Σ| - k + 2) + (k - 1) = |Σ| + 1, contradicting the fact that u just became a σ-node (it should have already been a σ-node).

Since the size of the subtree of a child of u, say u_i, is no more than |Σ| - k + 1, at least k - 1 insertions will be required into the subtree of u_i, and hence into the subtree of u, before u_i becomes a σ-node. Now, after each insertion into the subtree of u (regardless of which child's subtree the new node was inserted into) we initialize the next two counters (by a continuous traversal of BST_u). Hence, after k insertions all counters are initialized (in fact ⌈k/2⌉ ≤ k - 1 insertions suffice for k ≥ 2, since two counters are initialized per insertion, so the counters are ready before any u_i can become a σ-node). □

5.3 When a σ-leaf Loses its Status

The situation where a σ-leaf becomes a non-leaf σ-node is a case that has already been implicitly covered. Let v be a σ-leaf that is about to change its status to a non-branching-σ-node which is not a leaf. This happens because one of its children v_k is about to become a σ-leaf. Notice that just before the change v is a suffix interval node. Hence, the BIS representing the suffix interval needs to be split into three parts, and the details are exactly the same as in Section 5.2. As before, this is done in O(log |Σ|) time.

5.4 When a σ-node Becomes a Branching-σ-Node

Let v be a σ-node that is changing its status to a branching-σ-node. Just before it changes its status it had exactly one child v_j which was a σ-node.
The change in status must occur because another child (in the suffix tree), say v_i, has become a σ-leaf (and now that v has two children that are σ-nodes it has become a branching-σ-node). Assume, without loss of generality, that v_i precedes v_j in the list of v's children. Just before becoming a branching-σ-node, v contained a separating character τ, the first character on the edge (v, v_j), and two suffix interval nodes w and x, corresponding to the left interval of v and the right interval of v, respectively. Now that v_i became a σ-leaf, w was split into three parts (as described in Section 5.2). So, in the suffix trist the children of v are (1) a suffix interval node w_L, (2) the σ-leaf v_i, (3) a suffix interval node w_R, (4) the σ-node v_j, and (5) a suffix interval node x. Denote by B_1, B_2 and B_3 the BISs that represent the suffix interval nodes w_L, w_R and x.

The main problem here is that constructing the array A_v takes too much time, so one must use a different approach and spread the construction over some time. A solution for which the A_v construction charges its cost to future insertions (some of which may never occur) is shown first.

Then we show how to make it a worst-case solution. The charges will be over the insertions into the BISs B_1, B_2 and B_3 and will be described in the following lemma.

Lemma 5 From the time that v becomes a branching-σ-node, at least |Σ| insertions are required into B_1, B_2 or B_3 before any node in the subtree of v (in the suffix tree) that is not in the subtrees of v_i or v_j becomes a branching-σ-node.

Proof: Clearly, at this time, any node in the subtree of v (in the suffix tree) that is not in the subtrees of v_i or v_j has fewer than |Σ| leaves in its subtree. On the other hand, note that any branching-σ-node must have at least 2|Σ| leaves in its subtree, as it has at least two children that are σ-nodes, each contributing at least |Σ| leaves. Thus, in order for a node in the subtree of v (in the suffix tree) that is not in the subtrees of v_i or v_j to become a branching-σ-node, at least |Σ| leaves need to be added into its subtree, as required.

This yields the desired result, as one can always charge the A_v construction to the insertions into B_1, B_2 and B_3. The crucial observation that follows from Lemma 5 is that A_v will be constructed before a branching-σ-node that is a descendant of v but not of v_i or v_j is handled.

We now use a lazy approach that yields the worst-case result. We begin by using the folklore trick of initializing the array A_v in constant time, see [2] (Ex. 2.12, page 71) and [17] (Section III.8.1). Then every time an insertion takes place into one of B_1, B_2 or B_3, one more element is added to the array A_v. Lemma 5 assures that A_v will be constructed before a branching-σ-node that is a descendant of v but not of v_i or v_j is handled.

We note that in the interim, until A_v is fully constructed, we use BST_v during an indexing query as follows. First we compare the next character ρ of the search with the first characters on the edges to v_i and v_j. If there is a match on either, we continue the search in the direction of v_i or v_j accordingly. This is to make sure that the time spent at node v is constant if we continue to v_i or v_j. If this is not the case then we use BST_v to search for the edge leaving v starting with ρ. This can be done in O(log |Σ|) time.
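The lazy construction and the interim query can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `LazyArray` shows the logic of the constant-time initialization trick, although Python itself cannot allocate uninitialized memory. `BranchingNode`, `on_insertion` and `lookup` are hypothetical names, a sorted list searched with `bisect` stands in for BST_v, and Python dictionaries stand in for direct addressing by character.

```python
from bisect import bisect_left
from typing import Any, Dict, List, Optional, Tuple


class LazyArray:
    """Array whose logical initialization costs O(1) (folklore trick).

    Python fills the two lists eagerly; the point is that `is_set` never
    relies on their initial contents, only on `stack`.
    """

    def __init__(self, size: int) -> None:
        self.value: List[Any] = [None] * size   # payload, meaningless until set
        self.where: List[int] = [0] * size      # claimed slot of i in `stack`
        self.stack: List[int] = []              # indices written so far

    def is_set(self, i: int) -> bool:
        p = self.where[i]
        return 0 <= p < len(self.stack) and self.stack[p] == i

    def set(self, i: int, v: Any) -> None:
        if not self.is_set(i):
            self.where[i] = len(self.stack)
            self.stack.append(i)
        self.value[i] = v

    def get(self, i: int) -> Optional[Any]:
        return self.value[i] if self.is_set(i) else None


class BranchingNode:
    """Hypothetical branching sigma-node v with a lazily built A_v."""

    def __init__(self, alphabet: str, edges: Dict[str, Any],
                 first_two: Tuple[str, str]) -> None:
        self.index = {c: i for i, c in enumerate(alphabet)}       # char -> cell of A_v
        self.bst: List[Tuple[str, Any]] = sorted(edges.items())   # stand-in for BST_v
        self.keys = [c for c, _ in self.bst]
        self.first_two = {c: edges[c] for c in first_two}         # edges to v_i and v_j
        self.A = LazyArray(len(alphabet))
        self.done = 0                                             # edges copied into A_v

    def on_insertion(self) -> None:
        """Called once per insertion into B_1, B_2 or B_3: add one element."""
        if self.done < len(self.bst):
            c, edge = self.bst[self.done]
            self.A.set(self.index[c], edge)
            self.done += 1

    def ready(self) -> bool:
        return self.done == len(self.bst)

    def lookup(self, rho: str) -> Optional[Any]:
        if self.ready():
            return self.A.get(self.index[rho])    # constant time via A_v
        if rho in self.first_two:
            return self.first_two[rho]            # constant time toward v_i or v_j
        k = bisect_left(self.keys, rho)           # logarithmic search, like BST_v
        if k < len(self.keys) and self.keys[k] == rho:
            return self.bst[k][1]
        return None
```

Since a node has at most |Σ| outgoing edges, copying one edge per insertion completes the array within |Σ| insertions, which is the budget that Lemma 5 provides.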
The following corollary of Lemma 5 assures us that we will use this access to a binary search tree only once in the whole search and, hence, the O(log |Σ|) time is additive.

Corollary 1 Let v be a branching-σ-node with children v_i and v_j being the first σ-nodes. Let x be a branching-σ-node that is a descendant of v, but not of v_i or v_j. Then A_v is fully constructed.

Proof: By Lemma 5 there must be at least |Σ| insertions into the part of the subtree of v that is not in the subtrees of v_i or v_j before x can be a branching-σ-node. By then A_v is fully constructed.

We can finally conclude with the following theorem.

Theorem 2 Let S be a string of length n over an alphabet Σ. The suffix trist of S (1) is of size O(n), (2) supports text extensions in O(log |Σ| + extension_ST(n, Σ)) time (where extension_ST(n, Σ) is the time for a text extension in the suffix tree) and (3) supports indexing queries (of size m) in O(m + log |Σ|) time.

6 Conclusions

We have shown how to create and maintain linear sized indexing structures that allow searches in O(m + log |Σ|) time for patterns of length m. We have shown this both for the static and online scenarios. In the static version we have produced the Suffix Tray data structure and in the online version we have produced the Suffix Trist data structure. The advantage of the Suffix Tray is its simplicity, while allowing fast searches. The Suffix Trist, although more involved, is a clean generalized dynamization of the Suffix Tray.

One future challenge is in allowing deletions to take place in addition to the insertions. For example, one may be interested in a generalized suffix trist managing a collection of texts, so that one can insert new texts and delete texts from the collection.

References

[1] M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 2(1):53-86, 2004.
[2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company, 1974.
[3] A. Amir, T. Kopelowitz, M. Lewenstein, and N. Lewenstein. Towards Real-Time Suffix Tree Construction. Proc. of Symp. on String Processing and Information Retrieval (SPIRE), 67-78, 2005.
[4] D. Breslauer and G.F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms, 18:32-48, 2013.
[5] R. Cole and M. Lewenstein. Multidimensional matching and fast search in suffix trees. Proc. of the Symposium on Discrete Algorithms (SODA), 851-852, 2003.
[6] P.F. Dietz and D.D. Sleator. Two Algorithms for Maintaining Order in a List. Proc. of the Symposium on Theory of Computing (STOC), 365-372, 1987.
[7] M. Farach-Colton, P. Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. of the ACM, 47(6):987-1011, 2000.
[8] G. Franceschini and R. Grossi. A General Technique for Managing Strings in Comparison-Driven Data Structures. Proc. 31st Intl. Col. on Automata, Languages and Programming (ICALP), LNCS 3142, 606-617, 2004.
[9] R. Grossi and G. F. Italiano. Efficient techniques for maintaining multidimensional keys in linked data structures. Proc. of the Intl. Col. on Automata, Languages and Programming (ICALP), 372-381, 1999.
[10] J. Kärkkäinen, P. Sanders, and S. Burkhardt.
Linear work suffix array construction. J. of the ACM, 53(6):918-936, 2006.
[11] D.K. Kim, J.S. Sim, H. Park, and K. Park. Constructing suffix arrays in linear time. J. Discrete Algorithms, 3(2-4):126-142, 2005.
[12] P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. J. Discrete Algorithms, 3(2-4):143-156, 2005.

[13] T. Kopelowitz. On-Line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List. Proc. of the Symposium on Foundations of Computer Science (FOCS), 283-292, 2012.
[14] M. Lewenstein. Orthogonal Range Searching for Text Indexing. Space-Efficient Data Structures, Streams, and Algorithms, LNCS 8066, 267-302, 2013.
[15] U. Manber and E.W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. on Computing, 22(5):935-948, 1993.
[16] E. M. McCreight. A space-economical suffix tree construction algorithm. J. of the ACM, 23:262-272, 1976.
[17] K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs in Theoretical Computer Science. Springer-Verlag, 1984.
[18] M. Ruzic. Constructing Efficient Dictionaries in Close to Sorting Time. Proc. of the Intl. Col. on Automata, Languages and Programming (ICALP (1)), 84-95, 2008.
[19] R. E. Tarjan. Data Structures and Network Algorithms. Volume 44 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1983.
[20] E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249-260, 1995.
[21] P. Weiner. Linear pattern matching algorithm. Proc. 14th IEEE Symposium on Switching and Automata Theory, 1-11, 1973.