Engineering a Lightweight Suffix Array Construction Algorithm^1
Giovanni Manzini^2 and Paolo Ferragina^3


Algorithmica (2004) 40. © 2004 Springer-Verlag New York, LLC.

Abstract. In this paper we describe a new algorithm for building the suffix array of a string. This task is equivalent to the problem of lexicographically sorting all the suffixes of the input string. Our algorithm is based on a new approach called deep-shallow sorting: we use a shallow sorter for the suffixes with a short common prefix, and a deep sorter for the suffixes with a long common prefix. All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm has been designed to overcome this dichotomy. Our algorithm is lightweight in the sense that it uses very small space in addition to the space required by the suffix array itself. At the same time our algorithm is fast even when the input contains many repetitions: this has been shown by extensive experiments with inputs of size up to 110 Mb. The source code of our algorithm, as well as a C library providing a simple API, is available under the GNU GPL [26].

Key Words. Suffix array, Algorithmic engineering, Space-economical algorithms, Full-text index, Suffix tree.

1. Introduction. In this paper we consider the problem of computing the suffix array of a text string T[1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [24] (or PAT array [10]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [5], [13]. Recently, interest in this data structure has been revitalized by its use as a building block for two novel applications: (1) the Burrows-Wheeler compression algorithm [4], which is a provably [25] and practically [29] effective compression tool; and (2) the construction of succinct [12], [28] or compressed [8], [9], [11] indexes. In these applications the construction of the suffix array is the computational bottleneck both in time and space.
This motivated our interest in designing yet another suffix array construction algorithm which is fast and lightweight in the sense that it uses small working space.

The suffix array consists of n integers in the range [1, n]. This means that in principle it uses Θ(n log n) bits of storage. However, in most applications the size of the text is smaller than 2^32 and it is customary to store each integer in a 4-byte word; this yields a total space occupancy of 4n bytes. For what concerns the cost of constructing the suffix array, the theoretically best known algorithms run in Θ(n) time [6]. These algorithms work by first building the suffix tree and then obtaining the sorted suffixes via an in-order traversal of the tree. However, suffix tree construction algorithms are both complex and space consuming since they occupy at least 15n bytes of working space (or even more, depending on the text structure [22]). This makes their use impractical even for moderately large texts. For this reason, suffix arrays are usually built directly using algorithms which run in O(n log n) time but have a smaller space occupancy. Among these algorithms the current leader is the qsufsort algorithm by Larsson and Sadakane [23]. qsufsort uses 8n bytes^4 and it is much faster in practice than the algorithms based on suffix tree construction.

Unfortunately, the size of our documents has grown much more quickly than the main memory of our computers. Thus, it is desirable to build a suffix array using as small a space as possible. Recently, Itoh and Tanaka [15] and Seward [30] have proposed two new algorithms which only use 5n bytes. We call these algorithms lightweight algorithms to stress their (relatively) small space occupancy. From the theoretical point of view these algorithms have a Θ(n^2 log n) worst-case time complexity. In practice they are faster than qsufsort when the average LCP (Longest Common Prefix) is small. However, for texts with a large average LCP these algorithms can be slower than qsufsort by a factor of 100 or more.

In this paper we describe and extensively test a new lightweight suffix sorting algorithm. Our main idea is to use a very small amount of extra memory, in addition to 5n bytes, to avoid any degradation in performance when the average LCP is large. To achieve this goal we make use of engineered algorithms and ad hoc data structures. Our algorithm uses 5n + cn bytes, where c can be chosen by the user at run time; in our tests c was at most […]. The theoretical worst-case time complexity of our algorithm is still Θ(n^2 log n), but its behavior in practice is quite good.

^1 This research was partially supported by the Italian MIUR projects Algorithmics for Internet and the Web (ALINWEB) and Technologies and Services for Enhanced Content Delivery (ECD). A preliminary version of this work has appeared in Proceedings of the 10th European Symposium on Algorithms (ESA 02).
^2 Dipartimento di Informatica, Università del Piemonte Orientale, Alessandria, Italy, and IIT-CNR, Pisa, Italy. manzini@mfn.unipmn.it.
^3 Dipartimento di Informatica, Università di Pisa, Pisa, Italy. ferragina@di.unipi.it.
Received December 16, 2002; revised October 12, […]. Communicated by R. Sedgewick. Online publication April 26, 2004.
Extensive experiments, carried out on four different architectures, show that our algorithm is faster than any other tested algorithm. Only on a single instance (a single file on a single architecture) was our algorithm outperformed by qsufsort.

2. Definitions and Previous Results. Let T[1, n] denote a text over the alphabet Σ. The suffix array [24] (or PAT array [10]) for T is an array SA[1, n] such that T[SA[1], n], T[SA[2], n], etc. is the list of suffixes of T sorted in lexicographic order. For example, for T = babcc we have SA = [2, 1, 3, 5, 4] since T[2, 5] = abcc is the suffix with the lowest lexicographic rank, followed by T[1, 5] = babcc, followed by T[3, 5] = bcc and so on.^5

Given two strings v, w we write LCP(v, w) to denote the length of their longest common prefix. The average LCP of a text T is defined as the average length of the LCP between two consecutive suffixes, that is,

\[ \text{average LCP} = \frac{1}{n-1} \sum_{i=1}^{n-1} \mathrm{LCP}\bigl(T[SA[i], n],\; T[SA[i+1], n]\bigr). \]

The average LCP is a rough measure of the difficulty of sorting the suffixes: if the average LCP is large we need in principle to examine many characters in order to establish the relative order of two suffixes. Note however that most suffix sorting algorithms do not compare suffixes with a simple character-by-character comparison, thus the average LCP is not the only parameter which plays a role in this problem.

In the rest of the paper we make the following assumptions which correspond to the situation most often faced in practice. We assume |Σ| ≤ 256 and that each alphabet symbol is stored in 1 byte. Hence, the text T[1, n] occupies precisely n bytes. Furthermore, we assume that n < 2^32 and that the starting position of each suffix is stored in a 4-byte word. Hence, the suffix array SA[1, n] occupies precisely 4n bytes. In the following we use the term lightweight to denote a suffix sorting algorithm which uses 5n bytes plus some small amount of extra memory (we are intentionally giving an informal definition). Note that 5n bytes are just enough to store the input text T and the suffix array SA. Although we do not claim that 5n bytes are indeed required, we do not know of any algorithm using less space.

To test the suffix array construction algorithms we use the collection of files shown in Table 1. These files contain different kinds of data in different formats; they also display a wide range of sizes and of average LCPs.

^4 Here and in the following the space occupancy figures include the space for the input text, for the suffix array, and for any auxiliary data structure used by the algorithm.
^5 Note that to define the lexicographic order of the suffixes it is customary to append at the end of T a special end-of-text symbol which is smaller than any symbol in Σ.

2.1. The Larsson-Sadakane qsufsort Algorithm. The qsufsort algorithm [23] is based on the doubling technique introduced in [18] and first used for the construction of the suffix array in [24]. Given two strings v, w and t > 0 we write v <_t w if the length-t prefix of v is lexicographically smaller than the length-t prefix of w. Similarly we define the symbols ≤_t and =_t. Let s_1, s_2 denote two suffixes and assume s_1 =_t s_2 (that is, T[s_1, n] and T[s_2, n] have a length-t common prefix).
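As a concrete reference point for the definitions of the suffix array and of the average LCP given earlier in this section, the following minimal Python sketch builds SA by naively sorting the suffixes and then evaluates the average LCP. It is only an illustration and not one of the algorithms discussed in this paper; the function names are hypothetical, positions are 1-based as in the text, and Python's string comparison is assumed to play the role of the end-of-text symbol (a proper prefix sorts before any longer string). The string babcc is used because its suffix array is [2, 1, 3, 5, 4].

```python
def suffix_array(text):
    """Naive suffix array: sort the 1-based start positions by suffix."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def lcp(v, w):
    """Length of the longest common prefix of the strings v and w."""
    k = 0
    while k < len(v) and k < len(w) and v[k] == w[k]:
        k += 1
    return k


def average_lcp(text):
    """Average LCP between lexicographically consecutive suffixes (needs n >= 2)."""
    sa = suffix_array(text)
    n = len(text)
    total = sum(lcp(text[sa[i] - 1:], text[sa[i + 1] - 1:]) for i in range(n - 1))
    return total / (n - 1)


print(suffix_array("babcc"))  # [2, 1, 3, 5, 4]
print(average_lcp("babcc"))   # 0.5
```

The naive sort compares whole suffixes, so its cost grows with the average LCP; this is exactly the behavior that the algorithms discussed here try to avoid.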
Let ŝ_1 = s_1 + t denote the suffix T[s_1 + t, n] and similarly let ŝ_2 = s_2 + t. The fundamental observation of the doubling technique is that

\[ s_1 \le_{2t} s_2 \iff \hat{s}_1 \le_{t} \hat{s}_2. \tag{1} \]

In other words, we can derive the ≤_{2t} order between s_1 and s_2 by looking at the rank of ŝ_1 and ŝ_2 in the ≤_t order.

The algorithm qsufsort works in rounds. At the beginning of the ith round the suffixes are already sorted according to the ≤_{2^i} ordering. In the ith round the algorithm looks at groups of suffixes sharing the first 2^i characters and sorts them according to the ≤_{2^{i+1}} ordering using the Bentley-McIlroy ternary quicksort [1]. Because of (1) each comparison in the quicksort algorithm takes O(1) time. After at most log n rounds all the suffixes are sorted. Thanks to a very clever data organization qsufsort only uses 8n bytes. Even more surprisingly, the whole algorithm fits in two pages of clean and elegant C code.

The experiments reported in [23] show that qsufsort outperforms other suffix sorting algorithms based on either the doubling technique or the suffix tree construction. The only algorithm which runs faster than qsufsort, but only for files with average LCP less than 20, is the Bentley-Sedgewick multikey quicksort [2]. Multikey quicksort is a direct comparison algorithm since it considers the suffixes as ordinary strings and sorts them via a character-by-character comparison without taking advantage of their special structure. In this paper we did not consider multikey quicksort since it is well known that it is inefficient when the average LCP is large. However, for inputs with a small average LCP it is one of the fastest algorithms: see [21] for an efficient suffix sorting algorithm based on multikey quicksort.

Table 1. Files used in our experiments sorted in order of increasing average LCP. Digits lost from the table are shown as "…".

Name    | Ave. LCP | Max. LCP | File size  | Description
sprot   | …        | …        | …,617,186  | Swiss prot database (original file name sprot34.dat)
rfc     | …        | …        | …,421,901  | Concatenation of RFC text files
howto   | …        | …,720    | 39,422,105 | Concatenation of Linux Howto text files
reuters | …        | …        | …,711,151  | Reuters news in XML format
linux   | …        | …        | …,254,720  | Tar archive containing the Linux kernel source files
jdk13   | …        | …,334    | 69,728,899 | Concatenation of html and java files from the JDK 1.3 doc.
etext99 | 1,…      | …        | …,277,340  | Concatenation of Project Gutenberg etext99/*.txt files
chr22   | 1,…      | …,999    | 34,553,758 | Genome assembly of human chromosome 22
gcc     | 8,…      | …,970    | 86,630,400 | Tar archive containing the gcc 3.0 source files
w3c     | 42,…     | …        | …,201,579  | Concatenation of html files from …

2.2. The Itoh-Tanaka two-stage Algorithm. In [15] Itoh and Tanaka describe a suffix sorting algorithm called two-stage suffix sort (two-stage from now on). two-stage only uses the text T and the suffix array SA for a total space occupancy of 5n bytes. To describe how it works, we assume Σ = {a, b, ..., z} and let SA be initialized as SA[i] = i. Using counting sort, two-stage initially sorts the array SA according to the ≤_1 ordering. Then it logically partitions SA into buckets B_a, ..., B_z.
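For reference, the doubling observation (1) of Section 2.1 can be sketched in a few lines of Python. This is a plain rank-doubling construction in the spirit of [24], not the qsufsort algorithm itself: it re-sorts the whole array in every round with the built-in sort instead of refining groups with ternary quicksort, and it makes no attempt to match the 8n-byte layout. The function name is hypothetical and positions are 0-based.

```python
def suffix_array_doubling(text):
    """Suffix array (0-based positions) built by prefix doubling.

    After the round with parameter t, rank[i] orders the suffixes by their
    first 2t characters; by observation (1) the pair
    (rank[i], rank[i + t]) is enough to compare two suffixes in O(1).
    """
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]      # order by the first character
    sa = list(range(n))
    t = 1
    while True:
        def key(i):
            # A suffix shorter than t + 1 characters gets -1 as second
            # component, which mimics a smallest end-of-text symbol.
            return (rank[i], rank[i + t] if i + t < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for prev, cur in zip(sa, sa[1:]):
            new_rank[cur] = new_rank[prev] + (key(prev) != key(cur))
        rank = new_rank
        if rank[sa[-1]] == n - 1:      # all ranks distinct: fully sorted
            return sa
        t *= 2


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each round doubles the number of characters taken into account, so at most about log n rounds are needed regardless of the average LCP.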
A bucket is a set of consecutive entries of SA containing the suffixes which start with the same character, from a to z in our illustrative example. Within each bucket two-stage distinguishes between two types of suffixes: Type A suffixes in which the second character of the suffix is smaller than the first, and Type B suffixes in which the second character is larger than or equal to the first suffix character. Within each bucket two-stage stores Type A suffixes first, followed by Type B suffixes. This is correct since Type A suffixes lexicographically precede Type B suffixes.

The crucial observation of algorithm two-stage is that when all Type B suffixes are sorted, we can easily derive the ordering of the Type A suffixes. This can be done with a single pass over the array SA: when we meet suffix s_i = T[i, n] we look at suffix s_{i-1} = T[i-1, n]; if s_{i-1} is a Type A suffix we move it to the first empty position of bucket B_{T[i-1]}.

Type B suffixes are sorted using textbook string sorting algorithms: in their implementation the authors use MSD radix sort [27] for sorting large groups of suffixes, Bentley-Sedgewick multikey quicksort for medium size groups, and insertion sort for small groups. Summing up, two-stage can be considered an advanced direct comparison algorithm since Type B suffixes are sorted by direct comparison whereas Type A suffixes are sorted by a much faster procedure which takes advantage of the special structure of the suffixes.

In [15] the authors compare two-stage with three direct-comparison algorithms (quicksort, multikey quicksort, and MSD radix sort) and with an earlier version of qsufsort. two-stage turns out to be roughly four times faster than quicksort and MSD radix sort, and from two to three times faster than multikey quicksort and qsufsort. However, the files used for the experiments have an average LCP of at most 31, and we know that the advantage of doubling algorithms (like qsufsort) with respect to direct comparison algorithms becomes apparent for much larger average LCPs.

Some improvements to algorithm two-stage have been recently described in [16]. Although these improvements are based on some interesting algorithmic ideas, we do not describe them here since they lead to an algorithm which is not lightweight, its space requirement being 9n bytes.

2.3. Seward copy Algorithm. Independently of Itoh and Tanaka, Seward describes in [30] a lightweight algorithm, called copy, which is based on a concept similar to the Type A/Type B suffixes used by algorithm two-stage. Using counting sort, copy initially sorts the array SA according to the ≤_2 ordering. As before we use the term bucket to denote the contiguous portion of SA containing a set of suffixes sharing the same first character. We use the term sub-bucket to denote the contiguous portion of SA containing suffixes sharing the first two characters. There are |Σ| buckets, each one consisting of |Σ| sub-buckets. One or more (sub-)buckets can be empty. In the following we use the symbol B_α to denote the bucket containing the suffixes starting with character α, and we use the symbol b_αβ to denote the sub-bucket containing the suffixes starting with the character-pair αβ. copy sorts the buckets one at a time starting with the one containing the fewest suffixes, and proceeding up to the largest one.
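The initial counting sort on the first two characters, and the resulting division of SA into sub-buckets, can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not Seward's code: the function name is hypothetical, positions are 0-based, every character is assumed to have a code below 256, and the missing second character of the last suffix is treated as smaller than any symbol.

```python
def sort_by_first_two(text):
    """Counting sort of the suffix positions on their first two characters.

    Returns (sa, start): sa lists the 0-based positions ordered by their
    two-character prefix, and sub-bucket k occupies sa[start[k]:start[k + 1]].
    """
    n = len(text)

    def key(i):
        # Code of the pair (text[i], text[i + 1]); 0 stands for "no second
        # character", so the last suffix opens its bucket.
        second = ord(text[i + 1]) + 1 if i + 1 < n else 0
        return ord(text[i]) * 257 + second

    start = [0] * (256 * 257 + 1)
    for i in range(n):
        start[key(i) + 1] += 1
    for k in range(1, len(start)):
        start[k] += start[k - 1]       # start[k] = number of keys below k

    sa = [0] * n
    next_free = start[:]               # next free slot of every sub-bucket
    for i in range(n):
        k = key(i)
        sa[next_free[k]] = i
        next_free[k] += 1
    return sa, start


sa, start = sort_by_first_two("banana")
print(sa)  # [5, 1, 3, 0, 2, 4]
```

After this step each sub-bucket is a contiguous slice of `sa` whose elements still have to be ordered among themselves; the bucket B_α is simply the union of the 257 consecutive sub-buckets whose first character is α.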
Assume for simplicity that Σ = {a, b, ..., z}. To sort a bucket, say B_p, copy sorts the sub-buckets b_pa, b_pb, ..., b_pz individually. The crucial point of algorithm copy is that when bucket B_p is completely sorted, with a simple pass over it copy sorts all the sub-buckets b_ap, b_bp, ..., b_zp. These sub-buckets are marked as sorted and copy skips them when their parent bucket is sorted. In other words, assuming B_a is sorted after B_p, when we sort B_a we skip b_ap and any other already sorted sub-bucket within B_a. As a further improvement, Seward shows that even the sorting of the sub-bucket b_pp can be avoided since its ordering can be derived from the ordering of the sub-buckets b_pa, ..., b_po and b_pq, ..., b_pz. This trick, first suggested in [4], is extremely effective when working on files containing long runs of identical characters.

Algorithm copy sorts the sub-buckets using the Bentley-McIlroy ternary quicksort. During this sorting the suffixes are considered atomic, that is, each comparison consists of the scanning of two entire suffixes. The standard trick of sorting the largest side of the partition last and eliminating tail recursion ensures that the amount of space required by the recursion stack grows, in the worst case, logarithmically with the size of the input text.

In [30] Seward compares a tuned implementation of copy with the qsufsort algorithm on a set of files with average LCP up to 400. In these tests copy outperforms qsufsort for all files but one. However, Seward reports that copy is much slower than qsufsort when the average LCP exceeds 1000, and for this reason he suggests the use of qsufsort as a fallback in that case.

2.4. Seward cache Algorithm. In [30] Seward describes how to improve algorithm copy in order to deal better with files with a large average LCP. The new algorithm, called cache, uses an auxiliary array R[1, n] of 16-bit integers. Initially all entries in R are set to zero. When the sorting of a bucket B is completed, for each suffix T[k, n] in B we write in R[k] the most significant 16 bits of its rank (the rank r of T[k, n] is its position in the sorted suffix array, that is, SA[r] = k). If at any point in the algorithm we are comparing the suffixes T[i, n] and T[j, n] we can proceed as follows. If T[i] = T[j] we compare R[i] and R[j]: if they differ we have the correct ordering of T[i, n] and T[j, n]. If R[i] = R[j], we next compare T[i + 1] and T[j + 1]; if they are equal we can compare R[i + 1] and R[j + 1], and so on. Note that we use R to determine the ordering of suffixes which have the same first character. Hence, we can store in R[i] the rank of T[i, n] relative to its bucket. That is, we do not need the absolute rank of T[i, n], but only its rank among the suffixes starting with the character T[i].

In the experiments reported in [30], copy was faster than qsufsort and cache for files with a small average LCP (up to 30). cache was the fastest algorithm for files with a large average LCP, and qsufsort was the fastest only for a single file with an average LCP of […]. However, in the experiments of [30] the input files were split in blocks of size 10^6 bytes, and the maximum average LCP was […]; this explains the relatively poor performance of qsufsort which is efficient when the average LCP is large.

Algorithm cache as described above uses 7n bytes: 5n for T and SA plus 2n for R. Since we are interested in lightweight algorithms we have modified it in order to reduce its space occupancy to 6n bytes. This has been achieved by defining R[1, n] as an array of eight-bit integers.
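The comparison rule of algorithm cache can be illustrated with the following Python sketch. It is a simplified model written for this explanation, not Seward's implementation: the helper `fill_ranks` (a hypothetical name) obtains the ranks from a naive sort and stores them only for the buckets listed in `done_chars`, keeping just the high-order bits by means of a right shift; positions are 0-based and absolute ranks are used.

```python
from functools import cmp_to_key


def fill_ranks(text, done_chars, shift):
    """R[k] = (rank of suffix k) >> shift for suffixes in completed buckets."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive, demo only
    R = [0] * len(text)
    for r, k in enumerate(sa):
        if text[k] in done_chars:      # bucket of text[k] already sorted
            R[k] = r >> shift          # keep only the most significant bits
    return R


def compare_suffixes(text, R, i, j):
    """Three-way comparison of text[i:] and text[j:] helped by the ranks in R."""
    n = len(text)
    while i < n and j < n:
        if text[i] != text[j]:
            return -1 if text[i] < text[j] else 1
        # Same first character, hence same bucket: either both ranks are
        # known or both entries are still zero.
        if R[i] != R[j]:
            return -1 if R[i] < R[j] else 1
        i += 1
        j += 1
    return (j == n) - (i == n)         # the suffix that ends first is smaller


text = "abracadabra"
R = fill_ranks(text, done_chars={"a"}, shift=1)
order = sorted(range(len(text)),
               key=cmp_to_key(lambda i, j: compare_suffixes(text, R, i, j)))
print(order)  # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
```

Because truncating a rank is a monotone operation, two different entries of R always reveal the true order, whereas equal entries only mean that more characters have to be inspected. Fewer stored bits therefore mean more ties, never wrong answers.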
Clearly, shrinking R to eight-bit entries reduces its effectiveness: now we can only store the eight most significant bits of the ranks, and therefore ties are more likely when we compare the values stored in R. To compensate for this, we store in R ranks relative to the sub-buckets. Hence, as soon as the sub-bucket b_{T[k]T[k+1]} is sorted, we store in R[k] the eight most significant bits of the rank of T[k, n] within b_{T[k]T[k+1]}. In the following we write cache_6n to denote this modified cache algorithm.

2.5. Preliminary Experimental Results. We have tested the three algorithms qsufsort, copy, and cache_6n (our space-economical version of cache) on our suite of test files (see Table 1). We have used two machines with different architectures: a 1000 MHz Pentium III with 256 Kb L2 cache, and a 933 MHz PowerPC G4 with 256 Kb L2 cache and 2 Mb L3 cache (the L3 cache runs at half the processor speed). The results of our experiments are reported in the top three rows of Table 2 for the Pentium and Table 3 for the PowerPC. The same data are represented as histograms in Figure 1. Note that the test files are ordered by increasing average LCP.

Concerning the relative performances of the three algorithms our results are in accordance with Seward's observations reported in [30]. copy is faster than qsufsort when the average LCP is small, and it is slower when the average LCP is large. cache_6n is faster than qsufsort roughly half of the time but there is no clear relationship between their relative speed and the average LCP of the input files.

Table 2. Running times (in seconds) for a 1000 MHz Pentium III processor, with 1 Gb main memory and 256 Kb L2 cache.
Columns: sprot, rfc, howto, reuters, linux, jdk13, etext99, chr22, gcc, w3c.
Rows: qsufsort; cache_6n; copy; ds0 (four values of L); ds1 (four values of L); ds2 (four values of d).
[Numeric entries and the values of L and d are not recoverable from this transcription.]
The operating system was GNU/Linux Red Hat 7.1. The compiler was gcc ver. […] with options -O3 -fomit-frame-pointer. The table reports (user + system) time averaged over five runs. The running times do not include the time spent reading the input files. The test files are ordered by increasing average LCP.

Table 3. Running times (in seconds) for a 933 MHz PowerPC G4 processor, with 1 Gb main memory, 256 Kb L2 cache and 2 Mb L3 cache.
Columns: sprot, rfc, howto, reuters, linux, jdk13, etext99, chr22, gcc, w3c.
Rows: qsufsort; cache_6n; copy; ds0 (four values of L); ds1 (four values of L); ds2 (four values of d).
[Numeric entries and the values of L and d are not recoverable from this transcription.]
The operating system was GNU/Linux Mandrake 8.2. The compiler was gcc ver. […] with options -O3 -fomit-frame-pointer. The table reports (user + system) time averaged over five runs. The running times do not include the time spent reading the input files. The test files are ordered by increasing average LCP.

Fig. 1. Graphical representations of the running times of qsufsort, copy, and cache_6n reported in Tables 2 and 3. Note that the histograms for copy on gcc and w3c and for cache_6n on gcc have been truncated since the running times are well beyond the upper limit of the Y-axis. The test files are ordered by increasing average LCP.

If we compare the data in Tables 2 and 3 we see that all algorithms run faster on the Pentium than on the PowerPC with the exception of algorithm copy on the files with the largest average LCP. This is again in accordance with Seward's analysis of the algorithms qsufsort, copy, and cache. In Section 5.3 of [30] Seward has shown that qsufsort does many random accesses to the memory and therefore does not fully benefit from the processor cache. This is true, to a lesser extent, also for cache; whereas copy was the algorithm generating the smallest number of cache misses. Thus, it is to be expected that copy benefits from the large L3 cache of the PowerPC; and indeed the phenomenon is more noticeable when the average LCP is large, since in this case most of the work of copy consists in comparing pairs of suffixes by means of sequential scans.

The data in Tables 2 and 3 also show that the hardness of building the suffix array does not depend on the average LCP and file size alone. For example, the file reuters has an average LCP smaller than linux and roughly the same size. Nevertheless, building the suffix array for reuters takes more time for all algorithms. On the PowerPC, building the suffix array for reuters with qsufsort and cache_6n takes more time than for the file w3c which has an average LCP 150 times larger.

Another phenomenon worth mentioning is the behavior of copy and cache_6n on the files gcc and w3c. w3c is 20% larger than gcc and its average LCP is five times larger. Surprisingly, building the suffix array for gcc seems to be a more difficult task for copy and cache_6n; for cache_6n the running time for gcc is more than ten times larger than the time taken on w3c (note that in Figure 1 the histograms for copy and cache_6n on gcc have been truncated).
A few experiments have shown that the performances of copy and cache_6n on gcc can be improved using a better pivot selection strategy in the Bentley-McIlroy ternary quicksort (which is used to sort the sub-buckets). However, we have not been able to explain this apparently counterintuitive behavior fully.^6 Note that qsufsort shows the expected behavior: its running time for gcc is roughly half the running time for w3c.

^6 As we have already pointed out, algorithms copy and cache were conceived and engineered to work on blocks of data of size at most 1 Mb. They are not to be blamed if they are occasionally inefficient on inputs of size 80 Mb and more!

Summing up, the data in Tables 2 and 3 show that qsufsort is a very fast and robust algorithm. Its only downside is that it uses 8n space. cache_6n, which only uses 6n space, is also quite fast, but its behavior on gcc suggests that it is not as robust as qsufsort. Finally, if we are tight on space and we are forced to use copy we must be prepared to wait a long time for files with a large average LCP: for gcc and w3c copy is […] times slower than qsufsort. In the next section we describe a new lightweight algorithm which retains the nice features of copy (small space occupancy and good performance for files with moderate average LCP) without suffering from a significant slowdown when the average LCP is large.

3. Our Contribution: Deep-Shallow Suffix Sorting. Our starting point for the design of an efficient lightweight suffix array construction algorithm is Seward's copy algorithm. Within this algorithm we replace the procedure used for sorting the sub-buckets, i.e., the groups of suffixes having the first two characters in common. Instead of using the Bentley-McIlroy ternary quicksort we use a more sophisticated technique. We sort the sub-buckets using the Bentley-Sedgewick multikey quicksort, stopping the recursion when we reach a predefined depth L, that is, when we have to sort a group of suffixes with a length-L common prefix. At this point we switch to a different string sorting algorithm (to be described next). This approach has several advantages:

1. it provides a simple and efficient means to detect the groups of suffixes with a long common prefix;
2. because of the limit L, the size of the recursion stack is bounded by a predefined constant which is independent of the size of the input text and can be tuned by the user;
3. if the suffixes in the sub-bucket have common prefixes which never exceed L, their sorting is done by multikey quicksort which is an extremely efficient string sorting algorithm when the average LCP is small (see the last paragraph of Section 2.1).
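The switch from the shallow sorter to a deep sorter at depth L can be sketched as follows. This is a simplified, list-based Python rendition of multikey quicksort written for illustration only: the real algorithm works in place on SA, and the names `shallow_sort` and `naive_deep_sort` are hypothetical. The deep sorter used here is a plain comparison sort that stands in for the strategies described in the following sections.

```python
def naive_deep_sort(text, group, depth):
    """Placeholder deep sorter: compare the suffixes beyond the shared prefix."""
    return sorted(group, key=lambda i: text[i + depth:])


def shallow_sort(text, group, depth, L, deep_sort=naive_deep_sort):
    """Sort suffix positions that share their first `depth` characters.

    Multikey quicksort on the character at offset `depth`; once a group
    with a length-L common prefix is reached it is handed to `deep_sort`.
    """
    if len(group) <= 1:
        return list(group)
    if depth >= L:
        return deep_sort(text, group, depth)

    def ch(i):
        # -1 plays the role of an end-of-text symbol smaller than any character
        return ord(text[i + depth]) if i + depth < len(text) else -1

    pivot = ch(group[len(group) // 2])
    less = [i for i in group if ch(i) < pivot]
    equal = [i for i in group if ch(i) == pivot]
    greater = [i for i in group if ch(i) > pivot]
    return (shallow_sort(text, less, depth, L, deep_sort)
            + shallow_sort(text, equal, depth + 1, L, deep_sort)
            + shallow_sort(text, greater, depth, L, deep_sort))


text = "abababababab$"
print(shallow_sort(text, list(range(len(text))), 0, 3))
# [12, 10, 8, 6, 4, 2, 0, 11, 9, 7, 5, 3, 1]
```

Only the recursive call on `equal` increases the depth, so the number of nested depth-increasing calls never exceeds L; this is the reason why the stack size does not depend on the length of the text.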
We call this approach deep-shallow suffix sorting since we mix an algorithm for sorting suffixes with a short LCP (shallow sorter) with an algorithm (actually more than one, as we shall see) for sorting suffixes with a long LCP (deep sorter). In the next sections we describe several deep sorting strategies, that is, algorithms for sorting suffixes having a common prefix longer than L.

3.1. Blind Sorting. Let s_1, s_2, ..., s_m denote a group of m suffixes with a length-L common prefix that we need to deep sort. If m is small (we discuss later what this means) we sort them using an algorithm, called blind sort, which is based on the blind trie data structure introduced in Section 2.1 of [7] (see Figure 2). Blind sorting simply consists of inserting the strings s_1, ..., s_m one at a time in an initially empty blind trie; then we traverse the trie from left to right thus obtaining the strings sorted in lexicographic order. Obviously in the construction of the trie we ignore the first L characters of each suffix since we know that they are identical.
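A minimal sketch of blind sorting is given below, under some simplifying assumptions that are not taken from the implementation described in this paper: the strings are passed as ordinary Python strings, no string is a prefix of another (true for the suffixes of a text that ends with a unique end-of-text symbol), children are kept in a dictionary and sorted only during the final traversal, and the names `Node`, `insert` and `blind_sort` are hypothetical. See [7] for the actual data structure.

```python
class Node:
    def __init__(self, skip=None, leaf=None):
        self.skip = skip      # internal node: length of the common prefix
        self.leaf = leaf      # leaf: index of the string stored here
        self.children = {}    # internal node: branching character -> child


def common_prefix_len(v, w):
    k = 0
    while k < len(v) and k < len(w) and v[k] == w[k]:
        k += 1
    return k


def insert(root, strings, idx):
    """Insert strings[idx] and return the (possibly new) root."""
    s = strings[idx]
    if root is None:
        return Node(leaf=idx)
    # Phase 1: blind descent. Only the branching characters of s are read;
    # when no child matches, any child will do.
    node = root
    while node.leaf is None:
        c = s[node.skip] if node.skip < len(s) else None
        node = node.children.get(c) or next(iter(node.children.values()))
    other = strings[node.leaf]
    l = common_prefix_len(s, other)    # the only full string comparison
    # Phase 2: walk down again to the first node not shallower than l.
    parent, node = None, root
    while node.leaf is None and node.skip < l:
        parent, node = node, node.children[s[node.skip]]
    if node.leaf is None and node.skip == l:
        node.children[s[l]] = Node(leaf=idx)      # new branch at this node
        return root
    branch = Node(skip=l)                         # new node above `node`
    branch.children[s[l]] = Node(leaf=idx)
    branch.children[other[l]] = node
    if parent is None:
        return branch
    parent.children[s[parent.skip]] = branch
    return root


def blind_sort(strings):
    root = None
    for idx in range(len(strings)):
        root = insert(root, strings, idx)
    out = []

    def visit(node):                   # left-to-right visit of the leaves
        if node.leaf is not None:
            out.append(strings[node.leaf])
        else:
            for c in sorted(node.children):
                visit(node.children[c])

    if root is not None:
        visit(root)
    return out


text = "mississippi$"
print(blind_sort([text[i:] for i in range(len(text))])[:3])
# ['$', 'i$', 'ippi$']
```

Each insertion reads the new string once during the descent and compares it with a single stored string, which is the property exploited by the analysis that follows.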

Fig. 2. A standard compacted trie (left) and the corresponding blind trie (right) for the strings […]. Each internal node of the blind trie contains an integer and a set of outgoing labeled arcs. A node containing the integer k represents a set of strings which have a length-k common prefix and differ in the (k + 1)st character. The outgoing arcs are labeled with the different characters that we find in position k + 1. Note that since the outgoing arcs are ordered alphabetically, by visiting the trie leaves from left to right we get the strings in lexicographic order.

The insertion of string s_i in the trie consists of a first phase in which we scan s_i and simultaneously traverse the trie top-down until we reach a leaf l. Then we compare s_i with the string, say s_j, associated to leaf l and we determine the length of their common prefix. This length and the mismatching character allow us to identify the position in the trie where the new leaf corresponding to s_i has to be inserted (see [7] for details). The crucial point of the algorithm is that for the insertion of s_i in the trie the only operations involving the suffixes s_1, ..., s_i are:^7

1. sequential access to s_i during the traversal of the trie, and
2. the sequential scan of s_i and s_j during their comparison.

Thus, our algorithm sorts the suffixes using only cache-friendly sequential string scans. Note that we are neglecting in this analysis the cost of trie traversal because the trie is small, since m is chosen to be small, and thus the cost of suffix comparisons dominates the cost of trie percolation. We also point out that the string-based Bentley-McIlroy ternary quicksort algorithm, used within copy, sorts the suffixes by means of sequential scans. However, ternary quicksort executes on average Θ(m log m) sequential scans, whereas our blind sorting algorithm executes only Θ(m) sequential accesses to the suffixes. This improvement over ternary quicksort is paid for in terms of the extra memory required for storing the trie data structure. This means that we cannot use blind sorting for an arbitrarily large group of suffixes.

^7 In the following we use the expression sequential access to s when an algorithm reads the characters s[j_1], s[j_2], ..., s[j_k] with j_1 < j_2 < ... < j_k. We use the expression sequential scan when an algorithm reads consecutive characters: s[0], s[1], ..., s[k].

Our implementation of blind sort uses at most 36m bytes of memory. We use it when the number of suffixes to be sorted is less than B = n/2000. With this choice the space overhead of using blind sort is at most 36 · n/2000 = 9n/500 bytes. If the text is 100 Mb long, this overhead is 1.8 Mb which should be compared with the 500 Mb required by the text and the suffix array.^8 If the number of suffixes to be sorted is larger than B = n/2000, we sort them using the Bentley-McIlroy ternary quicksort. However, with respect to the ternary quicksort algorithm used by copy for sorting the sub-buckets, we introduce the following two improvements:

1. As soon as we are working with a group of suffixes smaller than B we stop the recursion and we sort them using blind sort.
2. During each ternary quicksort partitioning phase, we compute L_S (resp. L_L) which is the LCP between the pivot and the strings which are lexicographically smaller (resp. larger) than the pivot. When we sort the strings which are smaller (resp. larger) than the pivot, we can skip the first L_S (resp. L_L) characters since we know they constitute a common prefix.

We call ds0 the suffix sorting algorithm which uses multikey quicksort up to depth L and then switches to the blind-sort/ternary-quicksort combination described above. The performance of ds0 is reported in Tables 2 and 3 for several values of the parameter L. We can see that ds0 is faster than qsufsort and cache_6n on chr22 and on the five files with the smallest average LCP. We can also see that ds0 is always faster than copy and that for the file gcc ds0 achieves a tenfold running time reduction. This is certainly a good start. We now show how to reduce the running time further by taking advantage of the fact that the strings we are sorting are all suffixes of the same text.

3.2. Induced Sorting.
One of the nice features of the algorithms two-stage, copy, and cache_6n is that some of the suffixes are not sorted by direct comparison: their relative order is derived in constant time from the ordering of other suffixes which have already been sorted. We use a generalization of this technique in the deep-sorting phase of our algorithm.

Assume we need to sort the suffixes s_1, ..., s_m which have a length-L common prefix. We scan the first L characters of s_1 looking at each pair of consecutive characters, namely T[s_1 + i]T[s_1 + i + 1] for i = 0, ..., L − 1. As soon as we find a pair of characters, say αβ, belonging to an already sorted sub-bucket b_αβ, we derive the ordering of s_1, ..., s_m from the ordering of b_αβ as follows. Let α = T[s_1 + t] and β = T[s_1 + t + 1] for some t < L − 1. Since s_1, ..., s_m have a length-L common prefix, every s_i contains the character-pair αβ starting at position t. Hence b_αβ contains m suffixes corresponding to s_1, ..., s_m, that is, b_αβ contains the suffixes starting at s_1 + t, s_2 + t, ..., s_m + t. The good news is that the first t − 1 characters of s_1, ..., s_m are identical, so that the ordering of s_1, ..., s_m can be derived from the ordering of the corresponding suffixes in b_αβ. The bad news is that these corresponding suffixes are not necessarily consecutive in b_αβ, even if they are expected to be close to each other because of their long common prefix. Combining these observations we derive the ordering of s_1, ..., s_m as follows:

1-is. We sort the suffixes s_1, ..., s_m according to their starting position in the input text T[1, n]. This is done so that in Step 3-is below we can use binary search to answer membership queries in the set s_1, ..., s_m.
2-is. Let ŝ denote the suffix starting at the text character T[s_1 + t]. We scan the sub-bucket b_αβ in order to find the position of ŝ within b_αβ.
3-is. We scan the suffixes preceding and following ŝ in the sub-bucket b_αβ. For each suffix s we check whether the suffix starting at the character T[s − t] is in the set s_1, ..., s_m; if so we mark the suffix s.^9
4-is. When m suffixes in b_αβ have been marked, we scan them from left to right. Since b_αβ is sorted this gives us the correct ordering of s_1, ..., s_m.

^8 Although we believe this is a small overhead, we point out that the limit B = n/2000 was chosen somewhat arbitrarily. Experimental results show that there is only a marginal degradation in performance when we take B = n/3000 or B = n/4000.

The effectiveness of the above procedure depends on how many suffixes are scanned at Step 3-is before all the suffixes corresponding to s_1, ..., s_m are found and marked. We expect that this number is small since, as we already observed, the suffixes corresponding to s_1, ..., s_m are expected to be close to each other in b_αβ.

Obviously there is no guarantee that in the length-L common prefix of s_1, ..., s_m there is a pair of characters belonging to an already sorted sub-bucket. In this case we cannot use induced sorting and we resort to the blind-sort/quicksort combination.

We call ds1 the algorithm which uses induced sorting and we report its performance for several values of L in Tables 2 and 3. ds1 appears to be slightly slower than ds0 for files with a small average LCP but it is clearly faster for the files with a large average LCP: for w3c it is more than ten times faster.
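Steps 1-is to 4-is can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, positions are 0-based, the sorted sub-bucket is passed as a plain list of suffix positions, the position of ŝ is found with a linear search, and the marked entries are collected in a separate list instead of being flagged through the most significant bit.

```python
from bisect import bisect_left


def induced_order(group, t, sub_bucket):
    """Order the suffixes in `group` using an already sorted sub-bucket.

    `group` holds the start positions of suffixes sharing a long common
    prefix; the pair of characters at offset t of that prefix identifies
    `sub_bucket`, which lists its suffix positions in sorted order.
    """
    m = len(group)
    starts = sorted(group)                        # Step 1-is

    def in_group(p):                              # binary-search membership
        k = bisect_left(starts, p)
        return k < m and starts[k] == p

    pos = sub_bucket.index(group[0] + t)          # Step 2-is: locate s-hat
    marked = []                                   # indices inside sub_bucket
    left, right = pos, pos + 1
    # Step 3-is: scan outwards from s-hat until m suffixes are marked
    while len(marked) < m and (left >= 0 or right < len(sub_bucket)):
        if left >= 0:
            if in_group(sub_bucket[left] - t):
                marked.append(left)
            left -= 1
        if right < len(sub_bucket) and len(marked) < m:
            if in_group(sub_bucket[right] - t):
                marked.append(right)
            right += 1
    marked.sort()                                 # Step 4-is: left to right
    return [sub_bucket[k] - t for k in marked]


text = "abcabcabcabc$"
group = [0, 3, 6]                  # suffixes starting with "abcabc" (L = 6)
sub_bucket = [10, 7, 4, 1]         # sorted suffixes starting with "bc"
print(induced_order(group, 1, sub_bucket))  # [6, 3, 0]
```

No character of the text is read in the process: the order of the group is obtained purely from positions inside the sub-bucket, which is why the cost depends on how far the scan of Step 3-is has to go.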
We can see that ds1 with L = 2000 runs faster than qsufsort and cache 6n for all files except gcc and w3c.

9 We mark the suffixes by setting the most significant bit of the integer which represents the suffix s. This means that our algorithm can work with texts of size at most 2^31 bytes. Note that the same restriction holds for qsufsort as well.

Anchor Sorting. Profiling shows that the most costly operation of induced sorting is the scanning of the sub-bucket b_αβ to search for the position of the suffix ŝ (Step 2-is above). We show how to avoid this operation using a small amount of extra memory. We partition the text T[1, n] into n/d segments of length d: T[1, d], T[d + 1, 2d], and so on (for simplicity we assume that d divides n). We define two arrays Anchor[ ] and Offset[ ] of size n/d such that:

Offset[i] contains the position of the leftmost suffix which starts in the ith segment and belongs to an already sorted sub-bucket. If no suffix belonging to an already sorted small bucket starts in the ith segment, then Offset[i] = 0. Let s_i denote the suffix whose starting position is stored in Offset[i].
Anchor[i] contains the position of s_i within its sub-bucket.
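As a rough illustration of these two definitions, the following Python sketch maintains the arrays as plain lists. The helper names `make_anchor_arrays` and `record_sorted` are hypothetical. So are the 0-based positions, the choice to store in `offset` the distance from the segment start, and the use of `None` rather than 0 to mean "no sorted suffix in this segment".

```python
def make_anchor_arrays(n, d):
    """Return empty Offset and Anchor lists for a text of length n."""
    segments = (n + d - 1) // d
    return [None] * segments, [None] * segments


def record_sorted(offset, anchor, d, sorted_group, base_rank):
    """Record a run of newly sorted suffixes.

    sorted_group -- start positions in their final lexicographic order
    base_rank    -- position of sorted_group[0] within its sub-bucket
    """
    for k, s in enumerate(sorted_group):
        seg, dist = divmod(s, d)
        # Keep only the leftmost sorted suffix of each segment.
        if offset[seg] is None or dist < offset[seg]:
            offset[seg] = dist
            anchor[seg] = base_rank + k


offset, anchor = make_anchor_arrays(20, 5)
record_sorted(offset, anchor, 5, [12, 3, 11, 17], 40)
print(offset)  # [3, None, 1, 2]
print(anchor)  # [41, None, 42, 43]
```

Segment 2 first records the suffix at position 12 and then replaces it with the one at position 11, since only the leftmost sorted suffix of a segment is kept.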

Note that the arrays Offset and Anchor provide a sort of partial inverse of the (already computed portion of the) suffix array. In this sense they are similar to the array R[ ] used by cache and cache 6n, which stores the most significant bits of the ranks of the already sorted suffixes.

The use of the arrays Anchor[ ] and Offset[ ] within induced sorting is fairly simple. Assume that we need to sort the suffixes s_1, ..., s_m which have a length-L common prefix. For j = 1, ..., m, let z_j denote the segment containing the starting position of s_j. If s_{z_j} (that is, the leftmost already sorted suffix in segment z_j) starts within the first L characters of s_j (that is, s_j < s_{z_j} < s_j + L), then we can sort s_1, ..., s_m using the induced sorting algorithm described in the previous section. However, we can now skip Step 2-is since the position of s_{z_j} within its sub-bucket is stored in Anchor[z_j]. Obviously it is possible that, for some j, s_{z_j} does not exist or cannot be used because it precedes s_j or follows s_j + L. However, since the suffixes s_1, ..., s_m usually belong to different segments, we have m possible candidates. In our implementation, among the available sorted suffixes s_{z_j}, we use the one whose starting position is closest to the corresponding s_j, that is, we choose the j which minimizes s_{z_j} − s_j > 0. This choice helps Step 3-is of the induced sorting since, using the notation of Step 3-is, it maximizes L − t and thus minimizes the number of suffixes s such that the suffix starting at T[s − t] is not in the set s_1, ..., s_m. If, for j = 1, ..., m, s_{z_j} does not exist or cannot be used, then we resort to the blind-sort/quicksort combination.

For updating the arrays Offset and Anchor we use the following strategy. The straightforward approach is to update them each time we complete the sorting of a sub-bucket. Instead we update them at the end of each call to deep sorting, that is, each time we complete the sorting of a set of suffixes which share a length-L common prefix. This approach has a twofold advantage:

Updates are done only when we have useful data. As an example, if a sub-bucket is sorted by shallow sorting alone, that is, all suffixes differ within the first L characters, the suffixes in that small bucket are not used to update Offset and Anchor. The rationale is that these suffixes are not very useful for induced sorting. It is easy to see that they can be used only for determining the ordering of suffixes which differ within the first d + L characters, while we know that induced sorting is advantageous only when used for suffixes which have a very long common prefix.

Updates are done as early as possible. When we complete the sorting of a set of suffixes s_1, ..., s_m which share a length-L common prefix, we use them to update the arrays Offset and Anchor without waiting for the completion of the sorting of their sub-bucket. This means that anchor sorting can use s_1, ..., s_m to determine the order of a set of suffixes which are in the same sub-bucket as s_1, ..., s_m.

Concerning the space occupancy of anchor sorting, we observe that in Offset[i] we can store the distance between the beginning of the ith segment and the leftmost sorted suffix in the segment. Hence Offset[i] is always smaller than the segment length d. If we take d < 2^16 we can store the array Offset in 2n/d bytes. Since each entry of Anchor requires 4 bytes, the overall space occupancy is 6n/d bytes. In our tests d was at least 500, which yields an overhead of 6n/500 bytes. If we add the 9n/500 bytes required by blind sorting with B = n/2000, we get a maximum overhead of at most 3n/100 bytes. Hence, for a 100 MB text the overhead is at most 3 MB, which we consider a small amount compared with the 500 MB used by the text and the suffix array.

In Tables 2 and 3 we report the running time of anchor sorting under the name ds2 for d ranging from 500 to 5000 and L chosen as a function of d (L = 550 for d = 500). In Table 4 we report the running time of qsufsort, cache 6n, and ds2 on an Athlon XP and a Pentium 4. A first observation is that decreasing the parameter d (that is, increasing the number of anchors) within ds2 does not always yield a reduction of the running time. Indeed, for the file sprot the best results are obtained for d = 5000; for the files rfc, reuters, and w3c the best results are obtained for d = 1000; for the other files the fastest algorithm is the one with d = 500. Note that there is not an obvious relationship between the optimal value of d and the average LCP of the input file. This behavior has been confirmed by some additional experiments: for example, using d = 200 we get a 20% reduction in the running time for jdk13 but for the other files the running times are very close to those obtained for d = 500.

Table 4. Running times (in seconds) for a 1400 MHz Athlon XP and a 1700 MHz Pentium 4.
Columns: sprot, rfc, howto, reuters, linux, jdk13, etext99, chr22, gcc, w3c.
Rows (for each of the two machines): qsufsort, cache 6n, and ds2 for four values of d.
[The numeric entries and the four values of d are missing from this transcription.]
Both machines were equipped with 1 GB main memory and 256 KB L2 cache. The operating system on the Athlon was GNU/Linux Debian 2.2; the compiler was gcc ver. [version missing] with options -O3 -fomit-frame-pointer. The operating system on the Pentium 4 was GNU/Linux Mandrake 9.0; the compiler was gcc ver. 3.2 with options -O3 -fomit-frame-pointer -march=pentium4. The table reports (user + system) time averaged over five runs. The running times do not include the time spent reading the input files. The test files are ordered by increasing average LCP.
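Two points from the anchor-sorting discussion above lend themselves to a small Python sketch: the choice of the anchor closest to its suffix, and the space accounting. The names `choose_anchor` and `overhead_bytes` are hypothetical, positions are 0-based, and `offset` is assumed to hold per-segment distances with `None` for segments without a sorted suffix. The 9n/500 term is simply the figure quoted in the text for blind sorting with B = n/2000.

```python
def choose_anchor(group, offset, d, L):
    """Pick the suffix of `group` whose segment anchor is closest to it.

    Returns (j, t) where t = s_zj - s_j is minimal subject to 0 < t < L,
    or None if no segment provides a usable anchor.
    """
    best = None
    for j, s in enumerate(group):
        seg = s // d
        if offset[seg] is None:          # no sorted suffix in this segment
            continue
        t = seg * d + offset[seg] - s    # distance from s_j to its anchor
        if 0 < t < L and (best is None or t < best[1]):
            best = (j, t)
    return best


def overhead_bytes(n, d):
    """Extra space in bytes for anchor sorting plus blind sorting."""
    offset_bytes = 2 * n // d            # 16-bit entries, valid when d < 2**16
    anchor_bytes = 4 * n // d            # 32-bit entries
    blind_bytes = 9 * n // 500           # figure quoted for B = n/2000
    return offset_bytes + anchor_bytes + blind_bytes


print(choose_anchor([1, 10, 16], [3, None, 1, 2], d=5, L=4))  # (1, 1)
print(overhead_bytes(100_000_000, 500))                       # 3000000
```

For n = 10^8 and d = 500 the total is 3 · 10^6 bytes, which agrees with the bound of 3n/100 bytes stated above.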
To compare ds2 with the other algorithms, in addition to the data in Tables 2–4, in Figure 3 we show a graphical comparison of the running times of qsufsort, cache 6n, and ds2 with d = 500 on the four different architectures used in our tests. We can see that for the files with a moderate average LCP ds2 with d = 500 is significantly faster than copy and cache 6n and roughly twice as fast as qsufsort. For the files with a large average LCP, ds2 is always faster than cache 6n and it is faster than qsufsort for all files except gcc. For gcc, ds2 is faster than qsufsort on the Pentium 4 and slower on the Pentium III; on the PowerPC and the Athlon the two algorithms have roughly the same speed.

A comment on the performance on the Pentium 4 is in order. We can see that most of the time the 1700 MHz Pentium 4 is significantly slower than the 1000 MHz Pentium III.

Remarkable exceptions are cache 6n on gcc and ds2 on gcc and w3c, for which the Pentium 4 is clearly faster. These data show once more that the architectures of modern CPUs can have significant and unexpected impacts on the execution speed of the different algorithms (see footnote 10).

Fig. 3. Graphical representations of the running times of qsufsort, cache 6n, and ds2 (with d = 500, L = 550) reported in Tables 2–4. Note that the histograms for cache 6n on gcc have been truncated since the running times are well beyond the upper limit of the Y-axis. The test files are ordered by increasing average LCP.

In order to have a different perspective on the performance of ds2, in Figure 4 we report the ratios between the running times of ds2 and qsufsort on the four different machines used in our tests. These ratios represent the reduction in running time achieved by ds2 over qsufsort. We observe that for all files except gcc the ratios for the Pentium III and the Athlon are quite close. We can also see that for all files except chr22 the smallest ratios are achieved on the PowerPC and the Pentium 4. This means that ds2 is more efficient than qsufsort on these architectures; however, the difference is not marked and does not appear to be related to the average LCP of the input files.

10 Another peculiarity of the Pentium 4 is that the use of the compiler option -march=pentium4 greatly enhanced the performance of cache 6n and ds2 (it did not affect the performance of qsufsort). On the other machines, the -march option did not bring a clear improvement and therefore it was not used.

Fig. 4. Running time reduction achieved by ds2 over qsufsort. Each bar represents the ratio between the running time of ds2 (with d = 500, L = 550) and the running time of qsufsort.

Overall, the data reported in this section show the validity of our deep-shallow suffix sorting approach. We have been able to improve the already impressive performance of copy and cache 6n for files with a moderate average LCP. At the same time we have avoided any significant degradation in performance for files with a large average LCP: we are faster than any other algorithm with the only exception of the file gcc on the Pentium III. We stress that this improvement in terms of running time has been achieved with a simultaneous reduction of the space occupancy: ds2 with d = 500 uses 5.03n space, cache 6n uses 6n space, and qsufsort uses 8n space.

4. Concluding Remarks. In this paper we have presented a lightweight algorithm for building the suffix array of a text T[1, n]. We have been motivated by the observation that the major drawback of most suffix array construction algorithms is their large space occupancy. Our algorithm uses 5.03n bytes and is faster than any other tested algorithm. Only on a single file on a single machine is our algorithm outperformed by qsufsort, which however uses 8n bytes.

The C source code of all algorithms described in this paper, and the complete collection of test files, are publicly available on the web [26]. For our lightweight suffix sorting algorithm we provide a simple API which makes the construction of the suffix array as simple as calling two C procedures.

Finally, we point out that suffix sorting is a very active area of research. All algorithms described in this paper are less than 4 years old and new ones are under development. During the review of this paper some Θ(n) time suffix sorting algorithms not based on the suffix tree have appeared in the literature [14], [17], [19], [20]. At the moment it is too early to evaluate their practical impact, although their engineering and experimental evaluation is certainly a worthwhile research goal.
Also during the review of this paper, a new lightweight suffix sorting algorithm has been proposed by Burkhardt and Kärkkäinen [3]. This new algorithm runs in O(n log n) time in the worst case and uses O(n/√(log n)) space in addition to the input text and the suffix array. Preliminary experimental results reported in Section 7 of [3] show that on real-world files this algorithm is roughly three times slower than ds2 and uses 17% more space; however, its O(n log n) worst-case running time makes it an attractive option, thus deserving further investigation. Finally, we know of a new suffix sorting algorithm [31] which, like qsufsort, uses 8n space and runs in O(n log n) time in the worst case. Preliminary tests show that this new algorithm is roughly two times faster than qsufsort and slightly faster than ds2 [31].

Acknowledgments. We thank Hideo Itoh, Tsai-Hsing Kao, Stefan Kurtz, Jesper Larsson, Kunihiko Sadakane, and Julian Seward for clarifying some details on their work and/or providing the source or executable code of their algorithms. We also thank Giovanni Resta and Giorgio Vecchiocattivi for their technical assistance in the experimental work.

References

[1] J. L. Bentley and M. D. McIlroy. Engineering a sort function. Software Practice and Experience, 23(11).
[2] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the 8th ACM-SIAM Symposium on Discrete Algorithms.
[3] S. Burkhardt and J. Kärkkäinen. Fast lightweight suffix array construction and checking. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching (CPM '03). LNCS, Springer-Verlag, Berlin.
[4] M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation.
[5] M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, Oxford.
[6] M. Farach-Colton, P. Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. Journal of the ACM, 47(6).
[7] P. Ferragina and R. Grossi. The string B-tree: a new data structure for string search in external memory and its applications. Journal of the ACM, 46(2).
[8] P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science.
[9] P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms.
[10] G. H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In B. Frakes and R. A. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 5. Prentice-Hall, Englewood Cliffs, NJ.
[11] R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '03).
[12] R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proceedings of the 32nd ACM Symposium on Theory of Computing.
[13] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge.
[14] W. Hon, K. Sadakane, and W. Sung. Breaking a time-and-space barrier in constructing full-text indices. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science.
[15] H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the Sixth Symposium on String Processing and Information Retrieval (SPIRE '99). IEEE Computer Society Press, Los Alamitos, CA.
[16] T.-H. Kao. Improving suffix-array construction algorithms with applications. Master's thesis, Department of Computer Science, Gunma University, February 2001.
[17] J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP '03). LNCS, Springer-Verlag, Berlin.
[18] R. Karp, R. Miller, and A. Rosenberg. Rapid identification of repeated patterns in strings, arrays and trees. In Proceedings of the ACM Symposium on Theory of Computation.
[19] D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proc. 14th Symposium on Combinatorial Pattern Matching (CPM '03). LNCS, Springer-Verlag, Berlin.
[20] P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching (CPM '03). LNCS, Springer-Verlag, Berlin.
[21] S. Kurtz. Mkvtree package (available upon request).
[22] S. Kurtz. Reducing the space requirement of suffix trees. Software Practice and Experience, 29(13).
[23] N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LU-CS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Department of Computer Science, Lund University, Sweden.
[24] U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5).
[25] G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3).
[26] G. Manzini and P. Ferragina. Lightweight suffix sorting home page. ...it/~manzini/lightweight.
[27] P. M. McIlroy and K. Bostic. Engineering radix sort. Computing Systems, 6(1):5-27.
[28] K. Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proceeding of the 11th International Symposium on Algorithms and Computation. LNCS, Springer-Verlag, Berlin.
[29] J. Seward. The BZIP2 home page.
[30] J. Seward. On the performance of BWT sorting algorithms. In DCC: Data Compression Conference. IEEE Computer Society Press, Los Alamitos, CA.
[31] K. P. Vo. Personal communication.


Instructions. An 8.5 x 11 Cheat Sheet may also be used as an aid for this test. MUST be original handwriting. ID: B CSE 2021 Computer Orgniztion Midterm Test (Fll 2009) Instrutions This is losed ook, 80 minutes exm. The MIPS referene sheet my e used s n id for this test. An 8.5 x 11 Chet Sheet my lso e used s

More information

Spacetime and the Quantum World Questions Fall 2010

Spacetime and the Quantum World Questions Fall 2010 Spetime nd the Quntum World Questions Fll 2010 1. Cliker Questions from Clss: (1) In toss of two die, wht is the proility tht the sum of the outomes is 6? () P (x 1 + x 2 = 6) = 1 36 - out 3% () P (x 1

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

On Suffix Tree Breadth

On Suffix Tree Breadth On Suffix Tree Bredth Golnz Bdkoeh 1,, Juh Kärkkäinen 2, Simon J. Puglisi 2,, nd Bell Zhukov 2, 1 Deprtment of Computer Science University of Wrwick Conventry, United Kingdom g.dkoeh@wrwick.c.uk 2 Helsinki

More information

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P.

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P. Chpter 7: The Riemnn Integrl When the derivtive is introdued, it is not hrd to see tht the it of the differene quotient should be equl to the slope of the tngent line, or when the horizontl xis is time

More information

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic Chpter 3 Vetor Spes In Chpter 2, we sw tht the set of imges possessed numer of onvenient properties. It turns out tht ny set tht possesses similr onvenient properties n e nlyzed in similr wy. In liner

More information

Test Generation from Timed Input Output Automata

Test Generation from Timed Input Output Automata Chpter 8 Test Genertion from Timed Input Output Automt The purpose of this hpter is to introdue tehniques for the genertion of test dt from models of softwre sed on vrints of timed utomt. The tests generted

More information

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA Common intervls of genomes Mthieu Rffinot CNRS LIF Context: omprtive genomis. set of genomes prtilly/totlly nnotte Informtive group of genes or omins? Ex: COG tse Mny iffiulties! iology Wht re two similr

More information

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 )

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 ) Neessry n suient onitions for some two vrile orthogonl esigns in orer 44 C. Koukouvinos, M. Mitrouli y, n Jennifer Seerry z Deite to Professor Anne Penfol Street Astrt We give new lgorithm whih llows us

More information

Data Structures LECTURE 10. Huffman coding. Example. Coding: problem definition

Data Structures LECTURE 10. Huffman coding. Example. Coding: problem definition Dt Strutures, Spring 24 L. Joskowiz Dt Strutures LEURE Humn oing Motivtion Uniquel eipherle oes Prei oes Humn oe onstrution Etensions n pplitions hpter 6.3 pp 385 392 in tetook Motivtion Suppose we wnt

More information

Iowa Training Systems Trial Snus Hill Winery Madrid, IA

Iowa Training Systems Trial Snus Hill Winery Madrid, IA Iow Trining Systems Tril Snus Hill Winery Mdrid, IA Din R. Cohrn nd Gil R. Nonneke Deprtment of Hortiulture, Iow Stte University Bkground nd Rtionle: Over the lst severl yers, five sttes hve een evluting

More information

Symmetrical Components 1

Symmetrical Components 1 Symmetril Components. Introdution These notes should e red together with Setion. of your text. When performing stedy-stte nlysis of high voltge trnsmission systems, we mke use of the per-phse equivlent

More information

Maintaining Mathematical Proficiency

Maintaining Mathematical Proficiency Nme Dte hpter 9 Mintining Mthemtil Profiieny Simplify the epression. 1. 500. 189 3. 5 4. 4 3 5. 11 5 6. 8 Solve the proportion. 9 3 14 7. = 8. = 9. 1 7 5 4 = 4 10. 0 6 = 11. 7 4 10 = 1. 5 9 15 3 = 5 +

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Behavior Composition in the Presence of Failure

Behavior Composition in the Presence of Failure Behvior Composition in the Presene of Filure Sestin Srdin RMIT University, Melourne, Austrli Fio Ptrizi & Giuseppe De Giomo Spienz Univ. Rom, Itly KR 08, Sept. 2008, Sydney Austrli Introdution There re

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

TOPIC: LINEAR ALGEBRA MATRICES

TOPIC: LINEAR ALGEBRA MATRICES Interntionl Blurete LECTUE NOTES for FUTHE MATHEMATICS Dr TOPIC: LINEA ALGEBA MATICES. DEFINITION OF A MATIX MATIX OPEATIONS.. THE DETEMINANT deta THE INVESE A -... SYSTEMS OF LINEA EQUATIONS. 8. THE AUGMENTED

More information

TIME AND STATE IN DISTRIBUTED SYSTEMS

TIME AND STATE IN DISTRIBUTED SYSTEMS Distriuted Systems Fö 5-1 Distriuted Systems Fö 5-2 TIME ND STTE IN DISTRIUTED SYSTEMS 1. Time in Distriuted Systems Time in Distriuted Systems euse eh mhine in distriuted system hs its own lok there is

More information

THE PYTHAGOREAN THEOREM

THE PYTHAGOREAN THEOREM THE PYTHAGOREAN THEOREM The Pythgoren Theorem is one of the most well-known nd widely used theorems in mthemtis. We will first look t n informl investigtion of the Pythgoren Theorem, nd then pply this

More information

#A42 INTEGERS 11 (2011) ON THE CONDITIONED BINOMIAL COEFFICIENTS

#A42 INTEGERS 11 (2011) ON THE CONDITIONED BINOMIAL COEFFICIENTS #A42 INTEGERS 11 (2011 ON THE CONDITIONED BINOMIAL COEFFICIENTS Liqun To Shool of Mthemtil Sienes, Luoyng Norml University, Luoyng, Chin lqto@lynuedun Reeived: 12/24/10, Revised: 5/11/11, Aepted: 5/16/11,

More information

Comparing the Pre-image and Image of a Dilation

Comparing the Pre-image and Image of a Dilation hpter Summry Key Terms Postultes nd Theorems similr tringles (.1) inluded ngle (.2) inluded side (.2) geometri men (.) indiret mesurement (.6) ngle-ngle Similrity Theorem (.2) Side-Side-Side Similrity

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b CS 294-2 9/11/04 Quntum Ciruit Model, Solovy-Kitev Theorem, BQP Fll 2004 Leture 4 1 Quntum Ciruit Model 1.1 Clssil Ciruits - Universl Gte Sets A lssil iruit implements multi-output oolen funtion f : {0,1}

More information

AVL Trees. D Oisín Kidney. August 2, 2018

AVL Trees. D Oisín Kidney. August 2, 2018 AVL Trees D Oisín Kidne August 2, 2018 Astrt This is verified implementtion of AVL trees in Agd, tking ides primril from Conor MBride s pper How to Keep Your Neighours in Order [2] nd the Agd stndrd lirr

More information

System Validation (IN4387) November 2, 2012, 14:00-17:00

System Validation (IN4387) November 2, 2012, 14:00-17:00 System Vlidtion (IN4387) Novemer 2, 2012, 14:00-17:00 Importnt Notes. The exmintion omprises 5 question in 4 pges. Give omplete explntion nd do not onfine yourself to giving the finl nswer. Good luk! Exerise

More information

Review Topic 14: Relationships between two numerical variables

Review Topic 14: Relationships between two numerical variables Review Topi 14: Reltionships etween two numeril vriles Multiple hoie 1. Whih of the following stterplots est demonstrtes line of est fit? A B C D E 2. The regression line eqution for the following grph

More information

2. Udi Manber, Gene Myers: Suffix arrays: A new method for online string searching, SIAM Journal on Computing 22:935-48,1993

2. Udi Manber, Gene Myers: Suffix arrays: A new method for online string searching, SIAM Journal on Computing 22:935-48,1993 7 Suffix rrys This exposition ws developed y Clemens Gröpl nd Knut Reinert. It is sed on the following sources, which re ll recommended reding: 1. Simon J. Puglisi, W. F. Smyth, nd Andrew Turpin, A txonomy

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

THE INFLUENCE OF MODEL RESOLUTION ON AN EXPRESSION OF THE ATMOSPHERIC BOUNDARY LAYER IN A SINGLE-COLUMN MODEL

THE INFLUENCE OF MODEL RESOLUTION ON AN EXPRESSION OF THE ATMOSPHERIC BOUNDARY LAYER IN A SINGLE-COLUMN MODEL THE INFLUENCE OF MODEL RESOLUTION ON AN EXPRESSION OF THE ATMOSPHERIC BOUNDARY LAYER IN A SINGLE-COLUMN MODEL P3.1 Kot Iwmur*, Hiroto Kitgw Jpn Meteorologil Ageny 1. INTRODUCTION Jpn Meteorologil Ageny

More information

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Siene Deprtment Compiler Design Spring 7 Lexil Anlysis Smple Exerises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sienes Institute 47 Admirlty Wy, Suite

More information

Automatic Synthesis of New Behaviors from a Library of Available Behaviors

Automatic Synthesis of New Behaviors from a Library of Available Behaviors Automti Synthesis of New Behviors from Lirry of Aville Behviors Giuseppe De Giomo Università di Rom L Spienz, Rom, Itly degiomo@dis.unirom1.it Sestin Srdin RMIT University, Melourne, Austrli ssrdin@s.rmit.edu.u

More information

AP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals

AP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals AP Clulus BC Chpter 8: Integrtion Tehniques, L Hopitl s Rule nd Improper Integrls 8. Bsi Integrtion Rules In this setion we will review vrious integrtion strtegies. Strtegies: I. Seprte the integrnd into

More information

@#? Text Search ] { "!" Nondeterministic Finite Automata. Transformation NFA to DFA and Simulation of NFA. Text Search Using Automata

@#? Text Search ] { ! Nondeterministic Finite Automata. Transformation NFA to DFA and Simulation of NFA. Text Search Using Automata g Text Serh @#? ~ Mrko Berezovský Rdek Mřík PAL 0 Nondeterministi Finite Automt n Trnsformtion NFA to DFA nd Simultion of NFA f Text Serh Using Automt A B R Power of Nondeterministi Approh u j Regulr Expression

More information

Part 4. Integration (with Proofs)

Part 4. Integration (with Proofs) Prt 4. Integrtion (with Proofs) 4.1 Definition Definition A prtition P of [, b] is finite set of points {x 0, x 1,..., x n } with = x 0 < x 1

More information

11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing

11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing I9 Introdution to Bioinformtis, 0 Indeing tehniques Yuzhen Ye (yye@indin.edu) Shool of Informtis & Computing, IUB Contents We hve seen indeing tehnique used in BLAST Applitions tht rely on n effiient indeing

More information

Chapter 8 Roots and Radicals

Chapter 8 Roots and Radicals Chpter 8 Roots nd Rdils 7 ROOTS AND RADICALS 8 Figure 8. Grphene is n inredily strong nd flexile mteril mde from ron. It n lso ondut eletriity. Notie the hexgonl grid pttern. (redit: AlexnderAIUS / Wikimedi

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Trigonometry Revision Sheet Q5 of Paper 2

Trigonometry Revision Sheet Q5 of Paper 2 Trigonometry Revision Sheet Q of Pper The Bsis - The Trigonometry setion is ll out tringles. We will normlly e given some of the sides or ngles of tringle nd we use formule nd rules to find the others.

More information

Probability. b a b. a b 32.

Probability. b a b. a b 32. Proility If n event n hppen in '' wys nd fil in '' wys, nd eh of these wys is eqully likely, then proility or the hne, or its hppening is, nd tht of its filing is eg, If in lottery there re prizes nd lnks,

More information

Ch. 2.3 Counting Sample Points. Cardinality of a Set

Ch. 2.3 Counting Sample Points. Cardinality of a Set Ch..3 Counting Smple Points CH 8 Crdinlity of Set Let S e set. If there re extly n distint elements in S, where n is nonnegtive integer, we sy S is finite set nd n is the rdinlity of S. The rdinlity of

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministi Finite utomt The Power of Guessing Tuesdy, Otoer 4, 2 Reding: Sipser.2 (first prt); Stoughton 3.3 3.5 S235 Lnguges nd utomt eprtment of omputer Siene Wellesley ollege Finite utomton (F)

More information

Semi-local string comparison

Semi-local string comparison Semi-lol string omprison Alexnder Tiskin http://www.ds.wrwik..uk/~tiskin Deprtment of Computer Siene University of Wrwik 1 The prolem 2 Effiient output representtion 3 The lgorithms 4 Conlusions nd future

More information

MATRIX INVERSE ON CONNEX PARALLEL ARCHITECTURE

MATRIX INVERSE ON CONNEX PARALLEL ARCHITECTURE U.P.B. Si. Bull., Series C, Vol. 75, Iss. 2, ISSN 86 354 MATRIX INVERSE ON CONNEX PARALLEL ARCHITECTURE An-Mri CALFA, Gheorghe ŞTEFAN 2 Designed for emedded omputtion in system on hip design, the Connex

More information

Hyers-Ulam stability of Pielou logistic difference equation

Hyers-Ulam stability of Pielou logistic difference equation vilble online t wwwisr-publitionsom/jns J Nonliner Si ppl, 0 (207, 35 322 Reserh rtile Journl Homepge: wwwtjnsom - wwwisr-publitionsom/jns Hyers-Ulm stbility of Pielou logisti differene eqution Soon-Mo

More information

Linear Algebra Introduction

Linear Algebra Introduction Introdution Wht is Liner Alger out? Liner Alger is rnh of mthemtis whih emerged yers k nd ws one of the pioneer rnhes of mthemtis Though, initilly it strted with solving of the simple liner eqution x +

More information

Electromagnetism Notes, NYU Spring 2018

Electromagnetism Notes, NYU Spring 2018 Eletromgnetism Notes, NYU Spring 208 April 2, 208 Ation formultion of EM. Free field desription Let us first onsider the free EM field, i.e. in the bsene of ny hrges or urrents. To tret this s mehnil system

More information

The DOACROSS statement

The DOACROSS statement The DOACROSS sttement Is prllel loop similr to DOALL, ut it llows prouer-onsumer type of synhroniztion. Synhroniztion is llowe from lower to higher itertions sine it is ssume tht lower itertions re selete

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information