Engineering a Lightweight Suffix Array Construction Algorithm (Extended Abstract)

Size: px
Start display at page:

Download "Engineering a Lightweight Suffix Array Construction Algorithm (Extended Abstract)"

Transcription

1 Engineering Lightweight Suffix Arry Constrution Algorithm (Extended Astrt) Giovnni Mnzini 1,2 nd Polo Ferrgin 3 1 Diprtimento di Informti, Università del Piemonte Orientle I Alessndri, Itly mnzini@mfn.unipmn.it 2 Istituto di Informti e Telemti, CNR I Pis, Itly 3 Diprtimento di Informti, Università dipis I Pis, Itly ferrgin@di.unipi.it 1 Introdution We onsider the prolem of omputing the suffix rry of text T [1,n]. This prolem onsists in sorting the suffixes of T in lexiogrphi order. The suffix rry [16] (orpt rry [9]) is simple, esy to ode, nd elegnt dt struture used for severl fundmentl string mthing prolems involving oth linguisti texts nd iologil dt [4, 11]. Reently, the interest in this dt struture hs een revitlized y its use s uilding lok for three novel pplitions: (1) the Burrows-Wheeler ompression lgorithm [3], whih is provly [17] nd prtilly [20] effetive ompression tool; (2) the onstrution of suint [10, 19] nd ompressed [7, 8] indexes; the ltter n store oth the input text nd its full-text index using roughly the sme spe used y trditionl ompressors for the text lone; nd (3) lgorithms for lustering nd rnking the nswers to user queries in we-serh engines [22]. In ll these pplitions the onstrution of the suffix rry is the omputtionl ottlenek oth in time nd spe. This motivted our interest in designing yet nother suffix rry onstrution lgorithm whih is fst nd lightweight in the sense tht it uses smll spe. The suffix rry onsists of n integers in the rnge [1,n]. This mens tht in theory it uses Θ(n log n) its of storge. However, in most pplitions the size of the text is smller thn 2 32 nd it is ustomry to store eh integer in four yte word; this yields totl spe oupny of 4n ytes. For wht onerns the ost of onstruting the suffix rry, the theoretilly est lgorithms run in Θ(n) time[5]. These lgorithms work y first uilding the suffix tree nd then otining the sorted suffixes vi n in-order trversl of the tree. However, suffix tree onstrution lgorithms re oth omplex nd spe onsuming sine they oupy t lest 15n ytes of working spe (or even more, depending on the Prtilly supported y Itlin MIUR projet on Tehnologies nd servies for enhned ontent delivery. R. Möhring nd R. Rmn (Eds.): ESA 2002, LNCS 2461, pp , Springer-Verlg Berlin Heidelerg 2002

2 Engineering Lightweight Suffix Arry Constrution Algorithm 699 text struture [14]). This mkes their use imprtil even for modertely lrge texts. For this reson, suffix rrys re usully uilt using lgorithms whih run in O(n log n) time ut hve smller spe oupny. Among these lgorithms the urrent leder is the qsufsort lgorithm y Lrsson nd Sdkne [15]. qsufsort uses 8n ytes 1 nd despite the O(n log n) worst se ound it is fster thn the lgorithms sed on suffix tree onstrution. Unfortuntely, the size of our douments hs grown muh more quikly thn the min memory of our omputers. Thus, it is desirle to uild suffix rry using s smll spe s possile. Reently, Itoh nd Tnk [12] nd Sewrd [21] hve proposed two new lgorithms whih only use 5n ytes. From the theoretil point of view these lgorithms hve Θ ( n 2 log n ) worst se omplexity. In prtie they re fster thn qsufsort when the verge lp is smll (the lp is the length of the longest ommon prefix etween two onseutive suffixes in the suffix rry). However, for texts with lrge verge lp these lgorithms n e slower thn qsufsort y ftor 100 or more. 2 In this pper we desrie nd extensively test new lightweight suffix sorting lgorithm. Our min ide is to use very smll mount of extr memory, in ddition to 5n ytes, to void the degrdtion in performne when the verge lp is lrge. To hieve this gol we mke use of engineered lgorithms nd d ho dt strutures. Our lgorithm uses 5n + n ytes, where is user tunle prmeter (in our tests ws t most 0.03). For files with verge lp smller thn 100 our lgorithm is fster thn Sewrd s lgorithm nd roughly two times fster thn qsufsort. The est lgorithm in our tests uses 5.03n ytes nd is fster thn qsufsort for ll files exept for the one with the lrgest verge lp. 2 Definitions nd Previous Results Let T [1,n] denote text over the lphet Σ. The suffix rry [16] (orpt rry [9]) for T is n rry SA[1,n] suh tht T [SA[1],n], T [SA[2],n], et. is the list of suffixes of T sorted in lexiogrphi order. For exmple, for T = then SA =[2, 1, 3, 5, 4] sine T [2, 5] = is the suffix with lower lexiogrphi rnk, followed y T [1, 5] =, followed y T [3, 5] = nd so on. 3 Given two strings v, w we write lp(v, w) to denote the length of their longest ommon prefix. The verge lp of text T is defined s the verge length of the longest ommon prefix etween two onseutive suffixes. The verge lp is rough mesure of the diffiulty of sorting the suffixes: if the verge lp is lrge we need in priniple to exmine mny hrters in order to estlish the reltive order of two suffixes. 1 Here nd in the following the spe oupny figures inlude the spe for the input text, for the suffix rry, nd for ny uxiliry dt struture used y the lgorithm. 2 This figure refers to Sewrd lgorithm [21]. We re in the proess of quiring the ode of the Itoh-Tnk lgorithm nd we hope we will e le to test it in the finl version of the pper. 3 Note tht to define the lexiogrphi order of the suffixes it is ustomry to ppend t the end of T speil end-of-text symol whih is smller thn ny symol in Σ.

3 700 Giovnni Mnzini nd Polo Ferrgin Tle 1. Files used in our experiments sorted in order of inresing verge lp Nme Ave. lp Mx. lp File size Desription ile ,047,392 The file ile of the Cnterury orpus e.oli ,815 4,638,690 The file E.oli of the Cnterury orpus wrld ,473,400 The file world192 of the Cnterury orpus sprot , ,617,186 Swiss prot dtse (sprot34.dt) rf , ,421,901 Contention of RFC text files howto ,720 39,422,105 Contention of Linux Howto text files reuters , ,711,151 Reuters news in XML formt linux , ,254,720 Linux kernel soure files (tr rhive) jdk ,334 69,728,899 html nd jv files from the JDK 1.3 do. hr22 1, ,999 34,553,758 Assemly of humn hromosome 22 g 8, ,970 86,630,400 g 3.0 soure files (tr rhive) Sine this is n lgorithmi engineering pper we mke the following ssumptions whih orrespond to the sitution most often fed in prtie. We ssume Σ 256 nd tht eh lphet symol is stored in one yte. Hene, the text T [1,n] tkes preisely n ytes. Furthermore, we ssume tht n 2 32 nd tht the strting position of eh suffix is stored in four yte word. Hene, the suffix rry SA[1,n] tkes preisely 4n ytes. In the following we use the term lightweight to denote suffix sorting lgorithm whih use 5n ytes plus some smll mount of extr memory (we re intentionlly giving n informl definition). Note tht 5n ytes re just enough to store the input text T nd the suffix rry SA. Although we do not lim tht 5n ytes re indeed required, we do not know of ny lgorithm using less spe. For testing the suffix rry onstrution lgorithms we use the olletion of filesshownintle1. These files ontin different kind of dt in different formts; they lso disply wide rnge of sizes nd of verge lp s. 2.1 The Lrsson-Sdkne qsufsort Algorithm The qsufsort lgorithm [15] is sed on the douling tehnique introdued in [13] nd first used for the onstrution of the suffix rry in [16]. Given two strings v, w nd t>0wewritev< t w if the length-t prefix of v is lexiogrphilly smller thn the length-t prefix of w. Similrly we define the symols t, = t nd so on. Let s 1,s 2 denote two suffixes nd ssume s 1 = t s 2 (tht is, T [s 1,n] nd T [s 2,n]hvelength-t ommon prefix). Let ŝ 1 = s 1 + t denote the suffix T [s 1 + t, n] nd similrly let ŝ 2 = s 2 + t. The fundmentl oservtion of the douling tehnique is tht s 1 2t s 2 ŝ 1 t ŝ 2. (1) In other words, we n derive the 2t order etween s 1 nd s 2 y looking t the rnk of ŝ 1 nd ŝ 2 in the < t order.

4 Engineering Lightweight Suffix Arry Constrution Algorithm 701 The lgorithm qsufsort works in rounds. At the eginning of the ith round the suffixes re lredy sorted ording to the 2 i ordering. In the ith round the lgorithm looks for groups of suffixes shring the first 2 i hrters nd sorts them ording to the 2 i ordering using Bentley-MIlroy ternry quiksort [1]. Beuse of (1) eh omprison in the quiksort lgorithm tkes O(1) time. After t most log n rounds ll the suffixes re sorted. Thnks to very lever dt orgniztion qsufsort only uses 8n ytes. Even more surprisingly, the whole lgorithm fits in two pges of len nd elegnt C ode. The experiments reported in [15] show tht qsufsort outperforms other suffix sorting lgorithm sed on either the douling tehnique or the suffix tree onstrution. The only lgorithm whih runs fster thn qsufsort, ut only for files with verge lp less thn 20, is the Bentley-Sedgewik multikey quiksort [2]. Multikey quiksort is diret omprison lgorithm sine it onsiders the suffixes s ordinry strings nd sorts them vi hrter-y-hrter omprison without tking dvntge of their speil struture. 2.2 The Itoh-Tnk two-stge Algorithm In [12] Itoh nd Tnk desrie suffix sorting lgorithm lled two-stge suffix sort (two-stge from now on). two-stge only uses the text T nd the suffix rry SA for totl spe oupny of 5n ytes. To desrie how it works, let us ssume Σ = {,,...,z}. Using ounting sort, two-stge initilly prtitions the suffixes into Σ ukets B,...,B z ording to their first hrter. Note tht uket is nothing more thn set of onseutive entries in the rry SA whih now is sorted ording to the 1 ordering. Within eh uket twostge distinguishes etween two types of suffixes: Type A suffixes in whih the seond hrter of the suffix is smller thn the first, nd Type B suffixes in whih the seond hrter is lrger thn or equl to the first suffix hrter. The ruil oservtion of lgorithm two-stge is tht when ll Type B suffixes re sorted we n derive the ordering of Type A suffixes. This is done with single pss over the rry SA; when we meet suffix T [i, n] welooktsuffix T [i 1,n]: if it is Type A suffix we move it to the first empty position of uket B T [i 1]. Type B suffixes re sorted using textook string sorting lgorithms: in their implementtion the uthors use MSD rdix sort [18] for sorting lrge groups of suffixes, Bentley-Sedgewik multikey quiksort for medium size groups, nd insertion sort for smll groups. Summing up, two-stge n e onsidered n dvned diret omprison lgorithm sine Type B suffixes re sorted y diret omprison wheres Type A suffixes re sorted y muh fster proedure whih tkes dvntge of the speil struture of the suffixes. In [12] the uthors ompre two-stge with three diret omprison lgorithms (quiksort, multikey quiksort, nd MSD rdix sort) nd with n erlier version of qsufsort. two-stge turns out to e roughly 4 times fster thn quiksort nd MSD rdix sort, nd 2 to 3 times fster thn multikey quiksort nd qsufsort. However, the files used for the experiments hve n verge lp of t

5 702 Giovnni Mnzini nd Polo Ferrgin most 31, nd we know tht the dvntge of douling lgorithms (like qsufsort) eome pprent for muh lrger verge lp s. 2.3 Sewrd opy Algorithm Independently of Itoh nd Tnk, in [21] Sewrd desries lightweight lgorithm, lled opy, whih is sed on onept similr to the Type A/Type B suffixes used y lgorithm two-stge. Using ounting sort, opy initilly sorts the rry SA ording to the 2 ordering. As efore we use the term uket to denote the ontiguous portion of SA ontining set of suffixes shring the sme first hrter. Similrly, we use the term smll uket to denote the ontiguous portion of SA ontining suffixes shring the first two hrters. Hene, there re Σ ukets eh one onsisting of Σ smll ukets. Note tht one or more (smll) ukets n e empty. opy sorts the ukets one t time strting with the one ontining the fewest suffixes, nd proeeding up to the lrgest one. Assume for simpliity tht Σ = {,,...,z}. To sort uket, let us sy uket B p, opy sorts the smll ukets p, p,..., pz. The ruil point of lgorithm opy is tht when uket B p is ompletely sorted, with simple pss over it opy sorts ll the smll ukets p, p,..., zp. These smll ukets re mrked s sorted nd therefore opy will skip them when their prent uket is sorted. As further improvement, Sewrd shows tht even the sorting of the smll uket pp n e voided sine its ordering n e derived from the ordering of the smll ukets p,..., po nd pq,..., pz. This trik is extremely effetive when working on files ontining long runs of identil hrters. Algorithm opy sorts the smll ukets using Bentley-MIlroy ternry quiksort. During this sorting the suffixes re onsidered tomi, tht is, eh omprison onsists of the omplete omprison of two suffixes. The stndrd trik of sorting the lrger side of the prtition lst nd eliminting til reursion ensures tht the mount of spe required y the reursion stk grows, in the worst se, logrithmilly with the size of the input text. In [21] Sewrd ompres tuned implementtion of opy with the qsufsort lgorithm on set of files with verge lp up to 400. In these tests opy outperforms qsufsort for ll files ut one. However, Sewrd reports tht opy is muh slower thn qsufsort when the verge lp exeeds thousnd, nd for this reson he suggests the use of qsufsort s fllk when the verge lp is lrge. 4 Sine the soure ode of oth qsufsort nd opy is ville 5,wehvetested oth lgorithms on our suite of test files whih hve n verge lp rnging from to (see Tle 1). The results of our experiments re reported in the top two rows of Tle 2 (for AMD Athlon proessor) nd Tle 3 (for Pentium III proessor). In ordne with Sewrd s results, opy is fster thn 4 In [21] Sewrd desries nother lgorithm, lled he,whihisfsterthnopy for files with lrger verge lp. However, lgorithm he uses 6n ytes. 5 Algorithm opy ws originlly oneived to split the input file into 1MB loks. We modified it to llow the omputtion of the suffix rry for the whole file.

6 Engineering Lightweight Suffix Arry Constrution Algorithm 703 qsufsort when the verge lp is smll, nd it is slower when the verge lp is lrge. The turning point ppers to e when the verge lp is in the rnge However, this is not the omplete story. For exmple for ll files the running time of qsufsort on the Pentium is smller thn the running time for the Athlon; this is not true for opy (see for exmple files jdk13 nd g). We onjeture tht differene in the he rhiteture nd ehvior ould explin this differene, nd we pln to investigte it in the full pper (see lso Setion 4). We n lso see tht the differene in performne etween the two lgorithms does not depend on the verge lp lone. The DNA file hr22 hs very lrge verge lp, nevertheless the two lgorithms hve similr running times. The file linux hs muh greter verge lp thn reuters nd roughly the sme size. Nevertheless, the differene in the running times etween qsufsort nd opy is smller for linux thn for reuters. The most strikingdt in Tles2 nd 3 re the running times for g:for this file lgorithm opy is times slower thn qsufsort. This is not eptle sine g is not pthologil file uilt to show the wekness of opy, onthe ontrry it is file downloded from very usy site nd we n expet tht there re other files like it on our omputers. 6 In the next setion we desrie new lgorithm whih uses severl tehniques for voiding suh tstrophi ehvior nd t the sme time retining the nie fetures of opy: the5n ytes spe oupny nd the good performne for files with moderte verge lp. 3 Our Contriution: Deep-Shllow Suffix Sorting Our strting point for the design of n effiient suffix rry onstrution lgorithm is Sewrd opy lgorithm. Within this lgorithm we reple the proedure used for sorting the smll ukets (i.e. the groups of suffixes hving the first two hrters in ommon). Insted of using Bentley-MIlroy ternry quiksort we use more sophistited tehnique. More preisely, we sort the smll ukets using Bentley-Sedgewik multikey quiksort nd we stop the reursion when we reh predefined depth L (tht is, when we hve to sort group of suffixes with length-l ommon prefix). At this point we swith to different string sorting lgorithm. This pproh hs severl dvntges: (1) it provides simple nd effiient men to detet the groups of suffixes with long ommon prefix; (2) euse of the limit L, the size of the reursion stk is ounded y predefined onstnt whih is independent of the size of the input text nd n e tuned y the user; (3) if the suffixes in the smll uket hve ommon prefixes whih never exeed L, ll the sorting is done y multikey quiksort whih is n extremely effiient string sorting lgorithm. We ll this pproh to suffix sorting deep-shllow sorting sine we mix n lgorithm for sorting suffixes with smll lp (shllow sorter) with n lgorithm (tully more thn one, s we shll see) for sorting suffixes with lrge lp (deep 6 As we hve lredy pointed out, lgorithm opy ws oneived to work on loks of dt of size t most 1MB. The reder should e wre tht we re using n lgorithm outside its intended domin!

7 704 Giovnni Mnzini nd Polo Ferrgin Tle 2. Running times (in seonds) for 1400 MHz AMD Athlon proessor, with 1GB min memory nd 256K L2 he. The operting system ws Dein GNU/Linux Dein 2.2. The ompiler ws g ver with options -O3 - fomit-frme-pointer. The tle reports (user + system) time verged over five runs. The running times do not inlude the time spent for reding the input files ile e.oli wrld sprot rf howto reuters linux jdk13 hr22 g qsufsort opy ds0 L = ds0 L = ds0 L = ds0 L = ds1 L = ds1 L = ds1 L = ds1 L = ds1 L = ds2 d = ds2 d = ds2 d = ds2 d = sorter). In the next setions we desrie severl deep sorting strtegies, i.e. lgorithms for sorting suffixes whih hve length-l ommon prefix. 3.1 Blind Sorting Let s 1,s 2,...,s m denote group of m suffixes with length-l ommon prefix tht we need to deep-sort. If m is smll (we will disuss lter wht this mens) we sort them using n lgorithm, lled lind sort, whih is sed on the lind trie dt struture introdued in [6, Set. 2.1] (see Fig. 1). Blind sorting simply onsists in inserting the strings s 1,...,s m one t time in n initilly empty lind trie; then we trverse the trie from left to right thus otining the strings sorted in lexiogrphi order. The insertion of string s i in the trie requires first phse in whih we sn s i nd simultneously trverse the trie until we reh lef l. Then we ompre s i with the string ssoited to lef l nd we determine the length of their ommon prefix. Finlly, we updte the trie dding the lef orresponding to s i (see [6] for detils nd for the merits of lind tries with respet to stndrd ompted tries). Oviously in the onstrution of the trie we ignore the first L hrters of eh suffix euse they re identil. Our implementtion of the lind sort lgorithm uses t most 36m ytes of memory. Therefore, we use it when the numer of suffixes to e sorted is less

8 Engineering Lightweight Suffix Arry Constrution Algorithm 705 Tle 3. Running times (in seonds) for 1000MHz Pentium III proessor, with 1GB min memory nd 256K L2 he. The operting system ws GNU/Linux Red Ht 7.1. The ompiler ws g ver with options -O3 -fomit-frmepointer. The tle reports (user + system) time verged over five runs. The running times do not inlude the time spent for reding the input files ile e.oli wrld sprot rf howto reuters linux jdk13 hr22 g qsufsort opy ds0 L = ds0 L = ds0 L = ds0 L = ds1 L = ds1 L = ds1 L = ds1 L = ds1 L = ds2 d = ds2 d = ds2 d = ds2 d = n 2000 thn B = If the text is 100MB long, this overhed is 1.8MB whih should e ompred with the 500MB required y the text nd the suffix rry. 7 If the numer of suffixes to e sorted is lrger thn B = n we sort them. Thus, the spe overhed of using lind sort is t most 9n 500 ytes. using Bentley-MIlroy ternry quiksort. However, with respet to lgorithm opy, we introdue the following two improvements: 1. As soon s we re working with group of suffixes smller thn B we stop the reursion nd we sort them using lind sort; 2. during eh prtitioning phse we ompute L S (resp. L L ) whih is the longest ommon prefix etween the pivot nd the strings whih re lexiogrphilly smller (resp. lrger) thn the pivot. When we sort the strings whih re smller (resp. lrger) thn the pivot we n skip the first L S (resp. L L ) hrters sine we know they onstitute ommon prefix. We hve lled the ove lgorithm ds0 nd its performnes re reported in Tles 2 nd 3 for severl vlues of the prmeter L (the depth t whih we 7 Although we elieve this is smll overhed, we point out tht the limit B = n 2000 ws hosen somewht ritrrily. Preliminry experimentl results show tht there is only mrginl degrdtion in performne when we tke B = n,orb = n In the future we pln to etter investigte the spe/time trdeoff introdued y this prmeter nd its impt on the he performne. 2000

9 706 Giovnni Mnzini nd Polo Ferrgin Compted Trie Blind Trie Fig. 1. A stndrd ompted trie (left) nd the orresponding lind trie (right) for the strings:,,,,,,. Eh internl node of the lind trie ontins n integer nd set of outgoing lelled rs. A node ontining the integer k represent set of strings whih hve length-k ommonprefixnddifferinthe(k +1)st hrter. The outgoing rs re lelled with the different hrters tht we find in position k +1. Note tht sine the outgoing rs re ordered lphetilly, y visiting the trie leves from left to right we get the strings in lexiogrphi order stop multikey quiksort nd we swith to the lind sort/quiksort lgorithms). We n see tht lgorithm ds0 is slower thn qsufsort only for the files jdk13 nd g. Ifweompreopy nd ds0 we notie tht our deep-shllow pproh hs redued the running time for g y ftor 10. This is ertinly good strt. As we shll see, we will e le to redue it gin y the sme ftor tking dvntge of the ft tht the strings we re sorting re ll suffixes of the sme text. 3.2 Indued Sorting One of the nie fetures of two-stge nd opy lgorithms is tht some of the suffixes re not sorted y diret omprison: insted their reltive order is derived in onstnt time from the ordering of other suffixes whih hve een lredy sorted. We use generliztion of this tehnique in the deep-sorting phse of our lgorithm. Assume we need to sort the suffixes s 1,...,s m whih hve length- L ommon prefix. We sn the first L hrters of s 1 looking t eh pir of onseutive hrters (e.g. T [s 1 ]T [s 1 +1], T [s 1 +1]T [s 1 +2], up to T [s 1 + L 2]T [s 1 + L 1]). As soon s we find pir of hrters, sy αβ, elonging to n lredy sorted smll uket αβ the ordering of s 1,...,s m n e derived from the ordering of αβ s follows.

10 Engineering Lightweight Suffix Arry Constrution Algorithm 707 Assume α = T [s 1 +t]ndβ = T [s 1 +t+1] for some t<l 1. Sine s 1,...,s m hve length-l ommon prefix, every s i ontins the pir αβ strting from position t. Hene αβ ontins m suffixes orresponding to s 1,...,s m (tht is, αβ ontins the suffixes strting t s 1 + t, s 2 + t,...,s m + t). Note tht these suffixes re not neessrily onseutive in αβ.sinethefirstt 1hrters of s 1,...,s m re identil, the ordering of s 1,...,s m n e derived from the ordering of the orresponding suffixes in αβ. Summing up, the ordering is done s follows: 1. We sort the suffixes s 1,...,s m ording to their strting position in the input text T [1,n]. This is done so tht in Step 3 we n use inry serh to nswer memership queries in the set s 1,...,s m. 2. Let ŝ denote the suffix strting t the text position T [s 1 + t]. We sn the smll uket αβ in order to find the position of ŝ within αβ. 3. We sn the suffixes preeding nd following ŝ in the smll uket αβ.for eh suffix s we hek whether the suffix strting t the position T [s t] is in the set s 1,...,s m ;ifsowemrkthesuffixs When m suffixes in αβ hve een mrked, we sn them from left to right. Sine αβ is sorted this gives us the orret ordering of s 1,...,s m. Oviously there is no gurntee tht in the length-l ommon prefix of s 1,..., s m there is pir of hrters elonging to n lredy sorted smll uket. In this se we simply resort to the quiksort/lind sort omintion. We ll this lgorithm ds1 nd its performnes re reported in Tles 2 nd 3 for severl vlues of L. Wenseethtds1 with L = 500 runs fster thn qsufsort for ll files exept g. In generl, ds1 ppers to e slightly slower thn ds0 for files with smll verge lp ut it is lerly fster for the files with lrge verge lp: forg it is 8-9 times fster. 3.3 Anhor Sorting Profiling shows tht the most ostly opertion of indued sorting is the snning of the smll uket αβ in serh of the position of suffix ŝ (Step 2 ove). We now show tht we n void this opertion if we re willing to use smll mount of extr memory. For fixed d>0 we prtition the text T [1,n]into n/d segments of length d: T [1,d],T[d +1, 2d] nd so on up to T [n d +1,n](for simpliity let us ssume tht d divides n). We define two rrys Anhor[ ] nd Offset[ ] ofsizen/d suh tht, for i =1,...,n/d: Offset[i] ontins the position of leftmost suffix whih strts in the ith segment nd elongs to n lredy sorted smll uket. If in the ith segment does not strt ny suffix elonging to n lredy sorted smll uket then Offset[i] =0. Let ŝ i denote the suffix whose strting position is stored Offset[i]. Anhor[i] ontins the position of ŝ i within its smll uket. 8 The mrking is done setting the most signifint it of s. This mens tht we n work with texts of size t most The sme restrition holds for qsufsort s well.

11 708 Giovnni Mnzini nd Polo Ferrgin The use of the rrys Anhor[ ] ndoffset[ ] is firly simple. Assume tht we need to sort the suffixes s 1,...,s m whih hve length-l ommon prefix. For j =1,...,m,lett j denote the segment ontining the strting position of s j.if ŝ tj (tht is, the leftmost lredy sorted suffix in segment t j ) strts within the first L hrters of s j (tht is, s j < ŝ tj <s j + L) thenwensorts 1,...,s m using the indued sorting lgorithm desried in the previous setion. However, wenskipstep2sinethepositionofŝ tj within its smll uket is stored in Anhor[t j ]. Oviously, it is possile tht for some j ŝ tj does not exist or nnot e used. However, sine the suffixes s 1,...,s m usully elong to different segments, we hve m possile ndidtes. In our implementtion mong the ville sorted suffixes ŝ tj s we use the one whose strting position is losest to the orresponding s j (tht is, we hoose j whih minimizes ŝ tj s j ; this helps Step 3 of indued sorting). If there is no ville sorted suffix, then we resort to the lind sort/quiksort omintion. For wht onerns the spe oupny of nhor sorting, we note tht in Offset[i] we n store the distne etween the eginning of the ith segment nd the leftmost sorted suffix. Hene Offset[i] <d.ifwetked<2 16 the rry Offset requires 2n/d ytes of storge. Sine eh entry of Anhor requires four ytes, the overll spe oupny is 6n/d ytes. In our tests we used t lest d = 500 whih yields n overhed of 6n 9n 500 ytes. If we dd the 500 ytes required y lind sorting with B = n 3n 2000, we get mximum overhed of t most 100 ytes. Hene, for 100MB text the overhed is t most 3MB, whih we onsider smll mount ompred with the 500MB used y the text nd the suffix rry. In Tles 2 nd 3 we report the running times of nhor sorting (under the nme ds2) ford rnging from 500 to 5000 nd L = d We see tht for the files with moderte verge lp ds2 with d = 500 is signifintly fster thn opy nd roughly two times fster thn qsufsort. For the files with lrge verge lp ds2 is fster thn qsufsort for ll files exept g. Forg ds2 is 15% slower thn qsufsort on the Athlon nd 50% slower on the Pentium. In our opinion this slowdown on single file is n eptle prie to py in exhnge for the redution in spe oupny hieved over qsufsort (5.03n ytes vs. 8n ytes). We elieve tht the possiility of uilding suffix rrys for lrger files hs more vlue thn greter effiieny in hndling files with very lrge verge lp. 4 Conlusions nd Further Work In this pper we hve presented novel lgorithm for uilding the suffix rry of text T [1,n]. Our lgorithm uses 5.03n ytes nd is fster thn ny other tested lgorithm. Only on single file our lgorithm is outperformed y qsufsort whih however uses 8n ytes. For pthologil inputs, i.e. texts with n verge lp of Θ(n), ll lightweight lgorithms tke Θ ( n 2 log n ) time. Although this worst se ehvior does not our in prtie, it is n interesting theoretil open question whether we n hieve O(n log n) timeusingo(n) spe in ddition to the spe required y the input text nd the suffix rry.

12 Referenes Engineering Lightweight Suffix Arry Constrution Algorithm 709 [1] J. L. Bentley nd M. D. MIlroy. Engineering sort funtion. Softwre Prtie nd Experiene, 23(11): , [2] J. L. Bentley nd R. Sedgewik. Fst lgorithms for sorting nd serhing strings. In Proeedings of the 8th ACM-SIAM Symposium on Disrete Algorithms, pges , [3] M. Burrows nd D. Wheeler. A lok sorting lossless dt ompression lgorithm. Tehnil Report 124, Digitl Equipment Corportion, [4] M. Crohemore nd W. Rytter. Text Algorithms. Oxford University Press, [5] M. Frh-Colton, P. Ferrgin, nd S. Muthukrishnn. On the sorting-omplexity of suffix tree onstrution. Journl of the ACM, 47(6): , [6] P. Ferrgin nd R. Grossi. The string B-tree: A new dt struture for string serh in externl memory nd its pplitions. Journl of the ACM, 46(2): , [7] P. Ferrgin nd G. Mnzini. Opportunisti dt strutures with pplitions. In Pro. of the 41st IEEE Symposium on Foundtions of Computer Siene, pges , [8] P. Ferrgin nd G. Mnzini. An experimentl study of n opportunisti index. In Pro. 12th ACM-SIAM Symposium on Disrete Algorithms, pges , [9] G. H. Gonnet, R. A. Bez-Ytes, nd T. Snider. New indies for text: PAT trees nd PAT rrys. In B. Frkes nd R. A. Bez-Ytes nd, editors, Informtion Retrievl: Dt Strutures nd Algorithms, hpter 5, pges Prentie-Hll, , 699 [10] R. Grossi nd J. Vitter. Compressed suffix rrys nd suffix trees with pplitions to text indexing nd string mthing. In Pro. of the 32nd ACM Symposium on Theory of Computing, pges , [11] D. Gusfield. Algorithms on Strings, Trees, nd Sequenes: Computer Siene nd Computtionl Biology. Cmridge University Press, [12] H. Itoh nd H. Tnk. An effiient method for in memory onstrution of suffix rrys. In Proeedings of the sixth Symposium on String Proessing nd Informtion Retrievl, SPIRE 99, pges IEEE Computer Soiety Press, , 701 [13] R. Krp, R. Miller, nd A. Rosenerg. Rpid Identifition of Repeted Ptterns in Strings, Arrys nd Trees. In Proeedings of the ACM Symposium on Theory of Computtion, pges , [14] S. Kurtz. Reduing the spe requirement of suffix trees. Softwre Prtie nd Experiene, 29(13): , [15] N. J. Lrsson nd K. Sdkne. Fster suffix sorting. Tehnil Report LU- CS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Deprtment of Computer Siene, Lund University, Sweden, , 700, 701 [16] U. Mner nd G. Myers. Suffix rrys: new method for on-line string serhes. SIAM Journl on Computing, 22(5): , , 699, 700 [17] G. Mnzini. An nlysis of the Burrows-Wheeler trnsform. Journl of the ACM, 48(3): , [18] P. M. MIlroy nd K. Bosti. Engineering rdix sort. Computing Systems, 6(1):5 27,

13 710 Giovnni Mnzini nd Polo Ferrgin [19] K. Sdkne. Compressed text dtses with effiient query lgorithms sed on the ompressed suffix rry. In Proeeding of the 11th Interntionl Symposium on Algorithms nd Computtion, pges Springer-Verlg, LNCS n. 1969, [20] J. Sewrd. The zip2 home pge, zip2/index.html. 698 [21] J. Sewrd. On the performne of BWT sorting lgorithms. In DCC: Dt Compression Conferene, pges IEEE Computer Soiety TCC, , 702 [22] O. Zmir nd O. Etzioni. Grouper: A dynmi lustering interfe to we serh results. Computer Networks, 31(11-16): ,

Engineering a Lightweight Suffix Array Construction Algorithm 1. Giovanni Manzini 2 and Paolo Ferragina 3

Engineering a Lightweight Suffix Array Construction Algorithm 1. Giovanni Manzini 2 and Paolo Ferragina 3 Algorithmi (2004) 40: 33 50 DOI: 10.1007/s00453-004-1094-1 Algorithmi 2004 Springer-Verlg New York, LLC Engineering Lightweight Suffix Arry Constrution Algorithm 1 Giovnni Mnzini 2 nd Polo Ferrgin 3 Astrt.

More information

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points:

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points: Eidgenössishe Tehnishe Hohshule Zürih Eole polytehnique fédérle de Zurih Politenio federle di Zurigo Federl Institute of Tehnology t Zurih Deprtement of Computer Siene. Novemer 0 Mrkus Püshel, Dvid Steurer

More information

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18 Computt onl Biology Leture 18 Genome Rerrngements Finding preserved genes We hve seen before how to rerrnge genome to obtin nother one bsed on: Reversls Knowledge of preserved bloks (or genes) Now we re

More information

1 PYTHAGORAS THEOREM 1. Given a right angled triangle, the square of the hypotenuse is equal to the sum of the squares of the other two sides.

1 PYTHAGORAS THEOREM 1. Given a right angled triangle, the square of the hypotenuse is equal to the sum of the squares of the other two sides. 1 PYTHAGORAS THEOREM 1 1 Pythgors Theorem In this setion we will present geometri proof of the fmous theorem of Pythgors. Given right ngled tringle, the squre of the hypotenuse is equl to the sum of the

More information

Finite State Automata and Determinisation

Finite State Automata and Determinisation Finite Stte Automt nd Deterministion Tim Dworn Jnury, 2016 Lnguges fs nf re df Deterministion 2 Outline 1 Lnguges 2 Finite Stte Automt (fs) 3 Non-deterministi Finite Stte Automt (nf) 4 Regulr Expressions

More information

Suffix Trays and Suffix Trists: Structures for Faster Text Indexing

Suffix Trays and Suffix Trists: Structures for Faster Text Indexing Suffix Trys nd Suffix Trists: Strutures for Fster Text Indexing Rihrd Cole Tsvi Kopelowitz Moshe Lewenstein rxiv:1311.1762v1 [s.ds] 7 Nov 2013 Astrt Suffix trees nd suffix rrys re two of the most widely

More information

Fast index for approximate string matching

Fast index for approximate string matching Fst index for pproximte string mthing Dekel Tsur Astrt We present n index tht stores text of length n suh tht given pttern of length m, ll the sustrings of the text tht re within Hmming distne (or edit

More information

Prefix-Free Regular-Expression Matching

Prefix-Free Regular-Expression Matching Prefix-Free Regulr-Expression Mthing Yo-Su Hn, Yjun Wng nd Derik Wood Deprtment of Computer Siene HKUST Prefix-Free Regulr-Expression Mthing p.1/15 Pttern Mthing Given pttern P nd text T, find ll sustrings

More information

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh Computtionl Biology Leture 8: Genome rerrngements, finding miml mthes Sd Mneimneh We hve seen how to rerrnge genome to otin nother one sed on reversls nd the knowledge of the preserved loks or genes. Now

More information

A Study on the Properties of Rational Triangles

A Study on the Properties of Rational Triangles Interntionl Journl of Mthemtis Reserh. ISSN 0976-5840 Volume 6, Numer (04), pp. 8-9 Interntionl Reserh Pulition House http://www.irphouse.om Study on the Properties of Rtionl Tringles M. Q. lm, M.R. Hssn

More information

6.5 Improper integrals

6.5 Improper integrals Eerpt from "Clulus" 3 AoPS In. www.rtofprolemsolving.om 6.5. IMPROPER INTEGRALS 6.5 Improper integrls As we ve seen, we use the definite integrl R f to ompute the re of the region under the grph of y =

More information

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of:

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of: 22: Union Fin CS 473u - Algorithms - Spring 2005 April 14, 2005 1 Union-Fin We wnt to mintin olletion of sets, uner the opertions of: 1. MkeSet(x) - rete set tht ontins the single element x. 2. Fin(x)

More information

CSE 332. Sorting. Data Abstractions. CSE 332: Data Abstractions. QuickSort Cutoff 1. Where We Are 2. Bounding The MAXIMUM Problem 4

CSE 332. Sorting. Data Abstractions. CSE 332: Data Abstractions. QuickSort Cutoff 1. Where We Are 2. Bounding The MAXIMUM Problem 4 Am Blnk Leture 13 Winter 2016 CSE 332 CSE 332: Dt Astrtions Sorting Dt Astrtions QuikSort Cutoff 1 Where We Are 2 For smll n, the reursion is wste. The onstnts on quik/merge sort re higher thn the ones

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design nd Anlysis LECTURE 5 Supplement Greedy Algorithms Cont d Minimizing lteness Ching (NOT overed in leture) Adm Smith 9/8/10 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov,

More information

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version A Lower Bound for the Length of Prtil Trnsversl in Ltin Squre, Revised Version Pooy Htmi nd Peter W. Shor Deprtment of Mthemtil Sienes, Shrif University of Tehnology, P.O.Bo 11365-9415, Tehrn, Irn Deprtment

More information

CS 573 Automata Theory and Formal Languages

CS 573 Automata Theory and Formal Languages Non-determinism Automt Theory nd Forml Lnguges Professor Leslie Lnder Leture # 3 Septemer 6, 2 To hieve our gol, we need the onept of Non-deterministi Finite Automton with -moves (NFA) An NFA is tuple

More information

NON-DETERMINISTIC FSA

NON-DETERMINISTIC FSA Tw o types of non-determinism: NON-DETERMINISTIC FS () Multiple strt-sttes; strt-sttes S Q. The lnguge L(M) ={x:x tkes M from some strt-stte to some finl-stte nd ll of x is proessed}. The string x = is

More information

8 THREE PHASE A.C. CIRCUITS

8 THREE PHASE A.C. CIRCUITS 8 THREE PHSE.. IRUITS The signls in hpter 7 were sinusoidl lternting voltges nd urrents of the so-lled single se type. n emf of suh type n e esily generted y rotting single loop of ondutor (or single winding),

More information

Activities. 4.1 Pythagoras' Theorem 4.2 Spirals 4.3 Clinometers 4.4 Radar 4.5 Posting Parcels 4.6 Interlocking Pipes 4.7 Sine Rule Notes and Solutions

Activities. 4.1 Pythagoras' Theorem 4.2 Spirals 4.3 Clinometers 4.4 Radar 4.5 Posting Parcels 4.6 Interlocking Pipes 4.7 Sine Rule Notes and Solutions MEP: Demonstrtion Projet UNIT 4: Trigonometry UNIT 4 Trigonometry tivities tivities 4. Pythgors' Theorem 4.2 Spirls 4.3 linometers 4.4 Rdr 4.5 Posting Prels 4.6 Interloking Pipes 4.7 Sine Rule Notes nd

More information

Introduction to Olympiad Inequalities

Introduction to Olympiad Inequalities Introdution to Olympid Inequlities Edutionl Studies Progrm HSSP Msshusetts Institute of Tehnology Snj Simonovikj Spring 207 Contents Wrm up nd Am-Gm inequlity 2. Elementry inequlities......................

More information

Project 6: Minigoals Towards Simplifying and Rewriting Expressions

Project 6: Minigoals Towards Simplifying and Rewriting Expressions MAT 51 Wldis Projet 6: Minigols Towrds Simplifying nd Rewriting Expressions The distriutive property nd like terms You hve proly lerned in previous lsses out dding like terms ut one prolem with the wy

More information

Nondeterministic Automata vs Deterministic Automata

Nondeterministic Automata vs Deterministic Automata Nondeterministi Automt vs Deterministi Automt We lerned tht NFA is onvenient model for showing the reltionships mong regulr grmmrs, FA, nd regulr expressions, nd designing them. However, we know tht n

More information

Chapter 4 State-Space Planning

Chapter 4 State-Space Planning Leture slides for Automted Plnning: Theory nd Prtie Chpter 4 Stte-Spe Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Motivtion Nerly ll plnning proedures re serh proedures Different

More information

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution Tehnishe Universität Münhen Winter term 29/ I7 Prof. J. Esprz / J. Křetínský / M. Luttenerger. Ferur 2 Solution Automt nd Forml Lnguges Homework 2 Due 5..29. Exerise 2. Let A e the following finite utomton:

More information

Lossless Compression Lossy Compression

Lossless Compression Lossy Compression Administrivi CSE 39 Introdution to Dt Compression Spring 23 Leture : Introdution to Dt Compression Entropy Prefix Codes Instrutor Prof. Alexnder Mohr mohr@s.sunys.edu offie hours: TBA We http://mnl.s.sunys.edu/lss/se39/24-fll/

More information

Lecture Notes No. 10

Lecture Notes No. 10 2.6 System Identifition, Estimtion, nd Lerning Leture otes o. Mrh 3, 26 6 Model Struture of Liner ime Invrint Systems 6. Model Struture In representing dynmil system, the first step is to find n pproprite

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design nd Anlysis LECTURE 8 Mx. lteness ont d Optiml Ching Adm Smith 9/12/2008 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov, K. Wyne Sheduling to Minimizing Lteness Minimizing

More information

Tutorial Worksheet. 1. Find all solutions to the linear system by following the given steps. x + 2y + 3z = 2 2x + 3y + z = 4.

Tutorial Worksheet. 1. Find all solutions to the linear system by following the given steps. x + 2y + 3z = 2 2x + 3y + z = 4. Mth 5 Tutoril Week 1 - Jnury 1 1 Nme Setion Tutoril Worksheet 1. Find ll solutions to the liner system by following the given steps x + y + z = x + y + z = 4. y + z = Step 1. Write down the rgumented mtrix

More information

Lecture 6: Coding theory

Lecture 6: Coding theory Leture 6: Coing theory Biology 429 Crl Bergstrom Ferury 4, 2008 Soures: This leture loosely follows Cover n Thoms Chpter 5 n Yeung Chpter 3. As usul, some of the text n equtions re tken iretly from those

More information

Counting Paths Between Vertices. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs

Counting Paths Between Vertices. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs Isomorphism of Grphs Definition The simple grphs G 1 = (V 1, E 1 ) n G = (V, E ) re isomorphi if there is ijetion (n oneto-one n onto funtion) f from V 1 to V with the property tht n re jent in G 1 if

More information

Section 1.3 Triangles

Section 1.3 Triangles Se 1.3 Tringles 21 Setion 1.3 Tringles LELING TRINGLE The line segments tht form tringle re lled the sides of the tringle. Eh pir of sides forms n ngle, lled n interior ngle, nd eh tringle hs three interior

More information

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6 CS311 Computtionl Strutures Regulr Lnguges nd Regulr Grmmrs Leture 6 1 Wht we know so fr: RLs re losed under produt, union nd * Every RL n e written s RE, nd every RE represents RL Every RL n e reognized

More information

Spacetime and the Quantum World Questions Fall 2010

Spacetime and the Quantum World Questions Fall 2010 Spetime nd the Quntum World Questions Fll 2010 1. Cliker Questions from Clss: (1) In toss of two die, wht is the proility tht the sum of the outomes is 6? () P (x 1 + x 2 = 6) = 1 36 - out 3% () P (x 1

More information

QUADRATIC EQUATION. Contents

QUADRATIC EQUATION. Contents QUADRATIC EQUATION Contents Topi Pge No. Theory 0-04 Exerise - 05-09 Exerise - 09-3 Exerise - 3 4-5 Exerise - 4 6 Answer Key 7-8 Syllus Qudrti equtions with rel oeffiients, reltions etween roots nd oeffiients,

More information

Arrow s Impossibility Theorem

Arrow s Impossibility Theorem Rep Voting Prdoxes Properties Arrow s Theorem Arrow s Impossiility Theorem Leture 12 Arrow s Impossiility Theorem Leture 12, Slide 1 Rep Voting Prdoxes Properties Arrow s Theorem Leture Overview 1 Rep

More information

Matrices SCHOOL OF ENGINEERING & BUILT ENVIRONMENT. Mathematics (c) 1. Definition of a Matrix

Matrices SCHOOL OF ENGINEERING & BUILT ENVIRONMENT. Mathematics (c) 1. Definition of a Matrix tries Definition of tri mtri is regulr rry of numers enlosed inside rkets SCHOOL OF ENGINEERING & UIL ENVIRONEN Emple he following re ll mtries: ), ) 9, themtis ), d) tries Definition of tri Size of tri

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Arrow s Impossibility Theorem

Arrow s Impossibility Theorem Rep Fun Gme Properties Arrow s Theorem Arrow s Impossiility Theorem Leture 12 Arrow s Impossiility Theorem Leture 12, Slide 1 Rep Fun Gme Properties Arrow s Theorem Leture Overview 1 Rep 2 Fun Gme 3 Properties

More information

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA Common intervls of genomes Mthieu Rffinot CNRS LIF Context: omprtive genomis. set of genomes prtilly/totlly nnotte Informtive group of genes or omins? Ex: COG tse Mny iffiulties! iology Wht re two similr

More information

AVL Trees. D Oisín Kidney. August 2, 2018

AVL Trees. D Oisín Kidney. August 2, 2018 AVL Trees D Oisín Kidne August 2, 2018 Astrt This is verified implementtion of AVL trees in Agd, tking ides primril from Conor MBride s pper How to Keep Your Neighours in Order [2] nd the Agd stndrd lirr

More information

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P.

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P. Chpter 7: The Riemnn Integrl When the derivtive is introdued, it is not hrd to see tht the it of the differene quotient should be equl to the slope of the tngent line, or when the horizontl xis is time

More information

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths Intermedite Mth Cirles Wednesdy 17 Otoer 01 Geometry II: Side Lengths Lst week we disussed vrious ngle properties. As we progressed through the evening, we proved mny results. This week, we will look t

More information

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem.

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem. 27 Lesson 2: The Pythgoren Theorem nd Similr Tringles A Brief Review of the Pythgoren Theorem. Rell tht n ngle whih mesures 90º is lled right ngle. If one of the ngles of tringle is right ngle, then we

More information

THE PYTHAGOREAN THEOREM

THE PYTHAGOREAN THEOREM THE PYTHAGOREAN THEOREM The Pythgoren Theorem is one of the most well-known nd widely used theorems in mthemtis. We will first look t n informl investigtion of the Pythgoren Theorem, nd then pply this

More information

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic Chpter 3 Vetor Spes In Chpter 2, we sw tht the set of imges possessed numer of onvenient properties. It turns out tht ny set tht possesses similr onvenient properties n e nlyzed in similr wy. In liner

More information

11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing

11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing I9 Introdution to Bioinformtis, 0 Indeing tehniques Yuzhen Ye (yye@indin.edu) Shool of Informtis & Computing, IUB Contents We hve seen indeing tehnique used in BLAST Applitions tht rely on n effiient indeing

More information

TOPIC: LINEAR ALGEBRA MATRICES

TOPIC: LINEAR ALGEBRA MATRICES Interntionl Blurete LECTUE NOTES for FUTHE MATHEMATICS Dr TOPIC: LINEA ALGEBA MATICES. DEFINITION OF A MATIX MATIX OPEATIONS.. THE DETEMINANT deta THE INVESE A -... SYSTEMS OF LINEA EQUATIONS. 8. THE AUGMENTED

More information

Data Structures LECTURE 10. Huffman coding. Example. Coding: problem definition

Data Structures LECTURE 10. Huffman coding. Example. Coding: problem definition Dt Strutures, Spring 24 L. Joskowiz Dt Strutures LEURE Humn oing Motivtion Uniquel eipherle oes Prei oes Humn oe onstrution Etensions n pplitions hpter 6.3 pp 385 392 in tetook Motivtion Suppose we wnt

More information

Learning Partially Observable Markov Models from First Passage Times

Learning Partially Observable Markov Models from First Passage Times Lerning Prtilly Oservle Mrkov s from First Pssge s Jérôme Cllut nd Pierre Dupont Europen Conferene on Mhine Lerning (ECML) 8 Septemer 7 Outline. FPT in models nd sequenes. Prtilly Oservle Mrkov s (POMMs).

More information

Part 4. Integration (with Proofs)

Part 4. Integration (with Proofs) Prt 4. Integrtion (with Proofs) 4.1 Definition Definition A prtition P of [, b] is finite set of points {x 0, x 1,..., x n } with = x 0 < x 1

More information

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 )

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 ) Neessry n suient onitions for some two vrile orthogonl esigns in orer 44 C. Koukouvinos, M. Mitrouli y, n Jennifer Seerry z Deite to Professor Anne Penfol Street Astrt We give new lgorithm whih llows us

More information

Symmetrical Components 1

Symmetrical Components 1 Symmetril Components. Introdution These notes should e red together with Setion. of your text. When performing stedy-stte nlysis of high voltge trnsmission systems, we mke use of the per-phse equivlent

More information

Engr354: Digital Logic Circuits

Engr354: Digital Logic Circuits Engr354: Digitl Logi Ciruits Chpter 4: Logi Optimiztion Curtis Nelson Logi Optimiztion In hpter 4 you will lern out: Synthesis of logi funtions; Anlysis of logi iruits; Tehniques for deriving minimum-ost

More information

Trigonometry Revision Sheet Q5 of Paper 2

Trigonometry Revision Sheet Q5 of Paper 2 Trigonometry Revision Sheet Q of Pper The Bsis - The Trigonometry setion is ll out tringles. We will normlly e given some of the sides or ngles of tringle nd we use formule nd rules to find the others.

More information

Linear Algebra Introduction

Linear Algebra Introduction Introdution Wht is Liner Alger out? Liner Alger is rnh of mthemtis whih emerged yers k nd ws one of the pioneer rnhes of mthemtis Though, initilly it strted with solving of the simple liner eqution x +

More information

Electromagnetism Notes, NYU Spring 2018

Electromagnetism Notes, NYU Spring 2018 Eletromgnetism Notes, NYU Spring 208 April 2, 208 Ation formultion of EM. Free field desription Let us first onsider the free EM field, i.e. in the bsene of ny hrges or urrents. To tret this s mehnil system

More information

Chapter 8 Roots and Radicals

Chapter 8 Roots and Radicals Chpter 8 Roots nd Rdils 7 ROOTS AND RADICALS 8 Figure 8. Grphene is n inredily strong nd flexile mteril mde from ron. It n lso ondut eletriity. Notie the hexgonl grid pttern. (redit: AlexnderAIUS / Wikimedi

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

CS 491G Combinatorial Optimization Lecture Notes

CS 491G Combinatorial Optimization Lecture Notes CS 491G Comintoril Optimiztion Leture Notes Dvi Owen July 30, August 1 1 Mthings Figure 1: two possile mthings in simple grph. Definition 1 Given grph G = V, E, mthing is olletion of eges M suh tht e i,

More information

PAIR OF LINEAR EQUATIONS IN TWO VARIABLES

PAIR OF LINEAR EQUATIONS IN TWO VARIABLES PAIR OF LINEAR EQUATIONS IN TWO VARIABLES. Two liner equtions in the sme two vriles re lled pir of liner equtions in two vriles. The most generl form of pir of liner equtions is x + y + 0 x + y + 0 where,,,,,,

More information

CS 2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2014

CS 2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2014 S 224 DIGITAL LOGI & STATE MAHINE DESIGN SPRING 214 DUE : Mrh 27, 214 HOMEWORK III READ : Relte portions of hpters VII n VIII ASSIGNMENT : There re three questions. Solve ll homework n exm prolems s shown

More information

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Siene Deprtment Compiler Design Spring 7 Lexil Anlysis Smple Exerises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sienes Institute 47 Admirlty Wy, Suite

More information

Generalization of 2-Corner Frequency Source Models Used in SMSIM

Generalization of 2-Corner Frequency Source Models Used in SMSIM Generliztion o 2-Corner Frequeny Soure Models Used in SMSIM Dvid M. Boore 26 Mrh 213, orreted Figure 1 nd 2 legends on 5 April 213, dditionl smll orretions on 29 My 213 Mny o the soure spetr models ville

More information

Ch. 2.3 Counting Sample Points. Cardinality of a Set

Ch. 2.3 Counting Sample Points. Cardinality of a Set Ch..3 Counting Smple Points CH 8 Crdinlity of Set Let S e set. If there re extly n distint elements in S, where n is nonnegtive integer, we sy S is finite set nd n is the rdinlity of S. The rdinlity of

More information

Discrete Structures Lecture 11

Discrete Structures Lecture 11 Introdution Good morning. In this setion we study funtions. A funtion is mpping from one set to nother set or, perhps, from one set to itself. We study the properties of funtions. A mpping my not e funtion.

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS The University of ottinghm SCHOOL OF COMPUTR SCIC A LVL 2 MODUL, SPRIG SMSTR 2015 2016 MACHIS AD THIR LAGUAGS ASWRS Time llowed TWO hours Cndidtes my omplete the front over of their nswer ook nd sign their

More information

System Validation (IN4387) November 2, 2012, 14:00-17:00

System Validation (IN4387) November 2, 2012, 14:00-17:00 System Vlidtion (IN4387) Novemer 2, 2012, 14:00-17:00 Importnt Notes. The exmintion omprises 5 question in 4 pges. Give omplete explntion nd do not onfine yourself to giving the finl nswer. Good luk! Exerise

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Maintaining Mathematical Proficiency

Maintaining Mathematical Proficiency Nme Dte hpter 9 Mintining Mthemtil Profiieny Simplify the epression. 1. 500. 189 3. 5 4. 4 3 5. 11 5 6. 8 Solve the proportion. 9 3 14 7. = 8. = 9. 1 7 5 4 = 4 10. 0 6 = 11. 7 4 10 = 1. 5 9 15 3 = 5 +

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Appendix C Partial discharges. 1. Relationship Between Measured and Actual Discharge Quantities

Appendix C Partial discharges. 1. Relationship Between Measured and Actual Discharge Quantities Appendi Prtil dishrges. Reltionship Between Mesured nd Atul Dishrge Quntities A dishrging smple my e simply represented y the euilent iruit in Figure. The pplied lternting oltge V is inresed until the

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministi Finite utomt The Power of Guessing Tuesdy, Otoer 4, 2 Reding: Sipser.2 (first prt); Stoughton 3.3 3.5 S235 Lnguges nd utomt eprtment of omputer Siene Wellesley ollege Finite utomton (F)

More information

Behavior Composition in the Presence of Failure

Behavior Composition in the Presence of Failure Behvior Composition in the Presene of Filure Sestin Srdin RMIT University, Melourne, Austrli Fio Ptrizi & Giuseppe De Giomo Spienz Univ. Rom, Itly KR 08, Sept. 2008, Sydney Austrli Introdution There re

More information

Can one hear the shape of a drum?

Can one hear the shape of a drum? Cn one her the shpe of drum? After M. K, C. Gordon, D. We, nd S. Wolpert Corentin Lén Università Degli Studi di Torino Diprtimento di Mtemti Giuseppe Peno UNITO Mthemtis Ph.D Seminrs Mondy 23 My 2016 Motivtion:

More information

2.4 Theoretical Foundations

2.4 Theoretical Foundations 2 Progrmming Lnguge Syntx 2.4 Theoretil Fountions As note in the min text, snners n prsers re se on the finite utomt n pushown utomt tht form the ottom two levels of the Chomsky lnguge hierrhy. At eh level

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

ANALYSIS AND MODELLING OF RAINFALL EVENTS

ANALYSIS AND MODELLING OF RAINFALL EVENTS Proeedings of the 14 th Interntionl Conferene on Environmentl Siene nd Tehnology Athens, Greee, 3-5 Septemer 215 ANALYSIS AND MODELLING OF RAINFALL EVENTS IOANNIDIS K., KARAGRIGORIOU A. nd LEKKAS D.F.

More information

Computing the Cyclic Edit Distance for Pattern Classification by Ranking Edit Paths

Computing the Cyclic Edit Distance for Pattern Classification by Ranking Edit Paths Computing the Cyli Edit Distne for Pttern Clssifition y Rnking Edit Pths Vítor M. Jiménez, Andrés Mrzl, Viente Plzón, nd Guillermo Peris DLSI, Universitt Jume I, 27 Cstellón, Spin {vjimenez,mrzl,plzon,peris}@uji.es

More information

Semi-local string comparison

Semi-local string comparison Semi-lol string omprison Alexnder Tiskin http://www.ds.wrwik..uk/~tiskin Deprtment of Computer Siene University of Wrwik 1 The prolem 2 Effiient output representtion 3 The lgorithms 4 Conlusions nd future

More information

Instructions. An 8.5 x 11 Cheat Sheet may also be used as an aid for this test. MUST be original handwriting.

Instructions. An 8.5 x 11 Cheat Sheet may also be used as an aid for this test. MUST be original handwriting. ID: B CSE 2021 Computer Orgniztion Midterm Test (Fll 2009) Instrutions This is losed ook, 80 minutes exm. The MIPS referene sheet my e used s n id for this test. An 8.5 x 11 Chet Sheet my lso e used s

More information

Exercise sheet 6: Solutions

Exercise sheet 6: Solutions Eerise sheet 6: Solutions Cvet emptor: These re merel etended hints, rther thn omplete solutions. 1. If grph G hs hromti numer k > 1, prove tht its verte set n e prtitioned into two nonempt sets V 1 nd

More information

Iowa Training Systems Trial Snus Hill Winery Madrid, IA

Iowa Training Systems Trial Snus Hill Winery Madrid, IA Iow Trining Systems Tril Snus Hill Winery Mdrid, IA Din R. Cohrn nd Gil R. Nonneke Deprtment of Hortiulture, Iow Stte University Bkground nd Rtionle: Over the lst severl yers, five sttes hve een evluting

More information

Chapter Gauss Quadrature Rule of Integration

Chapter Gauss Quadrature Rule of Integration Chpter 7. Guss Qudrture Rule o Integrtion Ater reding this hpter, you should e le to:. derive the Guss qudrture method or integrtion nd e le to use it to solve prolems, nd. use Guss qudrture method to

More information

ILLUSTRATING THE EXTENSION OF A SPECIAL PROPERTY OF CUBIC POLYNOMIALS TO NTH DEGREE POLYNOMIALS

ILLUSTRATING THE EXTENSION OF A SPECIAL PROPERTY OF CUBIC POLYNOMIALS TO NTH DEGREE POLYNOMIALS ILLUSTRATING THE EXTENSION OF A SPECIAL PROPERTY OF CUBIC POLYNOMIALS TO NTH DEGREE POLYNOMIALS Dvid Miller West Virgini University P.O. BOX 6310 30 Armstrong Hll Morgntown, WV 6506 millerd@mth.wvu.edu

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Dt Strutures nd Algorithm Xioqing Zheng zhengxq@fudn.edu.n String mthing prolem Pttern P ours with shift s in text T (or, equivlently, tht pttern P ours eginning t position s + in text T) if T[s +... s

More information

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b CS 294-2 9/11/04 Quntum Ciruit Model, Solovy-Kitev Theorem, BQP Fll 2004 Leture 4 1 Quntum Ciruit Model 1.1 Clssil Ciruits - Universl Gte Sets A lssil iruit implements multi-output oolen funtion f : {0,1}

More information

#A42 INTEGERS 11 (2011) ON THE CONDITIONED BINOMIAL COEFFICIENTS

#A42 INTEGERS 11 (2011) ON THE CONDITIONED BINOMIAL COEFFICIENTS #A42 INTEGERS 11 (2011 ON THE CONDITIONED BINOMIAL COEFFICIENTS Liqun To Shool of Mthemtil Sienes, Luoyng Norml University, Luoyng, Chin lqto@lynuedun Reeived: 12/24/10, Revised: 5/11/11, Aepted: 5/16/11,

More information

Review Topic 14: Relationships between two numerical variables

Review Topic 14: Relationships between two numerical variables Review Topi 14: Reltionships etween two numeril vriles Multiple hoie 1. Whih of the following stterplots est demonstrtes line of est fit? A B C D E 2. The regression line eqution for the following grph

More information

12.4 Similarity in Right Triangles

12.4 Similarity in Right Triangles Nme lss Dte 12.4 Similrit in Right Tringles Essentil Question: How does the ltitude to the hpotenuse of right tringle help ou use similr right tringles to solve prolems? Eplore Identifing Similrit in Right

More information

April 8, 2017 Math 9. Geometry. Solving vector problems. Problem. Prove that if vectors and satisfy, then.

April 8, 2017 Math 9. Geometry. Solving vector problems. Problem. Prove that if vectors and satisfy, then. pril 8, 2017 Mth 9 Geometry Solving vetor prolems Prolem Prove tht if vetors nd stisfy, then Solution 1 onsider the vetor ddition prllelogrm shown in the Figure Sine its digonls hve equl length,, the prllelogrm

More information

Polynomials. Polynomials. Curriculum Ready ACMNA:

Polynomials. Polynomials. Curriculum Ready ACMNA: Polynomils Polynomils Curriulum Redy ACMNA: 66 www.mthletis.om Polynomils POLYNOMIALS A polynomil is mthemtil expression with one vrile whose powers re neither negtive nor frtions. The power in eh expression

More information

= state, a = reading and q j

= state, a = reading and q j 4 Finite Automt CHAPTER 2 Finite Automt (FA) (i) Derterministi Finite Automt (DFA) A DFA, M Q, q,, F, Where, Q = set of sttes (finite) q Q = the strt/initil stte = input lphet (finite) (use only those

More information

6. Suppose lim = constant> 0. Which of the following does not hold?

6. Suppose lim = constant> 0. Which of the following does not hold? CSE 0-00 Nme Test 00 points UTA Stuent ID # Multiple Choie Write your nswer to the LEFT of eh prolem 5 points eh The k lrgest numers in file of n numers n e foun using Θ(k) memory in Θ(n lg k) time using

More information

2. There are an infinite number of possible triangles, all similar, with three given angles whose sum is 180.

2. There are an infinite number of possible triangles, all similar, with three given angles whose sum is 180. SECTION 8-1 11 CHAPTER 8 Setion 8 1. There re n infinite numer of possile tringles, ll similr, with three given ngles whose sum is 180. 4. If two ngles α nd β of tringle re known, the third ngle n e found

More information

Comparing the Pre-image and Image of a Dilation

Comparing the Pre-image and Image of a Dilation hpter Summry Key Terms Postultes nd Theorems similr tringles (.1) inluded ngle (.2) inluded side (.2) geometri men (.) indiret mesurement (.6) ngle-ngle Similrity Theorem (.2) Side-Side-Side Similrity

More information

Probability. b a b. a b 32.

Probability. b a b. a b 32. Proility If n event n hppen in '' wys nd fil in '' wys, nd eh of these wys is eqully likely, then proility or the hne, or its hppening is, nd tht of its filing is eg, If in lottery there re prizes nd lnks,

More information

Table of Content. c 1 / 5

Table of Content. c 1 / 5 Tehnil Informtion - t nd t Temperture for Controlger 03-2018 en Tble of Content Introdution....................................................................... 2 Definitions for t nd t..............................................................

More information

Precise timing analysis for direct-mapped caches

Precise timing analysis for direct-mapped caches Preise timing nlysis for diret-mpped hes Sidhrt Andlm, Roopk Sinh, Prth Roop, Alin Girult, Jn Reineke To ite this version: Sidhrt Andlm, Roopk Sinh, Prth Roop, Alin Girult, Jn Reineke. Preise timing nlysis

More information

Pythagoras Theorem. Pythagoras Theorem. Curriculum Ready ACMMG: 222, 245.

Pythagoras Theorem. Pythagoras Theorem. Curriculum Ready ACMMG: 222, 245. Pythgors Theorem Pythgors Theorem Curriulum Redy ACMMG:, 45 www.mthletis.om Fill in these spes with ny other interesting fts you n find out Pythgors. In the world of Mthemtis, Pythgors is legend. He lived

More information