Algorithms in Bioinformatics: A Practical Introduction. Suffix tree


1 Algorithms in Bioinformatics: A Practical Introduction. Suffix tree

2 Overview
What is a suffix tree?
Simple applications of suffix trees
Linear-time algorithm for constructing a suffix tree
Suffix array
FM-index
1-mismatch search

3 Suffix Trie
E.g. consider the string S = acacag$.
Suffix trie: a trie of all possible suffixes of S.
Suffixes of S: 1 acacag$, 2 cacag$, 3 acag$, 4 cag$, 5 ag$, 6 g$, 7 $.

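4 Suffix Tree
The suffix tree for S = acacag$ is obtained from the suffix trie by merging nodes with only one child.
The path label of a node v is denoted as α(v).
[Figure: suffix tree for S, with one edge label and one leaf edge pointed out.]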

5 Size of Suffix Tree (I)
How big is a suffix tree?
A suffix tree has exactly n leaves and at most 2n-1 edges.
The total length of all edge labels is O(n^2).
Can we store a suffix tree using o(n^2) bit space?

6 Size of Suffix Tree (II)
A suffix tree has exactly n leaves and at most 2n-1 edges.
Note that each edge label can be represented using 2 indices into S.
Thus, a suffix tree can be represented using O(n log n) bits.
[Figure: suffix tree for S with every edge label replaced by a (start, end) index pair such as 4,7 or 2,3.]
Note: the end index of every leaf edge should be 7, the last index of S. Thus, for leaf edges, we only need to store the start index.

7 Property of suffix tree
Fact: for any internal node v in the suffix tree, if the path label of v is α(v) = ap (a is a character, p is a string), then there exists another node w in the suffix tree such that α(w) = p.
Proof: skipped.
Definition of suffix link: for any internal node v, define its suffix link sl(v) = w.

8 Suffix link example
[Figure: suffix links drawn on the suffix tree of the example string S.]

9 Generalized suffix tree
Build a suffix tree for two or more strings.
E.g. S1 = acgat#, S2 = cgt$.
[Figure: generalized suffix tree for S1 and S2.]

10 Applications of Suffix Tree

11 Exact string matching problem
To find all occurrences of Q in S (searching):
Search for the node x in the suffix tree which represents Q.
All the leaves in the subtree rooted at x are the occurrences.
Time: O(|Q| + occ), where occ is the total number of occurrences.
E.g. S = acacag$, Q = aca. Occurrences: 1, 3.

12 Longest repeated substring problem
To find the longest repeated substring in S:
Find the deepest internal node.
Time: O(n).
E.g. S = acacag$. The longest repeat is aca.
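As a small cross-check of the slide above, the sketch below finds the longest repeat without building a suffix tree. It swaps in a different technique: it sorts the suffixes and takes the longest common prefix of neighbouring suffixes, which corresponds to the deepest internal node. The function name is hypothetical, and the sort makes it far slower than the O(n) bound on the slide.

```python
def longest_repeated_substring(s):
    """Longest substring occurring at least twice in s (naive sketch)."""
    # Suffix start positions in lexicographic order of the suffixes.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    best = ""
    for a, b in zip(order, order[1:]):
        # Longest common prefix of two neighbouring suffixes.
        k = 0
        while a + k < len(s) and b + k < len(s) and s[a + k] == s[b + k]:
            k += 1
        if k > len(best):
            best = s[a:a + k]
    return best


print(longest_repeated_substring("acacag$"))
```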

13 Longest common substring problem
To find the longest common substring of two or more sequences.
Note: in 1970, Don Knuth conjectured that a linear-time algorithm for this problem is impossible. Now we know that it can be solved in linear time.
E.g. consider two strings S1 and S2:
1. Build a generalized suffix tree for S1# and S2$.
2. Then, mark each internal node with leaves representing suffixes of both S1 and S2.
3. Report the deepest marked node.

14 Example for the longest common substring
E.g. S1 = acgat#, S2 = cgt$.
The longest common substring is cg. Its length is 2.
[Figure: generalized suffix tree for S1 and S2 with the deepest marked node.]
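The example can be reproduced with a short sketch. It is not the generalized-suffix-tree algorithm itself: as a stand-in it sorts the suffixes of S1#S2$ and inspects neighbouring suffixes that come from different strings, which plays the role of the "marked" internal nodes. The function name is hypothetical, and the inputs are assumed not to contain # or $.

```python
def longest_common_substring(s1, s2):
    """Longest substring shared by s1 and s2 (naive sorted-suffix sketch)."""
    text = s1 + "#" + s2 + "$"
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for a, b in zip(order, order[1:]):
        # Skip neighbours that start in the same input string.
        if (a <= len(s1)) == (b <= len(s1)):
            continue
        k = 0
        while (a + k < len(text) and b + k < len(text)
               and text[a + k] == text[b + k]):
            k += 1
        if k > len(best):
            best = text[a:a + k]
    return best


print(longest_common_substring("acgat", "cgt"))
```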

15 Longest common prefix (I)
Given a string S. For any i, j, let lcp(i, j) denote the length of the longest common prefix of suffix i and suffix j of S.
E.g. S = acacag$. The longest common prefix of suffix 1 and suffix 3 is aca, so lcp(1, 3) = 3.

16 Longest common prefix (II)
Note that the lowest common ancestor (lca) of leaves i and j identifies the longest common prefix: lcp(i, j) = |α(lca(i, j))|.
A well-known result: consider a tree of size n. After an O(n)-time preprocessing, the lca of any two nodes can be returned in O(1) time.
First obtained by Harel and Tarjan (SIAM J. Comp. 1984).
Simplified by Schieber and Vishkin (SIAM J. Comp. 1988).
Based on the above result: after an O(n)-time preprocessing, for any suffix i and suffix j, we can compute their longest common prefix in O(1) time.

17 Finding Palindrome (I)
Given a string S, a palindrome is a substring u of S such that u = u^r (u reversed). E.g. ACAGACA.
Consider a palindrome u = S[i..i+|u|-1]. u is called a maximal palindrome if S[i'..j'] is not a palindrome for any interval [i'..j'] properly containing [i..i+|u|-1].
Note that every palindrome is contained in a maximal palindrome. Thus, maximal palindromes are a compact way to represent all palindromes.
A complemented palindrome is a string u such that u = ū^r (the reverse of the complement of u). E.g. ACAUGU.
A maximal complemented palindrome is defined similarly.

18 Finding Palindrome (II)
Recall that the recognition site of a restriction enzyme usually has the form of a complemented palindrome. This motivates the following two problems.
The palindrome problem: given a string S (representing the genome) of length n, locate all maximal palindromes in S.
The complemented palindrome problem: given a string S (representing the genome) of length n, locate all maximal complemented palindromes in S.

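19 Properties of palindrome (I)
If S[i..i+k-1] = S^r[n-i+1..n-i+k], then u = S[i-k+1..i+k-1] is an odd-length palindrome.
[Figure: S with positions 1, i-k+1, i, i+k-1, n marked, aligned against S^r with positions 1, n-i+1, n marked.]

20 Properties of palindrome (II)
If S[i..i+k-1] = S^r[n-i+2..n-i+k+1], then u = S[i-k..i+k-1] is an even-length palindrome.
[Figure: S with positions 1, i-k, i, i+k-1, n marked, aligned against S^r with positions 1, n-i+2, n marked.]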

21 Solution to the palindrome problem
Preprocess S and S^r so that any longest common prefix query can be answered in constant time.
For i = 1 to n:
Find the longest common prefix of (S_i, S^r_{n-i+1}), where S_i denotes suffix i of S. If its length is k, we find an odd-length maximal palindrome S[i-k+1..i+k-1].
Find the longest common prefix of (S_i, S^r_{n-i+2}). If its length is k, we find an even-length maximal palindrome S[i-k..i+k-1].
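The loop above can be sketched directly. The version below is only illustrative: it replaces the constant-time longest-common-prefix query with a naive character scan (so it runs in quadratic rather than linear time), and the function names are hypothetical. Positions in the result are 1-based, as on the slides.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b (naive scan)."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def maximal_palindromes(s):
    """Return (start, end) pairs, one per centre, using the two slide formulas."""
    n = len(s)
    r = s[::-1]                      # S^r
    found = []
    for i in range(1, n + 1):
        # Odd length: suffix i of S against suffix n-i+1 of S^r.
        k = lcp(s[i - 1:], r[n - i:])
        found.append((i - k + 1, i + k - 1))
        # Even length: suffix i of S against suffix n-i+2 of S^r.
        if i >= 2:
            k = lcp(s[i - 1:], r[n - i + 1:])
            if k > 0:
                found.append((i - k, i + k - 1))
    return found


print(maximal_palindromes("ACAGACA"))
```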

22 Extracting embedded suffix tree from generalized suffix tree
Input: the generalized suffix tree T of K strings S1, ..., SK.
Aim: compute the suffix tree T_i of the string S_i.
[Figure: T and the extracted tree T_1 for S1 = acgat#, S2 = cgt$.]

23 Extracting embedded suffix tree from generalized suffix tree
Observation: T_i is a subtree of T such that
the leaves of T_i are the leaves of T corresponding to S_i;
the internal nodes of T_i are the lowest common ancestors of some leaves for S_i;
the edges of T_i can be inferred from the ancestor-descendant relationship among those nodes.
[Figure: T and the extracted tree T_1 for S1 = acgat#, S2 = cgt$.]

24 Extracting embedded suffix tree from generalized suffix tree
[Figure: the extraction shown step by step on the example tree.]

25 Common substrings of more than 2 strings (I)
Given a set of strings (protein or DNA sequences), we want to know which substrings are common to a large number of these strings.
Why is this question important? DNA and protein sequences evolve. If a substring occurs commonly in a wide range of species, this may mean that the substring is critical for correct functionality.

26 Common substrings of more than 2 strings (II)
Given K strings whose total length is n.
For every 2 ≤ k ≤ K, define l(k) to be the length of the longest substring common to at least k of these strings.
The problem is to compute l(k) for all k.

27 Common substrings of more than 2 strings (III)
Example: consider a set of 5 strings { sandollar, sandlot, handler, grand, pantry }. Then we have

k   l(k)   corresponding substring
2   4      sand
3   3      and
4   3      and
5   2      an

28 Common substrings of more than 2 strings (IV)
Illustrating the solution by example, with K = 3 strings S1, S2, S3 terminated by $, # and % respectively.
1. Build a generalized suffix tree T for the K strings in O(n) time.
[Figure: generalized suffix tree T for the three strings.]

29 Common substrings of more than 2 strings (V)
2. By traversing T, for each internal node v, compute its string depth. In total, O(n) time.
[Figure: T with the string depth written at each internal node.]

30 Common substrings of more than 2 strings (VI)
3. By traversing T, for each internal node v, compute C(v). [C(v) is defined as the number of distinct termination symbols in the subtree rooted at v.] This step takes O(Kn) time.
[Figure: T with C(v) written at each internal node.]

31 Common substrings of more than 2 strings (VII)
4. Traverse T and visit every internal node v. For each v, if V(C(v)) < string depth of v, set V(C(v)) = string depth of v. [After step 4, V(k) = the length of the longest substring common to exactly k of these strings.]
5. Set l(K) = V(K). For i = K-1 down to 2, l(i) = max{l(i+1), V(i)}.
These two steps take O(n) time.
For our example, V(2) = 3 and V(3) = 2. Thus l(3) = 2 and l(2) = 3.
In total, this algorithm takes O(Kn) time.
Actually, we can improve this algorithm to O(n) time by means of lcp!
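Steps 4 and 5 can be illustrated on the five-string example from slide 27. The sketch below is a brute-force stand-in: instead of traversing a generalized suffix tree it enumerates every substring to obtain V(k), then applies the recurrence l(i) = max{l(i+1), V(i)} exactly as in step 5. The function name and the dictionary return type are assumptions made for this illustration.

```python
def common_substring_lengths(strings):
    """Return {k: l(k)} for 2 <= k <= K, by brute force over all substrings."""
    K = len(strings)
    owners = {}                                  # substring -> set of string ids
    for idx, s in enumerate(strings):
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                owners.setdefault(s[i:j], set()).add(idx)
    # V[k]: longest substring common to exactly k strings (step 4).
    V = [0] * (K + 1)
    for sub, who in owners.items():
        V[len(who)] = max(V[len(who)], len(sub))
    # Step 5: l(K) = V(K), then l(i) = max(l(i+1), V(i)).
    ell = {K: V[K]}
    for k in range(K - 1, 1, -1):
        ell[k] = max(ell[k + 1], V[k])
    return ell


words = ["sandollar", "sandlot", "handler", "grand", "pantry"]
print(common_substring_lengths(words))
```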

32 Linear time algorithm for constructing suffix tree

33 Straightforward construction of suffix tree
Consider S = s1 s2 ... sn where sn = $.
Algorithm:
Initialize the tree with only a root.
For i = n to 1: include S[i..n] into the tree.
Time: O(n^2).
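A minimal sketch of this quadratic construction follows. For brevity it builds the uncompressed suffix trie (one node per character) rather than the suffix tree with merged edges, and it uses nested dictionaries with a "leaf" key as a hypothetical node representation. The query function shows the exact-matching use from slide 11: walk down along Q, then collect the leaves below.

```python
def build_suffix_trie(s):
    """Insert suffixes S[i..n] for i = n down to 1 into a dict-based trie."""
    root = {}
    for i in range(len(s), 0, -1):            # 1-based start position i
        node = root
        for ch in s[i - 1:]:
            node = node.setdefault(ch, {})
        node["leaf"] = i                      # remember where the suffix starts
    return root


def occurrences(trie, q):
    """Sorted 1-based positions where q occurs, or [] if it does not."""
    node = trie
    for ch in q:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key == "leaf":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("acacag$")
print(occurrences(trie, "aca"))
```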

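34 Example of construction
[Figure: the tree after initialization and after each for-loop iteration I5, I4, I3, I2, I1.]

35 Construction of generalized suffix tree
[Figure: starting from the tree of the first string, the suffixes of the second string (terminated by #) are inserted one by one in a second for-loop.]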

36 Can we construct a suffix tree in o(n^2) time?
Yes. We can construct it in O(n) time.
Weiner's algorithm [1973]: linear time for constant-size alphabet, but much space.
McCreight's algorithm [JACM 1976]: linear time for constant-size alphabet, quadratic space.
Ukkonen's algorithm [Algorithmica, 1995]: online algorithm, linear time for constant-size alphabet, less space.
Farach's algorithm [FOCS 1997]: linear time for general alphabet.
Hon, Sadakane, and Sung's algorithm [FOCS 2003]: O(n) bit space and O(n log^ε n) time for 0 < ε < 1; O(n) bit space and O(n) time for suffix array construction.
We will discuss Farach's algorithm later.

37 Idea
Build the odd suffix tree and the even suffix tree. Then, merge the odd and even suffix trees.
[Figure: even suffix tree and odd suffix tree.]

38 Idea
Input: a string S of length n.
1. Recursively compute the suffix tree T_o of all suffixes beginning at the odd positions. T_o is of size n/2.
2. From T_o, compute T_e, which is the suffix tree for all suffixes beginning at the even positions.
3. Merge T_o and T_e to form the suffix tree for S.

39 Stage 1: Constructing the odd suffix tree
Given a string S[1..n], we generate a new string S'[1..n/2] as follows.
We map pairs of characters into single characters: S[1..2], S[3..4], S[5..6], ..., S[n-1..n].
Remove the duplicates from the pairs of characters and sort them by radix sort.
S'[i] = rank of S[2i-1..2i] in the sorted list, for i = 1, 2, ..., n/2.
By recursion, we get the suffix tree T' for S'.
Convert T' to the odd suffix tree T_o.
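The pair-ranking step can be sketched in a few lines. This is only an illustration: Python's built-in sort stands in for the radix sort named on the slide, the function name is hypothetical, and the input aaabbbabbaba is an assumed 12-character example string.

```python
def pair_ranks(s):
    """Map S[1..n] (n even) to S'[1..n/2]: the rank of each character pair."""
    pairs = [s[i:i + 2] for i in range(0, len(s), 2)]
    # Deduplicate, sort, and number the distinct pairs from 1.
    rank = {p: r for r, p in enumerate(sorted(set(pairs)), start=1)}
    return [rank[p] for p in pairs]


print(pair_ranks("aaabbbabbaba"))
```

The recursion then runs on the returned sequence, which is half as long as S.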

40 Example (I)
S = aaabbbabbaba$
S[1..2] = aa, S[3..4] = ab, S[5..6] = bb, S[7..8] = ab, S[9..10] = ba, S[11..12] = ba.
By stable sort, aa < ab < ba < bb.
Rank(aa) = 1, Rank(ab) = 2, Rank(ba) = 3, Rank(bb) = 4.
So S' = 124233$.

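41 Example (II)
By recursion, construct the suffix tree T' for S'.
[Figure: suffix tree T' for S'.]

42 Example (III)
Convert T' to the odd tree.
[Figure: T' with its leaves relabelled by the corresponding odd positions of S.]
This is not yet a suffix tree.

43 Example (IV)
Refine the odd tree T_o.
[Figure: the refined odd tree T_o.]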

44 Time complexity for building the odd tree
Let Time(n) be the time to build the suffix tree for a string of length n.
Stable sorting and refinement of the odd tree take O(n) time.
Building the suffix tree for S' takes Time(n/2).
So, Stage 1 takes Time(n/2) + O(n) time.

45 Stage 2: Build the even tree
1. Generate the lex-ordering of the leaves in T_e.
2. For any two adjacent leaves 2i and 2j, find lcp(2i, 2j).
3. Construct the even tree T_e from left to right (according to the lex-ordering).

46 Build the even tree (Step 1)
From T_o, we get the lex-ordering of the leaves in T_o.
To generate the lex-ordering of the leaves in T_e:
For each leaf i in T_o, get the preceding character c = S[i-1] and form a pair (c, i). Each pair represents the even suffix i-1.
Perform stable sorting on those pairs by their first component.
We get the lex-ordering of the leaves in T_e.
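This step can be sketched as follows. The code is illustrative only: the function name is hypothetical, Python's stable built-in sort stands in for a linear-time stable sort, and the test input aaabbbabbaba$ with odd ordering 13 < 1 < 7 < 3 < 11 < 9 < 5 is an assumed example. Leaf 1 is skipped because it has no preceding character.

```python
def even_order_from_odd(s, odd_order):
    """Derive the lex-ordering of even suffixes from that of odd suffixes.

    s is the text, odd_order lists odd 1-based positions in lex order.
    """
    # Pair (S[i-1], i) represents the even suffix starting at i-1.
    pairs = [(s[i - 2], i) for i in odd_order if i >= 2]
    # Stable sort on the character only: ties keep the odd-suffix order.
    pairs.sort(key=lambda p: p[0])
    return [i - 1 for _, i in pairs]


print(even_order_from_odd("aaabbbabbaba$", [13, 1, 7, 3, 11, 9, 5]))
```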

47 Example
S = aaabbbabbaba$
Lex-ordering of the leaves in T_o: 13 < 1 < 7 < 3 < 11 < 9 < 5.
The pairs are: (a,13), ( ,1), (b,7), (a,3), (a,11), (b,9), (b,5), where leaf 1 has no preceding character.
After stable sorting, we have ( ,1), (a,13), (a,3), (a,11), (b,7), (b,9), (b,5).
Hence, the lex-ordering of the leaves of T_e is 12 < 2 < 10 < 6 < 8 < 4.

48 Build the even tree (Step 2)
For any two adjacent leaves 2i and 2j, we first find lcp(2i, 2j).
Observation: lcp(2i, 2j) = lcp(2i+1, 2j+1) + 1 if S[2i] = S[2j]; 0 otherwise.
Proof: if S[2i] ≠ S[2j], then lcp(2i, 2j) = 0. Otherwise, lcp(2i, 2j) = 1 + lcp(2i+1, 2j+1).
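The observation translates into a one-line recurrence. In the sketch below the lcp of two odd suffixes is computed by a naive scan, standing in for the constant-time lca query on T_o. Function names are hypothetical, positions are 1-based, and the test string aaabbbabbaba$ is an assumed example.

```python
def lcp_naive(s, i, j):
    """lcp of suffixes i and j (1-based) by direct comparison."""
    k = 0
    while i + k <= len(s) and j + k <= len(s) and s[i + k - 1] == s[j + k - 1]:
        k += 1
    return k


def lcp_even(s, i, j):
    """lcp of two even suffixes via the odd suffixes that follow them."""
    if s[i - 1] != s[j - 1]:
        return 0
    return 1 + lcp_naive(s, i + 1, j + 1)    # i+1 and j+1 are odd positions


s = "aaabbbabbaba$"
order = [12, 2, 10, 6, 8, 4]
print([lcp_even(s, a, b) for a, b in zip(order, order[1:])])
```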

49 Example
Recall the lex-ordering of the leaves: 12 < 2 < 10 < 6 < 8 < 4.
By the previous observation, we have lcp(8,4) = lcp(9,5) + 1 = 2.
Similarly, we have lcp(12,2) = 1, lcp(2,10) = 1, lcp(10,6) = 0, lcp(6,8) = 1, lcp(8,4) = 2.

50 Build the even tree (Step 3)
Construct the even tree T_e from left to right.

51 Build the even tree (Step 3)

52 Time complexity for building the even tree
Step 1: O(n) time
Step 2: O(n) time
Step 3: O(n) time

53 Stage 3: Merge odd and even trees
We can merge T_o and T_e by DFS. However, it takes O(n^2) time.
[Figure: the odd tree and the even tree, with edges labelled by index pairs.]

54 Stage 3: Merge odd and even trees
We merge T_o and T_e by DFS.
We merge two edges as long as they start with the same character.
The merge is ended when one edge is longer than the other.

55 Merge odd and even trees
We merge T_o and T_e by DFS.
We merge two edges as long as they start with the same character.
The merge is ended when one edge is longer than the other.
[Figure: the merged tree.]

56 Merge odd and even trees
The merging may over-merge some nodes. To correct the tree, we need to unmerge some nodes.
[Figure: the merged tree with the over-merged nodes highlighted.]

57 Definition of L() and d()
For every node u which may be over-merged, there exist two leaves 2i and 2j-1 such that u = lca(2i, 2j-1).
Let L(u) denote the correct depth of u, that is, lcp(2i, 2j-1).
Note that lcp(2i, 2j-1) = 1 + lcp(2i+1, 2j) if S[2i] = S[2j-1]; 0 otherwise.
Let v be lca(2i+1, 2j). Define d(u) = v.
Note that d() is equivalent to the suffix link!

58 Example of d()
[Figure: the merged tree with the d() links drawn.]

59 Relationship between L() and d()
Suppose u = lca(2i, 2j-1).
Note 1: if u is not the root, then S[2i] = S[2j-1].
Note 2: lcp(2i, 2j-1) = 1 + lcp(2i+1, 2j) if S[2i] = S[2j-1]; 0 otherwise.
Note 3: L(d(u)) = lcp(2i+1, 2j).
Hence, L(u) = 1 + L(d(u)) if u is not the root. Otherwise, L(u) = 0.
Lemma: L(u) = the length of the path of d() links from u to the root (the purple path in the figure).

60 Example of L()
[Figure: the merged tree with L(u) written next to each internal node; the values range from 1 to 4.]

61 Unmerge the over-merged nodes based on L() (I)
[Figure: the merged tree annotated with L() before unmerging.]

62 Unmerge the over-merged nodes based on L() (II)
[Figure: the tree after unmerging.]

63 Time complexity for merging
Merging the trees using DFS takes O(n) time.
Computing the links d() takes O(n) time.
Computing L() takes O(n) time.
Unmerging takes O(n) time.

64 Total time complexity of Farach's algorithm
Stage 1: Time(n/2) + O(n)
Stage 2: O(n)
Stage 3: O(n)
Thus, Time(n) = Time(n/2) + O(n). Solving the recurrence gives Time(n) = O(n).

65 Disadvantage of suffix tree
The suffix tree is space inefficient. It requires O(n|Σ| log n) bits.
Manber and Myers (SIAM J. Comp 1993) propose a new data structure, called the suffix array, which has similar functionality to the suffix tree. Moreover, it only requires O(n log n) bits.

66 Suffix Array (I)
It is just the sorted suffixes. E.g. consider S = acacag$:

i   SA[i]   Suffix
1   7       $
2   1       acacag$
3   3       acag$
4   5       ag$
5   2       cacag$
6   4       cag$
7   6       g$

The suffix array is an array of n indices. Thus, it takes O(n log n) bits.
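For small inputs the definition can be executed directly. The sketch below simply sorts the suffixes with Python's built-in sort, so it takes far more than linear time and is meant only to make the definition concrete. The function name is hypothetical, and it relies on "$" sorting before the letters, which holds for ASCII.

```python
def suffix_array(s):
    """1-based start positions of the suffixes of s in lexicographic order."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


print(suffix_array("acacag$"))
```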

67 Observation
The leaves of the suffix tree are in suffix array order.
[Figure: suffix tree of S next to the table of SA[i] and suffixes.]

68 Linear time construction of suffix array from suffix tree
Recall that the suffix tree T of S[1..n] can be constructed in O(n) time.
Then, by a lexical depth-first traversal of T, the suffix array of S is obtained. This takes O(n) time.
However, the space used during construction is the same as that for the suffix tree! This defeats the purpose of the suffix array.
Today, we can build the suffix array using O(n) bit space and O(n) time.

69 range(T,Q)
For a pattern Q, its occurrences in T form a consecutive SA range.
Example: for T = acacag$, ca occurs in SA[5] and SA[6].
Definition: we call range(T,Q) = [st..ed] if Q is a prefix of every T_j for j = SA[st], SA[st+1], ..., SA[ed], where T_j = suffix j of T = T[j..n].
Example: range(T, ca) = [5..6].

70 Find occurrence of query Q in string T using suffix array
Input: (1) the suffix array of a string T of length n and (2) a query Q of length m.
Aim: check if Q occurs in T.
Idea: binary search!

71 Algorithm
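The algorithm body of this slide did not survive transcription. A minimal sketch of a binary search over the suffix array, consistent with the worked example that follows, might look like this. It is a variant rather than the slide's exact pseudocode: it uses 0-based array indices internally and moves the bounds to mid ± 1, and the function name is hypothetical.

```python
def sa_search(t, sa, q):
    """Return a 1-based position where q occurs in t, or None.

    sa holds the 1-based suffix array of t.
    """
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        cand = t[start:start + len(q)]       # first |q| characters of the suffix
        if cand == q:
            return sa[mid]
        if cand < q:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


print(sa_search("acacag$", [7, 1, 3, 5, 2, 4, 6], "acag"))
```

Each step compares up to m characters, which gives the O(m log n) worst case discussed below.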

72 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, R = 7, M = (L+R)/2 = 4.

73 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, R = 7, M = (L+R)/2 = 4.
suffix-SA[M] = ag$ > Q. Set R = M = 4.

74 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, R = 4, M = (L+R)/2 = 2.
suffix-SA[M] = acacag$ < Q. Set L = M = 2.

75 Example
Consider T = acacag$ and pattern Q = acag.
L = 2, R = 4, M = (L+R)/2 = 3.
Q is a prefix of suffix-SA[M] = acag$. The pattern Q is found at SA[M] = 3.

76 Can we do better?
During each step of the binary search, we need to compare Q with a suffix using O(m) time, which is time consuming. Can we do better?
We have the following observation. Suppose LCP(Q, suffix-SA[L]) is l and LCP(Q, suffix-SA[R]) is r. Then, LCP(Q, suffix-SA[M]) ≥ min{l, r}.
Below, we describe how to utilize this observation to speed up the computation.

77 Algorithm
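The algorithm body of this slide is missing from the transcription. One way to use the observation is sketched below, under the assumption that the comparison at M may skip the first min(l, r) characters. The function and helper names are hypothetical, array indices are 0-based internally, and sa is assumed to be non-empty.

```python
def sa_search_lcp(t, sa, q):
    """Binary search on the 1-based suffix array sa of t, reusing known LCPs."""
    def extend(pos, k):
        # Grow a known match of length k between q and the suffix at pos.
        while k < len(q) and pos - 1 + k < len(t) and q[k] == t[pos - 1 + k]:
            k += 1
        return k

    L, R = 0, len(sa) - 1
    l, r = extend(sa[L], 0), extend(sa[R], 0)
    if l == len(q):
        return sa[L]
    if r == len(q):
        return sa[R]
    while R - L > 1:
        M = (L + R) // 2
        m = extend(sa[M], min(l, r))         # skip min(l, r) characters
        if m == len(q):
            return sa[M]
        nxt = sa[M] - 1 + m                  # first mismatching text position
        if nxt < len(t) and t[nxt] > q[m]:
            R, r = M, m                      # suffix-SA[M] > Q
        else:
            L, l = M, m                      # suffix-SA[M] < Q
    return None


print(sa_search_lcp("acacag$", [7, 1, 3, 5, 2, 4, 6], "acag"))
```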

78 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, l = 0; R = 7, r = 0; mlr = min(l, r) = 0.
M = (L+R)/2 = 4.

79 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, l = 0; R = 7, r = 0; mlr = min(l, r) = 0.
M = (L+R)/2 = 4, m = 1.
The (m+1)-th character of suffix-SA[M] is g. The (m+1)-th character of Q is c. So, suffix-SA[M] > Q.
Set R = M = 4 and r = m = 1.

80 Example
Consider T = acacag$ and pattern Q = acag.
L = 1, l = 0; R = 4, r = 1; mlr = min(l, r) = 0.
M = (L+R)/2 = 2, m = 3.
The (m+1)-th character of suffix-SA[M] is c. The (m+1)-th character of Q is g. So, suffix-SA[M] < Q.
Set L = M = 2 and l = m = 3.

81 Example
Consider T = acacag$ and pattern Q = acag.
L = 2, l = 3; R = 4, r = 1; mlr = min(l, r) = 1.
M = (L+R)/2 = 3, m = 4.
The pattern Q is found at SA[M] = 3.

82 Time analysis
Binary search performs log n comparisons.
Each comparison takes at most O(m) time.
In the worst case, O(m log n) time.
Myers and Manber report that, in practice, the time is O(m + log n).

83 Suffix array and suffix tree
We have shown one example of replacing the suffix tree by the suffix array.
Note that most applications of the suffix tree can be solved using the suffix array with some time blow-up!
When space is limited, replacing the suffix tree by the suffix array is a good choice.

84 The size is still too big!
Why? DNA sequences can be very long! E.g. fly: ~100M bases, human: ~3G bases, tree: ~9G bases.
Space to store an indexing data structure for the human genome:
Suffix tree: ~40G bytes
Suffix array: ~13G bytes
Can we further reduce the space?

85 Solution
Grossi, Vitter (STOC2000): compressed suffix array (CSA).
Ferragina, Manzini (FOCS2000): FM-index.
Both of them can be stored in O(n) bit space.
For the human genome, both CSA and FM-index can be stored within 2G bytes.

86 FM-index
Consider a text T = acacag$. The FM-index stores:
A. The string BW = g$ccaaa, where BW[i] = T[SA[i]-1].
B. C[x] = total number of occurrences of symbols less than x. E.g. C[a] = 1, C[c] = 4, C[g] = 6, C[t] = 7.
C. A data structure occ(x, i) which tells us the number of occurrences of x in BW[1..i] using O(1) time.
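The three components can be computed naively for a small text, which helps when checking the numbers on this slide. The sketch below is an illustration only: it builds the suffix array by sorting, and its occ() counts by scanning BW instead of answering in O(1) time. The function names are hypothetical.

```python
def build_fm_index(t):
    """Return (BW, C) for a text t that ends with a unique smallest symbol."""
    sa = sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])
    # BW[i] = T[SA[i]-1]; for SA[i] = 1 the index wraps to the last symbol.
    bw = "".join(t[i - 2] for i in sa)
    C, total = {}, 0
    for ch in sorted(set(t)):
        C[ch] = total                        # symbols strictly smaller than ch
        total += t.count(ch)
    return bw, C


def occ(bw, x, i):
    """Number of occurrences of x in BW[1..i] (naive scan)."""
    return bw[:i].count(x)


bw, C = build_fm_index("acacag$")
print(bw, C, occ(bw, "c", 6))
```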

87 Data structure for answering the occ(x, i) query
Given the text BW[1..n], we divide BW[1..n] into buckets of size log^2 n.
For each bucket i = 1, ..., n/log^2 n, we store P[i] = number of x's in BW[1..i log^2 n].
Each bucket is further subdivided into log n sub-buckets of size log n.
For each sub-bucket j of bucket i, we store Q[i][j] = number of x's in the first j sub-buckets of bucket i.
[Figure: BW split into n/log^2 n buckets of log^2 n symbols, each split into sub-buckets of log n symbols such as 00x0xxx0x0.]

88 Data structure for answering the occ(x, i) query
We also need a lookup table rank(b, k), where b is any string of length (log n)/2 and 1 ≤ k ≤ (log n)/2:
rank(b, k) = number of x's in the first k characters of b.
Number of entries = 2^((log n)/2) · (log n)/2 = (√n log n)/2.
Each entry takes O(log log n) bits.
Total space = O(√n log n log log n) = o(n) bits.

89 Space complexity of the occ() data structure
P[1..n/log^2 n] uses O(n/log n) bits.
Q[1..n/log^2 n][1..log n] uses O(n log log n / log n) bits.
rank(b, k) uses 2^((log n)/2) · ((log n)/2) · O(log log n) = o(n) bits.
In total, we use O(n log log n / log n) bits.

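90 How to compute occ(x, i)?
Suppose log n = 10, so buckets have 100 symbols and sub-buckets have 10 symbols.
To compute occ(x, 327), write 327 = 3·100 + 2·10 + 7: three full buckets, two full sub-buckets of the fourth bucket, and 7 remaining symbols, which are looked up as 5 + 2 because the table strings have length (log n)/2 = 5.
The result is P[3] + Q[4][2] + rank(00x0x, 5) + rank(xx0x0, 2).
Hence, O(1) time to compute occ(x, i).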
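The bucket decomposition can be sketched as below. It is a simplified illustration, not the o(n)-bit structure: the bucket width b is passed in explicitly instead of being log n, buckets are numbered from 0, and the final lookup in the rank() table is replaced by a scan over fewer than b symbols. The function name is hypothetical.

```python
def make_occ(bw, x, b):
    """Return occ(i) = number of x in bw[:i], using buckets of b*b and b."""
    big = b * b
    P, Q = [0], []            # P[i]: count in the first i buckets
    for start in range(0, len(bw), big):
        bucket = bw[start:start + big]
        row = [0]             # row[j]: count in the first j sub-buckets
        for s in range(0, len(bucket), b):
            row.append(row[-1] + bucket[s:s + b].count(x))
        Q.append(row)
        P.append(P[-1] + row[-1])

    def occ(i):
        bi, rest = divmod(i, big)
        if rest == 0:
            return P[bi]
        sj, tail = divmod(rest, b)
        start = bi * big + sj * b
        # Scan of the last 'tail' symbols stands in for the rank() table.
        return P[bi] + Q[bi][sj] + bw[start:start + tail].count(x)

    return occ


bw = "00x0xxx0x0" * 40
occ = make_occ(bw, "x", 10)
print(occ(327), bw[:327].count("x"))
```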

91 Size of FM-index
Structure A can be stored in 2n bits.
Structure B can be stored in O(log n) bits.
Structure C can be stored in O(n log log n / log n) bits.
In total, the size of the FM-index is O(n) bits.

92 Observation
C[x] + occ(x, i) is the number of suffixes smaller than or equal to x T_SA[i], where T_j denotes suffix j of T.
Example: C[c] = 4 and occ(c, 6) = 2. The number of suffixes smaller than c T[SA[6]..n] = ccag$ is 6.

93 Lemma
Suppose range(T,Q) is [st..ed]. Then, range(T,xQ) = [p..q] where
p = C[x] + occ(x, st-1) + 1
q = C[x] + occ(x, ed)
Proof: p = 1 + number of suffixes strictly smaller than xQ. The latter term = number of suffixes smaller than or equal to x T_SA[st-1].
q = number of suffixes smaller than or equal to x T_SA[ed].

94 Backward search
Given the text T and the FM-index, we want to determine if Q exists in T.
Algorithm BW_exist(Q[1..m])
1. x = Q[m]; i = m;
2. /* find range(T, Q[m]) */ st = C[x] + 1; ed = C[x+1], where x+1 denotes the symbol that follows x in the alphabet order;
3. while (st ≤ ed and i > 1) { /* find range(T, Q[i-1..m]) */ x = Q[i-1]; st = C[x] + occ(x, st-1) + 1; ed = C[x] + occ(x, ed); i = i - 1; }
4. if st > ed, then the pattern is not found; else the pattern is found.
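A compact executable sketch of the backward search follows. To stay self-contained it rebuilds BW and C naively from the text on every call and counts occurrences by scanning, so it does not reflect the O(m) running time of a real FM-index. The function name and the choice to return the range (or None) instead of a yes/no answer are assumptions for this illustration.

```python
def backward_search(t, q):
    """Return range(T, Q) as a 1-based pair (st, ed), or None if Q is absent."""
    sa = sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])
    bw = "".join(t[i - 2] for i in sa)       # BW[i] = T[SA[i]-1]
    C, total = {}, 0
    for ch in sorted(set(t)):
        C[ch] = total
        total += t.count(ch)

    def occ(x, i):
        return bw[:i].count(x)               # naive stand-in for O(1) occ

    x = q[-1]
    if x not in C:
        return None
    st, ed = C[x] + 1, C[x] + t.count(x)     # range(T, Q[m])
    for x in reversed(q[:-1]):               # prepend Q[m-1], ..., Q[1]
        if x not in C:
            return None
        st = C[x] + occ(x, st - 1) + 1
        ed = C[x] + occ(x, ed)
        if st > ed:
            return None
    return (st, ed)


print(backward_search("acacag$", "aca"))
```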

95 Example
T = acacag$, Q = aca.
Q[3..3] = a:
st = C[a] + 1 = 1 + 1 = 2
ed = C[c] = 4

96 Example
T = acacag$, Q = aca.
Q[2..3] = ca:
st = C[c] + occ(c, st_old - 1) + 1 = 4 + 0 + 1 = 5
ed = C[c] + occ(c, ed_old) = 4 + 2 = 6

97 Example
T = acacag$, Q = aca.
Q[1..3] = aca:
st = C[a] + occ(a, st_old - 1) + 1 = 1 + 0 + 1 = 2
ed = C[a] + occ(a, ed_old) = 1 + 2 = 3

98 Example
T = acacag$, Q = aca.
Q[1..3] = aca: st = 2 and ed = 3, so st ≤ ed.
Q occurs in T!

99 Time complexity of backward search
To find a pattern Q[1..m]:
Steps 1, 2, and 4 can be computed in O(1) time.
For step 3, we need to iterate the loop m-1 times. Each iteration of the loop can be computed in O(1) time. The loop takes O(m) time.
In total, O(m) time for backward search.

100 Conclusion
The suffix tree is a powerful data structure which has a lot of applications in computational biology.
Problems:
The suffix tree is too big! This can be solved using the CSA and the FM-index.
The suffix tree can only solve exact match problems! (Most biology problems are approximate match problems!)
Many works have been done in this area, but they are still not practical. This is one of the important areas to explore!

101 1-mismatch problem

102 1-mismatch problem
Index: the suffix tree of the text T[1..n].
For any pattern P[1..m], the 1-mismatch problem finds all occurrences of P in T that have Hamming distance at most 1.
Example: P = ACGT, T = AACGTGGCCAACTTGGA.

103 Naïve solution
Index: create the suffix tree for T.
Algorithm for a query:
Generate all possible 1-mismatch patterns of P.
Find the occurrences of every 1-mismatch pattern.
Running time:
There are |Σ|m possible 1-mismatch patterns.
Using the suffix tree, it takes O(m) time to find the occurrences of each 1-mismatch pattern.
In total, O(|Σ| m^2 + occ) time, that is, O(m^2 + occ) for a constant-size alphabet, where occ is the total number of occurrences.
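The naive strategy can be tried on the example from the previous slide. In the sketch below Python's str.find stands in for the suffix-tree lookup, so only the pattern-generation logic mirrors the slide. The function name and the explicit alphabet parameter are assumptions made for this illustration.

```python
def one_mismatch_naive(t, p, alphabet):
    """Sorted 1-based positions where p matches t with at most 1 mismatch."""
    # P itself plus every pattern that differs from P in exactly one position.
    patterns = {p}
    for j in range(len(p)):
        for c in alphabet:
            if c != p[j]:
                patterns.add(p[:j] + c + p[j + 1:])
    hits = set()
    for pat in patterns:
        pos = t.find(pat)
        while pos != -1:
            hits.add(pos + 1)
            pos = t.find(pat, pos + 1)
    return sorted(hits)


print(one_mismatch_naive("AACGTGGCCAACTTGGA", "ACGT", "ACGT"))
```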

104 Any other solutions?
Cobbs [CPM 1995]: O(n log n) bit index, O(m^2 + occ) query time.
Amir et al [Journal of Algorithms 2000]: O(n log^3 n) bit index, O(m log n log log n + occ) query time.
Buchsbaum et al [ESA 2000]: O(n log^2 n) bit index, O(m log log n + occ) query time.
Cole et al [SODA 2004]: O(n log^2 n) bit index, O(m + log log n + occ) query time.
Trinh et al [CPM 2004]: O(n log n) bit index, O(m log n + occ) query time.
Lam et al [ISAAC 2005]: O(n sqrt(log n)) bit index, O(m log log n + occ) query time.
Chan et al [ESA 2006]: O(n log n) bit index, O(m + log log n + occ) query time.
Today, we have a look at the solution of Trinh et al.

105 Index
Suffix array of T: SA[1..n].
Inverse suffix array of T: SA^-1[1..n], where SA[SA^-1[i]] = i.
Definition: we call range(T,P) = [st..ed] if P is a prefix of every T_j for j = SA[st], SA[st+1], ..., SA[ed], where T_j = suffix j of T = T[j..n].
Example: for T = acacag$, range(T, ca) = [5..6].

106 Lemma 1 (Forward search)
Assume [st..ed] = range(T,P). For any character c, we can compute [st'..ed'] = range(T,Pc) in O(log n) time.
Proof: binary search on SA.

107 Lemma 2
Assume [st1..ed1] = range(T,P1) and [st2..ed2] = range(T,P2). We can compute [st..ed] = range(T,P1P2) in O(log n) time.
Proof: let the length of P1 be k.
Note that T_SA[st1], T_SA[st1+1], ..., T_SA[ed1] are lexicographically increasing and all begin with P1.
Hence, T_SA[st1]+k, T_SA[st1+1]+k, ..., T_SA[ed1]+k are lexicographically increasing.
Thus, SA^-1[SA[st1]+k] < SA^-1[SA[st1+1]+k] < ... < SA^-1[SA[ed1]+k].
To find st and ed, we need to find the smallest st such that st2 ≤ SA^-1[SA[st]+k] ≤ ed2 and the largest ed such that st2 ≤ SA^-1[SA[ed]+k] ≤ ed2.
This can be done by binary search.

108 Example
T = acacag$, P1 = a, P2 = ca.
range(T,P1) = [2..4], range(T,P2) = [5..6].
To find range(T,P1P2), we do the following.
Note that T_SA[2] < T_SA[3] < T_SA[4] all begin with a. So, T_SA[2]+1 < T_SA[3]+1 < T_SA[4]+1.
Note that SA^-1[SA[2]+1] = 5, SA^-1[SA[3]+1] = 6, SA^-1[SA[4]+1] = 7. Hence, these three suffixes are T_SA[5] < T_SA[6] < T_SA[7].
Among the three suffixes, we need to identify the suffixes beginning with P2. Since range(T,P2) = [5..6], both T_SA[5] and T_SA[6] begin with P2.
As SA^-1[SA[2]+1] = 5 and SA^-1[SA[3]+1] = 6, we have range(T,P1P2) = [2..3].

109 Lemma 3 (Backward search)
Assume [st..ed] = range(T,P). For any character c, we can compute [st'..ed'] = range(T,cP) in O(log n) time.
Proof: let P1 = c and P2 = P. By Lemma 2, range(T,P1P2) can be computed in O(log n) time.

110 Algorithm
1. For j = m, m-1, ..., 1: by backward search, find range(T, P[j..m]).
2. For j = 1, 2, ..., m: by forward search, find range(T, P[1..j]).
3. Report all occurrences in range(T, P[1..m]).
4. For j = 1, 2, ..., m:
Let P1 = P[1..j-1] and P2 = P[j+1..m].
For every character c ≠ P[j]:
By forward search, find range(T, P1c).
By Lemma 2, find range(T, P1cP2).
Report all occurrences in range(T, P1cP2).

111 Time analysis
Ignoring the O(occ) reporting time!
Steps 1 and 2 take O(m log n) time.
Step 4 tries all possible mismatches. There are in total |Σ|m mismatches. For each mismatch, it takes O(log n) time. So, Step 4 takes O(|Σ| m log n) time.
In total, the running time is O(|Σ| m log n + occ).
Note that this solution can be generalized to handle k-mismatch or k-difference.

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points:

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points: Eidgenössishe Tehnishe Hohshule Zürih Eole polytehnique fédérle de Zurih Politenio federle di Zurigo Federl Institute of Tehnology t Zurih Deprtement of Computer Siene. Novemer 0 Mrkus Püshel, Dvid Steurer

More information

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18 Computt onl Biology Leture 18 Genome Rerrngements Finding preserved genes We hve seen before how to rerrnge genome to obtin nother one bsed on: Reversls Knowledge of preserved bloks (or genes) Now we re

More information

Fast index for approximate string matching

Fast index for approximate string matching Fst index for pproximte string mthing Dekel Tsur Astrt We present n index tht stores text of length n suh tht given pttern of length m, ll the sustrings of the text tht re within Hmming distne (or edit

More information

Chapter 4 State-Space Planning

Chapter 4 State-Space Planning Leture slides for Automted Plnning: Theory nd Prtie Chpter 4 Stte-Spe Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Motivtion Nerly ll plnning proedures re serh proedures Different

More information

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh Computtionl Biology Leture 8: Genome rerrngements, finding miml mthes Sd Mneimneh We hve seen how to rerrnge genome to otin nother one sed on reversls nd the knowledge of the preserved loks or genes. Now

More information

NON-DETERMINISTIC FSA

NON-DETERMINISTIC FSA Tw o types of non-determinism: NON-DETERMINISTIC FS () Multiple strt-sttes; strt-sttes S Q. The lnguge L(M) ={x:x tkes M from some strt-stte to some finl-stte nd ll of x is proessed}. The string x = is

More information

CS 573 Automata Theory and Formal Languages

CS 573 Automata Theory and Formal Languages Non-determinism Automt Theory nd Forml Lnguges Professor Leslie Lnder Leture # 3 Septemer 6, 2 To hieve our gol, we need the onept of Non-deterministi Finite Automton with -moves (NFA) An NFA is tuple

More information

Algorithms for bioinformatics Part 2: Data structures

Algorithms for bioinformatics Part 2: Data structures Alorithms for bioinformtics Prt 2: Dt structures Greory Kucherov LIGM/CNRS Mrne-l-Vllée Pln Clssicl indexes Suffix trees DAWG nd Position heps Suffix rrys Succinct (compressed) indexes Burrows-Wheeler

More information

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths Intermedite Mth Cirles Wednesdy 17 Otoer 01 Geometry II: Side Lengths Lst week we disussed vrious ngle properties. As we progressed through the evening, we proved mny results. This week, we will look t

More information

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA Common intervls of genomes Mthieu Rffinot CNRS LIF Context: omprtive genomis. set of genomes prtilly/totlly nnotte Informtive group of genes or omins? Ex: COG tse Mny iffiulties! iology Wht re two similr

More information

Lecture 6: Coding theory

Lecture 6: Coding theory Leture 6: Coing theory Biology 429 Crl Bergstrom Ferury 4, 2008 Soures: This leture loosely follows Cover n Thoms Chpter 5 n Yeung Chpter 3. As usul, some of the text n equtions re tken iretly from those

More information

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6 CS311 Computtionl Strutures Regulr Lnguges nd Regulr Grmmrs Leture 6 1 Wht we know so fr: RLs re losed under produt, union nd * Every RL n e written s RE, nd every RE represents RL Every RL n e reognized

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Dt Strutures nd Algorithm Xioqing Zheng zhengxq@fudn.edu.n String mthing prolem Pttern P ours with shift s in text T (or, equivlently, tht pttern P ours eginning t position s + in text T) if T[s +... s

More information

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of:

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of: 22: Union Fin CS 473u - Algorithms - Spring 2005 April 14, 2005 1 Union-Fin We wnt to mintin olletion of sets, uner the opertions of: 1. MkeSet(x) - rete set tht ontins the single element x. 2. Fin(x)

More information

Prefix-Free Regular-Expression Matching

Prefix-Free Regular-Expression Matching Prefix-Free Regulr-Expression Mthing Yo-Su Hn, Yjun Wng nd Derik Wood Deprtment of Computer Siene HKUST Prefix-Free Regulr-Expression Mthing p.1/15 Pttern Mthing Given pttern P nd text T, find ll sustrings

More information

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution Tehnishe Universität Münhen Winter term 29/ I7 Prof. J. Esprz / J. Křetínský / M. Luttenerger. Ferur 2 Solution Automt nd Forml Lnguges Homework 2 Due 5..29. Exerise 2. Let A e the following finite utomton:

More information

Linear Algebra Introduction

Linear Algebra Introduction Introdution Wht is Liner Alger out? Liner Alger is rnh of mthemtis whih emerged yers k nd ws one of the pioneer rnhes of mthemtis Though, initilly it strted with solving of the simple liner eqution x +

More information

On-Line Construction. of Suffix Trees. Overview. Suffix Trees. Notations. goo. Suffix tries

On-Line Construction. of Suffix Trees. Overview. Suffix Trees. Notations. goo. Suffix tries On-Line Cnstrutin Overview Suffix tries f Suffix Trees E. Ukknen On-line nstrutin f suffix tries in qudrti time Suffix trees On-line nstrutin f suffix trees in liner time Applitins 1 2 Suffix Trees A suffix

More information
