Algorithms in Bioinformatics: A Practical Introduction. Suffix tree
1 Algorithms in Bioinformatics: A Practical Introduction. Suffix tree
2 Overview. What is a suffix tree? Simple applications of the suffix tree. Linear-time algorithm for constructing a suffix tree. Suffix array. FM-index. 1-mismatch search.
3 Suffix Trie. E.g. consider the string S = acacag$. Suffix trie: a trie of all possible suffixes of S. [Figure: suffix trie of S]
4 Suffix Tree. Suffix tree for S = acacag$: merge the nodes with only one child. The path label of a node v is denoted α(v). [Figure: suffix tree of S, with a node v, an edge label and a leaf edge marked]
5 Size of Suffix Tree (I). How big is a suffix tree? For S = acacag$, the suffix tree has exactly n leaves and at most 2n-1 edges. The total length of all edge labels is O(n^2). Can we store the suffix tree using o(n^2) bits of space?
6 Size of Suffix Tree (II). The suffix tree has exactly n leaves and at most 2n-1 edges. Note that each edge label can be represented using 2 indices. Thus, the suffix tree can be represented using O(n log n) bits. [Figure: suffix tree of S = acacag$ with every edge label replaced by a pair of indices such as 2,3 or 7,7] Note: the end index of every leaf edge should be 7, the last index of S. Thus, for leaf edges, we only need to store the start index.
7 Property of suffix tree. Fact: for any internal node v in the suffix tree, if the path label of v is α(v) = ap (a a single character, p a string), then there exists another node w in the suffix tree such that α(w) = p. Proof: skipped. Definition of suffix link: for any internal node v, define its suffix link sl(v) = w.
8 Suffix link example. S = acacag$. [Figure: suffix tree of S with its suffix links]
9 Generalized suffix tree. Build a suffix tree for two or more strings. E.g. S1 = acgat#, S2 = cgt$. [Figure: generalized suffix tree of S1 and S2]
10 Applications of Suffix Tree
11 Exact string matching problem. To find all occurrences of Q in S (searching): search for the node x in the suffix tree which represents Q. All the leaves in the subtree rooted at x are the occurrences. Time: O(|Q| + occ), where occ is the total number of occurrences. E.g. S = acacag$, Q = aca. Occurrences: 1, 3.
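The search described on this slide can be illustrated with a small sketch. It is not the compact suffix tree of the slides: as a simplifying assumption it builds an uncompressed suffix trie (quadratic size) out of nested dictionaries, and the names `build_suffix_trie` and `find_occurrences` are hypothetical. The string acacag$ is used as an assumed example input.

```python
def build_suffix_trie(s):
    """Uncompressed suffix trie (an assumption; a real suffix tree merges unary paths).

    Every node records the 1-based start positions of the suffixes that pass
    through it, so the node reached by Q lists all occurrences of Q.
    """
    root = {"children": {}, "starts": []}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(i + 1)
    return root


def find_occurrences(root, q):
    """Walk down from the root along q; report the suffixes below that node."""
    node = root
    for ch in q:
        node = node["children"].get(ch)
        if node is None:
            return []  # q does not occur
    return sorted(node["starts"])


trie = build_suffix_trie("acacag$")
print(find_occurrences(trie, "aca"))
```

The walk takes O(|Q|) steps, mirroring the search for node x; the stored start lists stand in for collecting the leaves below x.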
12 Longest repeated substring problem. To find the longest repeated substring in S: find the deepest internal node (by string depth). Time: O(n). E.g. S = acacag$. The longest repeat is aca.
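A minimal sketch of the same idea without a suffix tree: the deepest internal node corresponds to the longest common prefix of two lexicographically adjacent suffixes. The code below swaps in plain sorting of the suffixes (far from linear time) purely for illustration; `longest_repeated_substring` is a hypothetical name.

```python
def longest_repeated_substring(s):
    """Longest substring occurring at least twice, via sorted suffixes.

    Stand-in for 'find the deepest internal node': adjacent suffixes in
    sorted order share the longest prefixes.
    """
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best


print(longest_repeated_substring("acacag$"))
```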
13 Longest common substring problem. To find the longest common substring of two or more sequences. Note: in 1970, Don Knuth conjectured that a linear-time algorithm for this problem is impossible. Now we know that it can be solved in linear time. E.g. consider two strings S1 and S2: 1. Build a generalized suffix tree for S1# and S2$. 2. Then, mark each internal node with leaves representing suffixes of both S1 and S2. 3. Report the deepest marked node.
14 Example for the longest common substring. E.g. S1 = acgat#, S2 = cgt$. The longest common substring is cg. Its length is 2. [Figure: generalized suffix tree of S1 and S2 with the deepest marked node]
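The marking procedure can be mimicked on sorted suffixes: a marked internal node corresponds to two adjacent suffixes, one from each string, sharing a prefix. The sketch below assumes the terminators # and $ do not occur inside the inputs, uses ordinary sorting instead of a generalized suffix tree, and introduces the hypothetical name `longest_common_substring`.

```python
def longest_common_substring(s1, s2):
    """Longest substring common to s1 and s2 (sorted-suffix stand-in)."""
    # Tag every suffix with the string it comes from; the distinct
    # terminators keep a common prefix from running past either string.
    tagged = [(s1[i:] + "#", 1) for i in range(len(s1))]
    tagged += [(s2[j:] + "$", 2) for j in range(len(s2))]
    tagged.sort()
    best = ""
    for (a, owner_a), (b, owner_b) in zip(tagged, tagged[1:]):
        if owner_a == owner_b:
            continue  # both suffixes come from the same string
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best


print(longest_common_substring("acgat", "cgt"))
```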
15 Longest common prefix (I). Given a string S, for any i, j, let lcp(i, j) denote the length of the longest common prefix of suffix i and suffix j of S. E.g. for S = acacag$, the longest common prefix of suffix 1 and suffix 3 is aca, so lcp(1, 3) = 3.
16 Longest common prefix (II). Note that the lowest common ancestor (lca) of leaves i and j identifies the longest common prefix: lcp(i, j) = |α(lca(i, j))|. A well-known result: for a tree of size n, after an O(n)-time preprocessing, the lca of any two nodes can be returned in O(1) time. First obtained by Harel and Tarjan (SIAM J. Comp. 1984); simplified by Schieber and Vishkin (SIAM J. Comp. 1988). Based on the above result, after an O(n)-time preprocessing, for any suffix i and suffix j, we can compute their longest common prefix in O(1) time.
17 Finding Palindrome (I). Given a string S, a palindrome is a substring u of S s.t. u = u^r. E.g. ACAGACA. Consider a palindrome u = S[i..i+|u|-1]; u is called a maximal palindrome if it cannot be extended to a longer palindrome with the same centre, i.e. S[i-1..i+|u|] is not a palindrome. Note that every palindrome is contained in a maximal palindrome. Thus, maximal palindromes are a compact way to represent all palindromes. A complemented palindrome is a string u s.t. u = ū^r. E.g. ACAUGU. A maximal complemented palindrome is defined similarly.
18 Finding Palindrome (II). Recall that the recognition site of a restriction enzyme usually has the form of a complemented palindrome. This motivates the following two problems. The palindrome problem: given a string S (representing the genome) of length n, locate all maximal palindromes in S. The complemented palindrome problem: given a string S (representing the genome) of length n, locate all maximal complemented palindromes in S.
19 Properties of palindrome (I). If S[i..i+k-1] = S^r[n-i+1..n-i+k], then u = S[i-k+1..i+k-1] is an odd-length palindrome. [Figure: S and S^r drawn as bars, marking positions i-k+1, i, i+k-1 in S and n-i+1 in S^r]
20 Properties of palindrome (II). If S[i..i+k-1] = S^r[n-i+2..n-i+k+1], then u = S[i-k..i+k-1] is an even-length palindrome. [Figure: S and S^r drawn as bars, marking positions i-k, i, i+k-1 in S and n-i+2 in S^r]
21 Solution to the palindrome problem. Preprocess S and S^r so that any longest common prefix query can be answered in constant time. For i = 1 to n: find the longest common prefix of (S_i, S^r_{n-i+1}), where S_i denotes suffix i; if its length is k, we have found an odd-length maximal palindrome S[i-k+1..i+k-1]. Then find the longest common prefix of (S_i, S^r_{n-i+2}); if its length is k, we have found an even-length maximal palindrome S[i-k..i+k-1].
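The loop on this slide can be sketched as follows. As an assumption, the constant-time lcp query is replaced by a naive character-by-character scan, so the sketch runs in quadratic time in the worst case; it uses 0-based Python indices, and `lcp_len` and `maximal_palindromes` are hypothetical names.

```python
def lcp_len(a, b):
    """Naive longest-common-prefix length (stand-in for the O(1) lcp query)."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def maximal_palindromes(s):
    """Maximal palindrome for every centre, following the two lcp queries."""
    n, r = len(s), s[::-1]
    found = []
    for i in range(n):  # i is 0-based here; the slides use 1-based i
        # Odd length, centred on s[i]: compare suffix i of S with the
        # suffix of S^r that starts at the mirror image of position i.
        k = lcp_len(s[i:], r[n - 1 - i:])
        found.append(s[i - k + 1:i + k])
        # Even length, centred between s[i-1] and s[i].
        k = lcp_len(s[i:], r[n - i:])
        if k > 0:
            found.append(s[i - k:i + k])
    return found


print(max(maximal_palindromes("ACAGACA"), key=len))
```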
22 Extracting the embedded suffix tree from a generalized suffix tree. Input: the generalized suffix tree T of K strings S1, …, SK. Aim: compute the suffix tree T_i of the string S_i. [Figure: generalized suffix tree T of S1 = acgat#, S2 = cgt$ with internal nodes r, w, x, y, z, next to the embedded suffix tree T_1 with internal nodes r, w]
23 Extracting the embedded suffix tree from a generalized suffix tree. Observation: T_i is a subtree of T such that: the leaves of T_i are the leaves of T corresponding to S_i; the internal nodes of T_i are the lowest common ancestors of some leaves for S_i; the edges of T_i can be inferred from the ancestor-descendant relationship among those nodes. [Figure: T and T_1 as on the previous slide]
24 Extracting the embedded suffix tree from a generalized suffix tree. [Figure: step-by-step construction of T_1 from the leaves of S1 in T]
25 Common substrings of more than 2 strings (I). Given a set of strings (protein or DNA sequences), we want to know which substrings are common to a large number of these strings. Why is this question important? DNA and protein sequences evolve. If a substring occurs commonly in a wide range of species, this may mean that the substring is critical for correct functionality.
26 Common substrings of more than 2 strings (II). Given K strings whose total length is n. For every 2 ≤ k ≤ K, define l(k) to be the length of the longest substring common to at least k of these strings. The problem is to compute l(k) for all k.
27 Common substrings of more than 2 strings (III). Example: consider a set of 5 strings {sandollar, sandlot, handler, grand, pantry}. Then we have:
k   l(k)   corresponding substring
2   4      sand
3   3      and
4   3      and
5   2      an
28 Common substrings of more than 2 strings (IV). Illustrating the solution by example: three strings S1, S2, S3 terminated by $, # and % respectively (K = 3). 1. Build a generalized suffix tree T for the K strings in O(n) time. [Figure: generalized suffix tree T of the three strings]
29 Common substrings of more than 2 strings (V). 2. By traversing T, for each internal node v, compute its string depth. In total, O(n) time. [Figure: T with the string depth written at each internal node]
30 Common substrings of more than 2 strings (VI). 3. By traversing T, for each internal node v, compute C(v), defined as the number of distinct termination symbols in the subtree rooted at v. This step takes O(Kn) time. [Figure: T with C(v) written at each internal node]
31 Common substrings of more than 2 strings (VII). 4. Traverse T and visit every internal node v. For each v, if V(C(v)) < string depth of v, set V(C(v)) = string depth of v. [After step 4, V(k) = the length of the longest substring common to exactly k of these strings.] 5. Set l(K) = V(K). For i = K-1 down to 2, l(i) = max{l(i+1), V(i)}. These two steps take O(n) time. For our example, V(2) = 3, V(3) = 2. Thus, l(3) = 2, l(2) = 3. In total, this algorithm takes O(Kn) time. Actually, we can improve this algorithm to O(n) time by means of lcp!
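Step 5 is a suffix maximum over V, which a few lines make concrete. The sketch assumes V is given as a dictionary from k to V(k), with missing entries treated as 0; `longest_common_lengths` is a hypothetical name.

```python
def longest_common_lengths(v, big_k):
    """Turn V(k) ('exactly k strings') into l(k) ('at least k strings')."""
    l = {big_k: v.get(big_k, 0)}
    for i in range(big_k - 1, 1, -1):  # i = K-1 down to 2
        l[i] = max(l[i + 1], v.get(i, 0))
    return l


# Values quoted on the slide: V(2) = 3, V(3) = 2 with K = 3.
print(longest_common_lengths({2: 3, 3: 2}, 3))
```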
32 Linear-time algorithm for constructing a suffix tree
33 Straightforward construction of suffix tree. Consider S = s1 s2 … sn where sn = $. Algorithm: initialize the tree with only a root. For i = n down to 1: insert S[i..n] into the tree. Time: O(n^2).
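34 Example of construction. [Figure: the tree after initialization and after each for-loop iteration I5, I4, I3, I2, I1 for a 5-character example string]
35 Construction of generalized suffix tree. [Figure: starting from the tree of the first string, the suffixes of a second string terminated by # are inserted in for-loop iterations J2, J1]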
36 Can we construct a suffix tree in o(n^2) time? Yes. We can construct it in O(n) time. Weiner's algorithm [1973]: linear time for a constant-size alphabet, but much space. McCreight's algorithm [JACM 1976]: linear time for a constant-size alphabet, more space-economical than Weiner's. Ukkonen's algorithm [Algorithmica, 1995]: online algorithm, linear time for a constant-size alphabet, less space. Farach's algorithm [FOCS 1997]: linear time for a general alphabet. Hon, Sadakane, and Sung's algorithm [FOCS 2003]: O(n)-bit space and O(n log^e n) time for 0 < e < 1; O(n)-bit space and O(n) time for suffix array construction. We will discuss Farach's algorithm later.
37 Idea. Build the odd suffix tree and the even suffix tree. Then, merge the odd and even suffix trees. [Figure: even suffix tree and odd suffix tree]
38 Idea. Input: a string S of length n. 1. Recursively compute the suffix tree T_o of all suffixes beginning at the odd positions. T_o is of size n/2. 2. From T_o, compute T_e, the suffix tree for all suffixes beginning at the even positions. 3. Merge T_o and T_e to form the suffix tree for S.
39 Stage 1: Constructing the odd suffix tree. Given a string S[1..n], we generate a new string S'[1..n/2] as follows. Map pairs of characters into single characters: S[1..2], S[3..4], S[5..6], …, S[n-1..n]. Remove the duplicates from the pairs of characters and sort them by radix sort. S'[i] = rank of S[2i-1..2i] in the sorted list, for i = 1, 2, …, n/2. By recursion, we get the suffix tree T' for S'. Convert T' to the odd suffix tree T_o.
40 Example (I). S = aaabbbabbaba$. S[1..2] = aa, S[3..4] = ab, S[5..6] = bb, S[7..8] = ab, S[9..10] = ba, S[11..12] = ba. By stable sort, aa < ab < ba < bb. Rank(aa) = 1, Rank(ab) = 2, Rank(ba) = 3, Rank(bb) = 4. So, S' = 124233$.
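The pair-ranking step of Stage 1 can be sketched in a few lines. Assumptions: the input has even length and excludes the terminator, Python's built-in sort replaces the radix sort named in the slides, and the 12-character string aaabbbabbaba is an illustrative input chosen to be consistent with the ranks and leaf orderings quoted in these slides. `rank_pairs` is a hypothetical name.

```python
def rank_pairs(s):
    """Map S[1..2], S[3..4], ... to the ranks of the distinct pairs."""
    pairs = [s[i:i + 2] for i in range(0, len(s), 2)]
    # Remove duplicates, sort, and number the distinct pairs from 1.
    ranks = {p: r for r, p in enumerate(sorted(set(pairs)), start=1)}
    return [ranks[p] for p in pairs]


print(rank_pairs("aaabbbabbaba"))
```

The recursion then continues on the rank sequence, which is half as long as the input.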
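41 Example (II). By recursion, construct the suffix tree T' for S'. [Figure: suffix tree T']
42 Example (III). Convert T' to the odd tree: leaf i of T' becomes leaf 2i-1. [Figure: the converted tree] This is not yet a suffix tree.
43 Example (IV). Refine the odd tree T_o. [Figure: the refined odd suffix tree]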
44 Time complexity for building the odd tree. Let Time(n) be the time to build the suffix tree for a string of length n. Stable sorting and refinement of the odd tree take O(n) time. Building the suffix tree for S' takes Time(n/2). So, Stage 1 takes Time(n/2) + O(n) time.
45 Stage 2: Build the even tree. 1. Generate the lex-ordering of the leaves in T_e. 2. For any two adjacent leaves 2i and 2j, find lcp(2i, 2j). 3. Construct the even tree T_e from left to right (according to the lex-ordering).
46 Build the even tree (Step 1). From T_o we get the lex-ordering of its leaves. To generate the lex-ordering of the leaves in T_e: for each leaf i in T_o, get the preceding character c = S[i-1] and form a pair (c, i). Each pair represents the even suffix i-1. Perform stable sorting on those pairs. We get the lex-ordering of the leaves in T_e.
47 Example. S = aaabbbabbaba$. Lex-ordering of the leaves in T_o: 13 < 1 < 7 < 3 < 11 < 9 < 5. The pairs are: (a,13), (-,1), (b,7), (a,3), (a,11), (b,9), (b,5), where leaf 1 has no preceding character and yields no even suffix. After stable sorting, we have (-,1), (a,13), (a,3), (a,11), (b,7), (b,9), (b,5). Hence, the lex-ordering of the leaves of T_e: 12 < 2 < 10 < 6 < 8 < 4.
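A minimal sketch of this step, assuming the odd ordering is already known and using the illustrative string aaabbbabbaba$ (an assumption consistent with the orderings quoted above). It relies on Python's sort being stable; `even_order_from_odd` is a hypothetical name.

```python
def even_order_from_odd(s, odd_order):
    """Lex-order of even suffixes from the lex-order of odd suffixes.

    Positions are 1-based as in the slides; s includes the terminator.
    """
    # Pair each odd leaf i with its preceding character S[i-1];
    # leaf 1 has no predecessor and is skipped.
    pairs = [(s[i - 2], i) for i in odd_order if i > 1]
    # Stable sort on the character only: ties keep the odd-suffix order.
    pairs.sort(key=lambda pair: pair[0])
    return [i - 1 for _, i in pairs]


print(even_order_from_odd("aaabbbabbaba$", [13, 1, 7, 3, 11, 9, 5]))
```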
48 Build the even tree (Step 2). For any two adjacent leaves 2i and 2j, we first find lcp(2i, 2j). Observation: lcp(2i, 2j) = lcp(2i+1, 2j+1) + 1 if S[2i] = S[2j], and 0 otherwise. Proof: if S[2i] ≠ S[2j], lcp(2i, 2j) = 0. Otherwise, lcp(2i, 2j) = 1 + lcp(2i+1, 2j+1).
49 Example. Recall the lex-ordering of the leaves: 12 < 2 < 10 < 6 < 8 < 4. By the previous observation, we have lcp(8,4) = lcp(9,5) + 1 = 2. Similarly, we have lcp(12,2) = 1, lcp(2,10) = 1, lcp(10,6) = 0, lcp(6,8) = 1, lcp(8,4) = 2.
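The observation can be checked with a short sketch. As an assumption, the lcp of two odd suffixes is computed by a naive scan here, whereas the algorithm would read it off T_o via an lca query; the string aaabbbabbaba$ is again an assumed example, and `suffix_lcp` and `even_lcp` are hypothetical names.

```python
def suffix_lcp(s, i, j):
    """Naive lcp of suffixes i and j (1-based); stand-in for an lca query on T_o."""
    k = 0
    while i + k <= len(s) and j + k <= len(s) and s[i + k - 1] == s[j + k - 1]:
        k += 1
    return k


def even_lcp(s, p, q):
    """lcp of even suffixes p and q derived from the odd suffixes p+1 and q+1."""
    if s[p - 1] != s[q - 1]:
        return 0
    return 1 + suffix_lcp(s, p + 1, q + 1)


s = "aaabbbabbaba$"
order = [12, 2, 10, 6, 8, 4]
print([even_lcp(s, a, b) for a, b in zip(order, order[1:])])
```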
50 Build the even tree (Step 3). Construct the even tree T_e from left to right.
51 Build the even tree (Step 3)
52 Time complexity for building the even tree. Step 1: O(n) time. Step 2: O(n) time. Step 3: O(n) time.
53 Stage 3: Merge odd and even trees. We can merge T_o and T_e by DFS. However, it takes O(n^2) time. [Figure: odd tree and even tree, with edges labelled by index pairs]
54 Stage 3: Merge odd and even trees. We merge T_o and T_e by DFS. We merge two edges as long as they start with the same character. The merge ends when one edge is longer than the other.
55 Merge odd and even trees. We merge T_o and T_e by DFS. We merge two edges as long as they start with the same character. The merge ends when one edge is longer than the other. [Figure: the merged tree]
56 Merge odd and even trees. The merging may over-merge some nodes. To correct the tree, we need to unmerge some nodes. [Figure: the merged tree with the possibly over-merged nodes marked]
57 Definition of L() and d(). For every node u which may be over-merged, there exist two leaves 2i and 2j-1 such that u = lca(2i, 2j-1). Let L(u) denote the correct depth of u, that is, lcp(2i, 2j-1). Note that lcp(2i, 2j-1) = 1 + lcp(2i+1, 2j) if S[2i] = S[2j-1]; 0 otherwise. Let v be lca(2i+1, 2j). Define d(u) = v. Note that d() is equivalent to the suffix link!
58 Example of d(). [Figure: the merged tree with the d() links drawn]
59 Relationship between L() and d(). Suppose u = lca(2i, 2j-1). Note 1: if u is not the root, then S[2i] = S[2j-1]. Note 2: lcp(2i, 2j-1) = 1 + lcp(2i+1, 2j) if S[2i] = S[2j-1]; 0 otherwise. Note 3: d(u) = lca(2i+1, 2j). Hence, L(u) = 1 + L(d(u)) if u is not the root. Otherwise, L(u) = 0. Lemma: L(u) = the length of the path of d() links (drawn in purple in the figure) from u to the root.
60 Example of L(). [Figure: the merged tree with L() values between 1 and 4 written at the possibly over-merged nodes]
61 Unmerge the border nodes based on L() (I). [Figure: the merged tree with L() values, before unmerging]
62 Unmerge the border nodes based on L() (II). [Figure: the final suffix tree after unmerging]
63 Time complexity for merging. Merging the trees using DFS takes O(n) time. Computing the links d() takes O(n) time. Computing L() takes O(n) time. Unmerging takes O(n) time.
64 Total time complexity of Farach's algorithm. Stage 1: Time(n/2) + O(n). Stage 2: O(n). Stage 3: O(n). Thus, Time(n) = Time(n/2) + O(n). By solving the equation, Time(n) = O(n).
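The recurrence can be unrolled explicitly. In the derivation below, c is an assumed constant bounding the O(n) term; the sum of the halved terms is a geometric series.

```latex
\begin{align*}
\mathrm{Time}(n) &\le \mathrm{Time}(n/2) + cn \\
                 &\le cn + \frac{cn}{2} + \frac{cn}{4} + \cdots + \mathrm{Time}(1) \\
                 &\le 2cn + O(1) \;=\; O(n).
\end{align*}
```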
65 Disadvantage of suffix tree. The suffix tree is space inefficient. It requires O(n|Σ| log n) bits. Manber and Myers (SIAM J. Comp. 1993) proposed a new data structure, called the suffix array, which has similar functionality to the suffix tree. Moreover, it only requires O(n log n) bits.
66 Suffix Array (I). It is just the sorted suffixes. E.g. consider S = acacag$. Sorting the suffixes gives:
i   SA[i]   Suffix
1   7       $
2   1       acacag$
3   3       acag$
4   5       ag$
5   2       cacag$
6   4       cag$
7   6       g$
The suffix array is an array of n indices. Thus, it takes O(n log n) bits.
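A direct, deliberately naive way to obtain this array is to sort the suffix start positions by the suffixes they denote. The sketch assumes a terminator that compares smaller than every other character (true for $ against letters in ASCII) and is not a linear-time construction; `suffix_array` is a hypothetical name, and acacag$ is an assumed example input.

```python
def suffix_array(s):
    """1-based suffix array by plain sorting (O(n^2 log n) in the worst case)."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


print(suffix_array("acacag$"))
```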
67 Observation. The leaves of the suffix tree are in suffix array order. [Figure: suffix tree of S next to the table of SA[i] and suffixes]
68 Linear-time construction of the suffix array from the suffix tree. Recall that the suffix tree T of S[1..n] can be constructed in O(n) time. Then, by a lexical depth-first traversal of T, the suffix array of S is obtained. This takes O(n) time. However, the space used during construction is the same as that for the suffix tree! This defeats the purpose of the suffix array. Today, we can build the suffix array using O(n)-bit space and O(n) time.
69 range(T,Q). For a pattern Q, its occurrences in T form a consecutive SA range. Example: for T = acacag$, ca occurs at SA[5] and SA[6]. Definition: we call range(T,Q) = [st..ed] if Q is a prefix of every T_j for j = SA[st], SA[st+1], …, SA[ed], where T_j = T[j..n] is the suffix of T starting at position j. Example: range(T,ca) = [5..6]. [Table: i, SA[i] and suffix]
70 Find occurrences of a query Q in a string T using the suffix array. Input: (1) the suffix array of a string T of length n and (2) a query Q of length m. Aim: check if Q occurs in T. Idea: binary search!
71 Algorithm
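72 Example. Consider T = acacag$, pattern Q = acag. L = 1, R = 7, M = (L+R)/2 = 4. [Table: i, SA[i] and suffix]
73 Example. Consider T = acacag$, pattern Q = acag. L = 1, R = 7, M = (L+R)/2 = 4. suffix-SA[M] > Q. Set R = M = 4. [Table: i, SA[i] and suffix]
74 Example. Consider T = acacag$, pattern Q = acag. L = 1, R = 4, M = (L+R)/2 = 2. suffix-SA[M] < Q. Set L = M = 2. [Table: i, SA[i] and suffix]
75 Example. Consider T = acacag$, pattern Q = acag. L = 2, R = 4, M = (L+R)/2 = 3. The pattern Q is found at SA[M] = 3. [Table: i, SA[i] and suffix]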
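The binary search traced in these examples can be sketched as below. The pseudocode of the Algorithm slide is not part of this transcript, so this is a textbook-style variant, not a copy of it: it narrows the interval with M-1 and M+1 where the examples set R = M or L = M. The suffix array is passed in as a 1-based list of start positions, and `sa_contains` is a hypothetical name.

```python
def sa_contains(t, sa, q):
    """Return a start position (1-based) of q in t, or None, in O(m log n)."""
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        prefix = t[start:start + len(q)]  # first m characters of the suffix
        if prefix == q:
            return sa[mid]
        if prefix > q:
            hi = mid - 1  # the suffix is larger than q: go left
        else:
            lo = mid + 1  # the suffix is smaller than q: go right
    return None


print(sa_contains("acacag$", [7, 1, 3, 5, 2, 4, 6], "acag"))
```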
76 Can we do better? During each step of the binary search, we need to compare Q with a suffix using O(m) time, which is time consuming. Can we do better? We have the following observation. Suppose LCP(Q, suffix-SA[L]) is l and LCP(Q, suffix-SA[R]) is r. Then, LCP(Q, suffix-SA[M]) ≥ min{l, r}. Below, we describe how to utilize this observation to speed up the computation.
77 Algorithm
78 Example. Consider T = acacag$, pattern Q = acag. L = 1, l = 0; R = 7, r = 0; mlr = min(l, r) = 0; M = (L+R)/2 = 4. [Table: i, SA[i] and suffix]
79 Example. Consider T = acacag$, pattern Q = acag. L = 1, l = 0; R = 7, r = 0; mlr = min(l, r) = 0; M = (L+R)/2 = 4, m = 1. The (m+1)-th character of suffix-SA[M] is g. The (m+1)-th character of Q is c. So, suffix-SA[M] > Q. Set R = M = 4 and r = m = 1. [Table: i, SA[i] and suffix]
80 Example. Consider T = acacag$, pattern Q = acag. L = 1, l = 0; R = 4, r = 1; mlr = min(l, r) = 0; M = (L+R)/2 = 2, m = 3. The (m+1)-th character of suffix-SA[M] is c. The (m+1)-th character of Q is g. So, suffix-SA[M] < Q. Set L = M = 2 and l = m = 3. [Table: i, SA[i] and suffix]
81 Example. Consider T = acacag$, pattern Q = acag. L = 2, l = 3; R = 4, r = 1; mlr = min(l, r) = 1; M = (L+R)/2 = 3, m = 4. The pattern Q is found at SA[M] = 3. [Table: i, SA[i] and suffix]
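A sketch of the accelerated search, under stated assumptions: the slide's pseudocode is not in this transcript, so the control flow below is one plausible realization of the min{l, r} observation; it keeps the invariant suffix-SA[L] < Q < suffix-SA[R] and starts each comparison at position min(l, r). The text is assumed to end with a unique smallest terminator, and `sa_search_with_lcp` is a hypothetical name.

```python
def sa_search_with_lcp(t, sa, q):
    """Binary search on a 1-based suffix array, skipping min(l, r) characters."""

    def match(start, k):
        # Extend a known k-character match between q and t[start:].
        # Returns (match length, True if the suffix sorts before q).
        while k < len(q) and start + k < len(t) and t[start + k] == q[k]:
            k += 1
        if k == len(q):
            return k, False
        return k, start + k == len(t) or t[start + k] < q[k]

    lo, hi = 0, len(sa) - 1
    l, lo_smaller = match(sa[lo] - 1, 0)
    if l == len(q):
        return sa[lo]
    r, hi_smaller = match(sa[hi] - 1, 0)
    if r == len(q):
        return sa[hi]
    if not lo_smaller or hi_smaller:
        return None  # q sorts before the first or after the last suffix
    while hi - lo > 1:
        mid = (lo + hi) // 2
        m, smaller = match(sa[mid] - 1, min(l, r))
        if m == len(q):
            return sa[mid]
        if smaller:
            lo, l = mid, m
        else:
            hi, r = mid, m
    return None


print(sa_search_with_lcp("acacag$", [7, 1, 3, 5, 2, 4, 6], "acag"))
```

On the assumed example the successive match lengths are 1, 3 and 4, the same m values as in the trace above.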
82 Time analysis. Binary search performs log n comparisons. Each comparison takes at most O(m) time. In the worst case, O(m log n) time. Myers and Manber report that, in practice, the time is O(m + log n).
83 Suffix array and suffix tree. We showed one example of replacing the suffix tree by the suffix array. Note that most applications related to the suffix tree can be solved using the suffix array with some time blow-up! When space is limited, replacing the suffix tree by the suffix array is a good choice.
84 The size is still too big! Why? DNA sequences can be very long! E.g. fly: ~100M bases, human: ~3G bases, tree: ~9G bases. Storage needed for an indexing data structure for the human genome: suffix tree: ~40G bytes; suffix array: ~13G bytes. Can we further reduce the space?
85 Solution. Grossi, Vitter (STOC 2000): compressed suffix array (CSA). Ferragina, Manzini (FOCS 2000): FM-index. Both of them can be stored in O(n)-bit space. For the human genome, both CSA and FM-index can be stored within 2G bytes.
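86 FM-index. Consider a text T = acacag$. The FM-index stores: A. The string BW = g$ccaaa, where BW[i] = T[SA[i]-1]. B. C[x] = total number of occurrences of all symbols less than x. E.g. C[a] = 1, C[c] = 4, C[g] = 6, C[t] = 7. C. A data structure occ(x, i) which tells us the number of occurrences of x in BW[1..i] using O(1) time.
i   SA[i]   Suffix     T[SA[i]-1]
1   7       $          g
2   1       acacag$    $
3   3       acag$      c
4   5       ag$        c
5   2       cacag$     a
6   4       cag$       a
7   6       g$         a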
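Structures A and B can be computed directly from their definitions, as the following sketch shows. It assumes the text ends with a unique smallest terminator $, builds the suffix array by naive sorting, and only tabulates C for symbols that occur in the text; `fm_index` is a hypothetical name and acacag$ an assumed example input.

```python
def fm_index(t):
    """Return (BW, C) for a text t that ends with a unique smallest '$'."""
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # 0-based start positions
    # BW[i] is the character preceding suffix SA[i]; index -1 wraps around
    # to the terminator for the suffix that starts at position 0.
    bw = "".join(t[i - 1] for i in sa)
    c, total = {}, 0
    for ch in sorted(set(t)):
        c[ch] = total  # number of symbols in t strictly smaller than ch
        total += t.count(ch)
    return bw, c


print(fm_index("acacag$"))
```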
87 Data structure for answering the occ(x, i) query. Given the text BW[1..n], we divide BW[1..n] into buckets of size log^2 n. For each bucket i = 1, …, n/log^2 n, we store P[i] = number of x's in BW[1..i log^2 n]. Each bucket is further subdivided into log n sub-buckets of size log n. For each sub-bucket j of bucket i, we store Q[i][j] = number of x's in the first j sub-buckets of bucket i. [Figure: BW split into n/log^2 n buckets of log^2 n characters, each split into sub-buckets of log n characters such as 00x0xxx0x0]
88 Data structure for answering the occ(x, i) query. We also need a lookup table rank(b, k), where b is any string of length (log n)/2 and 1 ≤ k ≤ (log n)/2: rank(b, k) = number of x's in the first k characters of b. Number of entries = 2^((log n)/2) · (log n)/2 = sqrt(n) (log n)/2. Each entry takes O(log log n) bits. Total space = O(sqrt(n) log n log log n) = o(n) bits.
89 Space complexity of the occ() data structure. P[1..n/log^2 n] uses O(n/log n) bits. Q[1..n/log^2 n][1..log n] uses O(n log log n / log n) bits. rank(b, k) uses O(sqrt(n) log n log log n) = o(n) bits. In total, we use O(n log log n / log n) bits.
90 How to compute occ(x, i)? Suppose log n = 10. To compute occ(x, 327), note that 327 = 3·100 + 2·10 + 5 + 2; the result is P[3] + Q[4][2] + rank(00x0x, 5) + rank(xx0x0, 2). Hence, O(1) time to compute occ(x, i). [Figure: the bucket layout of BW as on slide 87]
91 Size of FM-index. Structure A can be stored in 2n bits (2 bits per DNA base). Structure B can be stored in O(log n) bits. Structure C can be stored in O(n log log n / log n) bits. In total, the size of the FM-index is O(n) bits.
92 Observation. C[x] + occ(x, i) is the number of suffixes smaller than or equal to x T_SA[i]. Example: C[c] = 4, occ(c, 6) = 2. The number of suffixes smaller than or equal to c T[SA[6]..n] = ccag$ is 6. [Table: SA[i], suffix and T[SA[i]-1]]
93 Lemma. Suppose range(T,Q) is [st..ed]. Then, range(T,xQ) = [p..q] where p = C[x] + occ(x, st-1) + 1 and q = C[x] + occ(x, ed). Proof: p = 1 + number of suffixes strictly smaller than xQ. The latter term = number of suffixes smaller than or equal to x T_SA[st-1]. q = number of suffixes smaller than or equal to x T_SA[ed].
94 Backward search. Given the text T and the FM-index, we want to determine if Q exists in T.
Algorithm BW_exist(Q[1..m])
1. x = Q[m]; i = m;
2. /* find range(T, Q[m]) */ st = C[x] + 1; ed = C[x+1]; (x+1 denotes the symbol following x in the alphabet)
3. while (st ≤ ed and i > 1) { /* find range(T, Q[i-1..m]) */ x = Q[i-1]; st = C[x] + occ(x, st-1) + 1; ed = C[x] + occ(x, ed); i = i - 1; }
4. if st > ed, then pattern not found; else pattern found.
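The backward search can be sketched as follows. Assumptions: BW and C are passed in directly, occ(x, i) is computed by a naive count instead of the O(1) structure, and the initial ed is written as C[x] + occ(x, n), which equals C[x+1] without needing the next symbol. `backward_search` is a hypothetical name; the BW string and C table in the usage line are the ones for the assumed text acacag$.

```python
def backward_search(bw, c, q):
    """Return the 1-based SA range (st, ed) of q, or None if q does not occur."""

    def occ(x, i):
        # Naive stand-in for the O(1) occ structure: count x in BW[1..i].
        return bw[:i].count(x)

    n = len(bw)
    if not q:
        return (1, n)
    x = q[-1]
    if x not in c:
        return None
    st, ed = c[x] + 1, c[x] + occ(x, n)  # range of the last character
    for x in reversed(q[:-1]):  # extend the match one character to the left
        if st > ed:
            break
        if x not in c:
            return None
        st, ed = c[x] + occ(x, st - 1) + 1, c[x] + occ(x, ed)
    return (st, ed) if st <= ed else None


print(backward_search("g$ccaaa", {"$": 0, "a": 1, "c": 4, "g": 6}, "aca"))
```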
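95 Example. T = acacag$, Q = aca. Q[3..3] = a: st = C[a] + 1 = 1 + 1 = 2; ed = C[c] = 4. [Table: SA[i], suffix and T[SA[i]-1]]
96 Example. T = acacag$, Q = aca. Q[2..3] = ca: st = C[c] + occ(c, st_old - 1) + 1 = 4 + 0 + 1 = 5; ed = C[c] + occ(c, ed_old) = 4 + 2 = 6. [Table: SA[i], suffix and T[SA[i]-1]]
97 Example. T = acacag$, Q = aca. Q[1..3] = aca: st = C[a] + occ(a, st_old - 1) + 1 = 1 + 0 + 1 = 2; ed = C[a] + occ(a, ed_old) = 1 + 2 = 3. [Table: SA[i], suffix and T[SA[i]-1]]
98 Example. T = acacag$, Q = aca. Q[1..3] = aca: st = C[a] + occ(a, st_old - 1) + 1 = 1 + 0 + 1 = 2; ed = C[a] + occ(a, ed_old) = 1 + 2 = 3. Q occurs in T! [Table: SA[i], suffix and T[SA[i]-1]]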
99 Time complexity of backward search. To find a pattern Q[1..m]: steps 1, 2, and 4 can be computed in O(1) time. For step 3, we need to iterate the loop m-1 times. Each iteration of the loop can be computed in O(1) time. The loop takes O(m) time. In total, O(m) time for backward search.
100 Conclusion. The suffix tree is a powerful data structure which has a lot of applications in computational biology. Problems: the suffix tree is too big! This can be solved using CSA and FM-index. The suffix tree can only solve exact match problems! (Most biology problems are approximate matching!) Much work has been done in this area, but it is still not practical. One of the important areas to explore!
101 1-mismatch problem
102 1-mismatch problem. Index: the suffix tree of a text T[1..n]. For any pattern P[1..m], the 1-mismatch problem finds all occurrences of P in T that have Hamming distance at most 1. Example: P = ACGT, T = AACGTGGCCAACTTGGA.
103 Naïve solution. Index: create the suffix tree for T. Algorithm for query: generate all possible 1-mismatch patterns of P; find the occurrences of every 1-mismatch pattern. Running time: there are |Σ|m possible 1-mismatch patterns. Using the suffix tree, it takes O(m) time to find the occurrences of each 1-mismatch pattern. In total, O(|Σ|m^2 + occ) time, i.e. O(m^2 + occ) for a constant-size alphabet, where occ is the total number of occurrences.
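The naïve query can be sketched as below. As an assumption, Python's `str.find` stands in for the suffix-tree lookup, so the per-pattern cost is not the O(m) of the slide, and the default alphabet ACGT is a guess that fits the DNA example; `one_mismatch_naive` is a hypothetical name.

```python
def one_mismatch_naive(t, p, alphabet="ACGT"):
    """1-based positions where p occurs in t with Hamming distance at most 1."""

    def occurrences(q):
        # Stand-in for the suffix-tree search: all exact occurrences of q.
        hits, pos = [], t.find(q)
        while pos != -1:
            hits.append(pos + 1)
            pos = t.find(q, pos + 1)
        return hits

    found = set(occurrences(p))  # distance 0
    for j in range(len(p)):  # position of the single mismatch
        for ch in alphabet:
            if ch != p[j]:
                found.update(occurrences(p[:j] + ch + p[j + 1:]))
    return sorted(found)


print(one_mismatch_naive("AACGTGGCCAACTTGGA", "ACGT"))
```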
104 Any other solutions? Cobbs [CPM 1995]: O(n log n)-bit index, O(m^2 + occ) query time. Amir et al. [Journal of Algorithms 2000]: O(n log^3 n)-bit index, O(m log n log log n + occ) query time. Buchsbaum et al. [ESA 2000]: O(n log^2 n)-bit index, O(m log log n + occ) query time. Cole et al. [SODA 2004]: O(n log^2 n)-bit index, O(m + log log n + occ) query time. Trinh et al. [CPM 2004]: O(n log n)-bit index, O(m log n + occ) query time. Lam et al. [ISAAC 2005]: O(n sqrt(log n))-bit index, O(m log log n + occ) query time. Chan et al. [ESA 2006]: O(n log n)-bit index, O(m + log log n + occ) query time. Today, we have a look at the solution of Trinh et al.
105 Index. Suffix array of T: SA[1..n]. Inverse suffix array of T: SA^-1[1..n], where SA[SA^-1[i]] = i. Definition: we call range(T,P) = [st..ed] if P is a prefix of every T_j for j = SA[st], SA[st+1], …, SA[ed], where T_j = T[j..n] is the suffix of T starting at position j. Example: for T = acacag$, range(T,ca) = [5..6]. [Table: i, SA[i] and suffix]
106 Lemma 1 (Forward search). Assume [st..ed] = range(T,P). We can compute [st'..ed'] = range(T,Pc) in O(log n) time. Proof: binary search on SA.
107 Lemma 2. Assume [st1..ed1] = range(T,P1) and [st2..ed2] = range(T,P2). We can compute [st..ed] = range(T,P1P2) in O(log n) time. Proof: let the length of P1 be k. Note that T_SA[st1], T_SA[st1+1], …, T_SA[ed1] are lexicographically increasing. Hence, T_(SA[st1]+k), T_(SA[st1+1]+k), …, T_(SA[ed1]+k) are lexicographically increasing. Thus, SA^-1[SA[st1]+k] < SA^-1[SA[st1+1]+k] < … < SA^-1[SA[ed1]+k]. To find st and ed, we need to find the smallest st such that st2 ≤ SA^-1[SA[st]+k] ≤ ed2 and the largest ed such that st2 ≤ SA^-1[SA[ed]+k] ≤ ed2. This can be done by binary search.
108 Example. T = acacag$, P1 = a, and a pattern P2 with range(T,P2) = [5..6]; range(T,P1) = [2..4]. To find range(T,P1P2), we do the following. Note that T_SA[2] < T_SA[3] < T_SA[4] all begin with a. So, T_(SA[2]+1) < T_(SA[3]+1) < T_(SA[4]+1). Note that SA^-1[SA[2]+1] = 5, SA^-1[SA[3]+1] = 6, SA^-1[SA[4]+1] = 7. Hence, these three suffixes are T_SA[5] < T_SA[6] < T_SA[7]. Among the three suffixes, we need to identify those beginning with P2. Since range(T,P2) = [5..6], both T_SA[5] and T_SA[6] begin with P2. As SA^-1[SA[2]+1] = 5 and SA^-1[SA[3]+1] = 6, we have range(T,P1P2) = [2..3]. [Table: i, SA[i] and suffix]
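The two binary searches of Lemma 2 can be sketched as follows. Assumptions: SA and its inverse are given as 1-based Python lists with an unused entry at index 0, an empty remainder (SA[i] + k beyond the text) is given rank 0, and the arrays in the usage lines are those of the assumed text acacag$. `compose_ranges` is a hypothetical name.

```python
def compose_ranges(sa, inv, range1, range2, k):
    """range(T, P1P2) from range(T, P1), range(T, P2) and k = |P1|."""
    st1, ed1 = range1
    st2, ed2 = range2

    def rank_after(i):
        # Rank of the suffix that follows the first k characters of T_SA[i].
        j = sa[i] + k
        return inv[j] if j < len(inv) else 0

    # rank_after is increasing on [st1..ed1], so both ends are found by
    # binary search: first index with rank >= st2, first index with rank > ed2.
    lo, hi = st1, ed1 + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank_after(mid) >= st2:
            hi = mid
        else:
            lo = mid + 1
    st = lo
    lo, hi = st, ed1 + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank_after(mid) > ed2:
            hi = mid
        else:
            lo = mid + 1
    ed = lo - 1
    return (st, ed) if st <= ed else None


sa = [None, 7, 1, 3, 5, 2, 4, 6]
inv = [None, 2, 5, 3, 6, 4, 7, 1]
print(compose_ranges(sa, inv, (2, 4), (5, 6), 1))
```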
109 Lemma 3 (Backward search). Assume [st..ed] = range(T,P). We can compute [st'..ed'] = range(T,cP) in O(log n) time. Proof: let P1 = c, P2 = P. By Lemma 2, range(T,P1P2) can be computed in O(log n) time.
110 Algorithm. 1. For j = m, m-1, …, 1: by backward search, find range(T,P[j..m]). 2. For j = 1, 2, …, m: by forward search, find range(T,P[1..j]). 3. Report all occurrences in range(T,P[1..m]). 4. For j = 1, 2, …, m: let P1 = P[1..j-1], P2 = P[j+1..m]. For every character c ≠ P[j]: by forward search, find range(T,P1c); by Lemma 2, find range(T,P1cP2); report all occurrences in range(T,P1cP2).
111 Time analysis. Ignoring the O(occ) reporting time: steps 1 and 2 take O(m log n) time. Step 4 tries all possible mismatches. There are in total |Σ|m mismatches. For each mismatch, it takes O(log n) time. So, step 4 takes O(|Σ|m log n) time. In total, the running time is O(|Σ|m log n + occ). Note that this solution can be generalized to handle k-mismatch or k-difference.
More informationwhere the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b
CS 294-2 9/11/04 Quntum Ciruit Model, Solovy-Kitev Theorem, BQP Fll 2004 Leture 4 1 Quntum Ciruit Model 1.1 Clssil Ciruits - Universl Gte Sets A lssil iruit implements multi-output oolen funtion f : {0,1}
More informationCMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014
CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA
More informationA Study on the Properties of Rational Triangles
Interntionl Journl of Mthemtis Reserh. ISSN 0976-5840 Volume 6, Numer (04), pp. 8-9 Interntionl Reserh Pulition House http://www.irphouse.om Study on the Properties of Rtionl Tringles M. Q. lm, M.R. Hssn
More informationFigure 1. The left-handed and right-handed trefoils
The Knot Group A knot is n emedding of the irle into R 3 (or S 3 ), k : S 1 R 3. We shll ssume our knots re tme, mening the emedding n e extended to solid torus, K : S 1 D 2 R 3. The imge is lled tuulr
More informationModule 9: Tries and String Matching
Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer
More informationModule 9: Tries and String Matching
Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer
More informationCS 2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2014
S 224 DIGITAL LOGI & STATE MAHINE DESIGN SPRING 214 DUE : Mrh 27, 214 HOMEWORK III READ : Relte portions of hpters VII n VIII ASSIGNMENT : There re three questions. Solve ll homework n exm prolems s shown
More informationLecture 08: Feb. 08, 2019
4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny
More informationLecture 3. XML Into RDBMS. XML and Databases. Memory Representations. Memory Representations. Traversals and Pre/Post-Encoding. Memory Representations
Leture XML into RDBMS XML n Dtses Sestin Mneth NICTA n UNSW Leture XML Into RDBMS CSE@UNSW -- Semester, 00 Memory Representtions Memory Representtions Fts DOM is esy to use, ut memory hevy. in-memory size
More informationPre-Lie algebras, rooted trees and related algebraic structures
Pre-Lie lgers, rooted trees nd relted lgeri strutures Mrh 23, 2004 Definition 1 A pre-lie lger is vetor spe W with mp : W W W suh tht (x y) z x (y z) = (x z) y x (z y). (1) Exmple 2 All ssoitive lgers
More informationProject 6: Minigoals Towards Simplifying and Rewriting Expressions
MAT 51 Wldis Projet 6: Minigols Towrds Simplifying nd Rewriting Expressions The distriutive property nd like terms You hve proly lerned in previous lsses out dding like terms ut one prolem with the wy
More informationXML and Databases. Exam Preperation Discuss Answers to last year s exam. Sebastian Maneth NICTA and UNSW
XML n Dtses Exm Prepertion Disuss Answers to lst yer s exm Sestin Mneth NICTA n UNSW CSE@UNSW -- Semester 1, 2008 (1) For eh of the following, explin why it is not well-forme XML (is WFC or the XML grmmr
More informationDiscrete Structures Lecture 11
Introdution Good morning. In this setion we study funtions. A funtion is mpping from one set to nother set or, perhps, from one set to itself. We study the properties of funtions. A mpping my not e funtion.
More informationMinimal DFA. minimal DFA for L starting from any other
Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA
More information1 From NFA to regular expression
Note 1: How to convert DFA/NFA to regulr expression Version: 1.0 S/EE 374, Fll 2017 Septemer 11, 2017 In this note, we show tht ny DFA cn e converted into regulr expression. Our construction would work
More informationCS 275 Automata and Formal Language Theory
CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)
More information1 PYTHAGORAS THEOREM 1. Given a right angled triangle, the square of the hypotenuse is equal to the sum of the squares of the other two sides.
1 PYTHAGORAS THEOREM 1 1 Pythgors Theorem In this setion we will present geometri proof of the fmous theorem of Pythgors. Given right ngled tringle, the squre of the hypotenuse is equl to the sum of the
More informationp-adic Egyptian Fractions
p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction
More informationArrow s Impossibility Theorem
Rep Voting Prdoxes Properties Arrow s Theorem Arrow s Impossiility Theorem Leture 12 Arrow s Impossiility Theorem Leture 12, Slide 1 Rep Voting Prdoxes Properties Arrow s Theorem Leture Overview 1 Rep
More informationCSE 332. Sorting. Data Abstractions. CSE 332: Data Abstractions. QuickSort Cutoff 1. Where We Are 2. Bounding The MAXIMUM Problem 4
Am Blnk Leture 13 Winter 2016 CSE 332 CSE 332: Dt Astrtions Sorting Dt Astrtions QuikSort Cutoff 1 Where We Are 2 For smll n, the reursion is wste. The onstnts on quik/merge sort re higher thn the ones
More informationA Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version
A Lower Bound for the Length of Prtil Trnsversl in Ltin Squre, Revised Version Pooy Htmi nd Peter W. Shor Deprtment of Mthemtil Sienes, Shrif University of Tehnology, P.O.Bo 11365-9415, Tehrn, Irn Deprtment
More informationThe Word Problem in Quandles
The Word Prolem in Qundles Benjmin Fish Advisor: Ren Levitt April 5, 2013 1 1 Introdution A word over n lger A is finite sequene of elements of A, prentheses, nd opertions of A defined reursively: Given
More information11/3/13. Indexing techniques. Short-read mapping software. Indexing a text (a genome, etc) Some terminologies. Hashing
I9 Introdution to Bioinformtis, 0 Indeing tehniques Yuzhen Ye (yye@indin.edu) Shool of Informtis & Computing, IUB Contents We hve seen indeing tehnique used in BLAST Applitions tht rely on n effiient indeing
More informationChapter 2 Finite Automata
Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht
More informationActivities. 4.1 Pythagoras' Theorem 4.2 Spirals 4.3 Clinometers 4.4 Radar 4.5 Posting Parcels 4.6 Interlocking Pipes 4.7 Sine Rule Notes and Solutions
MEP: Demonstrtion Projet UNIT 4: Trigonometry UNIT 4 Trigonometry tivities tivities 4. Pythgors' Theorem 4.2 Spirls 4.3 linometers 4.4 Rdr 4.5 Posting Prels 4.6 Interloking Pipes 4.7 Sine Rule Notes nd
More informationHow do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?
XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk out solving systems of liner equtions. These re prolems tht give couple of equtions with couple of unknowns, like: 6= x + x 7=
More informationSystem Validation (IN4387) November 2, 2012, 14:00-17:00
System Vlidtion (IN4387) Novemer 2, 2012, 14:00-17:00 Importnt Notes. The exmintion omprises 5 question in 4 pges. Give omplete explntion nd do not onfine yourself to giving the finl nswer. Good luk! Exerise
More informationCS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018
CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA
More informationCoalgebra, Lecture 15: Equations for Deterministic Automata
Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined
More informationCS 491G Combinatorial Optimization Lecture Notes
CS 491G Comintoril Optimiztion Leture Notes Dvi Owen July 30, August 1 1 Mthings Figure 1: two possile mthings in simple grph. Definition 1 Given grph G = V, E, mthing is olletion of eges M suh tht e i,
More informationBases for Vector Spaces
Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything
More informationMatrix Algebra. Matrix Addition, Scalar Multiplication and Transposition. Linear Algebra I 24
Mtrix lger Mtrix ddition, Sclr Multipliction nd rnsposition Mtrix lger Section.. Mtrix ddition, Sclr Multipliction nd rnsposition rectngulr rry of numers is clled mtrix ( the plurl is mtrices ) nd the
More informationNondeterministic Finite Automata
Nondeterministi Finite utomt The Power of Guessing Tuesdy, Otoer 4, 2 Reding: Sipser.2 (first prt); Stoughton 3.3 3.5 S235 Lnguges nd utomt eprtment of omputer Siene Wellesley ollege Finite utomton (F)
More informationTransition systems (motivation)
Trnsition systems (motivtion) Course Modelling of Conurrent Systems ( Modellierung neenläufiger Systeme ) Winter Semester 2009/0 University of Duisurg-Essen Brr König Tehing ssistnt: Christoph Blume In
More informationNFAs continued, Closure Properties of Regular Languages
lgorithms & Models of omputtion S/EE 374, Spring 209 NFs continued, losure Properties of Regulr Lnguges Lecture 5 Tuesdy, Jnury 29, 209 Regulr Lnguges, DFs, NFs Lnguges ccepted y DFs, NFs, nd regulr expressions
More informationAlgorithm Design and Analysis
Algorithm Design nd Anlysis LECTURE 5 Supplement Greedy Algorithms Cont d Minimizing lteness Ching (NOT overed in leture) Adm Smith 9/8/10 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov,
More informationMaintaining Mathematical Proficiency
Nme Dte hpter 9 Mintining Mthemtil Profiieny Simplify the epression. 1. 500. 189 3. 5 4. 4 3 5. 11 5 6. 8 Solve the proportion. 9 3 14 7. = 8. = 9. 1 7 5 4 = 4 10. 0 6 = 11. 7 4 10 = 1. 5 9 15 3 = 5 +
More informationCompiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz
University of Southern Cliforni Computer Siene Deprtment Compiler Design Spring 7 Lexil Anlysis Smple Exerises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sienes Institute 47 Admirlty Wy, Suite
More informationDynamic Fully-Compressed Suffix Trees
Motivtion Dynmic FCST s Conclusions Dynmic Fully-Compressed Suffix Trees Luís M. S. Russo Gonzlo Nvrro Arlindo L. Oliveir INESC-ID/IST {lsr,ml}@lgos.inesc-id.pt Dept. of Computer Science, University of
More informationCSE 401 Compilers. Today s Agenda
CSE 401 Compilers Leture 3: Regulr Expressions & Snning, on?nued Mihel Ringenurg Tody s Agend Lst?me we reviewed lnguges nd grmmrs, nd riefly strted disussing regulr expressions. Tody I ll restrt the regulr
More informationFirst Midterm Examination
Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does
More informationConnected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs
Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini
More informationSurface maps into free groups
Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The
More informationTHE PYTHAGOREAN THEOREM
THE PYTHAGOREAN THEOREM The Pythgoren Theorem is one of the most well-known nd widely used theorems in mthemtis. We will first look t n informl investigtion of the Pythgoren Theorem, nd then pply this
More informationMid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours
Mi-Term Exmintion - Spring 0 Mthemtil Progrmming with Applitions to Eonomis Totl Sore: 5; Time: hours. Let G = (N, E) e irete grph. Define the inegree of vertex i N s the numer of eges tht re oming into
More informationCHENG Chun Chor Litwin The Hong Kong Institute of Education
PE-hing Mi terntionl onferene IV: novtion of Mthemtis Tehing nd Lerning through Lesson Study- onnetion etween ssessment nd Sujet Mtter HENG hun hor Litwin The Hong Kong stitute of Edution Report on using
More informationAVL Trees. D Oisín Kidney. August 2, 2018
AVL Trees D Oisín Kidne August 2, 2018 Astrt This is verified implementtion of AVL trees in Agd, tking ides primril from Conor MBride s pper How to Keep Your Neighours in Order [2] nd the Agd stndrd lirr
More information5. Every rational number have either terminating or repeating (recurring) decimal representation.
CHAPTER NUMBER SYSTEMS Points to Rememer :. Numer used for ounting,,,,... re known s Nturl numers.. All nturl numers together with zero i.e. 0,,,,,... re known s whole numers.. All nturl numers, zero nd
More informationFarey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University
U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions
More informationFactorising FACTORISING.
Ftorising FACTORISING www.mthletis.om.u Ftorising FACTORISING Ftorising is the opposite of expning. It is the proess of putting expressions into rkets rther thn expning them out. In this setion you will
More informationProving the Pythagorean Theorem
Proving the Pythgoren Theorem W. Bline Dowler June 30, 2010 Astrt Most people re fmilir with the formul 2 + 2 = 2. However, in most ses, this ws presented in lssroom s n solute with no ttempt t proof or
More informationLing 3701H / Psych 3371H: Lecture Notes 9 Hierarchic Sequential Prediction
Ling 3701H / Psyh 3371H: Leture Notes 9 Hierrhi Sequentil Predition Contents 9.1 Complex events.................................... 1 9.2 Reognition of omplex events using event frgments................
More informationIntroduction to Bioinformatics
Introdution to Bioinformtis Outline } Method without onsidering bkground distribution } Generl pproh onsidering bkground distribution } Wys to speed up the lgorithm Trnsription Ftor Binding Sites (TFBSs)
More information8 THREE PHASE A.C. CIRCUITS
8 THREE PHSE.. IRUITS The signls in hpter 7 were sinusoidl lternting voltges nd urrents of the so-lled single se type. n emf of suh type n e esily generted y rotting single loop of ondutor (or single winding),
More informationThe University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS
The University of ottinghm SCHOOL OF COMPUTR SCIC A LVL 2 MODUL, SPRIG SMSTR 2015 2016 MACHIS AD THIR LAGUAGS ASWRS Time llowed TWO hours Cndidtes my omplete the front over of their nswer ook nd sign their
More informationAlgorithm Design and Analysis
Algorithm Design nd Anlysis LECTURE 8 Mx. lteness ont d Optiml Ching Adm Smith 9/12/2008 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov, K. Wyne Sheduling to Minimizing Lteness Minimizing
More informationCS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions
CS 330 Forml Methods nd Models Dn Richrds, George Mson University, Spring 2016 Quiz Solutions Quiz 1, Propositionl Logic Dte: Ferury 9 1. (4pts) ((p q) (q r)) (p r), prove tutology using truth tles. p
More informationNondeterminism and Nodeterministic Automata
Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely
More information12.4 Similarity in Right Triangles
Nme lss Dte 12.4 Similrit in Right Tringles Essentil Question: How does the ltitude to the hpotenuse of right tringle help ou use similr right tringles to solve prolems? Eplore Identifing Similrit in Right
More informationAP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals
AP Clulus BC Chpter 8: Integrtion Tehniques, L Hopitl s Rule nd Improper Integrls 8. Bsi Integrtion Rules In this setion we will review vrious integrtion strtegies. Strtegies: I. Seprte the integrnd into
More informationConvert the NFA into DFA
Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:
More informationPreview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms
Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,
More informationComputing data with spreadsheets. Enter the following into the corresponding cells: A1: n B1: triangle C1: sqrt
Computing dt with spredsheets Exmple: Computing tringulr numers nd their squre roots. Rell, we showed 1 ` 2 ` `n npn ` 1q{2. Enter the following into the orresponding ells: A1: n B1: tringle C1: sqrt A2:
More information