SA-REPC - Sequence Alignment with a Regular Expression Path Constraint
Transcription
1 SA-REPC - Sequence Alignment with a Regular Expression Path Constraint. Nimrod Milo, Tamar Pinhas, Michal Ziv-Ukelson. Ben-Gurion University of the Negev, Be'er Sheva, Israel. Graduate Seminar, BGU 2010. Milo, Pinhas & Ziv-Ukelson (BGU) SA-REPC November / 54
2 Outline: 1 About Michal's Group; 2 Sequence Alignment with a Regular Expression Path Constraint; 3 Applying SA-REPC to microRNA target prediction; 4 Summary
3 Michal's group: Michal Ziv-Ukelson, Tamar Pinhas, Isana Vaksler, Noa Mussa, Sivan Yogev, Shay Zakov, Erez Katzenelson
4 Topics of interest in our group: sequence and tree alignments and similarity; indexing, searching and compression; secondary structure prediction of RNA (folding and co-folding); microRNA-mRNA target prediction; sequence/structure motifs involved in localization and post-transcriptional regulation; post-transcriptional regulation (virus-host microRNA-mRNA behavior); protein motif discovery (common signals within a family); algorithms on strings and trees; and more.
5 Outline: 1 About Michal's Group; 2 Sequence Alignment with a Regular Expression Path Constraint (Manhattan Tourist Problem; Sequence Alignment; Constraint Sequence Alignment; SA-REPC definition; Algorithm for the SA-REPC; Complexity analysis); 3 Applying SA-REPC to microRNA target prediction (Background; microRNAs; Modifying the SA-REPC for microRNA target prediction; Results); 4 Summary
8 MTP: Manhattan Tourist Problem. Imagine seeking a path from source (s) to sink (t), travelling only eastward and southward, that passes the highest number of attractions. The attractions are marked by weights on the streets (edges) of a Manhattan grid. [Figure: a weighted grid with the source s in the top-left corner and the sink t in the bottom-right corner]
9 Manhattan Tourist Problem: Formulation. Goal: find the highest-scoring path in a weighted grid. Input: a weighted grid G with two distinct vertices, one labeled source and the other labeled sink. Output: a longest (maximum-weight) path in G from source to sink.
12 MTP solution using dynamic programming. Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the edge between them. The score of a point (i, j) is computed by the recurrence relation:
$S_{0,0} = 0$
$S_{i,j} = \max \{\, S_{i-1,j} + \text{score of the edge between } (i-1, j) \text{ and } (i, j),\ \ S_{i,j-1} + \text{score of the edge between } (i, j-1) \text{ and } (i, j) \,\}$
Running time: for a grid of size $n \times m$, evaluating the recurrence takes $O(nm)$.
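The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `manhattan_tourist` and the two edge-weight arrays `down` and `right` are assumptions introduced here. `down[i][j]` is the weight of the edge from (i, j) to (i+1, j), and `right[i][j]` is the weight of the edge from (i, j) to (i, j+1).

```python
def manhattan_tourist(down, right):
    """Return the best source-to-sink score in a weighted grid.

    down[i][j]  : weight of the southward edge (i, j) -> (i + 1, j)
    right[i][j] : weight of the eastward edge  (i, j) -> (i, j + 1)
    (Both array layouts are assumptions made for this sketch.)
    """
    n = len(down)        # number of southward steps
    m = len(right[0])    # number of eastward steps
    S = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and first row have a single predecessor each.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + down[i - 1][0]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + right[0][j - 1]

    # Interior vertices: best of "came from above" and "came from the left".
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j] + down[i - 1][j],
                          S[i][j - 1] + right[i][j - 1])
    return S[n][m]


# The 2 x 2 corner used in the slides' example:
# S[1][0] = 0 + 2, S[0][1] = 0 + 1, S[1][1] = max(1 + 0, 2 + 3).
corner_score = manhattan_tourist(down=[[2, 0]], right=[[1], [3]])
```

The two nested loops visit every vertex once, which matches the O(nm) running time stated on the slide.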
14 Example. $S_{1,0} = S_{0,0} + 2 = 2$ and $S_{1,1} = \max(S_{0,1} + 0,\ S_{1,0} + 3) = \max(1 + 0,\ 2 + 3) = 5$. [Figure: the weighted grid with the vertex being computed marked by *]
18 Extending the MTP problem. We extend the problem in two ways: the scores become real numbers, and diagonal movement is allowed (diagonal edges are added to the graph). The recurrence gains a third term:
$S_{i,j} = \max \{\, S_{i-1,j} + \text{score of the edge between } (i-1, j) \text{ and } (i, j),\ \ S_{i,j-1} + \text{score of the edge between } (i, j-1) \text{ and } (i, j),\ \ S_{i-1,j-1} + \text{score of the edge between } (i-1, j-1) \text{ and } (i, j) \,\}$
[Figure: the weighted grid with real-valued weights and diagonal edges]
23 Sequence alignment. Definition (global sequence alignment problem): let $S_1$ and $S_2$ be two strings over an alphabet $\Sigma$, and let s be a scoring matrix. A sequence alignment is obtained by inserting gaps into $S_1$ and $S_2$ so that the symbols can be placed in one-to-one correspondence with each other. The optimal global sequence alignment is a sequence alignment that has the optimal sum of scores, according to s, over the pairs of symbols that correspond to each other in the alignment.
25 Sequence alignment example. $S_1$ = AGCGCGUU, $S_2$ = GUCAGACG. Let the scoring matrix s give -1 for a mismatch or an indel (space) and 1 for a match. An optimal alignment of $S_1$ and $S_2$ then scores -1. [Figure: an optimal alignment of $S_1$ and $S_2$]
26 Adding some sequences to the grid. We can extend the grid to represent an alignment between two sequences in the following way. We create a grid of $(|S_1| + 1) \times (|S_2| + 1)$ vertices; the additional row and column are for the gap sign (-). The edges between rows i, i+1 and columns j, j+1 are scored as follows: a horizontal edge scores $s[-, S_2[j]]$, a vertical edge scores $s[S_1[i], -]$, and a diagonal edge scores $s[S_1[i], S_2[j]]$.
28 [Figure: the alignment grid for AGCGCGUU and GUCAGACG, with source s and sink t]
30 The alignment table S for $S_1$ = AGCGCGUU (rows) and $S_2$ = GUCAGACG (columns), with an additional first row and column for the gap sign (-). [Figure: the filled alignment table and the resulting alignment]
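Filling the alignment table is the extended Manhattan recurrence applied to the grid described above. The following is a minimal sketch under stated assumptions: the function name `global_alignment_score` and the `unit_score` scoring function (+1 for a match, -1 for a mismatch or indel, as in the slides' example) are names introduced here for illustration.

```python
GAP = "-"


def unit_score(a, b):
    """Scoring scheme of the example: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


def global_alignment_score(s1, s2, score=unit_score):
    """Score of an optimal global alignment of s1 and s2."""
    n, m = len(s1), len(s2)
    T = [[0] * (m + 1) for _ in range(n + 1)]

    # Extra row and column: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + score(s1[i - 1], GAP)
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + score(GAP, s2[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # diagonal
                T[i - 1][j] + score(s1[i - 1], GAP),            # vertical
                T[i][j - 1] + score(GAP, s2[j - 1]),            # horizontal
            )
    return T[n][m]


example_score = global_alignment_score("AGCGCGUU", "GUCAGACG")
```

For the example strings this reproduces the optimal value of -1 quoted on the slides. Recovering the alignment itself would need a traceback over `T`, which is omitted here.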
33 Constraint Sequence Alignment. Numerous studies suggest applying additional constraints to sequence alignment in order to improve speed or accuracy. The additional constraints can reflect a priori knowledge about the alignment and therefore narrow the search space of the problem or guide the search towards a preferred solution.
34 Related Work. Position anchoring [Myers-96, Sammeth-03]: demands that the path pass through certain cells in the table.
35 Related Work. Spaced seeds [Ma-02, Kucherov-05, Benson-06]: a constraint on the path in the form of a partial word. Partial words are alignments over the letters 1 (match) and * (don't care). For example, 11*11* requires matches at positions 1, 2, 4 and 5 and allows anything at positions 3 and 6.
36 Related Work. Regular Expression Constraint Sequence Alignment (RECSA) [Arslan-05]: each string should satisfy a regular expression constraint.
37 Related Work. SA-REPC: a constraint on the path in the form of a regular expression.
38 Related Work (summary). Position anchoring [Myers-96, Sammeth-03]; spaced seeds [Ma-02, Kucherov-05, Benson-06]; Regular Expression Constraint Sequence Alignment (RECSA) [Arslan-05]; SA-REPC.
42 Preliminaries. An extended definition of sequence alignment with alignment-path constraints. Example: the constraint is in the form of a regular expression. $S_1$ = AGCGCGUU, $S_2$ = GUCAGACG, and R is a regular expression over {1, 0} (1: match, 0: everything else). [Figure: an alignment of $S_1$ and $S_2$ that satisfies R]
43 Preliminaries: alignment alphabet examples. $\Sigma_r = \{1, 0\}$, where 1 stands for a match and 0 for anything else. Example: the letters A and A are mapped to 1; U and - are mapped to 0.
44 Preliminaries: alignment alphabet examples. $\Sigma_r = \{m, s, i, d\}$, where m stands for match, s for substitution, i for insertion and d for deletion. Example: the letters A and A are mapped to m; U and - are mapped to d; - and A are mapped to i.
45 Preliminaries: alignment alphabet examples. $\Sigma_r = \left\{ \tbinom{\sigma_1}{\sigma_2} \;\middle|\; \sigma_1, \sigma_2 \in \Sigma \cup \{-\} \right\} \setminus \left\{ \tbinom{-}{-} \right\}$. Each pair of letters is mapped to a different symbol in the alignment alphabet. Example: the letters A and U are mapped to $\tbinom{A}{U}$ in the alignment alphabet, and A and - are mapped to $\tbinom{A}{-}$.
47 Preliminaries: alignment alphabet examples. Because a pair of symbols from $\Sigma$ may correspond to more than one symbol of $\Sigma_r$, we need a mapping function f into the power set of $\Sigma_r$, defined as $f : \Sigma \times \Sigma \to \mathcal{P}(\Sigma_r)$. Example: for $\Sigma_r = \{0, 1\} \cup \left( \left\{ \tbinom{\sigma_1}{\sigma_2} \;\middle|\; \sigma_1, \sigma_2 \in \Sigma \cup \{-\} \right\} \setminus \left\{ \tbinom{-}{-} \right\} \right)$ we have $f(A, A) = \{1, \tbinom{A}{A}\}$ and $f(A, U) = \{0, \tbinom{A}{U}\}$.
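The alignment alphabets above can be illustrated with two small Python functions. This is only a sketch: the names `edit_op` and `f_combined`, and the choice to write the pair symbol as the string "A/U", are assumptions made here for illustration.

```python
GAP = "-"


def edit_op(a, b):
    """Map an aligned pair to the alphabet {m, s, i, d}.

    Follows the slides' convention: (U, -) is a deletion, (-, A) an insertion.
    """
    if a == GAP:
        return "i"
    if b == GAP:
        return "d"
    return "m" if a == b else "s"


def f_combined(a, b):
    """Mapping function f for the combined alphabet {0, 1} plus pair symbols.

    Returns a set because one aligned pair corresponds to two symbols here:
    its 0/1 class and its own pair symbol (written "a/b" in this sketch).
    """
    is_match = a == b and a != GAP
    return {"1" if is_match else "0", f"{a}/{b}"}


symbols_for_AA = f_combined("A", "A")
symbols_for_AU = f_combined("A", "U")
```

Returning a set is what lets a single regular expression mix coarse symbols (such as 1) with specific base pairs in the same constraint.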
50 Sequence Alignment with a Regular Expression Path Constraint. Definition (global SA-REPC): let $S_1$ and $S_2$ be two strings over an alphabet $\Sigma$, let s be a scoring matrix over $\Sigma$, and let R be a regular expression over an alignment alphabet $\Sigma_r$. Find an alignment of $S_1$ and $S_2$ such that two conditions hold: (1) there exists an accepted region in the alignment that belongs to $L(R)$; (2) the overall score of the alignment, computed according to s, is optimal among all such alignments.
52 Sequence Alignment vs. SA-REPC. Example (input): $S_1$ = AGCGCGUU, $S_2$ = GUCAGACG, a scoring matrix s with +1 for a match and -1 for everything else, and a path constraint R. Example (sequence alignment): optimal alignment value = -1. Example (SA-REPC): optimal alignment value = -3. [Figure: the two alignments side by side]
56 Modifications in the automaton. From the regular expression R we first build an automaton $A_R$ (states $q_0, \dots, q_3$ in the slide's example). From $A_R$ we build the automaton A used by the algorithm: a new start state $q_{init}$ with a self-loop on $\Sigma$ and an $\epsilon$-transition into $q_0$, and a new state $q_{final}$ with a self-loop on $\Sigma$, reached by an $\epsilon$-transition from $q_3$. [Figure: the regular expression R (transcribed as "1 (1 0)1 2 0"), the automaton $A_R$, and the built automaton A]
61 Dynamic programming solution. We calculate a dynamic programming table M. Each cell M[i, j] holds |Q| entries. Entry M[i, j](q) holds the optimal score of aligning $S_1[1, i]$ with $S_2[1, j]$ such that there is a run on A that reaches q. If no such alignment exists, the value of the entry M[i, j](q) is null. The answer is in $M[n, m](q_{final})$.
62 Dynamic programming recurrence formula. The recurrence formula for the problem is as follows:
1. $M[0, 0](q) = \begin{cases} 0 & q = q_{init} \\ \text{null} & \text{otherwise} \end{cases}$
2. $M[i, j](q) = \max \begin{cases} \max \{\, M[i-1, j-1](p) + s[S_1[i], S_2[j]] \mid q \in \delta(p, f(S_1[i], S_2[j])) \,\} \\ \max \{\, M[i-1, j](p) + s[S_1[i], -] \mid q \in \delta(p, f(S_1[i], -)) \,\} \\ \max \{\, M[i, j-1](p) + s[-, S_2[j]] \mid q \in \delta(p, f(-, S_2[j])) \,\} \end{cases}$
3. If i = 0 (or j = 0), the sets above that correspond to i - 1 (or to j - 1) are ignored.
63 Single cell calculation. We calculate a single cell M[i, j] under the assumptions $S_1[i] = S_2[j] = C$, $s[C, C] = 1$ and $s[C, -] = s[-, C] = 0$, using the built automaton A (states $q_{init}, q_0, q_1, q_2, q_3, q_{final}$).
67 Dynamic programming recurrence formula: example. Under the assumptions of the single cell calculation, the three terms of the recurrence that reach $q_1$ from $q_0$ are:
$M[i, j](q_1) = M[i-1, j-1](q_0) + s[C, C] = M[i-1, j-1](q_0) + 1$
$M[i, j](q_1) = M[i-1, j](q_0) + s[C, -] = M[i-1, j](q_0) + 0$
$M[i, j](q_1) = M[i, j-1](q_0) + s[-, C] = M[i, j-1](q_0) + 0$
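The recurrence can be sketched end to end in Python. This is a minimal illustration under several assumptions that are not in the slides: the alignment alphabet is {1, 0}; the automaton for R is supplied as an NFA triple `(delta, start, accepting)`; the helper `literal_nfa` (which only builds an NFA for one fixed word) and the names `wrap` and `sa_repc` are hypothetical; and the epsilon moves of the built automaton are folded directly into ordinary transitions rather than handled separately.

```python
GAP = "-"


def f(a, b):
    """Alignment alphabet {1, 0}: 1 for a match, 0 for everything else."""
    return {"1"} if a == b and a != GAP else {"0"}


def literal_nfa(word):
    """Hypothetical helper: NFA accepting exactly one word over {0, 1}."""
    delta = {(k, ch): {k + 1} for k, ch in enumerate(word)}
    return delta, 0, {len(word)}


def wrap(delta, start, accepting, alphabet=("0", "1")):
    """Add 'init' and 'final' states with self-loops on every symbol."""
    wrapped = {}

    def add(p, x, q):
        wrapped.setdefault((p, x), set()).add(q)

    for (p, x), targets in delta.items():
        for q in targets:
            add(p, x, q)
            if q in accepting:
                add(p, x, "final")          # folded epsilon: accepting -> final
            if p == start:
                add("init", x, q)           # folded epsilon: init -> start
                if q in accepting:
                    add("init", x, "final")
    for x in alphabet:
        add("init", x, "init")
        add("final", x, "final")
    starts = {"init"} | ({"final"} if start in accepting else set())
    return wrapped, starts


def sa_repc(s1, s2, nfa, score):
    """Optimal constrained score, or None if no alignment satisfies R."""
    delta, starts = wrap(*nfa)
    n, m = len(s1), len(s2)
    # M[i][j] maps an automaton state q to the best score reaching q.
    M = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    M[0][0] = {q: 0 for q in starts}
    for i in range(n + 1):
        for j in range(m + 1):
            moves = []
            if i > 0 and j > 0:
                moves.append((M[i - 1][j - 1], s1[i - 1], s2[j - 1]))
            if i > 0:
                moves.append((M[i - 1][j], s1[i - 1], GAP))
            if j > 0:
                moves.append((M[i][j - 1], GAP, s2[j - 1]))
            for prev, a, b in moves:
                for p, value in prev.items():
                    for x in f(a, b):
                        for q in delta.get((p, x), ()):
                            cand = value + score(a, b)
                            if cand > M[i][j].get(q, float("-inf")):
                                M[i][j][q] = cand
    return M[n][m].get("final")


def unit(a, b):
    return 1 if a == b else -1


two_matches = sa_repc("AGCGCGUU", "GUCAGACG", literal_nfa("11"), unit)
three_matches = sa_repc("AGCGCGUU", "GUCAGACG", literal_nfa("111"), unit)
```

With the constraint "11" (two consecutive matches) the example strings still reach the unconstrained optimum, because an optimal alignment already contains two adjacent matches. With "111" no alignment qualifies, since the strings share no common substring of length 3, so the entry stays null (`None` here). Each cell stores at most one value per state, and each state is updated from every predecessor state, which is where the $t^2$ factor for an NFA comes from.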
69 Complexity analysis. We denote by t = |Q| the number of states in A, by n the length of $S_1$, and by m the length of $S_2$.
Method | Trace | Time (NFA) | Time (DFA) | Memory
naïve | yes | O(mnt^2) | O(mnt) | O(mnt)
naïve | no | O(mnt^2) | O(mnt) | O(min{m, n}t)
Hirschberg | yes | O(mnt^2) | O(mnt) | O(min{m, n}t)
71 The Cell [Figure]
73 The central dogma [Figure]
74 A short movie
75 mRNA regions [Figure]
80 microRNAs. (1) microRNAs are short sequences of RNA (approximately 22 bases). (2) They function as specific gene regulators: a cell's function at any given time is determined by the composition of proteins in it, and microRNAs suppress the translation of RNA to protein (DNA, transcription, RNA, translation, protein). (3) They operate by binding to complementary sequences on their mRNA target; this interaction is called hybridization. Hybridization is the chemical bonding of bases (also called base pairing): A:U, G:C, G:U. (4) The complex created by hybridization of the microRNA to its mRNA target is called a duplex. Figure: picture from Lin et al.
81 Another short movie
83 Hybridization and sequence alignment. Hybridization of two sequences can be solved with the standard sequence alignment framework. The only difference is the scoring scheme: in sequence alignment a match is when both symbols are the same, while in hybridization a match is when the two symbols are complementary. The matching pairs are A:U, G:C and G:U. [Figure: an example duplex]
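The change of scoring scheme can be sketched as follows. This is an illustrative example, not the scoring used by the authors' tool: the name `hybridization_score`, the +1/-1 values, and the assumption that each listed pair counts in both orders (A:U and U:A) are choices made here.

```python
# Complementary pairs listed on the slide, taken as symmetric (an assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
PAIRS = WATSON_CRICK | WOBBLE


def hybridization_score(a, b, match=1, other=-1):
    """Score an aligned pair: a 'match' is a complementary pair, not equality."""
    return match if (a, b) in PAIRS else other


def identity_score(a, b, match=1, other=-1):
    """Ordinary sequence-alignment scoring, shown for comparison."""
    return match if a == b else other


# The same pair of symbols is judged differently by the two schemes.
gu_under_hybridization = hybridization_score("G", "U")
gu_under_identity = identity_score("G", "U")
```

Because only the scoring function changes, the same dynamic programming table can be filled with either scheme by passing a different scoring function.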
84 Duplex properties. Different properties of the microRNA-target duplex have been observed, some of which serve as a basis for current microRNA target prediction algorithms.
88 Duplex properties. (1) 5'-end dominant seed: several studies suggest the existence of a region of 6-8 nucleotides at the 5' end of the microRNA (the "seed"). The 5' end of the seed is unpaired or starts with U, and the seed does not contain wobble pairs (G:U). (2) 3'-end compensatory seed: there is significant evidence that a 3'-end seed of the microRNA can compensate for a non-perfect 5' seed. (3) Multiplicity: microRNAs have been shown to be capable of functioning in a collaborative manner. There are two types of multiplicity. [Figure: one microRNA on a target; two microRNAs (microRNA1, microRNA2) on a target] (4) Accessibility and thermodynamics: the thermodynamics and accessibility of the duplex and its surrounding area are very important properties.
90 Using the current dogma on duplexes
91 Utilizing SA-REPC for microRNA target prediction. Some properties of the duplex can be written as a regular expression constraint. 5'-end dominant seed: ( i A G A A A C ) WCB 5 7 ii (WCB) 6 Where: WCB = 3'-end compensatory seed: ( G C C G A U U A ) s 0 2 Inner bulge of the duplex: ( i 1 4 d 1 6)? ( 11 + ( i 1 4 d 1 6))
95 More complex duplex properties. The thermodynamics and accessibility of the duplex site and its surroundings are more complex properties. Thermodynamics: microRNA-target hybridization tends to have low free energy. Accessibility: target site accessibility plays an important role in the formation of the duplex. Both properties are computational bottlenecks: the complexity of such computations ranges from $O(nm^2)$ [Stadler-06] (with restrictions) up to $O(nm^5)$ [Hofacker-08]. We suggest using our method as an initial filter for target prediction tools that rely on energy computation.
97 Target prediction. Implementation: we implemented the tool in a Java package named calign. A web version is available at: negevcb/calign. Our data set: 99 microRNAs; UTRs of human genes (2183 transcripts); 873 verified duplexes from miRecords.
98 Comparative Results
99 Results. Columns: Tool, # of predicted pairs, # of true positives, sensitivity. miRanda 22, %; PITA 28, %; RNAhybrid 43, %; calign 43, %. Table: results on all 63,360 pairs.
101 Conclusions. We extended sequence alignment to support a path constraint (SA-REPC), presented an application for our algorithm, implemented the algorithm (calign), and showed preliminary comparative results. Future work: find more properties of duplexes that can be used in SA-REPC; find more applications for SA-REPC; possibly extend the approach to more general language classes, such as grammars. An interesting open problem is whether techniques previously used to obtain sub-quadratic sequence alignment, such as Four Russians and acceleration by compression, can be applied here.
102 Acknowledgements. Special thanks: Tamar Pinhas (co-author); Dr. Michal Ziv-Ukelson (my advisor); the rest of Michal's group at BGU: Erez Katznelson, Isana Vaksler, Sivan Yogev, Shay Zakov, Noa Mussa.
More informationUNIT 5. Protein Synthesis 11/22/16
UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA
More information11.3 Decoding Algorithm
11.3 Decoding Algorithm 393 For convenience, we have introduced π 0 and π n+1 as the fictitious initial and terminal states begin and end. This model defines the probability P(x π) for a given sequence
More information98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006
98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.
More informationProtein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.
Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps
More informationUNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r
Syllabus R9 Regulation UNIT-II NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: In the automata theory, a nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite
More informationTandem Mass Spectrometry: Generating function, alignment and assembly
Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationIn Genomes, Two Types of Genes
In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationProtein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.
Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Types of RNA Messenger RNA (mrna) makes a copy of DNA, carries instructions for making proteins,
More informationCOMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University
COMP 598 Advanced Computational Biology Methods & Research Introduction Jérôme Waldispühl School of Computer Science McGill University General informations (1) Office hours: by appointment Office: TR3018
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationLesson Overview. Ribosomes and Protein Synthesis 13.2
13.2 The Genetic Code The first step in decoding genetic messages is to transcribe a nucleotide base sequence from DNA to mrna. This transcribed information contains a code for making proteins. The Genetic
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationRegular expression constrained sequence alignment revisited
Regular expression constrained sequence alignment revisited Gregory Kucherov, Tamar Pinhas, Michal Ziv-Ukelson To cite this version: Gregory Kucherov, Tamar Pinhas, Michal Ziv-Ukelson. Regular expression
More informationProperties of Context-Free Languages
Properties of Context-Free Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationStructure-Based Comparison of Biomolecules
Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein
More informationChapter 6.2. p
Chapter 6.2 p. 148-155 Day M T W Th F Question Name Period Weekly Lifeline Using the following template: GTACTTATCGT what is the complementary strand of DNA? B_ Check KICK-OFF LEARNING LOG KICK-OFF Response
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationClosure under the Regular Operations
September 7, 2013 Application of NFA Now we use the NFA to show that collection of regular languages is closed under regular operations union, concatenation, and star Earlier we have shown this closure
More informationTranslation Part 2 of Protein Synthesis
Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation
More informationTwo Algorithms for LCS Consecutive Suffix Alignment
Two Algorithms for LCS Consecutive Suffix Alignment Gad M. Landau Eugene Myers Michal Ziv-Ukelson Abstract The problem of aligning two sequences A and to determine their similarity is one of the fundamental
More informationCombinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming
ombinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming Matthew Macauley Department of Mathematical Sciences lemson niversity http://www.math.clemson.edu/~macaule/ Math
More informationCSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182
CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings
More informationFinite Automata. Seungjin Choi
Finite Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 28 Outline
More information13 Comparative RNA analysis
13 Comparative RNA analysis Sources for this lecture: R. Durbin, S. Eddy, A. Krogh und G. Mitchison, Biological sequence analysis, Cambridge, 1998 D.W. Mount. Bioinformatics: Sequences and Genome analysis,
More informationSparse RNA Folding: Time and Space Efficient Algorithms
Sparse RNA Folding: Time and Space Efficient Algorithms Rolf Backofen 1, Dekel Tsur 2, Shay Zakov 2, and Michal Ziv-Ukelson 2 1 Albert Ludwigs University, Freiburg, Germany backofen@informatik.uni-freiburg.de
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationRNA secondary structure prediction. Farhat Habib
RNA secondary structure prediction Farhat Habib RNA RNA is similar to DNA chemically. It is usually only a single strand. T(hyamine) is replaced by U(racil) Some forms of RNA can form secondary structures
More informationSimulation of Gene Regulatory Networks
Simulation of Gene Regulatory Networks Overview I have been assisting Professor Jacques Cohen at Brandeis University to explore and compare the the many available representations and interpretations of
More informationEfficient Algorithms forregular Expression Constrained Sequence Alignment p. 1/35
Efficient Algorithms for Regular Expression Constrained Sequence Alignment Yun-Sheng Chung, Chin Lung Lu, and Chuan Yi Tang Department of Computer Science National Tsing Hua University, Taiwan Department
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationDynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.
Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal
More informationLecture 7: Simple genetic circuits I
Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.
More informationPattern Matching (Exact Matching) Overview
CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm
More informationComplete all warm up questions Focus on operon functioning we will be creating operon models on Monday
Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationNon-context-Free Languages. CS215, Lecture 5 c
Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution
More informationLecture 4 Nondeterministic Finite Accepters
Lecture 4 Nondeterministic Finite Accepters COT 4420 Theory of Computation Section 2.2, 2.3 Nondeterminism A nondeterministic finite automaton can go to several states at once. Transitions from one state
More information20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming
20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment
More informationNUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses
Teacher Page 1 NUMB3RS Activity: DNA Sequence Alignment Topic: Biomathematics DNA sequence alignment Grade Level: 10-12 Objective: Use mathematics to compare two strings of DNA Prerequisite: very basic
More informationJava II Finite Automata I
Java II Finite Automata I Bernd Kiefer Bernd.Kiefer@dfki.de Deutsches Forschungszentrum für künstliche Intelligenz November, 23 Processing Regular Expressions We already learned about Java s regular expression
More informationPair Hidden Markov Models
Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More informationPredicting RNA Secondary Structure
7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for
More informationMultiple Choice Review- Eukaryotic Gene Expression
Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule
More informationBefore we show how languages can be proven not regular, first, how would we show a language is regular?
CS35 Proving Languages not to be Regular Before we show how languages can be proven not regular, first, how would we show a language is regular? Although regular languages and automata are quite powerful
More informationHidden Markov Models 1
Hidden Markov Models Dinucleotide Frequency Consider all 2-mers in a sequence {AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT} Given 4 nucleotides: each with a probability of occurrence of. 4 Thus, one
More informationFinite Automata. Wen-Guey Tzeng Computer Science Department National Chiao Tung University
Finite Automata Wen-Guey Tzeng Computer Science Department National Chiao Tung University Syllabus Deterministic finite acceptor Nondeterministic finite acceptor Equivalence of DFA and NFA Reduction of
More informationChapter 2: Finite Automata
Chapter 2: Finite Automata Peter Cappello Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 cappello@cs.ucsb.edu Please read the corresponding chapter before
More informationTobias Markus. January 21, 2015
Automata Advanced Seminar Computer Engineering January 21, 2015 (Advanced Seminar Computer Engineering ) Automata January 21, 2015 1 / 35 1 2 3 4 5 6 obias Markus (Advanced Seminar Computer Engineering
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider
More informationNetwork motifs in the transcriptional regulation network (of Escherichia coli):
Network motifs in the transcriptional regulation network (of Escherichia coli): Janne.Ravantti@Helsinki.Fi (disclaimer: IANASB) Contents: Transcription Networks (aka. The Very Boring Biology Part ) Network
More informationPage 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More informationMore Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?
More informationFrom Gene to Protein
From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed
More informationCMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013
CMPSCI 250: Introduction to Computation Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013 λ-nfa s to NFA s to DFA s Reviewing the Three Models and Kleene s Theorem The Subset
More informationImplementing Approximate Regularities
Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National
More informationCS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)
CS5371 Theory of Computation Lecture 7: Automata Theory V (CFG, CFL, CNF) Announcement Homework 2 will be given soon (before Tue) Due date: Oct 31 (Tue), before class Midterm: Nov 3, (Fri), first hour
More informationA GENETIC ALGORITHM FOR FINITE STATE AUTOMATA
A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA Aviral Takkar Computer Engineering Department, Delhi Technological University( Formerly Delhi College of Engineering), Shahbad Daulatpur, Main Bawana Road,
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationMultiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas
n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming
More informationA Method for Aligning RNA Secondary Structures
Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationLesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression
13.4 Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium
More informationName: SBI 4U. Gene Expression Quiz. Overall Expectation:
Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):
More informationSeptember 7, Formal Definition of a Nondeterministic Finite Automaton
Formal Definition of a Nondeterministic Finite Automaton September 7, 2014 A comment first The formal definition of an NFA is similar to that of a DFA. Both have states, an alphabet, transition function,
More informationHidden Markov Models
Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm
More informationA faster algorithm for RNA co-folding
A faster algorithm for RNA co-folding Michal Ziv-Ukelson 1, Irit Gat-Viks 2, Ydo Wexler 3, and Ron Shamir 4 1 Computer Science Department, Ben Gurion University of the Negev, Beer-Sheva. 2 Computational
More informationIntroduction to Sequence Alignment. Manpreet S. Katari
Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler
More informationLecture 5,6 Local sequence alignment
Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution
More informationCS 154 Formal Languages and Computability Assignment #2 Solutions
CS 154 Formal Languages and Computability Assignment #2 Solutions Department of Computer Science San Jose State University Spring 2016 Instructor: Ron Mak www.cs.sjsu.edu/~mak Assignment #2: Question 1
More informationA Structure-Based Flexible Search Method for Motifs in RNA
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 14, Number 7, 2007 Mary Ann Liebert, Inc. Pp. 908 926 DOI: 10.1089/cmb.2007.0061 A Structure-Based Flexible Search Method for Motifs in RNA ISANA VEKSLER-LUBLINSKY,
More informationarxiv: v1 [cs.ds] 9 Apr 2018
From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract
More informationTheory Bridge Exam Example Questions
Theory Bridge Exam Example Questions Annotated version with some (sometimes rather sketchy) answers and notes. This is a collection of sample theory bridge exam questions. This is just to get some idea
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationBME 5742 Biosystems Modeling and Control
BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various
More informationCSCI Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm
CSCI 1760 - Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm Shay Mozes Brown University shay@cs.brown.edu Abstract. This report describes parallel Java implementations of
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationComment: The induction is always on some parameter, and the basis case is always an integer or set of integers.
1. For each of the following statements indicate whether it is true or false. For the false ones (if any), provide a counter example. For the true ones (if any) give a proof outline. (a) Union of two non-regular
More informationMathematics for linguists
1/13 Mathematics for linguists Gerhard Jäger gerhard.jaeger@uni-tuebingen.de Uni Tübingen, WS 2009/2010 November 26, 2009 2/13 The pumping lemma Let L be an infinite regular language over a finite alphabete
More informationMidterm 2 for CS 170
UC Berkeley CS 170 Midterm 2 Lecturer: Gene Myers November 9 Midterm 2 for CS 170 Print your name:, (last) (first) Sign your name: Write your section number (e.g. 101): Write your sid: One page of notes
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationCombinatorial approaches to RNA folding Part I: Basics
Combinatorial approaches to RNA folding Part I: Basics Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4500, Spring 2015 M. Macauley (Clemson)
More information