SA-REPC - Sequence Alignment with a Regular Expression Path Constraint

1 SA-REPC - Sequence Alignment with a Regular Expression Path Constraint
Nimrod Milo, Tamar Pinhas, Michal Ziv-Ukelson
Ben-Gurion University of the Negev, Be'er Sheva, Israel
Graduate Seminar, BGU 2010
Milo, Pinhas & Ziv-Ukelson (BGU) SA-REPC November / 54

2 Outline
1 About Michal's Group
2 Sequence Alignment with a Regular Expression Path Constraint
3 Applying SA-REPC to microRNA target prediction
4 Summary

3 Michal's group
Michal Ziv-Ukelson, Tamar Pinhas, Isana Vaksler, Noa Mussa, Sivan Yogev, Shay Zakov, Erez Katzenelson

4 Topics of interest in our group
Sequence and tree alignments and similarity
Indexing, searching and compression
Secondary structure prediction of RNA: folding and co-folding
microRNA-mRNA target prediction
Sequence/structure motifs involved in localization and post-transcriptional regulation
Post-transcriptional regulation: virus-host microRNA-mRNA behavior
Protein motif discovery (common signals within a family)
Algorithms on strings and trees
More...

5 Outline
1 About Michal's Group
2 Sequence Alignment with a Regular Expression Path Constraint
    Manhattan Tourist Problem
    Sequence Alignment
    Constraint Sequence Alignment
    SA-REPC definition
    Algorithm for the SA-REPC
    Complexity analysis
3 Applying SA-REPC to microRNA target prediction
    Background
    microRNAs
    Modifying the SA-REPC for microRNA target prediction
    Results
4 Summary

8 MTP: Manhattan Tourist Problem
[Figure: a weighted Manhattan grid with a source vertex s and a sink vertex t]
Imagine seeking a path from source to sink, going only eastward and southward, with the highest number of attractions on it. The attractions are marked by weights on the streets (edges) of a Manhattan grid.

9 Manhattan Tourist Problem: Formulation
Goal: Find the highest-scoring path in a weighted grid.
Input: A weighted grid G with two distinct vertices, one labeled source and the other labeled sink.
Output: A longest (highest-weight) path in G from source to sink.

12 MTP solution using Dynamic programming
Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the edge connecting the two.
The score for a point (i, j) is computed by the recurrence relation:
\[ S_{0,0} = 0 \]
\[ S_{i,j} = \max \begin{cases} S_{i-1,j} + \text{score of the edge between } (i-1,j) \text{ and } (i,j) \\ S_{i,j-1} + \text{score of the edge between } (i,j-1) \text{ and } (i,j) \end{cases} \]
Running time: for a grid of size n × m, evaluating the recurrence takes O(nm).
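The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function name `manhattan_tourist` and the two weight tables `down` and `right` are assumptions made here, not part of the slides.

```python
def manhattan_tourist(down, right):
    """Best score of a path from the top-left source to the bottom-right sink.

    down[i][j]  : weight of the edge from (i, j) to (i + 1, j)  (hypothetical layout)
    right[i][j] : weight of the edge from (i, j) to (i, j + 1)  (hypothetical layout)
    """
    n = len(down)        # number of southward steps
    m = len(right[0])    # number of eastward steps
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row have a single predecessor each.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + down[i - 1][0]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + right[0][j - 1]
    # Every other vertex takes the better of its northern and western predecessors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j] + down[i - 1][j],
                          S[i][j - 1] + right[i][j - 1])
    return S[n][m]


# A single grid square using the edge weights that appear in the slides' example.
print(manhattan_tourist(down=[[2, 0]], right=[[1], [3]]))  # 5
```

Each vertex is visited once and inspects at most two predecessors, which is where the O(nm) bound comes from.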

14 Example
[Figure: the weighted grid, with the vertex currently being computed marked by *]
\[ S_{1,0} = S_{0,0} + 2 = 2 \]
\[ S_{1,1} = \max(S_{0,1} + 0,\; S_{1,0} + 3) = \max(1 + 0,\; 2 + 3) = 5 \]

18 Extending the MTP problem
[Figure: the weighted grid from s to t, now with real-valued weights (for example 3.12) and diagonal edges]
Changing the scores to real numbers.
Adding diagonal movement (edges in the graph).
\[ S_{i,j} = \max \begin{cases} S_{i-1,j} + \text{score of the edge between } (i-1,j) \text{ and } (i,j) \\ S_{i,j-1} + \text{score of the edge between } (i,j-1) \text{ and } (i,j) \\ S_{i-1,j-1} + \text{score of the edge between } (i-1,j-1) \text{ and } (i,j) \end{cases} \]

23 Sequence alignment
Definition (Global sequence alignment problem)
Let S_1 and S_2 be two strings over an alphabet Σ, and let s be a scoring matrix.
A sequence alignment is obtained by inserting gaps into S_1 and S_2, so that the symbols can be placed in one-to-one correspondence with each other.
The optimal global sequence alignment is a sequence alignment that has the optimal sum of scores, according to s, over the pairs of symbols that correspond to each other in the alignment.

25 Sequence Alignment example
S_1 = AGCGCGUU
S_2 = GUCAGACG
The scoring matrix s assigns 1 to a match and -1 to a mismatch or an indel (space).
[Figure: an optimal alignment of S_1 and S_2]
An optimal alignment of S_1 and S_2 is scored -1.

26 Adding some sequences to the grid
We can extend the grid to represent an alignment between two sequences in the following way:
We create a grid with (|S_1| + 1) × (|S_2| + 1) vertices. The additional row / column is for the gap sign ( - ).
The scores on the edges leaving vertex (i, j) are as follows:
Horizontal edge, from column j to column j+1: s[-, S_2[j]]
Vertical edge, from row i to row i+1: s[S_1[i], -]
Diagonal edge, from (i, j) to (i+1, j+1): s[S_1[i], S_2[j]]
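With the edge weights defined this way, the highest-scoring path through the grid is the optimal global alignment. The following Python sketch illustrates that under the example scoring used in the slides (+1 for a match, -1 for anything else); the function name and parameter names are assumptions made for this illustration.

```python
def global_alignment_score(s1, s2, match=1, other=-1):
    """Score of an optimal global alignment, computed as a best path in the grid.

    Vertical and horizontal edges are gaps, diagonal edges align two symbols.
    """
    n, m = len(s1), len(s2)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: s1[:i] against gaps only
        S[i][0] = S[i - 1][0] + other
    for j in range(1, m + 1):          # first row: s2[:j] against gaps only
        S[0][j] = S[0][j - 1] + other
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = match if s1[i - 1] == s2[j - 1] else other
            S[i][j] = max(S[i - 1][j - 1] + diagonal,  # align s1[i-1] with s2[j-1]
                          S[i - 1][j] + other,         # s1[i-1] against a gap
                          S[i][j - 1] + other)         # s2[j-1] against a gap
    return S[n][m]


print(global_alignment_score("AGCGCGUU", "GUCAGACG"))  # -1, as stated in the slides
```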

28 [Figure: the alignment grid for S_1 = AGCGCGUU and S_2 = GUCAGACG, with source s and sink t]

30 The alignment table
S_1 = AGCGCGUU
S_2 = GUCAGACG
[Figure: the dynamic programming table S, with S_2 (preceded by the gap sign -) along the columns and S_1 along the rows, and the resulting alignment]

33 Constraint Sequence Alignment
Numerous studies suggest applying additional constraints to sequence alignment in order to improve speed or accuracy.
The additional constraints can reflect a priori knowledge about the alignment and therefore narrow the search space of the problem or guide the search towards a preferred solution.

38 Related Work
Position anchoring [Myers-96, Sammeth-03]: demands that the path pass through certain cells in the table.
Spaced seeds [Ma-02, Kucherov-05, Benson-06]: a constraint on the path in the form of a partial word. Partial words are alignments written with the letters 1 (match) and * (don't-care). For example: 11*11*.
Regular Expression Constraint Sequence Alignment (RECSA) [Arslan-05]: each string should satisfy a regular expression constraint.
SA-REPC: a constraint on the path in the form of a regular expression.

42 Preliminaries
An extended definition of sequence alignment with alignment-path constraints.
The constraint is in the form of a regular expression.
Example
S_1 = AGCGCGUU
S_2 = GUCAGACG
R is a regular expression over the symbols 1 (match) and 0 (everything else).
[Figure: an alignment of S_1 and S_2 that satisfies R]

43 Preliminaries - Alignment alphabet examples
\( \Sigma_r = \{1, 0\} \): 1 = match, 0 = any other pair.
Example: The letters A and A are mapped to 1. U and - are mapped to 0.

44 Preliminaries - Alignment alphabet examples
\( \Sigma_r = \{m, s, i, d\} \): m = match, s = substitution, i = insertion, d = deletion.
Example: The letters A and A are mapped to m. U and - are mapped to d. - and A are mapped to i.

45 Preliminaries - Alignment alphabet examples
\[ \Sigma_r = \left\{ \tbinom{\sigma_1}{\sigma_2} \;\middle|\; \sigma_1, \sigma_2 \in \Sigma \cup \{-\} \right\} \setminus \left\{ \tbinom{-}{-} \right\} \]
Each pair of letters is mapped to a distinct symbol in the alignment alphabet.
Example: The letters A and U are mapped to \( \tbinom{A}{U} \) in the alignment alphabet, and A and - are mapped to \( \tbinom{A}{-} \).

47 Preliminaries - Alignment alphabet examples
Because some \( \Sigma_r \) symbols can be mapped from different pairs of symbols in Σ, we need a mapping function f defined as:
\[ f : \Sigma \times \Sigma \to P(\Sigma_r) \]
Example: for \( \Sigma_r = \{0, 1\} \cup \left\{ \tbinom{\sigma_1}{\sigma_2} \mid \sigma_1, \sigma_2 \in \Sigma \cup \{-\} \right\} \setminus \left\{ \tbinom{-}{-} \right\} \):
\[ f(A, A) = \left\{ 1, \tbinom{A}{A} \right\}, \qquad f(A, U) = \left\{ 0, \tbinom{A}{U} \right\} \]
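As a small illustration of the mapping function, the sketch below returns, for an aligned pair, the set of alignment-alphabet symbols it may be read as. The names `GAP`, `f` and `edit_symbol`, and the choice to represent a pair symbol as a Python tuple, are assumptions made for this sketch only.

```python
GAP = "-"


def f(a, b):
    """Symbols of the combined alphabet {0, 1} plus pair symbols for the pair (a, b)."""
    if a == GAP and b == GAP:
        raise ValueError("a gap is never aligned with a gap")
    coarse = "1" if a == b else "0"   # 1 = match, 0 = everything else
    return {coarse, (a, b)}           # the tuple (a, b) stands for the pair symbol


def edit_symbol(a, b):
    """Symbol of the pair (a, b) in the alphabet {m, s, i, d}."""
    if a == GAP and b == GAP:
        raise ValueError("a gap is never aligned with a gap")
    if a == GAP:
        return "i"                    # insertion: gap in the first string
    if b == GAP:
        return "d"                    # deletion: gap in the second string
    return "m" if a == b else "s"     # match or substitution


print(f("A", "A") == {"1", ("A", "A")})   # True
print(edit_symbol("U", GAP))              # d
```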

50 Sequence Alignment with a Regular Expression Path Constraint
Definition (Global SA-REPC)
Let S_1 and S_2 be two strings over an alphabet Σ.
Let s be a scoring matrix over the alphabet Σ.
Let R be a regular expression over an alignment alphabet \( \Sigma_r \).
Find an alignment of S_1 and S_2 such that two conditions hold:
1 There exists an accepted region in the alignment belonging to L(R).
2 The overall score of the alignment, computed according to s, is optimal among all such alignments.

52 Sequence Alignment vs. SA-REPC
Example (input)
S_1 = AGCGCGUU
S_2 = GUCAGACG
s is a scoring matrix: match +1, all other -1.
R is a regular expression path constraint.
Example (Sequence Alignment)
[Figure: an optimal unconstrained alignment of S_1 and S_2]
Optimal alignment value = -1
Example (SA-REPC)
[Figure: an optimal alignment of S_1 and S_2 under the constraint R]
Optimal alignment value = -3

56 Modifications in the automaton
Regular expression R: 1 (1 0)1 2 0
Automaton A_R: [Figure: an automaton with states q_0 (start), q_1, q_2 and q_3, and transitions labeled 1 and 0]
Built automaton A: [Figure: A_R extended with the states q_init (start) and q_final, ε-transitions, and transitions labeled Σ]

61 Dynamic Programming solution
The DP solution
We calculate a dynamic programming table M.
Each cell M[i, j] holds |Q| entries.
Entry M[i, j](q) holds the optimal score of aligning S_1[1, i] with S_2[1, j] such that there is a run of A on the alignment that reaches q.
If no such alignment exists, then the value of the entry M[i, j](q) is null.
The answer is in M[n, m](q_final).

62 Dynamic programming recurrence formula
The recurrence formula for the problem is as follows:
1 Initialization:
\[ M[0,0](q) = \begin{cases} 0 & q = q_{init} \\ \text{null} & \text{otherwise} \end{cases} \]
2 All other cells:
\[ M[i,j](q) = \max \begin{cases} \max \{ M[i-1,j-1](p) + s[S_1[i], S_2[j]] \mid q \in \delta(p, f(S_1[i], S_2[j])) \} \\ \max \{ M[i-1,j](p) + s[S_1[i], -] \mid q \in \delta(p, f(S_1[i], -)) \} \\ \max \{ M[i,j-1](p) + s[-, S_2[j]] \mid q \in \delta(p, f(-, S_2[j])) \} \end{cases} \]
3 If i = 0 (or j = 0), the sets above corresponding to i - 1 (or to j - 1) are ignored.

63 Single cell calculation
The calculation of a single cell M[i, j] under the assumptions:
S_1[i] = S_2[j] = C
s[C, C] = 1
s[C, -] = s[-, C] = 0
A = [Figure: the built automaton, with states q_init (start), q_0, q_1, q_2, q_3 and q_final]

67 Dynamic programming recurrence formula
Examples for the entry M[i, j](q_1), one per direction of movement:
Diagonal: M[i, j](q_1) = M[i-1, j-1](q_0) + s[C, C] = M[i-1, j-1](q_0) + 1
Vertical: M[i, j](q_1) = M[i-1, j](q_0) + s[C, -] = M[i-1, j](q_0) + 0
Horizontal: M[i, j](q_1) = M[i, j-1](q_0) + s[-, C] = M[i, j-1](q_0) + 0
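The recurrence can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not taken from the slides: the automaton is given as an ε-free NFA (any ε-transitions are assumed to have been eliminated beforehand), the alignment alphabet is {1, 0}, the scoring is +1 for a match and -1 otherwise, and the hypothetical constraint `DELTA` requires the path to contain two consecutive matches somewhere. A missing dictionary key plays the role of the null entry.

```python
GAP = "-"


def score(a, b):
    """Assumed scoring matrix s: +1 for a match, -1 for anything else."""
    return 1 if a == b else -1


def f(a, b):
    """Assumed mapping of an aligned pair to the alignment alphabet {1, 0}."""
    return {"1"} if a == b else {"0"}


# Hypothetical epsilon-free NFA: "anything, then two consecutive matches, then anything".
DELTA = {
    ("init", "0"): {"init"},
    ("init", "1"): {"init", "one"},
    ("one", "1"): {"final"},
    ("final", "0"): {"final"},
    ("final", "1"): {"final"},
}


def sa_repc_score(s1, s2, delta=DELTA, q_init="init", q_final="final"):
    """Optimal score of an alignment whose path is accepted, or None if there is none."""
    n, m = len(s1), len(s2)
    # M[i][j] maps each reachable state q to the best score M[i, j](q).
    M = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    M[0][0][q_init] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            moves = []  # (previous row, previous column, symbol of s1, symbol of s2)
            if i > 0 and j > 0:
                moves.append((i - 1, j - 1, s1[i - 1], s2[j - 1]))
            if i > 0:
                moves.append((i - 1, j, s1[i - 1], GAP))
            if j > 0:
                moves.append((i, j - 1, GAP, s2[j - 1]))
            cell = M[i][j]
            for pi, pj, a, b in moves:
                for p, value in M[pi][pj].items():
                    for symbol in f(a, b):
                        for q in delta.get((p, symbol), ()):
                            candidate = value + score(a, b)
                            if candidate > cell.get(q, float("-inf")):
                                cell[q] = candidate
    return M[n][m].get(q_final)


print(sa_repc_score("AA", "AA"))  # 2
print(sa_repc_score("A", "A"))    # None: one match cannot satisfy the constraint
```

Every cell examines three predecessor cells and, for each, every state together with its outgoing transitions, so the work per cell depends on the size of the automaton rather than on the sequence lengths.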

69 Complexity analysis
We denote:
t = |Q|, the number of states in A
n, the length of S_1
m, the length of S_2
Method      Trace   Time (NFA)   Time (DFA)   Memory
naïve       yes     O(mnt^2)     O(mnt)       O(mnt)
naïve       no      O(mnt^2)     O(mnt)       O(min{m, n} t)
Hirschberg  yes     O(mnt^2)     O(mnt)       O(min{m, n} t)

71 The Cell

73 The central dogma

74 A short movie

75 mRNA regions

80 microRNAs
1 microRNAs are short sequences of RNA (approximately 22 bases).
2 They function as specific gene regulators. A cell's function at any given time is determined by the composition of proteins in it. microRNAs suppress the translation of RNA to protein (DNA is transcribed to RNA, and RNA is translated to protein).
3 They operate by binding to complementary sequences on their mRNA target (this interaction is called hybridization). Hybridization is chemical bonding of bases (also called base pairing): A:U, G:C, G:U.
4 The complex created by hybridization of the microRNA to its mRNA target is called a duplex.
Figure: picture from Lin et al

81 Another short movie

83 Hybridization and Sequence alignment
Hybridization of two sequences can be modeled within the standard sequence alignment framework.
The only difference is the scoring scheme.
In sequence alignment, a match is when both symbols are the same.
In hybridization, a match is when the two symbols are complementary.
The matching pairs are: A:U, G:C and G:U.
Example
[Figure: a hybridization of two RNA sequences]
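The change of scoring scheme can be illustrated with a short Python sketch. The score values (+1 for a complementary pair, -1 otherwise) and the names `PAIRS` and `hybridization_score` are assumptions for illustration; only the list of matching pairs comes from the slides.

```python
# Complementary pairs named in the slides, listed in both orientations.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def hybridization_score(a, b, match=1, other=-1):
    """Score a pair of symbols: a 'match' is a complementary pair, not an identical one."""
    return match if (a, b) in PAIRS else other


print(hybridization_score("A", "U"))  # 1: complementary bases
print(hybridization_score("A", "A"))  # -1: identical bases do not pair
```

Such a function could replace the identity test in a standard alignment routine without changing the dynamic programming itself.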

84 Duplex properties
Different properties of the microRNA-target duplex have been observed, some of which serve as a basis for current microRNA target prediction algorithms.

88 Duplex properties
1 5'-end dominant seed: Several studies suggest the existence of a region of 6-8 nucleotides at the 5'-end of the microRNA (the "seed"). The 5' end of the seed is unpaired or starts with U, and the seed doesn't contain wobble pairs (G:U). [Figure: a duplex with the 5' seed marked]
2 3'-end compensatory seed: There is significant evidence that a 3'-end seed of the microRNA can compensate for a non-perfect 5'-seed. [Figure: a duplex with the 3' seed marked]
3 Multiplicity: microRNAs have been shown to be capable of functioning in a collaborative manner. There are two types of multiplicity. [Figure: the two types, one showing a single microRNA and a target, the other showing microRNA1, microRNA2 and a target]
4 Accessibility and Thermodynamics: The thermodynamics and accessibility of the duplex and its surrounding area are very important properties.

90 Using the current dogma on duplexes

91 Utilizing SA-REPC for microRNA target prediction
Some properties of the duplex can be written as a regular expression constraint.
5'-end dominant seed: ( i A G A A A C ) WCB 5 7 ii (WCB) 6 Where: WCB =
3'-end compensatory seed: ( G C C G A U U A ) s 0 2
Inner bulge of the duplex: ( i 1 4 d 1 6)? ( 11 + ( i 1 4 d 1 6))
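To make the idea of a duplex property written as a regular expression concrete, the sketch below encodes an alignment path as a string and tests it with Python's `re` module. Everything here is an illustrative assumption rather than the authors' constraint: the encoding letters (W, i, d, x), the reading of WCB as the Watson-Crick pairs G:C, C:G, A:U and U:A, and the pattern `W{6}`, which asks for six consecutive Watson-Crick pairs.

```python
import re

# Assumed meaning of WCB: Watson-Crick base pairs (wobble G:U excluded).
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

# Hypothetical seed constraint: six consecutive Watson-Crick pairs.
SEED = re.compile(r"W{6}")


def encode(pairs):
    """Encode aligned pairs as text: W = Watson-Crick pair, i/d = gap, x = anything else."""
    symbols = []
    for a, b in pairs:
        if a == "-":
            symbols.append("i")
        elif b == "-":
            symbols.append("d")
        elif (a, b) in WATSON_CRICK:
            symbols.append("W")
        else:
            symbols.append("x")
    return "".join(symbols)


def has_seed(pairs):
    """True if some region of the alignment path matches the seed pattern."""
    return SEED.search(encode(pairs)) is not None


print(has_seed(list(zip("GCAUGC", "CGUACG"))))  # True: six Watson-Crick pairs
print(has_seed(list(zip("GCGUGC", "CGUACG"))))  # False: the G:U wobble breaks the run
```

This only checks a finished alignment; SA-REPC instead builds the constraint into the dynamic programming, so that the optimum is taken over accepted paths only.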

95 More complex duplex properties
Thermodynamics and accessibility of the duplex site and its surroundings are more complex properties.
Thermodynamics: microRNA-target hybridization tends to have low free energy.
Accessibility: Target site accessibility plays an important role in the formation of the duplex.
Both properties are computational bottlenecks.
The complexity of such computations ranges from O(nm^2) [Stadler-06] (with restrictions) up to O(nm^5) [Hofacker-08].
We suggest using our method as an initial filter for target prediction tools that rely on energy computation.

97 Target prediction
Implementation
The tool is implemented in a Java package named calign.
A web version is available at: negevcb/calign
Our data set
99 microRNAs
UTRs of human genes (2183 transcripts)
873 verified duplexes from miRecords

98 Comparative Results

99 Results
Columns: Tool, # of predicted pairs, # of true positives, Sensitivity
miRanda 22, %
PITA 28, %
RNAhybrid 43, %
calign 43, %
Table: Results on all 63,360 pairs

101 Conclusions
Conclusions
Extended sequence alignment to support a path constraint (SA-REPC).
Presented an application for our algorithm.
Implemented the algorithm (calign).
Showed preliminary comparative results.
Future work
Find more properties of duplexes that can be used in SA-REPC.
Find more applications for SA-REPC.
SA-REPC may be extended to more general language classes, such as grammars.
An interesting open problem is whether techniques previously used to obtain sub-quadratic sequence alignment, such as Four Russians and acceleration by compression, can be applied here.

102 Acknowledgements
Special thanks
Tamar Pinhas, co-author
Dr. Michal Ziv-Ukelson, my advisor
The rest of Michal's group at BGU: Erez Katznelson, Isana Vaksler, Sivan Yogev, Shay Zakov, Noa Mussa


More information

Dynamic Programming: Edit Distance

Dynamic Programming: Edit Distance Dynamic Programming: Edit Distance Bioinformatics: Issues and Algorithms SE 308-408 Fall 2007 Lecture 10 Lopresti Fall 2007 Lecture 10-1 - Outline Setting the Stage DNA Sequence omparison: First Successes

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable RNA STRUCTURE RNA Basics RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U wobble pairing Bases can only pair with one other base. 23 Hydrogen Bonds more stable RNA Basics transfer RNA (trna) messenger

More information

UNIT 5. Protein Synthesis 11/22/16

UNIT 5. Protein Synthesis 11/22/16 UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA

More information

11.3 Decoding Algorithm

11.3 Decoding Algorithm 11.3 Decoding Algorithm 393 For convenience, we have introduced π 0 and π n+1 as the fictitious initial and terminal states begin and end. This model defines the probability P(x π) for a given sequence

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps

More information

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r Syllabus R9 Regulation UNIT-II NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: In the automata theory, a nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite

More information

Tandem Mass Spectrometry: Generating function, alignment and assembly

Tandem Mass Spectrometry: Generating function, alignment and assembly Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Types of RNA Messenger RNA (mrna) makes a copy of DNA, carries instructions for making proteins,

More information

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University COMP 598 Advanced Computational Biology Methods & Research Introduction Jérôme Waldispühl School of Computer Science McGill University General informations (1) Office hours: by appointment Office: TR3018

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

Lesson Overview. Ribosomes and Protein Synthesis 13.2

Lesson Overview. Ribosomes and Protein Synthesis 13.2 13.2 The Genetic Code The first step in decoding genetic messages is to transcribe a nucleotide base sequence from DNA to mrna. This transcribed information contains a code for making proteins. The Genetic

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Regular expression constrained sequence alignment revisited

Regular expression constrained sequence alignment revisited Regular expression constrained sequence alignment revisited Gregory Kucherov, Tamar Pinhas, Michal Ziv-Ukelson To cite this version: Gregory Kucherov, Tamar Pinhas, Michal Ziv-Ukelson. Regular expression

More information

Properties of Context-Free Languages

Properties of Context-Free Languages Properties of Context-Free Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Structure-Based Comparison of Biomolecules

Structure-Based Comparison of Biomolecules Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein

More information

Chapter 6.2. p

Chapter 6.2. p Chapter 6.2 p. 148-155 Day M T W Th F Question Name Period Weekly Lifeline Using the following template: GTACTTATCGT what is the complementary strand of DNA? B_ Check KICK-OFF LEARNING LOG KICK-OFF Response

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

Closure under the Regular Operations

Closure under the Regular Operations September 7, 2013 Application of NFA Now we use the NFA to show that collection of regular languages is closed under regular operations union, concatenation, and star Earlier we have shown this closure

More information

Translation Part 2 of Protein Synthesis

Translation Part 2 of Protein Synthesis Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation

More information

Two Algorithms for LCS Consecutive Suffix Alignment

Two Algorithms for LCS Consecutive Suffix Alignment Two Algorithms for LCS Consecutive Suffix Alignment Gad M. Landau Eugene Myers Michal Ziv-Ukelson Abstract The problem of aligning two sequences A and to determine their similarity is one of the fundamental

More information

Combinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming

Combinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming ombinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming Matthew Macauley Department of Mathematical Sciences lemson niversity http://www.math.clemson.edu/~macaule/ Math

More information

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

More information

Finite Automata. Seungjin Choi

Finite Automata. Seungjin Choi Finite Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 28 Outline

More information

13 Comparative RNA analysis

13 Comparative RNA analysis 13 Comparative RNA analysis Sources for this lecture: R. Durbin, S. Eddy, A. Krogh und G. Mitchison, Biological sequence analysis, Cambridge, 1998 D.W. Mount. Bioinformatics: Sequences and Genome analysis,

More information

Sparse RNA Folding: Time and Space Efficient Algorithms

Sparse RNA Folding: Time and Space Efficient Algorithms Sparse RNA Folding: Time and Space Efficient Algorithms Rolf Backofen 1, Dekel Tsur 2, Shay Zakov 2, and Michal Ziv-Ukelson 2 1 Albert Ludwigs University, Freiburg, Germany backofen@informatik.uni-freiburg.de

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

RNA secondary structure prediction. Farhat Habib

RNA secondary structure prediction. Farhat Habib RNA secondary structure prediction Farhat Habib RNA RNA is similar to DNA chemically. It is usually only a single strand. T(hyamine) is replaced by U(racil) Some forms of RNA can form secondary structures

More information

Simulation of Gene Regulatory Networks

Simulation of Gene Regulatory Networks Simulation of Gene Regulatory Networks Overview I have been assisting Professor Jacques Cohen at Brandeis University to explore and compare the the many available representations and interpretations of

More information

Efficient Algorithms forregular Expression Constrained Sequence Alignment p. 1/35

Efficient Algorithms forregular Expression Constrained Sequence Alignment p. 1/35 Efficient Algorithms for Regular Expression Constrained Sequence Alignment Yun-Sheng Chung, Chin Lung Lu, and Chuan Yi Tang Department of Computer Science National Tsing Hua University, Taiwan Department

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

Lecture 7: Simple genetic circuits I

Lecture 7: Simple genetic circuits I Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

Non-context-Free Languages. CS215, Lecture 5 c

Non-context-Free Languages. CS215, Lecture 5 c Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Lecture 4 Nondeterministic Finite Accepters

Lecture 4 Nondeterministic Finite Accepters Lecture 4 Nondeterministic Finite Accepters COT 4420 Theory of Computation Section 2.2, 2.3 Nondeterminism A nondeterministic finite automaton can go to several states at once. Transitions from one state

More information

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming 20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment

More information

NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses

NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses Teacher Page 1 NUMB3RS Activity: DNA Sequence Alignment Topic: Biomathematics DNA sequence alignment Grade Level: 10-12 Objective: Use mathematics to compare two strings of DNA Prerequisite: very basic

More information

Java II Finite Automata I

Java II Finite Automata I Java II Finite Automata I Bernd Kiefer Bernd.Kiefer@dfki.de Deutsches Forschungszentrum für künstliche Intelligenz November, 23 Processing Regular Expressions We already learned about Java s regular expression

More information

Pair Hidden Markov Models

Pair Hidden Markov Models Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]

More information

Analysis and Design of Algorithms Dynamic Programming

Analysis and Design of Algorithms Dynamic Programming Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

Before we show how languages can be proven not regular, first, how would we show a language is regular?

Before we show how languages can be proven not regular, first, how would we show a language is regular? CS35 Proving Languages not to be Regular Before we show how languages can be proven not regular, first, how would we show a language is regular? Although regular languages and automata are quite powerful

More information

Hidden Markov Models 1

Hidden Markov Models 1 Hidden Markov Models Dinucleotide Frequency Consider all 2-mers in a sequence {AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT} Given 4 nucleotides: each with a probability of occurrence of. 4 Thus, one

More information

Finite Automata. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Finite Automata. Wen-Guey Tzeng Computer Science Department National Chiao Tung University Finite Automata Wen-Guey Tzeng Computer Science Department National Chiao Tung University Syllabus Deterministic finite acceptor Nondeterministic finite acceptor Equivalence of DFA and NFA Reduction of

More information

Chapter 2: Finite Automata

Chapter 2: Finite Automata Chapter 2: Finite Automata Peter Cappello Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 cappello@cs.ucsb.edu Please read the corresponding chapter before

More information

Tobias Markus. January 21, 2015

Tobias Markus. January 21, 2015 Automata Advanced Seminar Computer Engineering January 21, 2015 (Advanced Seminar Computer Engineering ) Automata January 21, 2015 1 / 35 1 2 3 4 5 6 obias Markus (Advanced Seminar Computer Engineering

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

More Dynamic Programming

More Dynamic Programming CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider

More information

Network motifs in the transcriptional regulation network (of Escherichia coli):

Network motifs in the transcriptional regulation network (of Escherichia coli): Network motifs in the transcriptional regulation network (of Escherichia coli): Janne.Ravantti@Helsinki.Fi (disclaimer: IANASB) Contents: Transcription Networks (aka. The Very Boring Biology Part ) Network

More information

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

More Dynamic Programming

More Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?

More information

From Gene to Protein

From Gene to Protein From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed

More information

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013 CMPSCI 250: Introduction to Computation Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013 λ-nfa s to NFA s to DFA s Reviewing the Three Models and Kleene s Theorem The Subset

More information

Implementing Approximate Regularities

Implementing Approximate Regularities Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National

More information

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF) CS5371 Theory of Computation Lecture 7: Automata Theory V (CFG, CFL, CNF) Announcement Homework 2 will be given soon (before Tue) Due date: Oct 31 (Tue), before class Midterm: Nov 3, (Fri), first hour

More information

A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA

A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA Aviral Takkar Computer Engineering Department, Delhi Technological University( Formerly Delhi College of Engineering), Shahbad Daulatpur, Main Bawana Road,

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Bio nformatics. Lecture 3. Saad Mneimneh

Bio nformatics. Lecture 3. Saad Mneimneh Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per

More information

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression 13.4 Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium

More information

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Name: SBI 4U. Gene Expression Quiz. Overall Expectation: Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):

More information

September 7, Formal Definition of a Nondeterministic Finite Automaton

September 7, Formal Definition of a Nondeterministic Finite Automaton Formal Definition of a Nondeterministic Finite Automaton September 7, 2014 A comment first The formal definition of an NFA is similar to that of a DFA. Both have states, an alphabet, transition function,

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

A faster algorithm for RNA co-folding

A faster algorithm for RNA co-folding A faster algorithm for RNA co-folding Michal Ziv-Ukelson 1, Irit Gat-Viks 2, Ydo Wexler 3, and Ron Shamir 4 1 Computer Science Department, Ben Gurion University of the Negev, Beer-Sheva. 2 Computational

More information

Introduction to Sequence Alignment. Manpreet S. Katari

Introduction to Sequence Alignment. Manpreet S. Katari Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler

More information

Lecture 5,6 Local sequence alignment

Lecture 5,6 Local sequence alignment Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution

More information

CS 154 Formal Languages and Computability Assignment #2 Solutions

CS 154 Formal Languages and Computability Assignment #2 Solutions CS 154 Formal Languages and Computability Assignment #2 Solutions Department of Computer Science San Jose State University Spring 2016 Instructor: Ron Mak www.cs.sjsu.edu/~mak Assignment #2: Question 1

More information

A Structure-Based Flexible Search Method for Motifs in RNA

A Structure-Based Flexible Search Method for Motifs in RNA JOURNAL OF COMPUTATIONAL BIOLOGY Volume 14, Number 7, 2007 Mary Ann Liebert, Inc. Pp. 908 926 DOI: 10.1089/cmb.2007.0061 A Structure-Based Flexible Search Method for Motifs in RNA ISANA VEKSLER-LUBLINSKY,

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

Theory Bridge Exam Example Questions

Theory Bridge Exam Example Questions Theory Bridge Exam Example Questions Annotated version with some (sometimes rather sketchy) answers and notes. This is a collection of sample theory bridge exam questions. This is just to get some idea

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

BME 5742 Biosystems Modeling and Control

BME 5742 Biosystems Modeling and Control BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

More information

CSCI Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm

CSCI Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm CSCI 1760 - Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm Shay Mozes Brown University shay@cs.brown.edu Abstract. This report describes parallel Java implementations of

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

Comment: The induction is always on some parameter, and the basis case is always an integer or set of integers.

Comment: The induction is always on some parameter, and the basis case is always an integer or set of integers. 1. For each of the following statements indicate whether it is true or false. For the false ones (if any), provide a counter example. For the true ones (if any) give a proof outline. (a) Union of two non-regular

More information

Mathematics for linguists

Mathematics for linguists 1/13 Mathematics for linguists Gerhard Jäger gerhard.jaeger@uni-tuebingen.de Uni Tübingen, WS 2009/2010 November 26, 2009 2/13 The pumping lemma Let L be an infinite regular language over a finite alphabete

More information

Midterm 2 for CS 170

Midterm 2 for CS 170 UC Berkeley CS 170 Midterm 2 Lecturer: Gene Myers November 9 Midterm 2 for CS 170 Print your name:, (last) (first) Sign your name: Write your section number (e.g. 101): Write your sid: One page of notes

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Combinatorial approaches to RNA folding Part I: Basics

Combinatorial approaches to RNA folding Part I: Basics Combinatorial approaches to RNA folding Part I: Basics Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4500, Spring 2015 M. Macauley (Clemson)

More information