Dynamic Programming: Edit Distance

Size: px
Start display at page:

Download "Dynamic Programming: Edit Distance"

Transcription

1 Dynamic Programming: Edit Distance Bioinformatics: Issues and Algorithms SE Fall 2007 Lecture 10 Lopresti Fall 2007 Lecture

2 Outline Setting the Stage DNA Sequence omparison: First Successes he hange Problem he Manhattan ourist Problem he Longest ommon Subsequence Problem Sequence Alignment he Edit Distance Problem Lopresti Fall 2007 Lecture

3 Sequence comparison and alignment onsider: AAA AAA versus AA AAA hat the two DNA fragments on the left somehow seem more similar than the two on the right could be significant. Question: how can we measure sequence similarity? Lopresti Fall 2007 Lecture

4 Motivation Why is this important? iven a new DNA sequence, one of the first things a biologist will want to do is search databases of known sequences to see if anyone has recorded something similar. (As we've seen, genetic sequences are long and the databases are enormous, so efficiency will be an issue.) Sequence similarity can provide clues about function. Similarity can provide clues about evolutionary relationships. Many other problems from computational biology incorporate some notion of sequence similarity as a basic premise. Lopresti Fall 2007 Lecture

5 Sequence concepts A sequence is a linear string of symbols over a finite alphabet. (Note: string and sequence are often used synonymously.) Some basic sequence concepts: length empty string subsequence supersequence number of symbols in s (written s ). sequence of length 0 (written ε). sequence that can be obtained from s by removing some symbols (A is a subsequence of A). if t is a subsequence of s, then s is a supersequence of t. Lopresti Fall 2007 Lecture

6 Sequence concepts More basic sequence concepts: substring sequence of consecutive symbols appearing in s (A is not a substring of A, but is). (Observation: every substring is a subsequence of the string in question, but not vice versa.) superstring interval if t is a substring of s, then s is a superstring of t. set of consecutive indices of a string, e.g., [2..4] refers to 2nd through 4th symbols of string s, or s[2]s[3]s[4]. Lopresti Fall 2007 Lecture

7 Sequence concepts And lastly: prefix substring of s of the form s[1..j] where 0 j s (when j = 0, prefix is the empty string). suffix substring of s of the form s[i.. s ] where 1 i s + 1 (when i = s + 1, suffix is the empty string). A is a prefix (and substring and subsequence) of AA. A is a suffix (and substring and subsequence) of AA. Lopresti Fall 2007 Lecture

8 Sequences he concept of a sequence is extremely broad. In this course, we are concerned with genetic sequences. However, there are other important kinds of sequence data: ASII text, speech, handwriting (pen-strokes). Likewise, the same algorithmic techniques turn up again and again, often under different names: approximate string matching, edit (or evolutionary) distance, dynamic time warping. Lopresti Fall 2007 Lecture

9 Sequence comparison: a false start An obvious idea that comes to mind is to line up each symbol and count the number that don't match: A A A A A = 2 As we know, this is Hamming distance and forms the basis for most error correcting codes, as well as the motif-finding problem we saw earlier. But it doesn't work for the kinds of sequences we care about:? = 6 Just one missing symbol at the start of the second sequence leads to a large distance. Lopresti Fall 2007 Lecture

10 Sequence manipulation at the genetic level enomes aren't static sequence comparison must account for this. Lopresti Fall 2007 Lecture

11 he human factor In addition, errors can arise during the sequencing process: "...the error rate is generally less than 1% over the first 650 bases and then rises significantly over the remaining sequence. A hard-to-read gel (arrow marks location where bands of similar intensity appear in two different lanes): A A? A A? A A? Lopresti Fall 2007 Lecture

12 Sequence alignment As we have seen, the two sequences we wish to compare may have different lengths. As a result, we need to allow for deletions and insertions. he notion of an alignment helps us visualize this: A A? - Alignment permits us to incorporate spaces (represented by a dash) in one or both sequences to make them the same length. Lopresti Fall 2007 Lecture

13 Lopresti Fall 2007 Lecture Sequence alignment No this one has 5! How can we find the best alignment? A his alignment has 6 mismatches. Is it the best possible? A

14 DNA sequence comparison First successes: Finding sequence similarities with genes of known function is a common approach to infer function of a newlysequenced gene. In 1984, Russell Doolittle and colleagues found similarities between cancer-causing gene and normal growth factor (PDF) gene. Lopresti Fall 2007 Lecture

15 ystic Fibrosis ystic fibrosis (F) is chronic and often fatal genetic disease of body's mucus glands (abnormally high level of mucus). Primarily affects respiratory systems in children. Mucus is slimy material that coats many epithelial surfaces and is secreted into fluids such as saliva. Inheritance of ystic Fibrosis: In early 1980's, biologists hypothesized that F is an autosomal recessive disorder caused by mutations in a gene that remained unknown until Heterozygous carriers are asymptomatic. Must be homozygously recessive in this gene in order to be diagnosed with F. Lopresti Fall 2007 Lecture

16 ystic Fibrosis: finding the gene Lopresti Fall 2007 Lecture

17 ystic Fibrosis Finding Similarities between ystic Fibrosis ene and AP binding proteins: AP binding proteins are present on cell membrane and act as transport channel. In 1989, biologists found similarity between cystic fibrosis gene and AP binding proteins. A plausible function for cystic fibrosis gene, given that F involves sweet secretion with abnormally high sodium level. Lopresti Fall 2007 Lecture

18 ystic Fibrosis: mutation analysis If a high percentage of cystic fibrosis (F) patients have a certain mutation in the gene, while unaffected patients don t, that could be an indicator of a mutation that is related to F. Indeed, a certain mutation was found in 70% of F patients; this is convincing evidence of a predominant genetic diagnostics marker for F. Lopresti Fall 2007 Lecture

19 ystic Fibrosis and the FR protein FR (ystic Fibrosis ransmembrane conductance Regulator) protein acts in cell membrane of epithelial cells that secrete mucus. hese cells line airways of nose, lungs, stomach wall, etc. FR protein (1480 amino acids) regulates a chloride ion channel: adjusts wateriness of fluids secreted by cell. F sufferers are missing a single amino acid in their FR. Mucus ends up being too thick, affecting many organs. Lopresti Fall 2007 Lecture

20 Bring in the Bioinformaticians ene similarities between two genes with known and unknown function alert biologists to some possibilities. omputing a similarity score between two genes tells how likely it is that they have similar functions. Dynamic programming is a technique for revealing similarities between genes. he hange Problem is a good problem to introduce idea of dynamic programming. Lopresti Fall 2007 Lecture

21 he hange Problem You hand cashier $20 for a $19.23 bill. Your change could be: 3 quarters + 2 pennies 77 pennies etc. he hange Problem. onvert some amount of money M into given denominations, using fewest possible number of coins. Input: An amount of money M, and an array of d denominations, c = (c 1, c 2,, c d ), with (c 1 > c 2 > > c d ). Output: A list of d integers, i 1, i 2,, i d, such that c 1 i 1 + c 2 i c d i d = M and i 1 + i i d is minimal. Lopresti Fall 2007 Lecture

22 he hange Problem: an example iven denominations 1, 3, and 5, what is minimum number of coins needed to make change for a given value? Value Min # of coins Only one coin needed to make change for values 1, 3, and 5. Lopresti Fall 2007 Lecture

23 he hange Problem: an example iven denominations 1, 3, and 5, what is minimum number of coins needed to make change for a given value? Value Min # of coins However, two coins needed to make change for values 2, 4, 6, 8, and 10. Lopresti Fall 2007 Lecture

24 he hange Problem: an example iven denominations 1, 3, and 5, what is minimum number of coins needed to make change for a given value? Value Min # of coins Lastly, three coins needed to make change for values 7 and 9. Lopresti Fall 2007 Lecture

25 he hange Problem: recurrence his example is expressed by the following recurrence relation: minnumoins(m) = min of minnumoins(m-1) + 1 minnumoins(m-3) + 1 minnumoins(m-5) + 1 iven denominations c 1, c 2,, c d, recurrence relation is: minnumoins(m) = min of minnumoins(m-c 1 ) + 1 minnumoins(m-c 2 ) + 1 minnumoins(m-c d ) + 1 Lopresti Fall 2007 Lecture

26 he hange Problem: a recursive algorithm Recursivehange(M, c, d) if M = 0 return 0 bestnumoins infinity for i 1 to d if M c i numoins Recursivehange(M c i, c, d) if numoins + 1 < bestnumoins bestnumoins numoins + 1 return bestnumoins Lopresti Fall 2007 Lecture

27 he hange Problem: a recursive algorithm he Recursivehange algorithm works, but it is not efficient: It recalculates optimal coin combination for a given amount of money repeatedly. E.g., if M = 77 and c = (1,3,7), then optimal combination of coins for 70 cents is computed on 9 different occasions! Lopresti Fall 2007 Lecture

28 he hange Problem: we can do better Rather than re-compute a given value more than once: Save results of each computation for 0 to M. his way, we can do a reference call to find an already computed value, instead of re-computing each time. Running time is M x d, where M is the value of money and d is the number of denominations. Lopresti Fall 2007 Lecture

29 he hange Problem: Dynamic Programming DPhange(M, c, d) bestnumoins 0 0 for m 1 to M bestnumoins m infinity for i 1 to d if m c i if bestnumoins m ci + 1 < bestnumoins m return bestnumoins M bestnumoins m bestnumoins m ci + 1 Lopresti Fall 2007 Lecture

30 DPhange example c = (1,3,7) M = 9 Lopresti Fall 2007 Lecture

31 he Manhattan ourist Problem (MP) Imagine seeking a path (from source to sink) to travel (only eastward and southward) with the most number of attractions (*) in the Manhattan grid. Source * * * * * * * * * * * * Sink Lopresti Fall 2007 Lecture

32 he Manhattan ourist Problem (MP) Imagine seeking a path (from source to sink) to travel (only eastward and southward) with the most number of attractions (*) in the Manhattan grid. Source * * * * * * * * * * * * Sink Lopresti Fall 2007 Lecture

33 he Manhattan ourist Problem (MP) he Manhattan ourist Problem. Find the longest path in a weighted grid. Input: A weighted grid with two distinct vertices, one labeled source and the other labeled sink. Output: A longest path in from source to sink. Lopresti Fall 2007 Lecture

34 MP: an example source j coordinate i coordinate sink Lopresti Fall 2007 Lecture

35 MP: greedy is not optimal source promising start, but leads to bad choices! sink Lopresti Fall 2007 Lecture

36 MP: a simple recursive program M(n, m) if n = 0 and m = 0 return 0 if n > 0 x M(n-1, m) + length of edge from (n-1, m) to (n, m) else x 0 if m > 0 y M(n, m-1) + length of edge from (n, m-1) to (n, m) else y 0 return max(x, y) What's wrong with this approach? Lopresti Fall 2007 Lecture

37 MP: Dynamic Programming source j 0 1 i S 0,1 = S 1,0 = 5 alculate optimal path score for each vertex in graph. Each vertex s score is maximum of prior vertices' scores plus weight of respective edges in between. Lopresti Fall 2007 Lecture

38 MP: Dynamic Programming (continued) source j i S 0,2 = S 1,1 = S 2,0 = 8 Lopresti Fall 2007 Lecture

39 MP: Dynamic Programming (continued) source j i S 3,0 = S 1,2 = S 3,0 = 8 9 S 2,1 = 9 Lopresti Fall 2007 Lecture

40 MP: Dynamic Programming (continued) source j i reedy algorithm fails! S 1,3 = 8 12 S 2,2 = S 3,1 = 9 Lopresti Fall 2007 Lecture

41 MP: Dynamic Programming (continued) source j i Done! S3,3 = 16 Lopresti Fall 2007 Lecture

42 MP: recurrence omputing score for a point (i, j) by recurrence relation: s i, j = max s i-1, j + weight of the edge between (i-1, j) and (i, j) s i, j-1 + weight of the edge between (i, j-1) and (i, j) he running time is n x m for a n-by-m grid, where n is the number of rows and m is the number of columns. Lopresti Fall 2007 Lecture

43 But Manhattan is not a perfect grid... A 2 A 3 What about diagonals? A 1 B Score at point B is given by: s B = max of s A1 + weight of the edge (A 1, B) s A2 + weight of the edge (A 2, B) s A3 + weight of the edge (A 3, B) Lopresti Fall 2007 Lecture

44 But Manhattan is not a perfect grid... omputing score s x for point x is given by recurrence relation: s x = max s y + weight of edge (y, x) where of y Predecessors(x) Predecessors(x) = set of vertices that have edges leading to x Running time for a graph (V, E) is O(E) since each edge is evaluated once Note: V is set of all vertices and E is set of all edges. Lopresti Fall 2007 Lecture

45 Alignment: two-row representation iven 2 DNA sequences v and w: v : w : A A A A m = 7 n = 6 Alignment : 2 * k matrix ( k > m, n ) letters of v letters of w A A A - A - 4 matches 2 insertions 3 deletions Lopresti Fall 2007 Lecture

46 Aligning DNA sequences v = AA w = AA n = 8 m = matches mismatches deletions insertions match mismatch V A A W A A indels insertion deletion Lopresti Fall 2007 Lecture

47 he Longest ommon Subsequence Problem (LS) he Longest ommon Subsequence Problem. Find the longest common subsequence of two sequences. Input: wo sequences: v = v 1 v 2 v m and w = w 1 w 2 w n Output: A sequence of positions in v : 1 i 1 < i 2 < < i t m and a sequence of positions in w : 1 j 1 < j 2 < < j t n such that symbol i k of v matches j k of w and t is maximal. Lopresti Fall 2007 Lecture

48 LS example i coords: elements of v elements of w j coords: 0 A - - A - A - A (0,0) (1,0) (2,1) (2,2) (3,3) (3,4) (4,5) (5,5) (6,6) (7,6) (8,7) Matches shown in red positions in v: positions in w: 2 < 3 < 4 < 6 < 8 1 < 3 < 5 < 6 < 7 Every common subsequence is a path in 2-D grid. Lopresti Fall 2007 Lecture

49 LS Dynamic Programming Find LS of two strings. Input: a weighted graph with two distinct vertices, one labeled source and one labeled sink. Output: a longest path in from source to sink. Solve using an MP graph with diagonals replaced by +1 edges. Lopresti Fall 2007 Lecture

50 LS Problem as Manhattan ourist Problem i 0 j A A A A Lopresti Fall 2007 Lecture

51 MP raph for LS Problem i 0 j A A A A Lopresti Fall 2007 Lecture

52 MP raph for LS Problem j A A i Every path is common subsequence. A A Diagonal edge adds extra element to subsequence. 7 LS Problem: find path with maximum diagonal edges. Lopresti Fall 2007 Lecture

53 omputing LS Let v i = prefix of v of length i : v 1 v i and w j = prefix of w of length j : w 1 w j he length of LS(v i, w j ) is computed by: s i-1, j s i, j = max s i, j-1 s i-1, j if v i = w j i-1,j -1 i,j -1 i-1,j i,j Lopresti Fall 2007 Lecture

54 LS Alignments Every path in grid corresponds to an alignment: W A V A V = A - W = A Lopresti Fall 2007 Lecture

55 Edit distance In 1966, Levenshtein introduced edit distance between two strings as minimum number of elementary operations (insertions, deletions, and substitutions) needed to transform one string into the other. d(v, w) = MIN number of elementary operations needed to transform v w. Keep in mind that LS computation uses a MAX. Lopresti Fall 2007 Lecture

56 he Edit Distance Problem he Edit Distance Problem. iven two sequences, find the optimal series of deletions, insertions, and substitutions to transform one into the other. Input: wo sequences: v = v 1 v 2 v m and w = w 1 w 2 w n Output: An optimal series of basic editing operations: e 1, e 2,..., e t such then, when applied to one of the sequences, say v, it is transformed into the other sequence, w. Here optimal can mean any of a number of things, including fewest or lowest- / highest-cost. Lopresti Fall 2007 Lecture

57 Edit distance vs. Hamming distance Hamming distance always compares i -th letter of v with i -th letter of w v = AAAA w = AAAA Hamming distance: d(v, w) = 8 Just one shift Make it all line up Edit distance may compare i -th letter of v with j -th letter of w v = -AAAA w = AAAA Edit distance: d(v, w) = 2 omputing Hamming distance is trivial task. omputing edit distance is non-trivial task. Lopresti Fall 2007 Lecture

58 Edit distance vs. Hamming distance Hamming distance always compares i -th letter of v with i -th letter of w v = AAAA w = AAAA Hamming distance: d(v, w) = 8 Just one shift Make it all line up Edit distance may compare i -th letter of v with j -th letter of w v = -AAAA w = AAAA- Edit distance: d(v, w) = 2 How to find which j goes with which i??? One insertion and one deletion. Lopresti Fall 2007 Lecture

59 Edit distance example AA AA in 5 steps: AA (delete last ) AA (delete last A) A (insert A at front) AA (substitute for 3 rd ) AA (insert before last A) AA (Done) Lopresti Fall 2007 Lecture

60 Edit distance example AA AA in 5 steps: AA (delete last ) AA (delete last A) A (insert A at front) AA (substitute for 3 rd ) AA (insert before last A) AA (Done) What is the edit distance? 5? Lopresti Fall 2007 Lecture

61 Edit distance example AA AA in 4 steps: AA (insert A at front) AAA (delete 8 th ) AAA (substitute for 5 th A) AA (substitute for 3 rd ) AA (Done) Lopresti Fall 2007 Lecture

62 Edit distance example AA AA in 4 steps: AA (insert A at front) AAA (delete 8 th ) AAA (substitute for 5 th A) AA (substitute for 3 rd ) AA (Done) an it be done in 3 steps??? Lopresti Fall 2007 Lecture

63 Alignment grid Every alignment path is from source to sink. Lopresti Fall 2007 Lecture

64 Alignment as a path in edit graph A _ A _ A _ A _ orresponding path: (0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7) Lopresti Fall 2007 Lecture

65 Alignment as a path in edit graph and represent indels in v and w with score 0. represent matches with score 1. he score of the alignment path is 5. Lopresti Fall 2007 Lecture

66 Alignment as a path in edit graph Every path in edit graph orresponds to an alignment: Lopresti Fall 2007 Lecture

67 Alignment as a path in edit graph Old Alignment v = A_A_ w = A_A_ New Alignment New Alignment v = A_A_ w = A_A_ Lopresti Fall 2007 Lecture

68 Alignment as a path in edit graph v = A_A_ w = A_A_ (0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7) Lopresti Fall 2007 Lecture

69 Sequence comparison: the basic algorithm We can't afford to enumerate all possible alignments looking for the best one that would be an exponential search. Fortunately we don't have to. he optimal alignment can be found using dynamic programming, which we've just seen. Dynamic programming is based on premise of computing solutions to smaller subproblems first and then using these to solve successively larger problems until we have our answer. (Dynamic programming was invented by Richard Bellman in the 1950's. Its application to sequence comparison came later, in the 1970's.) Lopresti Fall 2007 Lecture

70 Sequence alignment... First some ground-rules. Legal: A Legal: - deletion - insertion Not legal: - Legal: Legal: - match mismatch Lopresti Fall 2007 Lecture

71 Sequence comparison: the basic algorithm iven two sequences v and w, consider what's required to compute optimal alignment for prefixes v[1..i] and w[1..j]. Based on our rules for alignments, there are three possible cases: deletion insertion substitution or match optimal alignment for v[1..i-1] and w[1..j] v[i] - optimal alignment for v[1..i] and w[1..j-1] - w[j] optimal alignment for v[1..i-1] and w[1..j-1] v[i] or w[j] I II III So, assuming we've already computed solutions for all shorter prefixes, we can compute the alignment for v[1..i] and w[1..j]. Lopresti Fall 2007 Lecture

72 Sequence comparison: the basic algorithm onceptually, this might look something like this: optimal alignment at v[1..i-1] and w[1..j] + cost of deleting v[i] optimal alignment at v[1..i] and w[1..j] = max optimal alignment at v[1..i] and w[1..j-1] + cost of inserting w[j] Here we assume that deletions, insertions, and mismatches have negative costs, while matches have positive costs. optimal alignment at v[1..i-1] and w[1..j-1] + cost of matching v[i] and w[j] Lopresti Fall 2007 Lecture

73 Sequence comparison: the basic algorithm his computation can be viewed as building a 2-D matrix: s t r i n g v ε ε 0 cost of deleting v d[i,j] = max s t r i n g where indel is cost of a deletion or insertion, and sub is cost of a match/mismatch involving symbols v[i] and w[j]. w cost of inserting w d[i-1,j] + indel d[i,j-1] + indel d[i-1,j-1] + sub(v[i],w [j]) Lopresti Fall 2007 Lecture

74 Sequence comparison: the basic algorithm Stated more generally, say that our two sequences are: v[1]v[2]v[3]...v[m] w[1]w[2]w[3]...w[n] hen: d[0,0] = 0 d[i,0] = d[i-1,0] + c del (v[i]) d[0,j] = d[0,j-1] + c ins (w[j]) 1 i m 1 j n initial conditions And: d[i,j] = max d[i-1,j] + c del (v[i]) d[i,j-1] + c ins (w[j]) 1 i m, 1 j n d[i-1,j-1] + c sub (v[i], w[j]) recurrence Where c del, c ins, and c sub are the costs of a deletion, an insertion, and a substitution, respectively. Lopresti Fall 2007 Lecture

75 Sequence comparison: the basic algorithm Example: say that c del = -1, c ins = -1, c sub = -1 if mismatch and +1 if match A A Lopresti Fall 2007 Lecture

76 Algorithm for computing edit distance EditDistance1(v, w) for i 1 to n d i,0 d i-1,0 + c del (v i ) for j 1 to m d 0,j d 0,j-1 + c ins (w j ) for i 1 to n for j 1 to m d i-1,j + c del (v i ) d i,j max d i,j-1 + c ins (w j ) d i-1,j-1 + c sub (v i, w j ) return (d n,m ) Lopresti Fall 2007 Lecture

77 Sequence comparison: some observations omputation can progress in a number of ways: or or For sequences of length m and n, v[1]v[2]v[3]...v[m] w[1]w[2]w[3]...w[n] What is the computation time? O(mn) How much memory (storage) is required? O(mn) Lopresti Fall 2007 Lecture

78 Sequence comparison: getting an alignment We started with the notion of alignment. How do we get this? By keeping track of optimal decisions made during algorithm, A A and then tracing back optimal path. A - A (May not be unique). Lopresti Fall 2007 Lecture

79 Maintaining the traceback EditDistance1(v, w)... for i 1 to n for j 1 to m d i-1,j + c del (v i ) d i,j max d i,j-1 + c ins (w j ) d i-1,j-1 + c sub (v i, w j ) if d i,j = d i-1,j + c del (v i ) b i,j if d i,j = d i,j-1 + c ins (w j ) return (d n,m, b) if d i,j = d i-1,j-1 + c sub (v i, w j ) Lopresti Fall 2007 Lecture

80 More examples omparing A vs. A: Lopresti Fall 2007 Lecture

81 More examples omparing AA vs. A: Lopresti Fall 2007 Lecture

82 More examples omparing A vs. : Recall the trace we saw earlier: A Lopresti Fall 2007 Lecture

83 Optimality of the algorithm How do we know this agorithm finds an optimal alignment? We won't dig into finer details now, but basically it hinges on: Bellman's principle of optimality for dynamic programming. he cost functions behaving properly (i.e., they must satisfy the triangle inequality). his could be an interesting topic for a final paper... Lopresti Fall 2007 Lecture

84 Wrap-up Readings for next time: ontinue reading IBA hapter 6 (sequence comparison). Remember: ome to class having done the readings. heck Blackboard regularly for updates. Lopresti Fall 2007 Lecture

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 03: Edit distance and sequence alignment Slides adapted from Dr. Shaojie Zhang (University of Central Florida) KUMC visit How many of you would like to attend

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler STATC141 Spring 2005 The materials are from Pairise Sequence Alignment by Robert Giegerich and David Wheeler Lecture 6, 02/08/05 The analysis of multiple DNA or protein sequences (I) Sequence similarity

More information

Bio nformatics. Lecture 3. Saad Mneimneh

Bio nformatics. Lecture 3. Saad Mneimneh Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per

More information

Sequence Alignment. Johannes Starlinger

Sequence Alignment. Johannes Starlinger Sequence Alignment Johannes Starlinger his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester

More information

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009 Outline Approximation: Theory and Algorithms The Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 1 Nikolaus Augsten (DIS) Approximation: Theory and

More information

Dynamic Programming. Prof. S.J. Soni

Dynamic Programming. Prof. S.J. Soni Dynamic Programming Prof. S.J. Soni Idea is Very Simple.. Introduction void calculating the same thing twice, usually by keeping a table of known results that fills up as subinstances are solved. Dynamic

More information

SA-REPC - Sequence Alignment with a Regular Expression Path Constraint

SA-REPC - Sequence Alignment with a Regular Expression Path Constraint SA-REPC - Sequence Alignment with a Regular Expression Path Constraint Nimrod Milo Tamar Pinhas Michal Ziv-Ukelson Ben-Gurion University of the Negev, Be er Sheva, Israel Graduate Seminar, BGU 2010 Milo,

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Analysis and Design of Algorithms Dynamic Programming

Analysis and Design of Algorithms Dynamic Programming Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................

More information

Approximation: Theory and Algorithms

Approximation: Theory and Algorithms Approximation: Theory and Algorithms The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 Nikolaus Augsten (DIS) Approximation:

More information

Outline. Similarity Search. Outline. Motivation. The String Edit Distance

Outline. Similarity Search. Outline. Motivation. The String Edit Distance Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Introduction to spectral alignment

Introduction to spectral alignment SI Appendix C. Introduction to spectral alignment Due to the complexity of the anti-symmetric spectral alignment algorithm described in Appendix A, this appendix provides an extended introduction to the

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012 Similarity Search The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 8, 2012 Nikolaus Augsten (DIS) Similarity Search Unit 2 March 8,

More information

Implementing Approximate Regularities

Implementing Approximate Regularities Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

Lecture 2: Pairwise Alignment. CG Ron Shamir

Lecture 2: Pairwise Alignment. CG Ron Shamir Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten.

Similarity Search. The String Edit Distance. Nikolaus Augsten. Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester

More information

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance Jingbo Shang, Jian Peng, Jiawei Han University of Illinois, Urbana-Champaign May 6, 2016 Presented by Jingbo Shang 2 Outline

More information

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming

More information

Computational Biology

Computational Biology Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

LIFE SCIENCES: PAPER I ANSWER BOOKLET

LIFE SCIENCES: PAPER I ANSWER BOOKLET NATIONAL SENIOR CERTIFICATE EXAMINATION NOVEMBER 2012 LIFE SCIENCES: PAPER I EXAMINATION NUMBER ANSWER BOOKLET There are (vi) pages in this Answer Booklet. QUESTION 1 1.1 Select the term in Column B that

More information

CSE : Computational Issues in Molecular Biology. Lecture 6. Spring 2004

CSE : Computational Issues in Molecular Biology. Lecture 6. Spring 2004 CSE 397-497: Computational Issues in Molecular Biology Lecture 6 Spring 2004-1 - Topics for today Based on premise that algorithms we've studied are too slow: Faster method for global comparison when sequences

More information

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

More information

Multiple Sequence Alignment: HMMs and Other Approaches

Multiple Sequence Alignment: HMMs and Other Approaches Multiple Sequence Alignment: HMMs and Other Approaches Background Readings: Durbin et. al. Section 3.1, Ewens and Grant, Ch4. Wing-Kin Sung, Ch 6 Beerenwinkel N, Siebourg J. Statistics, probability, and

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

Introduction to Bioinformatics Algorithms Homework 3 Solution

Introduction to Bioinformatics Algorithms Homework 3 Solution Introduction to Bioinformatics Algorithms Homework 3 Solution Saad Mneimneh Computer Science Hunter College of CUNY Problem 1: Concave penalty function We have seen in class the following recurrence for

More information

On the Monotonicity of the String Correction Factor for Words with Mismatches

On the Monotonicity of the String Correction Factor for Words with Mismatches On the Monotonicity of the String Correction Factor for Words with Mismatches (extended abstract) Alberto Apostolico Georgia Tech & Univ. of Padova Cinzia Pizzi Univ. of Padova & Univ. of Helsinki Abstract.

More information

Sequence Comparison. mouse human

Sequence Comparison. mouse human Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity

More information

Tandem Mass Spectrometry: Generating function, alignment and assembly

Tandem Mass Spectrometry: Generating function, alignment and assembly Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate

More information

Lecture 4: September 19

Lecture 4: September 19 CSCI1810: Computational Molecular Biology Fall 2017 Lecture 4: September 19 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

More information

CMPSCI 311: Introduction to Algorithms Second Midterm Exam

CMPSCI 311: Introduction to Algorithms Second Midterm Exam CMPSCI 311: Introduction to Algorithms Second Midterm Exam April 11, 2018. Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming 20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Generating Function Notes , Fall 2005, Prof. Peter Shor

Generating Function Notes , Fall 2005, Prof. Peter Shor Counting Change Generating Function Notes 80, Fall 00, Prof Peter Shor In this lecture, I m going to talk about generating functions We ve already seen an example of generating functions Recall when we

More information

CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines. Problem 1

CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines. Problem 1 CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines Problem 1 (a) Consider the situation in the figure, every edge has the same weight and V = n = 2k + 2. Easy to check, every simple

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Algebraic Dynamic Programming. Dynamic Programming, Old Country Style

Algebraic Dynamic Programming. Dynamic Programming, Old Country Style Algebraic Dynamic Programming Session 2 Dynamic Programming, Old Country Style Robert Giegerich (Lecture) Stefan Janssen (Exercises) Faculty of Technology Summer 2013 http://www.techfak.uni-bielefeld.de/ags/pi/lehre/adp

More information

Algorithms in Bioinformatics: A Practical Introduction. Sequence Similarity

Algorithms in Bioinformatics: A Practical Introduction. Sequence Similarity Algorithms in Bioinformatics: A Practical Introduction Sequence Similarity Earliest Researches in Sequence Comparison Doolittle et al. (Science, July 1983) searched for platelet-derived growth factor (PDGF)

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

More Dynamic Programming

More Dynamic Programming CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider

More information

More Dynamic Programming

More Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?

More information

What can you prove by induction?

What can you prove by induction? MEI CONFERENCE 013 What can you prove by induction? Martyn Parker M.J.Parker@keele.ac.uk Contents Contents iii 1 Splitting Coins.................................................. 1 Convex Polygons................................................

More information

Lecture 5,6 Local sequence alignment

Lecture 5,6 Local sequence alignment Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution

More information

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a

More information

11.3 Decoding Algorithm

11.3 Decoding Algorithm 11.3 Decoding Algorithm 393 For convenience, we have introduced π 0 and π n+1 as the fictitious initial and terminal states begin and end. This model defines the probability P(x π) for a given sequence

More information

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 4 Greedy Algorithms Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 4.1 Interval Scheduling Interval Scheduling Interval scheduling. Job j starts at s j and

More information

MITOCW MIT8_01F16_L12v01_360p

MITOCW MIT8_01F16_L12v01_360p MITOCW MIT8_01F16_L12v01_360p Let's look at a typical application of Newton's second law for a system of objects. So what I want to consider is a system of pulleys and masses. So I'll have a fixed surface

More information

Searching Sear ( Sub- (Sub )Strings Ulf Leser

Searching Sear ( Sub- (Sub )Strings Ulf Leser Searching (Sub-)Strings Ulf Leser This Lecture Exact substring search Naïve Boyer-Moore Searching with profiles Sequence profiles Ungapped approximate search Statistical evaluation of search results Ulf

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 13 Competitive Optimality of the Shannon Code So, far we have studied

More information

MITOCW ocw f99-lec23_300k

MITOCW ocw f99-lec23_300k MITOCW ocw-18.06-f99-lec23_300k -- and lift-off on differential equations. So, this section is about how to solve a system of first order, first derivative, constant coefficient linear equations. And if

More information

Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence

Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras Lecture - 13 Conditional Convergence Now, there are a few things that are remaining in the discussion

More information

Exam EDAF May 2011, , Vic1. Thore Husfeldt

Exam EDAF May 2011, , Vic1. Thore Husfeldt Exam EDAF05 25 May 2011, 8.00 13.00, Vic1 Thore Husfeldt Instructions What to bring. You can bring any written aid you want. This includes the course book and a dictionary. In fact, these two things are

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Samson Zhou. Pattern Matching over Noisy Data Streams

Samson Zhou. Pattern Matching over Noisy Data Streams Samson Zhou Pattern Matching over Noisy Data Streams Finding Structure in Data Pattern Matching Finding all instances of a pattern within a string ABCD ABCAABCDAACAABCDBCABCDADDDEAEABCDA Knuth-Morris-Pratt

More information

INF4130: Dynamic Programming September 2, 2014 DRAFT version

INF4130: Dynamic Programming September 2, 2014 DRAFT version INF4130: Dynamic Programming September 2, 2014 DRAFT version In the textbook: Ch. 9, and Section 20.5 Chapter 9 can also be found at the home page for INF4130 These slides were originally made by Petter

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 20: Algorithm Design Techniques 12/2/2015 Daniel Bauer 1 Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways of

More information

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

6. DYNAMIC PROGRAMMING II

6. DYNAMIC PROGRAMMING II 6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford algorithm distance vector protocols negative cycles in a digraph Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison

More information

MITOCW MITRES18_006F10_26_0501_300k-mp4

MITOCW MITRES18_006F10_26_0501_300k-mp4 MITOCW MITRES18_006F10_26_0501_300k-mp4 ANNOUNCER: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational

More information

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007. Proofs, Strings, and Finite Automata CS154 Chris Pollett Feb 5, 2007. Outline Proofs and Proof Strategies Strings Finding proofs Example: For every graph G, the sum of the degrees of all the nodes in G

More information

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VF Files On-line MatBio 18 Solon P. Pissis and Ahmad Retha King s ollege London 02-Aug-2018 Solon P. Pissis and Ahmad Retha

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

CS1800: Strong Induction. Professor Kevin Gold

CS1800: Strong Induction. Professor Kevin Gold CS1800: Strong Induction Professor Kevin Gold Mini-Primer/Refresher on Unrelated Topic: Limits This is meant to be a problem about reasoning about quantifiers, with a little practice of other skills, too

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

Efficient High-Similarity String Comparison: The Waterfall Algorithm

Efficient High-Similarity String Comparison: The Waterfall Algorithm Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)

More information

Gibbs Sampling Methods for Multiple Sequence Alignment

Gibbs Sampling Methods for Multiple Sequence Alignment Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical

More information

Chapter 5 Simplifying Formulas and Solving Equations

Chapter 5 Simplifying Formulas and Solving Equations Chapter 5 Simplifying Formulas and Solving Equations Look at the geometry formula for Perimeter of a rectangle P = L W L W. Can this formula be written in a simpler way? If it is true, that we can simplify

More information

1 Alphabets and Languages

1 Alphabets and Languages 1 Alphabets and Languages Look at handout 1 (inference rules for sets) and use the rules on some examples like {a} {{a}} {a} {a, b}, {a} {{a}}, {a} {{a}}, {a} {a, b}, a {{a}}, a {a, b}, a {{a}}, a {a,

More information

CS 580: Algorithm Design and Analysis

CS 580: Algorithm Design and Analysis CS 58: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 28 Announcement: Homework 3 due February 5 th at :59PM Midterm Exam: Wed, Feb 2 (8PM-PM) @ MTHW 2 Recap: Dynamic Programming

More information

Sequence Analysis '17 -- lecture 7

Sequence Analysis '17 -- lecture 7 Sequence Analysis '17 -- lecture 7 Significance E-values How significant is that? Please give me a number for......how likely the data would not have been the result of chance,......as opposed to......a

More information

Exam EDAF August Thore Husfeldt

Exam EDAF August Thore Husfeldt Exam EDAF05 25 August 2011 Thore Husfeldt Instructions What to bring. You can bring any written aid you want. This includes the course book and a dictionary. In fact, these two things are the only aids

More information

String Matching Problem

String Matching Problem String Matching Problem Pattern P Text T Set of Locations L 9/2/23 CAP/CGS 5991: Lecture 2 Computer Science Fundamentals Specify an input-output description of the problem. Design a conceptual algorithm

More information

Dynamic programming. Curs 2017

Dynamic programming. Curs 2017 Dynamic programming. Curs 2017 Fibonacci Recurrence. n-th Fibonacci Term INPUT: n nat QUESTION: Compute F n = F n 1 + F n 2 Recursive Fibonacci (n) if n = 0 then return 0 else if n = 1 then return 1 else

More information

Exhaustive search. CS 466 Saurabh Sinha

Exhaustive search. CS 466 Saurabh Sinha Exhaustive search CS 466 Saurabh Sinha Agenda Two different problems Restriction mapping Motif finding Common theme: exhaustive search of solution space Reading: Chapter 4. Restriction Mapping Restriction

More information

Algorithms and Theory of Computation. Lecture 9: Dynamic Programming

Algorithms and Theory of Computation. Lecture 9: Dynamic Programming Algorithms and Theory of Computation Lecture 9: Dynamic Programming Xiaohui Bei MAS 714 September 10, 2018 Nanyang Technological University MAS 714 September 10, 2018 1 / 21 Recursion in Algorithm Design

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur

Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur Lecture 02 Groups: Subgroups and homomorphism (Refer Slide Time: 00:13) We looked

More information

Seq2Seq Losses (CTC)

Seq2Seq Losses (CTC) Seq2Seq Losses (CTC) Jerry Ding & Ryan Brigden 11-785 Recitation 6 February 23, 2018 Outline Tasks suited for recurrent networks Losses when the output is a sequence Kinds of errors Losses to use CTC Loss

More information

CHAPTER 1. REVIEW: NUMBERS

CHAPTER 1. REVIEW: NUMBERS CHAPTER. REVIEW: NUMBERS Yes, mathematics deals with numbers. But doing math is not number crunching! Rather, it is a very complicated psychological process of learning and inventing. Just like listing

More information

Exam EDAF May Thore Husfeldt

Exam EDAF May Thore Husfeldt Exam EDAF05 29 May 13 Thore Husfeldt Instructions What to bring. You can bring any written aid you want. This includes the course book and a dictionary. In fact, these two things are the only aids that

More information

Compute the Fourier transform on the first register to get x {0,1} n x 0.

Compute the Fourier transform on the first register to get x {0,1} n x 0. CS 94 Recursive Fourier Sampling, Simon s Algorithm /5/009 Spring 009 Lecture 3 1 Review Recall that we can write any classical circuit x f(x) as a reversible circuit R f. We can view R f as a unitary

More information

Semiconductor Optoelectronics Prof. M. R. Shenoy Department of physics Indian Institute of Technology, Delhi

Semiconductor Optoelectronics Prof. M. R. Shenoy Department of physics Indian Institute of Technology, Delhi Semiconductor Optoelectronics Prof. M. R. Shenoy Department of physics Indian Institute of Technology, Delhi Lecture - 18 Optical Joint Density of States So, today we will discuss the concept of optical

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information

Proving languages to be nonregular

Proving languages to be nonregular Proving languages to be nonregular We already know that there exist languages A Σ that are nonregular, for any choice of an alphabet Σ. This is because there are uncountably many languages in total and

More information

Finite Automata Part Two

Finite Automata Part Two Finite Automata Part Two Recap from Last Time Old MacDonald Had a Symbol, Σ-eye-ε-ey, Oh! You may have noticed that we have several letter- E-ish symbols in CS103, which can get confusing! Here s a quick

More information

1 Seidel s LP algorithm

1 Seidel s LP algorithm 15-451/651: Design & Analysis of Algorithms October 21, 2015 Lecture #14 last changed: November 7, 2015 In this lecture we describe a very nice algorithm due to Seidel for Linear Programming in lowdimensional

More information

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what

More information

CSE 202 Dynamic Programming II

CSE 202 Dynamic Programming II CSE 202 Dynamic Programming II Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally,

More information

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n Aside: Golden Ratio Golden Ratio: A universal law. Golden ratio φ = lim n a n+b n a n = 1+ 5 2 a n+1 = a n + b n, b n = a n 1 Ruta (UIUC) CS473 1 Spring 2018 1 / 41 CS 473: Algorithms, Spring 2018 Dynamic

More information

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ). CMPSCI611: Verifying Polynomial Identities Lecture 13 Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x 1,...,

More information

Solutions to November 2006 Problems

Solutions to November 2006 Problems Solutions to November 2006 Problems Problem 1. A six-sided convex polygon is inscribed in a circle. All its angles are equal. Show that its sides need not be equal. What can be said about seven-sided equal-angled

More information

MITOCW MITRES_18-007_Part1_lec3_300k.mp4

MITOCW MITRES_18-007_Part1_lec3_300k.mp4 MITOCW MITRES_18-007_Part1_lec3_300k.mp4 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources

More information