Algoritmi e strutture di dati 2


1 Algoritmi e strutture di dati 2 (Algorithms and Data Structures 2). Paola Vocca. Lecture 5: Sequence Alignment.

2 Sequence alignment: RNA secondary structure

3 RNA Secondary Structure
RNA. String $B = b_1 b_2 \dots b_n$ over alphabet {A, C, G, U}.
Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
Complementary base pairs: A-U, C-G.
[Figure: the example molecule folded back on itself, with its base pairs drawn.]

4 RNA Secondary Structure
Secondary structure. A set of pairs $S = \{(b_i, b_j)\}$ that satisfy:
o [Watson-Crick.] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
o [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If $(b_i, b_j) \in S$, then $i < j - 4$.
o [Non-crossing.] If $(b_i, b_j)$ and $(b_k, b_l)$ are two pairs in S, then we cannot have $i < k < j < l$.
Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy, which we approximate by the number of base pairs.
Goal. Given an RNA molecule $B = b_1 b_2 \dots b_n$, find a secondary structure S that maximizes the number of base pairs.

5 RNA Secondary Structure: Examples
[Figure: three candidate structures with their base pairs drawn, labeled "ok", "sharp turn", and "crossing".]

6 RNA Secondary Structure: Subproblems
First attempt. OPT(j) = maximum number of base pairs in a secondary structure of the substring $b_1 b_2 \dots b_j$.
Match $b_t$ and $b_n$.
Difficulty. Results in two sub-problems.
o Finding secondary structure in $b_1 b_2 \dots b_{t-1}$: this is OPT(t-1).
o Finding secondary structure in $b_{t+1} b_{t+2} \dots b_{n-1}$: this is not of the form OPT(j), so we need more sub-problems.

7 Dynamic Programming Over Intervals
Notation. OPT(i, j) = maximum number of base pairs in a secondary structure of the substring $b_i b_{i+1} \dots b_j$.
o Case 1. If $i \ge j - 4$: OPT(i, j) = 0 by the no-sharp-turns condition.
o Case 2. Base $b_j$ is not involved in a pair: OPT(i, j) = OPT(i, j-1).
o Case 3. Base $b_j$ pairs with $b_t$ for some $i \le t < j - 4$. The non-crossing constraint decouples the resulting sub-problems:
$$OPT(i, j) = 1 + \max_t \{\, OPT(i, t-1) + OPT(t+1, j-1) \,\}$$
where the max is taken over t such that $i \le t < j - 4$ and $b_t$ and $b_j$ are Watson-Crick complements.
When $i < j - 4$, OPT(i, j) is the larger of the values given by Case 2 and Case 3.

8 Bottom Up Dynamic Programming Over Intervals
Q. What order to solve the sub-problems?
A. Do shortest intervals first.
RNA(b_1, ..., b_n) {
   for k = 5, 6, ..., n-1
      for i = 1, 2, ..., n-k
         j = i + k
         Compute M[i, j] using recurrence
   return M[1, n]
}
Running time. $O(n^3)$.
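The interval recurrence and the shortest-interval-first order can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the function name `rna_max_pairs` and the 1-based table with padding rows are assumptions made here so that the indices mirror the slide.

```python
def rna_max_pairs(b):
    """Maximum number of base pairs in a secondary structure of RNA string b."""
    n = len(b)
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # M[i][j] uses 1-based i, j as on the slide; the padding keeps
    # M[i][i-1] and M[t+1][j-1] valid (and equal to 0) at the borders.
    M = [[0] * (n + 2) for _ in range(n + 2)]
    for k in range(5, n):                  # interval length, shortest first
        for i in range(1, n - k + 1):
            j = i + k
            best = M[i][j - 1]             # Case 2: b_j is unpaired
            for t in range(i, j - 4):      # Case 3: i <= t < j - 4
                if (b[t - 1], b[j - 1]) in complements:
                    best = max(best, 1 + M[i][t - 1] + M[t + 1][j - 1])
            M[i][j] = best
    return M[1][n]
```

Intervals with $i \ge j - 4$ are never written, so they keep the value 0 required by Case 1. The three nested loops give the $O(n^3)$ running time.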

9 Sequence comparison: sequence alignment

10 Sequence Alignment
Applications.
o Basis of the Unix diff command.
o Speech recognition.
o Computational biology.
Computational biology often concerns the study of sequences:
o DNA sequences
o RNA sequences
o Protein sequences
These sequences can be viewed as strings over an alphabet.
o DNA & RNA: 4-letter alphabet
o Proteins: 20-letter alphabet

11 Sequence comparison
Finding similarities between sequences is important in many areas of biology. For example:
o Identifying genes/proteins with a common origin, which makes it possible to predict their function or structure.
o Finding common subsequences in genes and/or proteins, to identify common motifs.
o Finding sequences that can overlap, to help with sequence assembly.

12 Sequence comparison: Why?
o It is one of the most widely used computational tools in biology.
o New sequences are compared with the sequences already present in databases.
o Similar sequences often have a similar function or origin.
o Selection operates at the system level, but mutations occur at the sequence level.
o Similarities remain recognizable over the centuries.

13 Sequence alignment
Defn: An alignment of strings S, T is a pair of strings S', T' (with spaces) s.t.
1. |S'| = |T'|, and (|S| = "length of S")
2. removing all spaces leaves S, T

14 Alignment Scoring
The score of aligning (characters or spaces) x & y is σ(x, y).
Value of an alignment: the sum of σ(x, y) over all aligned column pairs (x, y).
An optimal alignment: one of max value.
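As a small illustration of the scoring definition, the sketch below evaluates a given alignment column by column. The concrete scores in `sigma` (+2 for a match, -1 for a mismatch, -1 for a character against a space) are an assumption chosen for the example; the slides do not fix a particular σ. Spaces are written as `-`.

```python
def sigma(x, y):
    """Hypothetical scoring function: +2 match, -1 mismatch, -1 char vs. space."""
    if x == "-" and y == "-":
        raise ValueError("a space is never aligned with a space")
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def alignment_value(s_aligned, t_aligned):
    """Sum of sigma over the columns of an alignment (two equal-length rows)."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(sigma(x, y) for x, y in zip(s_aligned, t_aligned))
```

For example, `alignment_value("oc-urrance", "occurrence")` has eight matching columns, one mismatch, and one space, so under these assumed scores it evaluates to 14.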

15 Optimal Alignment: A Simple Algorithm
for all subseqs A of S, B of T s.t. |A| = |B| do
   align A[i] with B[i], 1 ≤ i ≤ |A|
   align all other chars to spaces
   compute its value
   retain the max
end
output the retained alignment

16 Analysis
Assume |S| = |T| = n.
The cost of evaluating one alignment is O(n).
The number of possible alignments is $\binom{2n}{n}$:
o Pick n characters of S, T together.
o Say k of them are in S.
o Align these k with the k unpicked characters of T.
Total time: $n \binom{2n}{n} > 2^{2n}$ for n > 3.
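The bound can be checked numerically with a few lines of Python. The helper name `brute_force_steps` is introduced here only for illustration.

```python
from math import comb


def brute_force_steps(n):
    """n * C(2n, n): cost O(n) per alignment times the number of alignments."""
    return n * comb(2 * n, n)


def exceeds_exponential(n):
    """True when n * C(2n, n) > 2^(2n)."""
    return brute_force_steps(n) > 2 ** (2 * n)
```

For n = 3 the product is 60, which is still below $2^6 = 64$; for n = 4 it is 280, above $2^8 = 256$, which is why the slide states the inequality for n > 3.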

17 Optimal Substructure (In More Detail)
Optimal alignment ends in 1 of 3 ways:
o last chars of S & T aligned with each other
o last char of S aligned with space in T
o last char of T aligned with space in S
(never align space with space; σ(-, -) < 0)
In each case, the rest of S & T should be optimally aligned to each other.

18 Optimal Alignment in $O(n^2)$ via Dynamic Programming
Input: S, T, |S| = n, |T| = m
Output: value of optimal alignment
Solvable via the intermediate subproblems:
V(i, j) = value of optimal alignment of S[1], ..., S[i] with T[1], ..., T[j], for all 0 ≤ i ≤ n, 0 ≤ j ≤ m.

19 Base Cases

20 General Case

21 Calculating One Entry
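The bodies of the base-case and general-case slides are not preserved in this transcription. The sketch below is therefore an assumption: it derives V(i, j) directly from the three ways an optimal alignment can end, as listed on the optimal-substructure slide. The scores in `sigma` are hypothetical, and `-` stands for a space.

```python
def sigma(x, y):
    """Hypothetical scoring function: +2 match, -1 mismatch, -1 char vs. space."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def optimal_alignment_value(S, T):
    """V(n, m): the maximum value over all alignments of S and T."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Base cases (assumed): a prefix aligned against nothing but spaces.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + sigma(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + sigma("-", T[j - 1])
    # General case: the alignment ends in one of the three ways.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + sigma(S[i - 1], T[j - 1]),  # last chars aligned
                V[i - 1][j] + sigma(S[i - 1], "-"),           # S[i] against a space
                V[i][j - 1] + sigma("-", T[j - 1]),           # T[j] against a space
            )
    return V[n][m]
```

Each entry depends only on its upper, left, and upper-left neighbors, so filling the table row by row takes O(nm) time.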

22 Example

23 Example

24 Complexity Notes
Time: O(mn) (both for computing the value of the alignment and for the alignment itself).
Space: O(mn) (both for computing the value of the alignment and for the alignment itself).
It is easy to compute the optimal value in O(mn) time and O(min{m, n}) space.
It is possible to compute both the value and the alignment in O(mn) time and O(min{m, n}) space (next slides).

25 Sequence comparison: string similarity

26 String Similarity
How similar are two strings?
o ocurrance
o occurrence

o c u r r a n c e -
o c c u r r e n c e
6 mismatches, 1 gap

o c - u r r a n c e
o c c u r r e n c e
1 mismatch, 1 gap

o c - u r r - a n c e
o c c u r r e - n c e
0 mismatches, 3 gaps

Compared with sequence alignment, this is a more general case. We consider:
o gaps (alignments with a space),
o mismatches,
o matches.

27 Edit Distance
Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
o Gap penalty $\delta$; mismatch penalty $\alpha_{pq}$.
o Cost = sum of gap and mismatch penalties.

C T G A C C T A C C T
C C T G A C T A C A T
Cost: $\alpha_{TC} + \alpha_{GT} + \alpha_{AG} + 2\alpha_{CA}$

- C T G A C C T A C C T
C C T G A C - T A C A T
Cost: $2\delta + \alpha_{CA}$

28 Sequence Alignment
Goal: Given two strings $X = x_1 x_2 \dots x_m$ and $Y = y_1 y_2 \dots y_n$, find an alignment of minimum cost.
Def. An alignment M is a set of ordered pairs $x_i$-$y_j$ such that each item occurs in at most one pair and there are no crossings.
Def. The pairs $x_i$-$y_j$ and $x_{i'}$-$y_{j'}$ cross if $i < i'$ but $j > j'$.
$$\mathrm{cost}(M) = \underbrace{\sum_{(x_i, y_j) \in M} \alpha_{x_i y_j}}_{\text{mismatch}} + \underbrace{\sum_{i:\, x_i \text{ unmatched}} \delta \; + \sum_{j:\, y_j \text{ unmatched}} \delta}_{\text{gap}}$$
Ex: CTACCG vs. TACATG.
Sol: M = $x_2$-$y_1$, $x_3$-$y_2$, $x_4$-$y_3$, $x_5$-$y_4$, $x_6$-$y_6$.

x1 x2 x3 x4 x5    x6
C  T  A  C  C  -  G
-  T  A  C  A  T  G
   y1 y2 y3 y4 y5 y6

29 Designing the Dynamic Programming
FACT. Let M be any alignment of X and Y. If (m, n) is not in M, then either $x_m$ is not matched in M or $y_n$ is not matched in M.
Proof. Otherwise $x_m$ is matched with some $y_j$, j < n, and $y_n$ is matched with some $x_i$, i < m. Then i < m but n > j, so the two pairs cross.

30 Sequence Alignment: Problem Structure
Def. OPT(i, j) = min cost of aligning strings $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_j$.
o Case 1: OPT matches $x_i$-$y_j$. Pay mismatch for $x_i$-$y_j$ + min cost of aligning the two strings $x_1 x_2 \dots x_{i-1}$ and $y_1 y_2 \dots y_{j-1}$.
o Case 2a: OPT leaves $x_i$ unmatched. Pay gap for $x_i$ + min cost of aligning $x_1 x_2 \dots x_{i-1}$ and $y_1 y_2 \dots y_j$.
o Case 2b: OPT leaves $y_j$ unmatched. Pay gap for $y_j$ + min cost of aligning $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_{j-1}$.
$$OPT(i, j) = \begin{cases} j\delta & \text{if } i = 0 \\ \min \begin{cases} \alpha_{x_i y_j} + OPT(i-1, j-1) \\ \delta + OPT(i-1, j) \\ \delta + OPT(i, j-1) \end{cases} & \text{otherwise} \\ i\delta & \text{if } j = 0 \end{cases}$$

31 Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x_1 x_2 ... x_m, y_1 y_2 ... y_n, δ, α) {
   for i = 0 to m
      M[i, 0] = iδ
   for j = 0 to n
      M[0, j] = jδ
   for i = 1 to m
      for j = 1 to n
         M[i, j] = min(α[x_i, y_j] + M[i-1, j-1],
                       δ + M[i-1, j],
                       δ + M[i, j-1])
   return M[m, n]
}
Analysis. Θ(mn) time and space.
English words or sentences: m, n ≤ 10.
Computational biology: m = n = 100,000. 10 billion ops OK, but 10GB array?

32 Sequence comparison: Sequence Alignment in Linear Space
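The Sequence-Alignment procedure translates almost line by line into Python. In this sketch the mismatch penalty is passed as a function `alpha(p, q)` rather than a table, and `unit_alpha` (0 for equal characters, 1 otherwise) is a hypothetical choice that, together with δ = 1, yields the classic edit distance.

```python
def alignment_cost(x, y, delta, alpha):
    """Minimum alignment cost of x and y (gap penalty delta, mismatch alpha)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta          # x_1..x_i all unmatched
    for j in range(n + 1):
        M[0][j] = j * delta          # y_1..y_j all unmatched
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],  # match x_i with y_j
                delta + M[i - 1][j],                          # x_i unmatched
                delta + M[i][j - 1],                          # y_j unmatched
            )
    return M[m][n]


def unit_alpha(p, q):
    """Hypothetical mismatch penalty: 0 for equal characters, 1 otherwise."""
    return 0 if p == q else 1
```

With these unit penalties, `alignment_cost("CTACCG", "TACATG", 1, unit_alpha)` is 3, matching the cost $2\delta + \alpha$ of the example alignment on slide 28.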

33 Sequence Alignment: Linear Space
Q. Can we avoid using quadratic space?
Easy. Optimal value in O(m + n) space and O(mn) time.
o Compute OPT(i, •) from OPT(i-1, •).
o No longer a simple way to recover the alignment itself.
Theorem. [Hirschberg 1975] Optimal alignment in O(m + n) space and O(mn) time.
o Clever combination of divide-and-conquer and dynamic programming.
o Inspired by an idea of Savitch from complexity theory.
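The "easy" part, computing only the optimal value, can be sketched by keeping just two rows of the table. The function and parameter names are illustrative assumptions; `alpha` is a mismatch-penalty function and `delta` the gap penalty.

```python
def alignment_cost_linear_space(x, y, delta, alpha):
    """Optimal alignment cost using two rows of the DP table."""
    n = len(y)
    prev = [j * delta for j in range(n + 1)]        # row OPT(0, .)
    for i in range(1, len(x) + 1):
        cur = [i * delta] + [0] * n                 # OPT(i, 0) = i * delta
        for j in range(1, n + 1):
            cur[j] = min(
                alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                delta + prev[j],
                delta + cur[j - 1],
            )
        prev = cur                                  # row i-1 is no longer needed
    return prev[n]


def unit_alpha(p, q):
    """Hypothetical mismatch penalty: 0 for equal characters, 1 otherwise."""
    return 0 if p == q else 1
```

Because earlier rows are discarded, there is nothing left to trace back through, which is why recovering the alignment itself needs the divide-and-conquer idea.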

34 Sequence Alignment: Linear Space
Edit distance graph.
o Let f(i, j) be shortest path from (0, 0) to (i, j).
o Observation: f(i, j) = OPT(i, j).
[Figure: edit distance graph, a grid with rows $x_1, x_2, x_3$ and columns $y_1, \dots, y_6$, nodes labeled from 0-0 through i-j to m-n, and a diagonal edge of cost $\alpha_{x_i y_j}$.]

35 Sequence Alignment: Linear Space
Edit distance graph.
o Let f(i, j) be shortest path from (0, 0) to (i, j).
o Can compute f(•, j) for any j in O(mn) time and O(m + n) space (using the previous column).
[Figure: the same grid, with column j marked.]

36 Sequence Alignment: Linear Space
Edit distance graph.
o Let g(i, j) be shortest path from (i, j) to (m, n).
o Can compute by reversing the edge orientations and inverting the roles of (0, 0) and (m, n).
[Figure: the edit distance graph with edges reversed; diagonal edge of cost $\alpha_{x_i y_j}$.]

37 Sequence Alignment: Linear Space
Edit distance graph.
o Let g(i, j) be shortest path from (i, j) to (m, n).
o Can compute g(•, j) for any j in O(mn) time and O(m + n) space.
[Figure: the same grid, with column j marked.]

38 Sequence Alignment: Linear Space
Observation 1. The cost of the shortest path that uses (i, j) is f(i, j) + g(i, j).
[Figure: the edit distance graph with node i-j marked.]
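The columns f(•, j) and g(•, j) can each be computed with a single column of storage. In this sketch, which uses hypothetical helper names, g is obtained by running the forward computation on the reversed strings, one way to realize "reversing the edge orientations".

```python
def forward_column(x, y, j, delta, alpha):
    """f(i, j) for i = 0..len(x): cost of aligning x[:i] with y[:j]."""
    col = [i * delta for i in range(len(x) + 1)]       # column 0
    for jj in range(1, j + 1):
        new = [jj * delta] + [0] * len(x)
        for i in range(1, len(x) + 1):
            new[i] = min(
                alpha(x[i - 1], y[jj - 1]) + col[i - 1],
                delta + new[i - 1],
                delta + col[i],
            )
        col = new                                      # keep only one column
    return col


def backward_column(x, y, j, delta, alpha):
    """g(i, j) for i = 0..len(x): cost of aligning x[i:] with y[j:]."""
    rev = forward_column(x[::-1], y[::-1], len(y) - j, delta, alpha)
    return rev[::-1]


def unit_alpha(p, q):
    """Hypothetical mismatch penalty: 0 for equal characters, 1 otherwise."""
    return 0 if p == q else 1
```

Since every corner-to-corner path crosses every column, the minimum of f(i, j) + g(i, j) over i equals the optimal cost for each fixed j.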

39 Sequence Alignment: Linear Space
Proof. Let $l_{ij}$ be the length of the shortest corner-to-corner path in $G_{XY}$ that passes through (i, j).
o Any such path must get from (0, 0) to (i, j) and from (i, j) to (m, n).
o Its length is at least f(i, j) + g(i, j).
o Hence $l_{ij} \ge f(i, j) + g(i, j)$.
o On the other hand, consider the corner-to-corner path that consists of a minimum-length path from (0, 0) to (i, j), followed by a minimum-length path from (i, j) to (m, n).
o This path has length f(i, j) + g(i, j).
o And so we have $l_{ij} \le f(i, j) + g(i, j)$.
Therefore $l_{ij} = f(i, j) + g(i, j)$.

40 Sequence Alignment: Linear Space
Observation 2. Let q be an index that minimizes f(q, n/2) + g(q, n/2). Then the shortest path from (0, 0) to (m, n) uses (q, n/2).
[Figure: the edit distance graph with column n/2 and row q marked.]

41 Sequence Alignment: Linear Space
Divide: find index q that minimizes f(q, n/2) + g(q, n/2) using DP.
o Align $x_q$ and $y_{n/2}$.
Conquer: recursively compute optimal alignment in each piece.
[Figure: the edit distance graph split at node (q, n/2) into two sub-grids.]
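The divide-and-conquer scheme can be sketched as below. This is one possible rendering under stated assumptions, not the lecture's code: the path is split at node (q, n/2), the base cases (an empty string, or a single-character y) are handled directly, and all function names are hypothetical. Gaps are written as `-`.

```python
def last_column(x, y, delta, alpha):
    """Costs of aligning x[:i] with all of y, for i = 0..len(x), in O(len(x)) space."""
    col = [i * delta for i in range(len(x) + 1)]
    for ch in y:
        new = [col[0] + delta] + [0] * len(x)
        for i in range(1, len(x) + 1):
            new[i] = min(alpha(x[i - 1], ch) + col[i - 1],
                         delta + new[i - 1], delta + col[i])
        col = new
    return col


def hirschberg(x, y, delta, alpha):
    """Return an optimal alignment as two equal-length rows."""
    if not x:
        return "-" * len(y), y
    if not y:
        return x, "-" * len(x)
    if len(y) == 1:
        # Either y stays unmatched, or it is matched with the best x[i].
        best_i, best = None, delta * (len(x) + 1)
        for i, ch in enumerate(x):
            cost = alpha(ch, y) + delta * (len(x) - 1)
            if cost < best:
                best_i, best = i, cost
        if best_i is None:
            return x + "-", "-" * len(x) + y
        return x, "-" * best_i + y + "-" * (len(x) - best_i - 1)
    mid = len(y) // 2
    f = last_column(x, y[:mid], delta, alpha)                    # f(., n/2)
    g = last_column(x[::-1], y[mid:][::-1], delta, alpha)[::-1]  # g(., n/2)
    q = min(range(len(x) + 1), key=lambda i: f[i] + g[i])        # divide
    left = hirschberg(x[:q], y[:mid], delta, alpha)              # conquer
    right = hirschberg(x[q:], y[mid:], delta, alpha)
    return left[0] + right[0], left[1] + right[1]


def unit_alpha(p, q):
    """Hypothetical mismatch penalty: 0 for equal characters, 1 otherwise."""
    return 0 if p == q else 1


def cost_of(a, b, delta, alpha):
    """Cost of a given alignment, used here only to check the result."""
    return sum(delta if "-" in (p, q) else alpha(p, q) for p, q in zip(a, b))
```

Only two columns are alive at any time, and the two recursive calls work on disjoint pieces of the grid, which is what keeps the space linear.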

42 Sequence Alignment: Running Time Analysis Warmup
Theorem. Let T(m, n) = max running time of algorithm on strings of length at most m and n. T(m, n) = O(mn log n).
$$T(m, n) \le 2\,T(m, n/2) + O(mn) \;\Rightarrow\; T(m, n) = O(mn \log n)$$
Remark. The analysis is not tight because the two sub-problems are of size (q, n/2) and (m - q, n/2). In the next slide, we save the log n factor.

43 Sequence Alignment: Running Time Analysis
Theorem. Let T(m, n) = max running time of algorithm on strings of length at most m and n. T(m, n) = O(mn).
Pf. (by induction on n)
o O(mn) time to compute f(•, n/2) and g(•, n/2) and find index q.
o T(q, n/2) + T(m - q, n/2) time for two recursive calls.
o Choose constant c so that:
$$T(m, 2) \le cm, \qquad T(2, n) \le cn, \qquad T(m, n) \le cmn + T(q, n/2) + T(m - q, n/2)$$
o Base cases: m = 2 or n = 2.
o Inductive hypothesis: $T(m, n) \le 2cmn$.
$$\begin{aligned} T(m, n) &\le T(q, n/2) + T(m - q, n/2) + cmn \\ &\le 2cqn/2 + 2c(m - q)n/2 + cmn \\ &= cqn + cmn - cqn + cmn \\ &= 2cmn \end{aligned}$$

44 Sequence Comparison: Local alignments & gaps

45 Variations
Local Alignment
o Preceding gives global alignment, i.e. full length of both strings.
o Might well miss strong similarity of part of the strings amidst dissimilar flanks.
Gap Penalties
o 10 adjacent spaces cost 10 x one space?
Many others

46 Local Alignment: Motivations
Interesting (evolutionarily conserved, functionally related) segments may be a small part of the whole:
o Active site of a protein
o Scattered genes or exons amidst junk, e.g. retroviral insertions, large deletions
o Don't have whole sequence
Global alignment might miss them if flanking junk outweighs similar regions.

47 Local Alignment
Optimal local alignment of strings S & T: find substrings A of S and B of T having max value global alignment.
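The slides that develop the local alignment algorithm survive here only as titles. As a hedged illustration of the definition, the sketch below uses the standard Smith-Waterman style recurrence, in which every table entry is clamped at 0 so that an alignment may start anywhere, and the answer is the largest entry. This is an assumption about what the missing slides present; the scores in `sigma` are hypothetical, and `-` stands for a space.

```python
def sigma(x, y):
    """Hypothetical scoring function: +2 match, -1 mismatch, -1 char vs. space."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def local_alignment_value(S, T):
    """Max value of a global alignment between a substring of S and one of T."""
    m = len(T)
    prev = [0] * (m + 1)
    best = 0                       # empty substrings always give value 0
    for i in range(1, len(S) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = max(
                0,                                          # start afresh here
                prev[j - 1] + sigma(S[i - 1], T[j - 1]),
                prev[j] + sigma(S[i - 1], "-"),
                cur[j - 1] + sigma("-", T[j - 1]),
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

On strings that share only the block `ACGT`, such as `"xxxACGTyyy"` and `"zzACGTww"`, the dissimilar flanks are ignored and the value is that of the four matches.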

48 The Obvious Local Alignment Algorithm

49 Local Alignment in O(nm) via Dynamic Programming

50 Base Cases

51 General Case Recurrences

52 Scoring Local Alignments

53 Finding Local Alignments

54 Notes

55 Alignment With Gap Penalties

56 Alignment with Gaps
Alignment I:
AAC-AATTAAG-ACTAC-GTTCATGAC
A-CGA-TTA-GCAC-ACTG-T-C-GA
Alignment II:
AACAATTAAGACTACGTTCATGAC---
AACAATT--------GTTCATGACGCA

57 Gap Penalties

58 Global Alignment with Affine Gap Penalties

59 Affine Gap Algorithm

60 Gaps
Both alignments have the same number of matches and spaces, but alignment II seems better.
Definition: A gap is any maximal, consecutive run of spaces in a single string. The length of the gap is the number of spaces in it.
Example I has 11 gaps while example II has only 2 gaps.
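The definition of a gap is easy to make concrete. In the sketch below the helper names are hypothetical, and the affine cost (an opening charge per gap plus an extension charge per space) is an assumed model, since the affine-penalty slides survive only as titles. The rows are those of alignment II, assuming its second row is padded so that both rows have equal length.

```python
import re


def count_gaps(row):
    """Number of maximal consecutive runs of spaces ('-') in one row."""
    return len(re.findall(r"-+", row))


def count_spaces(row):
    """Total number of spaces ('-') in one row."""
    return row.count("-")


def affine_gap_cost(rows, gap_open, gap_extend):
    """Assumed affine model: gap_open per gap plus gap_extend per space."""
    return sum(gap_open * count_gaps(r) + gap_extend * count_spaces(r)
               for r in rows)


ALIGNMENT_II = ("AACAATTAAGACTACGTTCATGAC---",
                "AACAATT--------GTTCATGACGCA")
```

Alignment II has 11 spaces grouped into only 2 gaps, so under a model that charges mainly for opening a gap it costs far less than an alignment with the same number of spaces scattered one by one.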

61 Biological motivation
Number of mutational events
o A single gap: due to a single event that removed a number of residues.
o Each separate gap: due to distinct independent events.
Protein structure
o Protein secondary structure consists of alpha helices, beta sheets and loops.
o Loops of varying size can lead to very similar structure.

62 Alignment in Real Life
o One of the major uses of alignments is to find sequences in a database.
o Such collections contain a massive number of sequences (order of $10^6$).
o Finding homologies in these databases with dynamic programming can take too long.

63 Heuristic Search
o Instead, most searches rely on heuristic procedures.
o These are not guaranteed to find the best match.
o Sometimes, they will completely miss a high-scoring match.
o We now describe the main ideas used by some of these procedures.
o Actual implementations often contain additional tricks and hacks.

64 Basic Intuition
o Almost all heuristic search procedures are based on the observation that real-life matches often contain long strings with gapless matches.
o These heuristics try to find significant gapless matches and then extend them.

CS 580: Algorithm Design and Analysis

CS 580: Algorithm Design and Analysis CS 58: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 28 Announcement: Homework 3 due February 5 th at :59PM Midterm Exam: Wed, Feb 2 (8PM-PM) @ MTHW 2 Recap: Dynamic Programming

More information

CSE 202 Dynamic Programming II

CSE 202 Dynamic Programming II CSE 202 Dynamic Programming II Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally,

More information

6.6 Sequence Alignment

6.6 Sequence Alignment 6.6 Sequence Alignment String Similarity How similar are two strings? ocurrance o c u r r a n c e - occurrence First model the problem Q. How can we measure the distance? o c c u r r e n c e 6 mismatches,

More information

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing

More information

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem

More information

Chapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications

Chapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications lgorithmic Paradigms hapter Dynamic Programming reedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into sub-problems, solve each

More information

Areas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.

Areas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,. lgorithmic Paradigms hapter Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each

More information

6. DYNAMIC PROGRAMMING II

6. DYNAMIC PROGRAMMING II 6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford algorithm distance vector protocols negative cycles in a digraph Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison

More information

Objec&ves. Review. Dynamic Programming. What is the knapsack problem? What is our solu&on? Ø Review Knapsack Ø Sequence Alignment 3/28/18

Objec&ves. Review. Dynamic Programming. What is the knapsack problem? What is our solu&on? Ø Review Knapsack Ø Sequence Alignment 3/28/18 /8/8 Objec&ves Dynamic Programming Ø Review Knapsack Ø Sequence Alignment Mar 8, 8 CSCI - Sprenkle Review What is the knapsack problem? What is our solu&on? Mar 8, 8 CSCI - Sprenkle /8/8 Dynamic Programming:

More information

Chapter 6. Dynamic Programming. CS 350: Winter 2018

Chapter 6. Dynamic Programming. CS 350: Winter 2018 Chapter 6 Dynamic Programming CS 350: Winter 2018 1 Algorithmic Paradigms Greedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 /9/ lgorithmic Paradigms hapter Dynamic Programming reed. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into two sub-problems, solve

More information

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing

More information

Lecture 2: Pairwise Alignment. CG Ron Shamir

Lecture 2: Pairwise Alignment. CG Ron Shamir Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:

More information

Dynamic Programming 1

Dynamic Programming 1 Dynamic Programming 1 lgorithmic Paradigms Divide-and-conquer. Break up a problem into two sub-problems, solve each sub-problem independently, and combine solution to sub-problems to form solution to original

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

CSE 421 Weighted Interval Scheduling, Knapsack, RNA Secondary Structure

CSE 421 Weighted Interval Scheduling, Knapsack, RNA Secondary Structure CSE Weighted Interval Scheduling, Knapsack, RNA Secondary Structure Shayan Oveis haran Weighted Interval Scheduling Interval Scheduling Job j starts at s(j) and finishes at f j and has weight w j Two jobs

More information

6. DYNAMIC PROGRAMMING I

6. DYNAMIC PROGRAMMING I 6. DYNAMIC PRORAMMIN I weighted interval scheduling segmented least squares knapsack problem RNA secondary structure Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing some

More information

RNA Secondary Structure. CSE 417 W.L. Ruzzo

RNA Secondary Structure. CSE 417 W.L. Ruzzo RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

More Dynamic Programming

More Dynamic Programming CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider

More information

More Dynamic Programming

More Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?

More information

Lecture 5: September Time Complexity Analysis of Local Alignment

Lecture 5: September Time Complexity Analysis of Local Alignment CSCI1810: Computational Molecular Biology Fall 2017 Lecture 5: September 21 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Outline DP paradigm Discrete optimisation Viterbi algorithm DP: 0 1 Knapsack. Dynamic Programming. Georgy Gimel farb

Outline DP paradigm Discrete optimisation Viterbi algorithm DP: 0 1 Knapsack. Dynamic Programming. Georgy Gimel farb Outline DP paradigm Discrete optimisation Viterbi algorithm DP: Knapsack Dynamic Programming Georgy Gimel farb (with basic contributions by Michael J. Dinneen) COMPSCI 69 Computational Science / Outline

More information

Sequence Comparison. mouse human

Sequence Comparison. mouse human Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity

More information

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a

More information

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming 20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information
