Objectives. Review. Dynamic Programming. What is the knapsack problem? What is our solution? Review Knapsack. Sequence Alignment. 3/28/18


Objectives
Dynamic Programming
- Review Knapsack
- Sequence Alignment
Mar 28, 2018 CSCI - Sprenkle

Review
What is the knapsack problem? What is our solution?

Dynamic Programming: Adding a New Variable
Def. OPT(i, w) = max profit subset of items 1, ..., i with weight limit w
- Case 1: OPT does not select item i
  - OPT selects best of {1, 2, ..., i-1} using weight limit w
- Case 2: OPT selects item i
  - New weight limit = w - w_i
  - OPT selects best of {1, 2, ..., i-1} using new weight limit, w - w_i

$$
OPT(i, w) =
\begin{cases}
0 & \text{if } i = 0 \\
OPT(i-1, w) & \text{if } w_i > w \\
\max\{\, OPT(i-1, w),\ v_i + OPT(i-1, w - w_i) \,\} & \text{otherwise}
\end{cases}
$$

Knapsack Problem: Bottom-Up
Fill up an n-by-W array

Input: W, N, w_1, ..., w_N, v_1, ..., v_N
for w = 0 to W
    M[0, w] = 0
for i = 1 to N              # for all items
    for w = 0 to W          # for all possible weights
        if w_i > w:         # item's weight is more than available
            M[i, w] = M[i-1, w]
        else
            M[i, w] = max{ M[i-1, w], v_i + M[i-1, w-w_i] }
return M[N, W]
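The bottom-up table fill can be sketched in Python as follows. This is a minimal sketch, not the course's code: the function name `knapsack`, the traceback step, and the sample instance (weight limit 11 with five items) are assumptions chosen for illustration.

```python
def knapsack(capacity, weights, values):
    """Return (best value, chosen 1-based item numbers) for 0/1 knapsack."""
    n = len(weights)
    # M[i][w] = best value using items 1..i with weight limit w.
    # Row 0 (no items) stays all zeros.
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w_i, v_i = weights[i - 1], values[i - 1]
        for w in range(capacity + 1):
            if w_i > w:
                # Item i does not fit: inherit the answer without it.
                M[i][w] = M[i - 1][w]
            else:
                # Better of skipping item i or taking it.
                M[i][w] = max(M[i - 1][w], v_i + M[i - 1][w - w_i])

    # Walk back through the table to see which items were taken.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return M[n][capacity], chosen


# Assumed sample instance (not taken from the slides).
best, items = knapsack(11, [1, 2, 5, 6, 7], [1, 6, 18, 22, 28])
print(best, items)
```

The table has (N + 1) × (W + 1) entries and each is filled in constant time, so the running time is O(NW).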

Knapsack Input
[Table: the weight limit W and, for each item, its value and weight]

Knapsack Algorithm
[Figure: the table M filled in row by row, with n + 1 rows for the item subsets φ, {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5} and W + 1 columns, one per weight limit]

Observations? Questions from last time?
What is the optimal solution?
[Figure: the completed table; the optimal value is the sum of the values of the two items in the solution]

SEQUENCE ALIGNMENT

String Similarity
How similar are two strings?
- ocurrance
- occurrence
Measurements
- Gap (-): add a letter
- Mismatch

o c u r r a n c e -
o c c u r r e n c e
6 mismatches, 1 gap

o c - u r r a n c e
o c c u r r e n c e
1 mismatch, 1 gap

o c - u r r - a n c e
o c c u r r e - n c e
0 mismatches, 3 gaps

Which is the best alignment?

Edit Distance
[Levenshtein 1966, Needleman-Wunsch 1970]
- Gap penalty: δ
- Mismatch penalty: α_pq
  - If p and q are the same, then the mismatch penalty is 0
- Cost = sum of gap and mismatch penalties
Parameters allow us to tweak cost

C T G A C C T A C C T
C C T G A C T A C A T
Cost: α_TC + α_GT + α_AG + 2α_CA

- C T G A C C T A C C T
C C T G A C - T A C A T
Cost: 2δ + α_CA
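To make the counting concrete, here is a small sketch that scores an alignment written as two equal-length strings with "-" for gaps. The function name `alignment_cost` and the penalty values used in the calls (δ = 2, every mismatch costs 1) are assumptions for illustration only.

```python
def alignment_cost(top, bottom, delta, alpha):
    """Return (mismatches, gaps, cost) for an alignment given as two
    equal-length strings in which '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    mismatches = gaps = cost = 0
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            gaps += 1
            cost += delta          # gap penalty
        elif p != q:
            mismatches += 1
            cost += alpha(p, q)    # mismatch penalty for the pair (p, q)
    return mismatches, gaps, cost


# Assumed penalties: delta = 2, every mismatch costs 1.
unit = lambda p, q: 1
print(alignment_cost("ocurrance-", "occurrence", 2, unit))
print(alignment_cost("oc-urrance", "occurrence", 2, unit))
print(alignment_cost("oc-urr-ance", "occurre-nce", 2, unit))
```

Under these assumed penalties the second alignment is cheapest; a different choice of δ and α can change which alignment wins, which is why the parameters are left tunable.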

Sequence Alignment
Goal: Given two strings X = x_1 x_2 ... x_m and Y = y_1 y_2 ... y_n, find an alignment of minimum cost.
An alignment M is a set of ordered pairs x_i-y_j such that each item occurs in at most one pair and there are no crossings.
The pairs x_i-y_j and x_i'-y_j' cross if i < i' but j > j'.
[Figure: two pairings of "occurernce" with "occurrence", one containing a crossing and one using mismatches instead]

Sequence Alignment Example
X = CTACCG
Y = TACATG
Solution: M = x_2-y_1, x_3-y_2, x_4-y_3, x_5-y_4, x_6-y_6

x_1 x_2 x_3 x_4 x_5     x_6
 C   T   A   C   C   -   G
 -   T   A   C   A   T   G
    y_1 y_2 y_3 y_4 y_5 y_6

$$
\text{cost}(M) = \underbrace{\sum_{(x_i, y_j) \in M} \alpha_{x_i y_j}}_{\text{mismatch}} + \underbrace{\sum_{i \,:\, x_i \text{ unmatched}} \delta \; + \sum_{j \,:\, y_j \text{ unmatched}} \delta}_{\text{gap}}
$$

Recall: the mismatch penalty is 0 if x_i and y_j are the same.
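The definition of an alignment as a set of index pairs can be checked mechanically. The sketch below is illustrative only: `has_crossing` and `matching_cost` are hypothetical helper names, indices are 1-based as on the slide, the pair list is one reading of the example alignment of CTACCG with TACATG, and the penalties (δ = 2, mismatch = 1) are assumed.

```python
def has_crossing(pairs):
    """True if some pairs (i, j), (i2, j2) have i < i2 but j > j2."""
    ordered = sorted(pairs)  # sort by i; each i occurs at most once
    # With distinct i values, any crossing shows up as a drop in j
    # between two neighbours in this order.
    return any(j1 > j2 for (_, j1), (_, j2) in zip(ordered, ordered[1:]))


def matching_cost(x, y, pairs, delta, alpha):
    """cost(M): mismatch penalties over matched pairs plus delta for
    every unmatched position of x and of y. Indices are 1-based."""
    matched_x = {i for i, _ in pairs}
    matched_y = {j for _, j in pairs}
    if len(matched_x) != len(pairs) or len(matched_y) != len(pairs):
        raise ValueError("each position may occur in at most one pair")
    if has_crossing(pairs):
        raise ValueError("pairs cross, so this is not an alignment")
    mismatch = sum(alpha(x[i - 1], y[j - 1]) for i, j in pairs)
    gaps = (len(x) - len(matched_x)) + (len(y) - len(matched_y))
    return mismatch + delta * gaps


# Assumed penalties: 0 for equal letters, 1 for any mismatch, delta = 2.
alpha = lambda p, q: 0 if p == q else 1
M = [(2, 1), (3, 2), (4, 3), (5, 4), (6, 6)]
print(matching_cost("CTACCG", "TACATG", M, 2, alpha))
```

Here x_1 and y_5 are unmatched (two gaps) and x_5-y_4 pairs C with A (one mismatch), so the cost is α_CA + 2δ.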

Sequence Alignment Case Analysis
Consider the last character of the strings X and Y: x_M and y_N
- M and N are not necessarily equal, i.e., the strings are not necessarily the same length
What are the possibilities for x_M and y_N in terms of the alignment?
- Case 1: x_M and y_N are aligned
- Case 2: x_M is not matched
- Case 3: y_N is not matched
Formulate the optimal solution's value

OPT(i, j) = min cost of aligning strings x_1 x_2 ... x_i and y_1 y_2 ... y_j
What are the costs for these cases?

Sequence Alignment Cost Analysis
Consider the last character of strings X and Y: x_M and y_N
- Case 1: x_M and y_N are aligned
  - Pay mismatch for x_M-y_N + min cost of aligning rest of strings
  - OPT(M, N) = α_{x_M y_N} + OPT(M-1, N-1)
- Case 2: x_M is not matched
  - Pay gap for x_M + min cost of aligning rest of strings
  - OPT(M, N) = δ + OPT(M-1, N)
- Case 3: y_N is not matched
  - Pay gap for y_N + min cost of aligning rest of strings
  - OPT(M, N) = δ + OPT(M, N-1)

Sequence Alignment Cost Analysis
Base costs? → i or j is 0
- What happens when we run out of letters in one string before the other?
X = CTACCG
Y = TACTG

Sequence Alignment: Problem Structure

$$
OPT(i, j) =
\begin{cases}
j\delta & \text{if } i = 0 \\[4pt]
\min
\begin{cases}
\alpha_{x_i y_j} + OPT(i-1, j-1) \\
\delta + OPT(i-1, j) \\
\delta + OPT(i, j-1)
\end{cases}
& \text{otherwise} \\[4pt]
i\delta & \text{if } j = 0
\end{cases}
$$

- i = 0: ran out of 1st string; gaps for remainder of Y
- j = 0: ran out of 2nd string; gaps for remainder of X
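The recurrence translates almost line for line into a memoized recursive function. This is a sketch under assumptions: the name `opt_cost` is hypothetical, and the penalties in the example call (δ = 2, mismatch = 1) are chosen only to have something to run.

```python
from functools import lru_cache


def opt_cost(x, y, delta, alpha):
    """Minimum alignment cost of x and y via the OPT(i, j) recurrence."""

    @lru_cache(maxsize=None)
    def opt(i, j):
        if i == 0:
            return j * delta   # ran out of x: gaps for the rest of y
        if j == 0:
            return i * delta   # ran out of y: gaps for the rest of x
        return min(
            alpha(x[i - 1], y[j - 1]) + opt(i - 1, j - 1),  # x_i aligned with y_j
            delta + opt(i - 1, j),                          # x_i not matched
            delta + opt(i, j - 1),                          # y_j not matched
        )

    return opt(len(x), len(y))


# Assumed penalties: 0 for equal letters, 1 for any mismatch, delta = 2.
alpha = lambda p, q: 0 if p == q else 1
print(opt_cost("CTACCG", "TACTG", 2, alpha))
```

Memoization keeps the number of distinct subproblems at (m + 1)(n + 1); for long strings the recursion depth becomes a practical limit in Python, which is one reason to prefer a table filled iteratively.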

Sequence Alignment: Algorithm

Sequence-Alignment(m, n, x_1 x_2...x_m, y_1 y_2...y_n, δ, α)    # δ, α: cost parameters
    for i = 0 to m
        M[i, 0] = iδ
    for j = 0 to n
        M[0, j] = jδ
    for i = 1 to m
        for j = 1 to n
            M[i, j] = min(α[x_i, y_j] + M[i-1, j-1],
                          δ + M[i-1, j],
                          δ + M[i, j-1])
    return M[m, n]

Example
X = bait
Y = boot
Penalties: one value of α for a vowel mismatch, another value of α for other mismatches, and a gap penalty δ
[Figure: the empty table M, with axes i and j labeled by the letters of the two strings]

[Figure: the table M for X = bait and Y = boot, filled in step by step]

Example
What is the value for the problem? What is the solution?
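Both questions can be answered from the full table: the value is the final entry, and the solution comes from walking back through the table. The sketch below is illustrative, not the slides' worked example: the penalty values (1 for a vowel-vowel mismatch, 3 for any other mismatch, δ = 2) and the names `alpha` and `align` are assumptions.

```python
VOWELS = set("aeiou")


def alpha(p, q):
    """Assumed mismatch penalty: 0 if equal, 1 vowel-vowel, 3 otherwise."""
    if p == q:
        return 0
    if p in VOWELS and q in VOWELS:
        return 1
    return 3


def align(x, y, delta=2, alpha=alpha):
    """Return (min cost, aligned x, aligned y) using the full table."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta
    for j in range(n + 1):
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])

    # Traceback: at each cell, follow a case that produced its value.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1]:
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and M[i][j] == delta + M[i - 1][j]:
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return M[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(align("bait", "boot"))
```

When several cases tie, this traceback prefers the diagonal move, so it returns one optimal alignment among possibly many.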

Sequence Alignment: Analysis

Sequence-Alignment(m, n, x_1 x_2...x_m, y_1 y_2...y_n, δ, α)
    for i = 0 to m
        M[i, 0] = iδ
    for j = 0 to n
        M[0, j] = jδ
    for i = 1 to m
        for j = 1 to n
            M[i, j] = min(α[x_i, y_j] + M[i-1, j-1],
                          δ + M[i-1, j],
                          δ + M[i, j-1])
    return M[m, n]

Costs?
- Running time: O(mn), from the nested loops over i and j
What are the space costs? When computing M[i, j], which entries in M are used?

Sequence Alignment: Analysis
Space cost: O(mn)
Observation: to calculate the current value, we only need the row above us and the entry to the left

SEQUENCE ALIGNMENT IN LINEAR SPACE

Sequence Alignment: O(m) Space
Collapse into an m x 2 array
- M[i, 0] represents previous row; M[i, 1] represents current row

Space-Efficient-Alignment(m, n, x_1 x_2...x_m, y_1 y_2...y_n, δ, α)
    for i = 0 to m              # initialize first row
        M[i, 0] = iδ
    for j = 1 to n
        M[0, 1] = jδ            # first gap
        for i = 1 to m
            M[i, 1] = min(α[x_i, y_j] + M[i-1, 0],
                          δ + M[i, 0],
                          δ + M[i-1, 1])
        for i = 0 to m          # copy current row into previous
            M[i, 0] = M[i, 1]
    return M[m, 1]

Any drawbacks?
- Finds the optimal value but will not be able to find the alignment
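A sketch of the same idea in Python, using two lists of length m + 1 in place of the two array columns; rebinding `prev` to `cur` plays the role of the copy loop. The function name and the penalties in the example call are assumptions for illustration.

```python
def alignment_value_linear_space(x, y, delta, alpha):
    """Minimum alignment cost of x and y using O(m) extra space.
    Only the value is returned; the alignment itself is not recoverable."""
    m, n = len(x), len(y)
    prev = [i * delta for i in range(m + 1)]      # entries M[i, 0]: iδ
    for j in range(1, n + 1):
        cur = [j * delta] + [0] * m               # cur[0] = jδ (first gap)
        for i in range(1, m + 1):
            cur[i] = min(alpha(x[i - 1], y[j - 1]) + prev[i - 1],
                         delta + prev[i],
                         delta + cur[i - 1])
        prev = cur                                # current becomes previous
    return prev[m]


# Assumed penalties: 0 for equal letters, 1 for any mismatch, delta = 2.
unit_mismatch = lambda p, q: 0 if p == q else 1
print(alignment_value_linear_space("CTACCG", "TACTG", 2, unit_mismatch))
```

Because earlier entries are discarded as soon as they are no longer needed, there is nothing left to trace back through, which is the drawback noted above.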

Why Do We Care About Space?
For English words or sentences, it probably doesn't matter
It matters for biological sequence alignment
- Consider: 2 strings with 100,000 symbols each
  - A processor can do 10 billion primitive operations
  - BUT dealing with a 10 GB array

Sequence Alignment: Linear Space
Can we avoid using quadratic space?
- Optimal value in O(m) space and O(mn) time
  - Compute OPT(i, ·) from OPT(i-1, ·)
  - BUT, no simple way to recover the alignment itself
Theorem. [Hirschberg 1975] Optimal alignment in O(m + n) space and O(mn) time.
- Clever combination of divide-and-conquer and dynamic programming
- Section .

Looking Ahead
PS8