Algorithm Design and Analysis


LECTURE 18: Dynamic Programming
Segmented least squares (recap)
Longest common subsequence
Adam Smith

Segmented Least Squares

Least squares. Foundational problem in statistics and numerical analysis. Given n points in the plane (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), find a line y = ax + b that minimizes the sum of the squared errors:

SSE = Σ_{i=1}^{n} (y_i − a·x_i − b)²

Solution. Calculus: the minimum error is achieved when

a = ( n Σ_i x_i y_i − (Σ_i x_i)(Σ_i y_i) ) / ( n Σ_i x_i² − (Σ_i x_i)² ),   b = ( Σ_i y_i − a Σ_i x_i ) / n

This takes O(n) time. [Figure: n points in the plane with the best-fit line.]
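As a concrete illustration of this closed-form solution, here is a minimal Python sketch; the function name least_squares_fit and the point format are our own choices, not from the slides.

def least_squares_fit(points):
    # Fit y = a*x + b minimizing the sum of squared errors; O(n) time.
    # Assumes not all x_i are equal (otherwise the denominator is zero).
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

For example, least_squares_fit([(0, 1), (1, 3), (2, 5)]) returns (2.0, 1.0), the exact line through those points.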

Segmented Least Squares

Segmented least squares. Points lie roughly on a sequence of several line segments. Given n points in the plane (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) with x_1 < x_2 < ... < x_n, find a sequence of lines that minimizes:
E = sum of the sums of the squared errors in all segments
L = number of lines
Tradeoff via the objective function E + c·L, for some constant c > 0.
Note: discontinuous functions are permitted!
[Figure: points lying near several line segments.]

Dynamic Programming: Multiway Choice

Notation.
OPT(j) = minimum cost for points p_1, p_2, ..., p_j.
e(i, j) = minimum sum of squares for points p_i, p_{i+1}, ..., p_j.

To compute OPT(j): the last segment uses points p_i, p_{i+1}, ..., p_j for some i, at cost e(i, j) + c + OPT(i−1). Hence:

OPT(j) = 0 if j = 0,
OPT(j) = min_{1 ≤ i ≤ j} { e(i, j) + c + OPT(i−1) } otherwise.

Segmented Least Squares: Algorithm

INPUT: n, p_1, ..., p_n, c

Segmented-Least-Squares() {
    M[0] = 0
    for j = 1 to n
        for i = 1 to j
            compute the least squares error e(i, j) for the segment p_i, ..., p_j
    for j = 1 to n
        M[j] = min_{1 ≤ i ≤ j} ( e(i, j) + c + M[i−1] )
    return M[n]
}

Running time. O(n³); can be improved to O(n²) by pre-computing various statistics. The bottleneck is computing e(i, j) for O(n²) pairs, O(n) per pair using the previous formula.
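A runnable Python version of this algorithm as we read it (an illustrative sketch; the helper name segment_error and 0-indexed points are our conventions, not from the slides):

def segment_error(points, i, j):
    # Minimum SSE for fitting one line to points[i..j] (inclusive),
    # using the closed-form fit from the earlier slide.
    n = j - i + 1
    xs = [points[k][0] for k in range(i, j + 1)]
    ys = [points[k][1] for k in range(i, j + 1)]
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))

def segmented_least_squares(points, c):
    # M[j] = minimum cost of the first j points.
    n = len(points)
    e = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            e[i][j] = segment_error(points, i, j)
    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(e[i - 1][j - 1] + c + M[i - 1] for i in range(1, j + 1))
    return M[n]

As in the slide's analysis, this runs in O(n³) time: computing e(i, j) naively costs O(n) for each of the O(n²) pairs.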

Longest Common Subsequence
A.k.a. sequence alignment, edit distance.

Longest Common Subsequence (LCS)

Given two sequences x[1..m] and y[1..n], find a longest subsequence common to them both ("a", not "the": an LCS need not be unique).

x: A B C B D A B
y: B D C A B A

BCBA = LCS(x, y)

Longest Common Subsequence (LCS)

Given two sequences x[1..m] and y[1..n], find a longest subsequence common to them both ("a", not "the"):

x: A B C B D A B
y: B D C A B A

BCAB = LCS(x, y)
BCBA = LCS(x, y)

Motivation

Approximate string matching [Levenshtein, 1966]: search for "occurance", get results for "occurrence".

Computational biology [Needleman-Wunsch, 1970s]: a simple measure of genome similarity.

cgtacgtacgtacgtacgtacgtatcgtacgt
acgtacgtacgtacgtacgtacgtacgtacgt

Motivation

Approximate string matching [Levenshtein, 1966]: search for "occurance", get results for "occurrence".

Computational biology [Needleman-Wunsch, 1970s]: a simple measure of genome similarity.

acgtacgtacgtacgtcgtacgtatcgtacgt
aacgtacgtacgtacgtcgtacgta cgtacgt

n − length(LCS(x, y)) is called the edit distance.

Brute-force LCS algorithm

Check every subsequence of x[1..m] to see if it is also a subsequence of y[1..n].

Analysis:
Checking takes O(n) time per subsequence.
There are 2^m subsequences of x (each bit vector of length m determines a distinct subsequence of x).
Worst-case running time = O(n·2^m): exponential time.
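For concreteness, the brute-force idea can be written out directly; this exponential-time Python sketch is ours, purely for illustration:

from itertools import combinations

def is_subsequence(s, y):
    # Greedy O(n) check that s occurs in y as a subsequence.
    it = iter(y)
    return all(ch in it for ch in s)

def lcs_brute_force(x, y):
    # Try subsequences of x from longest to shortest: O(n * 2^m) worst case.
    for k in range(len(x), -1, -1):
        for idxs in combinations(range(len(x)), k):
            s = ''.join(x[i] for i in idxs)
            if is_subsequence(s, y):
                return s
    return ''

On the running example, lcs_brute_force("ABCBDAB", "BDCABA") returns a length-4 subsequence; which of the tied answers you get depends on enumeration order.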

Dynamic programming algorithm

Simplification:
1. Look at the length of a longest common subsequence.
2. Extend the algorithm to find the LCS itself.

Notation: denote the length of a sequence s by |s|.

Strategy: consider prefixes of x and y.
Define c[i, j] = |LCS(x[1..i], y[1..j])|.
Then c[m, n] = |LCS(x, y)|.

Recursive formulation

c[i, j] = 0 if i = 0 or j = 0 (base case),
c[i, j] = c[i−1, j−1] + 1 if x[i] = y[j],
c[i, j] = max{ c[i−1, j], c[i, j−1] } otherwise.

Case x[i] = y[j]: a longest common subsequence of x[1..i] and y[1..j] can be assumed to end with the matched symbol x[i] = y[j]; removing it leaves an LCS of x[1..i−1] and y[1..j−1]. The second case is similar.

[Figure: the prefixes x[1..i] and y[1..j] with the matching symbols x[i] = y[j] aligned.]

Dynamic-programming hallmark #1

Optimal substructure: an optimal solution to a problem (instance) contains optimal solutions to subproblems.

If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.

Recursive algorithm for LCS

LCS(x, y, i, j)                       // ignoring base cases
    if x[i] = y[j]
        then c[i, j] ← LCS(x, y, i−1, j−1) + 1
        else c[i, j] ← max{ LCS(x, y, i−1, j), LCS(x, y, i, j−1) }
    return c[i, j]

Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree

m = 7, n = 6:

                      (7,6)
             (6,6)             (7,5)
         (5,6)   (6,5)     (6,5)   (7,4)
    (4,6)(5,5) (5,5)(6,4) (5,5)(6,4) (6,4)(7,3)

Height = m + n ⇒ work potentially exponential.

Recursion tree

m = 7, n = 6:

                      (7,6)
             (6,6)             (7,5)
         (5,6)   (6,5)     (6,5)   (7,4)
    (4,6)(5,5) (5,5)(6,4) (5,5)(6,4) (6,4)(7,3)

The two (6,5) nodes are the same subproblem.

Height = m + n ⇒ work potentially exponential, but we're solving subproblems already solved!

Dynamic-programming hallmark #2

Overlapping subproblems: a recursive solution contains a "small" number of distinct subproblems repeated many times.

The number of distinct LCS subproblems for two strings of lengths m and n is only m·n.

Memoization algorithm

Memoization: after computing a solution to a subproblem, store it in a table. Subsequent calls check the table to avoid redoing work. The recursion is the same as before; only the table lookup is new.

LCS(x, y, i, j)
    if c[i, j] = NIL
        then if x[i] = y[j]
                then c[i, j] ← LCS(x, y, i−1, j−1) + 1
                else c[i, j] ← max{ LCS(x, y, i−1, j), LCS(x, y, i, j−1) }
    return c[i, j]

Time = Θ(m·n): constant work per table entry. Space = Θ(m·n).
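In Python, the memoized recursion can be written with a cache (an illustrative sketch; lcs_length is our name, and functools.lru_cache plays the role of the table c):

from functools import lru_cache

def lcs_length(x, y):
    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j) = length of an LCS of x[:i] and y[:j]; base case 0.
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    return c(len(x), len(y))

Note that Python's recursion-depth limit makes this impractical for long strings; the bottom-up table on the next slides avoids that.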

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.

      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.
Time = Θ(m·n).

      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4
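A bottom-up Python version matching this table-filling order (illustrative; our indexing puts x along the rows rather than the columns shown above):

def lcs_table(x, y):
    # Fill c[i][j] = LCS length of x[:i] and y[:j], row by row: Theta(m*n) time.
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c

For the running example, lcs_table("ABCBDAB", "BDCABA")[7][6] evaluates to 4, matching the corner entry of the table above.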

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.
Time = Θ(m·n).
Reconstruct an LCS by tracing backwards.

      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4
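One way to code the traceback (an illustrative sketch; it reuses lcs_table from the previous sketch):

def lcs_traceback(x, y):
    # Recover one LCS by walking back through the filled table from c[m][n].
    c = lcs_table(x, y)
    i, j, out = len(x), len(y), []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])      # matched symbol belongs to the LCS
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1                    # step toward the subproblem with the larger value
        else:
            j -= 1
    return ''.join(reversed(out))

On the running example this returns one of the length-4 answers; which one depends on how ties between c[i−1][j] and c[i][j−1] are broken.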

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.
Time = Θ(m·n).
Reconstruct an LCS by tracing backwards. Multiple solutions are possible.

      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4

Dynamic-programming algorithm

IDEA: Compute the table bottom-up.
Time = Θ(m·n).
Reconstruct an LCS by tracing backwards.
Space = Θ(m·n). (KT book: O(min{m, n}) space suffices to compute the length.)

      A  B  C  B  D  A  B
   0  0  0  0  0  0  0  0
B  0  0  1  1  1  1  1  1
D  0  0  1  1  1  2  2  2
C  0  0  1  2  2  2  2  2
A  0  1  1  2  2  2  3  3
B  0  1  2  2  3  3  3  4
A  0  1  2  2  3  3  4  4
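A sketch of the O(min{m, n})-space computation of the length alone (our code, not from the slides; recovering the subsequence itself in this much space needs Hirschberg's divide-and-conquer refinement, not shown here):

def lcs_length_small_space(x, y):
    # Keep only two rows of the table, indexed by the shorter string.
    if len(y) > len(x):
        x, y = y, x                   # ensure y is the shorter string
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(y)]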