Algorithm Design and Analysis - PDF Free Download

Algorithm Design and Analysis LECTURE 18 Dynamic Programming (Segmented LS recap) Longest Common Subsequence Adam Smith

Segmented Least Squares Least squares. Foundational problem in statistic and numerical analysis. Given n points in the plane: (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ). Find a line y = ax + b that minimizes the sum of the squared error: n SSE = ( y i ax i b) 2 i=1 y x Solution. Calculus min error is achieved when a = n x i i y i ( i x i ) ( i y i ), b = n x 2 i ( x i ) 2 i i i y i a i x i n O(n) time

Segmented Least Squares Segmented least squares. Points lie roughly on a sequence of several line segments. Given n points in the plane (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ) with x 1 < x 2 <... < x n, find a sequence of lines that minimizes: E = sum of the sums of the squared errors in all segments L = number of lines Tradeoff via objective function: E + c L, for some constant c > 0. Note: discontinuous functions permitted! y x

Dynamic Programming: Multiway Choice Notation. OPT(j) = minimum cost for points p 1, p i+1,..., p j. e(i, j) = minimum sum of squares for points p i, p i+1,..., p j. To compute OPT(j): Last segment uses points p i, p i+1,..., p j for some i. Cost = e(i, j) + c + OPT(i-1). 0 if j = 0 OPT( j) = min 1 i j { e(i, j) + c + OPT(i 1) } otherwise

Segmented Least Squares: Algorithm INPUT: n, p 1,,p N, c Segmented-Least-Squares() { M[0] = 0 for j = 1 to n for i = 1 to j compute the least square error e ij for the segment p i,, p j for j = 1 to n M[j] = min 1 i j (e ij + c + M[i-1]) } return M[n] Running time. O(n 3 ). can be improved to O(n 2 ) by pre-computing various statistics Bottleneck = computing e(i, j) for O(n 2 ) pairs, O(n) per pair using previous formula.

Least Common Subsequence A.k.a. sequence alignment edit distance

Longest Common Subsequence (LCS) Given two sequences x[1.. m] and y[1.. n], find a longest subsequence common to them both. a not the x: A B C B D A B y: B D C A B A BCBA = LCS(x, y)

Longest Common Subsequence (LCS) Given two sequences x[1.. m] and y[1.. n], find a longest subsequence common to them both. a not the x: A B C B D A B y: B D C A B A BCAB BCBA = LCS(x, y)

Motivation Approximate string matching [Levenshtein, 1966] Search for occurance, get results for occurrence Computational biology [Needleman-Wunsch, 1970 s] Simple measure of genome similarity cgtacgtacgtacgtacgtacgtatcgtacgt acgtacgtacgtacgtacgtacgtacgtacgt

Motivation Approximate string matching [Levenshtein, 1966] Search for occurance, get results for occurrence Computational biology [Needleman-Wunsch, 1970 s] Simple measure of genome similarity acgtacgtacgtacgtcgtacgtatcgtacgt aacgtacgtacgtacgtcgtacgta cgtacgt n length(lcs(x,y)) is called the edit distance

Brute-force LCS algorithm Check every subsequence of x[1.. m] to see if it is also a subsequence of y[1.. n]. Analysis Checking = O(n) time per subsequence. 2 m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Worst-case running time = O(n2 m ) = exponential time.

Dynamic programming algorithm Simplification: 1. Look at the length of a longest-common subsequence. 2. Extend the algorithm to find the LCS itself. Notation: Denote the length of a sequence s by s. Strategy: Consider prefixes of x and y. Define c[i, j] = LCS(x[1.. i], y[1.. j]). Then, c[m, n] = LCS(x, y).

Recursive formulation c[i 1, j 1] + 1 if x[i] = y[j], c[i, j] = max{c[i 1, j], c[i, j 1]} otherwise. Base case: c[i,j]=0 if i=0 or j=0. Case x[i] = y[ j]: 1 2 i m x: 1 2 = j n y: The second case is similar.

Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.

Recursive algorithm for LCS LCS(x, y, i, j) // ignoring base cases if x[i] = y[ j] then c[i, j] LCS(x, y, i 1, j 1) + 1 else c[i, j] max{lcs(x, y, i 1, j), LCS(x, y, i, j 1)} return c[i, j] Worse case: x[i] y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree m = 7, n = 6: 7,6 6,6 7,5 5,6 6,5 6,5 7,4 m+n 4,6 5,5 5,5 6,4 5,5 6,4 6,4 7,3 Height = m + n work potentially exponential.

Recursion tree m = 7, n = 6: 7,6 5,6 same subproblem 6,6 7,5 6,5 6,5 7,4 m+n 4,6 5,5 5,5 6,4 5,5 6,4 6,4 7,3 Height = m + n work potentially exponential., but we re solving subproblems already solved!

Dynamic-programming hallmark #2 Overlapping subproblems A recursive solution contains a small number of distinct subproblems repeated many times. The number of distinct LCS subproblems for two strings of lengths m and n is only m n.

Memoization algorithm Memoization: After computing a solution to a subproblem, store it in a table. Subsequent calls check the table to avoid redoing work. LCS(x, y, i, j) if c[i, j] = NIL then if x[i] = y[j] then c[i, j] LCS(x, y, i 1, j 1) + 1 else c[i, j] max{ LCS(x, y, i 1, j), LCS(x, y, i, j 1)} Time = (m n) = constant work per table entry. Space = (m n). same as before

Dynamic-programming algorithm IDEA: Compute the table bottom-up. B A B C B D A B 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 D 0 0 1 1 1 2 2 2 C 0 0 1 2 2 2 2 2 A 0 1 1 2 2 2 3 3 B 0 1 2 2 3 3 3 4 A 0 1 2 2 3 3 4 4

Dynamic-programming algorithm IDEA: Compute the table bottom-up. Time = (m n). B A B C B D A B 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 D 0 0 1 1 1 2 2 2 C 0 0 1 2 2 2 2 2 A 0 1 1 2 2 2 3 3 B 0 1 2 2 3 3 3 4 A 0 1 2 2 3 3 4 4

Dynamic-programming algorithm IDEA: Compute the table bottom-up. Time = (m n). Reconstruct LCS by tracing backwards. A B C B D A B 0 0 0 0 0 0 0 0 B 0 0 1 1 1 1 1 1 D 0 0 1 1 1 2 2 2 C 0 0 1 2 2 2 2 2 A 0 1 1 2 2 2 3 3 B 0 1 2 2 3 3 3 4 A 0 1 2 2 3 3 4 4

Dynamic-programming algorithm IDEA: Compute the table bottom-up. Time = (m n). Reconstruct LCS by tracing backwards. Multiple solutions are possible. B A B C B D A B 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 D 0 0 1 1 1 2 2 2 C 0 0 1 2 2 2 2 2 A 0 1 1 2 2 2 3 3 B 0 1 2 2 3 3 3 4 A 0 1 2 2 3 3 4 4

Dynamic-programming algorithm IDEA: Compute the table bottom-up. Time = (m n). Reconstruct LCS by tracing backwards. Space = (m n). KT book: O(min{m, n}) A B C B D A B 0 0 0 0 0 0 0 0 B 0 0 1 1 1 1 1 1 D 0 0 1 1 1 2 2 2 C 0 0 1 2 2 2 2 2 A 0 1 1 2 2 2 3 3 B 0 1 2 2 3 3 3 4 A 0 1 2 2 3 3 4 4