Algorithmic Paradigms

Greedy. Build up a solution incrementally, myopically optimizing some local criterion.
Divide-and-conquer. Break up a problem into two sub-problems, solve each sub-problem independently, and combine solutions to the sub-problems to form a solution to the original problem.
Dynamic programming. Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems.

Algorithm Design by Éva Tardos and Jon Kleinberg. Portions copyright 2005 Addison Wesley. Slides by Kevin Wayne and Neil Rhodes.

Dynamic Programming Steps

Characterize the structure of an optimal solution.
Define a recurrence for the value of an optimal solution.
Compute the value of an optimal solution, either bottom-up or top-down with memoization.
Construct the optimal solution from the optimal value and intermediate information.
Weighted Interval Scheduling

Weighted Interval Scheduling

Weighted interval scheduling problem.
Job j starts at s_j, finishes at f_j, and has weight or value v_j.
Two jobs are compatible if they don't overlap.
Goal: find a maximum-weight subset of mutually compatible jobs.

Unweighted Interval Scheduling: Review

Greedy algorithm works if all weights are 1.
Consider jobs in ascending order of finish time.
Add a job to the subset if it is compatible with the previously chosen jobs.

Observation. The greedy algorithm can fail spectacularly if arbitrary weights are allowed.
[Figure: a job with weight 1 finishes first and is chosen by the greedy algorithm, blocking an overlapping job with weight 999.]

Weighted Interval Scheduling

Notation. Label jobs by finish time: f_1 <= f_2 <= ... <= f_n.
Def. p(j) = largest index i < j such that job i is compatible with j.
Ex. p(8) = 5, p(7) = 3, p(2) = 0.

Dynamic Programming: Binary Choice

Notation. OPT(j) = value of an optimal solution to the problem consisting of job requests 1, 2, ..., j.

Case 1: OPT selects job j.
- Can't use the incompatible jobs {p(j)+1, p(j)+2, ..., j-1}.
- Must include an optimal solution to the problem consisting of the remaining compatible jobs 1, 2, ..., p(j).  [optimal substructure]

Case 2: OPT does not select job j.
- Must include an optimal solution to the problem consisting of the remaining compatible jobs 1, 2, ..., j-1.

OPT(j) = 0                                     if j = 0
         max{ v_j + OPT(p(j)), OPT(j-1) }      otherwise
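Once jobs are sorted by finish time, p(j) can be found by binary search over the finish times. A minimal sketch (function and variable names are my own, not from the slides):

```python
import bisect

def compute_p(starts, finishes):
    """Jobs are 1-indexed and sorted by finish time; job i is compatible
    with job j when f_i <= s_j.  p[j] = largest such i, or 0 if none."""
    n = len(starts)
    p = [0] * (n + 1)  # p[0] is unused padding so indices match the slides
    for j in range(1, n + 1):
        # number of jobs finishing no later than s_j = largest compatible index
        p[j] = bisect.bisect_right(finishes, starts[j - 1])
    return p
```

For jobs with (s, f) = (0,3), (1,4), (2,5), (3,6), this returns p = [0, 0, 0, 0, 1]: only job 1 (finishing at time 3) is compatible with job 4 (starting at time 3).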
Weighted Interval Scheduling: Brute Force

Brute force algorithm.

Input: n, s_1,...,s_n, f_1,...,f_n, v_1,...,v_n

Sort jobs by finish times so that f_1 <= f_2 <= ... <= f_n.
Compute p(1), p(2), ..., p(n).

Compute-Opt(j) {
   if (j = 0)
      return 0
   else
      return max(v_j + Compute-Opt(p(j)), Compute-Opt(j-1))
}

Observation. The recursive algorithm fails spectacularly because of redundant sub-problems => exponential algorithm.

Ex. The number of recursive calls for the family of "layered" instances with p(1) = 0, p(j) = j-2 grows like the Fibonacci sequence.
[Figure: recursion tree for a 5-job layered instance.]

Weighted Interval Scheduling: Memoization

Memoization. Store the result of each sub-problem in a cache; look it up as needed.

Input: n, s_1,...,s_n, f_1,...,f_n, v_1,...,v_n

Sort jobs by finish times so that f_1 <= f_2 <= ... <= f_n.
Compute p(1), p(2), ..., p(n).

for j = 1 to n
   M[j] = empty      <- global array
M[0] = 0

M-Compute-Opt(j) {
   if (M[j] is empty)
      M[j] = max(v_j + M-Compute-Opt(p(j)), M-Compute-Opt(j-1))
   return M[j]
}

Weighted Interval Scheduling: Running Time

Claim. The memoized version of the algorithm takes O(n log n) time.
- Sort by finish time: O(n log n).
- Computing p(.): O(n) after sorting by start time.
- M-Compute-Opt(j): each invocation takes O(1) time and either
  (i) returns an existing value M[j], or
  (ii) fills in one new entry M[j] and makes two recursive calls.
- Progress measure Φ = # nonempty entries of M[].
  Initially Φ = 0; throughout Φ <= n.
  (ii) increases Φ by 1 => at most 2n recursive calls.
- Overall running time of M-Compute-Opt(n) is O(n) after sorting.

Remark. O(n) if jobs are pre-sorted by start and finish times.
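The memoized recursion can be sketched in Python with functools.lru_cache standing in for the explicit array M (the function name and the use of lru_cache are my choices, not the slides'):

```python
from functools import lru_cache
import bisect

def max_weight(starts, finishes, values):
    """Weighted interval scheduling by memoized recursion.
    Jobs must already be sorted by finish time."""
    n = len(values)
    # p[j] = last job compatible with job j (0 if none), as in the slides
    p = [0] + [bisect.bisect_right(finishes, s) for s in starts]

    @lru_cache(maxsize=None)  # cache plays the role of M[]
    def opt(j):
        if j == 0:
            return 0
        # binary choice: take job j (v_j + OPT(p(j))) or skip it (OPT(j-1))
        return max(values[j - 1] + opt(p[j]), opt(j - 1))

    return opt(n)
```

With jobs (s, f, v) = (0,3,2), (1,4,4), (2,5,4), (3,6,7), the optimum is 9: jobs 1 and 4 together beat any single job.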
Automated Memoization

Automated memoization. Many functional programming languages (e.g., Lisp) have library or built-in support for memoization.

Lisp (efficient):
(def-memo-fun F (n)
  (if (<= n 1)
      n
      (+ (F (- n 1)) (F (- n 2)))))

Java (exponential):
static int F(int n) {
   if (n <= 1) return n;
   else return F(n-1) + F(n-2);
}

[Figure: recursion tree for F(40), with the repeated sub-calls F(39), F(38), F(37), ... showing the exponential blowup.]

Weighted Interval Scheduling: Bottom-Up

Bottom-up dynamic programming. Unwind the recursion.

Input: n, s_1,...,s_n, f_1,...,f_n, v_1,...,v_n

Sort jobs by finish times so that f_1 <= f_2 <= ... <= f_n.
Compute p(1), p(2), ..., p(n).

Iterative-Compute-Opt {
   M[0] = 0
   for j = 1 to n
      M[j] = max(v_j + M[p(j)], M[j-1])
}

Weighted Interval Scheduling: Finding a Solution

Q. The dynamic programming algorithm computes the optimal value. What if we want the solution itself?
A. Do some post-processing.

Run M-Compute-Opt(n)
Run Find-Solution(n)

Find-Solution(j) {
   if (j = 0)
      output nothing
   else if (v_j + M[p(j)] > M[j-1])
      print j
      Find-Solution(p(j))
   else
      Find-Solution(j-1)
}

# of recursive calls <= n => O(n).

Knapsack Problem
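A bottom-up version with Find-Solution-style post-processing might look like this (a sketch; names and the job data below are illustrative):

```python
import bisect

def schedule(starts, finishes, values):
    """Return (optimal value, chosen job indices, 1-indexed).
    Jobs must already be sorted by finish time."""
    n = len(values)
    p = [0] + [bisect.bisect_right(finishes, s) for s in starts]

    # bottom-up table, exactly Iterative-Compute-Opt
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(values[j - 1] + M[p[j]], M[j - 1])

    # post-processing: walk the choices backwards, as in Find-Solution
    chosen, j = [], n
    while j > 0:
        if values[j - 1] + M[p[j]] > M[j - 1]:
            chosen.append(j)   # job j is in the optimal solution
            j = p[j]
        else:
            j -= 1
    return M[n], sorted(chosen)
```

On jobs (s, f, v) = (0,3,2), (1,4,4), (2,5,4), (3,6,7) this returns (9, [1, 4]).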
Knapsack Problem

Knapsack problem.
Given n objects and a "knapsack."
Item i weighs w_i > 0 kilograms and has value v_i > 0.
Knapsack has a capacity of W kilograms.
Goal: choose a subset of the items to fill the knapsack so as to maximize the total value.

Ex. (W = 11)

Item  Value  Weight
 1      1      1
 2      6      2
 3     18      5
 4     22      6
 5     28      7

Ex. {3, 4} has value 40.

Greedy: repeatedly add the item with maximum ratio v_i / w_i.
Ex. {5, 2, 1} achieves only value = 35 => greedy is not optimal.

Dynamic Programming: False Start

Def. OPT(i) = max-profit subset of items 1, ..., i.

Case 1: OPT does not select item i.
- OPT selects the best of {1, 2, ..., i-1}.

Case 2: OPT selects item i.
- Accepting item i does not immediately imply that we will have to reject other items.
- Without knowing what other items were selected before i, we don't even know if we have enough room for i.

Conclusion. Need more sub-problems.

Dynamic Programming: Adding a New Variable

Def. OPT(i, w) = max-profit subset of items 1, ..., i with weight limit w.

Case 1: OPT does not select item i.
- OPT selects the best of {1, 2, ..., i-1} using weight limit w.

Case 2: OPT selects item i.
- New weight limit = w - w_i.
- OPT selects the best of {1, 2, ..., i-1} using this new weight limit.

OPT(i, w) = 0                                                  if i = 0
            OPT(i-1, w)                                        if w_i > w
            max{ OPT(i-1, w), v_i + OPT(i-1, w - w_i) }        otherwise

Knapsack Problem: Bottom-Up

Knapsack. Fill up an n-by-W array.

Input: n, W, w_1,...,w_n, v_1,...,v_n

for w = 0 to W
   M[0, w] = 0

for i = 1 to n
   for w = 0 to W
      if (w_i > w)
         M[i, w] = M[i-1, w]
      else
         M[i, w] = max {M[i-1, w], v_i + M[i-1, w - w_i]}

return M[n, W]
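The bottom-up pseudocode transcribes directly to Python (a sketch; the function name is mine). Running it on the slide's instance, values (1, 6, 18, 22, 28), weights (1, 2, 5, 6, 7), W = 11, yields the optimum 40 = v_3 + v_4:

```python
def knapsack(values, weights, W):
    """Bottom-up knapsack: M[i][w] = max value using items 1..i, limit w."""
    n = len(values)
    M = [[0] * (W + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:
                M[i][w] = M[i - 1][w]          # item i doesn't fit
            else:
                M[i][w] = max(M[i - 1][w],     # skip item i
                              values[i - 1] + M[i - 1][w - weights[i - 1]])
    return M[n][W]
```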
Knapsack Algorithm

The DP table M[i, w] for the example instance (W = 11); rows indexed by the allowed item set, columns by w = 0, 1, ..., 11:

                  0   1   2   3   4   5   6   7   8   9  10  11
{}                0   0   0   0   0   0   0   0   0   0   0   0
{1}               0   1   1   1   1   1   1   1   1   1   1   1
{1, 2}            0   1   7   7   7   7   7   7   7   7   7   7
{1, 2, 3}         0   1   7   7   7  18  19  24  25  25  25  25
{1, 2, 3, 4}      0   1   7   7   7  18  22  24  28  29  29  40
{1, 2, 3, 4, 5}   0   1   7   7   7  18  22  28  29  34  35  40

Item  Value  Weight
 1      1      1
 2      6      2
 3     18      5
 4     22      6
 5     28      7

OPT: {4, 3}, value = 22 + 18 = 40.

Knapsack Problem: Running Time

Running time: Θ(nW).
- Not polynomial in the input size! "Pseudo-polynomial."
- The decision version of Knapsack is NP-complete.

Finding Optimal Substructure

A solution to the problem makes a choice, leaving one or more subproblems to be solved.
- For example, choose whether to use an item.
Figure out which subproblems to use if you knew the choice that led to an optimal solution.
- Try all possible choices and choose the one that provides the optimal solution.
Show that each subproblem in an optimal solution is itself optimal: optimal substructure.
How many subproblems are needed in an optimal solution to the original problem?
- For the examples so far, this has been one.
How many choices must be considered?

RNA Secondary Structure
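The optimal subset can be recovered from the filled table by walking back from M[n][W]: whenever the value changes between rows i-1 and i, item i was taken. A sketch (names are mine):

```python
def knapsack_items(values, weights, W):
    """Return (optimal value, chosen item indices, 1-indexed)."""
    n = len(values)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]
            if weights[i - 1] <= w:
                M[i][w] = max(M[i][w],
                              values[i - 1] + M[i - 1][w - weights[i - 1]])

    # walk back through the table to recover the subset
    items, w = [], W
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:   # value changed => item i was taken
            items.append(i)
            w -= weights[i - 1]
    return M[n][W], sorted(items)
```

On the example instance this recovers (40, [3, 4]), matching the OPT = {4, 3} read off the table above.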
RNA Secondary Structure

RNA. String B = b_1 b_2 ... b_n over the alphabet {A, C, G, U}.

Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
[Figure: a folded RNA molecule with complementary base pairs A-U and C-G.]

Secondary structure: a set of pairs S = { (b_i, b_j) } that satisfy:
- [Watson-Crick] S is a matching, and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
- [No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (b_i, b_j) ∈ S, then i < j - 4.
- [Non-crossing] If (b_i, b_j) and (b_k, b_l) are two pairs in S, then we cannot have i < k < j < l.

Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy; approximate free energy by the number of base pairs.

Goal. Given an RNA molecule B = b_1 b_2 ... b_n, find a secondary structure S that maximizes the number of base pairs.

RNA Secondary Structure: Examples

[Figure: three candidate structures on a sample string: one valid ("ok"), one violating the no-sharp-turns condition, and one with crossing pairs.]

RNA Secondary Structure: Subproblems

First attempt. OPT(j) = maximum number of base pairs in a secondary structure of the substring b_1 b_2 ... b_j.

Suppose we match b_t and b_n.

Difficulty. This results in two sub-problems:
- Finding a secondary structure in b_1 b_2 ... b_{t-1}: this is OPT(t-1).
- Finding a secondary structure in b_{t+1} b_{t+2} ... b_{n-1}: not of the form OPT(j) => need more sub-problems.
Dynamic Programming Over Intervals

Notation. OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b_i b_{i+1} ... b_j.

Case 1. If i >= j - 4: OPT(i, j) = 0 by the no-sharp-turns condition.

Case 2. Base b_j is not involved in a pair.
- OPT(i, j) = OPT(i, j-1).

Case 3. Base b_j pairs with b_t for some i <= t < j - 4.
- The non-crossing constraint decouples the resulting sub-problems:
  OPT(i, j) = 1 + max_t { OPT(i, t-1) + OPT(t+1, j-1) },
  where the max is over t such that i <= t < j - 4 and b_t and b_j are Watson-Crick complements.

Bottom-Up Dynamic Programming Over Intervals

Q. In what order should we solve the sub-problems?
A. Do shortest intervals first.

RNA(b_1,...,b_n) {
   for k = 5, 6, ..., n-1
      for i = 1, 2, ..., n-k
         j = i + k
         Compute M[i, j] using the recurrence
   return M[1, n]
}

Running time: O(n^3).

Remark. The same core idea appears in the CKY algorithm to parse context-free grammars.

Dynamic Programming Summary

Recipe.
- Characterize the structure of the problem.
- Recursively define the value of an optimal solution.
- Compute the value of an optimal solution.
- Construct an optimal solution from the computed information.

Dynamic programming techniques.
- Binary choice: weighted interval scheduling.
- Adding a new variable: knapsack.
- Dynamic programming over intervals: RNA secondary structure.

Top-down vs. bottom-up: different people have different intuitions.

Sequence Alignment
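The interval recurrence can be sketched in Python, keeping the slides' 1-indexed positions (the function name is mine; the sample strings below are illustrative):

```python
def rna_max_pairs(b):
    """Max number of base pairs in a secondary structure of RNA string b,
    under the Watson-Crick, no-sharp-turn, and non-crossing conditions."""
    n = len(b)
    wc = {('A', 'U'), ('U', 'A'), ('C', 'G'), ('G', 'C')}
    # M[i][j] = OPT(i, j); entries with j <= i + 4 stay 0 (no sharp turns)
    M = [[0] * (n + 1) for _ in range(n + 2)]
    for k in range(5, n):              # interval length j - i, shortest first
        for i in range(1, n - k + 1):
            j = i + k
            best = M[i][j - 1]         # case 2: b_j unpaired
            for t in range(i, j - 4):  # case 3: b_j pairs with b_t
                if (b[t - 1], b[j - 1]) in wc:
                    best = max(best, 1 + M[i][t - 1] + M[t + 1][j - 1])
            M[i][j] = best
    return M[1][n] if n else 0
```

For example, rna_max_pairs("AUUUUU") is 1 (positions 1 and 6 may pair), while rna_max_pairs("AUUUU") is 0 because the turn would be too sharp.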
String Similarity

How similar are two strings? Ex: ocurrance vs. occurrence.

o c u r r a n c e -
o c c u r r e n c e
6 mismatches, 1 gap

o c - u r r a n c e
o c c u r r e n c e
1 mismatch, 1 gap

o c - u r r - a n c e
o c c u r r e - n c e
0 mismatches, 3 gaps

Applications.
- Basis for Unix diff.
- Speech recognition.
- Computational biology.

Edit Distance

Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970, Smith-Waterman 1981]
- Gap penalty δ; mismatch penalty α_pq.
- Cost = sum of gap and mismatch penalties.
[Figure: two alignments of CTGACCTACCT and CCTGACTACAT, with their costs expressed in terms of δ and the α penalties.]

Sequence Alignment

Goal. Given two strings X = x_1 x_2 ... x_m and Y = y_1 y_2 ... y_n, find an alignment of minimum cost.

Def. An alignment M is a set of ordered pairs x_i - y_j such that each item occurs in at most one pair and there are no crossings.
Def. The pairs x_i - y_j and x_i' - y_j' cross if i < i' but j > j'.

cost(M) = Σ_{(x_i, y_j) ∈ M} α_{x_i y_j}                      [mismatch]
        + Σ_{i : x_i unmatched} δ + Σ_{j : y_j unmatched} δ   [gap]

Ex. CTACCG vs. TACATG.
Sol. M = x_2-y_1, x_3-y_2, x_4-y_3, x_5-y_4, x_6-y_6.

C T A C C - G
- T A C A T G

Sequence Alignment: Problem Structure

Def. OPT(i, j) = min cost of aligning the strings x_1 x_2 ... x_i and y_1 y_2 ... y_j.

Case 1: OPT matches x_i - y_j.
- Pay the mismatch penalty for x_i - y_j, plus the min cost of aligning x_1 ... x_{i-1} and y_1 ... y_{j-1}.
Case 2a: OPT leaves x_i unmatched.
- Pay the gap penalty for x_i, plus the min cost of aligning x_1 ... x_{i-1} and y_1 ... y_j.
Case 2b: OPT leaves y_j unmatched.
- Pay the gap penalty for y_j, plus the min cost of aligning x_1 ... x_i and y_1 ... y_{j-1}.

OPT(i, j) = jδ                                   if i = 0
            iδ                                   if j = 0
            min{ α_{x_i y_j} + OPT(i-1, j-1),
                 δ + OPT(i-1, j),
                 δ + OPT(i, j-1) }               otherwise
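The recurrence transcribes directly; in this sketch α is passed as a function and δ as a number (names are mine, not the slides'):

```python
def alignment_cost(x, y, delta, alpha):
    """Min cost of aligning x and y with gap penalty delta and
    mismatch penalty alpha(p, q)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta                # x_1..x_i all unmatched
    for j in range(n + 1):
        M[0][j] = j * delta                # y_1..y_j all unmatched
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],    # x_i unmatched
                          delta + M[i][j - 1])    # y_j unmatched
    return M[m][n]
```

With delta = 1 and alpha(p, q) = 0 when p == q, else 1, this is plain Levenshtein edit distance: the cost for "ocurrance" vs. "occurrence" comes out to 2 (one gap, one mismatch).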
Sequence Alignment: Algorithm

Sequence-Alignment(m, n, x_1...x_m, y_1...y_n, δ, α) {
   for i = 0 to m
      M[i, 0] = iδ
   for j = 0 to n
      M[0, j] = jδ

   for i = 1 to m
      for j = 1 to n
         M[i, j] = min(α[x_i, y_j] + M[i-1, j-1],
                       δ + M[i-1, j],
                       δ + M[i, j-1])
   return M[m, n]
}

Analysis. Θ(mn) time and space.
English words or sentences: m, n <= 10.
Computational biology: m = n = 100,000. 10 billion ops OK, but a 10GB array?

Sequence Alignment in Linear Space

Sequence Alignment: Linear Space

Q. Can we avoid using quadratic space?
Easy. Compute the optimal value in O(m + n) space and O(mn) time.
- Compute OPT(i, ·) from OPT(i-1, ·).
- No longer a simple way to recover the alignment itself.

Theorem. [Hirschberg, 1975] Optimal alignment in O(m + n) space and O(mn) time.
- Clever combination of divide-and-conquer and dynamic programming.
- Inspired by an idea of Savitch from complexity theory.

Edit distance graph.
- Let f(i, j) be the shortest path from (0,0) to (i, j).
- Observation: f(i, j) = OPT(i, j).
[Figure: the edit-distance grid graph on vertices (i, j); the diagonal edge into (i, j) costs α_{x_i y_j}, the horizontal and vertical edges cost δ.]
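The "easy" half of the linear-space claim, computing the optimal value while keeping only one row of M at a time, can be sketched as follows (a sketch under the same conventions as before; names are mine):

```python
def alignment_cost_linear_space(x, y, delta, alpha):
    """Same value as the quadratic-space DP, but only one row of M
    is kept, so the alignment itself cannot be recovered afterwards."""
    m, n = len(x), len(y)
    prev = [j * delta for j in range(n + 1)]   # row i = 0: OPT(0, j) = j*delta
    for i in range(1, m + 1):
        cur = [i * delta] + [0] * n            # OPT(i, 0) = i*delta
        for j in range(1, n + 1):
            cur[j] = min(alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur                             # OPT(i, .) replaces OPT(i-1, .)
    return prev[n]
```

It uses O(m + n) space and returns the same value as the quadratic table, e.g. 2 for "ocurrance" vs. "occurrence" under unit penalties.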
Sequence Alignment: Linear Space

Edit distance graph.
- Let f(i, j) be the shortest path from (0,0) to (i, j).
- Can compute f(·, j) for any j in O(mn) time and O(m + n) space.

Edit distance graph.
- Let g(i, j) be the shortest path from (i, j) to (m, n).
- Can compute g by reversing the edge orientations and inverting the roles of (0, 0) and (m, n).
- Can compute g(·, j) for any j in O(mn) time and O(m + n) space.

Observation 1. The cost of the shortest path that uses (i, j) is f(i, j) + g(i, j).
Sequence Alignment: Linear Space

Observation 2. Let q be an index that minimizes f(q, n/2) + g(q, n/2). Then the shortest path from (0, 0) to (m, n) uses (q, n/2).

Divide: find the index q that minimizes f(q, n/2) + g(q, n/2) using DP; align x_q and y_{n/2}.
Conquer: recursively compute the optimal alignment in each piece.

Sequence Alignment: Running Time Analysis

Theorem. Let T(m, n) = max running time of the algorithm on strings of length at most m and n. Then T(m, n) = O(mn).

T(m, n) <= T(q, n/2) + T(m-q, n/2) + cmn

Note that q(n/2) + (m-q)(n/2) = mn/2, so the total work on each level of the recursion is at most half that of the level above: cmn, cmn/2, cmn/4, ... Total time is Θ(mn).

Sequence Alignment in Linear Space

Another way to look at the solution.
- Quadratic: OPT(i, j) = min cost of aligning the strings x_1 ... x_i and y_1 ... y_j. The first part of x matches with the first part of y.
- Linear: OPT(i) = min cost of aligning x_1 ... x_i with y_1 ... y_{n/2}, plus the min cost of aligning x_{i+1} x_{i+2} ... x_m with y_{n/2+1} y_{n/2+2} ... y_n. The first half of y matches with some initial part of x, and the second half of y matches with the rest of x.

Linear space algorithm due to Hirschberg, 1975.
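Observation 2 (the divide step) can be demonstrated by computing the forward costs f(q, n/2) and backward costs g(q, n/2) in linear space and taking the q that minimizes their sum; that minimum equals the overall optimal cost, since any alignment path crosses the middle column somewhere. A sketch (helper names and the reversal trick for g are my own phrasing of the slides' idea):

```python
def prefix_costs(xs, ys, delta, alpha):
    """costs[q] = min cost of aligning xs[:q] against all of ys."""
    prev = [j * delta for j in range(len(ys) + 1)]
    out = [prev[-1]]
    for i in range(1, len(xs) + 1):
        cur = [i * delta] + [0] * len(ys)
        for j in range(1, len(ys) + 1):
            cur[j] = min(alpha(xs[i - 1], ys[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur
        out.append(prev[-1])
    return out

def best_split(x, y, delta, alpha):
    """Index q where an optimal path crosses column n/2, and the
    optimal total cost f(q, n/2) + g(q, n/2)."""
    half = len(y) // 2
    f = prefix_costs(x, y[:half], delta, alpha)       # f(q, n/2)
    # g(q) = cost of aligning x[q:] with y[half:]; reversing both
    # strings turns these suffix costs into prefix costs
    g = prefix_costs(x[::-1], y[half:][::-1], delta, alpha)[::-1]
    q = min(range(len(x) + 1), key=lambda i: f[i] + g[i])
    return q, f[q] + g[q]
```

Hirschberg's full algorithm recurses on the two pieces (x[:q] with the first half of y, x[q:] with the second half), reusing this O(m + n)-space split at every level.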