Chapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications

Similar documents
Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

CSE 202 Dynamic Programming II

Areas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.

Copyright 2000, Kevin Wayne 1

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Dynamic Programming 1

CS 580: Algorithm Design and Analysis

6. DYNAMIC PROGRAMMING I

Chapter 6. Dynamic Programming. CS 350: Winter 2018

Copyright 2000, Kevin Wayne 1

6. DYNAMIC PROGRAMMING I

CS 580: Algorithm Design and Analysis

6. DYNAMIC PROGRAMMING I

Dynamic Programming. Cormen et. al. IV 15

Dynamic Programming: Interval Scheduling and Knapsack

CSE 421 Weighted Interval Scheduling, Knapsack, RNA Secondary Structure

CSE 421 Dynamic Programming

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

6.6 Sequence Alignment

Dynamic Programming. Data Structures and Algorithms Andrei Bulatov

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

6. DYNAMIC PROGRAMMING II

Lecture 2: Divide and conquer and Dynamic programming

1.1 First Problem: Stable Matching. Six medical students and four hospitals. Student Preferences. s6 h3 h1 h4 h2. Stable Matching Problem

Outline DP paradigm Discrete optimisation Viterbi algorithm DP: 0 1 Knapsack. Dynamic Programming. Georgy Gimel farb

Dynamic Programming( Weighted Interval Scheduling)

CS Lunch. Dynamic Programming Recipe. 2 Midterm 2! Slides17 - Segmented Least Squares.key - November 16, 2016

RNA Secondary Structure. CSE 417 W.L. Ruzzo

Objec&ves. Review. Dynamic Programming. What is the knapsack problem? What is our solu&on? Ø Review Knapsack Ø Sequence Alignment 3/28/18

Algorithm Design and Analysis

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

Algoritmiek, bijeenkomst 3

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Matching Residents to Hospitals

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n

Lecture 2: Pairwise Alignment. CG Ron Shamir

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

Dynamic programming. Curs 2015

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Dynamic programming. Curs 2017

Design and Analysis of Algorithms

Algorithms and Theory of Computation. Lecture 9: Dynamic Programming

Algorithm Design and Analysis

More Dynamic Programming

Greedy Algorithms My T. UF

More Dynamic Programming

Algorithm Design and Analysis

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

CS 580: Algorithm Design and Analysis

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17

Dynamic Programming. Prof. S.J. Soni

DAA Unit- II Greedy and Dynamic Programming. By Mrs. B.A. Khivsara Asst. Professor Department of Computer Engineering SNJB s KBJ COE, Chandwad

Lecture 9. Greedy Algorithm

Activity selection. Goal: Select the largest possible set of nonoverlapping (mutually compatible) activities.

Dynamic Programming. Credits: Many of these slides were originally authored by Jeff Edmonds, York University. Thanks Jeff!

13 Dynamic Programming (3) Optimal Binary Search Trees Subset Sums & Knapsacks

1 Assembly Line Scheduling in Manufacturing Sector

Knapsack and Scheduling Problems. The Greedy Method

CSE 202 Homework 4 Matthias Springer, A

CS473 - Algorithms I

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

CS60007 Algorithm Design and Analysis 2018 Assignment 1

CSE 421. Dynamic Programming Shortest Paths with Negative Weights Yin Tat Lee

Partha Sarathi Mandal

Reductions. Reduction. Linear Time Reduction: Examples. Linear Time Reductions

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

CS 6901 (Applied Algorithms) Lecture 3

CS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018

Pair Hidden Markov Models

CS 6783 (Applied Algorithms) Lecture 3

Copyright 2000, Kevin Wayne 1

CSE 421 Greedy Algorithms / Interval Scheduling

Algorithm Design and Analysis

2. ALGORITHM ANALYSIS

Algorithms: COMP3121/3821/9101/9801

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

This means that we can assume each list ) is

CS781 Lecture 3 January 27, 2011

2.1 Computational Tractability. Chapter 2. Basics of Algorithm Analysis. Computational Tractability. Polynomial-Time

General Methods for Algorithm Design

COMP 355 Advanced Algorithms

Matrix Multiplication. Data Structures and Algorithms Andrei Bulatov

CS 580: Algorithm Design and Analysis

CMPSCI 311: Introduction to Algorithms Second Midterm Exam

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Greedy Algorithms. Kleinberg and Tardos, Chapter 4

CS Analysis of Recursive Algorithms and Brute Force

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

Chapter 8 Dynamic Programming

CS483 Design and Analysis of Algorithms

COMP 355 Advanced Algorithms Algorithm Design Review: Mathematical Background

Algoritmi e strutture di dati 2

Dynamic Programming: Shortest Paths and DFA to Reg Exps

CSE 417. Chapter 4: Greedy Algorithms. Many Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Algorithms: Lecture 12. Chalmers University of Technology

Copyright 2000, Kevin Wayne 1

Dynamic Programming: Shortest Paths and DFA to Reg Exps

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Transcription:

lgorithmic Paradigms hapter Dynamic Programming reedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into sub-problems, solve each sub-problem independently, and combine solution to sub-problems to form solution to original problem. Dynamic programming. Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems. Slides by Kevin Wayne. opyright Pearson-ddison Wesley. ll rights reserved. The original and official versions are at http://www.cs.princeton.edu/~wayne/ Dynamic Programming pplications reas. Bioinformatics. ontrol theory. Information theory. Operations research. omputer science: theory, graphics, I, compilers, systems,. Weighted Interval Scheduling Some famous dynamic programming algorithms. nix diff for comparing two files. Viterbi for hidden Markov models. Smith-Waterman for genetic sequence alignment. Bellman-Ford for shortest path routing in networks. ocke-kasami-younger for parsing context free grammars.

Weighted Interval Scheduling nweighted Interval Scheduling Review Weighted interval scheduling problem. Job j starts at s j, finishes at f j, and has weight or value v j. Two jobs compatible if they don't overlap. oal: find maximum weight subset of mutually compatible jobs. Recall. reedy algorithm works if all weights are. onsider jobs in ascending order of finish time. dd job to subset if it is compatible with previously chosen jobs. b a c Observation. reedy algorithm can fail spectacularly if arbitrary weights are allowed. d e f g h 4 8 9 Time weight = 999 weight = a b 4 8 9 Time Weighted Interval Scheduling Dynamic Programming: Binary hoice Notation. Label jobs by finishing time: f f... f n. Def. p(j) = largest index i < j such that job i is compatible with j. Ex: p(8) =, p() =, p() =. Notation. OPT(j) = value of optimal solution to the problem consisting of job requests,,..., j. ase : OPT selects job j. collect profit v j can't use incompatible jobs { p(j) +, p(j) +,..., j - must include optimal solution to problem consisting of remaining compatible jobs,,..., p(j) optimal substructure 4 ase : OPT does not select job j. must include optimal solution to problem consisting of remaining compatible jobs,,..., j- 8 4 8 9 Time if j OPT(j) max v j OPT(p(j)), OPT(j ) otherwise 8

Weighted Interval Scheduling: Brute Force Weighted Interval Scheduling: Brute Force Brute force algorithm. Observation. Recursive algorithm fails spectacularly because of redundant sub-problems exponential algorithms. Input: n, s,,s n, f,,f n, v,,v n Sort jobs by finish times so that f f... f n. ompute p(), p(),, p(n) Ex. Number of recursive calls for family of "layered" instances grows like Fibonacci sequence. ompute-opt(j) { if (j = ) return else return max(v j + ompute-opt(p(j)), ompute-opt(j-)) 4 p() =, p(j) = j- 4 9 Weighted Interval Scheduling: Memoization Weighted Interval Scheduling: Running Time Memoization. Store results of each sub-problem in a cache; lookup as needed. laim. Memoized version of algorithm takes O(n log n) time. Sort by finish time: O(n log n). omputing p() : O(n log n) via sorting by start time. Input: n, s,,s n, f,,f n, v,,v n Sort jobs by finish times so that f f... f n. ompute p(), p(),, p(n) for j = to n M[j] = empty M[] = global array M-ompute-Opt(j) { if (M[j] is empty) M[j] = max(v j + M-ompute-Opt(p(j)), M-ompute-Opt(j-)) return M[j] M-ompute-Opt(j): each invocation takes O() time and either (i) returns an existing value M[j] (ii) fills in one new entry M[j] and makes two recursive calls Progress measure = # nonempty entries of M[]. initially =, throughout n. (ii) increases by at most n recursive calls. Overall running time of M-ompute-Opt(n) is O(n). Remark. O(n) if jobs are pre-sorted by start and finish times.

Weighted Interval Scheduling: Finding a Solution Weighted Interval Scheduling: Bottom-p Q. Dynamic programming algorithms computes optimal value. What if we want the solution itself?. Do some post-processing. Run M-ompute-Opt(n) Run Find-Solution(n) Find-Solution(j) { if (j = ) output nothing else if (v j + M[p(j)] > M[j-]) print j Find-Solution(p(j)) else Find-Solution(j-) Bottom-up dynamic programming. nwind recursion. Input: n, s,,s n, f,,f n, v,,v n Sort jobs by finish times so that f f... f n. ompute p(), p(),, p(n) Iterative-ompute-Opt { M[] = for j = to n M[j] = max(v j + M[p(j)], M[j-]) # of recursive calls n O(n). 4 Segmented Least Squares Segmented Least Squares Least squares. Foundational problem in statistic and numerical analysis. iven n points in the plane: (x, y ), (x, y ),..., (x n, y n ). Find a line y = ax + b that minimizes the sum of the squared error: n SSE (y i ax i b) i y x Solution. alculus min error is achieved when a n x i iy i ( ix i ) ( i y i ) i y i a i x i, b n x i ( x i ) n i i

Segmented Least Squares Segmented Least Squares Segmented least squares. Points lie roughly on a sequence of several line segments. iven n points in the plane (x, y ), (x, y ),..., (x n, y n ) with x < x <... < x n, find a sequence of lines that minimizes f(x). Q. What's a reasonable choice for f(x) to balance accuracy and parsimony? goodness of fit Segmented least squares. Points lie roughly on a sequence of several line segments. iven n points in the plane (x, y ), (x, y ),..., (x n, y n ) with x < x <... < x n, find a sequence of lines that minimizes: the sum of the sums of the squared errors E in each segment the number of lines L Tradeoff function: E + c L, for some constant c >. number of lines y y x x 8 Dynamic Programming: Multiway hoice Segmented Least Squares: lgorithm Notation. OPT(j) = minimum cost for points p, p i+,..., p j. e(i, j) = minimum sum of squares for points p i, p i+,..., p j. To compute OPT(j): Last segment uses points p i, p i+,..., p j for some i. ost = e(i, j) + c + OPT(i-). INPT: n, p,,p N, c Segmented-Least-Squares() { M[] = for j = to n for i = to j compute the least square error e ij for the segment p i,, p j if j OPT(j) min e(i,j) copt(i ) otherwise i j for j = to n M[j] = min i j (e ij + c + M[i-]) return M[n] Running time. O(n ). can be improved to O(n ) by pre-computing various statistics Bottleneck = computing e(i, j) for O(n ) pairs, O(n) per pair using previous formula. 9

Knapsack Problem Knapsack Problem Knapsack problem. iven n objects and a "knapsack." Item i weighs w > kilograms and has value v i i >. Knapsack has capacity of W kilograms. oal: fill knapsack so as to maximize total value. Ex: {, 4 has value 4. # value weight W = 8 4 8 reedy: repeatedly add item with maximum ratio v i / w i. Ex: {,, achieves only value = greedy not optimal. Dynamic Programming: False Start Dynamic Programming: dding a New Variable Def. OPT(i) = max profit subset of items,, i. ase : OPT does not select item i. OPT selects best of {,,, i- ase : OPT selects item i. accepting item i does not immediately imply that we will have to reject other items without knowing what other items were selected before i, we don't even know if we have enough room for i onclusion. Need more sub-problems! Def. OPT(i, w) = max profit subset of items,, i with weight limit w. ase : OPT does not select item i. OPT selects best of {,,, i- using weight limit w ase : OPT selects item i. new weight limit = w w i OPT selects best of {,,, i using this new weight limit if i OPT(i,w) OPT(i,w) if w i w max OPT(i,w), v i OPT(i,w w i ) otherwise 4

Knapsack Problem: Bottom-p Knapsack lgorithm Knapsack. Fill up an n-by-w array. W + 4 8 9 Input: n, W, w,,w N, v,,v N for w = to W M[, w] = n + { {, for i = to n for w = to W if (w i > w) M[i, w] = M[i-, w] else M[i, w] = max {M[i-, w], v i + M[i-, w-w i ] {,, {,,, 4 {,,, 4, 8 8 8 9 4 4 8 8 9 Item 9 9 4 4 4 4 Value Weight return M[n, W] OPT: { 4, value = + 8 = 4 W = 8 4 8 Knapsack Problem: Running Time Running time. (n W). Not polynomial in input size! "Pseudo-polynomial." Decision version of Knapsack is NP-complete. [hapter 8] RN Secondary Structure Knapsack approximation algorithm. There exists a poly-time algorithm that produces a feasible solution that has value within.% of optimum. [Section.8]

RN Secondary Structure RN Secondary Structure RN. String B = b b b n over alphabet {,,,. Secondary structure. RN is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule. Ex: Secondary structure. set of pairs S = { (b i, b j ) that satisfy: [Watson-rick.] S is a matching and each pair in S is a Watson- rick complement: -, -, -, or -. [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If (b i, b j ) S, then i < j - 4. [Non-crossing.] If (b i, b j ) and (b k, b l ) are two pairs in S, then we cannot have i < k < j < l. Free energy. sual hypothesis is that an RN molecule will form the secondary structure with the optimum total free energy. approximate by number of base pairs oal. iven an RN molecule B = b b b n, find a secondary structure S that maximizes the number of base pairs. complementary base pairs: -, - 9 RN Secondary Structure: Examples RN Secondary Structure: Subproblems Examples. First attempt. OPT(j) = maximum number of base pairs in a secondary structure of the substring b b b j. match b t and b n base pair t n Difficulty. Results in two sub-problems. 4 Finding secondary structure in: b b b t-. Finding secondary structure in: b t+ b t+ b n-. OPT(t-) need more sub-problems ok sharp turn crossing

Dynamic Programming Over Intervals Bottom p Dynamic Programming Over Intervals Notation. OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b i b i+ b j. Q. What order to solve the sub-problems?. Do shortest intervals first. ase. If i j - 4. OPT(i, j) = by no-sharp turns condition. ase. Base b j is not involved in a pair. OPT(i, j) = OPT(i, j-) RN(b,,b n ) { for k =,,, n- for i =,,, n-k j = i + k ompute M[i, j] i 4 ase. Base b j pairs with b t for some i t < j - 4. non-crossing constraint decouples resulting sub-problems return M[, n] using recurrence 8 9 j OPT(i, j) = + max t { OPT(i, t-) + OPT(t+, j-) take max over t such that i t < j-4 and b t and b j are Watson-rick complements Running time. O(n ). Remark. Same core idea in KY algorithm to parse context-free grammars. 4 Dynamic Programming Summary Recipe. haracterize structure of problem. Recursively define value of optimal solution. ompute value of optimal solution. onstruct optimal solution from computed information. Sequence lignment Dynamic programming techniques. Binary choice: weighted interval scheduling. Multi-way choice: segmented least squares. dding a new variable: knapsack. Viterbi algorithm for HMM also uses DP to optimize a maximum likelihood tradeoff between parsimony and accuracy Dynamic programming over intervals: RN secondary structure. KY parsing algorithm for context-free grammar has similar structure Top-down vs. bottom-up: different people have different intuitions.

String Similarity Edit Distance How similar are two strings? ocurrance occurrence o c u r r a n c e - o c c u r r e n c e pplications. Basis for nix diff. Speech recognition. omputational biology. o c - mismatches, gap u r r a n c e Edit distance. [Levenshtein 9, Needleman-Wunsch 9] ap penalty ; mismatch penalty pq. ost = sum of gap and mismatch penalties. o c c u r r e n c e mismatch, gap T T T - T T T o c - u r r - a n c e T T T T - T T o c c u r r e - n c e T + T + + + mismatches, gaps 8 Sequence lignment Sequence lignment: Problem Structure oal: iven two strings X = x x... x m and Y = y y... y n find alignment of minimum cost. Def. n alignment M is a set of ordered pairs x i -y j such that each item occurs in at most one pair and no crossings. Def. The pair x i -y j and x i' -y j' cross if i < i', but j > j'. Def. OPT(i, j) = min cost of aligning strings x x... x i and y y... y j. ase : OPT matches x i -y j. pay mismatch for x i -y j + min cost of aligning two strings x x... x i- and y y... y j- ase a: OPT leaves x i unmatched. pay gap for x i and min cost of aligning x x... x i- and y y... y j ase b: OPT leaves y j unmatched. pay gap for y j and min cost of aligning x x... x i and y y... y j- cost(m ) xi y j (x i,y j ) M i :x i unmatched j :y j unmatched mismatch gap Ex: T vs. TT. Sol: M = x -y, x -y, x 4 -y, x -y 4, x -y. x x x x 4 x x T - - T T j if i xi y j OPT(i, j ) OPT(i, j) min OPT(i, j) otherwise OPT(i, j ) i if j y y y y 4 y y 9 4

Sequence lignment: lgorithm Sequence-lignment(m, n, x x...x m, y y...y n,, ) { for i = to m M[i, ] = i for j = to n M[, j] = j Sequence lignment in Linear Space for i = to m for j = to n M[i, j] = min([x i, y j ] + M[i-, j-], + M[i-, j], + M[i, j-]) return M[m, n] nalysis. (mn) time and space. English words or sentences: m, n. omputational biology: m = n =,. billions ops OK, but B array? 4 Sequence lignment: Linear Space Sequence lignment: Linear Space Q. an we avoid using quadratic space? Easy. Optimal value in O(m + n) space and O(mn) time. - No longer a simple way to recover alignment itself. Edit distance graph. Let f(i, j) be shortest path from (,) to (i, j). Observation: f(i, j) = OPT(i, j). Theorem. [Hirschberg 9] Optimal alignment in O(m + n) space and O(mn) time. - y y y y 4 y y lever combination of divide-and-conquer and dynamic programming. Inspired by idea of Savitch from complexity theory. x xi y j x x 4 44

Sequence lignment: Linear Space Sequence lignment: Linear Space Edit distance graph. Let f(i, j) be shortest path from (,) to (i, j). Edit distance graph. Let g(i, j) be shortest path from (i, j) to (m, n). an compute by reversing the edge orientations and inverting the roles of (, ) and (m, n) j y y y y 4 y y y y y y 4 y y - - x x xi y j x x x x 4 4 Sequence lignment: Linear Space Sequence lignment: Linear Space Edit distance graph. Let g(i, j) be shortest path from (i, j) to (m, n). Observation. The cost of the shortest path that uses (i, j) is f(i, j) + g(i, j). j y y y y 4 y y y y y y 4 y y - - x x x x x x 4 48

Sequence lignment: Linear Space Sequence lignment: Linear Space Observation. let q be an index that minimizes f(q, n/) + g(q, n/). Then, the shortest path from (, ) to (m, n) uses (q, n/). Divide: find index q that minimizes f(q, n/) + g(q, n/) using DP. lign x q and y n/. onquer: recursively compute optimal alignment in each piece. n / n / y y y y 4 y y y y y y 4 y y - - x q x q x x x x 49 Sequence lignment: Running Time nalysis Warmup Sequence lignment: Running Time nalysis Theorem. Let T(m, n) = max running time of algorithm on strings of length at most m and n. T(m, n) = O(mn log n). Theorem. Let T(m, n) = max running time of algorithm on strings of length m and n. T(m, n) = O(mn). T(m,n) T(m,n/) O(mn) T(m,n) O(mn logn) Remark. nalysis is not tight because two sub-problems are of size (q, n/) and (m - q, n/). In next slide, we save log n factor. Pf. (by induction on n) O(mn) time to compute f( T(q, n/) + T(m - q, n/) time for two recursive calls. hoose constant c so that: T(m, ) cm T(, n) cn T(m, n) cmn T(q, n/) T(m q, n/) Base cases: m = or n =. Inductive hypothesis: T(m, n) cmn. T ( mn, ) T ( q, n / ) T ( m q, n / ) cmn cqn / c( m q) n / cmn cqn cmn cqn cmn cmn