CSE Weighted Interval Scheduling, Knapsack, RNA Secondary Structure. Shayan Oveis Gharan
Weighted Interval Scheduling
Interval Scheduling
Job j starts at s(j), finishes at f(j), and has weight w(j).
Two jobs are compatible if they don't overlap.
Goal: find a maximum-weight subset of mutually compatible jobs.
[Figure: jobs a through h drawn as intervals on a timeline]
Weighted Job Scheduling by Induction
Suppose 1, …, n are all the jobs. Let us use induction.
(This idea works for any optimization problem, e.g., vertex cover, independent set, etc. For NP-hard problems there is no ordering that reduces the number of subproblems.)
IH (strong induction): Suppose we can compute the optimum job scheduling for fewer than n jobs.
IS: Goal: for any n jobs we can compute OPT.
Case 1: Job n is not in OPT. Then just return OPT of 1, …, n-1.
Case 2: Job n is in OPT. Then delete all jobs not compatible with n and recurse.
Take the best of the two.
Q: Are we done?
A: No. How many subproblems are there? Potentially 2^n: all possible subsets of jobs.
[Figure: recursion tree branching on jobs n, n-1, n-2, …]
Sorting to Reduce Subproblems
Sorting idea: label jobs by finishing time, f(1) ≤ … ≤ f(n).
IS: For jobs 1, …, n we want to compute OPT.
Case 1: Suppose OPT has job n. Then all jobs i that are not compatible with n are not in OPT.
Let p(n) = the largest index i < n such that job i is compatible with n. Then we just need to find OPT of 1, …, p(n).
[Figure: jobs p(n), p(n)+1, …, n-1, n on a timeline; the jobs after p(n) all overlap job n]
Sorting to Reduce Subproblems
Sorting idea: label jobs by finishing time, f(1) ≤ … ≤ f(n).
IS: For jobs 1, …, n we want to compute OPT.
Case 1: Suppose OPT has job n. Then all jobs i that are not compatible with n are not in OPT.
Let p(n) = the largest index i < n such that job i is compatible with n. Then we just need to find OPT of 1, …, p(n).
Case 2: OPT does not select job n. Then OPT is just the OPT of 1, …, n-1.
Take the best of the two.
Q: Have we made any progress (we are still reducing to two subproblems)?
A: Yes! This time every subproblem is of the form 1, …, i for some i. So there are at most n possible subproblems.
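Since the jobs are sorted by finishing time, p(j) can be computed for all j with a binary search over the finish times, giving O(n log n) total. A minimal sketch (the function and variable names are my own, not from the slides; jobs are 1-indexed with a sentinel at index 0):

```python
import bisect

def compute_p(start, finish):
    """For jobs sorted by finish time, p[j] = the largest index i < j
    such that job i is compatible with job j (f(i) <= s(j)), or 0 if
    no such job exists.  start[0]/finish[0] are unused sentinels."""
    n = len(start) - 1
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        # finish[1..j-1] is sorted, so binary-search for the rightmost
        # job whose finish time is <= start[j].
        p[j] = bisect.bisect_right(finish, start[j], lo=1, hi=j) - 1
    return p
```

Each call to `bisect_right` is O(log n), so the whole loop is O(n log n), matching the sorting cost.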
Weighted Job Scheduling by Induction
Sorting idea: label jobs by finishing time, f(1) ≤ … ≤ f(n).
Def: Let OPT(j) denote the OPT solution of 1, …, j. (This is the most important part of a correct DP; it fixes the IH.)
To solve OPT(j):
Case 1: OPT(j) has job j. Then all jobs i that are not compatible with j are not in OPT(j).
Let p(j) = the largest index i < j such that job i is compatible with j. So OPT(j) is job j together with OPT(p(j)).
Case 2: OPT(j) does not select job j. Then OPT(j) = OPT(j-1).
Therefore:
OPT(j) = 0                                        if j = 0
OPT(j) = max( w(j) + OPT(p(j)), OPT(j-1) )        otherwise
Algorithm
Input: n, s(1), …, s(n), f(1), …, f(n), and w(1), …, w(n).
Sort jobs by finish times so that f(1) ≤ f(2) ≤ … ≤ f(n).
Compute p(1), p(2), …, p(n).

Compute-Opt(j) {
   if (j = 0)
      return 0
   else
      return max(w(j) + Compute-Opt(p(j)), Compute-Opt(j-1))
}
Recursive Algorithm Fails
Even though we have only n subproblems, we do not store the solutions to the subproblems, so we may re-solve the same problem many, many times.
Ex: The number of recursive calls for the family of "layered" instances with p(1) = 0 and p(j) = j-2 grows like the Fibonacci sequence.
Algorithm with Memoization
Memoization: compute and store the solution of each subproblem in a cache the first time you face it; look it up as needed.

Input: n, s(1), …, s(n), f(1), …, f(n), and w(1), …, w(n).
Sort jobs by finish times so that f(1) ≤ f(2) ≤ … ≤ f(n).
Compute p(1), p(2), …, p(n).

for j = 1 to n
   M[j] = empty
M[0] = 0

M-Compute-Opt(j) {
   if (M[j] is empty)
      M[j] = max(w(j) + M-Compute-Opt(p(j)), M-Compute-Opt(j-1))
   return M[j]
}
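The memoized pseudocode above translates almost line for line into Python; here is a sketch (names are mine; w and p are 1-indexed with index 0 unused, and jobs are assumed already sorted by finish time):

```python
def weighted_interval_memo(w, p):
    """Memoized M-Compute-Opt.  w[j] is the weight of job j and p[j]
    its latest compatible predecessor.  M caches OPT(j)."""
    n = len(w) - 1
    M = [None] * (n + 1)
    M[0] = 0
    def opt(j):
        if M[j] is None:  # first time we face this subproblem
            M[j] = max(w[j] + opt(p[j]), opt(j - 1))
        return M[j]
    return opt(n)
```

Each of the n cache entries is filled exactly once, so the recursion does O(n) work after sorting.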
Bottom-up Dynamic Programming
You can also avoid recursion (recursion may be conceptually easier when you use induction).

Input: n, s(1), …, s(n), f(1), …, f(n), and w(1), …, w(n).
Sort jobs by finish times so that f(1) ≤ f(2) ≤ … ≤ f(n).
Compute p(1), p(2), …, p(n).

Iterative-Compute-Opt {
   M[0] = 0
   for j = 1 to n
      M[j] = max(w(j) + M[p(j)], M[j-1])
}
Output M[n]

Claim: M[j] is the value of OPT(j).
Timing: easy. The main loop is O(n); sorting is O(n log n).
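The iterative version also makes it easy to recover the optimal set of jobs, not just its value, by tracing back through the filled table. A sketch under the same conventions as before (names are mine, not from the slides):

```python
def weighted_interval_bottom_up(w, p):
    """Iterative-Compute-Opt plus a traceback that recovers one
    optimal set of jobs.  w and p are 1-indexed, index 0 unused."""
    n = len(w) - 1
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(w[j] + M[p[j]], M[j - 1])
    # Traceback: job j is in some optimum iff taking it is at least
    # as good as skipping it, i.e. w[j] + M[p[j]] >= M[j-1].
    chosen, j = [], n
    while j > 0:
        if w[j] + M[p[j]] >= M[j - 1]:
            chosen.append(j)
            j = p[j]
        else:
            j -= 1
    return M[n], sorted(chosen)
```

The traceback re-applies the recurrence, so it costs only O(n) extra time and no extra bookkeeping during the fill.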
Example
Label jobs by finishing time: f(1) ≤ … ≤ f(n).
p(j) = largest index i < j such that job i is compatible with j.
[Figure, repeated across several slides: jobs drawn on a timeline, with a table of j, w(j), p(j), and OPT(j) whose entries are filled in one at a time by the recurrence.]
Knapsack Problem
Knapsack Problem
Given n objects and a "knapsack."
Item i weighs w_i > 0 kilograms and has value v_i > 0.
The knapsack has a capacity of W kilograms.
Goal: fill the knapsack so as to maximize the total value.
[Table: five items with their values and weights; in the example the optimum is a particular two-item subset.]
Greedy: repeatedly add the item with maximum ratio v_i / w_i.
In the example the greedy choice achieves strictly less value than the optimum, so greedy is not optimal.
Dynamic Programming: First Attempt
Let OPT(i) = max value of subsets of items 1, …, i of weight ≤ W.
Case 1: OPT(i) does not select item i. In this case OPT(i) = OPT(i-1).
Case 2: OPT(i) selects item i. In this case, selecting item i does not immediately imply we have to reject other items. The problem does not reduce to OPT(i-1), because we now want to pack as much value as possible into a box of weight ≤ W - w_i.
Conclusion: we need more subproblems; we need to strengthen the IH.
Stronger DP (Strengthening the Hypothesis)
Let OPT(i, w) = max value of subsets of items 1, …, i of weight ≤ w.
Case 1: OPT(i, w) selects item i. In this case, OPT(i, w) = v_i + OPT(i-1, w - w_i).
Case 2: OPT(i, w) does not select item i. In this case, OPT(i, w) = OPT(i-1, w).
Take the best of the two. Therefore:
OPT(i, w) = 0                                               if i = 0
OPT(i, w) = OPT(i-1, w)                                     if w_i > w
OPT(i, w) = max( OPT(i-1, w), v_i + OPT(i-1, w - w_i) )     otherwise
DP for Knapsack

Recursive:
Compute-OPT(i, w)
   if M[i, w] == empty
      if (i == 0)        M[i, w] = 0
      else if (w_i > w)  M[i, w] = Compute-OPT(i-1, w)
      else               M[i, w] = max { Compute-OPT(i-1, w), v_i + Compute-OPT(i-1, w - w_i) }
   return M[i, w]

Non-recursive:
for w = 0 to W
   M[0, w] = 0
for i = 1 to n
   for w = 0 to W
      if (w_i > w)  M[i, w] = M[i-1, w]
      else          M[i, w] = max { M[i-1, w], v_i + M[i-1, w - w_i] }
return M[n, W]
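The non-recursive table fill is short enough to write out concretely. A sketch in Python (names are mine; values and weights are 1-indexed with index 0 unused, matching the slides' convention):

```python
def knapsack(values, weights, W):
    """Bottom-up knapsack DP.  M[i][w] = max value achievable using
    items 1..i with capacity w.  Theta(n * W) time and space."""
    n = len(values) - 1
    M = [[0] * (W + 1) for _ in range(n + 1)]  # row 0: no items, value 0
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i] > w:
                # item i cannot fit in capacity w
                M[i][w] = M[i - 1][w]
            else:
                # skip item i, or take it and fill the remaining capacity
                M[i][w] = max(M[i - 1][w],
                              values[i] + M[i - 1][w - weights[i]])
    return M[n][W]
```

Note that only the previous row is ever read, so the space can be reduced to O(W) by keeping two rows (or one, filled right to left).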
DP for Knapsack (Example)
[Table, filled one row per slide: rows { }, {1}, {1,2}, {1,2,3}, {1,2,3,4}, {1,2,3,4,5}; columns w = 0, 1, …, W. Each entry M[i, w] is filled by the recurrence:
if (w_i > w)  M[i, w] = M[i-1, w]
else          M[i, w] = max { M[i-1, w], v_i + M[i-1, w - w_i] }
The optimum value is read off M[n, W], and the optimal two-item subset is recovered by tracing back through the table.]
Knapsack Problem: Running Time
Running time: Θ(nW).
This is not polynomial in the input size! W is encoded in O(log W) bits, so Θ(nW) is "pseudo-polynomial." The decision version of Knapsack is NP-complete.
Knapsack approximation: there exists a polynomial algorithm that produces a feasible solution whose value is within any fixed small percentage of the optimum, in time Poly(n, log W).
DP Ideas So Far
You may have to define an ordering to decrease the number of subproblems.
You may have to strengthen the DP, equivalently the induction; i.e., you may have to carry more information to find the optimum. This means that sometimes we have to use two-dimensional or three-dimensional induction.
RNA Secondary Structure
RNA Secondary Structure
RNA: a string B = b_1 b_2 … b_n over the alphabet { A, C, G, U }.
Secondary structure: RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
[Figure: an example RNA sequence folded back on itself, with its base pairs marked.]
Complementary base pairs: A-U, C-G.
RNA Secondary Structure (Formal)
Secondary structure: a set of pairs S = { (b_i, b_j) } that satisfies:
[Watson-Crick] S is a matching, and each pair in S is a Watson-Crick pair: A-U, U-A, C-G, or G-C.
[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (b_i, b_j) ∈ S, then i < j - 4.
[Non-crossing] If (b_i, b_j) and (b_k, b_l) are two pairs in S, then we cannot have i < k < j < l.
Free energy: the usual hypothesis is that an RNA molecule will maximize total free energy; we approximate this by the number of base pairs.
Goal: given an RNA molecule B = b_1 b_2 … b_n, find a secondary structure S that maximizes the number of base pairs.
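The three conditions are easy to check directly for a candidate structure, which is a useful sanity check before writing the DP. A sketch (my own function; S is a set of 1-indexed pairs (i, j) with i < j):

```python
def is_valid_structure(B, S):
    """Check the Watson-Crick, no-sharp-turns, and non-crossing
    conditions for a candidate secondary structure S of string B."""
    wc = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    used = [i for pair in S for i in pair]
    if len(used) != len(set(used)):          # matching: no base in two pairs
        return False
    for (i, j) in S:
        if (B[i - 1], B[j - 1]) not in wc:   # Watson-Crick pairs only
            return False
        if i >= j - 4:                       # no sharp turns: i < j - 4
            return False
    for (i, j) in S:                         # non-crossing: never i < k < j < l
        for (k, l) in S:
            if i < k < j < l:
                return False
    return True
```

The pairwise non-crossing check is quadratic in |S|, which is fine for a validator.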
Secondary Structure (Examples)
[Figure: three candidate structures for short sequences, labeled "ok" (a valid set of base pairs), "sharp turn" (a pair whose ends are too close together), and "crossing" (two pairs with i < k < j < l).]
DP: First Attempt
First attempt: let OPT(n) = maximum number of base pairs in a secondary structure of the substring b_1 b_2 … b_n.
Suppose b_n is matched with b_t in OPT(n). What IH should we use?
[Figure: b_t matched with b_n, splitting the string at position t.]
Difficulty: this naturally reduces to two subproblems:
- finding a secondary structure in b_1, …, b_{t-1}, i.e., OPT(t-1);
- finding a secondary structure in b_{t+1}, …, b_{n-1}, which is not of the form OPT(j), so the IH does not apply.
DP: Second Attempt
Definition: OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b_i, b_{i+1}, …, b_j. (This is the most important part of a correct DP; it fixes the IH.)
Case 1: If i ≥ j - 4, then OPT(i, j) = 0 by the no-sharp-turns condition.
Case 2: Base b_j is not involved in a pair. Then OPT(i, j) = OPT(i, j-1).
Case 3: Base b_j pairs with b_t for some i ≤ t < j - 4. The non-crossing constraint decouples the resulting subproblems:
OPT(i, j) = 1 + max_t { OPT(i, t-1) + OPT(t+1, j-1) }
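Because each subproblem is an interval [i, j], the table can be filled in order of increasing interval length. A sketch of the resulting O(n^3) algorithm (often called the Nussinov algorithm; names and indexing conventions are mine):

```python
def max_base_pairs(B):
    """DP over intervals: OPT[i][j] = max number of base pairs in a
    secondary structure of b_i..b_j (1-indexed), filled by length."""
    wc = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(B)
    OPT = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(6, n + 1):        # intervals with j - i <= 4 stay 0
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = OPT[i][j - 1]          # case 2: b_j unpaired
            for t in range(i, j - 4):     # case 3: b_j pairs with b_t, t < j - 4
                if (B[t - 1], B[j - 1]) in wc:
                    best = max(best, 1 + OPT[i][t - 1] + OPT[t + 1][j - 1])
            OPT[i][j] = best
    return OPT[1][n]
```

There are O(n^2) intervals and each takes O(n) work for the inner maximum over t, giving O(n^3) total.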