More Dynamic Programming

Similar documents
More Dynamic Programming

Even More on Dynamic Programming

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n

6.6 Sequence Alignment

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

Dynamic Programming: Shortest Paths and DFA to Reg Exps

CSE 202 Dynamic Programming II

Deterministic Finite Automata (DFAs)

Dynamic Programming: Shortest Paths and DFA to Reg Exps

6. DYNAMIC PROGRAMMING II

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17

Objec&ves. Review. Dynamic Programming. What is the knapsack problem? What is our solu&on? Ø Review Knapsack Ø Sequence Alignment 3/28/18

CS60007 Algorithm Design and Analysis 2018 Assignment 1

CMPSCI 311: Introduction to Algorithms Second Midterm Exam

CS 580: Algorithm Design and Analysis

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

CS483 Design and Analysis of Algorithms

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Algorithms and Theory of Computation. Lecture 9: Dynamic Programming

Dynamic programming. Curs 2017

Algorithms Exam TIN093 /DIT602

CSE 202 Homework 4 Matthias Springer, A

Chapter 6. Dynamic Programming. CS 350: Winter 2018

CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines. Problem 1

Lecture 7: Dynamic Programming I: Optimal BSTs

Deterministic Finite Automata (DFAs)

Data Structures in Java

Dynamic Programming. Data Structures and Algorithms Andrei Bulatov

Dynamic Programming. Cormen et. al. IV 15

Lecture 2: Divide and conquer and Dynamic programming

Dynamic Programming. Reading: CLRS Chapter 15 & Section CSE 2331 Algorithms Steve Lai

Outline DP paradigm Discrete optimisation Viterbi algorithm DP: 0 1 Knapsack. Dynamic Programming. Georgy Gimel farb

Deterministic Finite Automata (DFAs)

Lecture 2: Pairwise Alignment. CG Ron Shamir

Algorithm Design and Analysis

Midterm 2 for CS 170

Quiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b)

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

Lecture 1 : Data Compression and Entropy

CS173 Lecture B, November 3, 2015

INF4130: Dynamic Programming September 2, 2014 DRAFT version

Lecture 13. More dynamic programming! Longest Common Subsequences, Knapsack, and (if time) independent sets in trees.

CMPSCI 611 Advanced Algorithms Midterm Exam Fall 2015

Dynamic programming. Curs 2015

Partha Sarathi Mandal

CS Data Structures and Algorithm Analysis

Deterministic Finite Automata (DFAs)

Lecture 13. More dynamic programming! Longest Common Subsequences, Knapsack, and (if time) independent sets in trees.

Chapter 4: Computation tree logic

CS173 Running Time and Big-O. Tandy Warnow

Nearest Neighbor Search with Keywords

CS473 - Algorithms I

Dynamic Programming. Reading: CLRS Chapter 15 & Section CSE 6331: Algorithms Steve Lai

Chapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications

More on NP and Reductions

Computational Models Lecture 8 1

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

CS 374: Algorithms & Models of Computation, Spring 2017 Greedy Algorithms Lecture 19 April 4, 2017 Chandra Chekuri (UIUC) CS374 1 Spring / 1

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd

1 Review of Vertex Cover

Binary Decision Diagrams. Graphs. Boolean Functions

More Approximation Algorithms

Foundations of Natural Language Processing Lecture 6 Spelling correction, edit distance, and EM

1 Closest Pair of Points on the Plane

Copyright 2000, Kevin Wayne 1

COL351: Analysis and Design of Algorithms (CSE, IITD, Semester-I ) Name: Entry number:

Computational Models Lecture 8 1

Turing Machine Recap

Computational Models Lecture 8 1

Areas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.

Properties of Context-Free Languages

Define M to be a binary n by m matrix such that:

1 Primals and Duals: Zero Sum Games

Friday Four Square! Today at 4:15PM, Outside Gates

6. DYNAMIC PROGRAMMING I

Binary Search Trees. Motivation

Module 9: Tries and String Matching

Covering Linear Orders with Posets

Non-deterministic Finite Automata (NFAs)

Bayesian Networks Factor Graphs the Case-Factor Algorithm and the Junction Tree Algorithm

Computational Complexity of Bayesian Networks

arxiv: v1 [cs.ds] 9 Apr 2018

Approximate Voronoi Diagrams

Dynamic Programming. Prof. S.J. Soni

Computational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh

Implementing Approximate Regularities

Analysis and Design of Algorithms Dynamic Programming

CS5371 Theory of Computation. Lecture 18: Complexity III (Two Classes: P and NP)

A An Overview of Complexity Theory for the Algorithm Designer

General Methods for Algorithm Design

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions

Tree Decompositions and Tree-Width

Bio nformatics. Lecture 3. Saad Mneimneh

Chapter 5 Arrays and Strings 5.1 Arrays as abstract data types 5.2 Contiguous representations of arrays 5.3 Sparse arrays 5.4 Representations of

Slides for CIS 675. Huffman Encoding, 1. Huffman Encoding, 2. Huffman Encoding, 3. Encoding 1. DPV Chapter 5, Part 2. Encoding 2

Theory of Computation 1 Sets and Regular Expressions

Complexity Theory Part I

10-704: Information Processing and Learning Fall Lecture 10: Oct 3

CSE 421 Dynamic Programming

Transcription:

Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48

What is the running time of the following? Consider computing f (x, y) by recursive function + memoization. f (x, y) = x+y 1 i =1 x f (x + y i, i 1), f (0, y) = y f (x, 0) = x. The resulting algorithm when computing f (n, n) would take: (A) O(n) (B) O(n log n) (C) O(n 2 ) (D) O(n 3 ) (E) The function is ill defined - it can not be computed. Sariel Har-Peled (UIUC) CS374 2 Fall 2017 2 / 48

Recipe for Dynamic Programming 1 Develop a recursive backtracking style algorithm A for given problem. 2 Identify structure of subproblems generated by A on an instance I of size n 1 Estimate number of different subproblems generated as a function of n. Is it polynomial or exponential in n? 2 If the number of problems is small (polynomial) then they typically have some clean structure. 3 Rewrite subproblems in a compact fashion. 4 Rewrite recursive algorithm in terms of notation for subproblems. 5 Convert to iterative algorithm by bottom up evaluation in an appropriate order. 6 Optimize further with data structures and/or additional ideas. Sariel Har-Peled (UIUC) CS374 3 Fall 2017 3 / 48

A variation Example Input A string w Σ and access to a language L Σ via function IsStringinL(string x) that decides whether x is in L, and non-negative integer k Goal Decide if w L k using IsStringinL(string x) as a black box sub-routine Suppose L is English and we have a procedure to check whether a string/word is in the English dictionary. Is the string isthisanenglishsentence in English 5? Is the string isthisanenglishsentence in English 4? Is asinineat in English 2? Is asinineat in English 4? Is zibzzzad in English 1? Sariel Har-Peled (UIUC) CS374 4 Fall 2017 4 / 48

Recursive Solution Output NO Sariel Har-Peled (UIUC) CS374 5 Fall 2017 5 / 48 When is w L k? k = 0: w L k iff w = ɛ k = 1: w L k iff w L k > 1: w L k if w = uv with u L and v L k 1 Assume w is stored in array A[1..n] IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES

Recursive Solution Output NO Sariel Har-Peled (UIUC) CS374 5 Fall 2017 5 / 48 When is w L k? k = 0: w L k iff w = ɛ k = 1: w L k iff w L k > 1: w L k if w = uv with u L and v L k 1 Assume w is stored in array A[1..n] IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES

Recursive Solution Output NO Sariel Har-Peled (UIUC) CS374 5 Fall 2017 5 / 48 When is w L k? k = 0: w L k iff w = ɛ k = 1: w L k iff w L k > 1: w L k if w = uv with u L and v L k 1 Assume w is stored in array A[1..n] IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES

Analysis IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES Output NO How many distinct sub-problems are generated by IsStringinLk(A[1..n], k)? O(nk) How much space? O(nk) pause Running time? O(n 2 k) Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 48

Analysis IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES Output NO How many distinct sub-problems are generated by IsStringinLk(A[1..n], k)? O(nk) How much space? O(nk) pause Running time? O(n 2 k) Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 48

Analysis IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES Output NO How many distinct sub-problems are generated by IsStringinLk(A[1..n], k)? O(nk) How much space? O(nk) pause Running time? O(n 2 k) Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 48

Analysis IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES Output NO How many distinct sub-problems are generated by IsStringinLk(A[1..n], k)? O(nk) How much space? O(nk) pause Running time? O(n 2 k) Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 48

Analysis IsStringinLk(A[1..n], k): If (k = 0) If (n = 0) Output YES Else Ouput NO If (k = 1) Output IsStringinL(A[1..n]) Else For (i = 1 to n 1) do If (IsStringinL(A[1..i ]) and IsStringinLk(A[i + 1..n], k 1)) Output YES Output NO How many distinct sub-problems are generated by IsStringinLk(A[1..n], k)? O(nk) How much space? O(nk) pause Running time? O(n 2 k) Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 48

Another variant Question: What if we want to check if w L i for some 0 i k? That is, is w k i =0 Li? Sariel Har-Peled (UIUC) CS374 7 Fall 2017 7 / 48

Exercise Definition A string is a palindrome if w = w R. Examples: I, RACECAR, MALAYALAM, DOOFFOOD Problem: Given a string w find the longest subsequence of w that is a palindrome. Example MAHDYNAMICPROGRAMZLETMESHOWYOUTHEM has MHYMRORMYHM as a palindromic subsequence Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 48

Exercise Definition A string is a palindrome if w = w R. Examples: I, RACECAR, MALAYALAM, DOOFFOOD Problem: Given a string w find the longest subsequence of w that is a palindrome. Example MAHDYNAMICPROGRAMZLETMESHOWYOUTHEM has MHYMRORMYHM as a palindromic subsequence Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 48

Exercise Assume w is stored in an array A[1..n] LPS(A[1..n]): length of longest palindromic subsequence of A. Recursive expression/code? Sariel Har-Peled (UIUC) CS374 9 Fall 2017 9 / 48

Part I Edit Distance and Sequence Alignment Sariel Har-Peled (UIUC) CS374 10 Fall 2017 10 / 48

Spell Checking Problem Given a string exponen that is not in the dictionary, how should a spell checker suggest a nearby string? What does nearness mean? Question: Given two strings x 1 x 2... x n and y 1 y 2... y m what is a distance between them? Edit Distance: minimum number of edits to transform x into y. Sariel Har-Peled (UIUC) CS374 11 Fall 2017 11 / 48

Spell Checking Problem Given a string exponen that is not in the dictionary, how should a spell checker suggest a nearby string? What does nearness mean? Question: Given two strings x 1 x 2... x n and y 1 y 2... y m what is a distance between them? Edit Distance: minimum number of edits to transform x into y. Sariel Har-Peled (UIUC) CS374 11 Fall 2017 11 / 48

Spell Checking Problem Given a string exponen that is not in the dictionary, how should a spell checker suggest a nearby string? What does nearness mean? Question: Given two strings x 1 x 2... x n and y 1 y 2... y m what is a distance between them? Edit Distance: minimum number of edits to transform x into y. Sariel Har-Peled (UIUC) CS374 11 Fall 2017 11 / 48

Edit Distance Definition Edit distance between two words X and Y is the number of letter insertions, letter deletions and letter substitutions required to obtain Y from X. Example The edit distance between FOOD and MONEY is at most 4: FOOD MOOD MONOD MONED MONEY Sariel Har-Peled (UIUC) CS374 12 Fall 2017 12 / 48

Edit Distance: Alternate View Alignment Place words one on top of the other, with gaps in the first word indicating insertions, and gaps in the second word indicating deletions. F O O D M O N E Y Formally, an alignment is a set M of pairs (i, j) such that each index appears at most once, and there is no crossing : i < i and i is matched to j implies i is matched to j > j. In the above example, this is M = {(1, 1), (2, 2), (3, 3), (4, 5)}. Cost of an alignment is the number of mismatched columns plus number of unmatched indices in both strings. Sariel Har-Peled (UIUC) CS374 13 Fall 2017 13 / 48

Edit Distance: Alternate View Alignment Place words one on top of the other, with gaps in the first word indicating insertions, and gaps in the second word indicating deletions. F O O D M O N E Y Formally, an alignment is a set M of pairs (i, j) such that each index appears at most once, and there is no crossing : i < i and i is matched to j implies i is matched to j > j. In the above example, this is M = {(1, 1), (2, 2), (3, 3), (4, 5)}. Cost of an alignment is the number of mismatched columns plus number of unmatched indices in both strings. Sariel Har-Peled (UIUC) CS374 13 Fall 2017 13 / 48

Edit Distance: Alternate View Alignment Place words one on top of the other, with gaps in the first word indicating insertions, and gaps in the second word indicating deletions. F O O D M O N E Y Formally, an alignment is a set M of pairs (i, j) such that each index appears at most once, and there is no crossing : i < i and i is matched to j implies i is matched to j > j. In the above example, this is M = {(1, 1), (2, 2), (3, 3), (4, 5)}. Cost of an alignment is the number of mismatched columns plus number of unmatched indices in both strings. Sariel Har-Peled (UIUC) CS374 13 Fall 2017 13 / 48

Edit Distance Problem Problem Given two words, find the edit distance between them, i.e., an alignment of smallest cost. Sariel Har-Peled (UIUC) CS374 14 Fall 2017 14 / 48

Applications 1 Spell-checkers and Dictionaries 2 Unix diff 3 DNA sequence alignment... but, we need a new metric Sariel Har-Peled (UIUC) CS374 15 Fall 2017 15 / 48

Similarity Metric Definition For two strings X and Y, the cost of alignment M is 1 [Gap penalty] For each gap in the alignment, we incur a cost δ. 2 [Mismatch cost] For each pair p and q that have been matched in M, we incur cost α pq ; typically α pp = 0. Edit distance is special case when δ = α pq = 1. Sariel Har-Peled (UIUC) CS374 16 Fall 2017 16 / 48

Similarity Metric Definition For two strings X and Y, the cost of alignment M is 1 [Gap penalty] For each gap in the alignment, we incur a cost δ. 2 [Mismatch cost] For each pair p and q that have been matched in M, we incur cost α pq ; typically α pp = 0. Edit distance is special case when δ = α pq = 1. Sariel Har-Peled (UIUC) CS374 16 Fall 2017 16 / 48

An Example Example o c u r r a n c e o c c u r r e n c e Cost = δ + α ae Alternative: o c u r r a n c e o c c u r r e n c e Cost = 3δ Or a really stupid solution (delete string, insert other string): o c u r r a n c e o c c u r r e n c e Cost = 19δ. Sariel Har-Peled (UIUC) CS374 17 Fall 2017 17 / 48

What is the edit distance between... What is the minimum edit distance for the following two strings, if insertion/deletion/change of a single character cost 1 unit? 374 473 (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 Sariel Har-Peled (UIUC) CS374 18 Fall 2017 18 / 48

What is the edit distance between... What is the minimum edit distance for the following two strings, if insertion/deletion/change of a single character cost 1 unit? 373 473 (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 Sariel Har-Peled (UIUC) CS374 19 Fall 2017 19 / 48

What is the edit distance between... What is the minimum edit distance for the following two strings, if insertion/deletion/change of a single character cost 1 unit? 37 473 (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 Sariel Har-Peled (UIUC) CS374 20 Fall 2017 20 / 48

Sequence Alignment Input Given two words X and Y, and gap penalty δ and mismatch costs α pq Goal Find alignment of minimum cost Sariel Har-Peled (UIUC) CS374 21 Fall 2017 21 / 48

Edit distance Basic observation Let X = αx and Y = βy α, β: strings. x and y single characters. Think about optimal edit distance between X and Y as alignment, and consider last column of alignment of the two strings: α x α x αx or or β y βy β y Observation Prefixes must have optimal alignment! Sariel Har-Peled (UIUC) CS374 22 Fall 2017 22 / 48

Problem Structure Observation Let X = x 1 x 2 x m and Y = y 1 y 2 y n. If (m, n) are not matched then either the mth position of X remains unmatched or the nth position of Y remains unmatched. 1 Case x m and y n are matched. 1 Pay mismatch cost α xmy n plus cost of aligning strings x 1 x m 1 and y 1 y n 1 2 Case x m is unmatched. 1 Pay gap penalty plus cost of aligning x 1 x m 1 and y 1 y n 3 Case y n is unmatched. 1 Pay gap penalty plus cost of aligning x 1 x m and y 1 y n 1 Sariel Har-Peled (UIUC) CS374 23 Fall 2017 23 / 48

Subproblems and Recurrence Optimal Costs Let Opt(i, j) be optimal cost of aligning x 1 x i and y 1 y j. Then α xi y j + Opt(i 1, j 1), Opt(i, j) = min δ + Opt(i 1, j), δ + Opt(i, j 1) Base Cases: Opt(i, 0) = δ i and Opt(0, j) = δ j Sariel Har-Peled (UIUC) CS374 24 Fall 2017 24 / 48

Subproblems and Recurrence Optimal Costs Let Opt(i, j) be optimal cost of aligning x 1 x i and y 1 y j. Then α xi y j + Opt(i 1, j 1), Opt(i, j) = min δ + Opt(i 1, j), δ + Opt(i, j 1) Base Cases: Opt(i, 0) = δ i and Opt(0, j) = δ j Sariel Har-Peled (UIUC) CS374 24 Fall 2017 24 / 48

Recursive Algorithm Assume X is stored in array A[1..m] and Y is stored in B[1..n] Array COST stores cost of matching two chars. Thus COST [a, b] give the cost of matching character a to character b. EDIST (A[1..m], B[1..n]) If (m = 0) return nδ If (n = 0) return mδ m 1 = δ + EDIST (A[1..(m 1)], B[1..n]) m 2 = δ + EDIST (A[1..m], B[1..(n 1)])) m 3 = COST [A[m], B[n]] + EDIST (A[1..(m 1)], B[1..(n 1)]) return min(m 1, m 2, m 3 ) Sariel Har-Peled (UIUC) CS374 25 Fall 2017 25 / 48

Example DEED and DREAD Sariel Har-Peled (UIUC) CS374 26 Fall 2017 26 / 48

Memoizing the Recursive Algorithm int M[0..m][0..n] Initialize all entries of M[i ][j] to return EDIST (A[1..m], B[1..n]) EDIST (A[1..m], B[1..n]) If (M[i ][j] < ) return M[i ][j] (* return stored value *) If (m = 0) M[i ][j] = nδ ElseIf (n = 0) M[i ][j] = mδ Else m 1 = δ + EDIST (A[1..(m 1)], B[1..n]) m 2 = δ + EDIST (A[1..m], B[1..(n 1)])) m 3 = COST [A[m], B[n]] + EDIST (A[1..(m 1)], B[1..(n 1)]) M[i ][j] = min(m 1, m 2, m 3 ) return M[i ][j] Sariel Har-Peled (UIUC) CS374 27 Fall 2017 27 / 48

Removing Recursion to obtain Iterative Algorithm Analysis EDIST (A[1..m], B[1..n]) int M[0..m][0..n] for i = 1 to m do M[i, 0] = i δ for j = 1 to n do M[0, j] = jδ for i = 1 to m do for j = 1 to n do α xi y j + M[i 1][j 1], M[i ][j] = min δ + M[i 1][j], δ + M[i ][j 1] 1 Running time is O(mn). Sariel Har-Peled (UIUC) CS374 28 Fall 2017 28 / 48

Removing Recursion to obtain Iterative Algorithm Analysis EDIST (A[1..m], B[1..n]) int M[0..m][0..n] for i = 1 to m do M[i, 0] = i δ for j = 1 to n do M[0, j] = jδ for i = 1 to m do for j = 1 to n do α xi y j + M[i 1][j 1], M[i ][j] = min δ + M[i 1][j], δ + M[i ][j 1] 1 Running time is O(mn). Sariel Har-Peled (UIUC) CS374 28 Fall 2017 28 / 48

Removing Recursion to obtain Iterative Algorithm Analysis EDIST (A[1..m], B[1..n]) int M[0..m][0..n] for i = 1 to m do M[i, 0] = i δ for j = 1 to n do M[0, j] = jδ for i = 1 to m do for j = 1 to n do α xi y j + M[i 1][j 1], M[i ][j] = min δ + M[i 1][j], δ + M[i ][j 1] 1 Running time is O(mn). 2 Space used is O(mn). Sariel Har-Peled (UIUC) CS374 28 Fall 2017 28 / 48

Matrix and DAG of Computation 0, 0 αx ix j δ... δ i, j......... m, n Figure: Iterative algorithm in previous slide computes values in row order. Sariel Har-Peled (UIUC) CS374 29 Fall 2017 29 / 48

Example DEED and DREAD Sariel Har-Peled (UIUC) CS374 30 Fall 2017 30 / 48

Sequence Alignment in Practice 1 Typically the DNA sequences that are aligned are about 10 5 letters long! 2 So about 10 10 operations and 10 10 bytes needed 3 The killer is the 10GB storage 4 Can we reduce space requirements? Sariel Har-Peled (UIUC) CS374 31 Fall 2017 31 / 48

Optimizing Space 1 Recall α xi y j + M(i 1, j 1), M(i, j) = min δ + M(i 1, j), δ + M(i, j 1) 2 Entries in jth column only depend on (j 1)st column and earlier entries in jth column 3 Only store the current column and the previous column reusing space; N(i, 0) stores M(i, j 1) and N(i, 1) stores M(i, j) Sariel Har-Peled (UIUC) CS374 32 Fall 2017 32 / 48

Computing in column order to save space 0, 0 αx ix j δ... δ i, j......... m, n Figure: M(i, j) only depends on previous column values. Keep only two columns and compute in column order. Sariel Har-Peled (UIUC) CS374 33 Fall 2017 33 / 48

Space Efficient Algorithm Analysis for all i do N[i, 0] = i δ for j = 1 to n do N[0, 1] = jδ (* corresponds to M(0, j) *) for i = 1 to m do α xi y j + N[i 1, 0] N[i, 1] = min δ + N[i 1, 1] δ + N[i, 0] for i = 1 to m do Copy N[i, 0] = N[i, 1] Running time is O(mn) and space used is O(2m) = O(m) Sariel Har-Peled (UIUC) CS374 34 Fall 2017 34 / 48

Analyzing Space Efficiency 1 From the m n matrix M we can construct the actual alignment (exercise) 2 Matrix N computes cost of optimal alignment but no way to construct the actual alignment 3 Space efficient computation of alignment? More complicated algorithm see notes and Kleinberg-Tardos book. Sariel Har-Peled (UIUC) CS374 35 Fall 2017 35 / 48

Part II Longest Common Subsequence Problem Sariel Har-Peled (UIUC) CS374 36 Fall 2017 36 / 48

LCS Problem Definition LCS between two strings X and Y is the length of longest common subsequence between X and Y. Example LCS between ABAZDC and BACBAD is 4 via ABAD Derive a dynamic programming algorithm for the problem. Sariel Har-Peled (UIUC) CS374 37 Fall 2017 37 / 48

LCS Problem Definition LCS between two strings X and Y is the length of longest common subsequence between X and Y. Example LCS between ABAZDC and BACBAD is 4 via ABAD Derive a dynamic programming algorithm for the problem. Sariel Har-Peled (UIUC) CS374 37 Fall 2017 37 / 48

LCS Problem Definition LCS between two strings X and Y is the length of longest common subsequence between X and Y. Example LCS between ABAZDC and BACBAD is 4 via ABAD Derive a dynamic programming algorithm for the problem. Sariel Har-Peled (UIUC) CS374 37 Fall 2017 37 / 48

Part III Maximum Weighted Independent Set in Trees Sariel Har-Peled (UIUC) CS374 38 Fall 2017 38 / 48

Maximum Weight Independent Set Problem Input Graph G = (V, E) and weights w(v) 0 for each v V Goal Find maximum weight independent set in G 2 A 5 B 15 C F 2 10 E D 20 Maximum weight independent set in above graph: {B, D} Sariel Har-Peled (UIUC) CS374 39 Fall 2017 39 / 48

Maximum Weight Independent Set Problem Input Graph G = (V, E) and weights w(v) 0 for each v V Goal Find maximum weight independent set in G 2 A 5 B 15 C F 2 10 E D 20 Maximum weight independent set in above graph: {B, D} Sariel Har-Peled (UIUC) CS374 39 Fall 2017 39 / 48

Maximum Weight Independent Set in a Tree Input Tree T = (V, E) and weights w(v) 0 for each v V Goal Find maximum weight independent set in T r 10 5 a b 8 4 c d e f g 11 4 9 3 2 h i 7 j 8 Maximum weight independent set in above tree:?? Sariel Har-Peled (UIUC) CS374 40 Fall 2017 40 / 48

Towards a Recursive Solution For an arbitrary graph G: 1 Number vertices as v 1, v 2,..., v n 2 Find recursively optimum solutions without v n (recurse on G v n ) and with v n (recurse on G v n N(v n ) & include v n ). 3 Saw that if graph G is arbitrary there was no good ordering that resulted in a small number of subproblems. What about a tree? Natural candidate for v n is root r of T? Sariel Har-Peled (UIUC) CS374 41 Fall 2017 41 / 48

Towards a Recursive Solution For an arbitrary graph G: 1 Number vertices as v 1, v 2,..., v n 2 Find recursively optimum solutions without v n (recurse on G v n ) and with v n (recurse on G v n N(v n ) & include v n ). 3 Saw that if graph G is arbitrary there was no good ordering that resulted in a small number of subproblems. What about a tree? Natural candidate for v n is root r of T? Sariel Har-Peled (UIUC) CS374 41 Fall 2017 41 / 48

Towards a Recursive Solution For an arbitrary graph G: 1 Number vertices as v 1, v 2,..., v n 2 Find recursively optimum solutions without v n (recurse on G v n ) and with v n (recurse on G v n N(v n ) & include v n ). 3 Saw that if graph G is arbitrary there was no good ordering that resulted in a small number of subproblems. What about a tree? Natural candidate for v n is root r of T? Sariel Har-Peled (UIUC) CS374 41 Fall 2017 41 / 48

Towards a Recursive Solution Natural candidate for v n is root r of T? Let O be an optimum solution to the whole problem. Case r O : Then O contains an optimum solution for each subtree of T hanging at a child of r. Case r O : None of the children of r can be in O. O {r} contains an optimum solution for each subtree of T hanging at a grandchild of r. Subproblems? Subtrees of T rooted at nodes in T. How many of them? O(n) Sariel Har-Peled (UIUC) CS374 42 Fall 2017 42 / 48

Towards a Recursive Solution Natural candidate for v n is root r of T? Let O be an optimum solution to the whole problem. Case r O : Then O contains an optimum solution for each subtree of T hanging at a child of r. Case r O : None of the children of r can be in O. O {r} contains an optimum solution for each subtree of T hanging at a grandchild of r. Subproblems? Subtrees of T rooted at nodes in T. How many of them? O(n) Sariel Har-Peled (UIUC) CS374 42 Fall 2017 42 / 48

Towards a Recursive Solution Natural candidate for v n is root r of T? Let O be an optimum solution to the whole problem. Case r O : Then O contains an optimum solution for each subtree of T hanging at a child of r. Case r O : None of the children of r can be in O. O {r} contains an optimum solution for each subtree of T hanging at a grandchild of r. Subproblems? Subtrees of T rooted at nodes in T. How many of them? O(n) Sariel Har-Peled (UIUC) CS374 42 Fall 2017 42 / 48

Towards a Recursive Solution Natural candidate for v n is root r of T? Let O be an optimum solution to the whole problem. Case r O : Then O contains an optimum solution for each subtree of T hanging at a child of r. Case r O : None of the children of r can be in O. O {r} contains an optimum solution for each subtree of T hanging at a grandchild of r. Subproblems? Subtrees of T rooted at nodes in T. How many of them? O(n) Sariel Har-Peled (UIUC) CS374 42 Fall 2017 42 / 48

Towards a Recursive Solution Natural candidate for v n is root r of T? Let O be an optimum solution to the whole problem. Case r O : Then O contains an optimum solution for each subtree of T hanging at a child of r. Case r O : None of the children of r can be in O. O {r} contains an optimum solution for each subtree of T hanging at a grandchild of r. Subproblems? Subtrees of T rooted at nodes in T. How many of them? O(n) Sariel Har-Peled (UIUC) CS374 42 Fall 2017 42 / 48

Example r 10 5 a b 8 4 c d e f g 11 4 9 3 2 h i 7 j 8 Sariel Har-Peled (UIUC) CS374 43 Fall 2017 43 / 48

A Recursive Solution T (u): subtree of T hanging at node u OPT (u): max weighted independent set value in T (u) { v child of u OPT (v), OPT (u) = max w(u) + v grandchild of u OPT (v) Sariel Har-Peled (UIUC) CS374 44 Fall 2017 44 / 48

A Recursive Solution T (u): subtree of T hanging at node u OPT (u): max weighted independent set value in T (u) { v child of u OPT (v), OPT (u) = max w(u) + v grandchild of u OPT (v) Sariel Har-Peled (UIUC) CS374 44 Fall 2017 44 / 48

Iterative Algorithm 1 Compute OPT (u) bottom up. To evaluate OPT (u) need to have computed values of all children and grandchildren of u 2 What is an ordering of nodes of a tree T to achieve above? Post-order traversal of a tree. Sariel Har-Peled (UIUC) CS374 45 Fall 2017 45 / 48

Iterative Algorithm 1 Compute OPT (u) bottom up. To evaluate OPT (u) need to have computed values of all children and grandchildren of u 2 What is an ordering of nodes of a tree T to achieve above? Post-order traversal of a tree. Sariel Har-Peled (UIUC) CS374 45 Fall 2017 45 / 48

Iterative Algorithm MIS-Tree(T ): Let v 1, v 2,..., v n be a post-order traversal of nodes of T for i = 1 to n do ( ) v M[v i ] = max j child of v i M[v j ], w(v i ) + v j grandchild of v i M[v j ] return M[v n ] (* Note: v n is the root of T *) Space: O(n) to store the value at each node of T Running time: 1 Naive bound: O(n 2 ) since each M[v i ] evaluation may take O(n) time and there are n evaluations. 2 Better bound: O(n). A value M[v j ] is accessed only by its parent and grand parent. Sariel Har-Peled (UIUC) CS374 46 Fall 2017 46 / 48

Iterative Algorithm MIS-Tree(T ): Let v 1, v 2,..., v n be a post-order traversal of nodes of T for i = 1 to n do ( ) v M[v i ] = max j child of v i M[v j ], w(v i ) + v j grandchild of v i M[v j ] return M[v n ] (* Note: v n is the root of T *) Space: O(n) to store the value at each node of T Running time: 1 Naive bound: O(n 2 ) since each M[v i ] evaluation may take O(n) time and there are n evaluations. 2 Better bound: O(n). A value M[v j ] is accessed only by its parent and grand parent. Sariel Har-Peled (UIUC) CS374 46 Fall 2017 46 / 48

Iterative Algorithm MIS-Tree(T ): Let v 1, v 2,..., v n be a post-order traversal of nodes of T for i = 1 to n do ( ) v M[v i ] = max j child of v i M[v j ], w(v i ) + v j grandchild of v i M[v j ] return M[v n ] (* Note: v n is the root of T *) Space: O(n) to store the value at each node of T Running time: 1 Naive bound: O(n 2 ) since each M[v i ] evaluation may take O(n) time and there are n evaluations. 2 Better bound: O(n). A value M[v j ] is accessed only by its parent and grand parent. Sariel Har-Peled (UIUC) CS374 46 Fall 2017 46 / 48

Iterative Algorithm MIS-Tree(T ): Let v 1, v 2,..., v n be a post-order traversal of nodes of T for i = 1 to n do ( ) v M[v i ] = max j child of v i M[v j ], w(v i ) + v j grandchild of v i M[v j ] return M[v n ] (* Note: v n is the root of T *) Space: O(n) to store the value at each node of T Running time: 1 Naive bound: O(n 2 ) since each M[v i ] evaluation may take O(n) time and there are n evaluations. 2 Better bound: O(n). A value M[v j ] is accessed only by its parent and grand parent. Sariel Har-Peled (UIUC) CS374 46 Fall 2017 46 / 48

Iterative Algorithm MIS-Tree(T ): Let v 1, v 2,..., v n be a post-order traversal of nodes of T for i = 1 to n do ( ) v M[v i ] = max j child of v i M[v j ], w(v i ) + v j grandchild of v i M[v j ] return M[v n ] (* Note: v n is the root of T *) Space: O(n) to store the value at each node of T Running time: 1 Naive bound: O(n 2 ) since each M[v i ] evaluation may take O(n) time and there are n evaluations. 2 Better bound: O(n). A value M[v j ] is accessed only by its parent and grand parent. Sariel Har-Peled (UIUC) CS374 46 Fall 2017 46 / 48

Example r 10 5 a b 8 4 c d e f g 11 4 9 3 2 h i 7 j 8 Sariel Har-Peled (UIUC) CS374 47 Fall 2017 47 / 48

Takeaway Points 1 Dynamic programming is based on finding a recursive way to solve the problem. Need a recursion that generates a small number of subproblems. 2 Given a recursive algorithm there is a natural DAG associated with the subproblems that are generated for given instance; this is the dependency graph. An iterative algorithm simply evaluates the subproblems in some topological sort of this DAG. 3 The space required to evaluate the answer can be reduced in some cases by a careful examination of that dependency DAG of the subproblems and keeping only a subset of the DAG at any time. Sariel Har-Peled (UIUC) CS374 48 Fall 2017 48 / 48