CSE 591 Foundations of Algorithms Homework 4 Sample Solution Outlines

Problem 1

(a) Consider the situation in the figure: every edge has the same weight and $|V| = n = 2k + 2$. It is easy to check that every simple path from $s$ to $t$ is a shortest path, and the number of such paths is $2^{k+1} = 2^{n/2}$, which grows faster than any polynomial function of $n$. Many other constructions work as well.

(b) We need only a small modification of Dijkstra's algorithm.

Modified Dijkstra's algorithm
    Let dis[] be the vector of distances from s
    Let num[] be the vector of numbers of shortest paths from s
    Let Q be a priority queue
    for each vertex v do
        dis[v] = ∞
        num[v] = 0
    end for
    dis[s] = 0
    num[s] = 1
    Add every vertex to Q
    while Q is not empty do
        u = vertex in Q with minimum dis[u]
        Remove u from Q
        for each neighbor v of u do
            if dis[u] + length(u, v) < dis[v] then
                dis[v] = dis[u] + length(u, v)
                num[v] = num[u]
            else if dis[u] + length(u, v) == dis[v] then
                num[v] += num[u]
            end if
        end for
    end while
    Output num[t]

Here num[t] is the number of shortest s-t paths. The algorithm is correct because num[v] is reset only when a strictly shorter path to v is found, and num[u] is added to num[v] exactly when u is a predecessor of v on some shortest path to v; by optimal substructure, every shortest path to u then extends to a shortest path to v. This assumes positive edge lengths, so that num[u] is final when u is removed from Q. The running time is the same as that of the original Dijkstra's algorithm.

Problem 2

Recall from class that, given an RNA sequence $x_1 \cdots x_n$, we computed the largest number of matching pairs $\mathrm{opt}(i, j)$ for each consecutive subsequence $x_i \cdots x_j$ with $1 \le i < j \le n$ by dynamic programming, working from the smallest values of $j - i$ to the largest. Now we want the $\kappa$ largest numbers of matching pairs over all solutions; in the problem statement $\kappa = 10$, but we state it generally. We keep track of the best $\kappa$ sizes for each choice of $i$ and $j$, so that $\mathrm{opt}(i, j)$ is now a multiset of size at most $\kappa$. The only change we make is in the recurrence for $\mathrm{opt}(i, j)$, which we compute as follows.

When $j - i \le 4$, the multiset is $\mathrm{opt}(i, j) = \{0\}$, because sharp turns are not permitted. For two multisets $A$ and $B$, let $A \oplus B$ be the multiset $\{a + b : a \in A,\ b \in B\}$. To compute $\mathrm{opt}(i, j)$, we form a multiset $X$ of candidates. Initially $X = \mathrm{opt}(i, j - 1)$, which covers the solutions in which $x_j$ is unpaired. For every $t$ with $i \le t < j - 4$ for which $\{x_j, x_t\} = \{C, G\}$ or $\{x_j, x_t\} = \{A, U\}$, add $1 + e$ to $X$ for each element $e$ of $\mathrm{opt}(i, t - 1) \oplus \mathrm{opt}(t + 1, j - 1)$. Finally, remove all but the $\kappa$ largest elements of $X$; the result is $\mathrm{opt}(i, j)$.

Keeping track of the actual sets of matching pairs is now straightforward: each size in one of the multisets can be associated with a specific set of matching pairs. When $\kappa$ is fixed, the running time is the same as that of the original algorithm, up to the constant factors hidden in the $O(\cdot)$ notation.

Problem 3

To avoid kinks, we use different states. Let $\mathrm{opt}(i, j)$ be the optimal number of matching pairs between positions $i$ and $j$ with no kinks, sharp turns, or crossings.
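Because the states used here extend the table from Problem 2, a small Python sketch of that table computation may be a useful reference at this point. It is an illustration only, not the course's reference code: the function name `top_k_pair_counts`, the 0-based indexing, and the use of `heapq.nlargest` are assumptions of the sketch, and it explicitly includes the candidate in which the last base is left unpaired. With `kappa = 1` it reduces to the single-value table recalled from class.

```python
import heapq

# Watson-Crick pairs allowed in the matching.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def top_k_pair_counts(x: str, kappa: int) -> list[int]:
    """Return the (at most) kappa largest pair counts for x, largest first."""
    n = len(x)
    opt: dict[tuple[int, int], list[int]] = {}

    def get(i: int, j: int) -> list[int]:
        # Intervals too short for any pair (j - i <= 4) hold only the value 0.
        return opt[(i, j)] if j - i > 4 else [0]

    for span in range(5, n):            # smallest j - i first
        for i in range(n - span):
            j = i + span
            cand = list(get(i, j - 1))  # assumption: x[j] left unpaired
            for t in range(i, j - 4):   # x[t] paired with x[j]
                if (x[t], x[j]) in PAIRS:
                    left, right = get(i, t - 1), get(t + 1, j - 1)
                    # One candidate per combination of a left and a right value.
                    cand.extend(1 + a + b for a in left for b in right)
            opt[(i, j)] = heapq.nlargest(kappa, cand)
    return get(0, n - 1)
```

Each table entry combines at most $\kappa^2$ sums per split point $t$, which is a constant when $\kappa$ is fixed.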
In addition, let $f(i, j)$ be the optimal number of pairs between positions $i$ and $j$ when $x_i$ and $x_j$ can be matched, $x_i$ is forced to be matched to $x_j$, and all other pairs inside are valid. We can then derive recurrences relating $\mathrm{opt}(i, j)$ and $f(i, j)$.

If $x_j$ is not involved in any pair, then

    $\mathrm{opt}(i, j) = \mathrm{opt}(i, j - 1)$.

Otherwise,

    $\mathrm{opt}(i, j) = \max_t \{\mathrm{opt}(i, t - 1) + f(t, j)\}$

over all $t$ such that $x_t$ can be matched to $x_j$ and $j > t + 4$. The idea is to enumerate the pair in which $x_j$ is involved; by definition, its contribution is $f(t, j)$. The rest of the sequence becomes the subproblem $\mathrm{opt}(i, t - 1)$. The value of $\mathrm{opt}(i, j)$ is the larger of the two cases.

Next we look at the structure of $f(i, j)$. By definition, $x_i$ is matched to $x_j$, and we need to handle the subsequence from $x_{i+1}$ to $x_{j-1}$. If we use $\mathrm{opt}(i + 1, j - 1)$ directly, we may fall into a trap, since $\mathrm{opt}(i + 1, j - 1)$ may be obtained from $f(i + 1, j - 2)$ or $f(i + 2, j - 1)$, which forms a kink. To avoid this, we enumerate the matching pairs in which $x_{i+1}$ and $x_{j-1}$ are involved. Initially we set every $f(i, j)$ to 1, and then apply the following updates.

First, we focus on $x_{i+1}$:

    $f(i, j) = \max\{f(i, j),\ 1 + \max_t \{f(i + 1, t) + \mathrm{opt}(t + 1, j - 1)\}\}$

where $t$ ranges over all $t > i + 1 + 4$ such that $x_t$ can be paired with $x_{i+1}$ and $t \ne j - 2$. Again, we are enumerating the pair in which $x_{i+1}$ is matched, and it will not create a kink.

Similarly, we handle $x_{j-1}$:

    $f(i, j) = \max\{f(i, j),\ 1 + \max_t \{\mathrm{opt}(i + 1, t - 1) + f(t, j - 1)\}\}$

for all $t < j - 1 - 4$ such that $x_t$ can be paired with $x_{j-1}$, except $t = i + 2$.

Finally, we also consider the case where neither $x_{i+1}$ nor $x_{j-1}$ is in the substructure:

    $f(i, j) = \max\{f(i, j),\ 1 + \mathrm{opt}(i + 2, j - 2)\}$.

For the edge cases, we have

    $f(i, j) = 1$ if $j = i + 5$,
    $\mathrm{opt}(i, j) = 0$ if $i \ge j - 4$.

All entries can be computed in $O(n^3)$ time, where $n$ is the length of the whole sequence.

Problem 4

Let $|V| = n$ and $w(e) = 1$ for every edge. Select an arbitrary vertex $r$ and treat it as the root of the tree. The distance from $r$ to every vertex can be calculated using, for example, breadth-first search in time linear in the number of edges (and hence linear in the number of vertices, because $G$ is a tree). The total of the distances from $r$ can then be calculated, and dividing this total by $n$ yields the average distance from $r$. Our objective, therefore, is to select a root $r$ that minimizes the total of the distances from $r$.

Suppose that we have rooted the tree at $r$. We calculate, for each vertex $v$, the number $\delta_v$ of its descendants (counting $v$ itself) as follows. First compute the degree $d_v$ of each vertex in $O(n)$ time; in the process form a list $L$ of non-root vertices of degree 1. Initialize $\delta_v = 1$ for each vertex. Now while $L$ is not empty, choose $v \in L$, let $w$ be the parent of $v$, add $\delta_v$ to $\delta_w$, delete $v$, and if $w$ now has degree 1 (i.e., has no other children), add $w$ to $L$ unless it is the root. When $L$ becomes empty, $\delta_v$ is the count of descendants of $v$ for every vertex $v$.

We claim that $r$ is a correct vertex at which to place the CA if and only if every child $c$ of $r$ has $\delta_c \le n/2$. To see this, first suppose that some child $c$ of $r$ has $\delta_c > n/2$. Moving the root from $r$ to $c$ adds 1 to the distance to $n - \delta_c$ vertices but subtracts 1 from the distance to $\delta_c$ vertices. Hence the total distance decreases by moving the root to $c$, and $r$ is not the correct vertex to choose.

In the other direction, suppose that every child $c$ of $r$ has $\delta_c \le n/2$. Suppose to the contrary that there is a vertex for which the total of the distances is less than from $r$; among such vertices, choose the one, $f$, that is closest to $r$. Now $f$ must be a child $c$ of $r$ or a descendant of one, and hence $f$ has at most $n/2$ descendants. But by the argument above, moving from $f$ to its parent cannot increase the total of the distances, contradicting the choice of $f$. So $r$ is indeed a correct choice.

This underlies the algorithm. Having tried $r$, we check whether any child of $r$ has more than $n/2$ descendants. If none does, we respond with $r$. Otherwise we choose such a child $c$ and move the root from $r$ to $c$. To update the numbers of descendants, only two changes are needed: if $r$ had $n$ descendants and $c$ had $\delta_c$, then $r$ now has $n - \delta_c$ and $c$ has $n$. This takes constant time.
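The root-moving procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: vertices are numbered 0 to n-1, the tree is given as an edge list, the function name `best_root` is hypothetical, and the descendant counts are accumulated in reverse breadth-first order instead of with the leaf list $L$ (a substitution that should yield the same $\delta_v$ values). Because the old root is left with fewer than $n/2$ descendants after each move, the sketch only inspects children in the original rooting.

```python
from collections import deque


def best_root(n: int, edges: list[tuple[int, int]]) -> int:
    """Return a vertex of the tree minimizing the total distance to all vertices."""
    adj: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Breadth-first search from the arbitrary root 0: parents and a top-down order.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                queue.append(v)

    # delta[v] counts v together with its descendants (children before parents).
    delta = [1] * n
    for v in reversed(order):
        if parent[v] != -1:
            delta[parent[v]] += delta[v]

    # Move the root into a child holding more than n/2 vertices, while one exists.
    r = 0
    while True:
        heavy = next(
            (c for c in adj[r] if parent[c] == r and 2 * delta[c] > n), None
        )
        if heavy is None:
            return r
        r = heavy
```

For the path 0-1-2-3-4, `best_root(5, [(0, 1), (1, 2), (2, 3), (3, 4)])` returns the middle vertex 2.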
We can move the root no more than $n$ times, so after $O(n)$ moves, each taking constant time, we report a correct vertex.

Problem 5

Suppose that the given English word is $W = w_1 w_2 \cdots w_n$ and the Bengali word is $B = b_1 b_2 \cdots b_m$. We suppose that both are written in a standard alphabet $\Sigma$. We are to find a word $P = p_1 \cdots p_\ell$ for which

    $\max\{ed(W, P),\ ed(B, P)\} = \min_{Q \in \Sigma^*} \max\{ed(W, Q),\ ed(B, Q)\}$.

In defining the edit distance $ed(x, y)$, we assume that the gap penalty is $\delta > 0$, that the mismatch penalty is $\alpha_{aa} = 0$ for $a \in \Sigma$, and that the mismatch penalty is $\alpha_{ab} = \alpha_{ba} > 0$ for distinct $a, b \in \Sigma$. As in class, we can form a directed graph $G$ on $(n + 1)(m + 1)$ vertices, say $\{0, \ldots, n\} \times \{0, \ldots, m\}$. Draw a directed edge from $(i, j)$ to $(i + 1, j)$ with cost $\delta$, a directed edge from $(i, j)$ to $(i, j + 1)$ with cost $\delta$, and a directed edge from $(i, j)$ to $(i + 1, j + 1)$ with cost $\alpha_{w_{i+1} b_{j+1}}$. Then $ed(W, B)$ is precisely the length of a shortest path from $(0, 0)$ to $(n, m)$ in $G$. Intuitively we want to split this path in half to find a good candidate for PIE, but we need to be careful.
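Since every edge of $G$ points toward larger coordinates, $G$ is acyclic, and the shortest-path lengths from $(0, 0)$ can be filled in row by row. The Python sketch below does this; the function name `edit_distance` and the callable `mismatch` (standing in for the penalties $\alpha_{ab}$) are assumptions of the illustration, not part of the original solution.

```python
from math import inf
from typing import Callable


def edit_distance(
    w: str, b: str, gap: float, mismatch: Callable[[str, str], float]
) -> float:
    """Length of a shortest path from (0, 0) to (n, m) in the grid graph G."""
    n, m = len(w), len(b)
    # dist[i][j] is the shortest-path length from (0, 0) to vertex (i, j).
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:              # edge (i-1, j) -> (i, j), cost gap
                best = min(best, dist[i - 1][j] + gap)
            if j > 0:              # edge (i, j-1) -> (i, j), cost gap
                best = min(best, dist[i][j - 1] + gap)
            if i > 0 and j > 0:    # diagonal edge, cost alpha for the two letters
                best = min(best, dist[i - 1][j - 1] + mismatch(w[i - 1], b[j - 1]))
            dist[i][j] = best
    return dist[n][m]
```

With a gap penalty of 1 and a mismatch penalty of 1 for distinct letters, `edit_distance("kitten", "sitting", 1, lambda a, c: 0 if a == c else 1)` evaluates to 3.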

Consider a particular shortest path from $(0, 0)$ to $(n, m)$ in $G$. Let $L$ be the list of insertions, deletions, and substitutions performed on $W$ in following this path, and let $t$ denote the total cost of the operations in $L$. No operation in $L$ can cost more than $2\delta$: if one did, it would be a substitution, and it could be replaced by an insertion and a deletion, lowering the total cost. Choose a subset $L'$ of $L$ whose costs total $t' \le t/2$, as close to $t/2$ as possible. Applying the operations in $L'$ to $W$ yields a word $P$ with $ed(W, P) = t'$ and $ed(P, B) = t - t'$. We note that edit distance satisfies the triangle inequality $ed(W, B) \le ed(W, P) + ed(P, B)$, so $P$ appears to be a good candidate if $t'$ and $t - t'$ are as equal as possible.

This is a good start, but we cannot be sure that we have selected the right path, and we have not said how to find $L'$. I would be very happy if anyone had gotten to a similar point in developing an answer.

To get an exact answer, the idea is to build paths starting from $(0, 0)$ in $G$, keeping track at each vertex $(i, j)$ of a set of ordered pairs, each of which specifies a distance from $w_1 \cdots w_i$ and from $b_1 \cdots b_j$ to a closest candidate in PIE. The concern is that there appear to be too many pairs at each vertex to keep track of; but note that if one pair has both entries at least as large as the corresponding entries in another pair, we do not need to keep it. Together with some plausible assumptions about the mismatch penalties, we can then ensure that the list of pairs at each vertex has polynomial length. Needless to say, I have omitted many details.

And how might one find all? Sketch: enumerate paths in $G$ of close to shortest length, and consider each way to split the operations of each path into two sets of approximately equal cost.
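The dominance rule for ordered pairs mentioned in the exact approach can be illustrated in isolation. The following Python sketch keeps only the pairs that are not dominated by another pair; the function name `prune_dominated` and the representation of pairs as tuples are assumptions of the illustration, and the sketch does not attempt the full dynamic program, whose details the solution leaves open.

```python
from math import inf


def prune_dominated(pairs: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Drop every pair whose two entries are both >= those of another pair."""
    kept = []
    best_second = inf
    # Sorted by first entry, then second; duplicates collapse via set().
    for a, b in sorted(set(pairs)):
        # Every earlier pair has first entry <= a, so (a, b) survives only
        # if its second entry beats all of them.
        if b < best_second:
            kept.append((a, b))
            best_second = b
    return kept
```

For example, `prune_dominated([(3, 1), (1, 3), (2, 2), (3, 3), (2, 5)])` keeps `(1, 3)`, `(2, 2)`, and `(3, 1)`, since `(3, 3)` and `(2, 5)` are both dominated by `(2, 2)`.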