Lecture 4: An FPTAS for Knapsack, and K-Center


Comp 260: Advanced Algorithms
Tufts University, Spring 2016
Prof. Lenore Cowen
Scribe: Eric Bailey

1 Introduction

Definition 1.0.1 (The Knapsack problem, restated). Given n objects {a_1, ..., a_n} with sizes {s_1, ..., s_n} and profits {p_1, ..., p_n}, and a knapsack with capacity B, where s_i, p_i, B ∈ N and s_i ≤ B for all i, find a subset of objects whose total size is bounded by B and whose total profit is maximized.

2 Hardness of Approximation

Theorem 2.0.2. If P ≠ NP, then for any fixed constant k, no polynomial-time algorithm can solve the Knapsack Problem to within an additive error of k, i.e., always return a solution of profit at least p* − k, where p* is the optimal profit.

Proof. Assume there exists a polynomial-time algorithm A with additive performance guarantee k > 0 on all instances of the Knapsack Problem. We show that A can be used to construct an optimal solution, of value p*, in polynomial time.

Suppose we are given an instance I = {⟨a_i, s_i, p_i⟩} of Knapsack of size n and capacity B. Let I' = {⟨a_i', s_i', p_i'⟩}, where a_i' = a_i, s_i' = s_i, p_i' = (k + 1) p_i, and B' = B.

Definition 2.0.3. A solution is feasible if it can fit in the knapsack.

Remark 2.0.4. A feasible solution for I, i.e., a set of objects that fit in the knapsack, is exactly a feasible solution for I', since the sizes and the capacity are unchanged.

Run algorithm A on I', which yields a solution M with

    Profit_{I'}(M) ≥ p*_{I'} − k,

where p*_{I'} = (k + 1) p*_I. Considering the same solution M on I yields

    (k + 1) Profit_I(M) ≥ (k + 1) p*_I − k.

Dividing by k + 1,

    p*_I − Profit_I(M) ≤ k / (k + 1) < 1.

But since all profits are integral, p*_I − Profit_I(M) = 0. Therefore M is an optimal solution to the Knapsack Problem, found in polynomial time, which is impossible unless P = NP.

Definition 2.0.5. Let π be an optimization problem with objective function f_π and optimal value OPT. A is an approximation scheme for π if on input (I, ε), where I is an instance of π and ε > 0 is an error parameter, it outputs a solution S such that:

    f_π(S) ≤ (1 + ε) · OPT   if π is a minimization problem;
    f_π(S) ≥ (1 − ε) · OPT   if π is a maximization problem.

Definition 2.0.6. A is said to be a PTAS (Polynomial Time Approximation Scheme) if for each fixed ε > 0, its running time is polynomial in the size of instance I.

Definition 2.0.7. A is said to be an FPTAS (Fully Polynomial Time Approximation Scheme) if the running time of A is bounded by a polynomial in both the size of I and 1/ε.

Claim 2.0.8. For ε < 1, there exists an algorithm giving a (1 − ε) p* solution in O(n³/ε) time for the Knapsack Problem.

3 An exact dynamic programming algorithm for Knapsack

Let p_max be the profit of the most profitable object, p_max = max_{i ≤ n} p_i, and let p* denote the optimal value, the most profit we can take home in the knapsack. Then it follows that

    n · p_max ≥ p*.

This is clear, since p* is at most the sum of all n of the p_i's, and this sum is in turn at most n · p_max.

For each i ∈ {1, ..., n} and p ∈ {1, ..., n · p_max}, let S_{i,p} denote a subset of {a_1, ..., a_i} whose total profit is exactly p and whose total size is minimized. Let A(i, p) denote the size of S_{i,p}, where A(i, p) = ∞ if no such set S_{i,p} exists. Thus p* can be expressed as:

    p* = max{ p : A(n, p) ≤ B }.

We can use a dynamic programming algorithm which runs in O(n² · p_max) time to compute all the A(i, p)'s, and then read off the largest profit achievable within capacity B, thus solving the Knapsack problem.

Wait! I thought this was an NP-hard problem? Didn't you contradict yourself by stating a polynomial running time? Actually, no. It would be polynomial if p_max were polynomial in n, but we are not guaranteed this. (If we were, then yes, this algorithm would run in polynomial time.) Instead, this algorithm is called pseudo-polynomial: p_max is written in binary in the input, so it contributes only O(log p_max) bits to the input length, and the value of p_max can therefore be exponential, as large as roughly 2^n, with respect to the size of the input.

3.1 Dynamic Programming Algorithm for Knapsack

Goal: compute A(i, p) for i ∈ {1, ..., n} and p ∈ {1, ..., n · p_max} in time O(n² · p_max) using dynamic programming.

First, compute A(1, p) for each p in {1, ..., n · p_max}. That's simply:

    A(1, p) = s_1   if p = p_1,
    A(1, p) = ∞     if p ≠ p_1.

To demonstrate, here is an example knapsack with objects, sizes and profits as specified:

    Object   A  B  C  D  E
    Size     7  2  9  3  1
    Profit   3  2  3  1  2

We can construct a table in which to store the results of our dynamic programming algorithm. The number of columns in the table is n · p_max (in this case, 5 · 3 = 15). The first row is as follows (blank entries denote ∞):

    p        1  2  3  4  5  6  7  8  ...  15
    A(1, p)        7

To calculate A(2, p) and so on, we use the following recurrence (with the convention A(i, 0) = 0):

    A(i + 1, p) = min( A(i, p), s_{i+1} + A(i, p − p_{i+1}) )   if p_{i+1} ≤ p,
    A(i + 1, p) = A(i, p)                                       otherwise.

Using this recurrence, we can fill in the next few rows of the table like so (blank entries denote ∞):

    p        1  2  3  4  5  6   7  8   ...  15
    A(1, p)        7
    A(2, p)     2  7     9
    A(3, p)     2  7     9  16     18

Given the position in question, the recurrence gives a choice between the value in the column directly above (calculated without taking the (i+1)-st element into consideration), or the value obtained by using the (i+1)-st element plus whatever is in the table using the first i elements to generate profit p − p_{i+1}. For example, to calculate A(3, 5) we notice that p_3 ≤ 5 (indeed 3 < 5), so we have a choice between the value in A(2, 5), which is 9, and s_3 + A(2, 2), which equals 9 + 2 = 11. Clearly 9 is the minimum of the two and gets assigned as the value of A(3, 5).

Once the table is completely filled, we scan the last row from the highest profit column right to left, looking for the first entry of size at most B. That profit is our p*. END

Also note that we must store backpointers recording where each entry came from, in order to recover the actual set of items responsible for the values in the matrix. An entry is filled either from the entry directly above it, A(i, p), or from an entry above and to the left, A(i, p − p_{i+1}). Pointers to the upper left indicate that we chose item i + 1; pointers that point straight up indicate that we did not.

The problem with this method is that there could be very many columns for a large enough p_max. So the next question is: how do we turn this into an approximation algorithm that runs in polynomial time regardless of p_max?
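Before turning to the FPTAS, here is a minimal Python sketch of the exact dynamic program just described, including the backpointer-style recovery of the chosen set. It is ours, not the notes' (the notes give no code); the function name and the use of math.inf for ∞ are our own choices.

    import math

    def knapsack_exact(sizes, profits, B):
        # A[i][p] = minimum total size of a subset of the first i items with
        # total profit exactly p (math.inf if no such subset exists).
        # Column p = 0 plays the role of the convention A(i, 0) = 0.
        n = len(sizes)
        P = n * max(profits)
        A = [[math.inf] * (P + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            A[i][0] = 0
        for i in range(1, n + 1):
            s, p = sizes[i - 1], profits[i - 1]
            for q in range(1, P + 1):
                A[i][q] = A[i - 1][q]                    # skip item i
                if p <= q and A[i - 1][q - p] + s < A[i][q]:
                    A[i][q] = A[i - 1][q - p] + s        # take item i
        # p* = largest profit achievable within capacity B (right-to-left scan).
        p_star = max((q for q in range(P + 1) if A[n][q] <= B), default=0)
        # Recover the chosen items: an entry that differs from the one directly
        # above it must have come from the upper left, i.e., item i was taken.
        chosen, q = [], p_star
        for i in range(n, 0, -1):
            if A[i][q] != A[i - 1][q]:
                chosen.append(i - 1)
                q -= profits[i - 1]
        return p_star, chosen

On the example instance from the table above, with a hypothetical capacity B = 10 (the notes never fix a capacity), knapsack_exact([7, 2, 9, 3, 1], [3, 2, 3, 1, 2], 10) returns (7, [4, 1, 0]): items E, B and A, of total size 10 and profit 7.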

4 An FPTAS for Knapsack

In this section we construct an FPTAS for Knapsack; we'll refer to it as KNAPSACK-FPTAS. To make the algorithm run in polynomial time, we simply ignore a certain number of the least significant digits of the profits. We get a pretty good approximation (by only looking at the most important digits), but not a perfect one, as we are losing information.

Steps:

1. Given ε > 0, let k = ε · p_max / n.

2. For each a_i, define p_i' = ⌊p_i / k⌋.

3. Let I' = {⟨a_i', s_i', p_i'⟩}, where a_i' = a_i, s_i' = s_i, and p_i' is as shown above.

The dynamic programming algorithm for solving Knapsack is then applied to the new instance I', yielding a set S'. The algorithm outputs the more profitable of S' and S_max, where S_max consists of a single minimum-size object of profit p_max (which fits in the knapsack, since every object does). A code sketch of the full scheme appears at the end of this section.

Lemma 4.0.1. Let A denote the set output by KNAPSACK-FPTAS. Then

    profit(A) ≥ p* / (1 + ε).

Proof. Let O denote a set achieving profit p* on an instance I of Knapsack. We now reason about I and the associated rounded-profit instance I' defined above. Note that any feasible solution of I (meaning the items fit in the knapsack) corresponds to a feasible solution of I' and vice versa, since the objects and sizes haven't changed; only the profits have. For a set of objects N, we denote by Profit(N) its profit under the original instance, and by Profit'(N) its profit under the new rounded profits.

For every object a, k · p_a' can be smaller than p_a (because of the floor function), but not by more than k. This follows from the definition of p_i'. Restated:

    p_a − k · p_a' ≤ k.

Summing over the at most n objects of O:

    Profit(O) − k · Profit'(O) ≤ n k.    (1)

Now, S' is optimal for the rounded instance, which means

    Profit'(S') ≥ Profit'(Y) for any Y that fits in the knapsack.    (2)

In particular:

    Profit'(S') ≥ Profit'(O).    (3)

Multiplying both sides by k, we get:

    k · Profit'(S') ≥ k · Profit'(O).    (4)

Therefore:

    Profit(S') ≥ k · Profit'(S')     (by the definition of Profit')
               ≥ k · Profit'(O)      (by (4))
               ≥ Profit(O) − n k     (by (1))
               = p* − n k            (by the definition of O)
               = p* − ε · p_max      (by the definition of k).

We also know that

    Profit(A) ≥ p_max    (5)

and that

    Profit(A) ≥ Profit(S'),    (6)

since A is the more profitable of S_max and S'. Therefore:

    Profit(A) ≥ Profit(S')            (by (6))
              ≥ p* − ε · p_max        (shown above)
              ≥ p* − ε · Profit(A)    (by (5)).

By simple algebra, we get:

    Profit(A) ≥ p* / (1 + ε),

which completes our proof of the lemma.

4.1 Proof of Polynomial Time Execution

Theorem 4.1.1. KNAPSACK-FPTAS is an FPTAS.

Proof. By the lemma, the solution is within a factor of 1/(1 + ε) of p*. By the definition of k, the running time is:

    O(n² · p_max / k) = O(n² · n / ε) = O(n³ / ε).

QED.

Note: the smaller the ε, i.e., the closer you want to get to p*, the more the running time inflates.

Definition 4.1.2. A problem Π is strongly NP-hard if every problem in NP can be polynomially reduced to Π in such a way that all numbers in the reduced instance can be written in unary.

Note: if a problem has an FPTAS, it can't be strongly NP-hard (assuming P ≠ NP).
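As promised, here is a minimal Python sketch of KNAPSACK-FPTAS. It assumes the knapsack_exact routine sketched at the end of Section 3; as before, the names are our own, not the notes'.

    import math

    def knapsack_fptas(sizes, profits, B, eps):
        n = len(profits)
        p_max = max(profits)
        k = eps * p_max / n                              # Step 1
        rounded = [math.floor(p / k) for p in profits]   # Step 2: drop low-order digits
        # Step 3: solve the rounded instance I' exactly. Its table has only
        # n * floor(p_max / k) = n * floor(n / eps) columns, so this takes
        # O(n^3 / eps) time regardless of the original p_max.
        _, s_prime = knapsack_exact(sizes, rounded, B)
        # S_max: a minimum-size object of profit p_max (it fits, since s_i <= B).
        i_max = min((i for i in range(n) if profits[i] == p_max),
                    key=lambda i: sizes[i])
        # Output the more profitable of the two candidates, as in Lemma 4.0.1.
        return max([s_prime, [i_max]],
                   key=lambda S: sum(profits[i] for i in S))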

5 k-center Problem

Imagine we have a complete, undirected graph, where each node is a city and edge weights represent the shortest distances between these cities. We have funds to build exactly k emergency centers. The k-center problem with triangle inequality is to place our k emergency centers such that no one has to go too far to get to their closest center.

k-center Problem:

Input: A complete undirected graph G = (V, E) whose edge weights are the shortest-path distances between each pair of nodes. Let D_ij denote the distance between nodes i and j. (Remember, if we start with an incomplete graph, we can make it complete by adding edges, where D_ij is the length of the shortest existing path between i and j.)

Output: A subset of nodes S ⊆ V with |S| = k, such that the longest distance of a node to its closest node in S is minimized. Specifically, we want to minimize

    cost(S) = max_{j ∈ V} min_{i ∈ S} D_ij.

Our Approximation Algorithm:

We assume that G satisfies the triangle inequality (i.e., D_ik ≤ D_ij + D_jk for all i, j, k ∈ V). First, we sort the edges e_1, e_2, ..., e_m in order of cost, so that cost(e_1) ≤ cost(e_2) ≤ ... ≤ cost(e_m). Then we add the lightest edge e_1 and look at that graph, then add the edges e_1, e_2 and look at that graph, then edges e_1, e_2, e_3, and so on. Let G_i = (V, E_i), where E_i = {e_1, e_2, ..., e_i}. Note that our original graph G is G_m.

Definition 5.0.3. A dominating set of G is a subset S ⊆ V such that every node in V − S is adjacent to a vertex in S. (That is, for each node, either you're in the dominating set, or you have a neighbor who is.)

Claim 5.0.4. The optimal solution to a k-center problem is a dominating set in G. (Note that this is a bit of a trick, because G is complete: any single vertex, or any set of vertices, of a complete graph is a dominating set!)

Claim 5.0.5. The optimal solution to a k-center problem in G is a dominating set in G_i for some index i ≤ m.

This claim is trivially true (take i = m). Now look at the graphs G_1, G_2, G_3, ..., G_{m−2}, G_{m−1}, G_m. As we move backward from G_m, at which point do we no longer have a dominating set of size k?

Let C* be the cost of an optimal solution to k-center in G, and let e_c be the LAST edge of cost C* in the sorted order. Remember, this edge is not necessarily unique! Multiple edges could have the same cost, so e_c is the last edge of cost C*, i.e., cost(e_j) > C* for every j > c. If we consider G_c, we have a graph that includes only edges of cost up to C*. For example, if we want to get everyone to an emergency center in 20 minutes or less, we ignore all edges that take more than 20 minutes and consider the resulting graph.

Claim 5.0.6. There is a dominating set in G_c of size k or less, and if we can find it, we have our solution to k-center.

Claim 5.0.7. There is no dominating set in G_{c−1} of size k or less.

(For ease of argument, let's assume all edges have distinct costs.) This claim follows by contradiction. Suppose otherwise, that is, suppose there is a dominating set in G_{c−1} of size k or less.

This feasible dominating set is a solution to k-center of cost less than C*, which is a contradiction, since C* is optimal.

According to the two claims above, the k-center problem with triangle inequality is equivalent to finding the smallest index i such that G_i has a dominating set of size at most k. But finding a minimum dominating set is, like k-center, NP-hard, so instead we approximate k-center by lower-bounding the size of the minimum dominating set in G_i.

Definition 5.0.8. The square of a graph G = (V, E), denoted G² = (V, E²), has an edge between i and j if and only if there is a path of length 1 or 2 between i and j in G.

Notes: We can compute the square of a graph by multiplying its adjacency matrix with itself. The cube of a graph G, denoted G³, has an edge between i and j if there exists a path of length 1, 2, or 3 between i and j. This can be extended to create G⁴, G⁵, and so on. Squaring makes no difference for G itself (which is complete), but we are also going to be looking at the graphs G_i for various i, and those are certainly not complete.

Definition 5.0.9. An independent set in a graph G = (V, E) is a set S ⊆ V such that for all i ∈ S, if (i, j) ∈ E, then j ∉ S.

Definition 5.0.10. A maximal independent set (MIS) in a graph G = (V, E) is an independent set S such that for every v ∈ V, either v ∈ S, or there exists u ∈ S with (u, v) ∈ E. (In other words, it is an independent set that cannot be extended by adding any vertex.)

Finding a maximum independent set of a graph is NP-hard, but a maximal independent set can be found in polynomial time by a simple greedy algorithm.

Lemma 5.0.11. Let H = (V, E), and let I be an independent set in H². Then |I| ≤ dom(H), where dom(H) denotes the size of a minimum cardinality dominating set in H. (Note: dom(H) is NP-hard to compute, but the size of any independent set in H² is a lower bound on it.)

Proof. Let D be a minimum cardinality dominating set in H. For each vertex d ∈ D, its closed neighborhood in H forms a clique in H². So H² contains |D| cliques that together span all the vertices, which implies that any independent set in H² can pick at most one vertex per clique. So |I| ≤ |D|. (If we start with a vertex and its neighbors and square the graph, we have a clique! Therefore, we can only take one vertex from each of these cliques / neighborhoods.)

6 2-Approximation Algorithm for k-center

Algorithm A:

1. Construct G_1², G_2², ..., G_m².

2. Compute a maximal independent set (MIS) L_i in each graph G_i².

3. Find the smallest index i such that |L_i| ≤ k, and call that index j.

4. Return L_j.

Lemma 6.0.12. For j as defined in the algorithm above, cost(e_j) ≤ C*. (Note that e_j is the most expensive edge in G_j.)

Proof. For every i < j, we have |L_i| > k, and since L_i is an independent set in G_i², Lemma 5.0.11 gives dom(G_i) ≥ |L_i| > k. By Claims 5.0.6 and 5.0.7, C* = cost(e_c), where c is the smallest index such that G_c has a dominating set of size at most k. Since dom(G_i) > k for every i < j, we must have c ≥ j, and therefore cost(e_j) ≤ cost(e_c) = C*.

Theorem 6.0.13. Algorithm A returns a solution of cost at most 2 · OPT.

Proof. Observe that a maximal independent set in H² is also a dominating set in H² (any maximal independent set is a dominating set, but not vice versa). Thus L_j is a dominating set in G_j², so in the original graph G_j every vertex is connected to some vertex of L_j by a path of at most 2 edges. Since cost(e_j) ≤ C* by Lemma 6.0.12, every edge of G_j has cost at most C*, so such a path has total cost at most 2C*. Thus, by the triangle inequality, every vertex is within distance 2C* of its closest center in L_j.
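Here is a minimal Python sketch of Algorithm A. The helper names and the adjacency-set representation are our own, greedy_mis is the simple greedy algorithm alluded to in Section 5, and the edge-by-edge rebuilding of the squared graph is written for clarity, not speed.

    from itertools import combinations

    def greedy_mis(nodes, adj):
        # Greedy maximal independent set: take each vertex unless an already
        # chosen vertex excludes it, then exclude all of its neighbors.
        mis, excluded = [], set()
        for v in nodes:
            if v not in excluded:
                mis.append(v)
                excluded.add(v)
                excluded.update(adj[v])
        return mis

    def k_center_2approx(nodes, D, k):
        # D maps frozenset({u, v}) to the distance between u and v.
        edges = sorted(D, key=D.get)          # cost(e_1) <= ... <= cost(e_m)
        adj = {v: set() for v in nodes}
        for e in edges:                       # grow G_1, G_2, ..., G_m
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
            # Square the current graph G_i: join vertices at hop-distance 1 or 2.
            adj2 = {w: set(adj[w]) for w in nodes}
            for w in nodes:
                for x in adj[w]:
                    adj2[w].update(adj[x] - {w})
            # Return the MIS at the first index where it has size at most k;
            # by Theorem 6.0.13 its cost is at most 2 * C*.
            L = greedy_mis(nodes, adj2)
            if len(L) <= k:
                return L
        return list(nodes)[:k]                # never reached for k >= 1: G_m is complete

    # Hypothetical demo: four points on a line, metric = absolute difference.
    pts = [0, 1, 2, 10]
    D = {frozenset({a, b}): abs(a - b) for a, b in combinations(pts, 2)}
    print(k_center_2approx(pts, D, 2))        # prints [0, 10]: cost 2 <= 2 * C* = 2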

We have shown a 2-approximation to the k-center problem, but can we do any better? The answer is NO, as we show in the following section.

7 Hardness of Approximation

Theorem 7.0.14. Approximating the k-center problem with triangle inequality within a factor of 2 − ε is NP-hard for any ε > 0.

Proof (by reduction from Dominating Set). Given a graph G = (V, E) and a bound k, we construct an instance of k-center satisfying the triangle inequality such that if G has a dominating set of size k, then the optimal cost of the k-center instance is 1, and otherwise the optimal cost is 2. We put the following weights on the edges of the complete graph on V (note that they satisfy the triangle inequality):

    w(e) = 1   if e ∈ E;
    w(e) = 2   if e ∉ E.

A (2 − ε)-approximation algorithm must return a solution of cost at most (2 − ε) · 1 < 2, hence of cost exactly 1, whenever G has a dominating set of size k, and its solution has cost at least 2 otherwise. So we could use such an algorithm to decide in polynomial time whether there is a dominating set of size k in G, an NP-hard problem. Therefore, approximating k-center within a factor of 2 − ε is NP-hard as well.
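As a final sketch, the reduction itself is a one-liner over the complete graph. The helper name is ours; its output is in the same distance-dictionary format consumed by the k_center_2approx sketch above.

    from itertools import combinations

    def dominating_set_to_k_center(V, E):
        # Weight 1 on original edges, weight 2 on non-edges: a valid metric.
        E = {frozenset(e) for e in E}
        return {frozenset({u, v}): (1 if frozenset({u, v}) in E else 2)
                for u, v in combinations(V, 2)}

    # Hypothetical example: the 4-cycle has a dominating set of size 2 ({1, 3}),
    # so the reduced instance has optimal 2-center cost exactly 1.
    D = dominating_set_to_k_center([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])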