Submodular and Linear Maximization with Knapsack Constraints

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Ariel Kulik

Submitted to the Senate of the Technion - Israel Institute of Technology
Tevet 5771, Haifa, January 2011

The research thesis was done under the supervision of Prof. Hadas Shachnai in the Computer Science Department. The generous financial support of the Technion is gratefully acknowledged.

Contents

Abstract
Abbreviations and Notations
1 Introduction
  1.1 Submodular Optimization
  1.2 Variants of Maximum Coverage and GAP
  1.3 The d-dimensional Knapsack Problem
  1.4 Overview of the Thesis
2 Related Work
3 Maximizing Submodular Functions
  3.1 Preliminaries
  3.2 A Probabilistic Theorem
  3.3 Rounding Instances with No Big Elements
  3.4 Approximation Algorithm for SUB
  3.5 Derandomization
4 Maximum Coverage with Multiple Packing and Cost Constraints
  4.1 Reduction to the Semi-fractional Problem
  4.2 Solving the Semi-fractional Problem
    4.2.1 A Submodular Point of View
    4.2.2 Obtaining a Distribution
5 The Budgeted Generalized Assignment Problem
  5.1 Small Assignments Instances of BSAP
  5.2 Reduction to Small Assignments BSAP Instances
  5.3 Small Items BSAP Instances
  5.4 General Inputs
6 Hardness Results for d-dimensional Knapsack
7 Discussion
A Basic Properties of Submodular Functions
B Maximum Coverage with Packing Constraint

Abstract

Submodular maximization generalizes many fundamental problems in discrete optimization, including Max-Cut in directed/undirected graphs, maximum coverage, maximum facility location and marketing over social networks. In this work we consider the problem of maximizing any submodular function subject to $d$ knapsack constraints, where $d$ is a fixed constant. For short, we call this problem SUB. We establish a strong relation between the discrete problem and its continuous relaxation, obtained through extension by expectation of the submodular function. Formally, we show that, for any non-negative submodular function, an $\alpha$-approximation algorithm for the continuous relaxation implies a randomized $(\alpha - \varepsilon)$-approximation algorithm for SUB. We use this relation to improve the best known approximation ratio for the problem to $1/4 - \varepsilon$, for any $\varepsilon > 0$, and to obtain a nearly optimal $(1 - e^{-1} - \varepsilon)$-approximation ratio for the monotone case, for any $\varepsilon > 0$. We further show that the probabilistic domain defined by a continuous solution can be reduced to yield a polynomial size domain, given an oracle for the extension by expectation. This leads to a deterministic version of our technique. Our approach has a potential of wider applicability, which we demonstrate on the examples of the Generalized Assignment Problem and Maximum Coverage with additional knapsack constraints.

We also consider the special case of SUB in which the objective function is linear. In this case, our problem reduces to the classic $d$-dimensional knapsack problem. It is known that, unless P = NP, there is no fully polynomial time approximation scheme for $d$-dimensional knapsack, already for $d = 2$. The best known result is a polynomial time approximation scheme (PTAS) due to Frieze and Clarke (European J. of Operational Research, pages 100-109, 1984) for the case where $d \geq 2$ is some fixed constant. A fundamental open question is whether the problem admits an efficient PTAS (EPTAS). We resolve this question by showing that there is no EPTAS for $d$-dimensional knapsack, already for $d = 2$, unless W[1] = FPT.

Furthermore, we show that unless all problems in SNP are solvable in sub-exponential time, there is no approximation scheme for two-dimensional knapsack whose running time is $f(1/\varepsilon) \cdot |I|^{o(\sqrt{1/\varepsilon})}$, for any function $f$. Together, the two results suggest that a significant improvement over the running time of the scheme of Frieze and Clarke is unlikely to exist.

Abbreviations and Notations

SUB: The problem of maximizing a submodular function subject to d knapsack constraints
MC: Maximum Coverage with Multiple Packing and Cost Constraints
MC$_1$: Maximum Coverage with Packing Constraint
GAP: Generalized Assignment Problem
SAP: Separable Assignment Problem
BGAP: Budgeted Generalized Assignment Problem
BSAP: Budgeted linear constrained Separable Assignment Problem
PTAS: Polynomial Time Approximation Scheme
FPTAS: Fully Polynomial Time Approximation Scheme
EPTAS: Efficient Polynomial Time Approximation Scheme

Chapter 1
Introduction

1.1 Submodular Optimization

A real-valued function $f$, whose domain is all the subsets of a universe $U$, is called submodular if, for any $S, T \subseteq U$, $f(S) + f(T) \geq f(S \cup T) + f(S \cap T)$. The concept of submodularity, which can be viewed as a discrete analog of convexity, plays a central role in combinatorial theorems and algorithms (see, e.g., [20] and the references therein, and the comprehensive surveys in [16, 41, 35]). Submodular maximization generalizes many fundamental problems in discrete optimization, including Max-Cut in directed/undirected graphs, maximum coverage, maximum facility location and marketing over social networks (see, e.g., [24]).

In many settings, including set covering or matroid optimization, the underlying submodular functions are monotone, meaning that $f(S) \leq f(T)$ whenever $S \subseteq T$. In other settings, the function $f(S)$ is not necessarily monotone. A classic example of such a submodular function is $f(S) = \sum_{e \in \delta(S)} w(e)$, where $\delta(S)$ is a cut in a graph (or hypergraph) induced by a set of vertices $S$, and $w(e)$ is the weight of an edge $e$. An example of a monotone submodular function is $f_{G,p} : 2^L \to \mathbb{R}$, defined on subsets of vertices of a bipartite graph $G = (L, R, E)$. For any $S \subseteq L$, $f_{G,p}(S) = \sum_{v \in N(S)} p_v$, where $N(S)$ is the neighborhood function (i.e., $N(S)$ is the set of neighbors of $S$), and $p_v \geq 0$ is the profit of $v$, for any $v \in R$. The problem $\max\{f_{G,p}(S) : |S| \leq k\}$ is classical maximum coverage.
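To make the coverage example concrete, the following small Python sketch (hypothetical data, not part of the thesis) evaluates $f_{G,p}$ for a bipartite graph given as an adjacency map, and checks the submodularity inequality on a toy instance.

```python
def coverage_value(neighbors, profit, S):
    """f_{G,p}(S): total profit of the right-side vertices adjacent to some v in S."""
    covered = set()
    for v in S:
        covered |= neighbors[v]
    return sum(profit[u] for u in covered)

# Toy instance: two left vertices (sets) covering three profitable right vertices.
neighbors = {"S1": {"a", "b"}, "S2": {"b", "c"}}
profit = {"a": 2.0, "b": 1.0, "c": 3.0}

# Sanity check of f(S) + f(T) >= f(S u T) + f(S n T) on this instance.
S, T = {"S1"}, {"S2"}
assert (coverage_value(neighbors, profit, S) + coverage_value(neighbors, profit, T)
        >= coverage_value(neighbors, profit, S | T) + coverage_value(neighbors, profit, S & T))
```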

We consider the following problem of maximizing a non-negative submodular set function subject to $d$ knapsack constraints (SUB). Given a $d$-dimensional budget vector $L$, for some $d \geq 1$, and an oracle for a non-negative submodular set function $f$ over a universe $U$, where each element $i \in U$ is associated with a $d$-dimensional cost vector $c(i)$, we seek a subset of elements $S \subseteq U$ whose total cost is at most $L$, such that $f(S)$ is maximized.

Several fundamental algorithms for submodular maximization (see, e.g., [1, 4, 41, 35]) use a continuous extension of the submodular function, to which we refer as extension by expectation. Given a submodular function $f : 2^U \to \mathbb{R}$, we define $F : [0,1]^U \to \mathbb{R}$ as follows. For any $\bar{y} \in [0,1]^U$, let $R \subseteq U$ be a random variable such that $i \in R$ with probability $y_i$ (we say that $R \sim \bar{y}$). Then
$$F(\bar{y}) = E[f(R)] = \sum_{R \subseteq U} f(R) \prod_{i \in R} y_i \prod_{i \notin R} (1 - y_i).$$
The general framework of these algorithms is to obtain first a fractional solution for the continuous extension, followed by rounding which yields a solution for the discrete problem. Using the definition of $F$, we define the continuous relaxation of our problem, called continuous SUB. Let $P = \{\bar{y} \in [0,1]^U : \sum_{i \in U} y_i \cdot c(i) \leq L\}$ be the polytope of the instance; then the problem is to find $\bar{y} \in P$ for which $F(\bar{y})$ is maximized. For $\alpha \in (0,1]$, an algorithm $A$ yields an $\alpha$-approximation for the continuous problem with respect to a submodular function $f$ if, for any assignment of non-negative costs to the elements, and for any non-negative budget, $A$ finds a feasible solution for the continuous problem of value at least $\alpha O$, where $O$ is the value of an optimal solution for (discrete) SUB given the costs and budget.

We establish a strong relation between the problem of maximizing any submodular function subject to $d$ knapsack constraints and its continuous relaxation.¹ Formally, we show (in Theorem 3.10 of Chapter 3) that for any non-negative submodular function, an $\alpha$-approximation algorithm for the continuous relaxation implies a randomized $(\alpha - \varepsilon)$-approximation algorithm for the discrete problem. We use this relation to obtain an approximation ratio of $(1/4 - \varepsilon)$ for SUB, for any $\varepsilon > 0$, thus improving the best known result for the problem, due to Lee et al. [35]. For the case where the objective function is monotone, we use this relation to obtain a nearly optimal $(1 - e^{-1} - \varepsilon)$-approximation, for any $\varepsilon > 0$. An important consequence of the above relation is that for any class of submodular functions, a future improvement of the approximation ratio for the continuous problem, to a factor of $\alpha$, immediately implies an approximation ratio of $(\alpha - \varepsilon)$ for the original instance.

¹ The result appears in [33] (a preliminary version appeared in [34]).
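When $F$ has no closed form, the exponential sum above is approximated in practice by sampling $f$, as in [41]. A minimal Monte Carlo sketch of such an estimator (illustrative only; names and data are hypothetical):

```python
import random

def estimate_F(f, y, samples=20000, seed=0):
    """Monte Carlo estimate of F(y) = E[f(R)], where R ~ y includes each
    element i independently with probability y[i]."""
    rng = random.Random(seed)
    elements = list(y)
    total = 0.0
    for _ in range(samples):
        R = frozenset(i for i in elements if rng.random() < y[i])
        total += f(R)
    return total / samples

# Toy submodular function: coverage of the ground set {a, b, c} by two sets.
sets = {1: {"a", "b"}, 2: {"b", "c"}}
f = lambda R: len(set().union(*(sets[i] for i in R))) if R else 0
print(estimate_F(f, {1: 0.5, 2: 0.5}))   # exact value: (0 + 2 + 2 + 3) / 4 = 1.75
```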

Our technique applies random sampling on the solution space, using a distribution defined by the fractional solution for the problem. In Section 3.5 we show how to convert a feasible solution for the continuous problem to another feasible solution with up to $O(\log |U|)$ fractional entries, given an oracle for the extension by expectation. This facilitates the use of exhaustive search instead of sampling, which leads to a deterministic version of our technique. Specifically, we obtain a deterministic $(1/4 - \varepsilon)$-approximation for general instances and a deterministic $(1 - e^{-1} - \varepsilon)$-approximation for instances where the submodular function is monotone. For the special case of maximum coverage with $d$ knapsack constraints, that is, SUB where the objective function is $f = f_{G,p}$ for a given bipartite graph $G$ and profits $p$, this result leads to a deterministic $(1 - e^{-1} - \varepsilon)$-approximation algorithm, since the extension by expectation of $f_{G,p}$ can be evaluated deterministically.

1.2 Variants of Maximum Coverage and GAP

Our study of maximizing submodular functions also encompasses a generalization of maximum coverage and a budgeted variant of the generalized assignment problem (GAP). These two problems can be cast as submodular optimization problems; however, the resulting universe sizes are non-polynomial in the input size. As our algorithms cannot be directly applied to these problems, more specialized techniques are needed to obtain an approximation algorithm for each of the problems.

The problem of maximum coverage with multiple packing and cost constraints (MC) is the following generalization of maximum coverage. Given is a collection of sets $S = \{S_1, \ldots, S_m\}$ over a ground set $A = \{a_1, \ldots, a_n\}$. Each element $a_j$ has a profit $p_j \geq 0$ and a $d_1$-dimensional size vector $s_j = (s_{j,1}, \ldots, s_{j,d_1})$, such that $s_{j,r} \geq 0$ for all $1 \leq r \leq d_1$. Each set $S_i$ has a $d_2$-dimensional weight vector $w_i = (w_{i,1}, \ldots, w_{i,d_2})$. Also given are a $d_1$-dimensional capacity vector $B = (B_1, \ldots, B_{d_1})$ and a $d_2$-dimensional weight limit vector $W = (W_1, \ldots, W_{d_2})$. A solution for the problem is a collection of subsets $H \subseteq S$ and a subset of elements $E \subseteq A$, such that for any $a_j \in E$ there is $S_i \in H$ such that $a_j \in S_i$. A solution is feasible if the total weight of the subsets in $H$ is bounded by $W$, and the total size of the elements in $E$ is bounded by $B$. The profit of a solution $(H, E)$ is the total profit of the elements in $E$. The objective is to find a feasible solution with maximal profit.

MC has a natural application in Video-on-Demand systems.

Consider a server which has several resources. Movie files can be stored at the server: this requires purchasing the movies, as well as allocating some storage space. The system offers video services to a large set of customers. Each customer is willing to pay a certain amount for viewing a movie on her individually chosen list. The transmission of a movie to the customer requires some resources, such as bandwidth and manpower to handle the request. The objective is to select a collection of movies, and a subset of customers to be serviced, such that the total profit is maximized. Indeed, the above problem can be modeled as an instance of MC, where each movie is represented by a set containing all the customers willing to view this movie; the elements to be covered are the customers.

The problem of maximum coverage with multiple packing and cost constraints can be cast as an instance of SUB, however, with a non-polynomial size universe. In Chapter 4 we show how the ideas of Chapter 3 can be applied also for MC, despite the large size of the universe. The resulting algorithm yields an $(\alpha_f - \varepsilon)$-approximation for the problem, where $f$ is the maximal number of sets to which a single element belongs, and $\alpha_f = 1 - \left(1 - \frac{1}{f}\right)^f$ (note that, for any $f$, $\alpha_f > 1 - e^{-1}$). The initial motivation for this research is a special case of MC that we call maximum coverage with packing constraint (MC$_1$). In this problem there is a single cardinality constraint over the sets, and a single knapsack constraint over the elements. In Appendix B we show that a simple greedy algorithm attains a $(1 - e^{-1})$-approximation for MC$_1$.

The second problem that we consider is a budgeted variant of the well-known GAP. We first give a few definitions. An instance of the separable assignment problem (SAP) consists of $n$ items $A = \{a_1, \ldots, a_n\}$ and $m$ bins. Each bin $i$ has an associated collection of feasible sets (of items) $I_i$ which is down-closed (i.e., $S \in I_i$ implies $S' \in I_i$ for any $S' \subseteq S$). Also, a profit $p_{i,j} \geq 0$ is gained from assigning item $a_j$ to bin $i$. The goal is to choose disjoint feasible sets $S_i \in I_i$ so as to maximize $\sum_{i=1}^{m} \sum_{a_j \in S_i} p_{i,j}$. The collection of inputs for GAP is the restricted class of inputs for SAP in which the sets $I_i$ are defined by a knapsack constraint. The best approximation ratio for GAP is $(1 - e^{-1} + \varepsilon)$, for some $\varepsilon > 0$, due to [15]. Our algorithm, however, uses the $(1 - e^{-1})$-approximation for the problem given in [17].

We consider budgeted GAP (BGAP), where each item $a_j$ has a $d_1$-dimensional cost vector $c_{i,j} \geq 0$, incurred when $a_j$ is assigned to bin $i$. Also given is a global $d_1$-dimensional budget vector $L$. The objective is to find a maximal profit solution whose total cost is at most $L$.

BGAP arises in many real-life scenarios, in particular in operations research (e.g., inventory planning with delivery costs). We consider the slightly more general budgeted linear constrained separable assignment problem (BSAP). The difference between BGAP and BSAP is that in the latter the set of feasible assignments for each bin is defined by $d_2$ knapsack constraints, rather than a single constraint. Similarly to MC, BGAP can also be cast as an instance of SUB with a non-polynomial size universe. In Chapter 5 we give a $(1 - e^{-1} - \varepsilon)$-approximation algorithm for the problem, for any $\varepsilon > 0$, based on the ideas of Chapter 3. In deriving our approximation algorithms for MC and BGAP, we apply our general technique while exploiting special properties of these problems.

1.3 The d-dimensional Knapsack Problem

Finally, we consider the special case of SUB in which the objective function is linear. In this case, the problem reduces to the classic $d$-dimensional knapsack problem. Given is a set of $n$ items $\{1, \ldots, n\}$, where each item $i$ has a $d$-dimensional size vector $s_i \geq 0$ and a profit $p_i > 0$. Also given is a $d$-dimensional bin whose capacity is $B = (B_1, \ldots, B_d)$. A feasible solution is a subset of the items $A \subseteq \{1, \ldots, n\}$ such that the total size of the items in $A$ in dimension $r$ is bounded by $B_r$, for all $1 \leq r \leq d$. The objective is to find a feasible solution of maximum total profit. The special case where $d = 1$ is the classic 0-1 knapsack problem.

We study the efficiency of finding $(1 - \varepsilon)$-approximations for $d$-dimensional knapsack. A maximization problem $\Pi$ admits a polynomial-time approximation scheme (PTAS) if there is an algorithm $A(I, \varepsilon)$ such that, for any $\varepsilon > 0$ and any instance $I$ of $\Pi$, $A(I, \varepsilon)$ outputs a $(1 - \varepsilon)$-approximate solution in time $|I|^{f(1/\varepsilon)}$ for some function $f$. As $\varepsilon$ gets smaller, the exponent of the polynomial $|I|^{f(1/\varepsilon)}$ may become very large. Two important restricted classes of approximation schemes were defined to eliminate this dependence. An efficient polynomial-time approximation scheme (EPTAS) is a PTAS whose running time is $f(1/\varepsilon) \cdot |I|^{O(1)}$, whereas a fully polynomial time approximation scheme (FPTAS) runs in time $(1/\varepsilon)^{O(1)} \cdot |I|^{O(1)}$.

It is well known that, unless P = NP, there is no FPTAS for $d$-dimensional knapsack, already for $d = 2$ [27, 36] (see also [29, 22]). The best known result is a PTAS due to Frieze and Clarke [19], for the case where $d$ is a fixed constant. As $d$-dimensional knapsack does not admit an FPTAS, a fundamental open question is whether there exists an EPTAS.

In Chapter 6 we resolve this question by showing that there is no EPTAS for two-dimensional knapsack, unless W[1] = FPT.² Furthermore, we show that unless all problems in SNP are solvable in sub-exponential time, there is no approximation scheme for two-dimensional knapsack whose running time is $f(1/\varepsilon) \cdot |I|^{o(\sqrt{1/\varepsilon})}$, for any function $f$ (based on results of [10]).³ Together, the two results suggest that a significant improvement over the running time of the scheme of [19] is unlikely to exist.

We note that $d$-dimensional knapsack can be viewed as a dual to the problem of covering integer programs with multiplicity constraints (CIP). In this problem, we must fill up a $d$-dimensional bin by selecting (with a bounded number of repetitions) from a set of $d$-dimensional items, such that the overall cost is minimized. Our hardness proof for $d$-dimensional knapsack applies also to CIP, as shown in [32].

1.4 Overview of the Thesis

Our main result for maximizing a submodular function subject to multiple knapsack constraints is given in Chapter 3. Next, Chapter 4 presents our results for maximum coverage with packing and cost constraints, and Chapter 5 presents the results for the budgeted generalized assignment problem. The hardness results for $d$-dimensional knapsack are given in Chapter 6. We conclude with a discussion in Chapter 7. Some basic properties of submodular functions are given in Appendix A; a simple greedy algorithm for maximum coverage with a single packing constraint (MC$_1$) is given in Appendix B.

² This result appeared in [31].
³ For the recent theory of fixed-parameter algorithms and parameterized complexity, see, e.g., [18, 12].

Chapter 2
Related Work

There has been extensive work on maximizing monotone submodular functions subject to a matroid constraint.¹ For the special case of a uniform matroid, i.e., the problem $\max\{f(S) : |S| \leq k\}$, for some $k > 1$, Nemhauser et al. showed in [38] that a greedy algorithm yields a ratio of $1 - e^{-1}$ to the optimum. Later works presented greedy algorithms that achieve this ratio for other special matroids or for variants of maximum coverage (see, e.g., [1, 28, 40, 7]). For a general matroid constraint, Calinescu et al. showed in [4] that a scheme based on solving a continuous relaxation of the problem followed by pipage rounding (a technique introduced by Ageev and Sviridenko [1]) achieves the ratio of $1 - e^{-1}$ for maximizing monotone submodular functions that can be expressed as a sum of weighted rank functions of matroids. Subsequently, this result was extended by Vondrák [41] to general monotone submodular functions. The bound of $1 - e^{-1}$ is the best possible for all of the above problems. This follows from the lower bound of Nemhauser and Wolsey [37] in the oracle model, and the later result of Feige [14] for the specific case of maximum coverage, under the assumption that P ≠ NP.

Other variants of monotone submodular optimization were also considered. In [2], Bansal et al. studied the problem of maximizing a monotone submodular function subject to $n$ knapsack constraints, for arbitrary $n \geq 1$, where each element appears in up to $k$ constraints, and $k$ is fixed. The paper presents $\frac{8ek}{e-1}$- and $\left(\frac{e^2 k}{e-1} + o(k)\right)$-approximations for this problem. Demaine and Zadimoghaddam [11] studied bi-criteria approximations for monotone submodular set function optimization.

¹ A (weighted) matroid is a system of independent subsets of a universe, which satisfies certain hereditary and exchange properties [39].

The problem of maximizing a non-monotone submodular function has been studied as well. Feige et al. [16] considered the (unconstrained) maximization of a general non-monotone submodular function. The paper gives several (randomized and deterministic) approximation algorithms, as well as hardness results, also for the special case where the function is symmetric. Lee et al. [35] studied the problem of maximizing a general submodular function under linear and matroid constraints. They presented algorithms that achieve an approximation ratio of $1/5 - \varepsilon$ for the problem with $d$ linear constraints and a ratio of $1/(d + 2 + 1/d + \varepsilon)$ for $d$ matroid constraints, for any fixed integer $d \geq 1$. Improved lower and upper bounds for non-constrained and constrained submodular maximization were recently derived by Gharan and Vondrák [23]. However, this paper does not consider knapsack constraints.

As for continuous SUB, we first note that, for some specific families of submodular functions, linear programming can be used to derive approximation algorithms (see, e.g., [1, 4]). For monotone submodular functions, Vondrák presented in [41] a $(1 - e^{-1} - o(1))$-approximation algorithm for the continuous problem. Subsequently, Lee et al. [35] considered the problem of maximizing any submodular function with multiple knapsack constraints and developed a $(1/4 - o(1))$-approximation algorithm for the continuous problem; however, noting that the rounding method of [34], which proved useful for monotone functions, cannot be applied in the non-monotone case, a $(1/5 - \varepsilon)$-approximation was obtained for the discrete problem, by using simple randomized rounding. This gap in approximation ratio between the continuous and the discrete case led us to further develop the technique in [34], so that it can be applied also for non-monotone functions.

The well studied GAP is a fundamental resource allocation problem. A $(1 - e^{-1})$-approximation for the problem is given in [17], as a special case of a more general framework for obtaining approximation algorithms for SAP. The best approximation ratio for GAP is $(1 - e^{-1} + \varepsilon)$, for some $\varepsilon > 0$, due to [15]. Also, the paper [6] shows that it is NP-hard to attain an approximation ratio better than 10/11 for GAP. As for BGAP, a $\left(\frac{1 - e^{-1}}{2 - e^{-1}} - \varepsilon\right)$-approximation was given in [30] for the case where $d_1 = 1$, as an example of the usage of a Lagrangian relaxation technique.

While the classic 0-1 knapsack problem admits an FPTAS, i.e., for any $\varepsilon > 0$, a $(1 - \varepsilon)$-approximation for the optimal solution can be found in $O(n/\varepsilon^2)$ steps [25, 21], packing in higher dimensions (also known as $d$-dimensional vector packing) is substantially harder to solve, exactly or approximately.

It is well known that, unless P = NP, there is no FPTAS for $d$-dimensional knapsack, already for $d = 2$ [27, 36] (see also [29, 22]). The best known result is a PTAS due to Frieze and Clarke [19], for the case where $d$ is a fixed constant. The running time of the scheme of [19] is $O(n^{\lceil d/\varepsilon \rceil} \cdot |I|^{O(1)})$, where $|I|$ is the input size. Subsequently, a scheme with an improved running time of $O(n^{\lceil d/\varepsilon \rceil - d})$ was given by Caprara et al. [5]. We note that, for the case where $d = 1$, an EPTAS exists also for the multiple knapsack problem (see the work of Jansen [26]). The knapsack problem and its variants have been widely studied (see, e.g., the comprehensive survey in [27]).

Recent Developments: Subsequent to our study of maximizing monotone submodular functions subject to multiple knapsack constraints [34], Chekuri et al. [8] showed that, by using a more sophisticated rounding technique, the algorithm in [34] can be applied to derive a $(1 - e^{-1} - \varepsilon)$-approximation for maximizing a submodular function subject to $d$ knapsack constraints and a matroid constraint. Specifically, given a fractional solution for the problem, the authors define a probability distribution over the solution space, such that all elements in the domain of the distribution lie inside the matroid; also, these elements satisfy Chernoff-type concentration bounds, which can be used to prove some of the probabilistic claims in [34]. The desired approximation ratio is obtained by using the algorithm of [34], with sampling replaced by the above distribution in the rounding step. Recently, the same set of authors improved in [9] the bound of $(1/4 - \varepsilon)$ presented here to 0.325.

Chapter 3
Maximizing Submodular Functions

In this chapter we describe our framework for maximizing a submodular set function subject to multiple knapsack constraints (SUB).

3.1 Preliminaries

Notation. An essential component in our framework is the distinction between elements by their costs. We say that an element $i$ is small if $c(i) \leq \varepsilon^3 \cdot L$; otherwise, the element is big. Given a universe $U$, we call a subset of elements $S \subseteq U$ feasible if the total cost of the elements in $S$ is bounded by $L$. We say that $S$ is $\varepsilon$-nearly feasible (or nearly feasible, if $\varepsilon$ is known from the context) if the total cost of the elements in $S$ is bounded by $(1 + \varepsilon) \cdot L$. We refer to $f(S)$ as the value of $S$. Similarly to the discrete case, $\bar{y} \in [0,1]^U$ is feasible if $\bar{y} \in P$. For any subset $T \subseteq U$, we define $f_T : 2^U \to \mathbb{R}_+$ by $f_T(S) = f(S \cup T) - f(T)$. It is easy to verify that if $f$ is a submodular set function then $f_T$ is also a submodular set function. Finally, for any set $S \subseteq U$, we define $c(S) = \sum_{i \in S} c(i)$ and $c_r(S) = \sum_{i \in S} c_r(i)$. For a fractional solution $\bar{y} \in [0,1]^U$, we define $c_r(\bar{y}) = \sum_{i \in U} c_r(i) \cdot y_i$ and $c(\bar{y}) = \sum_{i \in U} c(i) \cdot y_i$.

Overview. Our algorithm consists of two main phases, to which we refer as the rounding procedure and profit enumeration. The rounding procedure yields an $(\alpha - O(\varepsilon))$-approximation for instances in which there are no big elements, using an $\alpha$-approximate solution for the continuous problem. It relies heavily on Theorem 3.1, which gives conditions on the probabilistic domain of solutions that guarantee that the expected profit of the resulting nearly feasible solution is high. This solution is then converted to a feasible one by using a fixing procedure. We first present a randomized version and later show how to derandomize the rounding procedure. The profit enumeration phase uses enumeration over the most profitable elements in an optimal solution to reduce a general instance to another instance with no big elements, on which we apply the rounding procedure. Finally, we combine the above results with an algorithm for the continuous problem (e.g., the algorithm of [41] or [35]) to obtain an approximation algorithm for SUB.

3.2 A Probabilistic Theorem

We first prove a general probabilistic theorem that refers to a slight generalization of our problem (called generalized SUB). In addition to the standard input for the problem, there is also a collection of subsets $M \subseteq 2^U$, such that if $T \in M$ and $S \subseteq T$ then $S \in M$. The goal is to find a subset $S \in M$, such that $c(S) \leq L$ and $f(S)$ is maximized.

Theorem 3.1 For a given input of generalized SUB, let $\chi$ be a distribution over $M$ and $D$ a random variable $D \sim \chi$, such that
1. $E[f(D)] \geq O/5$, where $O$ is an optimal solution for the given instance.
2. For $1 \leq r \leq d$, $E[c_r(D)] \leq L_r$.
3. For $1 \leq r \leq d$, $c_r(D) = \sum_{k=1}^{m} c_r(D_k)$, where $D_k \sim \chi_k$ and $D_1, \ldots, D_m$ are independent random variables.
4. For any $1 \leq k \leq m$ and $1 \leq r \leq d$, it holds that either $c_r(D_k) \leq \varepsilon^3 L_r$ or $c_r(D_k)$ is fixed.

Let $D' = D$ if $D$ is $\varepsilon$-nearly feasible, and $D' = \emptyset$ otherwise. Then $D'$ is always $\varepsilon$-nearly feasible, $D' \in M$, and $E[f(D')] \geq (1 - O(\varepsilon)) E[f(D)]$.

In this chapter we use a special case of this theorem, as described in the next result. We use Theorem 3.1 in its full generality in developing approximation algorithms for our variants of maximum coverage and GAP (see Chapters 4 and 5).

Lemma 3.2 Let $\bar{x} \in [0,1]^U$ be a feasible fractional solution such that $F(\bar{x}) \geq O/5$, where $O$ is the optimal solution for the integral problem. Define $D \subseteq U$ to be a random set such that $D \sim \bar{x}$ (i.e., for all $i \in U$, $i \in D$ with probability $x_i$), and let $D'$ be a random set such that $D' = D$ if $D$ is $\varepsilon$-nearly feasible, and $D' = \emptyset$ otherwise. Then $D'$ is always $\varepsilon$-nearly feasible, and $E[f(D')] \geq (1 - O(\varepsilon)) F(\bar{x})$.

Proof of Theorem 3.1: Define an indicator random variable $F$ such that $F = 1$ if $D$ is $\varepsilon$-nearly feasible, and $F = 0$ otherwise.

Claim 3.3 $\Pr[F = 0] \leq d\varepsilon$.

Proof: For any dimension $1 \leq r \leq d$, it holds that $E[c_r(D)] = \sum_{k=1}^{m} E[c_r(D_k)] \leq L_r$. Define $V_r = \{k : c_r(D_k) \text{ is not fixed}\}$. Then,
$$\mathrm{Var}[c_r(D)] = \sum_{k=1}^{m} \mathrm{Var}[c_r(D_k)] = \sum_{k \in V_r} \mathrm{Var}[c_r(D_k)] \leq \sum_{k \in V_r} E[c_r^2(D_k)] \leq \varepsilon^3 L_r \sum_{k \in V_r} E[c_r(D_k)] \leq \varepsilon^3 L_r \sum_{k=1}^{m} E[c_r(D_k)] \leq \varepsilon^3 L_r^2.$$
The first inequality holds since $\mathrm{Var}[X] \leq E[X^2]$, and the second inequality follows from the fact that $c_r(D_k) \leq \varepsilon^3 L_r$ for $k \in V_r$. Recall that, by the Chebyshev-Cantelli inequality, for any $t > 0$ and a random variable $Z$,
$$\Pr[Z - E[Z] \geq t] \leq \frac{\mathrm{Var}[Z]}{\mathrm{Var}[Z] + t^2}.$$
Thus,
$$\Pr[c_r(D) \geq (1+\varepsilon)L_r] = \Pr[c_r(D) - E[c_r(D)] \geq (1+\varepsilon)L_r - E[c_r(D)]] \leq \Pr[c_r(D) - E[c_r(D)] \geq \varepsilon L_r] \leq \frac{\varepsilon^3 L_r^2}{\varepsilon^2 L_r^2} = \varepsilon.$$
By the union bound, we have that $\Pr[F = 0] \leq \sum_{r=1}^{d} \Pr[c_r(D) \geq (1+\varepsilon)L_r] \leq d\varepsilon$.
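The bound of Claim 3.3 can also be checked empirically. The following one-dimensional simulation (synthetic numbers, not part of the thesis) samples $D \sim \bar{x}$ for an instance consisting only of small elements with $E[c(D)] = L$, and estimates how often the sampled cost exceeds $(1+\varepsilon)L$; the empirical frequency is well below the $d\varepsilon$ bound.

```python
import random

eps, L, n = 0.1, 1.0, 2_000
cost = [eps ** 3 * L] * n        # every element is small: c(i) <= eps^3 * L
x = [0.5] * n                    # fractional solution with E[c(D)] = n * 0.5 * eps^3 * L = L

rng = random.Random(1)
trials, violations = 2_000, 0
for _ in range(trials):
    c_D = sum(c for c, xi in zip(cost, x) if rng.random() < xi)
    if c_D > (1 + eps) * L:      # D is not eps-nearly feasible
        violations += 1
print(violations / trials)       # empirically far below the Claim 3.3 bound d * eps = 0.1
```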

For any dimension $1 \leq r \leq d$, let $R_r = \frac{c_r(D)}{L_r}$, and define $R = \max_r R_r$; then $R$ denotes the maximal relative deviation of the cost from the $r$-th entry in the budget vector, where the maximum is taken over $1 \leq r \leq d$.

Claim 3.4 For any $l > 1$, $\Pr[R > l] < \frac{d\varepsilon^3}{(l-1)^2}$.

Proof: By the Chebyshev-Cantelli inequality we have that, for any dimension $1 \leq r \leq d$,
$$\Pr[R_r > l] = \Pr[c_r(D) > l \cdot L_r] \leq \Pr[c_r(D) - E[c_r(D)] > (l-1)L_r] \leq \frac{\varepsilon^3 L_r^2}{\varepsilon^3 L_r^2 + (l-1)^2 L_r^2} \leq \frac{\varepsilon^3}{(l-1)^2},$$
and by the union bound, we get that $\Pr[R > l] \leq \frac{d\varepsilon^3}{(l-1)^2}$.

Claim 3.5 For any integer $l > 1$, if $R \leq l$ then $f(D) \leq 2dl \cdot O$.

Proof: The set $D$ can be partitioned into $2dl$ sets $D_1, \ldots, D_{2dl}$ such that each of these sets is a feasible solution. Hence, $f(D_i) \leq O$. Thus, by Lemma A.1, $f(D) \leq f(D_1) + \ldots + f(D_{2dl}) \leq 2dl \cdot f(O)$.

Combining the above results, we have

Claim 3.6 $E[f(D')] \geq (1 - O(\varepsilon)) E[f(D)]$.

Proof: By Claims 3.3 and 3.4, we have that
$$E[f(D)] = E[f(D) \mid F = 1] \Pr[F = 1] + E[f(D) \mid F = 0 \wedge R < 2] \Pr[F = 0 \wedge R < 2] + \sum_{l=1}^{\infty} E[f(D) \mid F = 0 \wedge 2^l \leq R \leq 2^{l+1}] \Pr[F = 0 \wedge 2^l \leq R \leq 2^{l+1}]$$
$$\leq E[f(D) \mid F = 1] \Pr[F = 1] + 4d^2 \varepsilon \cdot O + d^2 \varepsilon^3 O \sum_{l=1}^{\infty} \frac{2^{l+2}}{(2^l - 1)^2}.$$
Since the last summation is a constant, and $E[f(D)] \geq O/5$, we have that $E[f(D)] \leq E[f(D) \mid F = 1] \Pr[F = 1] + \varepsilon c \cdot E[f(D)]$, where $c > 0$ is some constant. It follows that $(1 - O(\varepsilon)) E[f(D)] \leq E[f(D) \mid F = 1] \Pr[F = 1]$. Finally, since $D' = D$ if $F = 1$ and $D' = \emptyset$ otherwise, we have that $E[f(D')] \geq E[f(D) \mid F = 1] \Pr[F = 1] \geq (1 - O(\varepsilon)) E[f(D)]$.

By definition, $D'$ is always $\varepsilon$-nearly feasible, and $D' \in M$. This completes the proof of Theorem 3.1.

3.3 Rounding Instances with No Big Elements

In this section we present an $(\alpha - O(\varepsilon))$-approximation algorithm for SUB inputs with no big elements, given an $\alpha$-approximate solution for the continuous problem. A main advantage of inputs with no big elements is that any nearly feasible solution can be easily converted to a feasible one with only a slight decrease in the total value.

Lemma 3.7 If $S \subseteq U$ is an $\varepsilon$-nearly feasible solution with no big elements, then $S$ can be converted in polynomial time to a feasible solution $S' \subseteq S$, such that $f(S') \geq (1 - O(\varepsilon)) f(S)$.

Proof: In fixing the solution $S$ we handle each dimension separately. For any dimension $1 \leq r \leq d$, if $c_r(S) \leq L_r$ then no modification is needed; otherwise, $c_r(S) > L_r$. Since all elements in $S$ are small, we can partition $S$ into $l$ disjoint subsets $S_1, S_2, \ldots, S_l$ such that $\varepsilon L_r \leq c_r(S_j) < (\varepsilon + \varepsilon^3) L_r$ for any $1 \leq j \leq l$, where $l = \Omega(\varepsilon^{-1})$. Since $f$ is submodular, by Lemma A.3 we have $f(S) \geq \sum_{j=1}^{l} f_{S \setminus S_j}(S_j)$. Hence, there is some $1 \leq j \leq l$ such that $f_{S \setminus S_j}(S_j) \leq \frac{f(S)}{l} = f(S) \cdot O(\varepsilon)$ (note that $f_{S \setminus S_j}(S_j)$ can have a negative value). Now, $c_r(S \setminus S_j) \leq L_r$, and $f(S \setminus S_j) \geq (1 - O(\varepsilon)) f(S)$. We repeat this step in each dimension to obtain a feasible set $S'$ with $f(S') \geq (1 - O(\varepsilon)) f(S)$.

Combined with Theorem 3.1, we have the following rounding algorithm.

Randomized rounding algorithm for SUB with no big elements
Input: A SUB instance, a feasible solution $\bar{x}$ for the continuous problem with $F(\bar{x}) \geq O/5$.
1. Define a random set $D \sim \bar{x}$. Let $D' = D$ if $D$ is $\varepsilon$-nearly feasible, and $D' = \emptyset$ otherwise.
2. Convert $D'$ to a feasible set $D''$ as in the proof of Lemma 3.7 and return $D''$.

Clearly, the algorithm returns a feasible solution for the problem. By Theorem 3.1, $E[f(D')] \geq (1 - O(\varepsilon)) F(\bar{x})$. By Lemma 3.7, $E[f(D'')] \geq (1 - O(\varepsilon)) F(\bar{x})$. Hence, we have

Lemma 3.8 For any instance of SUB with no big elements, any feasible solution $\bar{x}$ for the continuous problem with $F(\bar{x}) \geq O/5$ can be converted in polynomial running time to a feasible solution for SUB with expected profit at least $(1 - O(\varepsilon)) F(\bar{x})$.
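The two steps above can be put together in a few lines of code. The sketch below (Python, single knapsack constraint, value-oracle $f$) follows the same outline: sample $D \sim \bar{x}$, discard it unless it is $\varepsilon$-nearly feasible, then repeatedly drop a low-marginal-value chunk of small elements until the budget is met. The chunking is a simplified stand-in for the partition used in Lemma 3.7, so this is an illustration rather than the thesis's exact procedure; all names and the toy data are hypothetical.

```python
import random

def round_no_big_elements(f, cost, x, L, eps, rng):
    """Sample D ~ x; if D is not eps-nearly feasible return the empty set;
    otherwise drop low-marginal-value chunks of small elements until c(S) <= L
    (simplified one-dimensional version of the fixing step of Lemma 3.7)."""
    D = [i for i in x if rng.random() < x[i]]
    if sum(cost[i] for i in D) > (1 + eps) * L:
        return set()                              # plays the role of D' = empty set
    S = list(D)
    while sum(cost[i] for i in S) > L:
        # partition S into chunks of total cost roughly eps * L each
        chunks, cur, cur_cost = [], [], 0.0
        for i in S:
            cur.append(i)
            cur_cost += cost[i]
            if cur_cost >= eps * L:
                chunks.append(cur)
                cur, cur_cost = [], 0.0
        if cur:
            chunks.append(cur)
        # drop the chunk whose removal loses the least objective value
        base = f(set(S))
        drop = min(chunks, key=lambda C: base - f(set(S) - set(C)))
        S = [i for i in S if i not in set(drop)]
    return set(S)

# Toy usage with a modular (hence submodular) objective and hypothetical data.
rng = random.Random(0)
cost = {i: 0.001 for i in range(2000)}
x = {i: 0.4 for i in range(2000)}
S = round_no_big_elements(lambda T: len(T), cost, x, L=1.0, eps=0.1, rng=rng)
print(len(S), sum(cost[i] for i in S) <= 1.0)
```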

3.4 Approximation Algorithm for SUB

Given an instance of SUB and a subset $T \subseteq U$, define another instance of SUB, to which we refer as the residual problem with respect to $T$, with $f$ remaining the objective function. Let $L' = L - c(T)$ be the new budget, and let the universe $U'$ consist of all elements $i \in U \setminus T$ such that $c(i) \leq \varepsilon^3 L'$, together with all the elements in $T$ (formally, $U' = T \cup \{ i \in U \setminus T : c(i) \leq \varepsilon^3 L' \}$). The new cost of element $i$ is $c'(i) = c(i)$ for any $i \in U' \setminus T$, and $c'(i) = 0$ for any $i \in T$. It follows that there are no big elements in the residual problem. Let $S$ be a feasible solution for the residual problem with respect to $T$. Then $c(S) \leq c'(S) + c(T) \leq L' + c(T) = L$, which means that any feasible solution for the residual problem is also feasible for the original problem.

Consider the following algorithm.

Approximation algorithm for SUB
Input: A SUB instance and an $\alpha$-approximation algorithm $A$ for the continuous problem with respect to the function $f$.
1. For any $T \subseteq U$ such that $|T| \leq h = \frac{d}{\varepsilon^4}$:
   (a) Use $A$ to obtain an $\alpha$-approximate solution $\bar{x}$ for the continuous residual problem with respect to $T$.
   (b) Use the rounding algorithm of Section 3.3 to convert $\bar{x}$ to a feasible solution $S$ for the residual problem (note that the residual problem has no big elements).
2. Return the best solution found.

Lemma 3.9 The above approximation algorithm returns an $(\alpha - O(\varepsilon))$-approximate solution for SUB and uses a polynomial number of invocations of algorithm $A$.

Proof: By Lemma 3.8, in each iteration the algorithm finds a feasible solution $S$ for the residual problem. Hence, the algorithm always returns a feasible solution for the given SUB instance. Let $O = \{i_1, \ldots, i_k\}$ be an optimal solution for the input $I$ (we use $O$ to denote both an optimal sub-collection of elements and the optimal value). For $l \geq 1$, let $K_l = \{i_1, \ldots, i_l\}$, and assume that the elements are ordered by their residual profits, i.e., $i_l = \mathrm{argmax}_{i \in O \setminus K_{l-1}} f_{K_{l-1}}(\{i\})$. Consider the iteration in which $T = K_h$, and define $O' = O \cap U'$. The set $O'$ is clearly a feasible solution for the residual problem with respect to $T$. We show a lower bound for $f(O')$. The set $R = O \setminus O'$ consists of elements in $O \setminus T$ that are big with respect to the residual instance. The total cost of the elements in $R$ is bounded by $L'$ (since $O$ is a feasible solution), and thus $|R| \leq \varepsilon^{-3} d$. Since $T = K_h$, for any $j \in O \setminus T$ it holds that $f_T(\{j\}) \leq \frac{f(T)}{|T|}$, and by Lemma A.1 we get
$$f_T(R) \leq \sum_{j \in R} f_T(\{j\}) \leq \varepsilon^{-3} d \cdot \frac{f(T)}{|T|} = \varepsilon f(T) \leq \varepsilon O.$$
Thus, by Lemma A.2, $f_{O'}(R) \leq f_T(R) \leq \varepsilon O$. Since $f(O) = f(O') + f_{O'}(R) \leq f(O') + \varepsilon f(O)$, we have that $f(O') \geq (1 - \varepsilon) f(O)$. Thus, in this iteration we get a solution $\bar{x}$ for the residual problem with $F(\bar{x}) \geq \alpha (1 - \varepsilon) f(O)$, and the solution $S$ obtained after the rounding satisfies $f(S) \geq (1 - O(\varepsilon)) \alpha f(O)$.
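Schematically, the profit-enumeration phase can be written as the following Python sketch. It is illustrative only: `solve_and_round` is a placeholder for steps 1(a)-1(b) above (the continuous solver followed by the rounding of Section 3.3), and the data layout is hypothetical. The enumeration is polynomial for fixed $d$ and $\varepsilon$, although the constant $h = d/\varepsilon^4$ makes it practical only for very coarse $\varepsilon$.

```python
from itertools import combinations

def residual_instance(universe, cost, L, T, eps):
    """Residual problem w.r.t. T: elements of T get cost zero, the budget shrinks
    by c(T), and elements that are big w.r.t. the new budget are discarded.
    universe: set of element ids; cost[i]: list of d non-negative costs."""
    d = len(L)
    L_res = [L[r] - sum(cost[i][r] for i in T) for r in range(d)]
    U_res = set(T) | {i for i in universe - set(T)
                      if all(cost[i][r] <= eps ** 3 * L_res[r] for r in range(d))}
    cost_res = {i: ([0.0] * d if i in T else list(cost[i])) for i in U_res}
    return U_res, cost_res, L_res

def enumerate_and_round(universe, cost, L, f, eps, solve_and_round):
    """Try every feasible guess T with |T| <= h, hand the residual instance to
    solve_and_round, and keep the best solution found."""
    d = len(L)
    h = int(d / eps ** 4)                 # constant for fixed d and eps
    best, best_val = set(), f(set())
    for k in range(min(h, len(universe)) + 1):
        for T in combinations(universe, k):
            if any(sum(cost[i][r] for i in T) > L[r] for r in range(d)):
                continue                  # the guessed prefix itself must be feasible
            S = solve_and_round(*residual_instance(universe, cost, L, set(T), eps))
            if f(S) > best_val:
                best, best_val = set(S), f(S)
    return best
```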

We summarize in the next result.

Theorem 3.10 Let $f$ be a submodular function, and suppose there is a polynomial time $\alpha$-approximation algorithm for the continuous problem with respect to $f$. Then there is a polynomial time randomized $(\alpha - \varepsilon)$-approximation algorithm for SUB with respect to $f$, for any $\varepsilon > 0$.

Since there is a $(1/4 - o(1))$-approximation algorithm for the continuous problem on general instances [35], we have

Theorem 3.11 There is a polynomial time randomized $(1/4 - \varepsilon)$-approximation algorithm for SUB, for any $\varepsilon > 0$.

Since there is a $(1 - e^{-1} - o(1))$-approximation algorithm for continuous SUB with a monotone objective function [41], we have

Theorem 3.12 There is a polynomial time randomized $(1 - e^{-1} - \varepsilon)$-approximation algorithm for SUB with a monotone objective function, for any $\varepsilon > 0$.

3.5 Derandomization

In this section we show how the algorithm of Section 3.3 can be derandomized, assuming we have an oracle for $F$, the extension by expectation of $f$. For some families of submodular functions $F$ can be directly evaluated; for a general function $f$, $F$ can be evaluated with very good accuracy by sampling $f$, as in [41]. Throughout the section we assume that the SUB instance we handle has no big elements.

The main idea is to reduce the number of fractional entries in the fractional solution $\bar{x}$, so that the number of values a random set $D \sim \bar{x}$ can take is polynomial in the input size (for a fixed value of $\varepsilon$). Then we go over all the possible values, and we are guaranteed to obtain a solution of high value.

A key ingredient in our derandomization is the pipage rounding technique of Ageev and Sviridenko [1]. We give below a brief overview of the technique. For any element $i \in U$, define the unit vector $\bar{i} \in \{0,1\}^U$, in which $\bar{i}_j = 0$ for any $j \neq i$, and $\bar{i}_i = 1$. Given a fractional solution $\bar{x}$ for the problem and two elements $i, j$ such that $x_i$ and $x_j$ are both fractional, consider the vector function $\bar{x}_{i,j}(\delta) = \bar{x} + \delta \bar{i} - \delta \bar{j}$ (note that $\bar{x}_{i,j}(\delta)$ is equal to $\bar{x}$ in all entries except $i, j$).

Let $\delta^+_{\bar{x},i,j}$ and $\delta^-_{\bar{x},i,j}$ (for short, $\delta^+$ and $\delta^-$) be the maximal and minimal values of $\delta$ for which $\bar{x}_{i,j}(\delta) \in [0,1]^U$. In both $\bar{x}_{i,j}(\delta^+)$ and $\bar{x}_{i,j}(\delta^-)$, the entry of either $i$ or $j$ is integral. Define $F^{\bar{x}}_{i,j}(\delta) = F(\bar{x}_{i,j}(\delta))$ over the domain $[\delta^-, \delta^+]$. The function $F^{\bar{x}}_{i,j}$ is convex (see [3] for a detailed proof); thus $\bar{x}' = \mathrm{argmax}_{\{\bar{x}_{i,j}(\delta^+),\, \bar{x}_{i,j}(\delta^-)\}} F(\bar{x})$ has fewer fractional entries than $\bar{x}$, and $F(\bar{x}') \geq F(\bar{x})$. By an appropriate selection of $i, j$, such that $\bar{x}'$ maintains feasibility (in some sense), we can repeat the above step to gradually decrease the number of fractional entries. We use the technique to prove the next result.

Lemma 3.13 Let $\bar{x} \in [0,1]^U$ be a solution having $k$ or fewer fractional entries (i.e., $|\{i : 0 < x_i < 1\}| \leq k$), and $c(\bar{x}) \leq L$ for some $L$. Then $\bar{x}$ can be converted, in time polynomial in $k$, to a vector $\bar{x}'$ with at most $k' = \left(\frac{8 \ln(2k)}{\varepsilon}\right)^d$ fractional entries, such that $c(\bar{x}') \leq (1 + \varepsilon) L$ and $F(\bar{x}') \geq F(\bar{x})$.

Proof: Let $U' = \{i : 0 < x_i < 1\}$ be the set of all fractional entries. We define a new cost function $c'$ over the elements in $U$:
$$c'_r(i) = \begin{cases} c_r(i) & i \notin U' \\ 0 & i \in U' \text{ and } c_r(i) \leq \frac{\varepsilon L_r}{2k} \\ \frac{\varepsilon L_r}{2k}(1+\varepsilon/2)^j & i \in U' \text{ and } \frac{\varepsilon L_r}{2k}(1+\varepsilon/2)^j \leq c_r(i) < \frac{\varepsilon L_r}{2k}(1+\varepsilon/2)^{j+1}. \end{cases}$$
Note that for any $i \in U$, $c'(i) \leq c(i)$, and $c_r(i) \leq (1+\varepsilon/2) c'_r(i) + \frac{\varepsilon L_r}{2k}$, for all $1 \leq r \leq d$. The number of different values $c'_r(i)$ can take for $i \in U'$ is bounded by $\frac{8 \ln(2k)}{\varepsilon}$ (since all elements are small, and $\ln(1+x) \geq x/2$). Hence the number of different values the vector $c'(i)$ can take for $i \in U'$ is bounded by $k' = \left(\frac{8 \ln(2k)}{\varepsilon}\right)^d$.

We start with $\bar{x}' = \bar{x}$, and while there are $i, j \in U'$ such that $x'_i$ and $x'_j$ are both fractional and $c'(i) = c'(j)$, define $\delta^+ = \delta^+_{\bar{x}',i,j}$ and $\delta^- = \delta^-_{\bar{x}',i,j}$. Since $i$ and $j$ have the same cost (by $c'$), it holds that $c'(\bar{x}'_{i,j}(\delta^+)) = c'(\bar{x}'_{i,j}(\delta^-)) = c'(\bar{x}')$. If $F^{\bar{x}'}_{i,j}(\delta^+) \geq F(\bar{x}')$, then set $\bar{x}'' = \bar{x}'_{i,j}(\delta^+)$; otherwise, $\bar{x}'' = \bar{x}'_{i,j}(\delta^-)$. In both cases $F(\bar{x}'') \geq F(\bar{x}')$ and $c'(\bar{x}'') = c'(\bar{x}')$. Now, repeat the step with $\bar{x}' = \bar{x}''$. Since in each iteration the number of fractional entries in $\bar{x}'$ decreases, the process terminates (after at most $k$ iterations) with a vector $\bar{x}'$ such that $F(\bar{x}') \geq F(\bar{x})$, $c'(\bar{x}') = c'(\bar{x}) \leq L$, and there are no two elements $i, j \in U'$ with $c'(i) = c'(j)$ such that $x'_i$ and $x'_j$ are both fractional.
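A single step of this pipage-style procedure is easy to express in code. The sketch below (Python, illustrative; `F` stands for the oracle for the extension by expectation) moves along $\bar{x} + \delta(\bar{i} - \bar{j})$ to whichever endpoint has the larger $F$ value; in the proof above the pair $(i, j)$ is always chosen with $c'(i) = c'(j)$, so the rounded cost of the solution is preserved.

```python
def pipage_step(F, x, i, j):
    """One pipage step: push mass between the fractional entries x[i] and x[j]
    along x + delta*(e_i - e_j) until one of them hits 0 or 1, and keep the
    endpoint with the larger F value (F is convex on this line segment)."""
    delta_plus = min(1.0 - x[i], x[j])      # delta^+: largest feasible increase of x[i]
    delta_minus = -min(x[i], 1.0 - x[j])    # delta^-: largest feasible decrease of x[i]
    def moved(delta):
        y = dict(x)
        y[i] += delta
        y[j] -= delta
        return y
    candidates = [moved(delta_plus), moved(delta_minus)]
    return max(candidates, key=F)
```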

Also, for any $i \notin U'$, the entry $x'_i$ is integral (since $x_i$ was integral and the entry was not modified by the process). Thus, the number of fractional entries in $\bar{x}'$ is at most $k'$. Now, for any dimension $1 \leq r \leq d$,
$$c_r(\bar{x}') = \sum_{i \notin U'} x'_i c_r(i) + \sum_{i \in U'} x'_i c_r(i) \leq \sum_{i \notin U'} x'_i c'_r(i) + \sum_{i \in U'} x'_i \left((1+\varepsilon/2) c'_r(i) + \frac{\varepsilon L_r}{2k}\right) \leq (1+\varepsilon/2) \sum_{i \in U} x'_i c'_r(i) + \sum_{i \in U'} x'_i \frac{\varepsilon L_r}{2k} \leq (1+\varepsilon) L_r,$$
where the last inequality uses $c'(\bar{x}') \leq L$ and $|U'| \leq k$. This completes the proof.

Using the above lemma, we can reduce the number of fractional entries in $\bar{x}$ to a number that is poly-logarithmic in $k$. However, the number of values $D \sim \bar{x}$ can take remains super-polynomial. To reduce the number of fractional entries further, we apply the above step twice; that is, we convert $\bar{x}$ with at most $|U|$ fractional entries to $\bar{x}'$ with at most $k' = (8 \ln(2|U|)/\varepsilon)^d$ fractional entries. We can then apply the conversion again, to obtain $\bar{x}''$ with at most $k'' = O(\log |U|)$ fractional entries.

Lemma 3.14 Let $\bar{x} \in [0,1]^U$ be such that $c(\bar{x}) \leq L$, for some $L$, and let $\varepsilon > 0$ be a constant. Then $\bar{x}$ can be converted, in time polynomial in $|U|$, to a vector $\bar{x}'$ with at most $k' = O(\log |U|)$ fractional entries, such that $c(\bar{x}') \leq (1+\varepsilon)^2 L$ and $F(\bar{x}') \geq F(\bar{x})$.

The next result follows immediately from Lemma 3.2 ($O$ is the optimum for SUB).

Lemma 3.15 Given $\bar{x} \in [0,1]^U$ such that $\bar{x}$ is a feasible fractional solution with $F(\bar{x}) \geq O/5$, there is a $D$ in the domain of the random variable $D \sim \bar{x}$ such that $D$ is nearly feasible and $f(D) \geq (1 - O(\varepsilon)) F(\bar{x})$.

Consider the following rounding algorithm.

Deterministic rounding algorithm for SUB with no big elements
Input: A SUB instance, a feasible solution $\bar{x}$ for the continuous problem with $F(\bar{x}) \geq O/5$.
1. Define $\bar{x}' = (1+\varepsilon)^{-2} \bar{x}$ (note that $F(\bar{x}') \geq (1+\varepsilon)^{-2} F(\bar{x})$).
2. Convert $\bar{x}'$ to $\bar{x}''$ such that $\bar{x}''$ is fractionally feasible, the number of fractional entries in $\bar{x}''$ is $O(\log |U|)$, and $F(\bar{x}'') \geq (1+\varepsilon)^{-2} F(\bar{x})$, as in Lemma 3.14.
3. Enumerate over all possible values $D$ of $D \sim \bar{x}''$. For each such value, if $D$ is $\varepsilon$-nearly feasible, convert it to a feasible solution $D'$ (see Lemma 3.7). Return the solution with maximum value among the feasible solutions found.

By Theorem 3.1, the algorithm returns a feasible solution of value at least $(1 - O(\varepsilon)) F(\bar{x})$. Also, the running time of the algorithm is polynomial when $\varepsilon$ is a fixed constant. Replacing the randomized rounding in the algorithm of Section 3.4 with the above, we get the following result.

Theorem 3.16 Let $f$ be a submodular function, and assume we have an oracle for $F$. If there is a deterministic polynomial time $\alpha$-approximation algorithm for the continuous problem with respect to $f$, then there is a polynomial time deterministic $(\alpha - \varepsilon)$-approximation algorithm for SUB with respect to $f$, for any $\varepsilon > 0$.

Since, given an oracle for $F$, both the algorithm of [41] and the algorithm of [35] for the continuous problem are deterministic, we get the following.

Theorem 3.17 Given an oracle for $F$, there is a polynomial time deterministic $(1 - e^{-1} - \varepsilon)$-approximation algorithm for SUB with a monotone objective function, for any $\varepsilon > 0$.

Theorem 3.18 Given an oracle for $F$, there is a polynomial time deterministic $(1/4 - \varepsilon)$-approximation algorithm for SUB, for any $\varepsilon > 0$.

For the restricted case of maximum coverage with $d$ knapsack constraints, that is, SUB where the objective function is $f = f_{G,p}$ for a given bipartite graph $G$ and profits $p$, the function $F$ can be evaluated deterministically (see [1]). This leads to the following theorem.

Theorem 3.19 There is a polynomial time deterministic $(1 - e^{-1} - \varepsilon)$-approximation algorithm for maximum coverage with $d$ knapsack constraints.

Chapter 4
Maximum Coverage with Multiple Packing and Cost Constraints

In this chapter we consider the problem of maximum coverage with multiple packing and cost constraints (MC). Recall that MC is the following generalization of the maximum coverage problem. Given is a collection of sets $S = \{S_1, \ldots, S_m\}$ over a ground set $A = \{a_1, \ldots, a_n\}$. Each element $a_j$ has a profit $p_j \geq 0$ and a $d_1$-dimensional size vector $s_j = (s_{j,1}, \ldots, s_{j,d_1})$, such that $s_{j,r} \geq 0$ for all $1 \leq r \leq d_1$. Each set $S_i$ has a $d_2$-dimensional weight vector $w_i = (w_{i,1}, \ldots, w_{i,d_2})$. Also given are a $d_1$-dimensional capacity vector $B = (B_1, \ldots, B_{d_1})$ and a $d_2$-dimensional weight limit vector $W = (W_1, \ldots, W_{d_2})$. A solution for the problem is a collection of subsets $H$ and a subset of elements $E$, such that for any $a_j \in E$ there is $S_i \in H$ such that $a_j \in S_i$. A solution is feasible if the total weight of the subsets in $H$ is bounded by $W$, and the total size of the elements in $E$ is bounded by $B$. The profit of a solution $(H, E)$ is the total profit of the elements in $E$. The objective of the problem is to find a feasible solution with maximal profit.

Denote the maximal number of sets a single element belongs to by $f$, and let
$$\alpha_f = 1 - \left(1 - \frac{1}{f}\right)^f. \quad (4.1)$$
We give below an $(\alpha_f - \varepsilon)$-approximation algorithm for the problem. (Note that, for any $f \geq 1$, $\alpha_f > 1 - e^{-1}$.) In solving MC, our algorithm uses the following continuous version of the problem.

Let $\bar{y} \in [0,1]^{S \times A}$ and $\bar{x} \in [0,1]^{S}$. For short, we write $y_{i,j} = y_{S_i, a_j}$ and $x_i = x_{S_i}$. Given an input for MC, we say that $(\bar{y}, \bar{x})$ is a solution if, for any $S_i \in S$ and $a_j \notin S_i$ it holds that $y_{i,j} = 0$, and for any $S_i \in S$ and $a_j \in A$ it holds that $y_{i,j} \leq x_i$. Intuitively, $x_i$ is an indicator for the selection of the set $S_i$ into the solution, and $y_{i,j}$ is an indicator for the selection of the element $a_j$ by the set $S_i$ into the solution. We say that such a solution is feasible if, for any $1 \leq r \leq d_1$ it holds that $\sum_{a_j \in A} s_{j,r} \sum_{S_i \in S} y_{i,j} \leq B_r$ (the total size of the elements does not exceed the capacity), and for any $1 \leq r \leq d_2$ it holds that $\sum_{S_i \in S} x_i w_{i,r} \leq W_r$ (the total weight of the subsets does not exceed the weight limit). The value (or profit) of the solution is defined by $p(\bar{y}, \bar{x}) = p(\bar{y}) = \sum_{a_j \in A} \min\{1, \sum_{S_i \in S} y_{i,j}\} \cdot p_j$.

By the above definition, a solution consists of fractional values. We say that a solution $(\bar{x}, \bar{y})$ is semi-fractional if $\bar{x} \in \{0,1\}^{S}$ (that is, sets cannot be fractionally selected, but elements can be). Also, we say that a solution is integral if both $\bar{x} \in \{0,1\}^{S}$ and $\bar{y} \in \{0,1\}^{S \times A}$. Two computational problems arise from the above definitions. The first is to find a semi-fractional solution of maximal profit; we refer to this problem as the semi-fractional problem. The second is to find an integral solution of maximal profit, to which we refer as the integral problem. It is easy to see that the integral problem is equivalent to MC; thus our objective is to find an optimal solution for the integral problem.

Overview. To obtain an approximation algorithm for the integral problem, we first show how it relates to the semi-fractional problem. More specifically, we show that, given an $(\alpha_f - O(\varepsilon))$-approximation algorithm for the semi-fractional problem, we can derive an approximation algorithm with the same approximation ratio for the integral problem. Next, we interpret the semi-fractional problem as a submodular optimization problem with multiple linear constraints and an infinite universe. We use the framework developed in Chapter 3 to solve this problem. As direct enumeration over the most profitable elements in an optimal solution is impossible here, we guess which sets are the most profitable in an optimal solution. We use this guessing to obtain a fractional solution (with a polynomial number of non-zero entries), such that the conditions of Theorem 3.1 are satisfied. Together with a fixing procedure applied to the obtained (nearly feasible) solution, this leads to our approximation algorithm. The process can be derandomized by using the same tools as in Section 3.5.
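The profit $p(\bar{y})$ of a fractional solution, as defined above, is straightforward to evaluate; the following small Python helper (hypothetical names and data, illustrative only) computes $\sum_{a_j} p_j \cdot \min\{1, \sum_i y_{i,j}\}$.

```python
def mc_profit(y, profit):
    """p(y) = sum_j profit[j] * min(1, sum_i y[i][j]), where y[i][j] is the
    fractional selection of element j by set i."""
    totals = {}
    for i in y:
        for j, yij in y[i].items():
            totals[j] = totals.get(j, 0.0) + yij
    return sum(profit[j] * min(1.0, totals.get(j, 0.0)) for j in profit)

# Example: element "b" is fractionally covered by both sets.
y = {"S1": {"a": 1.0, "b": 0.5}, "S2": {"b": 0.7, "c": 0.25}}
profit = {"a": 2.0, "b": 1.0, "c": 3.0}
print(mc_profit(y, profit))   # 2.0 * 1 + 1.0 * min(1, 1.2) + 3.0 * 0.25 = 3.75
```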

4.1 Reduction to the Semi-fractional Problem

First, we show that a semi-fractional solution for the problem can be converted to a solution with at least the same profit and at most $d_1$ fractional entries. Next, we show how this property enables us to enumerate over the most profitable elements in an optimal solution. Throughout this section we assume that, for some constant $\alpha \in (0,1)$, we have an $\alpha$-approximation algorithm for the semi-fractional problem.

Lemma 4.1 Let $(\bar{y}^f, \bar{x}^f)$ be a feasible semi-fractional solution. Then $\bar{y}^f$ can be converted in polynomial time to another feasible semi-fractional solution $(\bar{y}, \bar{x}^f)$ with at most $d_1$ fractional entries, such that $p(\bar{y}) \geq p(\bar{y}^f)$.

Proof: Let $(\bar{y}^f, \bar{x}^f)$ be a semi-fractional feasible solution. W.l.o.g., we assume that for every $a_j \in A$, $\sum_{S_i \in S} y^f_{i,j} \leq 1$, and if $y^f_{i,j} \neq 0$ then for any $S_{i'} \neq S_i$ it holds that $y^f_{i',j} = 0$. Note that any solution can easily be converted to such a solution with the same profit. If there are more than $d_1$ fractional entries, let $s_{j_1}, \ldots, s_{j_k}$ be the size vectors of the corresponding elements, and let $S_{i_1}, \ldots, S_{i_k}$ be the corresponding sets. As $k > d_1$, there must be a linear dependency between the vectors; w.l.o.g. we can write it as $\lambda_1 s_{j_1} + \ldots + \lambda_p s_{j_p} = 0$ for $p = d_1 + 1$. We can define $\bar{y}^f(\varepsilon)$ by $y^f_{i_l,j_l}(\varepsilon) = y^f_{i_l,j_l} + \varepsilon \lambda_l$ for $1 \leq l \leq p$, and $y^f_{i,j}(\varepsilon) = y^f_{i,j}$ for any other entry. As long as $\bar{y}^f(\varepsilon) \in [0,1]^{S \times A}$, $\bar{y}^f(\varepsilon)$ is a semi-fractional feasible solution. Let $\varepsilon^+$ and $\varepsilon^-$ be the maximal and minimal values of $\varepsilon$ for which $\bar{y}^f(\varepsilon) \in [0,1]^{S \times A}$. The number of fractional entries in $\bar{y}^f(\varepsilon^+)$ and in $\bar{y}^f(\varepsilon^-)$ is smaller than the number of fractional entries in $\bar{y}^f$. Also, $p(\varepsilon) = p(\bar{y}^f(\varepsilon))$ is a linear function; thus either $p(\bar{y}^f(\varepsilon^+)) \geq p(\bar{y}^f)$ or $p(\bar{y}^f(\varepsilon^-)) \geq p(\bar{y}^f)$. This means that we can convert $\bar{y}^f$ to a feasible solution $\bar{y}'$ that has fewer fractional entries, with $p(\bar{y}') \geq p(\bar{y}^f)$. By repeating the above process as long as there are more than $d_1$ fractional entries, we can obtain a fractional solution with at most $d_1$ fractional entries.

We use Lemma 4.1 to prove the next result.

Lemma 4.2 Given an $\alpha$-approximation algorithm for the semi-fractional problem, an $\alpha$-approximation algorithm for the integral problem can be derived in polynomial time.

Proof: Given a collection $T$ of pairs $(a_j, S_i)$ of an element $a_j$ and a set $S_i$ such that $a_j \in S_i$, denote the collection of sets in $T$ by $T_S$, and the collection of elements in $T$ by $T_E$.

We define a residual instance for the problem as follows. The elements are $A_T = \{a_j \in A : a_j \notin T_E \text{ and } p_j \leq p_{j'} \text{ for any } a_{j'} \in T_E\}$, where the size of $a_j \in A_T$ is $s_j$ and the profit of $a_j$ is $p_j$. The sets are $S_T = \{S'_1, \ldots, S'_m\}$, where $S'_i = S_i \cap A_T$, and $w'_i = w_i$ if $S_i \notin T_S$ and $w'_i = 0$ if $S_i \in T_S$. The weight limit of this instance is $W_T = W - w(T_S)$, where $w(T_S) = \sum_{S_i \in T_S} w_i$, and the capacity is $B_T = B - s(T_E)$, where $s(T_E) = \sum_{a_j \in T_E} s_j$. Clearly, a solution of profit $v$ for the residual instance with respect to a collection $T$ gives a solution of profit $v + p(T_E)$ for the original instance, where $p(T_E) = \sum_{a_j \in T_E} p_j$.

Let $O = (\bar{x}, \bar{y})$ be an optimal solution for the integral problem. W.l.o.g. we assume that for all $a_j \in A$, $\sum_{S_i \in S} y_{i,j} \leq 1$ (that is, no element is selected by more than one set). Let $R$ be the collection of the $h = \frac{d_1}{1 - \alpha}$ most profitable elements $a_j$ for which there is an $S_i$ such that $y_{i,j} = 1$ (note that there is a unique such set $S_i$ for each $a_j$). Define $T^O = \{(a_j, S_i) : a_j \in R,\ y_{i,j} = 1\}$. It is easy to verify that the optimal integral solution for the residual problem with respect to $T^O$ is $O - p(T^O_E)$.

Now, assume that we have an $\alpha$-approximation algorithm for the semi-fractional problem. Then, when applied to the residual problem with respect to $T^O$, it returns a semi-fractional solution $(\bar{x}^f, \bar{y}^f)$ with $p(\bar{y}^f) \geq \alpha (O - p(T^O_E))$. By Lemma 4.1, this solution can be converted to a solution $(\bar{x}^f, \bar{z})$ with up to $d_1$ fractional entries and $p(\bar{z}) \geq p(\bar{y}^f)$. Now, consider rounding down to zero the value of each fractional entry in $\bar{z}$. This generates a new feasible integral solution $\bar{z}'$ with $p(\bar{z}') \geq p(\bar{y}^f) - d_1 \frac{p(T^O_E)}{|T^O|}$ (as the profit of any element in the residual solution is bounded by $\frac{p(T^O_E)}{|T^O|}$). This implies a solution for the original problem of value at least
$$p(T^O_E) + p(\bar{y}^f) - d_1 \frac{p(T^O_E)}{|T^O|} \geq p(T^O_E) \left(1 - \frac{d_1}{|T^O|}\right) + \alpha (O - p(T^O_E)) \geq \alpha O,$$
that is, an $\alpha$-approximation of the optimum. To use this technique, we need to guess the correct set $T$, which can be done in time $(n \cdot m)^{O(1)}$ for a constant $\alpha$.

In Theorem 4.8 we show that there is a polynomial time $(\alpha_f - \varepsilon)$-approximation algorithm for the semi-fractional problem, where $\alpha_f$ is defined in (4.1). This leads to the following theorem.

Theorem 4.3 There is a polynomial time $(\alpha_f - \varepsilon)$-approximation algorithm