Maximization of Submodular Set Functions


Northeastern University
Department of Electrical and Computer Engineering

Maximization of Submodular Set Functions

Biomedical Signal Processing, Imaging, Reasoning, and Learning (BSPIRAL) Group

Author: Jamshid Sourati
Reviewers: Murat Akcakaya, Yeganeh M. Marghi, Paula Gonzalez
Supervisors: Jennifer G. Dy, Deniz Erdogmus

January 2015

* Cite this report in the following format: Jamshid Sourati, Maximization of Submodular Set Functions, Technical Report, BSPIRAL-150111-R0, Northeastern University, 2015.

Abstract

In this technical report, we aim to give a simple yet detailed analysis of several submodular maximization algorithms. We start by analyzing the classical greedy algorithm, first discussed by Nemhauser et al. (1978), which guarantees a tight bound for constrained maximization of monotone submodular set functions. We then continue by discussing two randomized algorithms proposed recently by Buchbinder et al. for constrained and unconstrained maximization of nonnegative submodular functions that are not necessarily monotone.

Contents

1 Introduction
2 Preliminaries
3 Maximization Algorithms
  3.1 Monotone Functions
  3.2 Non-Monotone Functions
    3.2.1 Unconstrained Maximization
    3.2.2 Constrained Maximization

1 Introduction

Combinatorial optimization plays an important role in many real applications across different fields, such as machine learning. The combinatorial optimization problem encountered is frequently NP-hard, hence there is no efficient algorithm for finding its global solution. Relaxation to a continuous space is a popular approach for escaping NP-hardness; however, it requires heuristics for mapping the obtained continuous solution back to the discrete domain. Fortunately, in many cases the combinatorial objective turns out to be submodular (Krause et al., 2008; Jegelka and Bilmes, 2011; Kim et al., 2011), a family of set functions that shares properties with both convex and concave continuous functions (Lovász, 1983). Nevertheless, submodular functions do not behave similarly across different optimization problems: they are easy to minimize, but NP-hard to maximize (Bach, 2013). There is a growing literature on designing efficient algorithms for the latter problem, that is, finding a good approximate maximizer of such functions.

Throughout this paper we assume that U is a finite set and that the values of the set function f are provided by an oracle for any subset of U.

Definition 1. A set function f : 2^U → ℝ is said to be submodular iff

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B),  ∀A, B ⊆ U.  (1)

Monotonicity of submodular functions is an important property by which the maximization algorithms can be dichotomized.

Definition 2. The set function f : 2^U → ℝ is said to be monotone (nondecreasing) iff for every S ⊆ T ⊆ U we have f(S) ≤ f(T).

Here we consider both constrained and unconstrained maximization problems. Note that there are different types of constraints in this context, but we focus only on those that bound the cardinality of the solution from above.

Problem 1. (Unconstrained) Let f : 2^U → ℝ be a submodular set function. Find the subset maximizing f: argmax_{A ⊆ U} f(A).

Problem 2. (Constrained) Let f : 2^U → ℝ be a submodular set function. Given a maximum size k > 0, find the subset of size at most k maximizing f: argmax_{A ⊆ U, |A| ≤ k} f(A).

Since solving these problems is NP-hard, we always have to approximate the solution as closely as possible to the true maximum. Every approximation algorithm is analyzed to show that the (expected) objective value of its output is no less than a fraction of the true maximum value of the set function. Clearly, this fraction is between 0 and 1, and it is desired to be as large as possible.

Definition 3. Suppose f : 2^U → ℝ is a set function whose maximum occurs at A* ⊆ U. An algorithm is said to be an α-approximation of the maximization problem (where 0 ≤ α ≤ 1 is called the approximation factor) iff its output A is guaranteed to satisfy f(A) ≥ αf(A*).
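As a concrete illustration of Definition 1, a coverage function (the size of the union of a chosen collection of sets) is submodular, and on a small instance the inequality can be checked exhaustively. The instance below is a hypothetical example introduced for illustration; it does not appear in the report:

```python
from itertools import chain, combinations

# Hypothetical coverage instance: each ground-set element covers some items.
# f(A) = size of the union of the members of A; coverage functions are
# submodular (and monotone).
SETS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}, "S4": {1}}

def f(A):
    """Coverage value of a collection A of keys into SETS."""
    return len(set().union(*(SETS[x] for x in A))) if A else 0

def subsets(U):
    """All subsets of U, as tuples."""
    return chain.from_iterable(combinations(U, r) for r in range(len(U) + 1))

# Check Definition 1: f(A) + f(B) >= f(A ∪ B) + f(A ∩ B) for all A, B ⊆ U.
U = list(SETS)
for A in map(set, subsets(U)):
    for B in map(set, subsets(U)):
        assert f(A) + f(B) >= f(A | B) + f(A & B)
```

The exhaustive check is feasible here only because |U| = 4; in general, submodularity is established analytically, as in the lemmas that follow.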

For randomized algorithms, the definition above is applied to the expected value of the output, i.e. E[f(A)] ≥ αf(A*). Almost all of the proposed maximization algorithms are deterministic or randomized iterative approaches that try to approximately locate the maximum of the objective. Existing algorithms can be categorized into the following groups, based on the properties of the objective function and the optimization constraints:

1. Constrained maximization of monotone submodular set functions — Nemhauser et al. (1978) proposed a (1 − 1/e)-approximation greedy algorithm for maximizing monotone submodular functions under cardinality constraints. This algorithm is proven to capture the optimal approximation factor in this case, assuming that the values of f are provided by an oracle for arbitrary subsets of U. This optimality is in the sense that if a better approximation is to be achieved, the set function needs to be evaluated at exponentially many subsets (Nemhauser and Wolsey, 1978).

2. Maximization of non-negative submodular set functions — Maximizing non-monotone functions is harder, and most of the work assumes nonnegativity of the set functions.

a) Without constraints — Feige et al. (2011) made one of the first attempts to design efficient algorithms and prove the corresponding lower bounds for unconstrained maximization of non-negative functions that are not necessarily monotone. In addition to proposing a (2/5)-approximation algorithm, they also proved that the optimal approximation factor in this case is 1/2. The hardness results (optimal approximation factors) in both the monotone and non-monotone cases were re-derived in a unified framework by Vondrák (2013), in which the so-called multilinear continuous extension of the set functions is used. More recently, Buchbinder et al. (2012) gave a (1/3)-approximation deterministic algorithm and a (1/2)-approximation randomized algorithm for unconstrained maximization of nonnegative submodular functions.
b) With cardinality constraints — Cardinality-constrained maximization has also recently been considered for nonnegative functions that are not necessarily monotone, by Buchbinder et al. (2014), in the form of a randomized discrete algorithm and a deterministic continuous algorithm.

In this technical report, we provide the main theoretical results related to each item listed above. Starting with the most basic algorithm, proposed by Nemhauser et al. (1978) for maximizing monotone submodular set functions, we then discuss the randomized algorithm proposed by Buchbinder et al. (2012) for unconstrained maximization of nonnegative submodular set functions. Finally, we give the randomized discrete algorithm from the work of Buchbinder et al. (2014) for constrained maximization of nonnegative submodular functions.

2 Preliminaries

Before going through the details of the algorithms, we give some general lemmas and propositions that will be useful in the later analysis. Lemma 1 shows the subadditivity of submodular

functions; Proposition 1, together with Definition 4, establishes an equivalent definition of submodularity; and Lemmas 2 and 3 prove two other useful properties of submodular functions.

Lemma 1. Let f : 2^U → ℝ be a nonnegative submodular function and A ⊆ U a nonempty subset with member-wise representation A = {a_1, ..., a_m}. Then f(A) ≤ Σ_{i=1}^m f({a_i}).

Proof. Applying the submodularity inequality to f(A) iteratively, we get:

f({a_1, ..., a_m}) ≤ f({a_1, ..., a_{m−1}}) + f({a_m}) − f(∅)
  ≤ f({a_1, ..., a_{m−2}}) + f({a_{m−1}}) + f({a_m}) − 2f(∅)
  ⋮
  ≤ Σ_{i=1}^m f({a_i}) − (m−1)f(∅).

Nonnegativity of f implies that f(∅) ≥ 0, and therefore f({a_1, ..., a_m}) ≤ Σ_{i=1}^m f({a_i}). ∎

Generally speaking, Definition 1 is not easy to use directly. It is more common to work with an equivalent definition based on the discrete derivative function ρ_f:

Definition 4. Let f : 2^U → ℝ be a set function, A ⊆ U and u ∈ U. The discrete derivative of f is defined as

ρ_f(A, u) = f(A ∪ {u}) − f(A).  (2)

Proposition 1. Let f : 2^U → ℝ be a set function. f is submodular iff

ρ_f(S, u) ≥ ρ_f(T, u),  ∀S ⊆ T ⊆ U, ∀u ∈ U \ T.  (3)

Proof. (⇒): In Definition 1 take A = S ∪ {u} and B = T:

f(S ∪ {u}) + f(T) ≥ f((S ∪ {u}) ∪ T) + f((S ∪ {u}) ∩ T) = f(T ∪ {u}) + f(S)
⇒ f(S ∪ {u}) − f(S) ≥ f(T ∪ {u}) − f(T)
⇒ ρ_f(S, u) ≥ ρ_f(T, u).

(⇐): Take A, B ⊆ U. If A ⊆ B we get f(A ∪ B) + f(A ∩ B) = f(A) + f(B) and the submodularity condition (1) holds. Now assume A ⊄ B, hence A \ B ≠ ∅. As U is finite, A \ B has a finite number of members and can be represented as A \ B = {u_1, ..., u_|A\B|}. Now define the sets T_i and S_i for i = 0, ..., |A\B| − 1 as follows:

T_i = B for i = 0, and T_i = B ∪ {u_1, ..., u_i} for i = 1, ..., |A\B| − 1;
S_i = A ∩ B for i = 0, and S_i = (A ∩ B) ∪ {u_1, ..., u_i} for i = 1, ..., |A\B| − 1.

Notice that S_{i−1} ⊆ T_{i−1} and u_i ∉ T_{i−1} (so u_i ∈ U \ T_{i−1}) for any i = 1, ..., |A\B|; hence inequality (3) can be applied:

ρ_f(S_{i−1}, u_i) ≥ ρ_f(T_{i−1}, u_i),  i = 1, ..., |A\B|.

Summing both sides of the inequalities above yields:

Σ_{i=1}^{|A\B|} ρ_f(S_{i−1}, u_i) = Σ_{i=1}^{|A\B|} [f(S_{i−1} ∪ {u_i}) − f(S_{i−1})] = f(S_{|A\B|−1} ∪ {u_|A\B|}) − f(S_0)
  ≥ Σ_{i=1}^{|A\B|} ρ_f(T_{i−1}, u_i) = f(T_{|A\B|−1} ∪ {u_|A\B|}) − f(T_0),

where we used the equality S_{i−1} ∪ {u_i} = S_i and the telescoping of the sums. Hence:

f((A ∩ B) ∪ {u_1, ..., u_|A\B|}) − f(A ∩ B) ≥ f(B ∪ {u_1, ..., u_|A\B|}) − f(B)
⇒ f(A) − f(A ∩ B) ≥ f(A ∪ B) − f(B)
⇒ f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B). ∎

We will also require the following lemmas in our later derivations:

Lemma 2. If f : 2^U → ℝ is a submodular set function, then the set function g_B : 2^U → ℝ defined as g_B(A) = f(A ∪ B), ∀A ⊆ U, is also submodular for any B ⊆ U.

Proof. Take a subset B ⊆ U. It suffices to show that ρ_{g_B}(S, u) ≥ ρ_{g_B}(T, u) for all S ⊆ T ⊆ U and u ∉ T. We consider the following two cases:

(I) u ∈ B: ρ_{g_B}(S, u) = g_B(S ∪ {u}) − g_B(S) = f(S ∪ B ∪ {u}) − f(S ∪ B) = f(S ∪ B) − f(S ∪ B) = 0. Similarly we can verify that ρ_{g_B}(T, u) = g_B(T ∪ {u}) − g_B(T) = 0, and thus equality over the discrete derivatives holds. Therefore ρ_{g_B}(S, u) ≥ ρ_{g_B}(T, u) is satisfied.

(II) u ∉ B:

ρ_{g_B}(S, u) − ρ_{g_B}(T, u) = [g_B(S ∪ {u}) − g_B(S)] − [g_B(T ∪ {u}) − g_B(T)]
  = [f(S ∪ B ∪ {u}) − f(S ∪ B)] − [f(T ∪ B ∪ {u}) − f(T ∪ B)]
  = ρ_f(S ∪ B, u) − ρ_f(T ∪ B, u) ≥ 0,

where the last inequality follows directly from the submodularity of f (since S ∪ B ⊆ T ∪ B and u ∉ T ∪ B). ∎
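The diminishing-returns form of submodularity in Proposition 1 can also be checked exhaustively on a small instance. The coverage function below is a hypothetical example used only for illustration:

```python
from itertools import combinations

# Hypothetical coverage instance: f(A) = |union of the members of A|.
SETS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}}

def f(A):
    return len(set().union(*(SETS[x] for x in A))) if A else 0

def rho(A, u):
    """Discrete derivative of f at A in direction u (Definition 4)."""
    return f(A | {u}) - f(A)

U = set(SETS)
subsets = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]

# Proposition 1: for S ⊆ T and u ∉ T, the marginal gain can only shrink
# as the base set grows.
for S in subsets:
    for T in subsets:
        if S <= T:
            for u in U - T:
                assert rho(S, u) >= rho(T, u)
```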

Lemma 3. If f : 2^U → ℝ is a submodular set function, then for every A, B ⊆ U we have

f(A) ≤ f(B) + Σ_{w ∈ A\B} ρ_f(B, w) − Σ_{z ∈ B\A} ρ_f(A ∪ B \ {z}, z).  (4)

Proof. Take two arbitrary sets A, B ⊆ U. First assume A ⊄ B, hence A \ B ≠ ∅. As A \ B is finite, represent it as A \ B = {w_1, ..., w_|A\B|} and write f(A ∪ B) − f(B) in terms of its elements using the following telescopic sum:

f(A ∪ B) − f(B) = Σ_{i=1}^{|A\B|} [f(B ∪ {w_1, ..., w_i}) − f(B ∪ {w_1, ..., w_{i−1}})]
  = Σ_{i=1}^{|A\B|} ρ_f(B ∪ {w_1, ..., w_{i−1}}, w_i)
  ≤ Σ_{w ∈ A\B} ρ_f(B, w),  (5)

where the last inequality is obtained by applying inequality (3) to all discrete derivatives inside the summation, with T = B ∪ {w_1, ..., w_{i−1}}, S = B ⊆ T and y = w_i ∉ T (so y ∈ U \ T), for all i = 1, ..., |A\B|. Also note that inequality (5) is valid even if A ⊆ B, since then f(A ∪ B) − f(B) = f(B) − f(B) = 0 = Σ_{w ∈ A\B = ∅} ρ_f(B, w).

Now assume B ⊄ A, hence B \ A ≠ ∅. Represent it as B \ A = {z_1, ..., z_|B\A|} and, similarly to the previous case, use a telescopic sum:

f(A ∪ B) − f(A) = Σ_{i=1}^{|B\A|} [f(A ∪ {z_1, ..., z_i}) − f(A ∪ {z_1, ..., z_{i−1}})]
  = Σ_{i=1}^{|B\A|} ρ_f(A ∪ {z_1, ..., z_{i−1}}, z_i)
  ≥ Σ_{z ∈ B\A} ρ_f(A ∪ B \ {z}, z),  (6)

where the inequality is derived by applying inequality (3) with T = A ∪ B \ {z_i}, S = A ∪ {z_1, ..., z_{i−1}} ⊆ T and y = z_i ∉ A ∪ B \ {z_i} = T, for all i = 1, ..., |B\A|. The inequality in (6) holds even when B ⊆ A, since then f(A ∪ B) − f(A) = f(A) − f(A) = 0 = Σ_{z ∈ B\A = ∅} ρ_f(A ∪ B \ {z}, z).

Finally, multiply both sides of inequality (6) by −1 and add them to both sides of inequality (5) to get:

f(A) − f(B) ≤ Σ_{w ∈ A\B} ρ_f(B, w) − Σ_{z ∈ B\A} ρ_f(A ∪ B \ {z}, z)
⇒ f(A) ≤ f(B) + Σ_{w ∈ A\B} ρ_f(B, w) − Σ_{z ∈ B\A} ρ_f(A ∪ B \ {z}, z). ∎
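Lemma 3's bound can be sanity-checked exhaustively on a small instance; the coverage function below is a hypothetical example introduced only for this check:

```python
from itertools import combinations

# Hypothetical coverage instance: f(A) = |union of the members of A|.
SETS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}}

def f(A):
    return len(set().union(*(SETS[x] for x in A))) if A else 0

def rho(A, u):
    """Discrete derivative rho_f(A, u) = f(A ∪ {u}) − f(A)."""
    return f(A | {u}) - f(A)

U = set(SETS)
subsets = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]

# Lemma 3: f(A) <= f(B) + sum_{w in A\B} rho(B, w)
#                       - sum_{z in B\A} rho((A ∪ B) \ {z}, z)
for A in subsets:
    for B in subsets:
        bound = (f(B)
                 + sum(rho(B, w) for w in A - B)
                 - sum(rho((A | B) - {z}, z) for z in B - A))
        assert f(A) <= bound
```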

Algorithm 1: Greedy approach for solving Problem 2 for monotone and submodular functions
Inputs: the source set U, the submodular set function f, the maximum size k of the subset
Outputs: a subset A ⊆ U of size at most k
/* Initializations */
1 A_0 ← ∅
2 U_0 ← U
/* Starting the Iterations */
3 for t = 1, ..., k do
   /* Local maximization */
4   u_t ← argmax_{u ∈ U_{t−1}} f(A_{t−1} ∪ {u})
   /* Updating the Loop Variables */
5   A_t ← A_{t−1} ∪ {u_t}
6   U_t ← U \ A_t
7 return A = A_k

We are now ready to start analyzing some of the algorithms for submodular maximization.

3 Maximization Algorithms

In the remainder of this report, we analyze in detail some of the existing maximization algorithms that belong to the categories listed above.

3.1 Monotone Functions

In this section, we focus on the size-constrained maximization of monotone submodular set functions (see the categorization given in Section 1), that is, solving Problem 2 when the function f satisfies inequality (1) and the monotonicity condition of Definition 2. Here we discuss the efficient iterative algorithm first introduced in the seminal paper by Nemhauser et al. (1978). The pseudocode of the approach is shown in Algorithm 1. The proof of this well-known result presented here is based on previous works (Nemhauser et al., 1978; Krause and Golovin, 2012). Suppose A denotes the output of the algorithm, A_t the set generated after t iterations (0 ≤ t ≤ k), and A* the optimal solution, i.e. A* = argmax_{A ⊆ U, |A| ≤ k} f(A).

Theorem 1. Let f : 2^U → ℝ be a monotone (nondecreasing) submodular set function, A* the optimal solution and A ⊆ U the subset obtained through Algorithm 1. If A_t is the

intermediate result of this algorithm at iteration t, then

f(A*) ≤ f(A_t) + k ρ_f(A_t, u_{t+1}),  0 ≤ t < k.  (7)

Proof. Use Lemma 3 with A = A* and B = A_t:

f(A*) ≤ f(A_t) + Σ_{w ∈ A*\A_t} ρ_f(A_t, w) − Σ_{z ∈ A_t\A*} ρ_f(A* ∪ A_t \ {z}, z)
  ≤ f(A_t) + Σ_{w ∈ A*\A_t} ρ_f(A_t, w),

where the last expression is obtained by noticing that f is nondecreasing, hence ρ_f(A* ∪ A_t \ {z}, z) ≥ 0 ⇒ −ρ_f(A* ∪ A_t \ {z}, z) ≤ 0, ∀z ∈ U. Also note that at each iteration 0 ≤ t < k, u_{t+1} is selected by maximizing f(A_t ∪ {u}), which is equivalent to maximizing ρ_f(A_t, u) with respect to u. Therefore one can write ρ_f(A_t, w) ≤ ρ_f(A_t, u_{t+1}), ∀w ∈ U_t = U \ A_t. Applying this to the inequality above, together with the fact that |A* \ A_t| ≤ |A*| ≤ k, completes the proof:

f(A*) ≤ f(A_t) + Σ_{w ∈ A*\A_t} ρ_f(A_t, w) ≤ f(A_t) + k ρ_f(A_t, u_{t+1}).  (8)

Theorem 1 implies that if at any iteration 0 ≤ t ≤ k−1 we have f(A_t) = f(A_{t+1}), i.e. ρ_f(A_t, u_{t+1}) = 0, the output of Algorithm 1 is equal to the optimal solution:

Corollary 1. Let f : 2^U → ℝ be a monotone (nondecreasing) submodular set function, A* the optimal solution and A ⊆ U the subset obtained through Algorithm 1. If there exists an iteration 0 ≤ t < k such that ρ_f(A_t, u_{t+1}) = 0, then f(A_t) = f(A*).

Proof. From Theorem 1 we have:

f(A*) ≤ f(A_t) + k ρ_f(A_t, u_{t+1}) = f(A_t).  (9)

On the other hand, A* is assumed to be the optimal solution, so f(A*) ≥ f(A_t); hence f(A_t) = f(A*). ∎

Now we prove the main result, providing a lower bound for the value of f at the greedily obtained solution in terms of the optimal value:

Theorem 2. Let f : 2^U → ℝ be a submodular and monotone (nondecreasing) set function with f(∅) = 0. If A is the output of Algorithm 1 and A* is the optimal solution to Problem 2, then we have:

f(A) ≥ [1 − (1 − 1/k)^k] f(A*) ≥ (1 − 1/e) f(A*).  (10)
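A minimal sketch of Algorithm 1 in code, under the assumption of a hypothetical coverage instance (ties in the argmax are broken by element order):

```python
# Greedy maximization of a monotone submodular function under a
# cardinality constraint |A| <= k (a sketch of Algorithm 1).
SETS = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}, "S4": {1}}

def f(A):
    """Coverage function: size of the union of the chosen sets."""
    return len(set().union(*(SETS[x] for x in A))) if A else 0

def greedy(U, f, k):
    A = set()
    remaining = list(U)   # U_t, kept in a fixed order for tie-breaking
    for _ in range(min(k, len(remaining))):
        # Local maximization: the element with the largest marginal gain.
        u = max(remaining, key=lambda x: f(A | {x}) - f(A))
        A.add(u)
        remaining.remove(u)
    return A

A = greedy(list(SETS), f, k=2)
# On this instance greedy picks S1 and then S2, covering {1, 2, 3, 4},
# which happens to match the optimum for k = 2 (consistent with
# f(A) >= (1 - 1/e) f(A*)).
```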

Proof. From Theorem 1 we have, for 0 ≤ t < k:

f(A*) − f(A_t) ≤ k ρ_f(A_t, u_{t+1}) = k [f(A_t ∪ {u_{t+1}}) − f(A_t)] = k [f(A_{t+1}) − f(A_t)]
  = k [f(A*) − f(A_t)] − k [f(A*) − f(A_{t+1})].

Collecting the f(A*) − f(A_t) terms on one side and simplifying the expression yields:

[f(A*) − f(A_{t+1})] / [f(A*) − f(A_t)] ≤ 1 − 1/k.

Now let us relate f(A*), f(A) and f(A_0 = ∅):

[f(A*) − f(A)] / [f(A*) − f(A_0)] = Π_{t=0}^{k−1} [f(A*) − f(A_{t+1})] / [f(A*) − f(A_t)] ≤ (1 − 1/k)^k,

from which we can write:

f(A) ≥ [1 − (1 − 1/k)^k] f(A*) + (1 − 1/k)^k f(∅) = [1 − (1 − 1/k)^k] f(A*),

where we used the assumption that f(∅) = 0. Finally, we use the inequality 1 − x ≤ e^{−x}, x ∈ ℝ_+, to get:

[1 − (1 − 1/k)^k] f(A*) ≥ (1 − 1/e) f(A*),  (11)

yielding f(A) ≥ (1 − 1/e) f(A*). ∎

3.2 Non-Monotone Functions

In this section, we discuss algorithms for solving Problems 1 and 2 for submodular, non-negative set functions. Maximization of non-monotone submodular functions is not as straightforward as in the monotone case. There are recent studies on greedy approaches, similar to Algorithm 1, for the family of non-negative submodular functions that are not necessarily monotone, but the lower bounds derived for the proposed algorithms are not as good as in cases where monotonicity is present. For instance, as we saw in the last section, if f is monotone and submodular, its constrained maximum f(A*) (i.e. the solution to Problem 2) can be approximated with an approximation factor of (1 − 1/e). However, it has been shown that, when there

are no constraints, no efficient algorithm can maximize non-negative, symmetric¹ submodular set functions (which are possibly non-monotone) with an approximation factor better than 1/2 (Feige et al., 2011). Moreover, Algorithm 1 shows that the (1 − 1/e) bound can be attained for submodular and monotone maximization in a deterministic framework. But we will see shortly that the algorithms analyzed here for maximizing submodular non-monotone functions achieve their relatively weaker factors by introducing randomness, and therefore their bounds are achieved in expectation.

Here we discuss two randomized greedy algorithms for solving the unconstrained and constrained optimization problems, proposed by Buchbinder et al. (2012) and Buchbinder et al. (2014). We will see that these algorithms, shown in Algorithms 2 and 3, achieve constant approximation factors of 1/2 and 1/e for Problems 1 and 2, respectively. In addition, we also analyze a constrained version of Algorithm 2.

3.2.1 Unconstrained Maximization

Pseudocode of the greedy algorithm proposed for unconstrained maximization is shown in Algorithm 2 (Buchbinder et al., 2012). It is a randomized double greedy algorithm in the sense that it starts with two solutions, A_0 = ∅ and B_0 = U, and at each iteration, with some probability, either adds an element to the former or removes it from the latter. These probabilities depend on how much improvement would be attained by taking either action. At the end of this procedure, the two evolving sets A_t and B_t coincide at t = n, and either of them can be returned as the final solution. To obtain a lower bound on the objective value of the output of Algorithm 2, we need the following two lemmas:

Lemma 4. Let f : 2^U → ℝ be a submodular function and A_{t−1} and B_{t−1} the intermediate double random sets generated up to the t-th iteration of Algorithm 2.
If a_t and b_t are defined as in lines 4 and 5 of Algorithm 2, then we have a_t + b_t ≥ 0 for every iteration 1 ≤ t ≤ n.

Proof. Denote U member-wise by U = {u_1, ..., u_n}. By the way we initialize and then continue sampling the double random sets, at each iteration of the algorithm we have B_{t−1} = A_{t−1} ∪ {u_t, ..., u_n}, hence A_{t−1} ⊆ B_{t−1}. We consider the two sets A_{t−1} ∪ {u_t} and B_{t−1} \ {u_t} that are involved in a_t and b_t, and see that their union and intersection are B_{t−1} and A_{t−1}, respectively:

(A_{t−1} ∪ {u_t}) ∪ (B_{t−1} \ {u_t}) = A_{t−1} ∪ ({u_t} ∪ (B_{t−1} \ {u_t})) = A_{t−1} ∪ B_{t−1} = B_{t−1}
(A_{t−1} ∪ {u_t}) ∩ (B_{t−1} \ {u_t}) = A_{t−1} ∩ (B_{t−1} \ {u_t}) = A_{t−1} ∩ (A_{t−1} ∪ {u_{t+1}, ..., u_n}) = A_{t−1}.

¹ The set function f : 2^U → ℝ is called symmetric if f(A) = f(U \ A), ∀A ⊆ U.

Algorithm 2: Randomized double greedy approach for solving Problem 1 for nonnegative submodular functions
Inputs: the source set U = {u_1, ..., u_n}, the submodular set function f
Outputs: a subset A ⊆ U
/* Initializations */
1 A_0 ← ∅
2 B_0 ← U
/* Starting the Iterations */
3 for t = 1, ..., n do
4   a_t ← f(A_{t−1} ∪ {u_t}) − f(A_{t−1})
5   b_t ← f(B_{t−1} \ {u_t}) − f(B_{t−1})
6   a'_t ← max(a_t, 0)
7   b'_t ← max(b_t, 0)
   /* Randomized Updating of the Double Loop Variables */
8   with probability a'_t/(a'_t + b'_t) do
9     A_t ← A_{t−1} ∪ {u_t}
10    B_t ← B_{t−1}
11  otherwise, with probability b'_t/(a'_t + b'_t) do
12    A_t ← A_{t−1}
13    B_t ← B_{t−1} \ {u_t}
14 return A = A_n = B_n

If a'_t + b'_t = 0, take a'_t/(a'_t + b'_t) = 1.

Forming a_t + b_t and using the definition of submodularity (equation (1)) with the two sets above gives:

a_t + b_t = f(A_{t−1} ∪ {u_t}) − f(A_{t−1}) + f(B_{t−1} \ {u_t}) − f(B_{t−1}) ≥ 0. ∎

Now define the set C_t = (A* ∪ A_t) ∩ B_t, where A* is the optimal solution of the unconstrained maximization. Note that C_0 = A*, the optimal set, and C_n = A_n = B_n, the output. The set sequence C_0 = A*, C_1, ..., C_n = A_n, generated along the iterations of the algorithm, starts from the optimal set and ends with the output of the algorithm. Similarly, we can look at the sequence of objective values f(A*), f(C_1), ..., f(C_n). The following analysis is done by bounding the expected amount of decrease in the objective when moving from the global maximum f(C_0 = A*) towards the output value f(C_n) in the sequence {f(C_i)}_{i=0}^n. First we put a bound on the expected decrease of the objective between two consecutive iterations. Note that, in what follows, for the analysis of Algorithm 2 all expectations are taken jointly with respect to the random sets A_t and B_t.

Lemma 5. Let f : 2^U → ℝ be a submodular function and A_{t−1} and B_{t−1} the intermediate double random sets generated up to the t-th iteration of Algorithm 2 (1 ≤ t ≤ n). If we

define C_t := (A* ∪ A_t) ∩ B_t, where A* is the optimal solution to Problem 1, then:

E[f(C_{t−1}) − f(C_t)] ≤ (1/2) E[f(A_t) − f(A_{t−1}) + f(B_t) − f(B_{t−1})].  (12)

Proof. We consider all possible cases at every iteration 1 ≤ t ≤ n and show that inequality (12) holds in all of them. The cases are separated based on the signs of a_t and b_t and also on whether the next element u_t is inside A* or not. From Lemma 4, a_t + b_t ≥ 0, hence they cannot both be negative, and we get the following cases (as mentioned before, B_{t−1} = A_{t−1} ∪ {u_t, ..., u_n}):

(I) (a_t ≥ 0, b_t ≤ 0): This case implies that with probability one A_t = A_{t−1} ∪ {u_t} and B_t = B_{t−1}, hence the right-hand side of the inequality is (1/2)[f(A_t) − f(A_{t−1})] = a_t/2 ≥ 0. We also get

C_t = (A* ∪ A_t) ∩ B_t = (A* ∪ A_{t−1} ∪ {u_t}) ∩ B_{t−1} = C_{t−1} ∪ ({u_t} ∩ B_{t−1}) = C_{t−1} ∪ {u_t}.

The left-hand side of the inequality depends on the following two cases:

i) (u_t ∈ A*): In this case u_t ∈ C_{t−1}, implying that C_t = C_{t−1}, and the left-hand side becomes zero. Hence the inequality holds.

ii) (u_t ∉ A*): The left-hand side in this case is f(C_{t−1}) − f(C_{t−1} ∪ {u_t}) = −ρ_f(C_{t−1}, u_t). By submodularity of f we get ρ_f(C_{t−1}, u_t) ≥ ρ_f(B_{t−1} \ {u_t}, u_t), because C_{t−1} ⊆ B_{t−1} \ {u_t} and u_t ∉ B_{t−1} \ {u_t}. This implies

f(C_{t−1}) − f(C_t) = −ρ_f(C_{t−1}, u_t) ≤ −ρ_f(B_{t−1} \ {u_t}, u_t) = f(B_{t−1} \ {u_t}) − f(B_{t−1}) = b_t ≤ 0.

Recall that the right-hand side is a_t/2 ≥ 0, hence the inequality holds.

(II) (a_t < 0, b_t ≥ 0): This case is similar to the previous one. Here with probability one we have A_t = A_{t−1} and B_t = B_{t−1} \ {u_t}, implying that the right-hand side becomes (1/2)[f(B_t) − f(B_{t−1})] = b_t/2 ≥ 0. We also get:

C_t = (A* ∪ A_t) ∩ B_t = (A* ∪ A_{t−1}) ∩ (B_{t−1} \ {u_t}) = C_{t−1} \ {u_t}.

For obtaining the left-hand side we consider the two following cases:

i) (u_t ∈ A*):

Having u_t inside A* means that u_t ∈ C_{t−1} (because it lies in both A* and B_{t−1}). The left-hand side can be written as f(C_{t−1}) − f(C_{t−1} \ {u_t}) = ρ_f(C_{t−1} \ {u_t}, u_t). Note that since A_{t−1} ⊆ B_{t−1} we get:

C_{t−1} = (A* ∪ A_{t−1}) ∩ B_{t−1} = (A* ∩ B_{t−1}) ∪ (A_{t−1} ∩ B_{t−1}) = (A* ∩ B_{t−1}) ∪ A_{t−1} ⊇ A_{t−1}
⇒ C_{t−1} \ {u_t} ⊇ A_{t−1} \ {u_t} = A_{t−1}, since u_t ∉ A_{t−1}.

Now by submodularity of f we have ρ_f(C_{t−1} \ {u_t}, u_t) ≤ ρ_f(A_{t−1}, u_t) = f(A_{t−1} ∪ {u_t}) − f(A_{t−1}) = a_t < 0. But recall that the right-hand side was equal to b_t/2 ≥ 0, hence the inequality holds.

ii) (u_t ∉ A*): In this case u_t is neither in A* nor in A_{t−1}, implying that u_t ∉ C_{t−1}; therefore the left-hand side is zero and clearly less than or equal to b_t/2.

(III) (a_t ≥ 0, b_t > 0)²: In this case both probabilities are non-zero and we have a non-degenerate discrete distribution over the possible actions. Note that here a'_t = a_t and b'_t = b_t, thus the probabilities are a_t/(a_t + b_t) and b_t/(a_t + b_t), resulting in the following:

E[f(A_t) − f(A_{t−1}) + f(B_t) − f(B_{t−1})]
  = [a_t/(a_t + b_t)] [f(A_{t−1} ∪ {u_t}) − f(A_{t−1})] + [b_t/(a_t + b_t)] [f(B_{t−1} \ {u_t}) − f(B_{t−1})]
  = (a_t² + b_t²)/(a_t + b_t),

where the first term corresponds to adding u_t to A_{t−1} and the second term corresponds to removing it from B_{t−1}. We already mentioned that C_t would equal C_{t−1} ∪ {u_t} or C_{t−1} \ {u_t} in each case, respectively, and:

E[f(C_{t−1}) − f(C_t)] = [a_t/(a_t + b_t)] [f(C_{t−1}) − f(C_{t−1} ∪ {u_t})] + [b_t/(a_t + b_t)] [f(C_{t−1}) − f(C_{t−1} \ {u_t})].

Now note that the two terms shown above cannot be nonzero simultaneously; one of them is zero depending on whether u_t is inside A* (and hence inside C_{t−1}):

² Note that in this case b_t is considered a strictly positive number. The case where both numbers are zero is covered in case I, since if that happens we take a'_t/(a'_t + b'_t) = 1, as if a_t ≥ 0 and b_t ≤ 0.

i) (u_t ∈ A*):

E[f(C_{t−1}) − f(C_t)] = [b_t/(a_t + b_t)] [f(C_{t−1}) − f(C_{t−1} \ {u_t})] ≤ a_t b_t/(a_t + b_t),

where the inequality was already shown in case II(i).

ii) (u_t ∉ A*):

E[f(C_{t−1}) − f(C_t)] = [a_t/(a_t + b_t)] [f(C_{t−1}) − f(C_{t−1} ∪ {u_t})] ≤ a_t b_t/(a_t + b_t),

where the inequality was already shown in case I(ii).

Therefore one can always claim that E[f(C_{t−1}) − f(C_t)] ≤ a_t b_t/(a_t + b_t). Finally, notice that 2a_t b_t ≤ a_t² + b_t² for any a_t and b_t, which completes the proof:

E[f(C_{t−1}) − f(C_t)] ≤ a_t b_t/(a_t + b_t) ≤ (1/2)(a_t² + b_t²)/(a_t + b_t) = (1/2) E[f(A_t) − f(A_{t−1}) + f(B_t) − f(B_{t−1})]. ∎

Theorem 3. Let f : 2^U → ℝ be a nonnegative submodular set function. If A_n is the output of Algorithm 2 and A* is the optimal solution of Problem 1, then

E[f(A_n)] ≥ (1/2) f(A*).  (13)

Proof. Inequality (12) has been proven for all iterations 1 ≤ t ≤ n; summing them all results in a telescopic sum on each side:

Σ_{t=1}^n E[f(C_{t−1}) − f(C_t)] ≤ (1/2) Σ_{t=1}^n E[f(A_t) − f(A_{t−1}) + f(B_t) − f(B_{t−1})]
⇒ E[f(C_0) − f(C_n)] ≤ (1/2) E[f(A_n) + f(B_n) − f(A_0) − f(B_0)].

Now remember that C_0 = A*, A_0 = ∅, B_0 = U and C_n = A_n = B_n:

f(A*) − E[f(A_n)] ≤ E[f(A_n)] − (1/2)[f(∅) + f(U)] ⇒ f(A*) ≤ 2E[f(A_n)] − (1/2)[f(∅) + f(U)] ≤ 2E[f(A_n)],

where the last step uses the nonnegativity of f. ∎
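A minimal sketch of Algorithm 2, using the cut function of a 4-cycle as a hypothetical nonnegative (and here also symmetric) submodular objective; the graph and names are assumptions made for illustration:

```python
import random

# Cut function of the 4-cycle a-b-c-d-a: f(A) = number of edges with
# exactly one endpoint in A. Cut functions are nonnegative and submodular.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

def f(A):
    return sum((u in A) != (v in A) for u, v in EDGES)

def double_greedy(U, f, rng=random):
    """Randomized double greedy (a sketch of Algorithm 2)."""
    A, B = set(), set(U)
    for u in U:
        a = f(A | {u}) - f(A)      # gain of adding u to A
        b = f(B - {u}) - f(B)      # gain of removing u from B
        ap, bp = max(a, 0.0), max(b, 0.0)
        p = 1.0 if ap + bp == 0 else ap / (ap + bp)  # convention: p = 1 on ties
        if rng.random() < p:
            A = A | {u}            # add u to the growing solution
        else:
            B = B - {u}            # remove u from the shrinking solution
    return A                       # at this point A == B

result = double_greedy(["a", "b", "c", "d"], f)
```

On this instance the optimum is a bipartition such as {a, c} with cut value 4; Theorem 3 only guarantees E[f(A_n)] ≥ 2, though the algorithm often does better in practice.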

3.2.2 Constrained Maximization

In this section, we first discuss a slight variation of Algorithm 2 that makes it constrained, and then analyze a separate single-variable randomized algorithm recently proposed for constrained maximization of nonnegative submodular set functions.

In the constrained version of Algorithm 2, we terminate the algorithm as soon as the size of the set A_t reaches the maximum size k > 0 specified in the constrained problem. Therefore, if this happens at iteration t*, the output is A_{t*}. Clearly, t* is at least k (if in all iterations we add elements to A_t) and at most n (if |A_n| ≤ k). We have the following result on the lower bound of E[f(A_{t*})], where we use the implicit assumption that f is upper-bounded over U:

Theorem 4. Let f : 2^U → ℝ be a nonnegative submodular function, A_{t*} the output of the constrained version of Algorithm 2 and A* the optimal solution of Problem 1. If we denote the upper bound of f over singletons by M, i.e. f({u}) ≤ M ∀u ∈ U, then

E[f(A_{t*})] ≥ (1/2) f(A*) − (n − k)M.  (14)

Proof. First we find an upper bound on the difference between the expected values of f(A_n) and f(A_{t*}). We know that A_{t*} ⊆ A_n. Denote the set difference by S_{t*} = A_n \ A_{t*}. Note that if S_{t*} = ∅ then t* = n (since A_{t*} = A_n), and therefore inequality (14) holds by Theorem 3. Thus assume S_{t*} ≠ ∅ and denote it member-wise by S_{t*} = {s_1, ..., s_m}. Clearly, S_{t*} ⊆ {u_{t*+1}, ..., u_n}, and therefore m ≤ n − t* ≤ n − k. From the submodularity of f, and also using Lemma 1, we have:

f(A_n) − f(A_{t*}) = f(A_{t*} ∪ S_{t*}) − f(A_{t*}) ≤ f(A_{t*}) + f(S_{t*}) − f(∅) − f(A_{t*}) ≤ f(S_{t*}) ≤ Σ_{i=1}^m f({s_i}) ≤ mM ≤ (n − k)M.

This inequality, together with Theorem 3, suggests that

E[f(A_{t*})] ≥ E[f(A_n)] − (n − k)M ≥ (1/2) f(A*) − (n − k)M. ∎

Now we turn to the recent algorithm proposed by Buchbinder et al. (2014) for maximizing nonnegative submodular functions under a cardinality constraint. The pseudocode, shown in Algorithm 3, is very similar to Algorithm 1 with a certain degree of added exploration, i.e.
instead of selecting the element with the highest marginal value at each iteration, a sample is chosen uniformly at random from the set of $k$ elements with the highest marginal values. Construction of $M_t$ in line 4 can be done simply by sorting the elements of $U_{t-1}$ in descending order of their marginal values $\rho_f(A_{t-1}, u)$, and picking the first $k$ elements.

Algorithm 3: Randomized greedy approach for solving Problem 2 for nonnegative submodular functions
  Inputs: the source set $U$, the submodular set function $f$, the maximum size $k$ of the subset
  Outputs: a subset $A \subseteq U$ of size at most $k$
  /* Initializations */
  1  $A_0 \leftarrow \emptyset$
  2  $U_0 \leftarrow U$
  /* Starting the Iterations */
  3  for $t = 1$ to $k$ do
  4      $M_t \leftarrow \arg\max_{M \subseteq U_{t-1},\ |M| = k}\ \sum_{u \in M} \rho_f(A_{t-1}, u)$
  5      $u_t \leftarrow \mathrm{RANDOM}(M_t)$
  /* Updating the Loop Variables */
  6      $A_t \leftarrow A_{t-1} \cup \{u_t\}$
  7      $U_t \leftarrow U \setminus A_t$
  8  return $A = A_k$

Line 5 randomly selects one of the $k$ entries of the constructed $M_t$. Extra care should be given to the size of $U_t$ in this algorithm: if $|U| < 2k$, after some iterations we would run out of enough samples to put inside $M_t$. The following remark discusses this issue:

Remark 1. In Algorithm 3, we can always assume that $|U| \ge 2k$.

Proof. If $k > |U|/2$, we can define a dummy set $D$ with $2k - |U|$ elements such that for every $A \subseteq U$ we have $f(A) = f(A \cup D)$. By doing this we might get some of the elements of $D$ in the output of the algorithm; we simply remove them after the algorithm has finished. Note that introducing this set affects neither our following proofs nor the value of the output, since it changes neither $f(A^*)$ nor $f(A_t)$, $0 \le t \le k$. Observe that, according to the algorithm, the size of the output might be less than $k$ only if $n < 2k$, which is exactly the case that introduces dummy elements into the universe set.

Buchbinder et al. (2014) discuss that the performance of Algorithm 3 is expected to be similar to Algorithm 1 when the function to maximize is monotone. That is, while it guarantees the weaker bound of $1/e$ for non-monotone functions, it maintains the stronger bound of $(1 - 1/e)$ if its input set function is monotone. Of course, contrary to Algorithm 1, because of the randomness here, these bounds are achieved in expectation:
\[
\text{$f$ monotone and submodular:}\quad E_A[f(A)] \ge \left(1 - \frac{1}{e}\right) f(A^*) \tag{15a}
\]
\[
\text{$f$ nonnegative and submodular:}\quad E_A[f(A)] \ge \frac{1}{e}\, f(A^*) \tag{15b}
\]
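The pseudocode of Algorithm 3 translates almost line for line into Python. The following is a minimal sketch, not the authors' code: the coverage objective and all names are illustrative assumptions.

```python
import random

def randomized_greedy(f, ground, k, rng=random.Random(0)):
    """Sketch of Algorithm 3: in each of the k iterations, M_t holds the
    k remaining elements with the largest discrete derivatives
    rho_f(A_{t-1}, u) = f(A | {u}) - f(A); u_t is drawn uniformly from M_t."""
    assert len(ground) >= 2 * k  # otherwise pad with dummy elements (Remark 1)
    A, remaining = set(), set(ground)
    for _ in range(k):
        M = sorted(remaining,
                   key=lambda u: f(A | {u}) - f(A),
                   reverse=True)[:k]   # line 4: top-k marginal gains
        u = rng.choice(M)              # line 5: uniform draw from M_t
        A.add(u)                       # line 6
        remaining.discard(u)           # line 7
    return A

# Illustrative monotone submodular objective: a coverage function over
# arbitrary example sets.
SETS = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1}, 4: {5}, 5: {2, 5}}
def coverage(S):
    return float(len(set().union(*(SETS[i] for i in S)))) if S else 0.0
```

For instance, `randomized_greedy(coverage, range(6), 2)` returns a set of two indices; since `coverage` is monotone, its expected value obeys the $(1 - 1/e)$ guarantee of (15a).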

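Numerically, the constants in (15a) and (15b) arise as worst cases over $k$ of $1 - (1 - 1/k)^k$ and $(1 - 1/k)^{k-1}$. The quick sanity check below is an illustrative addition, not part of the original analysis.

```python
import math

# The monotone guarantee (15a) relies on 1 - (1 - 1/k)^k >= 1 - 1/e (via
# 1 - x <= exp(-x)), while the non-monotone guarantee (15b) relies on
# (1 - 1/k)^(k-1) >= 1/e; both inequalities hold for every k >= 1.
def monotone_constant(k):
    return 1.0 - (1.0 - 1.0 / k) ** k

def nonmonotone_constant(k):
    return (1.0 - 1.0 / k) ** (k - 1)

for k in [1, 2, 3, 5, 10, 100, 10**4]:
    assert monotone_constant(k) >= 1.0 - 1.0 / math.e
    assert nonmonotone_constant(k) >= 1.0 / math.e
```

Both quantities approach their limits $1 - 1/e \approx 0.632$ and $1/e \approx 0.368$ from above as $k$ grows, so the stated constants are the worst case over $k$.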
In this section, we first show (15a) in Theorem 5, and then build up the proof for (15b).

Theorem 5. Let $f : 2^U \to \mathbb{R}$ be a submodular and monotone set function with $f(\emptyset) = 0$. If $A$ is the output of Algorithm 3 and $A^*$ is the optimal solution of Problem 2, then
\[
E_A[f(A)] \ge \left(1 - \frac{1}{e}\right) f(A^*). \tag{16}
\]

Proof. Suppose $A_{t-1}$, and hence $M_t$, is fixed for some $1 \le t \le k$. In this case $u_t$ is a discrete random variable distributed uniformly over its domain $M_t$. Also observe that $M_t$ is the set of $k$ elements of $U_{t-1}$ with the highest discrete derivatives, that is,
\[
\sum_{u \in M_t} \rho_f(A_{t-1}, u) \ge \sum_{u \in S} \rho_f(A_{t-1}, u), \quad \forall S \subseteq U_{t-1},\ |S| \le k.
\]
Since $A^* \setminus A_{t-1} \subseteq U_{t-1}$ and $|A^* \setminus A_{t-1}| \le k$, one can write:
\[
E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)] = \frac{1}{k} \sum_{u_t \in M_t} \rho_f(A_{t-1}, u_t) \ge \frac{1}{k} \sum_{u_t \in A^* \setminus A_{t-1}} \rho_f(A_{t-1}, u_t). \tag{17}
\]
We know that $A^* \setminus A_{t-1} \subseteq U$ is finite; represent it by $A^* \setminus A_{t-1} = \{w_1, \ldots, w_r\}$, $r \ge 0$. Submodularity of $f$ implies that the following inequalities hold:
\[
\rho_f(A_{t-1}, w_1) \ge \rho_f(A_{t-1}, w_1) \tag{18a}
\]
\[
\rho_f(A_{t-1}, w_2) \ge \rho_f(A_{t-1} \cup \{w_1\}, w_2) \tag{18b}
\]
\[
\rho_f(A_{t-1}, w_3) \ge \rho_f(A_{t-1} \cup \{w_1, w_2\}, w_3) \tag{18c}
\]
\[
\vdots \tag{18d}
\]
\[
\rho_f(A_{t-1}, w_r) \ge \rho_f(A_{t-1} \cup \{w_1, \ldots, w_{r-1}\}, w_r). \tag{18e}
\]
Summing both sides of the inequalities in (18) yields:
\[
\sum_{u \in A^* \setminus A_{t-1}} \rho_f(A_{t-1}, u) \ge f(A_{t-1} \cup (A^* \setminus A_{t-1})) - f(A_{t-1}) = f(A^* \cup A_{t-1}) - f(A_{t-1}). \tag{19}
\]
We substitute the inequality obtained in (19) into (17) to get:
\[
E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)] \ge \frac{1}{k}\left[f(A^* \cup A_{t-1}) - f(A_{t-1})\right] \ge \frac{1}{k}\left[f(A^*) - f(A_{t-1})\right], \tag{20}
\]
where the last inequality is obtained from the monotonicity of $f$. So far we had the assumption that the set $A_{t-1}$ was fixed. Unfix it and take the expectation of both sides of inequality (20) with respect to $A_{t-1}$. Then, noticing that $E_{A_{t-1}}[E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)]] = E_{A_t}[\rho_f(A_{t-1}, u_t)]$,³ we get:

³ For every two random variables $X$ and $Y$ and any joint function $f(X, Y)$ we have $E_Y[E_{X \mid Y}[f(X, Y)]] = E_{X,Y}[f(X, Y)]$.

\[
E_{A_t}[\rho_f(A_{t-1}, u_t)] \ge \frac{1}{k}\left[f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right]
\]
\[
\Longrightarrow\quad E_{A_t}[f(A_{t-1} \cup \{u_t\}) - f(A_{t-1})] \ge \frac{1}{k}\left[f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right]
\]
\[
\Longrightarrow\quad E_{A_t}[f(A_t)] - E_{A_{t-1}}[f(A_{t-1})] \ge \frac{1}{k}\left[f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right]
\]
\[
\Longrightarrow\quad E_{A_t}[f(A_t)] \ge \frac{1}{k} f(A^*) + \left(1 - \frac{1}{k}\right) E_{A_{t-1}}[f(A_{t-1})].
\]
We can relate the difference between the optimal value $f(A^*)$ and the value of the estimated set at iteration $t$ by subtracting $f(A^*)$ from both sides of the last inequality above:
\[
E_{A_t}[f(A_t)] - f(A^*) \ge \frac{1}{k} f(A^*) + \left(1 - \frac{1}{k}\right) E_{A_{t-1}}[f(A_{t-1})] - f(A^*)
\]
\[
\Longrightarrow\quad f(A^*) - E_{A_t}[f(A_t)] \le \left(1 - \frac{1}{k}\right)\left[f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right],
\]
implying, by applying this recursion $t$ times, that
\[
f(A^*) - E_{A_t}[f(A_t)] \le \left(1 - \frac{1}{k}\right)^t \left[f(A^*) - E_{A_0}[f(A_0)]\right] = \left(1 - \frac{1}{k}\right)^t \left[f(A^*) - f(\emptyset)\right] = \left(1 - \frac{1}{k}\right)^t f(A^*)
\]
\[
\Longrightarrow\quad E_{A_t}[f(A_t)] \ge \left[1 - \left(1 - \frac{1}{k}\right)^t\right] f(A^*).
\]
Finally, setting $t = k$ and utilizing the inequality $1 - x \le e^{-x}$, $x \in \mathbb{R}$, completes the proof:
\[
E_A[f(A_k)] \ge \left[1 - \left(1 - \frac{1}{k}\right)^k\right] f(A^*) \ge \left(1 - \frac{1}{e}\right) f(A^*).
\]

Now we turn to non-monotone functions and see that the lower bound that can be guaranteed for the function value at the output of Algorithm 3 is much lower than what we proved for monotone functions. First we need a few lemmas:

Lemma 6. Let $f : 2^U \to \mathbb{R}$ be a nonnegative submodular set function and $p : U \to [0, 1]$ be a probability assignment to the elements of $U$ with an upper bound $M$, i.e. $p(u) \le M$ for all $u \in U$. If we denote by $S_p$ the random set obtained by sampling from $U$ where the element $u \in U$ is selected with probability $p(u)$, that is $\Pr(u \in S_p) = p(u)$ for all $u \in U$, then we have:
\[
E_p[f(S_p)] \ge (1 - M) f(\emptyset). \tag{21}
\]

Proof. Without loss of generality assume that the elements of $U$ are sorted in decreasing order of their probability of being selected, i.e. $U = \{u_1, \ldots, u_n\}$ with $p(u_i) \ge p(u_j)$ for $1 \le i < j \le n$. Define $U_i := \{u_1, \ldots, u_i\}$ for $1 \le i \le n$ and $U_0 = \emptyset$. We can write $f(S_p)$ in terms of a telescopic sum over all the members of $U$:
\[
f(S_p) = f(\emptyset) + \sum_{i=1}^{|U|} \mathbb{1}(u_i \in S_p)\left[f(U_i \cap S_p) - f(U_{i-1} \cap S_p)\right] = f(\emptyset) + \sum_{i=1}^{|U|} \mathbb{1}(u_i \in S_p)\,\rho_f(U_{i-1} \cap S_p, u_i)
\]
\[
\ge f(\emptyset) + \sum_{i=1}^{|U|} \mathbb{1}(u_i \in S_p)\,\rho_f(U_{i-1}, u_i),
\]
where the last inequality holds because of the submodularity of $f$ (note that $U_{i-1} \cap S_p \subseteq U_{i-1}$). Now taking the expectation of both sides of the inequality above yields:
\[
E_p[f(S_p)] \ge E_p\!\left[f(\emptyset) + \sum_{i=1}^{|U|} \mathbb{1}(u_i \in S_p)\,\rho_f(U_{i-1}, u_i)\right] = f(\emptyset) + \sum_{i=1}^{|U|} E_p[\mathbb{1}(u_i \in S_p)]\,\rho_f(U_{i-1}, u_i)
\]
\[
= f(\emptyset) + \sum_{i=1}^{|U|} p(u_i)\,\rho_f(U_{i-1}, u_i),
\]
where we used the fact that the indicator $\mathbb{1}(u_i \in S_p)$ is a Bernoulli random variable with parameter $p(u_i)$, hence its expected value is $p(u_i)$. Now we expand the discrete derivatives and rearrange the sum (an Abel summation) to get:
\[
E_p[f(S_p)] \ge f(\emptyset) + \sum_{i=1}^{|U|} p(u_i)\left[f(U_i) - f(U_{i-1})\right]
\]
\[
= \left[1 - p(u_1)\right] f(\emptyset) + \sum_{i=2}^{|U|} \left[p(u_{i-1}) - p(u_i)\right] f(U_{i-1}) + p(u_{|U|})\, f(U) \ge \left[1 - p(u_1)\right] f(\emptyset),
\]
where the last inequality holds since every discarded term is nonnegative ($p$ is sorted in decreasing order and $f \ge 0$). Finally, recalling that $M$ is an upper bound on $p$, we can write
\[
E_p[f(S_p)] \ge \left[1 - p(u_1)\right] f(\emptyset) \ge (1 - M) f(\emptyset).
\]
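To see Lemma 6 in action on a case where the expectation is available in closed form, one can take a modular function, for which everything can be checked exactly. The constant and the probabilities below are illustrative choices, not from the text.

```python
# Sanity check of Lemma 6 on a modular (hence submodular) function
# f(S) = c - |S|, which is nonnegative whenever c >= |U|; the constant c
# and the sampling probabilities are illustrative.
c = 10.0
p = {"u1": 0.2, "u2": 0.1, "u3": 0.1, "u4": 0.1}  # p(u) <= M for all u
M = max(p.values())

# By linearity of expectation, E[f(S_p)] = c - sum_u p(u) exactly, and
# Lemma 6 guarantees E[f(S_p)] >= (1 - M) * f(empty set) = (1 - M) * c.
expected_value = c - sum(p.values())
lower_bound = (1.0 - M) * c
assert expected_value >= lower_bound
```

Here the expectation is roughly 9.5 while the lemma only promises 8.0, illustrating that (21) is a conservative bound on well-behaved instances.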

Lemma 7. Let $f : 2^U \to \mathbb{R}$ be a nonnegative submodular set function and let $A_t$ and $U_t$ be the sets generated at the $t$-th iteration of Algorithm 3, $0 \le t \le k$. If $A^*$ is the optimal solution of Problem 2, then we have:
\[
E_{A_t}[f(A^* \cup A_t)] \ge \left(1 - \frac{1}{k}\right)^t f(A^*). \tag{22}
\]

Proof. The claim is trivial for the case $t = 0$. Take $1 \le t \le k$ and $u \notin A_{t-1}$. Note that $\Pr(u_t = u \mid u \in M_t) = 1/k$, since $u_t$ is selected uniformly at random from $M_t$, which has size $k$. The goal here is to bound $\Pr(u \in A_t)$ and then use the result of Lemma 6. To do so, we start by computing the conditional probability that $u$ is selected in iteration $t$ given that it has not been selected so far. Note that the following holds whether $u \in A^*$ or not:
\[
\Pr(u \in A_t \mid u \notin A_{t-1}) = \Pr(u \in M_t,\ u_t = u \mid u \notin A_{t-1}) = \Pr(u \in M_t \mid u \notin A_{t-1}) \underbrace{\Pr(u_t = u \mid u \in M_t)}_{1/k} \le \frac{1}{k}.
\]
Hence:
\[
\Pr(u \notin A_t \mid u \notin A_{t-1}) \ge 1 - \frac{1}{k}.
\]
Now we compute the unconditional probability that $u \in U$ is inside $A_t$. However, it is easier to compute the probability of the complement event, $u \notin A_t$:
\[
\Pr(u \notin A_t) = \Pr(u \notin A_1, \ldots, u \notin A_t) = \Pr(u \notin A_1)\,\Pr(u \notin A_2 \mid u \notin A_1) \cdots \Pr(u \notin A_t \mid u \notin A_{t-1}) \ge \left(1 - \frac{1}{k}\right)^t,
\]
which gives us $\Pr(u \in A_t) \le 1 - (1 - 1/k)^t$. Now define the function $g_{A^*} : 2^U \to \mathbb{R}$ as $g_{A^*}(A) = f(A^* \cup A)$, $A \subseteq U$. Note that we have $g_{A^*}(A_t) = g_{A^*}(A_t \setminus A^*) = f(A_t \cup A^*)$. Since $f$ is a submodular set function, by Lemma 2, $g_{A^*}$ is also submodular, and we can use Lemma 6 with $M = 1 - (1 - 1/k)^t$ to write:
\[
E_{A_t}[g_{A^*}(A_t)] = E_{A_t}[f(A^* \cup A_t)] \ge \left(1 - \frac{1}{k}\right)^t g_{A^*}(\emptyset) = \left(1 - \frac{1}{k}\right)^t f(A^*).
\]

Now we state the main theorem, which gives a lower bound on the expected value of the objective set function at the sets generated by Algorithm 3:

Theorem 6. Let $f : 2^U \to \mathbb{R}$ be a nonnegative submodular set function and let $A_t$ and $U_t$ be the sets generated at the $t$-th iteration of Algorithm 3, $0 \le t \le k$. If $A^*$ is the optimal solution of Problem 2, then we have:
\[
E_{A_t}[f(A_t)] \ge \frac{t}{k}\left(1 - \frac{1}{k}\right)^{t-1} f(A^*), \quad 0 \le t \le k. \tag{23}
\]

Proof. The theorem is trivial for $t = 0$. Take $1 \le t \le k$ and fix $A_{t-1}$, and hence $M_t$. In this case $u_t$ is a discrete random variable distributed uniformly over its domain $M_t$. Also observe that $M_t$ is the set of $k$ elements of $U_{t-1}$ with the highest discrete derivatives, that is,
\[
\sum_{u \in M_t} \rho_f(A_{t-1}, u) \ge \sum_{u \in S} \rho_f(A_{t-1}, u), \quad \forall S \subseteq U_{t-1},\ |S| \le k.
\]
Thus, since $A^* \setminus A_{t-1} \subseteq U_{t-1}$ and $|A^* \setminus A_{t-1}| \le k$, one can write:
\[
E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)] = \frac{1}{k} \sum_{u_t \in M_t} \rho_f(A_{t-1}, u_t) \ge \frac{1}{k} \sum_{u_t \in A^* \setminus A_{t-1}} \rho_f(A_{t-1}, u_t). \tag{24}
\]
We know that $A^* \setminus A_{t-1} \subseteq U$ is finite; represent it by $A^* \setminus A_{t-1} = \{w_1, \ldots, w_r\}$, $r \ge 0$. Submodularity of $f$ implies that the following inequalities hold:
\[
\rho_f(A_{t-1}, w_1) \ge \rho_f(A_{t-1}, w_1) \tag{25a}
\]
\[
\rho_f(A_{t-1}, w_2) \ge \rho_f(A_{t-1} \cup \{w_1\}, w_2) \tag{25b}
\]
\[
\rho_f(A_{t-1}, w_3) \ge \rho_f(A_{t-1} \cup \{w_1, w_2\}, w_3) \tag{25c}
\]
\[
\vdots \tag{25d}
\]
\[
\rho_f(A_{t-1}, w_r) \ge \rho_f(A_{t-1} \cup \{w_1, \ldots, w_{r-1}\}, w_r). \tag{25e}
\]
Summing both sides of the inequalities in (25) and using the telescopic property of the right-hand-side sum yields:
\[
\sum_{u \in A^* \setminus A_{t-1}} \rho_f(A_{t-1}, u) \ge f(A_{t-1} \cup (A^* \setminus A_{t-1})) - f(A_{t-1}) = f(A^* \cup A_{t-1}) - f(A_{t-1}). \tag{26}
\]
We substitute the inequality obtained in (26) into (24) to get:
\[
E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)] \ge \frac{1}{k}\left[f(A^* \cup A_{t-1}) - f(A_{t-1})\right].
\]
So far we had the assumption that the set $A_{t-1}$ was fixed. Unfix it and take the expectation of both sides with respect to $A_{t-1}$. Then, noticing that $E_{A_{t-1}}[E_{u_t \mid A_{t-1}}[\rho_f(A_{t-1}, u_t)]] = E_{A_t}[\rho_f(A_{t-1}, u_t)]$, we get:
\[
E_{A_t}[\rho_f(A_{t-1}, u_t)] \ge \frac{1}{k}\left[E_{A_{t-1}}[f(A^* \cup A_{t-1})] - E_{A_{t-1}}[f(A_{t-1})]\right].
\]
Since $f$ is a nonnegative submodular set function, we can use Lemma 7 to write:
\[
E_{A_t}[\rho_f(A_{t-1}, u_t)] \ge \frac{1}{k}\left[\left(1 - \frac{1}{k}\right)^{t-1} f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right]. \tag{27}
\]
Having already mentioned that the claim holds for $t = 0$, we are now ready to prove the theorem by induction. Suppose (23) holds for $0 \le t - 1 < k$; let us prove the inequality for $t$:

First note that $f(A_t) = f(A_{t-1}) + \rho_f(A_{t-1}, u_t)$. Take the expectation of both sides with respect to $A_t$ and apply inequality (27) together with the inductive assumption $E_{A_{t-1}}[f(A_{t-1})] \ge \frac{t-1}{k}\left(1 - \frac{1}{k}\right)^{t-2} f(A^*)$ to get:
\[
E_{A_t}[f(A_t)] = E_{A_{t-1}}[f(A_{t-1})] + E_{A_t}[\rho_f(A_{t-1}, u_t)]
\]
\[
\ge E_{A_{t-1}}[f(A_{t-1})] + \frac{1}{k}\left[\left(1 - \frac{1}{k}\right)^{t-1} f(A^*) - E_{A_{t-1}}[f(A_{t-1})]\right]
\]
\[
= \left(1 - \frac{1}{k}\right) E_{A_{t-1}}[f(A_{t-1})] + \frac{1}{k}\left(1 - \frac{1}{k}\right)^{t-1} f(A^*)
\]
\[
\ge \left(1 - \frac{1}{k}\right) \frac{t-1}{k}\left(1 - \frac{1}{k}\right)^{t-2} f(A^*) + \frac{1}{k}\left(1 - \frac{1}{k}\right)^{t-1} f(A^*)
\]
\[
= \left[\frac{t-1}{k} + \frac{1}{k}\right]\left(1 - \frac{1}{k}\right)^{t-1} f(A^*) = \frac{t}{k}\left(1 - \frac{1}{k}\right)^{t-1} f(A^*).
\]

Corollary 2. Let $f : 2^U \to \mathbb{R}$ be a nonnegative submodular set function and $A$ be the output of Algorithm 3. If $A^*$ is the optimal solution of Problem 2, then we have:
\[
E_A[f(A)] \ge \frac{1}{e} f(A^*). \tag{28}
\]

Proof. Since $f$ is a nonnegative submodular set function, we can use Theorem 6 with $t = k$ to write:
\[
E_A[f(A_k)] \ge \left(1 - \frac{1}{k}\right)^{k-1} f(A^*) \ge \frac{1}{e} f(A^*).
\]

References

Bach, F. (2013). Learning with submodular functions: A convex optimization perspective.

Buchbinder, N., Feldman, M., Naor, J., and Schwartz, R. (2012). A tight linear time (1/2)-approximation for unconstrained submodular maximization. In 53rd Annual Symposium on Foundations of Computer Science, pages 649-658. IEEE.

Buchbinder, N., Feldman, M., Naor, J. S., and Schwartz, R. (2014). Submodular maximization with cardinality constraints. In Symposium on Discrete Algorithms. SIAM.

Feige, U., Mirrokni, V. S., and Vondrák, J. (2011). Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133-1153.

Jegelka, S. and Bilmes, J. (2011). Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1897-1904. IEEE.

Kim, G., Xing, E. P., Fei-Fei, L., and Kanade, T. (2011). Distributed cosegmentation via submodular optimization on anisotropic diffusion. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 169-176. IEEE.

Krause, A. and Golovin, D. (2012). Submodular function maximization. Tractability: Practical Approaches to Hard Problems, 3:19.

Krause, A., Singh, A., and Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9:235-284.

Lovász, L. (1983). Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235-257. Springer.

Nemhauser, G. L. and Wolsey, L. A. (1978). Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177-188.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. (1978). An analysis of approximations for maximizing submodular set functions-I. Mathematical Programming, 14(1):265-294.

Vondrák, J. (2013). Symmetry and approximability of submodular maximization problems. SIAM Journal on Computing, 42(1):265-304.