UNIVERSIDAD DE LA REPUBLICA ORIENTAL DEL URUGUAY
IMERL, FACULTAD DE INGENIERIA
LABORATORIO DE PROBABILIDAD Y ESTADISTICA

APPROXIMATION ALGORITHMS: RESOLUTION OF SELECTED PROBLEMS

Student: Pablo Romero
Professor: Dr. Maurice Queyranne
2009

Note: Problems 1, 2, 3, 4, 5, 6 and 12 are presented here, in order.

Interesting gap between P and NP-Hard

In this problem the objective is to design a minimum-cost subnet that connects each receiver node to at least one sender. Formally, consider a graph G = (V, E) with positive edge costs and two disjoint subsets of nodes, the senders and the receivers, denoted by S and R respectively. The design problem is to choose a minimum-cost subgraph G' = (V', E') such that each receiver ends up connected to at least one sender. We can consider two mutually exclusive cases:

i) S ∪ R = V;
ii) S ∪ R ⊊ V.

In order to capture easily the constraint that every receiver is connected to some sender, consider an auxiliary node n and the new graph G_1 = (V ∪ {n}, E ∪ {(n, s) : s ∈ S}). Fix the cost of all edges incident to the auxiliary node at zero. Given that the node n is connected to every sender, the condition that each receiver is connected to at least one sender is equivalent to the reachability of every receiver from the auxiliary node n. The communication scenario is then successful whenever we find a subgraph G_2 ⊆ G_1 in which the auxiliary node n reaches each receiver, at minimum cost. The edges between the senders and the node n are free, so each feasible subgraph G_2 ⊆ G_1 can be extended with zero-cost edges into a subgraph of G_1 that still reaches all receivers. In this way the communication design problem translates into the search for the minimum-cost subgraph of G_1 that reaches all receivers and senders. Let us study the two proposed cases in order.

i) If all non-auxiliary nodes of G_1 are either receivers or senders, the problem consists of finding a minimum-cost connected spanning subgraph, because it must visit all nodes. Moreover, the cheapest design has no cycles, because no edge redundancy of positive cost is permitted (observe that removing an edge of a cycle does not increase the cost, and the graph remains connected). Hence the solution of the problem is a Minimum Spanning Tree of G_1. This famous problem has at least two well-known polynomial algorithms: one is Prim's Algorithm, and the greedy alternative is Kruskal's. Let us choose Kruskal's Algorithm and prove that it finishes in polynomial time (for brevity, we assume that it returns a minimum-cost tree, which is a classical result).

Kruskal's Algorithm adds the cheapest edge at each step without creating cycles. A linear scan of the edges finds the cheapest one. We can add this edge only if it does not generate a cycle, which must be checked. Observe that at each step the current graph is a forest, so storing either the adjacency lists or the adjacency matrix, it is possible to check in linear time whether a path already exists between the endpoints of the intended new edge (recursively visiting the neighbors of each neighbor touches no more than all the vertices of the graph, possibly including the other endpoint of the intended edge). If we let m = |V ∪ {n}|, then Kruskal's Algorithm finishes in m - 1 steps, because every tree has one edge less than its number of vertices. The number of edges is bounded by m^2, so the number of operations in Kruskal's Algorithm is bounded by m^2 · m · (m - 1) < m^4. This finishes the proof that Kruskal's Algorithm is polynomial.

The last step of the algorithm is the removal of the auxiliary edges. Optimality is preserved since the tree has minimum cost and the connection with the auxiliary node necessarily passes through a sender vertex. Algorithm 1 finds the minimum-cost tree that connects all receivers with at least one sender; a short Python sketch follows the listing.

Algorithm 1 Returns the minimum-cost connection between a set of receivers R and senders S = V \ R.
Inputs:
  Set of vertices: V
  Set of senders: S
  Set of edges: E
  Cost of edges: C_e
Output: Edges of a minimum-cost connection: E_out
 1: E_out := {}
 2: R := V \ S
 3: V := V ∪ {n}
 4: E := E ∪ {(s, n) : s ∈ S}
 5: for all s ∈ S do
 6:   C_{(s,n)} := 0
 7: end for
 8: while |E_out| < |V| - 1 do
 9:   e := FindCheapestNoCycle(V, E, E_out, C_e)
10:   E_out := E_out ∪ {e}
11:   E := E \ {e}
12: end while
13: E_out := E_out \ {(s, n) : s ∈ S}
14: return E_out
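
To make the construction concrete, the following minimal Python sketch mirrors Algorithm 1: it adds the auxiliary node with zero-cost edges to the senders, runs Kruskal's Algorithm, and strips the auxiliary edges at the end. The function name connect_receivers is illustrative, and the union-find cycle check is an implementation choice that replaces the traversal-based check described above.

    def connect_receivers(vertices, senders, edges):
        # edges: list of (cost, u, v) with positive costs; senders is a subset of vertices.
        AUX = object()                      # the auxiliary node n, distinct from every vertex
        all_edges = [(0, AUX, s) for s in senders] + list(edges)
        parent = {v: v for v in vertices}
        parent[AUX] = AUX

        def find(x):                        # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        chosen = []
        for cost, u, v in sorted(all_edges, key=lambda e: e[0]):   # cheapest edge first
            ru, rv = find(u), find(v)
            if ru != rv:                    # the edge does not close a cycle
                parent[ru] = rv
                chosen.append((cost, u, v))
        # Line 13 of Algorithm 1: remove the zero-cost auxiliary edges.
        return [(c, u, v) for (c, u, v) in chosen if u is not AUX and v is not AUX]

    # Example: senders {1, 2}, receivers {3, 4}.
    print(connect_receivers({1, 2, 3, 4}, {1, 2},
                            [(1, 1, 3), (5, 3, 2), (2, 2, 4), (9, 1, 4)]))
    # -> [(1, 1, 3), (2, 2, 4)]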

ii) Now, the situation is different if not all nodes are either senders or receivers. Reasoning in the same way as before, the problem is in this case equivalent to solving the Minimum Steiner Tree problem, because the vertices of V \ (R ∪ {n}) do not need to be visited, and so are Steiner vertices. The Minimum Steiner Tree problem is NP-Hard.

Algorithm 2 returns a minimum spanning tree that contains all non-Steiner nodes. The first block consists of lines 1 to 6. It adds the auxiliary node to the vertex set, connected to each sender by an edge of cost 0. The condition of connectivity from each receiver to at least one sender can be stated again as the connection from each receiver to the auxiliary node. This is expressed in line 7, by defining the required vertices as R ∪ {n}. The block of lines 8 to 10 finds the minimum distance between each pair of required vertices by applying Dijkstra's Algorithm, which is possible since all edges have non-negative costs. The minimum paths P_{(i,j)} between each pair of distinct required vertices i, j are saved, so as to easily find the minimum tree that connects the non-Steiner nodes. In the last while block, the cheapest path is added at each step, until the final tree T is returned in line 17. A Python sketch of this procedure is given after the listing.

Algorithm 2 Returns the minimum spanning tree of a set of required vertices.
Inputs:
  Set of vertices: V
  Set of senders: S
  Set of receivers: R
  Set of edges: E
  Cost of edges: C_e
Output: Minimum spanning tree for the required vertices: T
 1: E_out := {}
 2: V := V ∪ {n}
 3: E := E ∪ {(s, n) : s ∈ S}
 4: for all s ∈ S do
 5:   C_{(s,n)} := 0
 6: end for
 7: Required := R ∪ {n}
 8: for all {i, j} ⊆ Required, i ≠ j do
 9:   [D_{(i,j)}, P_{(i,j)}] := ApplyDijkstra(V, i, j, C_e)
10: end for
11: T := {}
12: while T does not connect all of Required do
13:   [Path, i, j] := FindMinimumDistance(D)
14:   T := T ∪ Path
15:   D_{(i,j)} := ∞
16: end while
17: return T
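
The following is a minimal Python sketch of Algorithm 2, assuming the graph is connected. Dijkstra's Algorithm is run from every required vertex, after which the cheapest required pairs are joined through their shortest paths, in the spirit of the while block above; the helper names are illustrative.

    import heapq
    from itertools import combinations

    def dijkstra(adj, source):
        # Shortest distances and predecessors from source; costs are non-negative.
        dist, pred = {source: 0}, {source: None}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, c in adj[u]:
                if d + c < dist.get(v, float('inf')):
                    dist[v], pred[v] = d + c, u
                    heapq.heappush(heap, (d + c, v))
        return dist, pred

    def steiner_2approx(vertices, edges, required):
        adj = {v: [] for v in vertices}
        for u, v, c in edges:
            adj[u].append((v, c))
            adj[v].append((u, c))
        info = {r: dijkstra(adj, r) for r in required}   # lines 8-10 of Algorithm 2

        parent = {r: r for r in required}                # union-find over required vertices
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree = set()
        # Cheapest required pair first, as in the while block of Algorithm 2.
        for a, b in sorted(combinations(required, 2), key=lambda p: info[p[0]][0][p[1]]):
            if find(a) != find(b):
                parent[find(a)] = find(b)
                node = b                                  # expand the pair into real edges
                while node != a:
                    tree.add(frozenset((info[a][1][node], node)))
                    node = info[a][1][node]
        return tree

    # Example: 'n' plays the role of the auxiliary node; 'x' is a Steiner vertex.
    E = [('n', 's', 0), ('s', 'x', 1), ('x', 'r1', 1), ('x', 'r2', 1), ('r1', 'r2', 5)]
    print(steiner_2approx({'n', 's', 'x', 'r1', 'r2'}, E, {'n', 'r1', 'r2'}))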

Proposition 1. Algorithm 2 is a factor-2 approximation for the Minimum Steiner Tree problem in the graph G_1 = (V ∪ {n}, E ∪ {(n, s) : s ∈ S}).

Proof. Let T* be the Minimum Steiner Tree, with optimal cost OPT. If we duplicate its edges we obtain a multigraph T'. By Euler's Theorem, T' has an Eulerian circuit (because all of its nodes have even degree). Naturally, the cost of the Eulerian circuit is 2·OPT. Now construct a path P through the required vertices by skipping Steiner nodes and taking shortcuts (minimum paths), visiting the required vertices in the order of the Eulerian circuit. Clearly P is no more expensive than the Eulerian circuit, because we only take shortcuts (the triangle inequality holds in the equivalent complete graph whose distances are the shortest-path distances). This ensures that the cost of P is at most 2·OPT. Moreover, P is a spanning tree of the required vertices in that complete graph, so it is at least as expensive as the minimum spanning tree T of the required vertices returned by Algorithm 2. Hence the cost of T is not bigger than 2·OPT, which finishes the proof.

Finally, the factor-2 approximate solution of the original problem is obtained by removing the artificial node n from the tree T of Algorithm 2. This does not affect the guarantee, since all the removed edges have zero cost.

It is interesting to appreciate the meaning of these results. The most striking one is that when all nodes are either senders or receivers, the communication design can be done optimally (at minimum cost) in polynomial time. However, if there is some Steiner node (neither sender nor receiver), the general problem is NP-Hard, and there is no polynomial-time algorithm that solves it unless P = NP.

The Cheapest Forest is a SSC Problem in P

Consider a graph G = (V, E) with positive edge costs. For every edge subset S ⊆ E let f(S) denote the maximum cardinality of a subset F ⊆ S such that the subgraph (V, F) does not contain a cycle. In this problem we will show that f is a nondecreasing submodular set function (moreover, it is the rank function of the cycle matroid, as shown at the end of this section). The Submodular Set Cover (SSC) problem associated with f then reduces to a well-known combinatorial optimization problem.

Note that f is well defined: a maximum-cardinality acyclic subgraph is always a forest, maximal under inclusion. That f is nondecreasing is evident from its definition, because if we consider A ⊆ B ⊆ E, each acyclic subgraph of (V, A) is also an acyclic subgraph of (V, B). So the cardinality of the biggest forest in (V, B) is never smaller than that of the biggest forest in (V, A), and f(A) ≤ f(B).

If S ⊆ E and j ∉ S, then f_S(j) = f(S ∪ {j}) - f(S) is the marginal value of adding j to S. The next lemma will be useful to prove the submodularity of f.

Lemma 2. f_S(j) = K(S) - K(S ∪ {j}) for all S ⊆ E and j ∈ E \ S,    (1)

where K(S) denotes the number of connected components of the graph (V, S).

Proof. Acyclic graphs are forests by definition, so f(S) is the cardinality of a maximum forest inside S. All forests are planar and satisfy the generalized Euler formula: if G = (V, E) is a planar graph with K connected components whose planar embedding delimits R regions (including the infinite one), then |V| - |E| + R = K + 1. In particular, for forests (R = 1) it holds that |E| = |V| - K.

Now, given any set S ⊆ E, if we denote by F a maximum-cardinality forest in S, we can apply the last result to the forest (V, F). We get that

f(S) = max_{F ⊆ S} |F| = max_{F ⊆ S} (|V| - K(F)) = |V| - min_{F ⊆ S} K(F).    (2)

The addition of edges never increases the number of connected components, so K(F) ≥ K(S) for all F ⊆ S. Moreover, inside each component of (V, S) we can find a spanning tree (in polynomial time), and the union of these trees achieves the same number of components as S. Joining these observations we get min_{F ⊆ S} K(F) = K(S). Substituting into eq. (2) we obtain an expression for f(S) in terms of the number of connected components of S, precisely:

f(S) = |V| - K(S).    (3)

Take any edge j ∈ E \ S. By eq. (3) and the definition of the marginal value,

f_S(j) = f(S ∪ {j}) - f(S) = |V| - K(S ∪ {j}) - (|V| - K(S)) = K(S) - K(S ∪ {j}).    (4)
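
The identity (3) makes f very easy to evaluate. The following minimal Python sketch (the helper names are mine) computes f(S) = |V| - K(S) with a union-find count of connected components, and illustrates the marginal values of Lemma 2.

    def num_components(vertices, edge_subset):
        # K(S): number of connected components of (V, S), via union-find.
        parent = {v: v for v in vertices}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        comps = len(vertices)
        for u, v in edge_subset:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1               # each merging edge removes one component
        return comps

    def f(vertices, edge_subset):
        # Maximum size of an acyclic subset of edge_subset, by eq. (3).
        return len(vertices) - num_components(vertices, edge_subset)

    V = {1, 2, 3, 4}
    S = [(1, 2), (2, 3)]
    print(f(V, S))                        # 2
    print(f(V, S + [(1, 3)]) - f(V, S))   # 0: the edge (1,3) closes a cycle
    print(f(V, S + [(3, 4)]) - f(V, S))   # 1: the edge (3,4) merges two components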

Important consequences follow from Lemma 2. First note that by eq. (3) we have f(∅) = 0, because the empty graph on V has |V| connected components; recall also that f is nondecreasing. Lemma 2 gives an explicit expression for this function and its marginal values in terms of the connected components of the argument.

Proposition 3. f(S) = max_{F ⊆ S} {|F| : (V, F) acyclic} is a submodular function on E.

Proof. If A ⊆ B, each component of A is included in a component of B. Consider any edge e ∈ E \ B that joins two different components B_1 and B_2 of B. Suppose by contradiction that e does not connect different components of A. Then there exists one component A_1 of A such that e has both endpoints in A_1. But A_1 lies inside some component B_3 of B. This is impossible, since e connects different components of B. This reasoning implies that

K(A ∪ {e}) - K(A) ≤ K(B ∪ {e}) - K(B), for all A ⊆ B ⊆ E \ {e}.

By Lemma 2 this is equivalent to saying that the marginal values of f are nonincreasing under inclusion, that is, the submodularity of f.

If S ⊆ E is such that f(S) = f(E), the graph (V, S) has the same number of connected components as G. Equivalently, (V, S) contains a spanning forest of G (at least one spanning tree for each component of (V, E)). In our particular problem the edges have non-negative weights, and the SSC is

min_{S ⊆ E} {c(S) : f(S) = f(E)},

where c(S) is the sum of the costs of the edges of S. This is a known problem: it amounts to finding the Minimum Spanning Forest of G. Observe that this problem is in P, and the greedy algorithm for this problem is Kruskal's Algorithm, which solves it exactly. This could be anticipated from the approximability theorem for the Greedy Algorithm on the SSC, whose approximation factor is H(max_{j ∈ E} f({j})) = H(1) = 1 for this SSC instance.

Nature of the function f

As a complementary result, let us study the nature of the function f under the structure of forests, which provide a natural notion of independence in graphs.

Proposition 4. f is the rank function of a matroid on G.

Proof. There is a natural notion of independence in graphs that permits defining matroids: the property of not having cycles. It is clear that the empty graph (V, ∅) has no cycles, and every subgraph of a forest (a graph with no cycles) is a forest. Moreover, if a graph has a cycle, the addition of one edge keeps that cycle, so it remains dependent. This means that the forests inside G = (V, E) form the independent sets of a matroid on G. The rank property of f is a direct consequence of Proposition 3, Lemma 2 and the fact that f(∅) = 0. This completes the proof that f is the rank function of the cycle matroid.
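
As a sanity check of Proposition 3, the following brute-force test verifies the submodular inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) over all pairs of edge subsets of a small graph. It is a minimal sketch that assumes the function f(vertices, edge_subset) from the previous code fragment is in scope.

    from itertools import combinations

    V = {1, 2, 3, 4}
    E = [(1, 2), (2, 3), (1, 3), (3, 4)]

    def subsets(items):
        for k in range(len(items) + 1):
            yield from combinations(items, k)

    # Check f(A) + f(B) >= f(A | B) + f(A & B) for every pair of edge subsets.
    ok = all(
        f(V, set(A) | set(B)) + f(V, set(A) & set(B)) <= f(V, A) + f(V, B)
        for A in subsets(E) for B in subsets(E)
    )
    print(ok)  # True: f is submodular on this instance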

Useful Redundancy in the RRA for the WSC

In this problem we will show that the Randomized Rounding Algorithm (RRA) for the Weighted Set Cover (WSC) has a poor performance guarantee when we pick only one random collection. Recall that in the Weighted Set Cover we have a ground set N = {1, ..., n} and a collection of subsets S_i ⊆ N, i = 1, ..., m, with respective weights w_i. The objective is to pick a subcollection that covers N at minimum cost. This problem is NP-Hard, and the approach considered here is a randomization of the exact solution of the real-valued LP relaxation of the WSC.

Denote by x_i^{LP} the real number which represents, in the LP relaxation, the decision variable of using S_i or not. Clearly it does not directly give a solution of the original problem, because we must decide whether S_i is part of the cover or not. However, 0 ≤ x_i^{LP} ≤ 1, so an interpretation of the outputs of the LP relaxation is necessary in order to choose a collection of subsets. An intuitive procedure is the Randomized Rounding algorithm: pick S_i with probability x_i^{LP}, independently for each i.

Another important observation is that a direct application of this procedure does not guarantee the feasibility of the solution (that is, covering all the elements of N). On the other hand, we can apply the randomization more than once and take the union of the resulting collections. Intuitively, in this way the probability of achieving a feasible solution increases, but the cost of the solution grows as well. In order to illustrate this tradeoff between the redundancy of the solution and its final cost, we are interested in the probability of picking a collection of cost not higher than 4·OPT that at the same time covers at least half of the elements of N, where OPT is the minimum cost among all feasible covers.

Denote by f_j the number of occurrences of the element j ∈ N in the sets S_i, i = 1, ..., m. Evidently 1 ≤ f_j ≤ m, the extreme cases being when only one set contains j and when all sets contain j. Then, if R is the collection obtained via the Randomized Rounding algorithm (with one iteration) and j is an arbitrary element of the ground set contained exactly in the sets S_{a_1}, ..., S_{a_{f_j}}, we have that

P(j is not covered by R) = ∏_{i=1}^{f_j} P(S_{a_i} ∉ R) = ∏_{i=1}^{f_j} (1 - x_{a_i}^{LP}).    (5)

Given that in the WSC all the elements of the ground set must be covered, the LP constraint Σ_{i=1}^{f_j} x_{a_i}^{LP} ≥ 1 holds. It is easy to show that under this constraint the probability of not covering j is maximized when x_{a_i} = 1/f_j, i = 1, ..., f_j. (Indeed, if f_j = 1 the claim is trivial. Otherwise, consider the Lagrangian with logarithmic objective, w.l.o.g. skipping zero probabilities: φ(λ, x) = Σ_{i=1}^{f_j} ln(1 - x_{a_i}) + λ(Σ_{i=1}^{f_j} x_{a_i} - 1). Then ∂φ/∂x_{a_r} = -1/(1 - x_{a_r}) + λ, r = 1, ..., f_j. Taking λ = f_j/(f_j - 1) > 0, the gradient vanishes, and all the complementarity conditions, together with the non-negativity of the multiplier, hold for the application of the fundamental Lagrangian relaxation theorem. The maximum is therefore attained at x_{a_i} = 1/f_j, i = 1, ..., f_j.) Then:

P(j is not covered by R) = ∏_{i=1}^{f_j} (1 - x_{a_i}^{LP}) ≤ (1 - 1/f_j)^{f_j} < 1/e,    (6)

where the last inequality holds because (1 - 1/f_j)^{f_j} is monotonically increasing in f_j and tends to 1/e. Since the element j ∈ N is arbitrary, 1/e is an upper bound on the probability of not being covered by the random collection R, for every element of the ground set.

Now consider the random variable X that counts the number of elements of the ground set covered by R. By inequality (6) we know that

P(X < |N|/2) ≤ P(Bin(|N|, 1 - 1/e) < |N|/2) = P(Bin(|N|, 1/e) > |N|/2) ≤ (|N|/e) / (|N|/2) = 2/e,    (7)

where Bin(n, p) denotes a binomial random variable with n samples and success probability p, and the last step uses Markov's inequality (applicable because binomial random variables are non-negative) together with the fact that the expected value of Bin(n, p) is np. This means that the probability of not covering at least half of the elements of N is upper-bounded by 2/e.

On the other hand, let us focus on the cost of the random collection. If we call C_R the random variable that measures the cost of the random collection R, its expected value is

E(C_R) = Σ_{i=1}^{m} w_i P(S_i ∈ R) = Σ_{i=1}^{m} w_i x_i^{LP} ≤ OPT,    (8)

where the last inequality holds by the definition of a relaxation (the minimum cost over the reals is never bigger than the minimum cost restricted to 0-1 values). Applying Markov's inequality we get that

P(C_R ≥ 4·OPT) ≤ P(C_R ≥ 4·E(C_R)) ≤ 1/4.    (9)

Using inequalities (7) and (9), the probability that the random collection either has cost at least 4·OPT or does not cover half of the elements of N is

P((C_R ≥ 4·OPT) ∪ (X < |N|/2)) ≤ P(C_R ≥ 4·OPT) + P(X < |N|/2) ≤ 2/e + 1/4.    (10)

This statement is equivalent to saying that the probability of picking a collection with cost at most 4·OPT that at the same time covers at least half of the elements of the ground set is not lower than β = 1 - (1/4 + 2/e) = 3/4 - 2/e. Notice that this guarantee is poor: not only can the cost of the random collection be far from the optimal (a factor of 4), but we are also only asking to cover half of the elements, and the confidence of occurrence β is lower than 1.5 percent. This shows the importance of picking more collections, possibly sacrificing cost but increasing the probability of obtaining a feasible cover. Other alternatives are to form a combination of many sampled collections, or to complete the solution with the Greedy heuristic or others. The nature of the problem (size of the ground set, priority given to the cost or to covering all members, computational effort, etc.) will determine which heuristic is more appropriate.
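
To see the guarantee in action, here is a minimal Monte Carlo sketch of one-shot randomized rounding. The instance and the fractional solution x_lp are hypothetical stand-ins (in practice x_lp would come from an LP solver), and the cost is compared against 4·lp_cost ≤ 4·OPT, which is conservative; on such an easy instance the empirical frequency lands far above the worst-case bound β.

    import random

    sets    = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
    weights = [1.0, 1.0, 1.0, 1.0]
    x_lp    = [0.5, 0.5, 0.5, 0.5]        # feasible fractional cover of N = {1, 2, 3, 4}
    N       = {1, 2, 3, 4}
    lp_cost = sum(w * x for w, x in zip(weights, x_lp))   # a lower bound on OPT

    def one_round():
        # Pick S_i independently with probability x_i^{LP}.
        picked = [i for i, x in enumerate(x_lp) if random.random() < x]
        covered = set().union(*(sets[i] for i in picked)) if picked else set()
        return sum(weights[i] for i in picked), covered

    trials, good = 100_000, 0
    for _ in range(trials):
        cost, covered = one_round()
        if cost <= 4 * lp_cost and len(covered) >= len(N) / 2:
            good += 1
    print(good / trials)   # empirical analogue of the worst-case bound beta = 3/4 - 2/e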

It is easy to achieve OPT/2 in MAXSAT

Consider n boolean variables {x_1, ..., x_n}, a random instantiation X of their values, and the inverse instantiation X' (every variable negated). Let φ be any proposition in conjunctive normal form over those n boolean variables:

φ = ∧_{i=1}^{m} C_i, with C_i = ∨_{j=1}^{j_i} φ_{ij},    (11)

where each φ_{ij} is a literal (x_h or ¬x_h, for some h ∈ {1, ..., n}). Each clause C_i has a non-negative weight w_i. The MAXSAT problem consists of finding an assignment to the atoms {x_1, ..., x_n} that maximizes the sum of the weights of the satisfied clauses of φ. We will show here that the best between X and X' achieves at least half of the optimum.

Lemma 5. Each clause C_i = ∨_{j=1}^{j_i} φ_{ij}, i ∈ {1, ..., m}, is satisfied by at least one of X and X'.

Proof. Denote by v_X(φ) the truth function of an arbitrary proposition φ under the value assignment X, which returns 1 if φ is true and 0 otherwise. Then:

v_X(C_i) = v_X(∨_{j=1}^{j_i} φ_{ij}) = max {v_X(φ_{i1}), v_X(φ_{i2}), ..., v_X(φ_{ij_i})}.    (12)

So v_X(C_i) = 0 if and only if v_X(φ_{ij}) = 0 for all j ∈ {1, ..., j_i}. In this case we know, by the law of the excluded middle, that v_{X'}(φ_{ij}) = 1 for all j ∈ {1, ..., j_i}, and consequently, by the definition of the truth of a disjunction, v_{X'}(C_i) = 1. Then an arbitrary clause of disjunctions is satisfied by at least one of X and X'.

We are now in a position to prove that the algorithm returning the best between X and X' achieves at least half of the maximum satisfiability. More formally, if we denote by OPT the maximum satisfiability and by W the function that assigns to an n-boolean instantiation the sum of the weights it satisfies:

Proposition 6. max {W(X), W(X')} ≥ OPT/2.

Proof. Denote by C_1, C_2, ..., C_s the s clauses of the proposition φ that are satisfied by the random instantiation X (relabeling if necessary). If W(X) = Σ_{i=1}^{s} w(C_i) ≥ OPT/2, the result follows. In the other case, by Lemma 5 we know that all the clauses not satisfied by X are satisfied by X'. Recall that OPT ≤ Σ_{i=1}^{m} w(C_i), because in the best case the optimum assignment satisfies all the clauses at the same time. Then:

W(X') ≥ Σ_{i : C_i not satisfied by X} w(C_i) = Σ_{i=1}^{m} w(C_i) - Σ_{i=1}^{s} w(C_i) ≥ OPT - Σ_{i=1}^{s} w(C_i) ≥ OPT/2,    (13)

which finishes the proof.
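
A minimal Python sketch of Proposition 6 follows. The clause encoding is an assumption of this sketch: a literal is a signed integer, +h standing for x_h and -h for its negation. By Lemma 5, the returned assignment satisfies clauses of total weight at least half of the overall weight, hence at least OPT/2.

    import random

    def weight(assignment, clauses):
        # Sum of the weights of the satisfied clauses; assignment maps variable -> bool.
        total = 0.0
        for literals, w in clauses:
            if any(assignment[abs(l)] == (l > 0) for l in literals):
                total += w
        return total

    def best_of_pair(num_vars, clauses):
        X = {h: random.random() < 0.5 for h in range(1, num_vars + 1)}
        Xc = {h: not b for h, b in X.items()}      # the inverse instantiation X'
        return max((X, Xc), key=lambda a: weight(a, clauses))

    # Example with 3 variables; the total clause weight is 6.0, so the returned
    # assignment satisfies weight at least 3.0 >= OPT/2.
    clauses = [([1, -2], 2.0), ([2, 3], 1.0), ([-1], 3.0)]
    best = best_of_pair(3, clauses)
    print(best, weight(best, clauses))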

Factor vs. Velocity in the MLS for MAX-CUT

In this problem we will appreciate the tradeoff between the approximation factor and the velocity of convergence of the Modified Local Search (MLS) for the MAX-CUT problem. Given a graph G = (V, E) with non-negative edge weights w_e, the MAX-CUT problem consists of finding a subset of vertices S ⊆ V such that the sum of the weights of all edges that connect S with S^C is maximum. This problem is NP-Hard, and there exist different approaches for defining approximation algorithms for it. Some algorithms exploit the structure of a quadratic programming relaxation of its mathematical programming form, achieving good factors. In this case we will focus on the Modified Local Search: start with a feasible cut, and repeatedly replace it by a neighboring cut whenever we can guarantee that the total weight of the new cut is a factor (1 + ε) better than the previous one. Let us study the termination of this metaheuristic and the impact of the parameter ε on the solution. Precisely, we would like to guarantee an approximation factor for any given ε.

Recall that Local Search and its variants frequently depend strongly on the neighborhood structure. In this case we consider as neighbors of a cut those generated from it by the movement of a single vertex:

N(S) = {S ∪ {y} : y ∉ S} ∪ {S \ {x} : x ∈ S}.

In step i we substitute S_i by S_{i+1} ∈ N(S_i) only if f(S_{i+1}) > (1 + ε) f(S_i). The algorithm stops when we cannot improve in this sense: whether we add a vertex to the final cut or remove one, the new cut cannot improve the weight of the cut by a factor higher than 1 + ε. If we call S the output cut of the MLS algorithm, we know that the next conditions are met:

f(S \ {x}) = Σ_{u ∈ S \ {x}} Σ_{v ∈ S^C ∪ {x}} w_{uv} ≤ (1 + ε) f(S), for all x ∈ S;    (14)

f(S ∪ {y}) = Σ_{u ∈ S ∪ {y}} Σ_{v ∈ S^C \ {y}} w_{uv} ≤ (1 + ε) f(S), for all y ∈ S^C.    (15)

Condition (14) means that the removal of any vertex of S does not improve the cut by more than a factor 1 + ε, and condition (15) is analogous for the addition of a vertex. It is easy to relate the weight of a neighboring cut to the current weight f(S): observe that when we move one vertex x from S to S^C, we lose the sum-weight of all edges (x, v), v ∈ S^C, and win the edges (x, u), u ∈ S \ {x}, that enter the cut (we always refer to edges of E). Consequently, it is possible to express the weight of the neighbors of S in the following way:

f(S \ {x}) = f(S) + Σ_{u ∈ S \ {x}} w_{xu} - Σ_{v ∈ S^C} w_{xv}, for all x ∈ S;    (16)

f(S ∪ {y}) = f(S) + Σ_{v ∈ S^C \ {y}} w_{yv} - Σ_{u ∈ S} w_{uy}, for all y ∈ S^C.    (17)

Substituting (16) and (17) into (14) and (15) respectively, and canceling f(S) on both sides, we get:

Σ_{u ∈ S \ {x}} w_{xu} - Σ_{v ∈ S^C} w_{xv} ≤ ε f(S), for all x ∈ S;    (18)

Σ_{v ∈ S^C \ {y}} w_{yv} - Σ_{u ∈ S} w_{uy} ≤ ε f(S), for all y ∈ S^C.    (19)

Observe that if we now sum the inequalities (18) over all vertices x ∈ S, the term Σ_{x ∈ S} Σ_{v ∈ S^C} w_{xv} equals f(S), while the sum Σ_{x ∈ S} Σ_{u ∈ S \ {x}} w_{xu} counts exactly twice each edge inside S. Analogously summing (19) over all y ∈ S^C, we get the two expressions:

Σ_{e ∈ E(S : S)} w_e ≤ f(S) (1 + ε|S|) / 2;    (20)

Σ_{e ∈ E(S^C : S^C)} w_e ≤ f(S) (1 + ε|S^C|) / 2.    (21)

Now, if we define W as the total sum-weight of all edges of E, we know that W = f(S) + Σ_{e ∈ E(S : S)} w_e + Σ_{e ∈ E(S^C : S^C)} w_e, because all edges are either inside S, inside S^C, or connect S with S^C (contributing to f(S)). Observing that every cut has weight at most W, and summing the expressions (20) and (21), we finally obtain a lower bound for f(S):

f(S) (2 + ε|V|) / 2 ≥ W - f(S), and therefore f(S) ≥ 2W / (4 + ε|V|) ≥ 2·OPT / (4 + ε|V|),    (22)

where OPT is the weight of an optimum cut of the weighted graph G. Expression (22) shows that the approximation factor is (4 + ε|V|) / 2 = 2 + ε|V|/2. The traditional local search is defined by ε = 0 (another concept of local search always takes the best neighbor), a case which achieves a factor 2. Observe that the factor obtained via the MLS can never be better than 2, and it increases with ε. However, as ε increases, so does the velocity of convergence, basically because the improvement steps are larger: the number of iterations of the MLS cannot be bigger than ln(OPT / f(S_0)) / ln(1 + ε), where S_0 is the initial cut. Finally, if we wish an approximation factor 2 + δ, then ε must be no bigger than ε_max = 2δ / |V|.

In conclusion, there is a tradeoff between the guarantee of the approximation factor and the velocity of convergence to the output cut in MAX-CUT. Intuitively, the parameter ε of the MLS plays at the same time the role of a measure of the improvement in each step and of the velocity of convergence. Practically, if the combinatorial problem has a rich domain and solutions with high quality variance, ε should not be extremely high, because the quality of the solution could be poor. On the opposite side, if ε is too small, the algorithm tends to a plain local search, which might take considerable computational effort to produce an output. The tradeoff for this particular application of the Modified Local Search to MAX-CUT is summarized in the next two propositions (the first was proved above and the second is direct from the definitions):

for all δ > 0: if ε ≤ 2δ / |V|, then the MLS achieves factor (2 + δ);    (23)

for all ε > 0: k_max ≤ ln(OPT / f(S_0)) / ln(1 + ε),    (24)

where k_max is the maximum number of iterations.
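
The following minimal Python sketch implements the MLS (the graph encoding and the helper names are assumptions of the sketch, not fixed by the text). A single-vertex move is accepted only when it improves the current cut by a factor greater than 1 + ε, so the number of improving steps is at most ln(OPT / f(S_0)) / ln(1 + ε), in agreement with (24).

    def cut_weight(S, edges):
        # f(S): total weight of the edges crossing the cut (S, S^C).
        return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

    def mls_max_cut(vertices, edges, S0, eps):
        S = set(S0)
        improved = True
        while improved:
            improved = False
            current = cut_weight(S, edges)
            for v in vertices:                       # the single-vertex neighborhood N(S)
                T = S - {v} if v in S else S | {v}
                if cut_weight(T, edges) > (1 + eps) * current:
                    S, improved = T, True            # accept only (1 + eps)-improvements
                    break
        return S

    # Example: a 4-cycle with a heavy chord, eps = 0.1.
    E = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0, (1, 3): 2.0}
    S = mls_max_cut({1, 2, 3, 4}, E, {1}, 0.1)
    print(S, cut_weight(S, E))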

SFMAX Complexity and a LS Approach

In this problem we will prove that the SFMAX (Submodular Function Maximization) problem is NP-Hard. We will then investigate some consequences of knowing a local maximum for this problem.

Consider a finite ground set V and a submodular set function f : 2^V → R. The SFMAX problem consists of finding the subset X = arg max_{S ⊆ V} f(S). If f is also nondecreasing the problem is trivial, since then X = V. W.l.o.g. we will suppose that the empty set ∅ satisfies f(∅) = 0.

In order to prove the hardness of the SFMAX problem we will use the fact that the MAX-CUT problem is NP-Hard. Recall the MAX-CUT problem: given a graph G = (V, E) whose edges have non-negative weights, find a set of vertices S such that the sum-weight of the edges that connect S with V \ S is maximum:

X = arg max_{S ⊆ V} f(S), where f(S) = Σ_{s ∈ S} Σ_{t ∈ S^C} w(s, t).

We will use the next lemma.

Lemma 7. The objective function f of the MAX-CUT problem is submodular.

Proof. We have to prove that f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for every pair of vertex sets S and T, where f(S) = Σ_{s ∈ S} Σ_{t ∈ S^C} w(s, t). Denote by W(A, B) the sum-weight of the edges that link nodes of A with nodes of B. Given any pair of subsets S and T, consider the partition of the ground set V = (T \ S) ∪ (S \ T) ∪ (S ∩ T) ∪ (S ∪ T)^C. By the definition of f and additivity, the next relations hold:

f(S) = W(S ∩ T, T \ S) + W(S ∩ T, (S ∪ T)^C) + W(S \ T, T \ S) + W(S \ T, (S ∪ T)^C)    (25)

f(T) = W(T \ S, S \ T) + W(T \ S, (S ∪ T)^C) + W(S ∩ T, S \ T) + W(S ∩ T, (S ∪ T)^C)    (26)

f(S ∪ T) = W(S ∩ T, (S ∪ T)^C) + W(S \ T, (S ∪ T)^C) + W(T \ S, (S ∪ T)^C)    (27)

f(S ∩ T) = W(S ∩ T, T \ S) + W(S ∩ T, S \ T) + W(S ∩ T, (S ∪ T)^C).    (28)

Observe that f(S) + f(T) has eight terms: the six corresponding to f(S ∪ T) + f(S ∩ T) and two more. More explicitly,

f(S) + f(T) = f(S ∪ T) + f(S ∩ T) + W(S \ T, T \ S) + W(T \ S, S \ T) ≥ f(S ∪ T) + f(S ∩ T),    (29)

where the last inequality holds because the weights are non-negative.

For the cut defined by the empty set of vertices we have f(∅) = 0 in the MAX-CUT problem, compatible with the instances of SFMAX. Observe that if we can solve the general SFMAX problem, then in particular we solve the MAX-CUT problem, which is NP-Hard: by Lemma 7, every instance of the MAX-CUT problem is the maximization of a submodular function f. The polynomial reduction that translates the MAX-CUT problem into the SFMAX problem is the identity, using the same function f in both problems. This shows that the SFMAX problem is NP-Hard.

Having understood the hardness of the SFMAX problem, let us focus our attention on deriving an approximation factor from a local maximum of the SFMAX. Let S be a local maximum for the neighborhood structure defined by moving one element from S to V \ S or in the inverse direction. The next proposition permits constructing an algorithm with factor 3 given a local maximum (or factor 2, if f satisfies the additional symmetry property stated in part iv).

Proposition 8. If S is a local maximum for the SFMAX, then:
i) f(R) ≤ f(S) for all R ⊆ S;
ii) f(T) ≤ f(S) for all T ⊇ S;
iii) if S' = arg max_{A ∈ {S, V \ S}} f(A), then f(S') ≥ OPT/3;
iv) if moreover f(T) = f(V \ T) for all T ⊆ V, then f(S') ≥ OPT/2.

Proof. i) Let R be any subset of S and x an element of R. By the submodularity of f we know that

f(R) + f(S \ {x}) ≥ f(R ∪ (S \ {x})) + f(R ∩ (S \ {x})) = f(S) + f(R \ {x}).    (30)

Now, exploiting the fact that S is a local maximum, we know that f(S) ≥ f(S \ {x}). Substituting into (30) we get f(R \ {x}) ≤ f(R) for all R ⊆ S and x ∈ R. This states a condition stronger than just lower values of f for subsets of S: removing one element from a set inside S never increases its value. In other words, f is monotone under inclusion inside S; in particular f(R) ≤ f(S) for every R ⊆ S.

ii) Analogous to (i): if S ⊆ R, we can take any element x ∉ R, and by submodularity f(R) + f(S ∪ {x}) ≥ f(R ∪ {x}) + f(S). Moreover, S ∪ {x} is in the neighborhood of S, so f(S) ≥ f(S ∪ {x}). As a consequence, f(R ∪ {x}) ≤ f(R) for all R ⊇ S and x ∉ R: the addition of elements to a set that includes S never increases its value, and in particular f(T) ≤ f(S) for every T ⊇ S.

iii) Let S* be the set that achieves the global optimum. Given that f is submodular, we know that

f(S ∪ S*) + f(V \ S) ≥ f(V) + f(S* ∩ S^C).

Using parts (i) and (ii), in particular it holds that f(S ∩ S*) ≤ f(S) and f(S ∪ S*) ≤ f(S). Joining these inequalities, with S' = arg max_{A ∈ {S, V \ S}} f(A), we obtain:

3f(S') ≥ 2f(S) + f(V \ S) ≥ f(S ∩ S*) + (f(S ∪ S*) + f(V \ S)) ≥ f(S ∩ S*) + (f(V) + f(S* ∩ S^C)).    (31)

Now, considering the partition of S* into S* ∩ S and S* ∩ S^C and the submodularity of f, we have:

OPT = f(S*) ≤ f(S* ∩ S) + f(S* ∩ S^C) - f(∅) = f(S* ∩ S) + f(S* ∩ S^C),    (32)

where the last equality holds because f(∅) = 0. Finally, combining the chain of inequalities (31) with (32), we get:

3f(S') ≥ f(S* ∩ S) + f(S* ∩ S^C) + f(V) ≥ OPT + f(V) ≥ OPT,    (33)

because f is non-negative. This guarantees that the best between the local maximum S and its complement V \ S achieves a factor-3 approximation.

iv) Moreover, if it holds that f(T) = f(T^C) for every subset T ⊆ V, then using part (ii) we have f(S) ≥ f((S*)^C ∪ S) = f(((S*)^C ∪ S)^C) = f(S* ∩ S^C). Because of part (i) we also know that f(S) ≥ f(S* ∩ S). Joining these inequalities with (32) we get

2f(S) ≥ f(S* ∩ S^C) + f(S* ∩ S) ≥ f(S*) + f(∅) = OPT.

Then S directly achieves a factor-2 approximation in this case, and naturally S' does as well.

Recall that a Local Search can find a local maximum, for example with the neighborhood structure of removing or adding one element, starting from the empty set (or from the ground set). A simple algorithm that first applies such a Local Search, with output S, and finally returns the best between S and V \ S guarantees, by Proposition 8, a factor-3 approximation for the SFMAX problem.
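
As an illustration of this scheme, here is a minimal Python sketch that uses the MAX-CUT objective of Lemma 7 as the submodular oracle (the encoding and the names are assumptions of the sketch). Since the cut function satisfies f(T) = f(V \ T), part iv applies, and returning the best of the local maximum and its complement yields a factor-2 approximation on this family of instances.

    def make_cut_function(edges):
        # f(S) = total weight of the edges crossing (S, V \ S); submodular, f({}) = 0.
        def f(S):
            return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
        return f

    def sfmax_local_search(ground, f):
        S = set()
        improved = True
        while improved:
            improved = False
            for v in ground:                         # single-element moves
                T = S - {v} if v in S else S | {v}
                if f(T) > f(S):
                    S, improved = T, True
                    break
        return max((S, ground - S), key=f)           # best of S and V \ S (Proposition 8)

    V = {1, 2, 3, 4}
    E = {(1, 2): 3.0, (2, 3): 1.0, (3, 4): 2.0, (1, 4): 1.0}
    f = make_cut_function(E)
    best = sfmax_local_search(V, f)
    print(best, f(best))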

Inapproximability and the CLIQUE Problem

Let G = (V, E) be a graph. An independent set of vertices of G is a set of vertices no two of which share an edge. The maximum cardinality of an independent set is denoted by α(G). Another related concept is the size of a maximum clique of G, denoted by ω(G). Last but not least, the minimum cardinality of a vertex cover of G is denoted by γ(G). Here we will show two simple equalities that relate these important parameters. These relations have interesting impacts on the Vertex Cover, MAX-STABLE and CLIQUE problems in a graph.

Lemma 9. α(G) = ω(G^C).

Proof. The proof is direct from the definitions. If we find a maximum-cardinality clique in G^C, then those vertices are pairwise non-adjacent, hence independent, in G, so α(G) ≥ ω(G^C). Moreover, if we take any independent set of vertices of G, by independence they must be pairwise connected in G^C, so α(G) ≤ ω(G^C). Then for every graph we get α(G) = ω(G^C).

Lemma 10. γ(G) + α(G) = n, where n is the number of vertices of G.

Proof. The complement of each vertex cover must be independent (otherwise there would be an edge between a pair of vertices not taken, in contradiction with the definition of a vertex cover). Conversely, the complement of each independent set is a vertex cover. In particular, the complement of a minimum-cardinality vertex cover is a maximum-cardinality independent set. This partition of the vertices of the graph shows that γ(G) + α(G) = n, as we wanted to prove.

Let us now study the consequences of the previous simple relations between α, γ and ω. First of all, it is clear from Lemmas 9 and 10 that if we know one parameter then we can deduce the other two. However, it is paradoxical that the approximability results for the corresponding problems are markedly different. It is easy to find an n-approximation for the CLIQUE problem, because every clique has fewer vertices than the whole graph, and a single vertex is a trivial clique. However, it has been proved that, given any ε > 0, there is no polynomial-time algorithm that approximates the CLIQUE problem within a factor n^{1-ε} (unless NP = ZPP).

This strong inapproximability result has a direct impact on the MAX-STABLE set problem. Suppose by contradiction that we find an approximation factor p(n) < n^{1-ε} for the MAX-STABLE problem. Then we can always obtain a p(n)-approximation for CLIQUE, simply by complementing the graph and applying the p(n)-approximation to find a maximum independent set in the graph obtained. Those vertices are pairwise connected in the original graph, and we would get a p(n)-approximation for the CLIQUE problem. This shows that there is no polynomial approximation for the MAX-STABLE problem with factor n^{1-ε}, unless NP = ZPP.

It is interesting to note that, although the minimum cardinality cover complies by definition with γ = n - α, there exists a factor-2 approximation for the Vertex Cover problem, easily achievable by finding a maximal matching, for example. If we assume that P ≠ NP, no polynomial-time algorithm approximates Vertex Cover within a factor better than 10√5 - 21 ≈ 1.36. On the other side, assuming the Unique Games Conjecture, a stronger inapproximability result states that 2 is the best possible factor for Vertex Cover. Indeed, PCP (Probabilistically Checkable Proofs) techniques have allowed important progress in the area of Approximation Algorithms.
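
To close, here is a small brute-force Python sketch (suitable only for tiny graphs; the helper names are mine) that illustrates Lemma 9 through graph complementation, Lemma 10 through the identity γ = n - α, and the factor-2 vertex cover obtained from a maximal matching.

    from itertools import combinations

    def complement(vertices, edges):
        # Edge set of the complement graph G^C.
        return {frozenset(p) for p in combinations(vertices, 2)} - edges

    def alpha(vertices, edges):
        # Brute-force maximum independent set size (exponential time).
        for k in range(len(vertices), 0, -1):
            for cand in combinations(vertices, k):
                if all(frozenset(p) not in edges for p in combinations(cand, 2)):
                    return k
        return 0

    def vertex_cover_2approx(edges):
        # Both endpoints of a greedily built maximal matching form a vertex cover
        # of size at most 2 * gamma(G).
        cover = set()
        for e in edges:
            u, v = tuple(e)
            if u not in cover and v not in cover:
                cover |= {u, v}
        return cover

    V = {1, 2, 3, 4, 5}
    E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]}
    a = alpha(V, E)
    print(a, len(V) - a)                 # alpha(G) = 2 and gamma(G) = 3 (Lemma 10)
    print(alpha(V, complement(V, E)))    # omega(G) = alpha(G^C) = 3 (Lemma 9)
    print(vertex_cover_2approx(E))       # a cover with at most 2 * gamma(G) vertices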