CMSC 451: Leture 9 Greedy Approximation: Set Cover Thursday, Sep 28, 2017 Reading: Chapt 11 of KT and Set 54 of DPV Set Cover: An important lass of optimization problems involves overing a ertain domain, with sets of a ertain harateristis Many of these problems an be expressed abstratly as the set over problem We are given a pair Σ = X, S), alled a set system, where X = {x 1,, x m } is a finite set of objets, alled the universe, and S = {,, s n } is a olletion of subsets of X, suh that every element of X belongs to at least one set of S Set systems arise in many appliations of siene and engineering Undireted Graph: An undireted graph G = V, E) is a set system where V onstitutes the universe, and the edges E are subsets of ardinality two Geometri set systems: The universe onsists of n points in spae, and the sets are the subsets of points that are ontained within some speified geometri shape balls, ubes, retangles, triangles, et) Wireless Coverage: The universe onsists all the loations on a ollege ampus, and for eah possible loation of a wireless transmitter there is an assoiated region of the ampus that is overed by plaing a wireless transmitter at this loation A fundamental question involving set systems is determining the smallest number of sets needed to over the entire universe A over of S is defined to be a subolletion of sets whose union overs X For example, in Fig 1a), we elements of X are the blak irles, and the sets,, are indiated by retangles In this ase there exists a over of size three, onsisting of s 3, s 4, and s 5 see Fig 1b)) The sets of this over do not overlap, whih is sometimes alled an exat over In general, the sets of the over are allowed to overlap) a) b) Fig 1: Set over The optimum over onsists of the three sets {s 3, s 4, s 5 } Notie that the output of set over is not a set, but rather a set of sets If we think of the sets of S as being indexed by the integers from 1 to n, then we an think of a over C more onveniently as a subset of {1,, n} This suggests the following definition Leture 9 1 Fall 2017
Set Cover Problem: Given a set system Σ = X, S), where S = {,, s n }, ompute a set C {1,, n} of minimum ardinality suh that X = The set over problem is a very important and powerful optimization problem It arises in a vast number of appliations Determining the fewest loations to plae wireless transmitters to over the entire ampus is an example Unfortunately, the set over problem is known to be NP-hard We will present a simple greedy heuristi for this problem How do we determine how good our approximation is? Given an input instane Σ = X, S), let OΣ) denote an optimum over of minimum ardinality) and let GΣ) denote the over produed by our greedy heuristi Clearly, greedy annot have fewer sets than the optimum, and so we have GΣ) OΣ) We say that G ahieves an approximation ratio of ρ if GΣ) ρ OΣ), for any input Σ Ideally, we would like ρ to be as small as possible, say, a small onstant Unfortunately, the best that we an show for set over is that ρ is a slowly growing funtion of m = X, and in partiular ρ = ln m This might strike you as being rather weak, but there are ompelling reasons from the theory of omputational omplexity that logarithmi approximation ratio is the best that we might hope for assuming that P NP With a bit more work, it is possible to improve this slightly to an approximation ratio of ρ = ln m ), where m is the maximum ardinality of any set of S) Greedy Set Cover: A simple greedy approah to set over works by at eah stage seleting the set that overs the greatest number of unovered elements The algorithm is presented in the ode blok below The set C ontains the indies of the sets of the over, and the set U stores the elements of X that are still unovered Initially, C is empty and U X We repeatedly selet the set of S that overs the greatest number of elements of U and add it to the over Greedy Set Cover Greedy-Set-CoverX, S) { U = X // U stores the unovered items C = empty // C stores the sets of the over while U is nonempty) { selet s[i] in S that overs the most elements of U add i to C remove the elements of s[i] from U } return C } i C s i We will not worry about implementing this algorithm in the most effiient manner If we assume that U and the sets s i are eah represented as a simple list of elements of X eah of length at most m), then we an perform eah iteration of the main while loop in time Omn), for a total running time of Omn 2 ) For the example given earlier the greedy-set over algorithm would selet see Fig 2a)), then see Fig 2b)), then see Fig 2)) and finally s 3 see Fig 2d)) Thus, it would return a set over of size four, whereas the optimal set over has size three Leture 9 2 Fall 2017
over overs 3 over s 3 over a) b) ) d) Fig 2: The greedy heuristi Final over is {,,, s 3 } What is the approximation fator? The problem with the greedy heuristi is that it an be fooled into piking the wrong set, over and over again Consider the example shown in Fig 3 involving a universe of 32 elements The optimal set over onsists of sets s 7 and s 8, eah of size 16 Initially all three sets, s 7, and s 8 have 16 elements If ties are broken in the worst possible way, the greedy algorithm will first selet the set We remove all the overed elements Now, s 7 and s 8 all over eight of the remaining elements Again, if we hoose poorly, is hosen The pattern repeats, hoosing s 3 overing four of the remainder), s 4 overing two) and finally s 5 and eah overing one) Although there are ties for the greedy hoie in this example, it is easy to modify the example so that the greedy hoie is unique s 7 s 8 s 5 s 4 s 3 Opt: {s 5, } Fig 3: Repeatedly fooling the greedy heuristi Greedy: {,, s 3, s 4, s 5, } From the pattern, you an see that we an generalize this to any number of elements that is a power of 2 While there is a optimal solution with 2 sets, the greedy algorithm will selet roughly lg m sets, where m = X Reall that lg denotes logarithm base 2, and ln denotes the natural logarithm) Thus, on this example the greedy heuristi ahieves an approximation fator of roughly lg m)/2 There were many ases where ties were broken badly here, but it is possible to redesign the example suh that there are no ties, and yet the algorithm has essentially the same ratio bound We will show that the greedy set over heuristi never performs worse than a fator of ln m Before giving the proof, we need one useful tehnial inequality Leture 9 3 Fall 2017
Lemma: For all > 0, 1 1 ) 1 e where e is the base of the natural logarithm Proof: We use the fat that for any real z positive, zero, or negative), 1 + z e z This follows from the Taylor s expansion e z = 1 + z + z2 2! + z3 3! + 1 + z) Now, if we substitute 1/ for z we have 1 1/) e 1/ By raising both sides to the th power, we have the desired result We now prove the approximation bound Theorem: Given any set system Σ = X, S), let G be the output of the greedy heuristi and let O be an optimum over Then G O ln m, where m = X Proof: We will heat a bit Let denote the size of the optimum set over, and let g denote the size of the greedy set over minu We will show that g ln m Note that we should really show that g + 1 ln m, but this is lose enough and saves us some messy details) Let s onsider how many new elements we over with eah round of the algorithm Initially, there are m 0 = m elements to be overed Let m i denote the number of elements remaining to be overed after i iterations of the greedy algorithm After i 1 rounds there are m i 1 elements that remain to be overed We know that there is a over of size for these elements namely, the optimal over), and so by the pigeonhole prinipal there exists some set that overs at least m i 1 / elements Sine the greedy algorithm selets the set overing the largest number of remaining elements, it must selet a set that overs at least this many elements The number of elements that remain to be overed is at most m i m i 1 m i 1 = m i 1 1 1 ) Thus, with eah iteration the number of remaining elements dereases by a fator of at least 1 1/) If we repeat this i times, we have m i m 0 1 1 ) i = m 1 1 ) i How long an this go on? Sine the greedy heuristi ran for g + 1 iterations, we know that just prior to the last iteration we must have had at least one remaining unovered element, and so we have 1 m g m 1 1 ) g = m 1 1 ) ) g/ In the last step, we just rewrote the expression in a manner that makes it easier to apply the above tehnial lemma) By the above lemma we have ) 1 g/ 1 m e Leture 9 4 Fall 2017
Now, if we multiply by e g/ on both sides and take natural logs we find that g satisfies: e g/ m g ln m g ln m Therefore ignoring the missing +1 as mentioned above) the greedy set over is larger than the optimum set over by a fator of at most ln m Leture 9 5 Fall 2017