CS124 Lecture 7 Fall 2018

Disjoint set (Union-Find)

For Kruskal's algorithm for the minimum spanning tree problem, we found that we needed a data structure for maintaining a collection of disjoint sets. That is, we need a data structure that can handle the following operations:

  MAKESET(x) - create a new set containing the single element x
  UNION(x,y) - replace the two sets containing x and y by their union
  FIND(x) - return the name of the set containing the element x

Naturally, this data structure is useful in other situations, so we shall consider its implementation in some detail.

Within our data structure, each set is represented by a tree, so that each element points to a parent in the tree. The root of each tree will point to itself. In fact, we shall use the root of the tree as the name of the set itself; hence the name of each set is given by a canonical element, namely the root of the associated tree.

It is convenient to add a fourth operation LINK(x,y) to the above, where we require for LINK that x and y are two roots. LINK changes the parent pointer of one of the roots, say x, and makes it point to y. It returns the root of the now composite tree, y. With this addition, we have UNION(x,y) = LINK(FIND(x),FIND(y)), so the main problem is to arrange our data structure so that FIND operations are very efficient.

Notice that the time to do a FIND operation on an element corresponds to its depth in the tree. Hence our goal is to keep the trees short. Two well-known heuristics for keeping trees short in this setting are UNION BY RANK and PATH COMPRESSION.

We start with the UNION BY RANK heuristic. The idea of UNION BY RANK is to ensure that when we combine two trees, we try to keep the overall depth of the resulting tree small. This is implemented as follows: the rank of an element x is initialized to 0 by MAKESET. An element's rank is only updated by the LINK operation. If x and y have the same rank r, then invoking LINK(x,y) causes the parent pointer of x to be updated to point to y, and the rank of y is then updated to r + 1.
On the other hand, if x and y have different ranks, then invoking LINK(x,y) updates the parent pointer of the element with smaller rank to point to the element with larger rank. The idea is that the rank of the root is associated with the depth of the tree, so this process keeps the depth small. (Exercise: Try some examples by hand with and without using the UNION BY RANK heuristic.)
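The exercise above can also be tried mechanically. The following is a minimal Python sketch, not part of the original notes, that compares LINK with and without the rank rule (PATH COMPRESSION is deliberately left out). The names `Forest`, `link_naive`, `link_by_rank`, `depth`, and `max_depth_after_chain_unions` are illustrative assumptions, and elements are assumed to be the integers 0..n-1.

```python
class Forest:
    """Parent-pointer forest over elements 0..n-1; roots point to themselves."""

    def __init__(self, n):
        self.parent = list(range(n))   # MAKESET for every element
        self.rank = [0] * n

    def find(self, x):
        # Plain FIND without path compression: walk up to the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def link_naive(self, x, y):
        # Ignore ranks: always point root y at root x.
        self.parent[y] = x
        return x

    def link_by_rank(self, x, y):
        # Point the lower-rank root at the higher-rank root.
        if self.rank[x] > self.rank[y]:
            x, y = y, x
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        self.parent[x] = y
        return y

    def depth(self, x):
        # Number of pointers followed from x to its root.
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


def max_depth_after_chain_unions(n, use_rank):
    """Merge elements 1, 2, ..., n-1 one at a time into the set containing 0."""
    f = Forest(n)
    for i in range(1, n):
        a, b = f.find(i), f.find(0)
        if use_rank:
            f.link_by_rank(a, b)
        else:
            f.link_naive(a, b)
    return max(f.depth(x) for x in range(n))
```

With this particular (adversarial) order of unions and n = 8, the naive rule builds a chain of depth 7, while the rank rule keeps every element at depth at most 1.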
The idea of PATH COMPRESSION is that, once we perform a FIND on some element, we should adjust its parent pointer so that it points directly to the root; that way, if we ever do another FIND on it, we start out much closer to the root. Note that, until we do a FIND on an element, it might not be worth the effort to update its parent pointer, since we may never access it at all. Once we access an item, however, we must walk through every pointer to the root, so modifying the pointers only changes the cost of this walk by a constant factor.

  procedure MAKESET(x)
    p(x) := x
    rank(x) := 0

  function FIND(x)
    if x ≠ p(x) then
      p(x) := FIND(p(x))
    return(p(x))

  function LINK(x,y)
    if rank(x) > rank(y) then x ↔ y
    if rank(x) = rank(y) then rank(y) := rank(y) + 1
    p(x) := y
    return(y)

  procedure UNION(x,y)
    LINK(FIND(x),FIND(y))

In our analysis, we show that any sequence of m UNION and FIND operations on n elements takes at most $O((m+n)\log^* n)$ steps, where $\log^* n$ is the number of times you must iterate the $\log_2$ function on n before getting a number less than or equal to 1. (So $\log^* 4 = 2$, $\log^* 16 = 3$, $\log^* 65536 = 4$.) We should note that this is not the tightest analysis possible; however, this analysis is already somewhat complex!

Note that we are going to do an amortized analysis here. That is, we are going to consider the cost of the algorithm over a sequence of steps, instead of considering the cost of a single operation. In fact a single UNION or FIND operation could require $O(\log n)$ operations. (Exercise: Prove this!) Only by considering an entire sequence
of operations at once can we obtain the above bound. Our argument will require some interesting accounting to total the cost of a sequence of steps.

We first make a few observations about rank.

  1. If $v \neq p(v)$ then $\mathrm{rank}(p(v)) > \mathrm{rank}(v)$.
  2. Whenever $p(v)$ is updated, $\mathrm{rank}(p(v))$ increases.
  3. The number of elements with rank $k$ is at most $n/2^k$.
  4. The number of elements with rank at least $k$ is at most $n/2^{k-1}$.

The first two assertions are immediate from the description of the algorithm. The third assertion follows from the fact that the rank of an element v changes only if LINK is executed on v and another root w, rank(v) = rank(w), and v remains the root of the combined tree; in this case v's rank is incremented by 1. A simple induction then yields that when rank(v) is incremented to $k$, the resulting tree has at least $2^k$ elements. The last assertion then follows from the third assertion, as $\sum_{j=k}^{\infty} n/2^j = n/2^{k-1}$.

Exercise: Show that the maximum rank an item can have is $\log n$.

As soon as an element becomes a non-root, its rank is fixed. Let us divide the (non-root) elements into groups according to their ranks. Group $i$ contains all elements whose rank $r$ satisfies $\log^* r = i$. For example, elements in group 3 have ranks in the range $(4,16]$. In general, group $i$ covers the ranks in $(k, 2^k]$, where $k$ is a tower of $i-1$ twos (so $k = 1$ for $i = 1$, $k = 2$ for $i = 2$, $k = 4$ for $i = 3$, $k = 16$ for $i = 4$). For convenience we shall write this more simply by saying group $(k,2^k]$ to mean the group with these ranks.

It is easy to establish the following assertions about these groups:

  1. The number of distinct groups is at most $\log^* n$. (Use the fact that the maximum rank is $\log n$.)
  2. The number of elements in the group $(k,2^k]$ is at most $n/2^k$.

Let us assign $2^k$ tokens to each element in group $(k,2^k]$. The total number of tokens assigned to all elements from that group is then at most $2^k \cdot \frac{n}{2^k} = n$, and the total number of groups is at most $\log^* n$, so the total number of tokens given out is at most $n \log^* n$. We use these tokens to account for the work done by FIND operations. Recall that the number of steps for a FIND operation is proportional to the number of pointers that the FIND operation must follow up the tree.
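To make the grouping of ranks concrete, here is a small Python sketch (an illustration, not part of the original notes) of the iterated logarithm and of the rank range covered by each group. The function names `log_star` and `group_range` are assumptions made for this example, and `group_range` is only meant for group numbers $i \ge 1$.

```python
import math


def log_star(r):
    """Number of times log2 must be applied to r before the value is <= 1."""
    count = 0
    while r > 1:
        r = math.log2(r)
        count += 1
    return count


def group_range(i):
    """Return (k, 2**k) such that group i holds the ranks r with k < r <= 2**k.

    Assumes i >= 1; k is built as a tower of i - 1 twos.
    """
    k = 1
    for _ in range(i - 1):
        k = 2 ** k
    return k, 2 ** k
```

For instance, `group_range(3)` gives `(4, 16)`, matching the range $(4,16]$ quoted above, and every rank in that range has `log_star` equal to 3.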
We separate the pointers into two types, depending on the groups of u and p(u) = v, as follows:
  Type 1: a pointer is of Type 1 if u and v belong to different groups, or v is the root.
  Type 2: a pointer is of Type 2 if u and v belong to the same group.

We account for the two types of pointers in two different ways. Type 1 links are charged directly to the FIND operation; Type 2 links are charged to u, who pays for the operation using one of the tokens. Let us consider these charges more carefully.

The number of Type 1 links each FIND operation goes through is at most $\log^* n$, since there are only $\log^* n$ groups, and the group number increases as we move up the tree.

What about Type 2 links? We charge these links directly back to u, who is supposed to pay for them with a token. Does u have enough tokens? The point here is that each time a FIND operation goes through an element u, its parent pointer is changed to the current root of the tree (by PATH COMPRESSION), so the rank of its parent increases by at least 1. If u is in the group $(k,2^k]$, then the rank of u's parent can increase fewer than $2^k$ times before it moves to a higher group. Therefore the $2^k$ tokens we assign to u are sufficient to pay for all FIND operations that go through u to a parent in the same group.

We now count the total number of steps for m UNION and FIND operations. Clearly LINK requires just O(1) steps, and since a UNION operation is just a LINK and 2 FIND operations, it suffices to bound the time for at most 2m FIND operations. Each FIND operation is charged at most $\log^* n$, for a total of $O(m \log^* n)$. The total number of tokens used is at most $n \log^* n$, and each token pays for a constant number of steps. Therefore the total number of steps is $O((m+n)\log^* n)$.

Let us give a more equation-oriented explanation. The total time spent over the course of m UNION and FIND operations is just

$$\sum_{\text{FIND operations}} (\#\text{ links passed through}).$$

We split this sum up into two parts:

$$\sum_{\text{FIND operations}} (\#\text{ links in same group}) \;+\; \sum_{\text{FIND operations}} (\#\text{ links in different groups}).$$

(Technically, the case where a link goes to the root should be handled explicitly; however, this is just O(m) links in total, so we don't need to worry!) The second term is clearly $O(m \log^* n)$.
The first term can be upper bounded by:

$$\sum_{\text{all elements } u} (\#\text{ ranks in the group of } u),$$

because each element u can be charged only once for each rank in its group. (Note here that this is because the links to the root count in the second sum!) This last sum is bounded above by

$$\sum_{\text{all groups}} (\#\text{ items in group}) \cdot (\#\text{ ranks in group}) \;\le\; \sum_{k=1}^{\log^* n} \frac{n}{2^k} \cdot 2^k \;\le\; n \log^* n.$$

This completes the proof.

[Figure 7.1: Examples of UNION BY RANK and PATH COMPRESSION. Panels: UNION(x,y) on two roots x and y; FIND(d) on a path through a, b, c, d.]
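As a closing illustration, the MAKESET, FIND, LINK, and UNION pseudocode given earlier translates almost line for line into Python. The sketch below is one possible rendering under stated assumptions, not the course's reference implementation: the class name `DisjointSets` and the dictionary-based storage are choices made here, and `union` adds a guard (not in the pseudocode) so that calling it on two elements of the same set leaves the ranks unchanged.

```python
class DisjointSets:
    """Union-find with UNION BY RANK and PATH COMPRESSION."""

    def __init__(self):
        self.p = {}      # parent pointer of each element
        self.rank = {}   # rank of each element

    def makeset(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find(self, x):
        # Path compression: after the recursive call, x points at the root.
        if x != self.p[x]:
            self.p[x] = self.find(self.p[x])
        return self.p[x]

    def link(self, x, y):
        # x and y must be distinct roots; the lower-rank root is pointed
        # at the higher-rank root, which is returned.
        if self.rank[x] > self.rank[y]:
            x, y = y, x
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        self.p[x] = y
        return y

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            # Assumption added in this sketch: already in the same set.
            return rx
        return self.link(rx, ry)


def compress_chain_example():
    """Build the chain d -> c -> b -> a by hand, then call FIND(d)."""
    ds = DisjointSets()
    for x in "abcd":
        ds.makeset(x)
    ds.p["b"], ds.p["c"], ds.p["d"] = "a", "b", "c"
    root = ds.find("d")
    return root, ds.p
```

In `compress_chain_example`, the chain is set up by writing parent pointers directly (something the public operations would not normally produce) so that the effect of PATH COMPRESSION is visible: after `find("d")`, every element on the path points straight at the root `a`.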