
COS 423, Spring 2009
Disjoint Set Union

Problem: Maintain a collection of disjoint sets. Two operations: find the set containing a given element; unite two sets into one (destructively).

Approach: Canonical element method: for each set, the algorithm maintains a canonical element (arbitrary but unique), holding any desired information about the set.

Two low-level operations:

    find(x): given element x, return the canonical element of the set containing x;

    link(x, y): given canonical elements x and y, destructively unite the sets containing them, and make x or y the canonical element of the new set. (Do nothing if x = y.)

Then unite can be implemented as follows:

    unite(x, y): unite the sets containing (arbitrary) elements x and y (if they differ):
    unite(x, y) = link(find(x), find(y))

Tree-based implementation: the elements of each set form a tree, with each node pointing to its parent via a p pointer. Each tree root points to itself.

Assume $n$ singleton sets initially ($p(x) = x$ for every $x$ initially) and $m$ total finds, interspersed with links, with $m \ge n$.
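The tree-based representation can be sketched in a few lines of Python. This is only a minimal illustration with no heuristics yet; the class name `DisjointSets`, the use of integers 0..n-1 as elements, and the choice that link always keeps x as the root are assumptions made for the example, not part of the notes.

```python
class DisjointSets:
    """Tree-based disjoint sets with no heuristics (illustrative sketch)."""

    def __init__(self, n):
        # n singleton sets: every element is its own parent, i.e. a root.
        self.p = list(range(n))

    def find(self, x):
        # Follow parent pointers until reaching a root (a node with p(x) = x).
        while self.p[x] != x:
            x = self.p[x]
        return x

    def link(self, x, y):
        # x and y must be canonical elements (roots). Do nothing if x == y.
        if x != y:
            self.p[y] = x  # arbitrary choice: x stays the canonical element
        return x

    def unite(self, x, y):
        # unite(x, y) = link(find(x), find(y))
        return self.link(self.find(x), self.find(y))


s = DisjointSets(5)
s.unite(0, 1)
s.unite(3, 4)
s.unite(1, 4)
print(s.find(4) == s.find(0))  # True: 0, 1, 3, 4 now share a canonical element
print(s.find(2))               # 2: still a singleton
```

Without any balancing rule, a sequence of links can build a path of length n - 1, which is what the heuristics in the rest of the notes are designed to avoid.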

To perform find, follow parent pointers to the tree root. To perform compression after a find, make every node on the find path point directly to the root.

Linking by rank (the rank of a node is the maximum length, in edges, of an uncompressed path to it from a descendant): $r(x) = 0$ for every $x$ initially. To link $x$ and $y$, make the smaller-ranked root point to the larger; in case of a tie, increase the rank of the new root by one.

Question: What is the total time for $m$ finds interspersed with links?

Answer: $O(m\,\alpha(n))$, where

    $A_0(x) = x + 1$ for $x \ge 1$
    $A_{k+1}(x) = A_k^{(x+1)}(x)$ for $x \ge 1$    (with $A^{(0)}(x) = x$ and $A^{(i+1)}(x) = A(A^{(i)}(x))$)
    $\alpha(n)$ = the smallest $k$ such that $A_k(1) \ge n$

From these definitions, $A_1(x) = 2x + 1$, $A_2(x) > 2^x$, $A_3(x) > 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}$ (a tower of $x + 1$ twos), and $\alpha(n)$ grows very slowly.

Exercise: Prove that $A_k(x)$ is an increasing function of both $k$ and $x$.

To prove the $O(m\,\alpha(n))$ bound we use an amortized analysis.
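Find with compression and linking by rank, as described above, might look as follows in Python. This is a sketch under assumptions: the class name `RankedSets` and the two-pass form of compression are illustrative choices, and elements are taken to be the integers 0..n-1.

```python
class RankedSets:
    """Disjoint sets with path compression and linking by rank (sketch)."""

    def __init__(self, n):
        self.p = list(range(n))  # p(x) = x initially
        self.r = [0] * n         # r(x) = 0 initially

    def find(self, x):
        # First pass: follow parent pointers to the root.
        root = x
        while self.p[root] != root:
            root = self.p[root]
        # Second pass (compression): point every node on the find path at the root.
        while x != root:
            nxt = self.p[x]
            self.p[x] = root
            x = nxt
        return root

    def link(self, x, y):
        # x and y must be roots. Do nothing if x == y.
        if x == y:
            return x
        if self.r[x] > self.r[y]:
            x, y = y, x          # ensure r(x) <= r(y)
        if self.r[x] == self.r[y]:
            self.r[y] += 1       # tie: the new root's rank grows by one
        self.p[x] = y            # smaller-ranked root points to the larger
        return y

    def unite(self, x, y):
        return self.link(self.find(x), self.find(y))


s = RankedSets(8)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    s.unite(a, b)
root = s.find(0)
print(s.r[root])          # 3, which equals lg 8
print(s.p[0] == root)     # True: the find compressed the path from 0
```

In this run every link is a tie, so the root's rank reaches lg n, the largest value linking by rank allows.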

Observe that the rank of a node $x$ starts at 0, can increase but not decrease while $x$ is a tree root, and remains constant once $x$ is a nonroot. Observe also that $r(p(x)) > r(x)$. Once $x$ has a parent, $r(x)$ is constant, but $r(p(x))$ can increase (but not decrease), either because $p(x)$ changes due to a compression or because $r(p(x))$ changes due to a link. The maximum node rank is at most $n - 1$. (Why?) (Actually, it is at most $\lg n$, but we won't use this.)

We will define a potential function that assigns a non-negative integer potential of at most $\alpha(n)\,r(x)$ to each node $x$; the total potential is the sum of all the node potentials. Any tree root $x$ has potential $\alpha(n)\,r(x)$. (Thus the total initial potential is 0.)

Let $x$ be a nonroot with $r(x) \ge 1$. Define the level of $x$, denoted by $k(x)$, to be the largest $k$ for which $r(p(x)) \ge A_k(r(x))$. We have $A_0(r(x)) = r(x) + 1 \le r(p(x))$ and $A_{\alpha(n)}(r(x)) \ge A_{\alpha(n)}(1) \ge n > r(p(x))$. Thus $k(x)$ is well-defined and $0 \le k(x) < \alpha(n)$. Furthermore, since $r(p(x))$ can never decrease, $k(x)$ can never decrease, only increase.

Define the index of $x$, denoted by $i(x)$, to be the largest $i$ for which $r(p(x)) \ge A_{k(x)}^{(i)}(r(x))$. We have $A_{k(x)}^{(1)}(r(x)) = A_{k(x)}(r(x)) \le r(p(x))$ by the definition of $k(x)$, and $A_{k(x)}^{(r(x)+1)}(r(x)) = A_{k(x)+1}(r(x)) > r(p(x))$ by the definitions of $A$ and $k(x)$. Thus $i(x)$ is well-defined and $1 \le i(x) \le r(x)$. Also, since $r(p(x))$ can never decrease, $i(x)$ cannot decrease unless $k(x)$ increases: while $k(x)$ remains constant, $i(x)$ can only increase or stay the same.

Now we are ready to define the potential of a node $x$.

    $\phi(x) = \alpha(n)\,r(x)$ if $x$ is a root or $r(x) = 0$
    $\phi(x) = (\alpha(n) - k(x))\,r(x) - i(x)$ if $x$ is a nonroot and $r(x) > 0$

We define the total potential $\Phi$ to be the sum over all nodes $x$ of $\phi(x)$.
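To make the definitions of $A_k$, $\alpha$, level, index, and potential concrete, here is a small Python sketch that evaluates them directly. The function names and the `cap` parameter are assumptions introduced for the example: because $A_k$ grows so fast, `ackermann` stops as soon as the value exceeds `cap` and then only guarantees "some value greater than cap", which is all the comparisons below need.

```python
def ackermann(k, x, cap):
    """Return A_k(x) exactly if it is at most cap; otherwise some value > cap."""
    if k == 0:
        return x + 1                  # A_0(x) = x + 1
    y = x
    for _ in range(x + 1):            # A_k(x) = A_{k-1} applied x + 1 times to x
        y = ackermann(k - 1, y, cap)
        if y > cap:                   # A_{k-1}(z) > z, so the result stays > cap
            return y
    return y


def alpha(n):
    """Smallest k such that A_k(1) >= n."""
    k = 0
    while ackermann(k, 1, n) < n:
        k += 1
    return k


def level(rx, rpx):
    """Largest k with rpx >= A_k(rx); assumes rx >= 1 and rpx > rx."""
    k = 0
    while ackermann(k + 1, rx, rpx) <= rpx:
        k += 1
    return k


def index(rx, rpx):
    """Largest i with rpx >= A_k^{(i)}(rx), where k = level(rx, rpx)."""
    k = level(rx, rpx)
    i, y = 0, rx
    while True:
        y = ackermann(k, y, rpx)
        if y > rpx:
            return i
        i += 1


def potential(rx, rpx, n, is_root):
    """phi(x) for a node of rank rx whose parent has rank rpx."""
    if is_root or rx == 0:
        return alpha(n) * rx
    return (alpha(n) - level(rx, rpx)) * rx - index(rx, rpx)


print([alpha(n) for n in (2, 3, 7, 8, 2047, 2048)])  # [0, 1, 2, 3, 3, 4]
print(potential(2, 4, 16, False))                    # 4: level 0, index 2, alpha(16) = 3
```

The printed values of `alpha` reflect $A_0(1) = 2$, $A_1(1) = 3$, $A_2(1) = 7$, and $A_3(1) = 2047$.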

Let us show that $0 \le \phi(x) \le \alpha(n)\,r(x)$ for every node $x$. This is obvious if $x$ is a root or $r(x) = 0$. Suppose $x$ is a nonroot and $r(x) > 0$. Since $k(x) \le \alpha(n) - 1$ and $i(x) \le r(x)$, $\phi(x) \ge r(x) - i(x) \ge 0$. Since $k(x) \ge 0$ and $i(x) \ge 1$, $\phi(x) < \alpha(n)\,r(x)$.

What remains is to show that the amortized cost of a link or find is $O(\alpha(n))$.

First consider a link, say link(x, y). Without loss of generality suppose the link makes $y$ the new root. The actual cost of the link is (order of) one. The potential of any node other than $y$ can only decrease or stay the same. (Exercise: show this.) The potential of $y$ stays the same or increases by $\alpha(n)$, since $r(y)$ stays the same or increases by one. Thus the increase of $\Phi$ due to the link is at most $\alpha(n)$, and the amortized cost of the link is at most $\alpha(n) + 1$.

Consider a find with compression. The actual cost of the find is (order of) the number of nodes on the find path. No node can have its potential increase as a result of the find. (Exercise: prove this.) We shall show that if $l$ is the number of nodes on the find path, at least $\max\{0,\ l - (\alpha(n) + 2)\}$ of these nodes have their potential decrease (by at least one) as a result of the compression. This implies that the amortized cost of the find is at most $\alpha(n) + 2$.

Specifically, let $x$ be a node on the find path such that $r(x) > 0$ and $x$ is followed on the find path by another nonroot node $y$ such that $k(y) = k(x)$. All but at most $\alpha(n) + 2$ nodes on the find path satisfy this constraint; those that do not are the first node on the path (if it has rank zero), the last node on the path (the root), and the last node on the path of level $k$, for each possible $k$ in the range $0 \le k < \alpha(n)$. Let $k = k(x) = k(y)$. Before the compression,

    $r(p(x)) \ge A_k^{(i(x))}(r(x))$, $r(p(y)) \ge A_k(r(y))$, and $r(y) \ge r(p(x))$.

These inequalities imply

    $r(p(y)) \ge A_k(r(y)) \ge A_k(r(p(x))) \ge A_k(A_k^{(i(x))}(r(x))) = A_k^{(i(x)+1)}(r(x))$,

which means that the compression causes $i(x)$ to increase or $k(x)$ to increase, in either case decreasing $\phi(x)$ by at least 1. (Exercise: prove this.)
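For reference, the way the per-operation bounds combine into the overall bound can be written out as a short derivation. It assumes, as in the setup, that the initial potential is 0, that the final potential is non-negative, that costs are measured in unit steps, and that at most $n - 1$ links actually join two sets.

```latex
\begin{align*}
\text{amortized cost of a find}
  &= \text{actual cost} + \Delta\Phi \\
  &\le l - \max\{0,\; l - (\alpha(n) + 2)\} \\
  &\le \alpha(n) + 2, \\[4pt]
\text{total actual cost}
  &= \sum_{\text{operations}} \text{amortized cost}
     \;-\; \bigl(\Phi_{\text{final}} - \Phi_{\text{initial}}\bigr) \\
  &\le (n - 1)(\alpha(n) + 1) + m\,(\alpha(n) + 2) \\
  &= O(m\,\alpha(n)) \qquad \text{since } m \ge n .
\end{align*}
```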

Suppose now that instead of using path compression and linking by rank, we use path compression and naïve linking, in which we link $x$ and $y$ by making $x$ the parent of $y$. The amortized analysis that gives $\alpha(n)$ per operation breaks down for two reasons.

The first is that ranks (hence levels and indexes) are undefined. We can fix this by defining ranks as follows: an initial singleton node has rank 0; when a link of $x$ and $y$ is performed, making $x$ the parent of $y$, we replace the rank of $x$ by $\max\{r(x),\ r(y) + 1\}$. Then many of the needed properties of ranks hold. Specifically, the rank of a node $x$ starts at 0, can increase but not decrease while $x$ is a root, and remains constant once $x$ is a non-root. Also, $r(p(x)) > r(x)$, and once $x$ has a parent $r(x)$ remains fixed but $r(p(x))$ can only increase.

The second is that, even with this definition of rank, the analysis breaks down, because the rank of a root can increase by up to $n - 1$ during a link, and the amortized time for a link is no longer $O(\alpha(n))$.

We can obtain an $O(\log n)$ amortized time bound per operation in this case by changing the potential function, however. We define the log-level of a non-root node $x$ to be $g(x) = \lfloor \lg (r(p(x)) - r(x)) \rfloor$. Then $0 \le g(x) \le \lg n$. We define the potential of a node $x$ to be 0 if it is a root and $\lg n - g(x)$ if it is a non-root. Then the initial potential is zero, and the potential is always non-negative, so the total time of an arbitrary sequence of operations is at most the sum of their amortized times.

Consider an operation link(x, y). The actual time is $O(1)$. The potential of every node except $y$ either stays the same or decreases. The potential of $y$, which becomes a non-root, can increase by at most $\lg n$ (from 0 to at most $\lg n$). Thus the amortized time of a link is $O(\lg n)$.

Consider a find with compression. Let $l$ be the number of nodes on the find path. No node can have its potential increase as a result of the find. We shall show that at least $\max\{0,\ l - \lg n - 2\}$ nodes on the find path have their potential decrease by at least one as a result of the compression. This implies that the amortized time of the find is at most $\lg n + 2$.

Let $x$ be a node on the find path such that $x$ is followed on the find path by another non-root node $y$ such that $g(y) = g(x)$. All but at most $\lg n + 2$ nodes on the find path (the last for each possible log-level and the root) satisfy this property. Let $g = g(x) = g(y)$. Before the compression, $r(p(x)) - r(x) \ge 2^g$ and $r(p(y)) - r(y) \ge 2^g$. After the compression, the new parent of $x$ has rank at least that of the old parent of $y$, which means that $r(p'(x)) - r(x) \ge 2^g + 2^g = 2^{g+1}$, where $p'$ denotes the new parent, and the compression causes the log-level of $x$ to increase by at least one, and hence its potential to decrease by at least one.
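The naïve-linking variant and its log-level potential can be tried out with a short Python sketch. The class name `NaiveSets`, the method names, and the demonstration scenario (building one long path, then doing a single find) are assumptions chosen for illustration; ranks follow the rule $\max\{r(x),\ r(y) + 1\}$ given above.

```python
from math import log2


class NaiveSets:
    """Path compression with naive linking; ranks kept only for the analysis."""

    def __init__(self, n):
        self.n = n
        self.p = list(range(n))
        self.r = [0] * n

    def find(self, x):
        root = x
        while self.p[root] != root:
            root = self.p[root]
        while x != root:              # compression: repoint the find path at the root
            nxt = self.p[x]
            self.p[x] = root
            x = nxt
        return root

    def link(self, x, y):
        # Naive linking: x always becomes the parent of y (both must be roots).
        if x != y:
            self.p[y] = x
            self.r[x] = max(self.r[x], self.r[y] + 1)
        return x

    def unite(self, x, y):
        return self.link(self.find(x), self.find(y))

    def log_level(self, x):
        # g(x) = floor(lg(r(p(x)) - r(x))) for a non-root x.
        return (self.r[self.p[x]] - self.r[x]).bit_length() - 1

    def total_potential(self):
        # Roots contribute 0; a non-root x contributes lg n - g(x).
        return sum(log2(self.n) - self.log_level(x)
                   for x in range(self.n) if self.p[x] != x)


s = NaiveSets(8)
for i in range(7):
    s.unite(i + 1, i)      # each new singleton becomes the root: a path 0 -> 1 -> ... -> 7
before = s.total_potential()
s.find(0)                  # compresses the whole path
after = s.total_potential()
print(before, after)       # 21.0 11.0
```

In this run the find path has $l = 8$ nodes and the potential drops by 10, comfortably more than the guaranteed $l - \lg n - 2 = 3$.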