Possible and Impossible Vector Clock Sets

Esteban Meneses and Francisco J. Torres-Rojas

Esteban Meneses (esteban.meneses@predisoft.com), Predisoft and Centro de Investigación en Computación e Informática Avanzada (CIenCIA), Costa Rica. Francisco J. Torres-Rojas (torres@ic-itcr.ac.cr), Costa Rica Institute of Technology and Centro de Investigación en Computación e Informática Avanzada (CIenCIA), Costa Rica.

Abstract: It is well known that vector clocks capture perfectly the causality relationship among events in a distributed system. However, there are some interesting properties of vector clocks that are still to be explored. In particular, we are interested in discovering whether there is an efficient procedure for deciding if a given set of vector clocks is contained in some distributed history. We call this the possible vector clock set problem.

Index Terms: Distributed Computing, Logical Clocks, Vector Clocks.

I. INTRODUCTION

Some relationships among events happening in real life can be deduced just by comparing the times when they happened. Typically, if event a occurred before event b, we can say that a is potentially a cause of b. The same happens with events in a distributed system. The general problem here is to find a mechanism such that, given two arbitrary events a and b, it can be established which causality relationship holds between them or, if it were the case, to conclude that both events are concurrent [6].

Vector clocks [3], [5] define a technique to determine precisely the causality relationship among events in a distributed system, including the concurrent case. Given a distributed system with N sites, each site keeps an N-entry vector of integers where the j-th entry accounts for the number of events that occurred at site j and are known to this site. When an event happens at a certain site, it is timestamped with the current vector clock of the site. There are a few, but precise, rules for updating local clocks when a message is sent or received. By comparing the vector clocks assigned to any two arbitrary events, the causality relation between them can be precisely determined.

Besides this well known property of vector clocks, there are still a number of interesting questions about vector clocks that are worth exploring [8]. In particular, we propose the possible (or impossible) vector clock set problem, where, given an arbitrary, finite set of vector clocks, we are to find a distributed history that contains events timestamped with the vector clocks of the set. As we will show, there exist impossible sets, for which there is no distributed history that contains them.

In Section II, some basic concepts about vector clocks are reviewed. The problem of possible sets of vector clocks is presented in Section III. Some properties of these logical clocks, useful for the problem at hand, are explained in Section IV. A preliminary approach to finding a distributed history that contains a given set of vector clocks is explored in Section V, and Section VI gives an example of the proposed technique. Finally, conclusions and future work are listed in Section VII.

II. VECTOR CLOCKS

Let us assume a distributed system with N sites, where all the communications are made through message exchange. There are three kinds of events: internal, send and receive. The local history $H_i$ for site i is the totally ordered sequence of events $H_i = e_{i1} e_{i2} \ldots$ that are executed at site i. The global or distributed history H for the distributed system is the partially ordered set of events occurring at every site in the system.

Leslie Lamport [4] proposed the concept of logical clocks, i.e., a mapping between events in a distributed history H and integer numbers that can be used to detect some of the causal relationships between events. Although they are very easy to implement, Lamport clocks cannot capture all the causality information. For example, concurrency between events is poorly detected. These clocks are consistent with causality, but they do not characterize it [6].

Vector clocks [3], [5], [2], [6] consist of a mapping between events in a distributed history H and
integer vectors. Each site i keeps its own vector clock $V_i$ with N entries, where N is the number of sites in the system. Entry $V_i[i]$ maintains the local clock of site i, while $V_i[j]$ keeps track of the activity at site j, from the point of view of site i. Each time an event occurs at site i, its local clock ticks and a vector clock is associated with that event. Also, every message sent in the system is piggybacked with the vector clock corresponding to the send event. Site i updates its vector clock obeying the following rules:

Initial value: $V_i[j] = 0, \quad 0 \le j \le N-1$.
Each time an internal event occurs: $V_i[i] = V_i[i] + \delta$ (typically $\delta = 1$).
When a message with timestamp T is received: $V_i[j] = \max(V_i[j], T[j]), \quad 0 \le j \le N-1$, and then $V_i[i] = V_i[i] + \delta$.

Let $a \in H_i$. We say that a has timestamp V(a), where V(a) is the value of $V_i$ at the instant when a was executed. Vector clocks are compared following these rules:

$v = w \iff v[j] = w[j], \quad 0 \le j \le N-1$
$v \le w \iff v[j] \le w[j], \quad 0 \le j \le N-1$
$v < w \iff v \le w$ and $\exists j$ such that $v[j] < w[j]$
$v \parallel w \iff \exists k$ such that $v[k] < w[k]$ and $\exists j$ such that $v[j] > w[j]$

Mattern showed in [5] that there is an isomorphism between the timestamps obtained from vector clocks when events are executed and the causality relationship among events in H. Thus, vector clocks satisfy the Strong Clock Condition: for all $a, b \in H$,

1) $a = b \iff V(a) = V(b)$
2) $a \to b \iff V(a) < V(b)$
3) $a \parallel b \iff V(a) \parallel V(b)$

Given two vector clocks v and w, exactly one of four cases holds: $v = w$, $v < w$, $w < v$ or $v \parallel w$. Following the comparisons defined above, and using the fact that the relation $\le$ is a partial order, a Hasse diagram is obtained when drawing all those relationships [7].

Vector clocks have several interesting properties, some of which have been studied in [2], [8], [9].

Definition 1: If v and w are two vector clocks, we say that t is the maximum of them, denoted as maximum(v, w), if these conditions hold:
$v \le t$ and $w \le t$;
there does not exist a vector clock z such that $v \le z$, $w \le z$ and $z < t$.
The definition of minimum is analogous to the one given for maximum.
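The update and comparison rules above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the class `VectorClock` and the helpers `compare`, `maximum` and `minimum` are hypothetical names, sites are numbered from 0 as in the rules, and send events tick like any other event (following "each time an event occurs at site i, its local clock ticks"). `maximum` and `minimum` use the componentwise characterization given right after Definition 1.

```python
from typing import List, Sequence


class VectorClock:
    """Vector clock V_i kept by site i (sites 0..n-1). Hypothetical helper."""

    def __init__(self, i: int, n: int, delta: int = 1) -> None:
        self.i = i
        self.delta = delta      # tick increment, typically 1
        self.v = [0] * n        # initial value: V_i[j] = 0 for every j

    def internal(self) -> List[int]:
        """Internal event: tick the local entry and return the event's timestamp."""
        self.v[self.i] += self.delta
        return list(self.v)

    def send(self) -> List[int]:
        """Send event (assumption: it ticks too); the returned copy is piggybacked."""
        return self.internal()

    def receive(self, t: Sequence[int]) -> List[int]:
        """Receive a message with timestamp t: entry-wise max, then tick."""
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        self.v[self.i] += self.delta
        return list(self.v)


def compare(v: Sequence[int], w: Sequence[int]) -> str:
    """Return '=', '<', '>' or '||' (concurrent) following the rules above."""
    le = all(a <= b for a, b in zip(v, w))
    ge = all(a >= b for a, b in zip(v, w))
    if le and ge:
        return "="
    if le:
        return "<"
    if ge:
        return ">"
    return "||"


def maximum(*vs: Sequence[int]) -> List[int]:
    """Least upper bound of the given vector clocks: componentwise max."""
    return [max(col) for col in zip(*vs)]


def minimum(*vs: Sequence[int]) -> List[int]:
    """Greatest lower bound of the given vector clocks: componentwise min."""
    return [min(col) for col in zip(*vs)]
```

For instance, with two sites, a send at site 0 is stamped [1, 0], an internal event at site 1 is stamped [0, 1], and the receive of that message at site 1 is stamped [1, 2]; `compare` reports the first two as concurrent ('||') and the send as preceding the receive ('<').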
It can be easily verified that, given vector clocks v and w, t is maximum(v, w) iff $t[k] = \max(v[k], w[k]), \; 0 \le k \le N-1$, and that u is minimum(v, w) iff $u[k] = \min(v[k], w[k]), \; 0 \le k \le N-1$.

The next two lemmas are needed for the main result of the next section:

Lemma 1: No other site has more updated information about the activity of Site i (i.e., a larger value in its i-th entry) than Site i itself.
Proof: From the rules defined previously, it is evident that site i is the only one that increments $V_i[i]$, and the value of this entry at any other site is either zero or a value sent (directly or indirectly) by site i, which in turn could have incremented it since then.

Lemma 2: Let $a, b \in H$ with $a \ne b$. In particular, let $a \in H_k$, i.e., a was executed at site k. Then: $a \to b \iff V(a)[k] \le V(b)[k]$.
Proof: This is a corollary of the isomorphism between vector clocks and events in the distributed history, and of the comparison of vector clocks defined previously. Since a occurred at site k and is causally before b, when b occurs the k-th entry of its timestamp is at least V(a)[k], and vice versa. This result is usually known as the Simple Strong Clock Condition.

III. THE POSSIBLE VECTOR CLOCK SET PROBLEM

Not every arbitrary set of vector clocks is valid or possible, in the sense that it might be impossible to find a distributed history H that contains a subset of events whose vector clocks correspond to the ones in the given set. In other words, certain combinations of vector clocks cannot coexist in the same distributed history.

Theorem 1 (Deserted Zone): Let $a_1, a_2, \ldots, a_n$ be n events in H associated with the n vector clocks $v_1, v_2, \ldots, v_n$, respectively. Also, let the n events be concurrent with each other. If $t = \mathrm{maximum}(v_1, v_2, \ldots, v_n)$, then there cannot exist an event $z \in H$ such that $v_i < V(z) \le t$, for any i.
Proof: By contradiction. Let us assume that there exists an event $z \in H$ such that $v_i < V(z) \le t$ for some i. Without loss of generality, let i be
1, and let z happen on site k, i.e., $z \in H_k$. Now, using Lemma 1, we have $v_1[k] < V(z)[k]$. Given that $V(z) \le t$, then $v_1[k] < V(z)[k] \le t[k] = \max(v_1[k], v_2[k], \ldots, v_n[k])$, which means definitively that $V(z)[k] \le v_p[k]$ for some $p \in \{2, 3, \ldots, n\}$. In virtue of Lemma 2 and the isomorphism between vector clocks and events, $V(z)[k] \le v_p[k]$ implies that $z \to a_p$ or $z = a_p$; but by hypothesis $a_1 \to z$ and, therefore, $a_1 \to a_p$, which contradicts the fact that $a_1$ and $a_p$ are concurrent. Hence, z cannot exist.

Let us consider the case with n = 2, where $a_1$ and $a_2$ are two concurrent events with vector clocks $v_1$ and $v_2$, respectively. If $t = \mathrm{maximum}(v_1, v_2)$, then $v_1$, $v_2$ and t define a kind of deserted zone, where no event might occur. Hence, Theorem 1 affirms that there cannot be any event z, in the same distributed history that contains the concurrent events $a_1$ and $a_2$, with its vector clock lying in the deserted zone induced by $a_1$ and $a_2$. Notice, however, that there could exist many integer vectors that have this property. Nevertheless, none of them can be associated with any event in the same distributed history.

From the result of Theorem 1, we realize the existence of certain impossible sets of vector clocks. For instance, the set {<100,0,100>, <0,100,0>, <50,100,0>} is impossible: <100,0,100> and <0,100,0> are concurrent, their maximum is t = <100,100,100>, and <0,100,0> < <50,100,0> ≤ t, so <50,100,0> lies in their deserted zone.

Thus, if presented with a set of vector clocks, it makes sense to ask ourselves whether or not this set of vector clocks is a possible subset of the timestamps seen in a distributed history. This question is what we call the possible vector clock set problem. In its most general form, this problem can be stated as: Given a set of vector timestamps, decide whether or not there is a distributed history containing them. There is no information available about the site where each vector clock is supposed to have occurred.
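Theorem 1 yields a simple impossibility test that needs no site information. The sketch below only covers the case n = 2 discussed above: for every pair of concurrent vectors in the set, it looks for a third vector of the set lying in their deserted zone. The function name `deserted_zone_witness` is hypothetical. Finding such a triple proves the set impossible; finding none proves nothing, since this check is only a sufficient condition for impossibility.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def leq(v: Sequence[int], w: Sequence[int]) -> bool:
    """v <= w componentwise."""
    return all(a <= b for a, b in zip(v, w))


def less(v: Sequence[int], w: Sequence[int]) -> bool:
    """v < w: v <= w and some entry is strictly smaller."""
    return leq(v, w) and tuple(v) != tuple(w)


def concurrent(v: Sequence[int], w: Sequence[int]) -> bool:
    """v || w: neither v <= w nor w <= v."""
    return not leq(v, w) and not leq(w, v)


def deserted_zone_witness(vectors: List[Vec]) -> Optional[Tuple[Vec, Vec, Vec]]:
    """Hypothetical helper: return (v1, v2, z) with v1 || v2 and
    v1 < z <= t or v2 < z <= t, where t = maximum(v1, v2); None if no such triple."""
    for v1, v2 in combinations(vectors, 2):
        if not concurrent(v1, v2):
            continue
        t = tuple(max(a, b) for a, b in zip(v1, v2))   # componentwise maximum
        for z in vectors:
            if (less(v1, z) or less(v2, z)) and leq(z, t):
                return v1, v2, z                        # z violates Theorem 1
    return None
```

On the set {<100,0,100>, <0,100,0>, <50,100,0>} above, the function returns the triple (<100,0,100>, <0,100,0>, <50,100,0>).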
There could be a version of the problem where the sites corresponding to certain vector clocks in the set are known, while other vector clocks are free to be assigned to any site: Given a set of vector timestamps, some of them assigned to specific sites, decide whether or not there is a distributed history containing them all.

Finally, the apparently softest version of the problem pins down all the vector clocks to known sites: Given a set of vector timestamps, each one assigned to a specific site, decide whether or not there is a distributed history containing them.

As an example of the latter statement of the problem, consider the vector clock set A over a distributed system with 3 sites:

A = { [<2,0,1>, site 1], [<0,1,0>, site 2], [<1,3,2>, site 2], [<0,0,2>, site 3] }

To demonstrate that A is a possible set, we just have to show a distributed history H that contains every element of A as a timestamp. In this case, A is possible, because there are multiple distributed histories that satisfy the requirement. Figure 1 shows an example of such a distributed history (the timestamps from A are written in bold).

Fig. 1. Underlying distributed history for set A.

On the other hand, the vector clock set B is impossible:

B = { [<2,0,1>, site 1], [<0,1,0>, site 2], [<0,1,1>, site 2], [<1,3,2>, site 2], [<0,0,2>, site 3] }

To prove that B is an impossible set, consider the concurrent vector clocks $v_1$ = <0,1,0> and $v_2$ = <2,0,1>, both in B. Let us say that these vector clocks correspond to events $a_1$ and $a_2$. Following Theorem 1, if we compute the maximum of $v_1$ and $v_2$, we obtain t = <2,1,1>. Then, there cannot exist an event z in the same distributed history as $a_1$ and $a_2$ such that its associated vector clock V(z)
has the property $v_1 < V(z) < t$ or $v_2 < V(z) < t$. However, the vector clock <0,1,1> in B happens to lie between <0,1,0> and t = <2,1,1>, which renders set B impossible. Figure 2 shows a graphic representation of this particular situation.

Fig. 2. Impossible set due to the deserted zone theorem.

IV. INTERMEDIATE RESULTS

In this section, we consider certain properties of vector clocks and some impossibility results, which are useful to explore the problem of possible vector clock sets.

Lemma 3: Let timestamp v correspond to event $a \in H_i$. Then, for every $j \ne i$ with $v[j] \ne 0$, the v[j]-th event of $H_j$ is a send.
Proof: Left to the reader.

Thus, if every timestamp in a set S is associated with some site, then, just by checking these timestamps, a necessary set of send events in the distributed history can be deduced. However, this resulting group of send events might not be sufficient for building the underlying distributed history. Let $v_j$ and $v_k$ be two timestamps corresponding to events happening in sites j and k, respectively. Also, let $v_j[i] = v_k[i] = x$ ($i \ne j$ and $i \ne k$). As Lemma 3 established, the x-th event in Site i must be a send. In order to construct timestamps $v_j$ and $v_k$, this send must reach, directly or indirectly, sites j and k. Since our model does not allow multicasting, the x-th event in Site i must be directed to, let us say, Site j, and from there to Site k with an extra send. We call this kind of send a deferred send.

Lemma 4: Given a vector clock set S where each timestamp is associated with some site, if two timestamps (occurring in different sites) coincide in a nonzero i-th entry and neither of them occurred in Site i, a deferred send is required in the underlying distributed history.
Proof: It is straightforward given the previous discussion.

Although the aim of a deferred send is to provide time information indirectly, this functionality could be fulfilled by one of the regular sends derived from Lemma 3.
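Lemmas 3 and 4 can be turned into a mechanical check when every timestamp carries a site. In the sketch below, the function names are hypothetical and sites and event positions are 1-based, as in the examples of this paper. `required_sends` maps each send event (site, position) forced by Lemma 3 to the timestamps that need it; `deferred_sends` keeps the forced sends whose users sit at two or more different sites, which is exactly the situation of Lemma 4. It only derives necessary conditions and does not build a history.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Event = Tuple[int, int]                  # (site, local position), both 1-based
Stamp = Tuple[Tuple[int, ...], int]      # (timestamp, site where it occurred)


def required_sends(S: Sequence[Stamp]) -> Dict[Event, List[Stamp]]:
    """Lemma 3 (hypothetical helper): for v at site i, every j != i with
    v[j] != 0 forces the v[j]-th event of site j to be a send."""
    sends: Dict[Event, List[Stamp]] = defaultdict(list)
    for v, i in S:
        for j, x in enumerate(v, start=1):
            if j != i and x != 0:
                sends[(j, x)].append((v, i))
    return dict(sends)


def deferred_sends(S: Sequence[Stamp]) -> List[Event]:
    """Lemma 4 (hypothetical helper): a forced send needed by timestamps at two
    or more different sites cannot reach them all with a single message."""
    return sorted(ev for ev, users in required_sends(S).items()
                  if len({site for _, site in users}) > 1)
```

On set A of Section III, `required_sends` reports the sends (1,1), (3,1) and (3,2), and `deferred_sends` returns an empty list.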
Thus, it could be the case in an underlying distributed history that some deferred send is collapsed with another send event.

The following lemma proves the existence of impossible vector clock sets with just one element:

Lemma 5: The vector clock set S = {<1,1,1>} is impossible.
Proof: Let us suppose, without loss of generality, that timestamp <1,1,1> occurred at Site 1. So, entry 1 is justified by being the local stamp, which means that this vector clock corresponds to the first event of this site. Lemma 3 indicates that the first events of both Site 2 and Site 3 must be send events. Notice that the first event of Site 1 must be a receive. Assume, for instance, that it receives a message from Site 2. Then, its third entry cannot be 1, since there is no time for such information to be transmitted. Note that a deferred send is not sufficient either, because if Site 3 sends a message to Site 2, then the first event of Site 2 should be a receive event, being unable to send the required information to Site 1.

Theorem 2: Consider a distributed system with N sites. Let $S = \{v_1, v_2, \ldots, v_k\}$ be a set of $k \le N$ vector clocks concurrent to each other and let $t = \mathrm{maximum}(v_1, v_2, \ldots, v_k)$. There cannot be a $v_i$ ($1 \le i \le k$) such that $v_i[j] < t[j]$ for all $j = 1, 2, \ldots, N$.
Proof: By contradiction. Assume that such a $v_i$ exists, and that it occurred at Site p. So, by assumption, $v_i[p] < t[p]$. Since t is the maximum of the vector set, there must exist some $v_j$ such that $v_j[p] = t[p]$. The only way to explain this value of entry $v_j[p]$ is via a chain of messages starting with a send event a at Site p. Let V(a) be the vector clock of a. Clearly, $V(a) < v_j$, because event a causally precedes the event with timestamp $v_j$. Now, a and the event with timestamp $v_i$ come from the same site p, and since $v_i[p] < V(a)[p]$, we have $v_i < V(a) < v_j$, which contradicts the premise that all vector clocks are concurrent.
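Theorem 2 likewise gives a purely arithmetic impossibility test. The sketch below, with the hypothetical name `theorem2_violations`, first checks the premises of the theorem (at most N vectors, pairwise concurrent) and then returns every vector that is strictly below the maximum in every entry; any such vector shows that the set is impossible.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def concurrent(v: Sequence[int], w: Sequence[int]) -> bool:
    """True iff neither v <= w nor w <= v (componentwise)."""
    v_le_w = all(a <= b for a, b in zip(v, w))
    w_le_v = all(b <= a for a, b in zip(v, w))
    return not v_le_w and not w_le_v


def theorem2_violations(S: Sequence[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Hypothetical helper: vectors of S strictly below maximum(S) in every entry.

    Returns [] when the premises of Theorem 2 do not hold (more than N vectors,
    or some pair not concurrent), since the theorem says nothing in that case.
    """
    if not S:
        return []
    n = len(S[0])                                    # number of sites N
    if len(S) > n or not all(concurrent(v, w) for v, w in combinations(S, 2)):
        return []
    t = [max(col) for col in zip(*S)]                # componentwise maximum
    return [v for v in S if all(x < m for x, m in zip(v, t))]
```

As an illustration of our own (not taken from the paper), the pairwise concurrent set {<1,1,1>, <2,0,2>, <0,2,2>} has maximum <2,2,2>, and the function flags <1,1,1>.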
V. AN EXHAUSTIVE APPROACH

By using an exhaustive strategy, it is possible to decide whether or not there is an underlying distributed history for some vector clock set. We analyze a solution for the softest version, where each timestamp is associated with some specific site. Strategies for the other two versions of the problem can be obtained by extending this one.

We are looking for a set A of pairs of events of the form <Site: send, Site: receive>, where send and receive indicate the relative positions of these events in each local history. In fact, any distributed history can be characterized by the number of sites, the number of events in each site and the set A. Thus, the algorithm basically tests a series of alternatives for the underlying history, until a set A that produces all the required timestamps in the distributed history is obtained.

Given a vector clock set $S = \{v_1, v_2, \ldots, v_m\}$ with m timestamps, each one associated with some site, the following steps will find an underlying distributed history, if one exists:

Step 1: The number of sites N is equal to the length of every timestamp in S.
Step 2: Let $t = \mathrm{maximum}(v_1, v_2, \ldots, v_m)$; then t[i] is the number of events in site i, for $i = 1, 2, \ldots, N$.
Step 3: Obtain the set R of all the necessary send events, using Lemma 3.
Step 4: Obtain the set D of all the deferred send events, using Lemma 4.
Step 5: Associate every element of R and D with appropriate receive events, computing the set A.

If a timestamp v appears in Site i, it is known that it was the v[i]-th event in this site. Let us call this event a. Now, we have to justify all the other entries in v. Lemmas 3 and 4 establish the number of send events, deferred or not, that must reach site i in order to justify timestamp v for event a.

Definition 2: The receive block of event a consists of the events preceding a in the same site and a itself, which can be used to receive the information that justifies the timestamp of event a.

Figure 3 shows the receive blocks for two different events a and b, occurring in the same site i, whose timestamps must be justified.
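The test performed on each candidate set A in Step 5 can be sketched as a replay: given the number of events per site and a set of message pairs, compute every timestamp with the update rules of Section II and check that each element of S appears at the position its own entry dictates. The names `replay` and `is_witness` are hypothetical; sites and positions are 1-based, every event ticks its local entry by 1, and the enumeration of candidate sets (the exhaustive part) is deliberately left out.

```python
from typing import Dict, Optional, Sequence, Tuple

Event = Tuple[int, int]              # (site, local position), 1-based
Message = Tuple[Event, Event]        # (send event, receive event)


def replay(counts: Sequence[int],
           messages: Sequence[Message]) -> Optional[Dict[Event, Tuple[int, ...]]]:
    """Hypothetical helper: timestamp every event of a history with counts[i-1]
    events at site i. Returns None for inconsistent message sets (multicast,
    an event that both sends and receives, or a causal cycle)."""
    senders = [s for s, _ in messages]
    sender_of = {r: s for s, r in messages}
    if len(set(senders)) != len(senders) or len(sender_of) != len(messages):
        return None                  # one receiver per send, one send per receive
    if set(senders) & set(sender_of):
        return None                  # an event is either a send or a receive
    n = len(counts)
    stamps: Dict[Event, Tuple[int, ...]] = {}
    changed = True
    while changed:                   # sweep until no further event can be stamped
        changed = False
        for site in range(1, n + 1):
            clock = [0] * n
            for pos in range(1, counts[site - 1] + 1):
                ev = (site, pos)
                if ev in stamps:
                    clock = list(stamps[ev])
                    continue
                if ev in sender_of:  # receive: merge the piggybacked timestamp
                    snd = sender_of[ev]
                    if snd not in stamps:
                        break        # matching send not stamped yet; retry next sweep
                    clock = [max(a, b) for a, b in zip(clock, stamps[snd])]
                clock[site - 1] += 1 # every event ticks the local entry
                stamps[ev] = tuple(clock)
                changed = True
    return stamps if len(stamps) == sum(counts) else None


def is_witness(S: Sequence[Tuple[Tuple[int, ...], int]], counts: Sequence[int],
               messages: Sequence[Message]) -> bool:
    """True iff every (v, i) in S is the v[i]-th event of site i in the replay."""
    stamps = replay(counts, messages)
    if stamps is None:
        return False
    return all(stamps.get((i, v[i - 1])) == tuple(v) for v, i in S)
```

For example, set A of Section III is accepted with counts [2, 3, 2] and messages ((1,1),(2,2)), ((3,1),(1,2)), ((3,2),(2,3)), which reproduce the seven timestamps listed in Fig. 1. A witness we built by hand for the set S of Section VI (not necessarily the one in Fig. 6) also passes: counts [6, 7, 7] and messages ((1,1),(2,1)), ((1,3),(3,5)), ((2,3),(3,4)), ((2,5),(1,4)), ((2,7),(3,7)), ((3,3),(2,4)), ((3,6),(1,5)), where s6 goes to Site 2 and s4 then carries its information to a.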
Notice that if these timestamps share the j-th entry (with $j \ne i$), then the j-th entry must not be justified for a in the receive block for a. This entry, on the contrary, has to be kept and not overridden by a message with a larger j-th entry in its timestamp.

Fig. 3. Receive blocks for events b and a, respectively.

Finally, we have all the pieces to solve the softest version of the problem. Notice that the versions of the problem differ only in whether the site information is nonexistent, incomplete or complete. Nonexistent site information is a particular case of incomplete information. Then, if we have an algorithm for detecting possibility or impossibility when the site information is complete, we can iterate the algorithm over the options that are created by associating each timestamp without site information with a particular site. The number of options could be huge, but it is finite.

VI. EXAMPLE: BUILDING AN UNDERLYING DISTRIBUTED HISTORY

This section gives an example of the exhaustive approach introduced in the previous section. Suppose we are given a set S of timestamps taken from the distributed history in Figure 4 (obviously, we do not know this history):

S = { [<4,5,3>, site 1], [<6,5,6>, site 1], [<1,6,3>, site 2], [<3,3,5>, site 3], [<3,7,7>, site 3] }

Fig. 4. Original distributed history.

Let us call the events associated with the previous timestamps a, b, c, d and e, respectively. Now, the algorithm from Section V will be applied to verify whether this set has at least one underlying, or witness, distributed history. First of all, it is known that N = 3, so the system has 3 sites where we are
to accommodate all the required send and receive events. Secondly, the maximum of all timestamps in S is t = maximum(<4,5,3>, <6,5,6>, <1,6,3>, <3,3,5>, <3,7,7>) = <6,7,7>, which means that there are 6 events in Site 1, 7 in Site 2 and 7 in Site 3. Using Lemma 3, the set R is obtained. Thus, local events {1,3} in Site 1 must be sends, as must events {3,5,7} in Site 2 and events {3,6} in Site 3. The next table summarizes these results:

Event  Site  Number  Blocks
s1     1     1       c
s2     1     3       d, e
s3     2     3       d
s4     2     5       a, b
s5     2     7       e
s6     3     3       a, c
s7     3     6       b

This table shows the name of the event, the site where it occurred, the local number of the event and, finally, the blocks it must reach. This last column is very important, since the raison d'être of each send event is to reach some block and thus provide some event with the required components of its timestamp.

Intuitively, one could think that if some send event has two or more blocks in its last column, then a deferred send is required. However, this is not necessarily the case. Events s1, s3, s5 and s7 have just one reaching block. On the other hand, events s2 and s4 have two reaching blocks. Nevertheless, in each case, the two events occurred in the same site, so it is sufficient for the send to reach the first of the two events. Moreover, it is necessary to prevent another send event from reaching these sites with more updated information. Then, for instance, s2 has just to reach block d, and this local timestamp will be propagated into block e. But one must avoid any other send event that comes from Site 1, occurs after s2 and reaches the receive block of e.

The last argument cannot be applied to s6, whose blocks a and c lie in different sites, so a deferred send must be considered, which reaches block a and then block c, or vice versa. Therefore, Lemma 4 has been taken into account and the set D has been built. Figure 5 depicts a visual scheme of the last situation. It shows all the events with timestamps in S and the required send events. Note that the deferred send is not shown, because its event number must not be fixed a priori.

Fig. 5. Scheme for building the witness distributed history.

Finally, applying an exhaustive assignment of send events to their reaching blocks and testing whether the required timestamps are generated, we verify the possibility of S. One of the witness distributed histories appears in Figure 6. Note that it is different from the original history of Figure 4. In fact, there could exist many underlying distributed histories for a given set S. However, this new history keeps some of the original messages, which are depicted with solid lines. The new messages were drawn with dotted lines. As can be appreciated, the deferred send is located as the second event of its site. Nevertheless, it is not necessary, because all the information it was supposed to provide was carried by send event s4.

Fig. 6. Witness distributed history.

VII. CONCLUSIONS AND FUTURE WORK

Vector clocks have been a helpful tool in determining the causality relations among events in a distributed system. Knowing their properties will ensure a more reliable timestamping system, as we can check additional requirements. For instance, if we had an efficient procedure for determining whether some set of vector clocks is possible or not, then we could reject a report of timestamps if they constitute an impossible or falsified set.

We are interested in classifying the problem of deciding whether some vector clock set is possible or impossible. This problem is clearly solvable, at
least using an exhaustive strategy. However, it would be important to determine whether this problem is in P or is NP-complete. Thus, future work is focused on discovering the nature of the possible set problem on vector clocks. It must be determined whether it has a polynomial-time solution or whether it can only be solved by an exhaustive approach. Besides, there are several interesting properties and theorems about possible and impossible sets that deserve to be explored (e.g., closure properties, special cases, geometric interpretations, etc.).

REFERENCES

[1] Ahamad, M. et al. Causal memory: definitions, implementation and programming. Distributed Computing, 1995.
[2] Baldoni, R. and Raynal, M. Fundamentals of Distributed Systems: A Practical Tour of Vector Clocks Systems. IEEE Distributed Systems Online, February, 2002.
[3] Fidge, C. Logical Time in Distributed Computing Systems. Computer, Vol 24(8), August, 1991.
[4] Lamport, L. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, Vol 21(7):558-565, July, 1978.
[5] Mattern, F. Virtual Time and Global States of Distributed Systems. Proceedings of the International Workshop on Parallel and Distributed Algorithms, 215-226, 1989.
[6] Schwarz, R. and Mattern, F. Detecting Causal Relationships in Distributed Systems: In Search of the Holy Grail. Distributed Computing, 1994.
[7] Torres-Rojas, F. Partially Ordered Sets and Logical Clocks for Distributed Systems. Proceedings of Conferencia Latinoamericana en Informática, 2000.
[8] Torres-Rojas, F. and Meneses, E. Algunas Propiedades Interesantes de los Relojes Vectoriales. Proceedings of Jornadas Chilenas de Computación, 2003.
[9] Yang, Z. and Marsland, T.A. Global States and Time in Distributed Systems. IEEE Computer Society Press, 1994.