A VP-Accordant Checkpointing Protocol Preventing Useless Checkpoints


Roberto BALDONI, Francesco QUAGLIA, Bruno CICIANI
Dipartimento di Informatica e Sistemistica, Università La Sapienza
Via Salaria 113, 00198 Roma, Italy
E-mail: {baldoni,quaglia,ciciani}@dis.uniroma1.it

Abstract

A useless checkpoint corresponds to the occurrence of a checkpoint and communication pattern called Z-cycle. A recent result shows that ensuring a computation without Z-cycles is a particular application of a property, namely Virtual Precedence (VP), defined on an interval-based abstraction of a computation. In this paper we first propose a taxonomy of communication-induced checkpointing protocols based on the way they ensure the VP property. Then we derive a sufficient condition ensuring no Z-cycles in a distributed computation. This condition defines a checkpoint and communication pattern, namely suspect Z-cycle, such that if no suspect Z-cycle exists in a distributed computation then no Z-cycle exists. Finally, we present a communication-induced checkpointing protocol that avoids useless checkpoints by preventing on-the-fly the formation of suspect Z-cycles, and we discuss its performance with respect to other protocols.

1 Introduction

The study of checkpoint and communication patterns is a fundamental issue in many areas of distributed computing (including rollback-recovery, distributed debugging, etc.) where it is necessary to determine consistent global checkpoints [5]. A checkpoint and communication pattern of a distributed computation consists of a set of local checkpoints and a dependency relation defined over them due to interprocess communication [14, 18]. A local checkpoint (or simply checkpoint) is a local state of a process dumped onto stable storage, while dependencies are caused by chains of messages in the computation called zigzag paths (Z-paths for short) [14].
A Z-path from a checkpoint X to a checkpoint Y is a particular sequence of messages $[m_1, \ldots, m_q]$ such that: (i) the sending of $m_1$ occurs on a process after X is taken, (ii) the delivery of $m_q$ occurs on a process before Y is taken, and (iii) the sending of a message $m_i$ ($i > 1$) belongs to the same checkpoint interval as the delivery of the message $m_{i-1}$, or to a successive one (a checkpoint interval is the set of events between two successive checkpoints in the same process). Z-paths can be split into two families: the causal Z-paths, which are actually causal paths (as an example, a causal Z-path from the checkpoint A to C formed by $[m_3, m_4]$ is shown in Figure 1), and the non-causal Z-paths, in which there exists at least one message $m_i$ whose send precedes the delivery of $m_{i-1}$ in the same checkpoint interval (Figure 1 shows the non-causal Z-path from B to A formed by $[m_1, m_2]$).

Due to the presence of non-causal Z-paths, a message chain could start after a checkpoint and terminate before that checkpoint. In this case a dependency relation is established between a checkpoint and itself. This pattern has been formalized by Netzer and Xu with the name Z-cycle [14]. As an example, the message chain $[m_3, m_4, m_1, m_2]$ shown in Figure 1 involves A in a Z-cycle. A checkpoint involved in a Z-cycle is called useless as it cannot belong to any consistent global checkpoint [14]. In the context of rollback-recovery, the presence of useless checkpoints is the direct cause of the domino effect [15]. A distributed computation satisfies the No-Z-Cycle property ($\mathcal{NZC}$) if there is no Z-cycle in the computation (i.e., no useless checkpoint).

(This work is partially supported by Scientific Cooperation Network of the European Community OLOS.)

A Taxonomy of Checkpointing Protocols. In a recent paper [9], Helary et al.
showed that ensuring $\mathcal{NZC}$ in a checkpoint and communication pattern of a distributed computation is a particular application of a property, namely Virtual Precedence ($\mathcal{VP}$), defined on an interval-based abstraction of a distributed computation. An interval-based abstraction of a distributed computation satisfies $\mathcal{VP}$ if, and only if, it is possible to associate a timestamping function on intervals with the following characteristics: (i) intervals which are connected by a message must be timestamped in a non-decreasing way (safety part) and (ii) the timestamp of a process must increase after communication (liveness part). It is easy to check that if we consider each single event as an interval of the abstraction, the timestamp function boils down to Lamport's scalar clock [11] or the Fidge-Mattern vector time [7, 13]. In the checkpointing problem, intervals of the abstraction correspond to checkpoint intervals and, then, the abstraction of the distributed computation corresponds to a checkpoint and communication pattern.

Helary et al. [9] proved that $\mathcal{VP} \Leftrightarrow \mathcal{NZC}$. They also showed that communication-induced¹ checkpointing protocols ensuring $\mathcal{NZC}$, such as those in [4, 16, 18], can be seen as particular instantiations of a meta protocol. In other words, they showed, for the first time, that all communication-induced protocols actually derive from the same source. This contrasts with a previous common intuition that rigidly separated communication-induced protocols into two distinct families: model-based (e.g., [1, 2, 16, 18]) and index-based (e.g., [3, 4, 12, 17]).

¹ Communication-induced protocols, upon the receipt of a message, direct a process to take a checkpoint if a predicate is evaluated to true [1, 6].

However, communication-induced protocols differ in the way they use the timestamping function to ensure the $\mathcal{VP}$ property. Some protocols, namely VP-enforced protocols, assume the existence of a timestamping function consistent with rules (i) and (ii). That function is enforced by means of additional (forced) checkpoints. If, before the execution of a communication event, one of the two rules is going to be violated, then the protocol takes a forced checkpoint and timestamps the newly created interval according to the timestamping function. Protocols that follow this approach are in [4, 9, 10, 17].

Other protocols, namely VP-accordant protocols, prevent the formation of a specific checkpoint and communication pattern which, in turn, avoids the occurrence of Z-cycles. Thus, upon the receipt of a message, if at least one bad checkpoint and communication pattern is going to be formed, then the protocol takes a forced checkpoint to break that pattern. As $\mathcal{VP} \Leftrightarrow \mathcal{NZC}$, for a VP-accordant protocol there also exists a timestamping function that could be used to timestamp the checkpoint intervals of the computation produced by the protocol consistently with (i) and (ii). Examples of such protocols are in [1, 2, 16, 18].

Aims of the Paper. In this paper we first introduce a sufficient condition for satisfying the $\mathcal{NZC}$ property. This condition is based on a property which stipulates that there is no Suspect Z-Cycle (SZC)² in the computation ($\mathcal{NSZC}$ property).
A suspect Z-cycle is a part of a Z-cycle with several constraints on its structure. We prove that $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$.

² In the rest of the paper we denote in uppercase a specific checkpoint and communication pattern and in calligraphic style properties related to a checkpoint and communication pattern.

Second, we present a VP-accordant checkpointing protocol. This protocol is based on the prevention of SZCs and relies on the following basic hypotheses: (1) the usable knowledge of the computation at a certain event cannot be more than the one included in the causal past of that event; (2) the computation is asynchronous, i.e., no bound exists on the process speed and on the message transfer delay. The prevention of SZCs is done by exploiting the control information of the incoming application messages and the local history of a process. We assume each process $P_i$ selects some local states to be local checkpoints. These checkpoints can be either basic (i.e., triggered by a special event from the underlying application) or forced (i.e., triggered whenever the receipt of a message $m$ is closing an SZC). Basic checkpoints are generated by the application according to its own strategy (e.g., periodic checkpointing, random checkpointing, etc.). The proposed algorithm piggybacks on each application message a matrix ($n \times n$) of integers as control information, where $n$ represents the number of processes of the computation.

[Figure 1. Z-Paths and Z-Cycles (checkpoints A, B, C; messages $m_1$, $m_2$, $m_3$, $m_4$).]

Finally, simulation results are presented. The main observation is that the number of forced checkpoints taken by the protocol we propose is generally low and almost independent of the basic checkpointing strategy. This independence is a highly desirable feature, as the basic checkpointing strategy is defined at the application level and a checkpointing protocol, which is usually implemented at system level, cannot influence that choice.

The paper is structured in five sections.
Section 2 introduces the model of the distributed computation and concatenation operators that will be used to express checkpoint and communication patterns. In Section 3 the SZC pattern is introduced and the result $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$ is proved. Section 4 presents the VP-accordant checkpointing protocol derived from the sufficient condition. In the final section results of the simulation study are presented.

2 Model and Preliminary Definitions

We assume a distributed computation consisting of a set of $n$ processes $\{P_1, P_2, \ldots, P_n\}$. Processes do not share memory and do not share a common clock value. They communicate only by exchanging messages. Each pair of processes is connected by an asynchronous directed logical channel. Transmission delays over channels are unpredictable but finite.

A process produces a sequence of events; each event moves the process from one local state to another. We assume events are produced by the execution of internal, send or delivery statements. The send and delivery events of a message $m$ are denoted respectively by $send(m)$ and $deliver(m)$. In process $P_i$ an event $e$ precedes an event $e'$, denoted $e \prec_P e'$, iff $e$ is produced by $P_i$ before $e'$. An event $e$ of process $P_i$ precedes an event $e'$ of process $P_j$ due to message $m$, denoted $e \prec_m e'$, iff $(e = send(m)) \wedge (e' = deliver(m))$. Lamport's Happened-Before relation [11], denoted $\stackrel{e}{\rightarrow}$, is the transitive closure of the union of the relations $\prec_P$ and $\prec_m$. Hence, a computation can be modeled as a partial order $\widehat{H} = (H, \stackrel{e}{\rightarrow})$ where $H$ is the set of all events produced by the computation.

A local state of a process saved on stable storage is called a local checkpoint of the process; hence the set of local checkpoints is a subset of the set of local states. The $x$-th checkpoint of process $P_i$ is denoted $C_{i,x}$, where $x$ is called the rank of the checkpoint. Each time a checkpoint is taken the rank is increased by one.
We assume that each process $P_i$ takes an initial checkpoint $C_{i,1}$ (corresponding to the initial state of the process) and that after each event a checkpoint will eventually be taken. A checkpoint interval $I_{i,x}$ is the set of events between $C_{i,x}$ and $C_{i,x+1}$. A checkpoint and communication pattern of a distributed computation is a pair $(\widehat{H}, C_{\widehat{H}})$ where $\widehat{H}$ is a distributed computation and $C_{\widehat{H}}$ is a set of local checkpoints defined on $\widehat{H}$.

2.1 Message Chains

Definition 2.1 A message chain is a sequence of messages $\zeta = [m_1, \ldots, m_n]$ such that
$\forall k : 1 \le k \le n-1 \Rightarrow (deliver(m_k) \in I_{i,x}) \wedge (send(m_{k+1}) \in I_{i,y}) \wedge (x \le y)$.

As an example, in Figure 1 we have a message chain formed by messages $[m_1, m_2]$. A particular case of message chain is the causal message chain:

Definition 2.2 A message chain $\zeta = [m_1, \ldots, m_n]$ is causal if $\forall k : 1 \le k \le n-1 \Rightarrow deliver(m_k) \prec_P send(m_{k+1})$; otherwise, the chain is non-causal.

The message chains $[m_3, m_4]$ and $[m_1, m_2]$ shown in Figure 1 are, respectively, causal and non-causal. A chain with only one message is always causal. For the sake of clarity, the Greek letter $\mu$ indicates a causal message chain. We denote with $\zeta.first$ (resp. $\zeta.last$) the first (resp. last) message of a message chain $\zeta$; $|\zeta|$ denotes the number of messages forming the chain. We use the operator minus to denote the removal of the last (or of the first) message from a chain; in particular $\zeta - \zeta.last$ (resp. $\zeta - \zeta.first$) denotes the chain obtained from $\zeta$ by removing its last (resp. first) message.

2.2 Concatenation Operators

Causal Concatenation. The causal concatenation, denoted by the operator $\bullet$, can be applied to combine two objects (an object can be either a checkpoint or a message chain) in a causal manner. Such a concatenation is defined as follows:

Definition 2.3 An object $a$ is causally concatenated to an object $b$, denoted $a \bullet b$, iff:
1) $a = C_{i,x}$, $b = \zeta$ and $\exists v \ge 0 : send(\zeta.first) \in I_{i,x+v}$; or
2) $a = \zeta$, $b = C_{i,x}$ and $\exists v > 0 : deliver(\zeta.last) \in I_{i,x-v}$; or
3) $a = \zeta$, $b = \zeta'$ and $deliver(\zeta.last) \prec_P send(\zeta'.first)$.

Examples of causal concatenation are shown in Figure 2.a ($C_{i,x} \bullet \zeta$, $\zeta \bullet C_{i,x}$ and $\zeta \bullet \zeta'$). Note that a Z-path from a checkpoint A to a checkpoint B due to a chain $\zeta$ can be expressed as $A \bullet \zeta \bullet B$.
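The definitions above can be checked mechanically on a small pattern. The following Python sketch is only an illustration under stated assumptions: the `Msg` record, its field names and the helper functions are hypothetical (they do not appear in the paper), events are identified by a process name, the rank of their checkpoint interval and a local position, and the example pattern is a made-up two-message cycle, not a transcription of Figure 1. `z_path_exists` follows the reading $A \bullet \zeta \bullet B$ of a Z-path, and a Z-cycle is taken, as in the Introduction, to be a Z-path from a checkpoint to itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    name: str
    src: str       # sending process
    send_iv: int   # rank x of the interval I_{src,x} containing send(m)
    send_pos: int  # position of send(m) in the local event order of src
    dst: str       # receiving process
    dlv_iv: int    # rank of the interval containing deliver(m)
    dlv_pos: int   # position of deliver(m) in the local event order of dst


def can_follow(m, nxt):
    """Definition 2.1: nxt is sent by the receiver of m in the same or a later interval."""
    return m.dst == nxt.src and m.dlv_iv <= nxt.send_iv


def is_chain(msgs):
    return all(can_follow(a, b) for a, b in zip(msgs, msgs[1:]))


def is_causal(msgs):
    """Definition 2.2: each delivery locally precedes the next send."""
    return all(a.dst == b.src and a.dlv_pos < b.send_pos
               for a, b in zip(msgs, msgs[1:]))


def z_path_exists(msgs, a, b):
    """True if some chain starts after checkpoint a and ends before checkpoint b.

    a and b are (process, rank) pairs; the search is a breadth-first walk
    over the 'can_follow' relation.
    """
    start = [m for m in msgs if m.src == a[0] and m.send_iv >= a[1]]
    seen, queue = set(start), deque(start)
    while queue:
        m = queue.popleft()
        if m.dst == b[0] and m.dlv_iv < b[1]:
            return True
        for nxt in msgs:
            if nxt not in seen and can_follow(m, nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False


def in_z_cycle(msgs, c):
    return z_path_exists(msgs, c, c)


# Hypothetical pattern: P2 delivers m2, takes C_{2,2}, then sends m1;
# P1 sends m2, delivers m1 and sends m3 inside the single interval I_{1,1}.
m1 = Msg("m1", "P2", 2, 3, "P1", 1, 2)
m2 = Msg("m2", "P1", 1, 1, "P2", 1, 1)
m3 = Msg("m3", "P1", 1, 3, "P3", 1, 1)
pattern = [m1, m2, m3]

# Same computation, but P1 takes a checkpoint between send(m2) and deliver(m1).
m1_after_ckpt = Msg("m1", "P2", 2, 3, "P1", 2, 3)
m3_after_ckpt = Msg("m3", "P1", 2, 4, "P3", 1, 1)
repaired = [m1_after_ckpt, m2, m3_after_ckpt]
```

In this toy pattern `[m1, m2]` is a non-causal chain that makes $C_{2,2}$ useless, while the extra checkpoint on P1 in `repaired` breaks the chain; this is the kind of repair a forced checkpoint is meant to achieve.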
Non-Causal Concatenation. The non-causal concatenation, denoted by the operator $\circ$, can be applied to combine message chains. Such a concatenation is defined as follows:

Definition 2.4 A message chain $\zeta$ is non-causally concatenated to a message chain $\zeta'$ in the checkpoint interval $I_{k,y}$, denoted $\zeta \circ_{k,y} \zeta'$, iff:
$(deliver(\zeta.last) \in I_{k,y}) \wedge (send(\zeta'.first) \in I_{k,y}) \wedge (send(\zeta'.first) \prec_P deliver(\zeta.last))$.

An example of non-causal concatenation is shown in Figure 2.b.

[Figure 2. Examples of Applying the Concatenation Operators.]

[Figure 3. The Structure of a Z-Cycle.]

2.3 Z-Cycles and the No-Z-Cycle Property

By using the previous notations, we express the notion of Z-Cycle (ZC) introduced by Netzer and Xu [14]. Basically, a ZC is a checkpoint and communication pattern involving a checkpoint $C_{i,x}$ and a chain $\hat{\zeta}$ such that $\hat{\zeta} \bullet C_{i,x} \bullet \hat{\zeta}$ (an example of such a concatenation is shown in Figure 3.a). However, it is always possible to separate $\hat{\zeta}$ into two nonempty sub-chains, a causal sub-chain $\mu$ and a sub-chain $\zeta$, such that $\hat{\zeta} = \mu \circ_{k,y} \zeta$ (this concatenation is shown in Figure 3.b). This observation gives rise to the following Z-cycle definition:

Definition 2.5 A ZC is a checkpoint and communication pattern $ZC(C_{i,x}, \mu \circ_{k,y} \zeta)$ such that: $\zeta \bullet C_{i,x} \bullet \mu \circ_{k,y} \zeta$.

Let us finally introduce the No-Z-Cycle property.

Property 2.1 A checkpoint and communication pattern of a distributed computation $(\widehat{H}, C_{\widehat{H}})$ satisfies the No-Z-Cycle property ($\mathcal{NZC}$) iff no ZC exists in $(\widehat{H}, C_{\widehat{H}})$.

3 A Sufficient Condition for $\mathcal{NZC}$

We first introduce the notion of Suspect Z-Cycle (SZC), which is a checkpoint and communication pattern that satisfies certain constraints on its structure. Then, a sufficient condition for the $\mathcal{NZC}$ property based on the absence of SZCs is presented.

Let us consider the set of causal chains $\mu$ starting after $C_{i,x}$ whose last message $\mu.last$ is delivered to $P_k$; this set is denoted $M(C_{i,x}, P_k)$. A causal chain $\mu$ is prime in $M(C_{i,x}, P_k)$ iff there does not exist a causal chain $\mu' \in M(C_{i,x}, P_k)$ such that $deliver(\mu'.last) \prec_P deliver(\mu.last)$.
In other words, the first causal chain that brings to $P_k$ the knowledge of the existence of $C_{i,x}$ is prime. By using this notion we introduce the concept of SZC.
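One way to make the notion of a prime chain concrete is to let each process remember, for every other process, the highest checkpoint rank it has heard of; a delivery then closes a prime chain exactly when it raises one of those entries. The sketch below assumes this vector representation (it is the mechanism the paper adopts in Section 4); the function names `newly_learned` and `deliver` are hypothetical and chosen only for illustration.

```python
def newly_learned(local_vc, msg_vc):
    """Indices i for which the message reveals a checkpoint of P_i not yet known.

    A non-empty result means the delivered message is the last message of a
    causal chain that is prime for some checkpoint of each listed process.
    """
    return [i for i, (mine, theirs) in enumerate(zip(local_vc, msg_vc))
            if theirs > mine]


def deliver(local_vc, msg_vc):
    """Return (prime-for indices, merged vector) after delivering a message."""
    prime_for = newly_learned(local_vc, msg_vc)
    merged = [max(a, b) for a, b in zip(local_vc, msg_vc)]
    return prime_for, merged


# P_0 starts knowing only its own initial checkpoint (rank 1).
vc0 = [1, 0, 0]
first, vc0 = deliver(vc0, [0, 2, 0])    # first news of C_{1,1} and C_{1,2}
second, vc0 = deliver(vc0, [0, 2, 0])   # same information arriving again
```

Here the first delivery is prime with respect to $P_1$ (index 1), whereas the second one carries nothing new and therefore cannot be prime.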

[Figure 4. An SZC involving $C_{i,x}$; an Example of Non-SZC.]

Definition 3.1 An SZC is a checkpoint and communication pattern $SZC(I_{j,z}, C_{i,x}, \mu, I_{k,y})$ such that:
$\exists m, \exists m' : C_{j,z} \bullet m \bullet C_{i,x} \bullet \mu \circ_{k,y} m'$ with
(i) $send(m) \in I_{j,z}$,
(ii) $\mu$ is prime in $M(C_{i,x}, P_k)$,
(iii) $\nexists e \in I_{j,z+1} : e \stackrel{e}{\rightarrow} deliver(\mu.last)$.

Note that an SZC is not necessarily a part of a ZC. Figure 4 shows an SZC involving $C_{i,x}$ and an example of a non-SZC, as it contradicts restriction (iii) (due to the presence of the causal chain $\mu'$). Other examples of checkpoint and communication patterns that violate restriction (i) or (ii) are left to the reader.

Let us now prove that if there is a ZC in a distributed computation then there is an SZC in that computation, assuming the size of the non-causal message chain of the ZC equal to one. Then we generalize the result to a chain of any size.

Lemma 3.1 If there exists $ZC(C_{i,x}, \mu \circ_{k,y} \zeta)$ such that $|\zeta| = 1$ then there exists $SZC(I_{k,y}, C_{i,x}, \mu', I_{k,y})$.

Proof (By Contradiction) Let us assume the existence of $ZC(C_{i,x}, \mu \circ_{k,y} \zeta)$ with $\zeta = m$ and suppose that $SZC(I_{k,y}, C_{i,x}, \mu', I_{k,y})$ does not exist. Consider a prime causal chain $\mu'$ in $M(C_{i,x}, P_k)$ (this chain exists as the set $M(C_{i,x}, P_k)$ contains at least $\mu$). We have two cases:
1. $deliver(\mu'.last) \stackrel{e}{\rightarrow} send(m)$. This is impossible as $send(m) \stackrel{e}{\rightarrow} deliver(\mu'.last)$ and the relation $\stackrel{e}{\rightarrow}$ is acyclic;
2. $send(m) \stackrel{e}{\rightarrow} deliver(\mu'.last)$. As $m \bullet C_{i,x} \bullet \mu' \circ_{k,y} m$, $send(m) \in I_{k,y}$ and $\mu'$ is prime in $M(C_{i,x}, P_k)$, there must exist an event $e \in I_{k,y+1}$ such that $e \stackrel{e}{\rightarrow} deliver(\mu'.last)$ in order to violate constraint (iii) of Definition 3.1. This is not possible as $deliver(\mu'.last) \in I_{k,y}$ and the relation $\stackrel{e}{\rightarrow}$ is acyclic.
Hence, the assumption is contradicted and the claim follows. □

Lemma 3.2 If there exists $ZC(C_{i,x}, \mu \circ_{k,y} \zeta)$ then there exists an SZC.

Proof Let us consider $ZC(C_{i,x}, \mu \circ_{k,y} \zeta)$. If $|\zeta| = 1$ then the claim follows from Lemma 3.1.
Otherwise, let us consider $I_{j,z}$ such that $send(\zeta.last) \in I_{j,z}$ and a causal chain $\mu'$ prime in $M(C_{i,x}, P_k)$ (this chain exists as the set $M(C_{i,x}, P_k)$ contains at least $\mu$). There are two cases:

1. $deliver(\mu'.last) \stackrel{e}{\rightarrow} send(\zeta.first)$. We get $ZC(C_{i,x}, [\mu' \bullet \zeta''] \circ_{b,s} \zeta')$ where $\zeta'' \circ_{b,s} \zeta' = \zeta$ (see Figure 5.a), hence $|\zeta'| < |\zeta|$. Note that the checkpoint interval $I_{b,s}$ exists, otherwise the message chain $\zeta$ is causal. But this is impossible as it would lead to a cycle in the relation $\stackrel{e}{\rightarrow}$ (i.e., $send(\zeta.first) \stackrel{e}{\rightarrow} deliver(\mu'.last)$).

2. $send(\zeta.first) \stackrel{e}{\rightarrow} deliver(\mu'.last)$. Then we get the Z-cycle $ZC(C_{i,x}, \mu' \circ_{k,y} \zeta)$. In such a case we have two sub-cases:
2.1) $\nexists e \in I_{j,z+1} : e \stackrel{e}{\rightarrow} deliver(\mu'.last)$. By Definition 3.1, we get $SZC(I_{j,z}, C_{i,x}, \mu', I_{k,y})$ and the claim follows;
2.2) $\exists e \in I_{j,z+1} : e \stackrel{e}{\rightarrow} deliver(\mu'.last)$. Hence there exists at least one causal message chain $\mu''$ starting after $C_{j,z+1}$ that is received by $P_k$ before $deliver(\mu'.last)$, or such that $deliver(\mu''.last) = deliver(\mu'.last)$. If $send(\zeta.first) \stackrel{e}{\rightarrow} deliver(\mu''.last)$ (see Figure 5.b), then we get $ZC(C_{j,z+1}, \mu'' \circ_{k,y} \zeta')$ where $\zeta' = \zeta - \zeta.last$ (hence $|\zeta'| < |\zeta|$). If $deliver(\mu''.last) \stackrel{e}{\rightarrow} send(\zeta.first)$, we get $ZC(C_{j,z+1}, [\mu'' \bullet \zeta''] \circ_{b,s} \zeta')$ where $\zeta'' \circ_{b,s} \zeta' = \zeta - \zeta.last$, hence $|\zeta'| < |\zeta|$ (see Figure 5.c). Note that the checkpoint interval $I_{b,s}$ exists, otherwise the message chain $\zeta - \zeta.last$ is causal. But this is impossible as it would lead to a cycle in the relation $\stackrel{e}{\rightarrow}$ (i.e., $send(\zeta.first) \stackrel{e}{\rightarrow} deliver(\mu''.last)$). In both sub-cases of case 2.2, we get a Z-cycle involving $C_{j,z+1}$ with $|\zeta'| < |\zeta|$.

If we fall in case 1 or in case 2.2, the previous case analysis can be repeated on the obtained Z-cycle. After a finite number of steps either we fall in case 2.1 or we get a ZC with $|\zeta'| = 1$, as the size of the message chain associated with the Z-cycle decreases monotonically.
In the first case the claim follows by the case analysis, in the second by Lemma 3.1. □

[Figure 5. Proof of Lemma 3.2 (panels a, b, c).]

Theorem 3.3 (Sufficiency Theorem) If a checkpoint and communication pattern $(\widehat{H}, C_{\widehat{H}})$ of a distributed computation satisfies the No-Suspect-Z-Cycle ($\mathcal{NSZC}$) property, i.e., no SZC exists in $(\widehat{H}, C_{\widehat{H}})$, then it satisfies the $\mathcal{NZC}$ property.

Proof By Lemma 3.2, if a ZC exists then an SZC exists in $(\widehat{H}, C_{\widehat{H}})$. Thus, in terms of properties, $\neg(\mathcal{NZC}) \Rightarrow \neg(\mathcal{NSZC})$. Hence $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$. □

4 A VP-Accordant Checkpointing Protocol

We consider a system formed by three layers: the communication layer, the checkpointing layer and the application layer. We assume the checkpointing layer is formed by processes that take local checkpoints either when receiving a special event from the application layer (i.e., basic checkpoints) or induced by communication (i.e., forced checkpoints). An application takes basic checkpoints according to its own strategy, which is out of the control of the checkpointing layer. Examples of strategies for taking basic checkpoints are random checkpointing and periodic checkpointing.

Messages arrive at processes from a communication system and they will be delivered to the application layer. Hence, upon the arrival of a message $m$, a process has to decide on-the-fly (i.e., without additional delay) if a forced checkpoint has to be taken before delivering $m$ to the application layer; this decision has to be based only on the local information and on the control information piggybacked on the application messages (no control message is allowed, and the content of an application message cannot be interpreted by the checkpointing protocol).

The protocol presented in this section ensures no useless checkpoint by preventing on-the-fly the formation of SZCs. In order to track the formation of $SZC(I_{j,z}, C_{i,x}, \mu, I_{k,y})$, upon the arrival of a message $\mu.last$, process $P_k$ has to verify whether the conditions for the existence of that checkpoint and communication pattern are satisfied. In the following paragraphs we introduce the data structures to accomplish this task.

Tracking "$\mu$ is prime in $M(C_{i,x}, P_k)$". To detect if $\mu$ is prime in $M(C_{i,x}, P_k)$, a vector clock mechanism is used, considering checkpoints of processes as relevant events of the computation [13]. Each process $P_k$ maintains a vector clock $VC_k$ whose size corresponds to the number of processes.
$VC_k[i]$ stores the maximum checkpoint rank of $P_i$ seen by $P_k$, and $VC_k[k]$ stores the rank of the last checkpoint taken by $P_k$. $VC_k$ is initialized to zero except the $k$-th entry, which is initialized to one. Each application message $m$ sent by $P_k$ piggybacks the current value of $VC_k$ (denoted $m.VC$). Following the updating rule of a vector clock, upon the delivery of a message $m$, $VC_k$ is updated from $m.VC$ by taking a component-wise maximum. A causal message chain $\mu$ including message $m$ as $\mu.last$ is prime in some $M(C_{i,x}, P_k)$ (see Section 3) if, upon the delivery of $m$ by process $P_k$, the following predicate holds: $\exists i : (m.VC[i] > VC_k[i])$.

Tracking $\mu \circ_{k,y} m'$. To detect if there exists a non-causal concatenation between a prime causal message chain $\mu$ and a message $m'$ in the interval $I_{k,y}$, process $P_k$ maintains a boolean variable $after\_first\_send_k$. This variable is set to TRUE when a send event occurs. It is set to FALSE each time a local checkpoint is taken. Hence, upon the delivery of a message $m$ (with $m = \mu.last$), $P_k$ detects that $\mu \circ_{k,y} m'$ if the following predicate holds: $after\_first\_send_k \wedge (\exists i : (m.VC[i] > VC_k[i]))$.

Tracking $C_{j,z} \bullet m \bullet C_{i,x}$. Each process $P_k$ maintains a vector of integers $Imm\_Pred_k$ of size $n$ and a matrix of integers $Pred_k$ of size $n \times n$. $Imm\_Pred_k[\ell]$ represents the maximum rank of the checkpoint interval from which process $P_\ell$ sent a message $m$ which has been delivered by $P_k$ in its current checkpoint interval, say $I_{k,y-1}$ (in other words, $C_{\ell, Imm\_Pred_k[\ell]}$ is an immediate predecessor of checkpoint $C_{k,y}$). All the entries of this vector are set to $-1$ each time a checkpoint is taken by $P_k$. $Pred_k[i,j]$ represents, to the knowledge of $P_k$, the maximum rank of the checkpoint interval from which process $P_j$ sent a message $m$ which has been delivered by $P_i$ in a checkpoint interval $I_{i,x-1}$ with $x \le VC_k[i]$. All the entries of the matrix $Pred_k$ are initialized to $-1$, and its content is piggybacked on each message $m$ sent by $P_k$ ($m.Pred$).
The rules to update its entries are the following:
1) whenever a checkpoint is taken by $P_k$: $\forall j \;\; Pred_k[k,j] = \max(Pred_k[k,j], Imm\_Pred_k[j])$;
2) upon the arrival of a message $m$ at $P_k$: $\forall \ell, t \;\; Pred_k[\ell,t] = \max(Pred_k[\ell,t], m.Pred[\ell,t])$.

Tracking $\nexists e \in I_{j,z+1} : e \stackrel{e}{\rightarrow} deliver(\mu.last)$. Upon the arrival of a message $m$ included in a prime causal chain (i.e., $\exists i : (m.VC[i] > VC_k[i])$), in order to track the above condition, we need to know if there exists a $j$ such that the checkpoint of $P_j$ with rank $m.Pred[i,j] + 1$ does not belong to the causal past of the delivery of $m$. This knowledge is encoded in $m.VC[j]$ and $VC_k[j]$. Hence, the predicate becomes: $(\exists j : m.Pred[i,j] + 1 > \max(m.VC[j], VC_k[j]))$.

Preventing SZCs. Upon the arrival of a message $m$ at process $P_k$ in $I_{k,y}$, if the predicate
$after\_first\_send_k \wedge (\exists i : (m.VC[i] > VC_k[i]) \wedge (\exists j : m.Pred[i,j] + 1 > \max(m.VC[j], VC_k[j])))$
holds, then process $P_k$ detects that at least one $SZC(I_{j,Pred_k[i,j]}, C_{i,x}, \mu, I_{k,y})$ is being formed with $m = \mu.last$ and $VC_k[i] < x \le m.VC[i]$. In this case $P_k$ takes a forced checkpoint $C_{k,y+1}$ before the delivery of $m$. The behavior of process $P_k$ is the following (all the procedures and the message handler are executed in atomic fashion):

    init P_k:
        take a checkpoint;
        after_first_send_k := FALSE;
        ∀i : i ≠ k   VC_k[i] := 0;   VC_k[k] := 1;
        ∀i, ∀j   Pred_k[i, j] := −1;
        ∀h   Imm_Pred_k[h] := −1;

    when m arrives at P_k from P_l:
        if after_first_send_k ∧ (∃i : (m.VC[i] > VC_k[i]) ∧
              (∃j : m.Pred[i, j] + 1 > max(m.VC[j], VC_k[j])))
           then take_ckpt();   % forced checkpoint %
        ∀i   VC_k[i] := max(VC_k[i], m.VC[i]);
        ∀i, ∀j   Pred_k[i, j] := max(m.Pred[i, j], Pred_k[i, j]);
        Imm_Pred_k[l] := max(Imm_Pred_k[l], m.VC[l]);

    procedure send(m, P_j):
        m.content := data;  m.VC := VC_k;  m.Pred := Pred_k;
        send m to P_j;
        after_first_send_k := TRUE;

    when a basic checkpoint is scheduled from P_k:
        take_ckpt();

    procedure take_ckpt():
        take a checkpoint;
        ∀h   Pred_k[k, h] := max(Pred_k[k, h], Imm_Pred_k[h]);
        ∀h   Imm_Pred_k[h] := −1;
        VC_k[k] := VC_k[k] + 1;
        after_first_send_k := FALSE;

Finally, we remark that, from an operational point of view, the elements of the diagonal of the matrix $Pred$ are never used by the protocol. Hence, when implementing the protocol, the vector clock $VC$ can be embedded in that diagonal. Thus, the resulting control information piggybacked on application messages boils down to a matrix of $n \times n$ integers.

5 A Performance Study

This section presents simulation results of a performance study comparing a VP-enforced protocol, namely the Briatico et al. protocol [4] (hereafter BCS), and the protocol presented in this paper (P). Among the set of checkpointing protocols, we chose BCS, first, for its simplicity of implementation and, second, because other simulation studies [3] have shown that, in the class of VP-enforced protocols, BCS exhibits good performance in terms of reduction of forced checkpoints³.

In the BCS protocol an integer is used to timestamp checkpoint intervals. Thus, each process $P_k$ is endowed with a variable (the timestamp) denoted $ts_k$. The timestamp is updated by process $P_k$ according to the following rules:
1. when starting the execution, $ts_k$ is set to zero;
2. when sending a message $m$, a copy of $ts_k$ is piggybacked on message $m$ (denoted $m.ts$);
3. when taking a basic checkpoint, $ts_k := ts_k + 1$;
4. when a message $m$ arrives at $P_k$, if $m.ts > ts_k$ then a forced checkpoint is taken by $P_k$ and $ts_k := m.ts$.

³ Recent papers show two techniques that can be applied to BCS to reduce the number of forced checkpoints: invalidation of basic checkpoints [12] and re-usage of timestamping values [3]. As outlined in [8], we would like to remark that these two techniques can be used in any other checkpointing protocol to boost its performance. This is the reason why we decided to compare only pure communication-induced protocols.

Simulation Model and Results. The performance comparison studies, for each protocol, the number of forced checkpoints per message delivery ($R$) as a function of the average checkpoint interval size (for example, $R$ equal to 0.2 means a forced checkpoint is taken, on average, every 5 message deliveries) under two distinct strategies adopted by the application layer for generating basic checkpoint events to processes:
S1: each process receives $N$ basic checkpoint events periodically, and the period between two successive basic checkpoint events is the same at all processes;
S2: each process receives $N$ basic checkpoint events randomly distributed over the whole execution (the receipt of such events has a distinct distribution at each process).

We simulate a uniform point-to-point environment in which each process can send a message to any other and the destination of each message is a uniformly distributed random variable. We assume a system with n = 8 processes; each process executes internal, send and receive operations with probability $p_i = 0.9$, $p_s = 0.05$ and $p_r = 0.05$, respectively. The time to execute an operation in a process and the message propagation time are exponentially distributed with mean value equal to 1 and 5 time units respectively.
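The four BCS rules and the metric $R$ are small enough to be sketched directly. The Python class below is a minimal illustration, assuming messages are plain dictionaries and that a checkpoint is merely counted rather than written to stable storage; the names `BCSProcess`, `receive` and `forced_ratio` are hypothetical and not taken from [4].

```python
class BCSProcess:
    """Sketch of the BCS timestamp rules; checkpoints are only counted."""

    def __init__(self):
        self.ts = 0          # rule 1: the timestamp starts at zero
        self.basic = 0
        self.forced = 0
        self.deliveries = 0

    def send(self, data):
        # rule 2: piggyback a copy of the current timestamp
        return {"content": data, "ts": self.ts}

    def basic_checkpoint(self):
        # rule 3: a basic checkpoint increments the timestamp
        self.basic += 1
        self.ts += 1

    def receive(self, m):
        # rule 4: a larger incoming timestamp forces a checkpoint before delivery
        if m["ts"] > self.ts:
            self.forced += 1
            self.ts = m["ts"]
        self.deliveries += 1
        return m["content"]

    def forced_ratio(self):
        """R: forced checkpoints per message delivery."""
        return self.forced / self.deliveries if self.deliveries else 0.0


# Two processes whose timestamps drift apart, then stay aligned.
p, q = BCSProcess(), BCSProcess()
p.basic_checkpoint()
q.receive(p.send("x"))   # q is behind: one forced checkpoint
q.receive(p.send("y"))   # timestamps now equal: no forced checkpoint
```

In this run `q` takes one forced checkpoint over two deliveries, so its ratio is 0.5; if `q` had taken its own basic checkpoint before the first delivery, neither message would have forced anything.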
Let Average Checkpoint Interval (ACI) be the average distance, in terms of events, between two basic checkpoints. Experiments were conducted varying ACI from 1 to 1 events and measuring the value of $R$. Each simulation run consists of one million events; for each value of ACI we did several simulation runs with different seeds and the results were within five percent of each other, so variance is not reported in the plots.

Results of the simulation study are reported in Figure 6. We would like to remark that strategy S1 is the most favourable to BCS, as the timestamps increase on average at the same speed at all processes. As an extreme, if all processes took basic checkpoints at the same physical time, no forced checkpoint would ever be taken. The behavior of P is flat around .1. Strategy S2 represents a bad scenario for BCS, as the distributions of the basic checkpoint events at distinct processes are uncorrelated. So timestamps increase at different speeds at distinct processes and, then, BCS performance depends on ACI as depicted in Figure 6. The behavior of P is, also in this case, flat and quite close to that under strategy S1.

From the previous plots, a main observation comes out: the performance of P is more stable than that of BCS with respect to ACI and the basic checkpointing strategy used. This comes from the fact that a VP-accordant protocol is not influenced by the speed at which a timestamp increases in a process. Its performance depends only on the structure of checkpoint and communication patterns in the computation, which is not directly related to ACI and the basic checkpointing strategy used. This makes a VP-accordant protocol particularly appealing to be implemented in a checkpointing layer on a general-purpose system.
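For comparison, the behavior of protocol P described in Section 4 can be sketched in the same style. This is an illustrative transcription of the pseudocode, not the authors' implementation: the class name `PProcess`, the dictionary layout of messages and the `forced` counter are assumptions, process identifiers are 0-based indices, and taking a checkpoint only updates the control variables.

```python
class PProcess:
    """Sketch of protocol P for process k out of n (0-based indices)."""

    def __init__(self, k, n):
        self.k, self.n = k, n
        self.vc = [0] * n
        self.vc[k] = 1                               # rank of the initial checkpoint
        self.pred = [[-1] * n for _ in range(n)]
        self.imm_pred = [-1] * n
        self.after_first_send = False
        self.forced = 0

    def take_ckpt(self):
        k = self.k
        for h in range(self.n):
            self.pred[k][h] = max(self.pred[k][h], self.imm_pred[h])
        self.imm_pred = [-1] * self.n
        self.vc[k] += 1
        self.after_first_send = False

    def send(self, data):
        m = {"content": data, "src": self.k,
             "vc": list(self.vc), "pred": [row[:] for row in self.pred]}
        self.after_first_send = True
        return m

    def closes_szc(self, m):
        """Predicate of the 'Preventing SZCs' paragraph."""
        if not self.after_first_send:
            return False
        for i in range(self.n):
            if m["vc"][i] > self.vc[i]:              # m ends a prime chain w.r.t. P_i
                for j in range(self.n):
                    if m["pred"][i][j] + 1 > max(m["vc"][j], self.vc[j]):
                        return True
        return False

    def receive(self, m):
        if self.closes_szc(m):
            self.take_ckpt()                         # forced checkpoint before delivery
            self.forced += 1
        self.vc = [max(a, b) for a, b in zip(self.vc, m["vc"])]
        for i in range(self.n):
            for j in range(self.n):
                self.pred[i][j] = max(self.pred[i][j], m["pred"][i][j])
        src = m["src"]
        self.imm_pred[src] = max(self.imm_pred[src], m["vc"][src])
        return m["content"]


# Two-process scenario: P0 sends to P1, P1 checkpoints and replies to P0.
p0, p1 = PProcess(0, 2), PProcess(1, 2)
p1.receive(p0.send("a"))
p1.take_ckpt()               # basic checkpoint on P1
p0.receive(p1.send("b"))     # P0 detects the pattern and takes a forced checkpoint
```

In this scenario the reply would otherwise close a Z-cycle around the checkpoint just taken by P1, so `p0` takes one forced checkpoint before delivering it. If P1 replies without checkpointing, or if P0 has itself checkpointed since its last send, the predicate is false and no forced checkpoint is taken.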

[Figure 6. R vs. ACI. Curves: P (S1), BCS (S1), P (S2), BCS (S2). Horizontal axis: Average Checkpoint Interval (Events). Vertical axis: R.]

Summary

This paper provided a sufficient condition for producing distributed computations without useless checkpoints. The condition is based on the notion of a particular checkpoint and communication pattern called suspect Z-cycle. Based on that condition, we designed a communication-induced checkpointing protocol that prevents the formation of suspect Z-cycles in a distributed computation by means of additional (forced) checkpoints. The protocol relies on the following basic hypotheses: (i) the usable knowledge of the computation at a certain event cannot be more than that included in the causal past of that event, and (ii) the computation is asynchronous. Compared to other communication-induced checkpointing protocols proposed in the literature, the one presented in this paper takes a low number of forced checkpoints, and this number is independent of both the average checkpoint interval size and the basic checkpointing strategy adopted.

Acknowledgments. The authors would like to thank Jean-Michel Helary (IRISA), Michel Raynal (IRISA), Ravi Prakash (UTD) and the anonymous referees for comments which have improved the presentation of the paper. Special thanks go to Luca De Santis for his help in the simulation study.

References

[1] R. Baldoni, J.M. Helary and M. Raynal, Rollback Dependency Trackability: a Minimal Characterization and its Protocol (revised version), IRISA Tech. Report n. 1173,

[2] R. Baldoni, J.M. Helary, A. Mostefaoui and M. Raynal, A Communication-Induced Checkpointing Protocol that Ensures Rollback-Dependency Trackability, Proc. IEEE Int. Symposium on Fault Tolerant Computing, 1997, pp

[3] R. Baldoni, F. Quaglia and P. Fornara, An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems, Proc. IEEE Int. Symposium on Reliable Distributed Systems, 1997, pp

[4] D. Briatico, A. Ciuffoletti and L. Simoncini, A Distributed Domino-Effect Free Recovery Algorithm, Proc. IEEE Int. Symposium on Reliability Distributed Software and Database, 1984, pp

[5] K.M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems, ACM Transactions on Computer Systems, vol. 3, no. 1, 1985, pp

[6] E.N. Elnozahy, D.B. Johnson and Y.M. Wang, A Survey of Rollback-Recovery Protocols in Message-Passing Systems, Technical Report No. CMU-CS , School of Computer Science, Carnegie Mellon University,

[7] C. Fidge, Logical Time in Distributed Computing Systems, IEEE Computer, pp , August

[8] J.M. Helary, A. Mostefaoui and M. Raynal, Points de Controle Coherents dans les Systemes Repartis (in French), Technique et Science Informatique, vol. 17, no. 5,

[9] J.M. Helary, A. Mostefaoui and M. Raynal, Virtual Precedence in Asynchronous Systems: Concepts and Applications, Proc. 11th Workshop on Distributed Algorithms, LNCS press,

[10] J.M. Helary, A. Mostefaoui, R.H.B. Netzer and M. Raynal, Preventing Useless Checkpoints in Distributed Computations, Proc. IEEE Int. Symposium on Reliable Distributed Systems, 1997, pp

[11] L. Lamport, Time, Clocks and the Ordering of Events in a Distributed System, Communications of the ACM, vol. 21, no. 7, 1978, pp

[12] D. Manivannan and M. Singhal, A Low-Overhead Recovery Technique Using Quasi Synchronous Checkpointing, Proc. IEEE Int. Conference on Distributed Computing Systems, 1996, pp

[13] F. Mattern, Virtual Time and Global States of Distributed Systems, Proc. of the International Workshop on Parallel and Distributed Algorithms, 1989, pp

[14] R.H.B. Netzer and J. Xu, Necessary and Sufficient Conditions for Consistent Global Snapshots, IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, 1995, pp

[15] B. Randell, System Structure for Software Fault Tolerance, IEEE Transactions on Software Engineering, vol. SE1, no. 2, 1975, pp

[16] D.L. Russell, State Restoration in Systems of Communicating Processes, IEEE Transactions on Software Engineering, vol. SE6, no. 2, 1980, pp

[17] K. Vankatesh, T. Radakrishanan and H.L. Li, Optimal Checkpointing and Local Recording for Domino-Free Rollback-Recovery, Information Processing Letters, vol. 25, 1987, pp

[18] Y.M. Wang, Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints, IEEE Transactions on Computers, vol. 46, no. 4, 1997, pp


More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems. Required reading for this topic } Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson for "Impossibility of Distributed with One Faulty Process,

More information

Design of a Sliding Window over Asynchronous Event Streams

Design of a Sliding Window over Asynchronous Event Streams 1 Design of a Sliding Window over Asynchronous Event Streams Yiling Yang 1,2, Yu Huang 1,2, Jiannong Cao 3, Xiaoxing Ma 1,2, Jian Lu 1,2 1 State Key Laboratory for Novel Software Technology Nanjing University,

More information

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG Why Synchronization? You want to catch a bus at 6.05 pm, but your watch is

More information

S. Neogy 1 A. Sinha 1 P. K. Das 2 1 Department of Computer Science & Engg., Jadavpur University, India sarmisthaneogy@gmail.com 2 Faculty of Engg. & Tech., Mody Institute of Technology & Science, India

More information

I R I S A P U B L I C A T I O N I N T E R N E LOGICAL TIME: A WAY TO CAPTURE CAUSALITY IN DISTRIBUTED SYSTEMS M. RAYNAL, M. SINGHAL ISSN

I R I S A P U B L I C A T I O N I N T E R N E LOGICAL TIME: A WAY TO CAPTURE CAUSALITY IN DISTRIBUTED SYSTEMS M. RAYNAL, M. SINGHAL ISSN I R I P U B L I C A T I O N I N T E R N E N o 9 S INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTÈMES ALÉATOIRES A LOGICAL TIME: A WAY TO CAPTURE CAUSALITY IN DISTRIBUTED SYSTEMS M. RAYNAL, M. SINGHAL ISSN

More information

Formal Methods for Monitoring Distributed Computations

Formal Methods for Monitoring Distributed Computations Formal Methods for Monitoring Distributed Computations Vijay K. Garg Parallel and Distributed Systems Lab, Department of Electrical and Computer Engineering, The University of Texas at Austin, FRIDA 15

More information

Valency Arguments CHAPTER7

Valency Arguments CHAPTER7 CHAPTER7 Valency Arguments In a valency argument, configurations are classified as either univalent or multivalent. Starting from a univalent configuration, all terminating executions (from some class)

More information

Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS. 1. Introduction. Svein Willassen

Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS. 1. Introduction. Svein Willassen Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS Svein Willassen Abstract Timestamps stored on digital media play an important role in digital investigations. However, the evidentiary value

More information

TECHNICAL REPORT YL DISSECTING ZAB

TECHNICAL REPORT YL DISSECTING ZAB TECHNICAL REPORT YL-2010-0007 DISSECTING ZAB Flavio Junqueira, Benjamin Reed, and Marco Serafini Yahoo! Labs 701 First Ave Sunnyvale, CA 94089 {fpj,breed,serafini@yahoo-inc.com} Bangalore Barcelona Haifa

More information

The Weakest Failure Detector to Solve Mutual Exclusion

The Weakest Failure Detector to Solve Mutual Exclusion The Weakest Failure Detector to Solve Mutual Exclusion Vibhor Bhatt Nicholas Christman Prasad Jayanti Dartmouth College, Hanover, NH Dartmouth Computer Science Technical Report TR2008-618 April 17, 2008

More information

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 06: Synchronization Version: November 16, 2009 2 / 39 Contents Chapter

More information

A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence

A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence Hirotsugu Kakugawa and Toshimitsu Masuzawa Department of Computer Science Graduate School of Information Science and Technology

More information

Distributed Systems Time and Global State

Distributed Systems Time and Global State Distributed Systems Time and Global State Allan Clark School of Informatics University of Edinburgh http://www.inf.ed.ac.uk/teaching/courses/ds Autumn Term 2012 Distributed Systems Time and Global State

More information

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Jean-Claude Bermond a,b,, Bi Li b,a,c, Nicolas Nisse b,a, Hervé Rivano d, Min-Li Yu e a Univ. Nice Sophia Antipolis, CNRS,

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information