A VP-Accordant Checkpointing Protocol Preventing Useless Checkpoints


Roberto BALDONI, Francesco QUAGLIA, Bruno CICIANI
Dipartimento di Informatica e Sistemistica, Università La Sapienza
Via Salaria 113, 00198 Roma, Italy
E-mail: {baldoni,quaglia,ciciani}@dis.uniroma1.it

Abstract

A useless checkpoint corresponds to the occurrence of a checkpoint and communication pattern called Z-cycle. A recent result shows that ensuring a computation without Z-cycles is a particular application of a property, namely Virtual Precedence (VP), defined on an interval-based abstraction of a computation. In this paper we first propose a taxonomy of communication-induced checkpointing protocols based on the way they ensure the VP property. Then we derive a sufficient condition ensuring no Z-cycles in a distributed computation. This condition defines a checkpoint and communication pattern, namely suspect Z-cycle, such that if no suspect Z-cycle exists in a distributed computation then no Z-cycle exists. Finally, we present a communication-induced checkpointing protocol that avoids useless checkpoints by preventing on-the-fly the formation of suspect Z-cycles, and we discuss its performance with respect to other protocols.

1 Introduction

The study of checkpoint and communication patterns is a fundamental issue in many areas of distributed computing (including rollback-recovery, distributed debugging, etc.) where it is necessary to determine consistent global checkpoints [5]. A checkpoint and communication pattern of a distributed computation consists of a set of local checkpoints and a dependency relation defined over them due to interprocess communication [14, 18]. A local checkpoint (or simply checkpoint) is a local state of a process dumped onto stable storage, while dependencies are caused by chains of messages in the computation called zigzag paths (Z-paths for short) [14].
A Z-path from a checkpoint $X$ to a checkpoint $Y$ is a particular sequence of messages $[m_1, \ldots, m_q]$ such that: (i) the sending of $m_1$ occurs on a process after $X$ is taken, (ii) the delivery of $m_q$ occurs on a process before $Y$ is taken, and (iii) the sending of a message $m_i$ ($i > 1$) belongs to the same checkpoint interval as the delivery of the message $m_{i-1}$, or to a successive one (a checkpoint interval is the set of events between two successive checkpoints in the same process). Z-paths can be split into two families: the causal Z-paths, which are actually causal paths (as an example, a causal Z-path from the checkpoint $A$ to $C$ formed by $[m_3, m_4]$ is shown in Figure 1), and the non-causal Z-paths, in which there exists at least one message $m_i$ whose send precedes the delivery of $m_{i-1}$ in the same checkpoint interval (Figure 1 shows the non-causal Z-path from $B$ to $A$ formed by $[m_1, m_2]$).

Due to the presence of non-causal Z-paths, a message chain could start after a checkpoint and terminate before that checkpoint. In this case a dependency relation is established between a checkpoint and itself. This pattern has been formalized by Netzer and Xu with the name Z-cycle [14]. As an example, the message chain $[m_3, m_4, m_1, m_2]$ shown in Figure 1 involves $A$ in a Z-cycle. A checkpoint involved in a Z-cycle is called useless as it cannot belong to any consistent global checkpoint [14]. In the context of rollback-recovery, the presence of useless checkpoints is the direct cause of the domino effect [15]. A distributed computation satisfies the No-Z-Cycle property ($\mathcal{NZC}$) if there is no Z-cycle in the computation (i.e., no useless checkpoint).

(This work is partially supported by the Scientific Cooperation Network of the European Community OLOS.)

A Taxonomy of Checkpointing Protocols. In a recent paper ([9]) Helary et al.
showed that ensuring $\mathcal{NZC}$ in a checkpoint and communication pattern of a distributed computation is a particular application of a property, namely Virtual Precedence ($\mathcal{VP}$), defined on an interval-based abstraction of a distributed computation. An interval-based abstraction of a distributed computation satisfies $\mathcal{VP}$ if, and only if, it is possible to associate with the intervals a timestamping function with the following characteristics: (i) intervals which are connected by a message must be timestamped in a non-decreasing way (safety part) and (ii) the timestamp of a process must increase after communication (liveness part). It is easy to check that if we consider each single event as an interval of the abstraction, the timestamp function boils down to Lamport's scalar clock [11] or the Fidge-Mattern vector time [7, 13].

In the checkpointing problem, intervals of the abstraction correspond to checkpoint intervals and, hence, the abstraction of the distributed computation corresponds to a checkpoint and communication pattern. Helary et al. [9] proved that $\mathcal{VP} \Leftrightarrow \mathcal{NZC}$. They also showed that communication-induced checkpointing protocols ensuring $\mathcal{NZC}$, such as those in [4, 16, 18], can be seen as particular instantiations of a meta protocol. (Communication-induced protocols, upon the receipt of a message, direct a process to take a checkpoint if a predicate evaluates to true [1, 6].) In other words, they showed, for the first time, that all communication-induced protocols actually derive from the same source. This contrasts with a previous common intuition that rigidly separated communication-induced protocols into two distinct families: model-based (e.g., [1, 2, 16, 18]) and index-based (e.g., [3, 4, 12, 17]).

However, communication-induced protocols differ in the way they use the timestamping function to ensure the $\mathcal{VP}$ property. Some protocols, namely VP-enforced protocols, assume the existence of a timestamping function consistent with rules (i) and (ii). That function is enforced by means of additional (forced) checkpoints. If, before the execution of a communication event, one of the two rules is going to be violated, then the protocol takes a forced checkpoint and timestamps the newly created interval according to the timestamping function. Protocols that follow this approach are in [4, 9, 10, 17].

Other protocols, namely VP-accordant protocols, prevent the formation of a specific checkpoint and communication pattern which, in turn, avoids the occurrence of Z-cycles. Thus, upon the receipt of a message, if at least one bad checkpoint and communication pattern is going to be formed, then the protocol takes a forced checkpoint to break that pattern. As $\mathcal{VP} \Leftrightarrow \mathcal{NZC}$, for a VP-accordant protocol too there exists a timestamping function that could be used to timestamp the checkpoint intervals of the computation produced by the protocol consistently with (i) and (ii). Examples of such protocols are in [1, 2, 16, 18].

Aims of the Paper. In this paper we first introduce a sufficient condition for satisfying the $\mathcal{NZC}$ property. This condition is based on a property which stipulates that there is no Suspect Z-Cycle (SZC) in the computation ($\mathcal{NSZC}$ property).
A suspect Z-cycle is a part of a Z-cycle with several constraints on its structure. We prove that $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$. Second, we present a VP-accordant checkpointing protocol. This protocol is based on the prevention of SZCs and relies on the following basic hypotheses: (1) the usable knowledge of the computation at a certain event cannot be more than the one included in the causal past of that event, (2) the computation is asynchronous, i.e., no bound exists on the process speed and on the message transfer delay. The prevention of SZCs is done by exploiting the control information of the incoming application messages and the local history of a process.

We assume each process $P_i$ selects some local states to be local checkpoints. These checkpoints can be either basic (i.e., triggered by a special event from the underlying application) or forced (i.e., triggered whenever the receipt of a message $m$ is closing an SZC). Basic checkpoints are generated by the application according to its own strategy (e.g., periodic checkpointing, random checkpointing, etc.). The proposed algorithm piggybacks on each application message a matrix ($n \times n$) of integers as control information, where $n$ represents the number of processes of the computation. (In the rest of the paper we denote in uppercase a specific checkpoint and communication pattern and in calligraphic style properties related to a checkpoint and communication pattern.)

Figure 1. Z-Paths and Z-Cycles.

Finally, simulation results are presented. The main observation is that the number of forced checkpoints taken by the protocol we propose is generally low and almost independent of the basic checkpointing strategy. This independence is a highly desirable feature as the basic checkpointing strategy is defined at the application level and a checkpointing protocol, which is usually implemented at system level, cannot influence that choice.

The paper is structured in five sections.
Section 2 introduces the model of the distributed computation and the concatenation operators that will be used to express checkpoint and communication patterns. In Section 3 the SZC pattern is introduced and the result $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$ is proved. Section 4 presents the VP-accordant checkpointing protocol derived from the sufficient condition. In the final section results of the simulation study are presented.

2 Model and Preliminary Definitions

We assume a distributed computation consisting of a set of $n$ processes $\{P_1, P_2, \ldots, P_n\}$. Processes do not share memory and do not share a common clock value. They communicate only by exchanging messages. Each pair of processes is connected by an asynchronous directed logical channel. Transmission delays over channels are unpredictable but finite.

A process produces a sequence of events; each event moves the process from one local state to another. We assume events are produced by the execution of internal, send or delivery statements. The send and delivery events of a message $m$ are denoted respectively by $send(m)$ and $deliver(m)$. In process $P_i$ an event $e$ precedes an event $e'$, denoted $e \prec_P e'$, iff $e$ is produced by $P_i$ before $e'$. An event $e$ of process $P_i$ precedes an event $e'$ of process $P_j$ due to message $m$, denoted $e \prec_m e'$, iff $(e = send(m)) \wedge (e' = deliver(m))$. Lamport's Happened-Before relation [11], denoted $\xrightarrow{e}$, is the transitive closure of the union of the relations $\prec_P$ and $\prec_m$. Hence, a computation can be modeled as a partial order $\widehat{H} = (H, \xrightarrow{e})$ where $H$ is the set of all events produced by the computation.

A local state of a process saved on stable storage is called a local checkpoint of the process; hence the set of local checkpoints is a subset of the set of local states. The $x$-th checkpoint of process $P_i$ is denoted $C_{i,x}$, where $x$ is called the rank of the checkpoint. Each time a checkpoint is taken the rank is increased by one.
We assume that each process $P_i$ takes an initial checkpoint $C_{i,1}$ (corresponding to the initial state of the process) and that after each event a checkpoint will eventually be taken. A checkpoint interval $I_{i,x}$

is the set of events between $C_{i,x}$ and $C_{i,x+1}$. A checkpoint and communication pattern of a distributed computation is a pair $(\widehat{H}, \mathcal{C}_{\widehat{H}})$ where $\widehat{H}$ is a distributed computation and $\mathcal{C}_{\widehat{H}}$ is a set of local checkpoints defined on $\widehat{H}$.

2.1 Message Chains

Definition 2.1 A message chain is a sequence of messages $\zeta = [m_1, \ldots, m_n]$ such that
$\forall k : 1 \le k \le n-1 \Rightarrow (deliver(m_k) \in I_{i,x}) \wedge (send(m_{k+1}) \in I_{i,y}) \wedge (x \le y)$.

As an example, in Figure 1 we have a message chain formed by messages $[m_1, m_2]$. A particular case of message chain is the causal message chain:

Definition 2.2 A message chain $\zeta = [m_1, \ldots, m_n]$ is causal if $\forall k : 1 \le k \le n-1 \Rightarrow deliver(m_k) \prec_P send(m_{k+1})$; otherwise, the chain is non-causal.

The message chains $[m_3, m_4]$ and $[m_1, m_2]$ shown in Figure 1 are, respectively, causal and non-causal. A chain with only one message is always causal. For the sake of clarity, the Greek letter $\mu$ indicates a causal message chain. We denote by $\zeta.first$ (resp. $\zeta.last$) the first (resp. last) message of a message chain $\zeta$, and $|\zeta|$ denotes the number of messages forming the chain. We use the operator minus to denote the removal of the last (or of the first) message from a chain; in particular $\zeta - \zeta.last$ (resp. $\zeta - \zeta.first$) denotes the chain obtained from $\zeta$ by removing its last (resp. first) message.

2.2 Concatenation Operators

Causal Concatenation. The causal concatenation, denoted by the operator $\circ$, can be applied to combine two objects (an object can be either a checkpoint or a message chain) in a causal manner. Such a concatenation is defined as follows:

Definition 2.3 An object $a$ is causally concatenated to an object $b$, denoted $a \circ b$, iff:
1) $a = C_{i,x}$, $b = \zeta$ and $\exists v \ge 0 : send(\zeta.first) \in I_{i,x+v}$; or
2) $a = \zeta$, $b = C_{i,x}$ and $\exists v > 0 : deliver(\zeta.last) \in I_{i,x-v}$; or
3) $a = \zeta$, $b = \zeta'$ and $deliver(\zeta.last) \prec_P send(\zeta'.first)$.

Examples of causal concatenation are shown in Figure 2.a ($C_{i,x} \circ \zeta$, $\zeta \circ C_{i,x}$ and $\zeta \circ \zeta'$). Note that a Z-path from a checkpoint $A$ to a checkpoint $B$ due to a chain $\zeta$ can be expressed as $A \circ \zeta \circ B$.
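Definitions 2.1 and 2.2 can be illustrated with a minimal Python sketch. The `Event` and `Message` records, and in particular the `seq` field giving an event's position in its process's local order, are assumptions introduced here for illustration; they are not part of the paper's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    process: int   # index i of the process P_i that produced the event
    interval: int  # rank x of the checkpoint interval I_{i,x} containing it
    seq: int       # position in P_i's local order (hypothetical helper field)


@dataclass(frozen=True)
class Message:
    send: Event
    deliver: Event


def is_message_chain(chain: List[Message]) -> bool:
    """Definition 2.1: each delivery and the following send occur on the same
    process, and the send lies in the same or a later checkpoint interval."""
    if not chain:
        return False
    for prev, nxt in zip(chain, chain[1:]):
        if prev.deliver.process != nxt.send.process:
            return False
        if prev.deliver.interval > nxt.send.interval:
            return False
    return True


def is_causal(chain: List[Message]) -> bool:
    """Definition 2.2: every delivery precedes the following send in the
    local order of the process where both events occur."""
    if not is_message_chain(chain):
        return False
    return all(p.deliver.seq < n.send.seq for p, n in zip(chain, chain[1:]))
```

A single-message chain passes both checks, matching the remark that a chain with only one message is always causal. A chain whose second message is sent before the first one is delivered, within the same interval, is accepted by `is_message_chain` but rejected by `is_causal`.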
Non-Causal Concatenation. The non-causal concatenation, denoted by the operator $\bullet$, can be applied to combine message chains. Such a concatenation is defined as follows:

Definition 2.4 A message chain $\zeta$ is non-causally concatenated to a message chain $\zeta'$ in the checkpoint interval $I_{k,y}$, denoted $\zeta \bullet_{k,y} \zeta'$, iff:
$(deliver(\zeta.last) \in I_{k,y}) \wedge (send(\zeta'.first) \in I_{k,y}) \wedge (send(\zeta'.first) \prec_P deliver(\zeta.last))$.

Figure 2. Examples of Applying the Concatenation Operators.

Figure 3. The Structure of a Z-Cycle.

An example of non-causal concatenation (in the interval $I_{i,x}$) is shown in Figure 2.b.

2.3 Z-Cycles and the No-Z-Cycle Property

By using the previous notation, we express the notion of Z-Cycle (ZC) introduced by Netzer and Xu [14]. Basically, a ZC is a checkpoint and communication pattern involving a checkpoint $C_{i,x}$ and a chain $\widehat{\zeta}$ such that $\widehat{\zeta} \circ C_{i,x} \circ \widehat{\zeta}$ (an example of such a concatenation is shown in Figure 3.a). However, it is always possible to separate $\widehat{\zeta}$ into two nonempty sub-chains, a causal sub-chain $\mu$ and a sub-chain $\zeta$, such that $\widehat{\zeta} = \mu \bullet_{k,y} \zeta$ (this concatenation is shown in Figure 3.b). This observation gives rise to the following Z-cycle definition:

Definition 2.5 A ZC is a checkpoint and communication pattern $ZC(C_{i,x}, \mu \bullet_{k,y} \zeta)$ such that: $\zeta \circ C_{i,x} \circ \mu \bullet_{k,y} \zeta$.

Let us finally introduce the No-Z-Cycle property.

Property 2.1 A checkpoint and communication pattern of a distributed computation $(\widehat{H}, \mathcal{C}_{\widehat{H}})$ satisfies the No-Z-Cycle property ($\mathcal{NZC}$) iff no ZC exists in $(\widehat{H}, \mathcal{C}_{\widehat{H}})$.

3 A Sufficient Condition for $\mathcal{NZC}$

We first introduce the notion of Suspect Z-Cycle (SZC), which is a checkpoint and communication pattern that satisfies certain constraints on its structure. Then, a sufficient condition for the $\mathcal{NZC}$ property based on the absence of SZCs is presented.

Let us consider the set of causal chains $\mu$ starting after $C_{i,x}$ whose recipient of $\mu.last$ is $P_k$, denoted $M(C_{i,x}, P_k)$. A causal chain $\mu$ is prime in $M(C_{i,x}, P_k)$ iff there does not exist a causal chain $\mu' \in M(C_{i,x}, P_k)$ such that $deliver(\mu'.last) \prec_P deliver(\mu.last)$.
In other words, the first causal chain bringing to $P_k$ the knowledge of the existence of $C_{i,x}$ is prime. By using this notion we introduce the concept of SZC.

Definition 3.1 An SZC is a checkpoint and communication pattern $SZC(I_{j,z}, C_{i,x}, \mu, I_{k,y})$ such that:
$\exists m, \exists m' : C_{j,z} \circ m \circ C_{i,x} \circ \mu \bullet_{k,y} m'$ with
(i) $send(m) \in I_{j,z}$,
(ii) $\mu$ is prime in $M(C_{i,x}, P_k)$,
(iii) $\not\exists e \in I_{j,z+1} : e \xrightarrow{e} deliver(\mu.last)$.

Note that an SZC is not necessarily a part of a ZC. Figure 4 shows an SZC involving $C_{i,x}$ and an example of non-SZC, as it contradicts restriction (iii) (due to the presence of the causal chain $\mu'$). Other examples of checkpoint and communication patterns that violate restriction (i) or (ii) are left to the reader.

Figure 4. An SZC involving $C_{i,x}$; an Example of Non SZC.

Let us now prove that if there is a ZC in a distributed computation then there is an SZC in that computation, assuming the size of the non-causal message chain of the ZC equal to one. Then we generalize the result to a chain of any size:

Lemma 3.1 If there exists $ZC(C_{i,x}, \mu \bullet_{k,y} \zeta)$ such that $|\zeta| = 1$ then there exists $SZC(I_{k,y}, C_{i,x}, \mu', I_{k,y})$.

Proof (By Contradiction) Let us assume the existence of $ZC(C_{i,x}, \mu \bullet_{k,y} \zeta)$ with $\zeta = m$ and suppose that $SZC(I_{k,y}, C_{i,x}, \mu', I_{k,y})$ does not exist. Consider a prime causal chain $\mu'$ in $M(C_{i,x}, P_k)$ (this chain exists as the set $M(C_{i,x}, P_k)$ contains at least $\mu$). We have two cases:

1. $deliver(\mu'.last) \xrightarrow{e} send(m)$. This is impossible as $send(m) \xrightarrow{e} deliver(\mu'.last)$ and the relation $\xrightarrow{e}$ is acyclic;

2. $send(m) \xrightarrow{e} deliver(\mu'.last)$. As $m \circ C_{i,x} \circ \mu' \bullet_{k,y} m$, $send(m) \in I_{k,y}$ and $\mu'$ is prime in $M(C_{i,x}, P_k)$, there must exist an event $e \in I_{k,y+1}$ such that $e \xrightarrow{e} deliver(\mu'.last)$ in order to violate constraint (iii) of Definition 3.1. This is not possible as $deliver(\mu'.last) \in I_{k,y}$ and the relation $\xrightarrow{e}$ is acyclic.

Hence, the assumption is contradicted and the claim follows. $\Box$

Lemma 3.2 If there exists $ZC(C_{i,x}, \mu \bullet_{k,y} \zeta)$ then there exists an SZC.

Proof Let us consider $ZC(C_{i,x}, \mu \bullet_{k,y} \zeta)$. If $|\zeta| = 1$ then the claim follows from Lemma 3.1.
Otherwise, let us consider $I_{j,z}$ such that $send(\zeta.last) \in I_{j,z}$ and a causal chain $\mu'$ prime in $M(C_{i,x}, P_k)$ (this chain exists as the set $M(C_{i,x}, P_k)$ contains at least $\mu$). There are two cases:

1. $deliver(\mu'.last) \xrightarrow{e} send(\zeta.first)$. We get $ZC(C_{i,x}, [\mu' \circ \mu''] \bullet_{b,s} \zeta')$ where $\mu'' \bullet_{b,s} \zeta' = \zeta$ (see Figure 5.a), hence $|\zeta'| < |\zeta|$. Note that the checkpoint interval $I_{b,s}$ exists, otherwise the message chain $\zeta$ is causal. But this is impossible as it would lead to a cycle in the relation $\xrightarrow{e}$ (i.e., $send(\zeta.first) \xrightarrow{e} deliver(\mu'.last)$).

2. $send(\zeta.first) \xrightarrow{e} deliver(\mu'.last)$. Then we get the Z-cycle $ZC(C_{i,x}, \mu' \bullet_{k,y} \zeta)$. In such a case we have two sub-cases:

2.1) $\not\exists e \in I_{j,z+1} : e \xrightarrow{e} deliver(\mu'.last)$. By Definition 3.1, we get $SZC(I_{j,z}, C_{i,x}, \mu', I_{k,y})$ and the claim follows;

2.2) $\exists e \in I_{j,z+1} : e \xrightarrow{e} deliver(\mu'.last)$. Hence there exists at least one causal message chain $\mu''$ starting after $C_{j,z+1}$ that is received by $P_k$ before $deliver(\mu'.last)$, or such that $deliver(\mu''.last) = deliver(\mu'.last)$. If $send(\zeta.first) \xrightarrow{e} deliver(\mu''.last)$ (see Figure 5.b), then we get $ZC(C_{j,z+1}, \mu'' \bullet_{k,y} \zeta')$ where $\zeta' = \zeta - \zeta.last$ (hence $|\zeta'| < |\zeta|$). If $deliver(\mu''.last) \xrightarrow{e} send(\zeta.first)$, we get $ZC(C_{j,z+1}, [\mu'' \circ \mu'''] \bullet_{b,s} \zeta')$ where $\mu''' \bullet_{b,s} \zeta' = \zeta - \zeta.last$, hence $|\zeta'| < |\zeta|$ (see Figure 5.c). Note that the checkpoint interval $I_{b,s}$ exists, otherwise the message chain $\zeta - \zeta.last$ is causal. But this is impossible as it would lead to a cycle in the relation $\xrightarrow{e}$ (i.e., $send(\zeta.first) \xrightarrow{e} deliver(\mu''.last)$). In both sub-cases of case 2.2, we get a Z-cycle involving $C_{j,z+1}$ with $|\zeta'| < |\zeta|$.

If we fall in case 1 or in case 2.2, the previous case analysis can be repeated on the obtained Z-cycle. After a finite number of steps either we fall in case 2.1 or we get a ZC with $|\zeta| = 1$, as the size of the message chain associated with the Z-cycle decreases monotonically.
In the first case the claim follows by the case analysis, in the second by Lemma 3.1. $\Box$

Figure 5. Proof of Lemma 3.2.

Theorem 3.3 (Sufficiency Theorem) If a checkpoint and communication pattern $(\widehat{H}, \mathcal{C}_{\widehat{H}})$ of a distributed computation satisfies the No-Suspect-Z-Cycle ($\mathcal{NSZC}$) property, i.e., no SZC exists in $(\widehat{H}, \mathcal{C}_{\widehat{H}})$, then it satisfies the $\mathcal{NZC}$ property.

Proof By Lemma 3.2, if a ZC exists then an SZC exists in $(\widehat{H}, \mathcal{C}_{\widehat{H}})$. Thus, in terms of properties, $\neg(\mathcal{NZC}) \Rightarrow \neg(\mathcal{NSZC})$. Hence $\mathcal{NSZC} \Rightarrow \mathcal{NZC}$. $\Box$

4 A VP-Accordant Checkpointing Protocol

We consider a system formed by three layers: the communication layer, the checkpointing layer and the application layer. We assume the checkpointing layer is formed by processes that take local checkpoints either when receiving a special event from the application layer (i.e., basic checkpoints) or when induced by communication (i.e., forced checkpoints). An application takes basic checkpoints according to its own strategy, which is out of the control of the checkpointing layer. Examples of strategies for taking basic checkpoints are random checkpointing and periodic checkpointing.

Messages arrive at processes from a communication system and are then delivered to the application layer. Hence, upon the arrival of a message $m$, a process has to decide on-the-fly (i.e., without additional delay) if a forced checkpoint has to be taken before delivering $m$ to the application layer; this decision has to be based only on the local information and on the control information piggybacked on the application messages (no control message is allowed, and the content of an application message cannot be interpreted by the checkpointing protocol).

The protocol presented in this section ensures no useless checkpoint by preventing on-the-fly the formation of SZCs. In order to track the formation of $SZC(I_{j,z}, C_{i,x}, \mu, I_{k,y})$, upon the arrival of a message $\mu.last$, process $P_k$ has to verify whether the conditions for the existence of that checkpoint and communication pattern are satisfied. In the following paragraphs we introduce the data structures to accomplish this task.

Tracking "$\mu$ is prime in $M(C_{i,x}, P_k)$". To detect if $\mu$ is prime in $M(C_{i,x}, P_k)$, a vector clock mechanism is used considering checkpoints of processes as relevant events of the computation [13]. Each process $P_k$ maintains a vector clock $VC_k$ whose size corresponds to the number of processes.
$VC_k[i]$ stores the maximum checkpoint rank of $P_i$ seen by $P_k$ and $VC_k[k]$ stores the rank of the last checkpoint taken by $P_k$. $VC_k$ is initialized to zero except the $k$-th entry, which is initialized to one. Each application message $m$ sent by $P_k$ piggybacks the current value of $VC_k$ (denoted $m.VC$). Following the updating rule of a vector clock, upon the delivery of a message $m$, $VC_k$ is updated from $m.VC$ by taking a component-wise maximum. A causal message chain $\mu$ including message $m$ as $\mu.last$ is prime in some $M(C_{i,x}, P_k)$ (see Section 3) if, upon the delivery of $m$ by process $P_k$, the following predicate holds: $\exists i : (m.VC[i] > VC_k[i])$.

Tracking $\mu \bullet_{k,y} m'$. To detect if there exists a non-causal concatenation between a prime causal message chain $\mu$ and a message $m'$ in the interval $I_{k,y}$, process $P_k$ maintains a boolean variable $after\_first\_send_k$. This variable is set to TRUE when a send event occurs. It is set to FALSE each time a local checkpoint is taken. Hence, upon the delivery of a message $m$ (with $m = \mu.last$), $P_k$ detects that $\mu \bullet_{k,y} m'$ if the following predicate holds: $after\_first\_send_k \wedge (\exists i : (m.VC[i] > VC_k[i]))$.

Tracking $C_{j,z} \circ m \circ C_{i,x}$. Each process $P_k$ maintains a vector of integers $Imm\_Pred_k$ of size $n$ and a matrix of integers $Pred_k$ of size $n \times n$. $Imm\_Pred_k[\ell]$ represents the maximum rank of the checkpoint interval from which process $P_\ell$ sent a message $m$ which has been delivered by $P_k$ in its current checkpoint interval, say $I_{k,y-1}$ (in other words $C_{\ell, Imm\_Pred_k[\ell]}$ is an immediate predecessor of checkpoint $C_{k,y}$). All the entries of this vector are set to $-1$ each time a checkpoint is taken by $P_k$. $Pred_k[i,j]$ represents, to the knowledge of $P_k$, the maximum rank of the checkpoint interval from which process $P_j$ sent a message $m$ which has been delivered by $P_i$ in a checkpoint interval $I_{i,x-1}$ with $x \le VC_k[i]$. All the entries of the matrix $Pred_k$ are initialized to $-1$, and its content is piggybacked on each message $m$ sent by $P_k$ ($m.Pred$).
The rules to update its entries are the following:
1) whenever a checkpoint is taken by $P_k$: $\forall j \; Pred_k[k,j] = \max(Pred_k[k,j], Imm\_Pred_k[j])$;
2) upon the arrival of a message $m$ at $P_k$: $\forall \ell, t \; Pred_k[\ell,t] = \max(Pred_k[\ell,t], m.Pred[\ell,t])$.

Tracking $\not\exists e \in I_{j,z+1} : e \xrightarrow{e} deliver(\mu.last)$. Upon the arrival of a message $m$ included in a prime causal chain (i.e., $\exists i : (m.VC[i] > VC_k[i])$), in order to track the above condition, we need to know if there exists a $j$ such that the checkpoint of $P_j$ with rank $m.Pred[i,j] + 1$ does not belong to the causal past of the delivery of $m$. This knowledge is encoded in $m.VC[j]$ and $VC_k[j]$. Hence, the predicate becomes: $(\exists j : m.Pred[i,j] + 1 > \max(m.VC[j], VC_k[j]))$.

Preventing SZCs. Upon the arrival of a message $m$ at process $P_k$ in $I_{k,y}$, if the predicate
$after\_first\_send_k \wedge (\exists i : (m.VC[i] > VC_k[i]) \wedge (\exists j : m.Pred[i,j] + 1 > \max(m.VC[j], VC_k[j])))$
holds, then process $P_k$ detects that at least one $SZC(I_{j, Pred_k[i,j]}, C_{i,x}, \mu, I_{k,y})$ is being formed with $m = \mu.last$ and $VC_k[i] < x \le m.VC[i]$. In this case $P_k$ directs a forced checkpoint $C_{k,y+1}$ before the delivery of $m$.

The behavior of process $P_k$ is the following (all the procedures and the message handler are executed in atomic fashion):

init:
    take a checkpoint;
    after_first_send_k := FALSE;
    for all i ≠ k: VC_k[i] := 0;
    VC_k[k] := 1;
    for all i, j: Pred_k[i,j] := -1;
    for all h: Imm_Pred_k[h] := -1;

when m arrives at P_k from P_l:
    if after_first_send_k ∧ (∃i : (m.VC[i] > VC_k[i]) ∧
           (∃j : m.Pred[i,j] + 1 > max(m.VC[j], VC_k[j])))
    then take_ckpt();   % forced checkpoint %
    for all i: VC_k[i] := max(VC_k[i], m.VC[i]);
    for all i, j: Pred_k[i,j] := max(m.Pred[i,j], Pred_k[i,j]);
    Imm_Pred_k[l] := max(Imm_Pred_k[l], m.VC[l]);

procedure send(m, P_j):
    m.content := data; m.VC := VC_k; m.Pred := Pred_k;
    send m to P_j;
    after_first_send_k := TRUE;

when a basic checkpoint is scheduled from P_k:
    take_ckpt();

procedure take_ckpt():
    take a checkpoint;
    for all h: Pred_k[k,h] := max(Pred_k[k,h], Imm_Pred_k[h]);
    for all h: Imm_Pred_k[h] := -1;
    VC_k[k] := VC_k[k] + 1;
    after_first_send_k := FALSE;

Finally, we remark that, from an operational point of view, the elements of the diagonal of the matrix $Pred$ are never used by the protocol. Hence, when implementing the protocol, the vector clock $VC$ can be embedded in that diagonal. Thus, the resulting control information piggybacked on application messages boils down to a matrix of $n \times n$ integers.

5 A Performance Study

This section presents simulation results of a performance study comparing a VP-enforced protocol, namely the Briatico et al. protocol [4] (hereafter BCS), and the protocol presented in this paper (P). Among the set of checkpointing protocols, we chose BCS, first, for its simplicity of implementation and, second, because other simulation studies ([3]) have shown that, in the class of VP-enforced protocols, BCS exhibits good performance in terms of reduction of forced checkpoints. (Recent papers show two techniques that can be applied to BCS to reduce the number of forced checkpoints: invalidation of basic checkpoints [12] and re-usage of timestamping values [3]. As outlined in [8], these two techniques can be used in any other checkpointing protocol to boost its performance. This is the reason why we decided to compare only pure communication-induced protocols.)

In the BCS protocol an integer is used to timestamp checkpoint intervals. Thus, each process $P_k$ is endowed with a variable (the timestamp) denoted $ts_k$. The timestamp is updated by process $P_k$ according to the following rules:
1. when starting the execution, $ts_k$ is set to zero;
2. when sending a message $m$, a copy of $ts_k$ is piggybacked on message $m$ (denoted $m.ts$);
3. when taking a basic checkpoint, $ts_k := ts_k + 1$;
4. when a message $m$ arrives at $P_k$, if $m.ts > ts_k$ then a forced checkpoint is taken by $P_k$ and $ts_k := m.ts$.

Simulation Model and Results. The performance comparison studies, for each protocol, the number of forced checkpoints per message delivery ($R$) as a function of the average checkpoint interval size (for example, $R$ equal to 0.2 means a forced checkpoint is taken, on average, every 5 message deliveries) under two distinct strategies adopted by the application layer for generating basic checkpoint events to processes:
S1: each process receives $N$ basic checkpoint events periodically and the period between two successive basic checkpoint events is the same at all processes;
S2: each process receives $N$ basic checkpoint events randomly distributed over the whole execution (the receipt of such events has a distinct distribution at each process).

We simulate a uniform point-to-point environment in which each process can send a message to any other and the destination of each message is a uniformly distributed random variable. We assume a system with $n = 8$ processes; each process executes internal, send and receive operations with probability $p_i = 0.9$, $p_s = 0.05$ and $p_r = 0.05$, respectively. The time to execute an operation in a process and the message propagation time are exponentially distributed with mean value equal to 1 and 5 time units respectively.
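The four BCS timestamp rules listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `BCSProcess`, the dictionary used as a message, and the `forced` and `deliveries` counters (added to compute the ratio $R$ defined above) are assumptions made here, not part of the protocol description in [4].

```python
class BCSProcess:
    """One process running the BCS timestamp rules (illustrative sketch)."""

    def __init__(self):
        self.ts = 0          # rule 1: the timestamp starts at zero
        self.forced = 0      # hypothetical counter of forced checkpoints
        self.deliveries = 0  # hypothetical counter of delivered messages

    def send(self, content):
        # rule 2: piggyback a copy of the current timestamp on the message
        return {"content": content, "ts": self.ts}

    def basic_checkpoint(self):
        # rule 3: a basic checkpoint increments the timestamp
        self.ts += 1

    def receive(self, m):
        # rule 4: a larger incoming timestamp forces a checkpoint
        # before the message is delivered to the application
        if m["ts"] > self.ts:
            self.forced += 1
            self.ts = m["ts"]
        self.deliveries += 1
        return m["content"]

    def ratio(self):
        # R: forced checkpoints per message delivery
        return self.forced / self.deliveries if self.deliveries else 0.0
```

With two processes, a receiver whose timestamp lags behind the sender's takes a forced checkpoint on the first delivery and none on later deliveries carrying the same timestamp, which is why the behavior of BCS depends on how evenly timestamps advance across processes.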
Let Average Checkpoint Interval (ACI) be the average distance, in terms of events, between two basic checkpoints. Experiments were conducted varying ACI from 1 to 1 events and measuring the value of $R$. Each simulation run consists of one million events; for each value of ACI we did several simulation runs with different seeds and the results were within five percent of each other, thus variance is not reported in the plots. Results of the simulation study are reported in Figure 6.

We remark that strategy S1 is the most favourable to BCS, as the timestamps increase on average at the same speed at all processes. In the extreme case, if all processes took basic checkpoints at the same physical time, no forced checkpoint would ever be taken. The behavior of P is flat around .1. Strategy S2 represents a bad scenario for BCS, as the distributions of the basic checkpoint events at distinct processes are non-correlated. So timestamps increase at different speeds at distinct processes and, hence, BCS performance depends on ACI as depicted in Figure 6. The behavior of P is, also in this case, flat and quite close to that under strategy S1.

From the previous plots, one main observation emerges. The performance of P is more stable than that of BCS with respect to ACI and the basic checkpointing strategy used. This comes from the fact that a VP-accordant protocol is not influenced by the speed at which a timestamp increases in a process. Its performance depends only on the structure of checkpoint and communication patterns in the computation, which is not directly related to ACI and the basic checkpointing strategy used. This makes a VP-accordant protocol particularly appealing for implementation in a checkpointing layer on a general-purpose system.
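For comparison, the behavior of protocol P given in Section 4 (the initialization, the message handler, and the procedures send and take_ckpt) can be transcribed into Python roughly as follows. This is a sketch under assumptions: processes are indexed from 0, the `Message` class and the `forced` counter are hypothetical helpers, and the atomic execution of the handlers is taken for granted rather than enforced.

```python
class Message:
    def __init__(self, content, vc, pred):
        self.content = content
        self.vc = vc      # copy of the sender's vector clock (m.VC)
        self.pred = pred  # copy of the sender's Pred matrix (m.Pred)


class Process:
    def __init__(self, k, n):
        self.k, self.n = k, n
        self.after_first_send = False
        self.vc = [0] * n
        self.vc[k] = 1                       # rank of the initial checkpoint
        self.pred = [[-1] * n for _ in range(n)]
        self.imm_pred = [-1] * n
        self.forced = 0                      # hypothetical statistics counter

    def take_ckpt(self):
        # fold the immediate predecessors into row k, then open a new interval
        for h in range(self.n):
            self.pred[self.k][h] = max(self.pred[self.k][h], self.imm_pred[h])
        self.imm_pred = [-1] * self.n
        self.vc[self.k] += 1
        self.after_first_send = False

    def send(self, content):
        m = Message(content, list(self.vc), [row[:] for row in self.pred])
        self.after_first_send = True
        return m

    def closes_szc(self, m):
        # predicate of the "Preventing SZCs" paragraph
        if not self.after_first_send:
            return False
        for i in range(self.n):
            if m.vc[i] > self.vc[i]:  # m ends a prime causal chain
                for j in range(self.n):
                    if m.pred[i][j] + 1 > max(m.vc[j], self.vc[j]):
                        return True
        return False

    def receive(self, m, sender):
        if self.closes_szc(m):
            self.take_ckpt()          # forced checkpoint before delivery
            self.forced += 1
        self.vc = [max(a, b) for a, b in zip(self.vc, m.vc)]
        self.pred = [[max(a, b) for a, b in zip(r1, r2)]
                     for r1, r2 in zip(self.pred, m.pred)]
        self.imm_pred[sender] = max(self.imm_pred[sender], m.vc[sender])
        return m.content
```

In a two-process run where `p0` sends to `p1`, `p1` takes a basic checkpoint and then replies, the reply finds `p0` still in the interval of its first send and triggers a forced checkpoint; if `p1` replies without checkpointing, the predicate is false and no forced checkpoint is taken.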

Figure 6. R vs. ACI (horizontal axis: Average Checkpoint Interval, in events; curves: P (S1), BCS (S1), P (S2), BCS (S2)).

Summary

This paper provided a sufficient condition for producing distributed computations without useless checkpoints. The condition is based on the notion of a particular checkpoint and communication pattern called suspect Z-cycle. Based on that condition, we designed a communication-induced checkpointing protocol that prevents the formation of suspect Z-cycles in a distributed computation by means of additional (forced) checkpoints. The protocol relies on the following basic hypotheses: (i) the usable knowledge of the computation at a certain event cannot be more than the one included in the causal past of that event, (ii) the computation is asynchronous. Compared to other communication-induced checkpointing protocols proposed in the literature, the one presented in this paper takes a low number of forced checkpoints and this number is independent of both the average checkpoint interval size and the basic checkpointing strategy adopted.

Acknowledgments. The authors would like to thank Jean-Michel Helary (IRISA), Michel Raynal (IRISA), Ravi Prakash (UTD) and the anonymous referees for comments which have improved the presentation of the paper. Special thanks go to Luca De Santis for his help in the simulation study.

References

[1] R. Baldoni, J.M. Helary and M. Raynal, Rollback Dependency Trackability: a Minimal Characterization and its Protocol (revised version), IRISA Tech. Report n. 1173.
[2] R. Baldoni, J.M. Helary, A. Mostefaoui and M. Raynal, A Communication-Induced Checkpointing Protocol that Ensures Rollback-Dependency Trackability, Proc. IEEE Int. Symposium on Fault Tolerant Computing, 1997.
[3] R. Baldoni, F. Quaglia and P. Fornara, An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems, Proc. IEEE Int. Symposium on Reliable Distributed Systems, 1997.
[4] D. Briatico, A. Ciuffoletti and L. Simoncini, A Distributed Domino-Effect Free Recovery Algorithm, Proc. IEEE Int. Symposium on Reliability Distributed Software and Database, 1984.
[5] K.M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems, ACM Transactions on Computer Systems, vol. 3, no. 1, 1985.
[6] E.N. Elnozahy, D.B. Johnson and Y.M. Wang, A Survey of Rollback-Recovery Protocols in Message-Passing Systems, Technical Report No. CMU-CS, School of Computer Science, Carnegie Mellon University.
[7] C. Fidge, Logical Time in Distributed Computing Systems, IEEE Computer, August.
[8] J.M. Helary, A. Mostefaoui and M. Raynal, Points de Controle Coherents dans les Systemes Repartis (in French), Technique et Science Informatique, vol. 17, no. 5.
[9] J.M. Helary, A. Mostefaoui and M. Raynal, Virtual Precedence in Asynchronous Systems: Concepts and Applications, Proc. 11th Workshop on Distributed Algorithms, LNCS press.
[10] J.M. Helary, A. Mostefaoui, R.H.B. Netzer and M. Raynal, Preventing Useless Checkpoints in Distributed Computations, Proc. IEEE Int. Symposium on Reliable Distributed Systems, 1997.
[11] L. Lamport, Time, Clocks and the Ordering of Events in a Distributed System, Communications of the ACM, vol. 21, no. 7, 1978.
[12] D. Manivannan and M. Singhal, A Low-Overhead Recovery Technique Using Quasi Synchronous Checkpointing, Proc. IEEE Int. Conference on Distributed Computing Systems, 1996.
[13] F. Mattern, Virtual Time and Global States of Distributed Systems, Proc. of the International Workshop on Parallel and Distributed Algorithms, 1989.
[14] R.H.B. Netzer and J. Xu, Necessary and Sufficient Conditions for Consistent Global Snapshots, IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, 1995.
[15] B. Randell, System Structure for Software Fault Tolerance, IEEE Transactions on Software Engineering, vol. SE-1, no. 2, 1975.
[16] D.L. Russell, State Restoration in Systems of Communicating Processes, IEEE Transactions on Software Engineering, vol. SE-6, no. 2, 1980.
[17] K. Vankatesh, T. Radakrishanan and H.L. Li, Optimal Checkpointing and Local Recording for Domino-Free Rollback-Recovery, Information Processing Letters, vol. 25, 1987.
[18] Y.M. Wang, Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints, IEEE Transactions on Computers, vol. 46, no. 4, 1997.
