
Research Report 33/2004, TU Wien, Institut für Technische Informatik, July 6, 2004

Time Free Self-Stabilizing Local Failure Detection

Martin Hutle and Josef Widder
Embedded Computing Systems Group 182/2, Technische Universität Wien, Treitlstraße 3/2, A-1040 Wien, Austria, EU

Abstract. It is widely acknowledged that failure detection is a useful building block for reliable distributed systems. Since many applications rely on it, implementations of failure detectors should be as reliable as possible. In this paper we present a failure detector implementation which tries to reconcile two approaches: self-stabilization and weak timing models. We introduce two time free self-stabilizing local failure detector implementations. The first handles an unbounded number of messages which may stem from the unstable period, but requires unbounded space. The second, more practical, implementation requires just bounded space while assuming a known upper bound on the number of messages, a reasonable assumption for many networks.

Keywords: Fault Tolerance, Self-Stabilization, Unreliable Failure Detectors, Timing Models

Supported by the Austrian bmvit FIT-IT project DCBA, project no.

1 Introduction

Unreliable failure detectors [4] are a well-known and practical way to overcome the impossibility of asynchronous consensus [7]. We focus on failure detectors in sparse networks [11, 12], i.e. networks where processes need not have a direct link to all other processes in the system. Such a low-level model of computation for failure detectors provides more efficiency and blends nicely with the fast failure detectors approach [10, 1]. Although it is possible to implement global failure detectors [11] in sparse networks, we focus on their local counterparts: processes monitor only their direct neighbors. In this paper we look at an eventually local failure detector. Such a failure detector satisfies local completeness and eventual local accuracy. A failure detector fulfills local completeness if every process eventually suspects every neighbor that has crashed; it fulfills eventual local accuracy if every process eventually stops suspecting every non-crashed neighbor. Note that in the special case of a fully connected network an eventually local failure detector becomes an eventually perfect failure detector [4].

Failure detector implementations should be as reliable as possible. We aim at two approaches here: (1) Weakening the system timing models as far as possible. Much recent work focuses on this topic; see [2] and [13, 14]. (2) Improving the reliability of the failure detector by referring to the self-stabilization paradigm [6]. In order to reconcile these approaches, algorithms have to be devised that stabilize even in the presence of permanent faults [3, 8, 5].

Regarding (1), our algorithms do not need to know any upper bound on the message end-to-end delay between processes. The only information we need is an upper bound on the ratio Θ between the upper and lower bound on the communication delay.

Regarding (2), the self-stabilization paradigm was initially proposed by Dijkstra [6]. It requires an algorithm to recover from any (invalid) state in finite time. In detail, we assume our system stabilizes at some unknown time t_GST, after which the timing assumptions hold, the number of permanent faults is bounded, and no further state corruption at correct processes may occur. On the other hand, at t_GST, processes may be in an arbitrary state and arbitrary messages may be in transit between the processes. Our analysis reveals that the number of these messages is an important parameter.

In detail, we devise two algorithms. The first one copes with an unbounded number of messages but requires unbounded space. Since we are mainly interested in practical solutions, the unbounded space requirement is not satisfactory. We hence devise a second algorithm which requires just bounded space. The required memory depends on M, the a priori known upper bound on the number of messages that may be in transit simultaneously. Since real networks are finite, we consider this upper bound as not very restrictive. For many networks, M can be derived analytically, since the capacity of links (determined by the memory allocated to queues at the various network layers) is bounded as well. Moreover, this bound has no influence on the run-time of our algorithm, and the space requirements are only logarithmic in M. So even if it is difficult to find a tight upper bound, one can still use an extremely over-dimensioned one. The second approach is therefore of greater practical relevance than the first one.
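As an aside, the following small Python snippet (an illustration added here, not part of the original report; the values of M are hypothetical) makes the logarithmic space claim concrete: the only state of the bounded-space algorithm of Section 5 that grows with M is a phase counter ranging over {0, ..., M+1}, so even a grossly over-dimensioned bound stays cheap to store.

    import math

    # The phase counter takes M + 2 distinct values, hence ceil(log2(M + 2)) bits suffice.
    for M in (100, 10**6, 10**9):
        bits = math.ceil(math.log2(M + 2))
        print(f"M = {M:>10}: phase counter fits in {bits} bits")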
Our failure detector implementation is straightforward if self-stabilization is not required: processes count round trips to neighbors. If no message from some neighbor q was received within a time interval in which Θ round trips to some other neighbor occurred, q must have crashed. In the first solution, which uses unbounded space, it is always possible to increase the local round number, and old messages can easily be recognized and dismissed, such that it can be shown that stabilization is reached in bounded time. When considering bounded memory, the major problem is that message values are reused and that, due to the desired time freedom, messages cannot be identified as old resp. faulty (since their values must be used again). We solve this problem by introducing M + 2 phases and show that within bounded time a phase must be reached during which all invalid messages are dismissed.

There is, however, the problem of deadlocks in purely time free self-stabilizing solutions. Such an algorithm requires some local event which is triggered from time to time in order to prevent that all processes wait on messages and do not send any [9]. Such an event needs to happen just eventually: it has to be triggered an infinite number of times, and the intervals between two such events have to be finite; their length has no influence on the correctness of the algorithm but only on its stabilization time. We assume the existence of such an event for our algorithms.

Self-stabilizing failure detector implementations were given in [3]. They use a property which requires that after a process has received m messages from one neighbor, it has received at least one message from every other neighbor. Although this seems to have some similarities to our timing assumption, their approach requires local clocks, whereas our algorithms are completely time and timer free.

The paper is organized as follows: In Section 2 we give the system model and in Section 3 we define the eventually local failure detector. The algorithm that uses unbounded space and handles an unbounded number of messages is presented in Section 4. In Section 5 we show an implementation that requires just bounded space, but needs an upper bound on the channel capacity. We conclude in Section 6.

2 System Model

Our distributed system comprises a finite set Π of processes, connected by a not necessarily fully connected network. We assume the existence of a global real-time clock with values from ℝ, which is used for analysis only and is not available to the processes. Two nodes connected by a direct link are called neighbors; the set of all neighbors of a process p is denoted by nb(p), and the size deg(p) = |nb(p)| of this set is called the degree of the node. We assume the communication graph satisfies ∀p ∈ Π : deg(p) > f, where f is the number of processes that may crash. We denote by v_p(t) the value of variable v at process p and time t. If p makes a step at t, v_p(t) denotes the value of v before the step, and v'_p(t) the value after the step.

2.1 Timing and Communication

We consider time free algorithms, i.e. processes have no access to hardware clocks or an external time base. Neighboring processes communicate by message passing. The time interval a message m is in transit consists of three parts: local message preparation (including queuing) at the sender, transmission over the link, and local receive computation (including queuing) at the receiver. We denote by t_s^m the instant the preparation of message m starts, and by t_r^m the instant the receive computation is finished. In our system model we say that message m is in transit during the real-time interval (t_s^m, t_r^m]. We denote by Q(p, q, t) the set of messages which are in transit from p to q or vice versa at time t, and by Q(p, t) = ∪_{q∈Π} Q(p, q, t) all messages in transit from or to p. Consequently, Q'(p, q, t) denotes the set of all messages from p to q or vice versa after a step at time t. Further, δ_m = t_r^m − t_s^m is the end-to-end (computational plus transmission) delay of a message m sent from one correct process to another.
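As a tiny illustration of the connectivity assumption ∀p ∈ Π : deg(p) > f stated above, the following Python sketch (the helper function and the example graph are hypothetical, not from the paper) checks it for a given neighbor map:

    # Sketch: verify deg(p) > f for every process, given a neighbor map nb.
    def satisfies_degree_assumption(nb: dict, f: int) -> bool:
        return all(len(neighbors) > f for neighbors in nb.values())

    # A triangle tolerates f = 1 crash: every node has degree 2 > 1.
    nb = {"p": {"q", "r"}, "q": {"p", "r"}, "r": {"p", "q"}}
    print(satisfies_degree_assumption(nb, f=1))   # True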
Our timing model stipulates an upper bound τ⁺ on the transmission delay as well as a lower bound τ⁻ such that 0 < τ⁻ ≤ δ_m ≤ τ⁺ < ∞, where τ⁻ and τ⁺ are not known in advance and need to hold only after some unknown global stabilization time t_GST. Since τ⁺ < ∞, every message sent from a correct process to another one after t_GST is eventually received. Links need not provide FIFO transmission. A measure for the timing uncertainty is the transmission delay ratio Θ = τ⁺/τ⁻. The presented failure detector implementations have a priori knowledge of some integer Ξ with Ξ > Θ. This is the only timing assumption required for our implementations; in particular, neither τ⁻ nor τ⁺ need to be known.

In order to prevent deadlocks, every process has some local mechanism that generates local deadlock prevention events from time to time. For the timing analysis we postulate that there exists an upper bound η on the duration between two such local events at every process; η is not known to the processes. Note that the actual value of η has no influence on the correctness of our algorithms. Practically, one would implement this e.g. with timers or clocks, although any local mechanism which gives some (even inaccurate) notion of elapsed time can be employed. Note, however, that our solution is time free in the sense that clocks are not used to time out messages, such that we share the advantages of time free designs.

2.2 Self-stabilization and failures

We assume that processes can fail by crashing. We denote by C the set of correct processes, i.e. processes that never crash; the set F comprises all processes that eventually crash. At some unknown time t_GST the system stabilizes. Before t_GST, the system may behave arbitrarily: arbitrary state corruptions may occur, no timing assumptions hold, and messages may be lost or spontaneously generated. At and after t_GST, no further state corruptions may occur (although the system may still be in an illegitimate state), and links are reliable and follow the timing assumptions. Moreover, every message that is in Q(p, t_GST) is delivered before t_GST + τ⁺. Process crashes may occur at any time, before and after t_GST.

3 Failure Detectors

Like most other failure detectors, our algorithm outputs a list of suspected processes. Formally, a failure detector history is a function H : Π × ℝ → 2^Π. If a process q is in H(p, t) at some time t, we say p suspects q at time t. We now define an eventually local perfect failure detector. Such a failure detector has to satisfy the same properties as an eventually perfect failure detector [4], but only for neighbors:

Local Completeness: Every process that crashes is eventually suspected by all correct neighbors. Formally,
∀p ∈ F ∀q ∈ nb(p) ∩ C : ∃t_0 ∀t ≥ t_0 : p ∈ H(q, t)

Eventual Local Accuracy: Eventually, no correct process is suspected by any correct neighbor. Formally,
∀p ∈ C ∀q ∈ nb(p) ∩ C : ∃t_0 ∀t ≥ t_0 : p ∉ H(q, t)

The class of eventually local perfect failure detectors is denoted by P_l.
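Before turning to the algorithms, here is a brief numeric illustration of the timing parameters of Section 2.1 (the delay values below are hypothetical, chosen only for the example; they are never known to the processes, only the integer Ξ is):

    # Hypothetical delay bounds in milliseconds.
    tau_minus = 2                         # lower bound on end-to-end delay
    tau_plus = 7                          # upper bound on end-to-end delay
    Theta = tau_plus / tau_minus          # 3.5
    Xi = int(Theta) + 1                   # any integer > Theta works; here Xi = 4
    print(f"Theta = {Theta}, choose Xi = {Xi}")

Overestimating Ξ is safe for correctness and only increases the detection time, so in practice a comfortable safety margin can be used.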

1  state variables
2    ∀q ∈ Π : lastmsg_p[q] ∈ ℕ
3
4  if received (p, k) from q
5    if k > lastmsg_p[q]
6      lastmsg_p[q] ← k
7      if k = max_{r∈Π} {lastmsg_p[r]} and ∄s ≠ q : lastmsg_p[s] = lastmsg_p[q]
8        suspect ← {r ∈ Π | k − lastmsg_p[r] ≥ Ξ}
9        send (p, k + 1) to all neighbors
10
11 if received (q, k) from q
12   send (q, k) to q
13
14 on deadlock-prevention-event do
15   send (p, max_{q∈Π} {lastmsg_p[q]} + 1) to all neighbors

Figure 1: Algorithm for process p with no bound on the number of messages on a link.

4 Unbounded Link Capacity

In this section we describe a novel implementation of P_l which handles an unbounded number of messages on the links; more precisely, no bound on |Q(p, t_GST)| is required for any process p. This algorithm, however, requires unbounded memory. We show that the algorithm stabilizes even if there are infinitely many messages in transit at t_GST. This result is hence also a solution for systems where links have known or unknown bounds on the maximum number of messages on the links.

The algorithm for a process p is given in Figure 1. With every neighbor of p, (p, k) messages are exchanged, where k is an integer. When a neighbor q receives such a message, q just returns it to p (lines 11-12) and no further processing is done. For every neighbor q, p holds a variable lastmsg_p[q], where it stores the highest integer k such that p received a (p, k) reply from q. We also write lastmsg_{p,q} for lastmsg_p[q]. The highest value among all lastmsg_{p,q} determines the round of process p. Thus we define round_p(t) = max_{q∈Π} {lastmsg_{p,q}(t)} and round'_p(t) = max_{q∈Π} {lastmsg'_{p,q}(t)}, respectively. Every time a new round is reached (by receiving a message (p, k) such that k > round_p), p sends a (p, round_p + 1) message to all neighbors. Note that the fastest neighbor determines the round progress, i.e. round round_p + 1 is started when the first neighbor returns the (p, round_p + 1) message to p. By our timing assumptions, this requires at least 2τ⁻ time. The reply of the slowest neighbor requires at most 2τ⁺ time; by then, round_p has advanced by at most 2τ⁺/(2τ⁻) < Ξ additional rounds. Thus, for every correct neighbor q the difference round_p − lastmsg_{p,q} is less than Ξ, whereas for every faulty neighbor p eventually stops updating lastmsg_{p,q}. So we set H(p, t) to the set of processes with round_p − lastmsg_{p,q} ≥ Ξ (line 8).

In lines 14-15, from time to time the last message is resent to every neighbor. As described in the introduction, this prevents the algorithm from deadlocking if messages are lost during the unstable period. Note that this has no influence on the operation of the algorithm: since all messages with k ≤ lastmsg_{p,q} are dropped, only the first message that is received for a round has an influence on the behavior of p. For the timing analysis we assume that lines 14-15 are activated at least once every η time units; this is not required by the algorithm itself. We start the analysis with some preliminary lemmas.
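Before the analysis, the handlers of Figure 1 can be rendered as the following minimal Python sketch (an illustration under the stated assumptions, not the authors' implementation; the class name and the send(q, msg) transport callback are hypothetical):

    class UnboundedFD:
        """Sketch of Figure 1 for one process; send(q, msg) is supplied by the environment."""
        def __init__(self, me, neighbors, Xi, send):
            self.me, self.neighbors, self.Xi, self.send = me, neighbors, Xi, send
            self.lastmsg = {q: 0 for q in neighbors}      # lastmsg_p[q]
            self.suspect = set()                          # failure detector output

        def round(self):
            return max(self.lastmsg.values())             # round_p

        def on_receive(self, msg, sender):
            owner, k = msg
            if owner != self.me:                          # lines 11-12: echo the probe
                self.send(sender, (owner, k))
                return
            if k > self.lastmsg[sender]:                  # line 5
                new_round = all(v != k for v in self.lastmsg.values())
                self.lastmsg[sender] = k                  # line 6
                if k == self.round() and new_round:       # line 7: first reply of round k
                    self.suspect = {r for r, v in self.lastmsg.items()
                                    if k - v >= self.Xi}  # line 8
                    for q in self.neighbors:              # line 9
                        self.send(q, (self.me, k + 1))

        def on_deadlock_prevention(self):                 # lines 14-15
            for q in self.neighbors:
                self.send(q, (self.me, self.round() + 1))

Since the algorithm is self-stabilizing, the initial values of lastmsg are irrelevant; the sketch sets them to 0 only for convenience.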

Lemma 1 (Monotonicity). After t_GST, round_p(t) is monotonically increasing with time t, i.e. t_GST ≤ t_1 ≤ t implies round_p(t_1) ≤ round_p(t) ≤ round'_p(t).

Proof. Obvious, since round_p is the maximum of all lastmsg_{p,q}, and by lines 5 and 6 lastmsg_{p,q} is monotonically increasing.

Lemma 2 (Progress). There is a time t, t_GST ≤ t < t_GST + max{2τ⁺, η}, such that p broadcasts (p, round'_p(t) + 1) at time t.

Proof. At time t_GST we have to distinguish two cases:

(1) There is at least one neighbor q of p such that at least one message (p, l) with l > round_p(t_GST) is in Q(p, q, t_GST). Let (p, k) be the first of these messages to be received by p, at some time t ≥ t_GST. Obviously, t < t_GST + 2τ⁺. Since by assumption this is the first message after t_GST that changes round_p, we have round_p(t) = round_p(t_GST). Thus, lastmsg_{p,q}(t) ≤ round_p(t) < k, so p executes lines 6 and 9 and therefore broadcasts a (p, round'_p(t) + 1) message, where round'_p(t) = k.

(2) No such message exists. Then by time t = t_GST + η, line 15 is executed, and again (p, round'_p(t) + 1) is broadcast.

Thus by time t_GST + max{2τ⁺, η} the required message is broadcast in both cases.

Lemma 3 (Stabilization). For every message (p, k) which is received by p at some time t ≥ t_GST + 2τ⁺, it holds that k ≤ round_p(t) + 1.

Proof. Since sending a message to a neighbor and receiving the answer takes at most 2τ⁺, there is a time t_1 ≥ t_GST at which p broadcast (p, k). However, since we are after t_GST, k must be equal to round'_p(t_1) + 1, since p can broadcast only (p, round'_p(t_1) + 1) messages (lines 9, 15). By Lemma 1, round_p is monotonically increasing with time. Therefore, from t ≥ t_1 it follows that round_p(t) + 1 ≥ round'_p(t_1) + 1 = k.

Let t_stable = t_GST + max{2τ⁺, η}; from this time on we have progress (Lemma 2) and a correct message pattern (Lemma 3).

Lemma 4 (Fastest Progress). Let correct process p broadcast (p, k) at some time t ≥ t_stable for the first time. Then p does not broadcast (p, k + l) before time t + 2lτ⁻.

Proof. By induction on l. For l = 1, assume by contradiction that p broadcasts (p, k + 1) before time t + 2τ⁻. If (p, k + 1) were sent by line 15, it must have been sent by line 9 before (since we are after t_stable and, by Lemma 2, at least one message has been sent by line 9), so this would not be the first time. Hence (p, k + 1) is sent by line 9, which means p received a (p, k) reply that was sent before time t + τ⁻ by some q (as a response, lines 11-12), and hence broadcast before time t by p. Contradiction. Now assume p does not broadcast (p, k + l − 1) before t + 2(l − 1)τ⁻ for the first time. With the same argumentation, p does not broadcast (p, k + l) before time t + 2lτ⁻.

Lemma 5 (Slowest Progress). Let correct process p broadcast (p, k) at some time t > t_stable for the first time. Then p broadcasts (p, k + l) by time t + 2lτ⁺.

Proof. By induction on l. Since p broadcasts (p, k) at time t, all neighbors of p receive this message by time t + τ⁺, and every correct neighbor (since deg(p) > f there exists at least one) returns it. These replies are received by p by time t + 2τ⁺. Consider the reception of the first of these replies. Because of Lemma 3 (note that k = round'_p(t) + 1), p receives no message (p, k') with k' > k by t + 2τ⁺, thus p broadcasts (p, k + 1). For l > 1, assume p broadcasts (p, k + l − 1) by time t + 2(l − 1)τ⁺. With the same argumentation, p broadcasts (p, k + l) by time t + 2lτ⁺.

Lemma 6. For every time t > t_stable + 2τ⁺Ξ and every correct neighbor q of correct process p, it holds that round_p(t) − lastmsg_{p,q}(t) < Ξ.

Proof. Since all variables are non-decreasing, the condition can only be violated by increasing round_p. So we consider only times t where round'_p(t) > round_p(t). By Lemma 3 we have round'_p(t) = round_p(t) + 1. By Lemmas 2, 3 and 4, a message (p, round'_p(t) − Ξ + 1) was broadcast by p at some time t_s ≤ t − 2τ⁻Ξ, and by Lemma 5, t_s > t_stable, so this message really exists. This message is answered by every correct neighbor q, and the reply is received by some time t_r ≤ t_s + 2τ⁺, thus lastmsg'_{p,q}(t_r) ≥ round'_p(t) − Ξ + 1 > round'_p(t) − Ξ. Because of Ξ > Θ we have t_r ≤ t_s + 2τ⁺ ≤ t − 2τ⁻Ξ + 2τ⁺ ≤ t, and so from Lemma 1 it follows that lastmsg_{p,q}(t) ≥ lastmsg'_{p,q}(t_r). Hence we get round'_p(t) − lastmsg_{p,q}(t) < Ξ.

Theorem 1 (Local Completeness). The algorithm in Figure 1 fulfills local completeness, i.e. eventually every non-correct neighbor of p is suspected by p.

Proof. Let t_crash be the time q crashes, and t = max{t_crash, t_GST}. Then no message from q to p is received after t + τ⁺; after this, lastmsg_{p,q} remains unchanged. By Lemma 5, round_p reaches round_p(t) + Ξ by time max{t_crash + τ⁺, t_stable} + 2Ξτ⁺. Since round_p is non-decreasing, q remains suspected from then on.

Theorem 2 (Eventual Local Accuracy). The algorithm in Figure 1 fulfills eventual local accuracy, i.e. eventually p stops suspecting every correct neighbor of p.

Proof. Follows directly from Lemma 6 and line 8 of the algorithm.

Corollary 1. The algorithm in Figure 1 implements a self-stabilizing eventually perfect local failure detector in a sparse network. After stabilization, a crashed process is suspected Ξ rounds after it crashed, i.e. the worst case failure detection time is 2Ξτ⁺.

5 Bounded Memory and Link Capacity

In this section we give a solution to failure detection which requires that the number of messages that can be in transit at the same time over one link is bounded, and that this bound is known in advance. In contrast to the algorithm of the previous section, this one requires just bounded memory. We believe that this result is of practical interest: real computers have bounded memory, which is not only used to store the variables of our algorithms, but also to store messages in various queues. Since queues are the significant parts of links, the assumption that the number of messages is bounded seems reasonable to us.

The algorithm is depicted in Figure 2. Conceptually, it works similarly to the one in the previous section. However, since our integers are bounded, eventually we need to wrap around the round number. We call one such cycle a phase. To avoid that messages of a previous phase interfere with the current one, we use phase numbers. Since the range of the phase numbers is also bounded due to the bounded memory assumption, we have to ensure that there are sufficiently many distinct phase numbers such that no interference is possible. We show in our analysis that if there are at most M messages in all links of a process, M + 2 phases are sufficient to ensure stabilization. The idea behind this is that there exists at least one phase which cannot be shortened by faulty messages from the unstable period. The second difference to the algorithm in Figure 1 is that this algorithm broadcasts only on a phase switch, whereas the previous one broadcasts every round. By our assumption, |Q(p, t_GST)| ≤ M < ∞ for all processes p.
For any phase number ph we further define next(ph) = (ph + 1) mod (M + 2) and prev(ph) = (ph + M + 1) mod (M + 2).
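For concreteness, the wrap-around arithmetic can be written as follows (a small illustration; M = 3 is an arbitrary example value):

    def next_phase(ph: int, M: int) -> int:
        return (ph + 1) % (M + 2)

    def prev_phase(ph: int, M: int) -> int:
        return (ph + M + 1) % (M + 2)

    # With M = 3 there are M + 2 = 5 phase numbers 0..4:
    print([next_phase(ph, 3) for ph in range(5)])   # [1, 2, 3, 4, 0]
    print([prev_phase(ph, 3) for ph in range(5)])   # [4, 0, 1, 2, 3]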

1  state variables
2    phase_p ∈ {0, ..., M + 1}
3    ∀q ∈ Π : lastmsg_p[q] ∈ {0, ..., Ξ}
4
5  if received (p, ph, k) from q
6    if ph = phase_p and k > lastmsg_p[q]
7      if k < Ξ
8        lastmsg_p[q] ← k
9        send (p, phase_p, k + 1) to q
10     else
11       suspect ← {r ∈ Π | lastmsg_p[r] = 0}
12       phase_p ← (phase_p + 1) mod (M + 2)
13       ∀r ∈ Π : lastmsg_p[r] ← 0
14       send (p, phase_p, 1) to all neighbors
15
16 if received (q, ph, k) from q
17   send (q, ph, k) to q
18
19 on deadlock-prevention-event do
20   ∀q ∈ Π : send (p, phase_p, lastmsg_p[q] + 1) to q

Figure 2: Algorithm for process p with a known upper bound on the number of messages on links.

Lemma 7. For every process p, in any execution of our algorithm, there exists at least one phase number ph_0 such that no message (p, ph_0, k) is in Q(p, t_GST) and ph_0 ≠ phase_p(t_GST).

Proof. Obviously, |Q(p, t_GST)| = x ≤ M. At time t_GST process p can be in one phase only. The number of phase numbers is M + 2 > x + 1, such that at least one phase number remains.

We now define two properties that characterize the legitimate states for process p. When the stability property is fulfilled, it is ensured that there are no malicious messages in transit anymore. The progress property ensures that the system is not deadlocked, i.e. there are sufficiently many messages in transit to keep the failure detector working.

Definition 1 (Stability). For a process p, the predicate PS(p, t) holds at time t iff there is no next(phase_p(t)) message in transit. Formally,
PS(p, t) ≡ ∄k : (p, next(phase_p(t)), k) ∈ Q(p, t)

Definition 2 (Progress). For a process p, the predicate PP(p, t) holds at time t iff there is at least one correct neighbor q of p from or to which a message with k > lastmsg_{p,q}(t) and the current phase is in transit. Formally,
PP(p, t) ≡ ∃q ∈ C ∩ nb(p) ∃k > lastmsg_{p,q}(t) : (p, phase_p(t), k) ∈ Q(p, q, t)

We start by showing closure of progress, i.e. if PP holds once after t_GST it holds forever.

Lemma 8. If there is a time t_0 ≥ t_GST at which PP(p, t_0) holds, then PP(p, t) holds for all times t > t_0. Formally,
∀t_0 ≥ t_GST : PP(p, t_0) ⇒ ∀t > t_0 : PP(p, t)

Proof. Assume by contradiction that there is a time t > t_0 where PP(p, t) does not hold for the first time. Since by assumption the predicate held before that, for some non-faulty q, either lastmsg'_{p,q}(t) ≠ lastmsg_{p,q}(t) or (p, phase_p(t), k) ∉ Q'(p, q, t) anymore. In both cases, p has received a (p, phase_p(t), k') message with k' > lastmsg_{p,q}(t) and thus either sends a (p, phase_p(t), lastmsg'_{p,q}(t) + 1) message to q (lines 8-9) or sends a (p, phase'_p(t), 1) message to all neighbors and sets lastmsg_{p,q} = 0 (lines 11-14). In both cases the property also holds at time t. Contradiction.
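Before continuing the analysis, note that the handlers of Figure 2 admit a compact Python rendering much like the sketch in Section 4 (again only an illustrative sketch; the class name and the send(q, msg) callback are hypothetical):

    class BoundedFD:
        """Sketch of Figure 2 for one process; send(q, msg) is supplied by the environment."""
        def __init__(self, me, neighbors, Xi, M, send):
            self.me, self.neighbors, self.Xi, self.M, self.send = me, neighbors, Xi, M, send
            self.phase = 0                                # phase_p in {0, ..., M+1}
            self.lastmsg = {q: 0 for q in neighbors}      # lastmsg_p[q] in {0, ..., Xi}
            self.suspect = set()

        def on_receive(self, msg, sender):
            owner, ph, k = msg
            if owner != self.me:                          # lines 16-17: echo the probe
                self.send(sender, (owner, ph, k))
                return
            if ph == self.phase and k > self.lastmsg[sender]:   # line 6
                if k < self.Xi:                           # lines 7-9
                    self.lastmsg[sender] = k
                    self.send(sender, (self.me, self.phase, k + 1))
                else:                                     # lines 10-14: end of phase
                    self.suspect = {r for r, v in self.lastmsg.items() if v == 0}
                    self.phase = (self.phase + 1) % (self.M + 2)
                    self.lastmsg = {r: 0 for r in self.neighbors}
                    for q in self.neighbors:
                        self.send(q, (self.me, self.phase, 1))

        def on_deadlock_prevention(self):                 # lines 19-20
            for q in self.neighbors:
                self.send(q, (self.me, self.phase, self.lastmsg[q] + 1))

Unlike the sketch for Figure 1, the suspect set is recomputed only when the phase wraps around (line 11).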

The following lemma ensures convergence to PP, i.e. within bounded time after t_GST it is guaranteed that PP holds.

Lemma 9. By time t_GST + η, PP(p, t) holds forever.

Proof. By time t_GST + η, p sends a (p, phase_p, lastmsg_{p,q} + 1) message to every neighbor q. Since f < deg(p), at least one of them is non-faulty and thus PP holds for p at this time. By Lemma 8, PP holds forever after that.

We have seen that our algorithm stabilizes such that PP always holds within bounded time after t_GST. We now turn our attention to the PS property and start with some preliminary lemmas.

Lemma 10 (Fastest Progress). Assume p starts phase ph = phase_p(t) by broadcasting (p, ph, 1) at time t, and PS(p, t) holds. Then p does not broadcast (p, next(ph), 1) before time t + 2τ⁻Ξ > t + 2τ⁺.

Proof. Note that p broadcasts (p, next(ph), 1) only if it receives a (p, ph, Ξ) message from one of its neighbors. We show by induction that this cannot happen before t + 2τ⁻Ξ. For the base case, note that sending (p, ph, 1) to some neighbor q and receiving the answer takes at least 2τ⁻, and since by PS(p, t) no other messages with phase ph are in transit, p cannot receive a (p, ph, 1) reply before t + 2τ⁻. Because p is still in phase ph, no messages with other phases were broadcast, thus PS still holds. Now assume p does not receive a (p, ph, Ξ − 1) message before t + 2τ⁻(Ξ − 1). By the same argumentation, the (p, ph, Ξ) message is not received before t + 2τ⁻Ξ and PS holds. Since Ξ > Θ (compare Section 2.1), t + 2τ⁻Ξ > t + 2τ⁺.

Lemma 11 (Slowest Progress). Assume p starts phase ph by broadcasting (p, ph, 1) at time t. Then p broadcasts (p, next(ph), 1) by time t + 2τ⁺Ξ.

Proof. Note that p broadcasts (p, next(ph), 1) if it receives a (p, ph, Ξ) message from one of its neighbors and is still in phase ph. If p is no longer in phase ph we are done, so we show by induction that p receives a (p, ph, Ξ) message by time t + 2τ⁺Ξ. Sending a message to a neighbor and back requires at most 2τ⁺, thus by time t + 2τ⁺, p receives (p, ph, 1). For the inductive step assume p receives (p, ph, Ξ − 1) by time t + 2τ⁺(Ξ − 1). Then by the same argumentation p receives (p, ph, Ξ) by time t + 2τ⁺Ξ.

Lemma 12. Assume phase_p(t) = ph. Then phase_p(t_1) = prev(ph) for some time t_1 > t > t_GST only if p was in all phases in the time interval [t, t_1].

Proof. By line 12 of the algorithm, p changes its phase only to next(phase_p(t)) and thus has to pass through all other phase numbers before reaching prev(phase_p(t)).

We now show that PS is reached shortly after t_GST.

Lemma 13. PS(p, t) holds at time t = t_GST + 2τ⁺.

Proof. We have to show that no message (p, l, k), for any k and l = next(phase_p(t)), is in transit at time t. Such a message cannot be in Q(p, t_GST), since all these messages are received by t_GST + τ⁺. Such a message cannot be a reply from one of p's neighbors to a faulty message which was in Q(p, t_GST) either, since all these responses must be received by p before t. Thus, a message (p, l, k) can only be in transit at time t if p was in phase l at some time t_1, t_GST ≤ t_1 ≤ t. It remains to show that this is not possible. Since p is in phase prev(l) at time t, it must, by Lemma 12, have been in all phases 0, ..., M + 1 between time t_1 and t. Thus there must be some time t_2, t_1 ≤ t_2 ≤ t, such that phase_p(t_2) = prev(ph_0), i.e. phase ph_0 was started then. It follows that PS(p, t_2) holds. By Lemma 10 this phase cannot be terminated before some time t_3 > t_2 + 2τ⁺ ≥ t, which is a contradiction to p being in phase prev(l) at time t.

It remains to show closure, i.e. if PS is reached once, it holds forever.

Definition 3. We define σ(p, t, ph) as the first time after t at which p reaches phase ph. Formally,
σ(p, t, ph) = min{t' > t | phase_p(t') = ph ∧ ∄t'' : t < t'' < t' ∧ phase_p(t'') = ph}

Lemma 14. From PS(p, t), where t ≥ t_GST, it follows that PS(p, t_1) holds for all times t_1 with t ≤ t_1 < σ(p, t, next(phase_p(t))).

Proof. Obvious, since phase_p remains unchanged, no spontaneous messages are generated after t_GST, and p sends phase_p(t) messages only.

Lemma 15. Let PS(p, t_1) hold at time t_1 = σ(p, t, ph), where t ≥ t_GST. Then PS(p, t_2) holds at time t_2 = σ(p, t, next(ph)).

Proof. By Lemma 10, p terminates phase ph only after σ(p, t, ph) + 2τ⁺. All messages which are in transit to p at time σ(p, t, ph) are received by time σ(p, t, ph) + τ⁺. All messages for phases other than ph are ignored by p (and hence no messages are sent in response). All messages for phases other than ph which are in transit from p to its neighbors are answered by them by line 17; these answers are received by p by time σ(p, t, ph) + 2τ⁺ and are ignored as well, since p is still in phase ph. Hence no messages for phases other than ph are in transit at time t_2. Since next(next(ph)) ≠ ph, the lemma holds.

Lemma 16. After time t_stable = t_GST + 2τ⁺, PS holds at all phase switch times.

Proof. By Lemma 13, PS(p, t) holds at time t = t_GST + 2τ⁺. By Lemma 14 it follows that PS(p, t_1) holds for all times t_1 with t ≤ t_1 < σ(p, t, next(phase_p(t))). From an inductive application of Lemma 15 it follows that PS holds at all phase switch times after that.

From these lemmas it follows that, after some time, all phases are sufficiently long to time out processes. We now show local completeness and local accuracy.

Theorem 3 (Local Completeness). The algorithm in Figure 2 fulfills local completeness, i.e. eventually every non-correct neighbor of p is suspected by p.

Proof. Assume neighbor q of p has crashed. By Lemma 9, PP holds by time t_GST + η. Note that every message (p, ph, k) with k > lastmsg_{p,q'} and ph = phase_p received from some neighbor q' causes either a (p, ph, k + 1) message (for k < Ξ) or a (p, next(ph), 1) message. Consequently, p eventually reaches k = Ξ and switches to the next phase (lines 11-14). When p reaches k = Ξ in the next phase, lastmsg_{p,q} = 0, since there was no message from q. According to line 11, p suspects q.

Theorem 4 (Eventual Local Accuracy). The algorithm in Figure 2 fulfills eventual local accuracy, i.e. eventually p stops suspecting every correct neighbor of p.

Proof. By Lemma 16 and Lemma 10, all phases that are started after t_GST + 2τ⁺ are longer than 2τ⁺. This is sufficiently long to ensure that the answers of all correct neighbors q to p's (p, ph, 1) message are received by p before p executes line 11 at some time t. It follows that lastmsg_{p,q}(t) > 0 for every correct neighbor q when p executes line 11, such that no correct process will ever be suspected by p.

Corollary 2. The algorithm in Figure 2 implements a self-stabilizing eventually perfect local failure detector in a sparse network. When a process crashes in a phase (after replying to at least one message), it is suspected at the end of the next phase. Thus, it is easy to see that the worst case failure detection time, once the failure detector has stabilized, is (4Ξ − 1)τ⁺.
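A quick numeric reading of the two detection-time bounds (Corollary 1 and Corollary 2), with hypothetical parameter values:

    # Hypothetical values: tau_plus = 10 ms, Xi = 5 (so Theta < 5 is assumed).
    tau_plus_ms, Xi = 10, 5
    print("Figure 1 (Corollary 1):", 2 * Xi * tau_plus_ms, "ms")        # 100 ms
    print("Figure 2 (Corollary 2):", (4 * Xi - 1) * tau_plus_ms, "ms")  # 190 ms

The bounded-space algorithm thus pays roughly a factor of two in detection latency for its bounded memory, reflecting that a crash is only detected at the end of the following phase.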

6 Conclusions

We presented two time free implementations of P_l which are self-stabilizing. The results described in this paper are stated in the context of sparse networks; obviously they apply to fully connected networks as well. We provided two algorithms for a self-stabilizing local failure detector in a sparse network, using only weak timing assumptions. The existence of a bound on the channel capacity can be identified as an additional design parameter. Whereas the solution with unbounded space is only of theoretical interest, it is the second approach that is relevant in practice. The assumption of bounded channel capacity is valid in most systems. Moreover, the chosen bound has no effect on the runtime of the algorithm, and the message size is only logarithmic in this bound.

It is an open question whether there is a solution for unbounded channel capacity and bounded space. Although unbounded channel capacity may not be of practical interest, the subject is at least of theoretical relevance. Furthermore, the question remains whether there is a solution, requiring just bounded space, when there is only an unknown bound on the channel capacity.

References

[1] Marcos Aguilera, Gérard Le Lann, and Sam Toueg. On the impact of fast failure detectors on real-time fault-tolerant systems. In Proceedings of the 16th International Symposium on Distributed Computing (DISC 02), volume 2508 of LNCS. Springer Verlag, October.

[2] Marcos K. Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. On implementing Omega with weak reliability and synchrony assumptions. In Proceedings of the 22nd Annual ACM Symposium on Principles of Distributed Computing (PODC 03).

[3] J. Beauquier and S. Kekkonen. Fault-tolerance and self-stabilization: Impossibility results and solutions using self-stabilizing failure detectors. International Journal of Systems Science, 28(11).

[4] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2), March.

[5] Ariel Daliot, Danny Dolev, and Hanna Parnas. Linear time byzantine self-stabilizing clock synchronization. In Proceedings of the 7th International Conference on Principles of Distributed Systems, December. To appear.

[6] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11).

[7] Michael J. Fischer, Nancy A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), April.

[8] Felix Gärtner. On crash failures and self-stabilization. Presentation at Journées Internationales sur l'auto-stabilisation, CIRM, Luminy, France, October 21-25, 2002.

[9] Mohamed G. Gouda and Nicholas J. Multari. Stabilizing communication protocols. IEEE Transactions on Computers, 40(4), April.

[10] J.-F. Hermant and Gérard Le Lann. Fast asynchronous uniform consensus in real-time distributed systems. IEEE Transactions on Computers, 51(8), August.

[11] Martin Hutle. An efficient failure detector for sparsely connected networks. In Proc. IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 04), Innsbruck, Austria, February.

[12] Martin Hutle. On omega in sparse networks. In Proc. 10th International Symposium on Pacific Rim Dependable Computing (PRDC 04), Papeete, Tahiti, March.

[13] Gérard Le Lann and Ulrich Schmid. How to implement a timer-free perfect failure detector in partially synchronous systems. Technical Report 183/1-127, Department of Automation, Technische Universität Wien, January. (Submitted.)

[14] Josef Widder. Booting clock synchronization in partially synchronous systems. In Proceedings of the 17th International Symposium on Distributed Computing (DISC 03), volume 2848 of LNCS. Springer Verlag, October.


Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation

Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation Logical Time Nicola Dragoni Embedded Systems Engineering DTU Compute 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation 2013 ACM Turing Award:

More information

Causality and Time. The Happens-Before Relation

Causality and Time. The Happens-Before Relation Causality and Time The Happens-Before Relation Because executions are sequences of events, they induce a total order on all the events It is possible that two events by different processors do not influence

More information

Time. To do. q Physical clocks q Logical clocks

Time. To do. q Physical clocks q Logical clocks Time To do q Physical clocks q Logical clocks Events, process states and clocks A distributed system A collection P of N single-threaded processes (p i, i = 1,, N) without shared memory The processes in

More information

Genuine atomic multicast in asynchronous distributed systems

Genuine atomic multicast in asynchronous distributed systems Theoretical Computer Science 254 (2001) 297 316 www.elsevier.com/locate/tcs Genuine atomic multicast in asynchronous distributed systems Rachid Guerraoui, Andre Schiper Departement d Informatique, Ecole

More information

Do we have a quorum?

Do we have a quorum? Do we have a quorum? Quorum Systems Given a set U of servers, U = n: A quorum system is a set Q 2 U such that Q 1, Q 2 Q : Q 1 Q 2 Each Q in Q is a quorum How quorum systems work: A read/write shared register

More information

A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence

A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence A Self-Stabilizing Minimal Dominating Set Algorithm with Safe Convergence Hirotsugu Kakugawa and Toshimitsu Masuzawa Department of Computer Science Graduate School of Information Science and Technology

More information

Simultaneous Consensus Tasks: A Tighter Characterization of Set-Consensus

Simultaneous Consensus Tasks: A Tighter Characterization of Set-Consensus Simultaneous Consensus Tasks: A Tighter Characterization of Set-Consensus Yehuda Afek 1, Eli Gafni 2, Sergio Rajsbaum 3, Michel Raynal 4, and Corentin Travers 4 1 Computer Science Department, Tel-Aviv

More information

Asynchronous Leasing

Asynchronous Leasing Asynchronous Leasing Romain Boichat Partha Dutta Rachid Guerraoui Distributed Programming Laboratory Swiss Federal Institute of Technology in Lausanne Abstract Leasing is a very effective way to improve

More information

How to solve consensus in the smallest window of synchrony

How to solve consensus in the smallest window of synchrony How to solve consensus in the smallest window of synchrony Dan Alistarh 1, Seth Gilbert 1, Rachid Guerraoui 1, and Corentin Travers 2 1 EPFL LPD, Bat INR 310, Station 14, 1015 Lausanne, Switzerland 2 Universidad

More information

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont Benchmarking Model Checkers with Distributed Algorithms Étienne Coulouma-Dupont November 24, 2011 Introduction The Consensus Problem Consensus : application Paxos LastVoting Hypothesis The Algorithm Analysis

More information

Agreement. Today. l Coordination and agreement in group communication. l Consensus

Agreement. Today. l Coordination and agreement in group communication. l Consensus Agreement Today l Coordination and agreement in group communication l Consensus Events and process states " A distributed system a collection P of N singlethreaded processes w/o shared memory Each process

More information

Byzantine agreement with homonyms

Byzantine agreement with homonyms Distrib. Comput. (013) 6:31 340 DOI 10.1007/s00446-013-0190-3 Byzantine agreement with homonyms Carole Delporte-Gallet Hugues Fauconnier Rachid Guerraoui Anne-Marie Kermarrec Eric Ruppert Hung Tran-The

More information

6.852: Distributed Algorithms Fall, Class 24

6.852: Distributed Algorithms Fall, Class 24 6.852: Distributed Algorithms Fall, 2009 Class 24 Today s plan Self-stabilization Self-stabilizing algorithms: Breadth-first spanning tree Mutual exclusion Composing self-stabilizing algorithms Making

More information

Verification of clock synchronization algorithm (Original Welch-Lynch algorithm and adaptation to TTA)

Verification of clock synchronization algorithm (Original Welch-Lynch algorithm and adaptation to TTA) Verification of clock synchronization algorithm (Original Welch-Lynch algorithm and adaptation to TTA) Christian Mueller November 25, 2005 1 Contents 1 Clock synchronization in general 3 1.1 Introduction............................

More information

Snap-Stabilizing PIF and Useless Computations

Snap-Stabilizing PIF and Useless Computations Snap-Stabilizing PIF and Useless Computations Alain Cournier Stéphane Devismes Vincent Villain LaRIA, CNRS FRE 2733 Université de Picardie Jules Verne, Amiens (France) Abstract A snap-stabilizing protocol,

More information

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model?

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Jaap-Henk Hoepman Department of Computer Science, University of Twente, the Netherlands hoepman@cs.utwente.nl

More information

EFFICIENT COUNTING WITH OPTIMAL RESILIENCE. Lenzen, Christoph.

EFFICIENT COUNTING WITH OPTIMAL RESILIENCE. Lenzen, Christoph. https://helda.helsinki.fi EFFICIENT COUNTING WITH OPTIMAL RESILIENCE Lenzen, Christoph 2017 Lenzen, C, Rybicki, J & Suomela, J 2017, ' EFFICIENT COUNTING WITH OPTIMAL RESILIENCE ' SIAM Journal on Computing,

More information

I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS

I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS I R I P U B L I C A T I O N I N T E R N E N o 1599 S INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTÈMES ALÉATOIRES A THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS ROY FRIEDMAN, ACHOUR MOSTEFAOUI,

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 6 (version April 7, 28) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.2. Tel: (2)

More information

Byzantine Agreement. Gábor Mészáros. CEU Budapest, Hungary

Byzantine Agreement. Gábor Mészáros. CEU Budapest, Hungary CEU Budapest, Hungary 1453 AD, Byzantium Distibuted Systems Communication System Model Distibuted Systems Communication System Model G = (V, E) simple graph Distibuted Systems Communication System Model

More information

A Connectivity Model for Agreement in Dynamic Systems

A Connectivity Model for Agreement in Dynamic Systems A Connectivity Model for Agreement in Dynamic Systems Carlos Gómez-Calzado 1, Arnaud Casteigts 2, Alberto Lafuente 1, and Mikel Larrea 1 1 University of the Basque Country UPV/EHU, Spain {carlos.gomez,

More information

Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony. Eli Gafni. Computer Science Department U.S.A.

Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony. Eli Gafni. Computer Science Department U.S.A. Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony (Extended Abstract) Eli Gafni (eli@cs.ucla.edu) Computer Science Department University of California, Los Angeles Los Angeles, CA 90024

More information

Variations on Itai-Rodeh Leader Election for Anonymous Rings and their Analysis in PRISM

Variations on Itai-Rodeh Leader Election for Anonymous Rings and their Analysis in PRISM Variations on Itai-Rodeh Leader Election for Anonymous Rings and their Analysis in PRISM Wan Fokkink (Vrije Universiteit, Section Theoretical Computer Science CWI, Embedded Systems Group Amsterdam, The

More information

On the weakest failure detector ever

On the weakest failure detector ever Distrib. Comput. (2009) 21:353 366 DOI 10.1007/s00446-009-0079-3 On the weakest failure detector ever Rachid Guerraoui Maurice Herlihy Petr Kuznetsov Nancy Lynch Calvin Newport Received: 24 August 2007

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1 Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1 Stavros Tripakis 2 VERIMAG Technical Report TR-2004-26 November 2004 Abstract We introduce problems of decentralized

More information

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 06: Synchronization Version: November 16, 2009 2 / 39 Contents Chapter

More information

Optimal clock synchronization revisited: Upper and lower bounds in real-time systems

Optimal clock synchronization revisited: Upper and lower bounds in real-time systems Research Report 71/006, Technische Universität Wien, Institut für Technische Informatik, 006 July 5, 006 Optimal clock synchronization revisited: Upper and lower bounds in real-time systems Heinrich Moser

More information