Wait-Free Dining Under Eventual Weak Exclusion

Scott M. Pike, Yantao Song, and Kaustav Ghoshal
Texas A&M University, Department of Computer Science
College Station, TX, USA
{pike, yantao,

Technical Report: TAMU-CS-TR

Abstract

We explore dining philosophers under eventual weak exclusion in environments subject to permanent crash faults. Eventual weak exclusion permits neighboring diners to eat concurrently only finitely many times, but requires that, for each run, there exists a (potentially unknown) time after which live neighbors never eat simultaneously. This safety property models systems where resources can be recovered from crashed processes or where sharing violations precipitate only transient faults. Although dining under eventual weak exclusion can be solved in synchronous environments, the problem is unsolvable in asynchronous systems, where crash faults can precipitate permanent starvation among other diners. We present the first wait-free solution to this problem under partial synchrony, along with a careful proof of correctness. Our solution is related to the hygienic algorithm of Chandy and Misra insofar as we use forks for safety and a dynamic ordering of process priorities for progress. Additionally, we use a local refinement ◇P1 of the eventually perfect failure detector ◇P to guarantee wait-freedom in the presence of crash faults. This localized oracle provides information only about immediate neighbors and, as such, is fundamental to the scalability of our approach, since it is implementable in sparse communication graphs that are partitionable by crash faults. Our results have potential applications to duty-cycle scheduling in sensor networks, as well as distributed daemon refinement for self-stabilizing algorithms.
Keywords: Dining Philosophers, Failure Detectors, Fault Tolerance, Wait-Freedom, Stabilization

Technical Report: TAMU-CS-TR, Texas A&M University, May. This work was supported by the Advanced Research Program of the Texas Higher Education Coordinating Board under Project Number

1 Introduction

Distributed processes require access to a variety of run-time resources such as memory, bandwidth, ports, and CPU cycles. A resource is a generic concept that can denote any entity necessary for computation or communication. Resources can be physical (e.g., routers, wireless access points), logical (e.g., time slots, channels), or virtual (e.g., file handles, virtual machines). Resources are often finite in supply and non-trivial in cost. In distributed environments, demand for resource utilization often exceeds supply, thereby creating a scarcity of resources that must be shared or allocated among competing processes.

Problem Statement. We consider the dining philosophers problem [8, 15] in environments subject to permanent crash faults, and explore its solvability in the presence of unreliable failure detectors [4]. Dining is a fundamental model of static resource allocation, where processes in a distributed system require periodic access to a fixed subset of mutually exclusive resources. Processes with overlapping resource needs are connected as neighbors in an abstract conflict graph. Additionally, each diner is either thinking, hungry, or eating. These states correspond to three basic phases of computation: executing independently, requesting shared resources, and utilizing resources in a critical section, respectively.

The progress specification for dining requires that every hungry process eventually eats. Various forms of the safety specification restrict concurrent eating among neighbors. The traditional safety constraint of strong exclusion requires that no two neighbors ever eat simultaneously. Strong exclusion is trivially unsolvable in faulty environments: if any diner crashes while eating, all adjacent hungry neighbors must starve, since otherwise they would violate safety. This result is independent of whether crashes can be detected reliably.
As such, strong exclusion is best suited only to resources that can be permanently corrupted by crashes. The more flexible model of weak exclusion requires only that no two live neighbors eat simultaneously; in particular, live diners may eat simultaneously with crashed neighbors. Recent results on fault-tolerant mutual exclusion indicate that this variation of dining is solvable in environments with Trusting oracles, a relatively powerful class of failure detectors that can reliably detect certain crashes [7]. Weak exclusion remains unsolvable, however, with less powerful detectors that are implementable in weaker models of partial synchrony. For example, with only an eventually perfect failure detector ◇P, neighbors of a crashed process can still starve [17].

Primary Results. We explore dining under the safety constraint of eventual weak exclusion, and show that it is solvable using the oracle ◇P in asynchronous environments subject to arbitrarily many crashes. Eventual weak exclusion, ◇WX, permits neighboring diners to eat concurrently only finitely many times. More specifically, ◇WX requires that, for each run, there exists a time after which no two live neighbors eat simultaneously. The time to convergence may be unknown, and it can vary from run to run. Still, this model of safety is sufficiently powerful to serve as a scheduling primitive for several practical applications. In particular, ◇WX models systems where resources can be recovered from crashed processes, and/or where sharing violations precipitate only transient faults. We sketch some relevant examples in Section 2.

Our approach is based on the eventually perfect failure detector ◇P from the original hierarchy of Chandra-Toueg oracles [4]. Detectors in this class always suspect crashed processes, and eventually stop suspecting correct processes. As originally defined in [4], the scope of ◇P is global, insofar as it provides information about every process in the system.
One drawback of global oracles is that their communication overhead limits their practicality for large-scale networks. As such, other researchers have investigated scope-restricted variations of Chandra-Toueg detectors which only require information from or about a subset of processes [1, 12, 18]. Our dining solution uses a variant of ◇P defined in [3, 14].

This variant requires information only about immediate neighbors. The resulting local refinement, ◇P1, can be implemented efficiently in sparse, large-scale, and even partitionable networks. As such, it underscores the practicality of our solution as a scalable approach to fault-tolerant scheduling.

Significance. Dining is essentially a scheduling problem subject to local, overlapping constraints on shared resource utilization. As such, dining under eventual weak exclusion defines a class of schedulers that are permitted to make finitely many scheduling mistakes. Our solution is wait-free [13] insofar as no correct process will starve (i.e., cease to make progress) even if some or all other processes fault by crashing. In the presence of crash faults under partial synchrony, our work demonstrates a trade-off between schedulers that make finitely many transient mistakes (by scheduling live neighbors to eat simultaneously), and schedulers that make permanent mistakes (by never scheduling correct processes to eat in the wake of crash faults). On a practical level, our results provide a new technique for isolating the impact of crash faults. These results are demonstrated by presenting a wait-free dining algorithm and a careful proof of correctness.

2 Background and Technical Framework

The dining philosophers problem is a classic model of static resource allocation in distributed systems. Originally proposed by Dijkstra for a ring topology [8], it was later generalized by Lynch to overlapping mutual exclusion problems on arbitrary graphs [15]. A dining instance is modeled by an undirected conflict graph DP = (Π, E), where each vertex p ∈ Π represents a diner, and each edge (p, q) ∈ E represents a set of resources shared between neighbors p and q. Each diner is either thinking, hungry, or eating. These abstract states correspond to three basic phases of an ordinary process: executing independently, requesting shared resources, and utilizing shared resources in a critical section, respectively.
Initially, every process is thinking. Although processes may think forever, they are also permitted to become hungry at any time. By contrast, eating is always finite (but not necessarily bounded). Hungry neighbors are said to be in conflict, because they compete for shared but mutually exclusive resources. A correct solution to wait-free dining under eventual weak exclusion (◇WX) is a conflict-resolution algorithm that schedules diner transitions from hungry to eating, subject to the following two requirements:

Safety: For every run, there exists a time after which no two live neighbors eat simultaneously.

Progress: Every correct hungry process eventually eats, regardless of how many processes crash.

The progress requirement guarantees fairness among hungry neighbors. In particular, dining solutions are not permitted to starve hungry processes by never scheduling them to eat. This property is also known elsewhere as starvation freedom or no lockout. In the presence of arbitrarily many crash faults, a dining algorithm that satisfies progress is called wait-free [13]. The safety requirement of eventual weak exclusion, ◇WX, permits finitely many scheduling mistakes. A mistake occurs when two live neighbors are scheduled to eat simultaneously. We motivate two scenarios for which this model of exclusion is useful.

Example 1: Consider a wireless network of sensors that must cover a given surveillance environment. Such sensors typically have finite power supplies; as such, every node will eventually crash due to battery depletion. Ideally, the life-span of the network should greatly exceed the life-span of its constituent nodes. With node redundancy, sensors can coordinate their activities so that only a subset are required to cover the environment during so-called active duty cycles, while the remainder can sleep to conserve energy.

In this scenario, the resources being shared are time slices, and the eating nodes are on duty. Under stronger models of exclusion, the crash of a single sensor may precipitate starvation among its neighbors, leading to a loss of coverage, because nearby sensors are no longer scheduled to go on duty. By contrast, wait-free eventual weak exclusion can maintain coverage, but may schedule redundant neighbors for duty, even when fewer would have sufficed. This may have implications for energy conservation in the short run, but it does not affect the correctness of surveillance coverage. Since ◇WX permits only finitely many scheduling mistakes, the dining algorithm eventually converges to a more energy-efficient long-term duty schedule without compromising short-term coverage.

Example 2: Self-stabilization is a non-masking technique for tolerating transient faults [9]. A stabilizing algorithm automatically converges from any configuration to a closed set of safe states, and then continues ordinary execution thereafter. Stabilizing algorithms are often easier to design under interleaving semantics via a centralized daemon, for which at most one process is activated to take a step at any given time. In practice, however, many stabilizing algorithms need to execute correctly under parallel semantics as well. This can often be achieved via automatic model conversions, say, from read/write atomicity to message passing. An underlying distributed daemon schedules consistent sets of processes to take steps concurrently. Many existing transformations [2, 16] use dining algorithms to implement such daemons. Although such approaches have considered daemons that tolerate transient faults (such as data corruption), they do not address the need to provide wait-free scheduling guarantees in the presence of permanent crashes. In this scenario, diners become hungry in order to execute subsequent actions of the stabilization protocol.
If crash faults precipitate starvation at the dining layer, however, correct processes will fail to take infinitely many steps, a key requirement for convergence in stabilization. By contrast, wait-free dining under eventual weak exclusion guarantees that correct nodes will be scheduled infinitely often, regardless of crashes. This advantage comes at the expense of finitely many scheduling errors, which may precipitate transient faults such as sharing violations. After the final scheduling mistake of any run, the stabilizing protocol may be left in an arbitrary configuration. But since stabilization guarantees convergence, the application will eventually recover to a safe state and then continue execution thereafter without further scheduling errors.

Computational Model. We consider asynchronous environments where message delay, clock drift, and relative process speeds are unbounded. A system is modeled by a set of n distributed processes Π = {p1, p2, ..., pn} that communicate only by asynchronous message passing. We assume that the communication graph is the same as the underlying conflict graph. Each pair of neighbors in the conflict graph is connected by a reliable channel, such that every message sent to a correct process is eventually received by that process, and messages are neither lost, duplicated, nor corrupted. Additionally, we posit a discrete global clock T, with the set of natural numbers ℕ as its range of clock ticks. T is merely a conceptual device to simplify our presentation; since T is inaccessible to processes in Π, it cannot be used to advantage by any algorithm executed by the system.

Fault Patterns. Processes may fault only by crashing. Processes cannot crash at will, but only as the result of a crash fault, which occurs when a process ceases execution (without warning) and never recovers [6]. A fault pattern F models the occurrence of crash faults in a given run.
Specifically, F is a function from the global time range T to the powerset of processes 2^Π, where F(t) denotes the subset of processes that have crashed by time t. Since crash faults are permanent, F is monotonically non-decreasing. We say that p is faulty in F if p ∈ F(t) for some time t; otherwise, we say that p is correct in F. Additionally, a process p is live at time t if p has not crashed by time t; that is, p ∉ F(t). Thus, correct processes are always live, and faulty processes are live only prior to crashing.
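These definitions lend themselves to a direct encoding. The sketch below represents a fault pattern sparsely by crash times; the process names and crash times are our own illustration, not part of the model.

```python
# A fault pattern F maps each global clock tick t to the set of
# processes that have crashed by time t. Since crash times never
# change, F is monotonically non-decreasing by construction.

PROCESSES = {"p1", "p2", "p3", "p4"}

def make_fault_pattern(crash_times):
    """crash_times: process -> tick at which that process crashes."""
    def F(t):
        return {p for p, tc in crash_times.items() if tc <= t}
    return F

F = make_fault_pattern({"p3": 5})   # p3 is faulty; all others correct

def live(t):
    """live(t) = Pi - F(t): processes that have not crashed by t."""
    return PROCESSES - F(t)
```

Note that a faulty process (here p3) is live at every tick before its crash, matching the definition above.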

Failure Detectors. An unreliable failure detector can be viewed as a distributed oracle that can be queried for (possibly incorrect) information about crashes in Π. Each process has access to its own local detector module, which outputs the set of processes currently suspected of having crashed. Unreliable failure detectors are characterized by the kinds of mistakes they can make. Mistakes include false negatives (i.e., not suspecting a crashed process), as well as false positives (i.e., wrongfully suspecting a correct process). In Chandra and Toueg's original definition [4], each detector class is defined by two properties: completeness and accuracy. Completeness restricts false negatives, while accuracy restricts false positives. Eventually perfect failure detectors (denoted by the class ◇P) are defined by strong completeness and eventual strong accuracy. Informally, ◇P detectors always suspect crashed processes, and eventually stop suspecting correct processes. Our algorithm uses a locally scope-restricted refinement of this detector class called ◇P1. It satisfies the foregoing properties, but only with respect to immediate neighbors [3, 14]:

Local Strong Completeness: Every crashed process is eventually and permanently suspected by all correct neighbors.

Local Eventual Strong Accuracy: For every run, there exists a time after which no correct process is suspected by any correct neighbor.

Thus, ◇P1 may wrongfully suspect correct neighbors, but only finitely many times, during a finite prefix of any run. After some point, however, ◇P1 detectors converge to being well-founded, after which they provide reliable information about neighbors. Although the time to convergence is not known and may vary from run to run, this detector is still sufficiently powerful to mask crash faults under eventual weak exclusion, as we present in the next section.
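Detectors of this kind are commonly approximated under partial synchrony with adaptive time-outs. The toy single-process sketch below illustrates the idea; the class name, heartbeat interface, and timeout-doubling policy are our own assumptions, not part of the specification.

```python
# Toy adaptive time-out suspicion for immediate neighbors. A missed
# deadline adds the neighbor to the suspect set; a late-but-received
# heartbeat removes it and doubles the timeout. Once the timeout
# exceeds the de facto (unknown) delay bound, a correct neighbor is
# never wrongfully suspected again, giving finitely many mistakes.

class LocalDetector:
    def __init__(self, neighbors, initial_timeout=1):
        self.timeout = {q: initial_timeout for q in neighbors}
        self.deadline = {q: initial_timeout for q in neighbors}
        self.suspects = set()

    def tick(self, now):
        """Suspect every neighbor whose heartbeat deadline has passed."""
        for q, deadline in self.deadline.items():
            if now > deadline:
                self.suspects.add(q)

    def heartbeat(self, q, now):
        """A heartbeat from q arrived; back off if we misjudged q."""
        if q in self.suspects:          # false positive: adapt timeout
            self.suspects.discard(q)
            self.timeout[q] *= 2
        self.deadline[q] = now + self.timeout[q]
```

A crashed neighbor stops sending heartbeats, so it is eventually and permanently suspected, matching local strong completeness.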
Additionally, ◇P1 can be implemented in many practical models of partial synchrony by using adaptive time-out mechanisms to estimate, and eventually converge to, the de facto but unknown timing bounds on parameters such as message delay and relative process speed [4, 10, 11].

3 A Wait-Free Dining Algorithm Under ◇WX

Our solution is related to the classic hygienic dining algorithm of Chandy and Misra [5]. In hygienic dining, a unique fork is associated with each edge in the conflict graph. Processes connected by a conflict edge share the corresponding fork, which is used to resolve conflicts over the overlapping set of resources they both need. In order to eat, a hungry process must collect and hold all of its shared forks. This provides a simple basis for safety, since at most one neighbor can hold a given fork at any time.

We emphasize that the forks are not the resources. This popular misconception results from drawing conclusions about the dining metaphor that do not apply to the actual problem. In particular, it leads to the incorrect conclusion that each resource can be shared by at most two processes. Note that forks are not even mentioned in the problem specification; they are simply introduced as an implementation strategy to guarantee safety. For clarification, each resource can be shared by arbitrarily many processes, and will form a unique clique in the conflict graph. Neighbors that share many resources in common will be in several conflict cliques together, but will share only one fork to resolve their overlapping set of conflicts.

In the original hygienic solution, any process that crashes while holding forks will cause its hungry neighbors to starve, because the forks necessary for eating can never be acquired. In our solution, suspicion by ◇P1 serves as a proxy for permanently missing forks. Any process that crashes while holding a fork will never send that fork to its requesting neighbor.
As such, the neighbor will not be able to eat unless a proxy for the missing fork can be supplied. The oracle ◇P1 serves exactly this purpose.

The local strong completeness property guarantees that any crashed process will be eventually and permanently suspected by all correct neighbors. As such, these neighbors will be able to use suspicion in place of the missing fork to proceed to eating. Unfortunately, the local eventual strong accuracy of ◇P1 also allows the oracle to make mistakes. In particular, it can wrongfully suspect live neighbors finitely many times during any run. This creates a potential safety hazard, since two neighbors that suspect each other may both proceed to eating, regardless of which neighbor actually holds the shared fork. Such violations of mutual exclusion, however, can only occur during the finite prefix in which ◇P1 makes mistakes. In the infinite suffix thereafter, no two correct neighbors can begin eating simultaneously, because they can no longer suspect each other, and at most one of them can hold the shared fork.

Next, we describe the variables and actions of the algorithm. The algorithmic pseudo-code is presented thereafter as an action system of guarded commands.

3.1 Algorithm Variables

Our algorithm guarantees safety using forks and the local eventual strong accuracy of ◇P1. It guarantees wait-free progress using a dynamic ordering on process priorities, plus the local strong completeness of ◇P1. In addition to the oracle module itself, the following variables are associated with each process. The trivalent variable state_i denotes the current dining phase: thinking, hungry, or eating. Each process also has a local integer-valued variable height_i and a globally unique process identifier id_i. Taken together as an ordered pair, (height_i, id_i) determines the priority_i of process i. As we describe later, priority will be used to resolve conflicts among hungry neighbors.
Note that since process identifiers are unique, every pair of priorities P1 and P2 can be totally ordered lexicographically as follows:

P1 > P2  ≡  (P1.height > P2.height) ∨ ((P1.height = P2.height) ∧ (P1.id > P2.id))

To implement the forks, we introduce two local variables for each pair of neighbors. For process i, we associate a boolean variable fork_ij with each neighbor j. Symmetrically, each process j has a boolean variable fork_ji corresponding to neighbor i. We interpret these variables as follows: fork_ij is true exactly when process i holds the unique fork that it shares with neighbor j. Alternatively, fork_ji is true exactly when j holds the fork. When the fork is in transit from one neighbor to the other, both local variables are false. Since the fork is unique and exclusive, it is never the case that both variables are true.

In addition to the forks, we also introduce a request token for each pair of neighbors. In general, if a process holds the request token, but needs the fork, then it can request the missing fork by sending the request token to the corresponding neighbor. The tokens are implemented and interpreted in the same way as the forks. For process i, we associate a unique boolean variable token_ij with each neighbor j. Symmetrically, each process j has a boolean variable token_ji corresponding to neighbor i.

3.2 Algorithm Actions

A thinking process can become hungry at any time by executing Action 1 and selecting the corresponding alternative. While hungry, Action 2 is always enabled. When executed, it has the effect of requesting every missing fork for which no previous request is currently pending. This is achieved by sending the request token to the corresponding neighbor, along with the current priority of the requesting process. As a result, the local token variable becomes false to indicate that a request has been sent.
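Since identifiers are unique, the lexicographic order on priorities is total. Representing a priority as a (height, id) pair makes the comparison a one-liner; the sketch below simply relies on Python's built-in lexicographic tuple comparison.

```python
# A priority is the ordered pair (height, id). Tuple comparison in
# Python is lexicographic: heights are compared first, and ties are
# broken by the globally unique process identifier.

def higher_priority(p1, p2):
    """True iff priority p1 > priority p2 in the lexicographic order."""
    return p1 > p2
```

Because no two processes share an id, distinct priorities are never incomparable, so conflicts between hungry neighbors always have a well-defined winner.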

Code for process i, with unique identifier id_i and local set of neighbors N(i):

var state_i : {thinking, hungry, eating}          initially state_i = thinking
    height_i : integer                            initially height_i = 0
    priority_i : (height, process-id)             initially priority_i = (0, id_i)
    fork_ij : boolean, for each neighbor j        initially fork_ij = (i > j)
    token_ij : boolean, for each neighbor j       initially token_ij = (i < j)
    ◇P1 : local eventually perfect failure detector

Action 1 (Become Hungry):
    {state_i = thinking} →
        state_i := (thinking or hungry)

Action 2 (Request Missing Forks):
    {state_i = hungry} →
        for each j ∈ N(i) where (token_ij ∧ ¬fork_ij) do
            send-request priority_i to j
            token_ij := false

Action 3 (Send Fork or Defer Request):
    {receive-request priority_j from j ∈ N(i)} →
        token_ij := true
        if (state_i = thinking ∨ (state_i = hungry ∧ (priority_i < priority_j)))
            then send-fork i to j
                 fork_ij := false

Action 4 (Obtain Shared Fork):
    {receive-fork j from j ∈ N(i)} →
        fork_ij := true

Action 5 (Enter Critical Section):
    {state_i = hungry ∧ (∀ j ∈ N(i) :: (fork_ij ∨ j ∈ ◇P1))} →
        state_i := eating

Action 6 (Exit Critical Section, Send Deferred Forks):
    {state_i = eating} →
        Lower(priority_i)
        state_i := thinking
        for each j ∈ N(i) where (token_ij ∧ fork_ij) do
            send-fork i to j
            fork_ij := false

procedure Lower(p : priority)
    ensures p′ = Lower(p) where (p′.id = p.id) ∧ (p′ < p)
    (integer height lowered; process id unchanged)

Algorithm 1: Wait-Free Dining under Eventual Weak Exclusion
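To make the action semantics concrete, here is a small executable sketch of Actions 3 and 5 for a single process, with the oracle's suspect set stubbed as a plain set. The class and method names are ours, and message passing is elided; this is an illustration of the guards, not the full distributed algorithm.

```python
THINKING, HUNGRY, EATING = "thinking", "hungry", "eating"

class Diner:
    def __init__(self, pid, neighbors):
        self.pid, self.state = pid, THINKING
        self.priority = (0, pid)
        self.fork = {q: pid > q for q in neighbors}    # initial forks
        self.token = {q: pid < q for q in neighbors}   # initial tokens
        self.suspects = set()                          # stub for the oracle

    def on_request(self, q, q_priority):
        """Action 3: send the fork now, or defer by holding fork and token."""
        self.token[q] = True
        if self.state == THINKING or (self.state == HUNGRY
                                      and self.priority < q_priority):
            self.fork[q] = False       # fork sent to q
            return "sent"
        return "deferred"

    def try_eat(self):
        """Action 5: eat when each neighbor's fork is held or suspected."""
        if self.state == HUNGRY and all(self.fork[q] or q in self.suspects
                                        for q in self.fork):
            self.state = EATING
        return self.state
```

Note how suspicion substitutes for a missing fork in try_eat: the guard is satisfied per neighbor by either the fork or the oracle, which is exactly the wait-freedom mechanism of Action 5.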

Any process that receives a request token via Action 3 decides whether or not to send the corresponding fork. If the recipient is thinking, or is hungry but has lower priority than the requester, then the fork is sent immediately. Otherwise, the request is deferred until a later time, specifically, until after eating. Eating processes always defer requests, as do higher-priority hungry neighbors. Deferred requests are represented by holding both the shared fork and the request token. Note that if a hungry process loses a requested fork to a higher-priority neighbor in Action 3, the relinquished fork will be re-requested by a subsequent execution of Action 2, which is always enabled while hungry.

Action 4 simply receives forks, and Action 5 determines when a hungry process can transit to eating. A hungry process i can begin eating if, for each neighbor j, process i either holds the shared fork or currently suspects j. This is the only action that utilizes the local oracle ◇P1, and it is central to the wait-freedom of the algorithm. In essence, suspicion serves as a local proxy for missing forks.

Processes transit back to thinking by executing Action 6, which reduces priority and sends forks for any requests that were previously deferred while hungry or eating. This action invokes a local procedure called Lower that reduces the height component of priority by some positive integer. The magnitude of the reduction is up to the algorithm designer, and can be either statically fixed or dynamically chosen at runtime. Although much of the algorithm so far is fairly straightforward, this action masks a deeper subtlety that arises from an intrinsic limitation on local knowledge. In hygienic dining, a process must reduce its priority below that of all neighbors after eating. This absolute reduction forms the basis for progress, because diners cannot remain high priority after eating and continually overtake lower-priority hungry neighbors.
Hygienic solutions typically encode the relative difference in priority in the forks themselves. Since no process changes priority while thinking or hungry, and since every eating process holds all of its shared forks, it is easy to reduce priority absolutely after eating. In our algorithm, by contrast, a process may eat without holding all of its shared forks if it suspects a neighbor while hungry. As such, the process may not know the priority of such neighbors after eating. Our proof of progress shows that reducing priority by an arbitrary positive amount is still sufficient for wait-freedom.

4 Proof of Correctness

Safety: Algorithm 1 satisfies eventual weak exclusion ◇WX. That is, for every execution there exists a time after which no two live neighbors eat simultaneously.

Proof: The safety proof is by direct construction and depends only on the local eventual strong accuracy property of ◇P1. This property guarantees that for each run there exists a time t after which no correct process is suspected by any correct neighbor. Recall that t is unknown and may vary from run to run. We observe that faulty processes cannot prevent safety from being established. Since faulty processes are live for only a finite prefix before crashing, they can eat simultaneously as live neighbors only finitely many times in any run. Consequently, we can restrict our focus to the correct processes.

Consider any execution α of Algorithm 1. Let t denote the time in α after which ◇P1 never suspects correct neighbors. Let p be any correct process that begins eating after time t. By Action 5, process p can only transit from hungry to eating if, for each neighbor q, either p holds the shared fork or p suspects q. Since ◇P1 never suspects correct neighbors after time t in execution α, process p must hold every fork it shares with its correct neighbors in order to begin eating. So long as p remains eating, Actions 3 and 6 guarantee that p will defer all fork requests. Thus, p will not relinquish any forks while eating.
Furthermore, ◇P1 has already converged in α, so no correct neighbor can suspect p either. Consequently, Action 5 remains disabled for every correct hungry neighbor of p until after p transits back to thinking. We conclude that no pair of correct neighbors can begin overlapping eating sessions after time t.
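On finite traces, this convergence claim can be sanity-checked mechanically: ◇WX holds on a trace iff joint eating by live neighbors stops recurring. The toy checker below is our own construction; the trace encoding (one set of eating processes per tick) and all names are assumptions for illustration.

```python
# Finite-trace check for eventual weak exclusion: find the last tick
# at which two live neighbors eat simultaneously. On a finite trace,
# -1 means no violation at all; any finite value is consistent with
# eventual weak exclusion as long as violations do not recur forever.

def last_violation(trace, edges, crashed_by):
    """trace: list of sets of eating processes, one set per tick.
    edges: neighbor pairs in the conflict graph.
    crashed_by: process -> tick by which it has crashed."""
    last = -1
    for t, eating in enumerate(trace):
        for p, q in edges:
            p_live = crashed_by.get(p, float("inf")) > t
            q_live = crashed_by.get(q, float("inf")) > t
            if p in eating and q in eating and p_live and q_live:
                last = t
    return last
```

A trace with one early mistake and a clean suffix is accepted in this sense, while a trace whose violations involve an already-crashed neighbor records no violation at all, mirroring the proof's dismissal of faulty processes.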

Progress: Algorithm 1 satisfies wait-free progress: every correct hungry process eventually eats.

Proof: We prove progress by complete induction on the absolute depth of live processes in the total order of priorities. Recall that the set of live processes at time t is the set of correct processes plus the set of faulty processes that have not crashed by time t. Thus, for any fault pattern F over a set of processes Π, we define live(t) = Π − F(t). Next, we define the depth of a process p at time t as the number of processes in live(t) that have priorities greater than the priority of p. We define depth(p, t) formally as follows:

depth(p, t) = |{ q ∈ live(t) : priority_q > priority_p }|    (1)

Figure 1: Absolute depth of live processes in a conflict graph at time t. (a) Conflict Graph; (b) Depth Ordering.

Figure 1 illustrates the depth function. Subfigure 1(a) shows a particular conflict graph with priorities for each process. Recall that priority is an ordered pair consisting of an integer-valued height and a globally unique process identifier id. In this configuration, process F is crashed and all other processes are currently live. Subfigure 1(b) shows the depth ordering of the live processes. Note that the crashed process F is not included. Process D has the highest global priority, so its corresponding depth is 0. Although process C has a relative depth of 2 in the conflict graph, its absolute depth is 3 in the global depth ordering. Finally, process C is ordered below process G because, although both have equal heights, ties are broken in favor of the higher process id; G has the greater priority (and hence lower depth) because G > C.

Next, we define a measure of the priority distance between processes. For any two processes p and q, the function dist(p, q) measures the maximum number of times that p can eat while q is hungry, before the priority of p must become less than that of q.

dist(p, q) = 0,                             if (priority_p < priority_q)
dist(p, q) = p.height − q.height,           if (priority_p > priority_q) ∧ (p.id < q.id)
dist(p, q) = p.height − q.height + 1,       if (priority_p > priority_q) ∧ (p.id > q.id)    (2)

For example, in Figure 1 the priority distance between processes C and A is 4. Note how this value can differ from the distance in the depth ordering, which in this case is only 1. Still, priority and depth are related. For example, suppose that depth(p, t) < depth(q, t) at some time t, and that dist(p, q) = k. Let t′ denote the time after which p eats for the k-th time following t. Assuming that q does not eat during the interval [t, t′], we know that depth(p, t′) > depth(q, t′). In other words, p can overtake q at most k times before the depth of p must become deeper than the depth of q.

We now prove that every correct hungry process eventually eats, by complete induction on the depth of live processes.

Base Case: Suppose a correct process p is hungry at time t and that depth(p, t) = 0. By Action 3, hungry processes only relinquish forks to higher-priority hungry neighbors. Since depth(p, t) = 0, process p has no higher-priority neighbors; consequently, p will keep every fork it either holds or collects until after it eats. By Action 5, p can only eat if, for each of its δ neighbors, p either holds the shared fork or currently suspects that neighbor. Accordingly, let Forks_p denote the set of neighbors for which p currently holds the shared fork, and let Suspects_p denote the set of neighbors currently suspected by the local oracle ◇P1. This suggests the following progress metric M(p):

M(p) = |Forks_p ∪ Suspects_p|    (3)

Note that M(p) is bounded above by δ, the total number of neighbors. Moreover, when M(p) = δ, Action 5 is enabled, which, upon execution, allows p to eat and thereby satisfy progress. It remains to show that M(p) eventually reaches δ. We partition the neighbors of p into two disjoint sets: correct and faulty.
Since p is hungry, Action 2 is enabled; its execution results in p sending requests for all of its missing forks. Since channels are assumed to be reliable, each fork request sent to a correct neighbor q will eventually be received, causing q to execute Action 3. There are three possible outcomes:

1. If q is thinking, then q sends the shared fork immediately.

2. If q is hungry, then q still sends the shared fork immediately. By hypothesis, p has depth 0, so q must have lower priority.

3. If q is eating, then the fork request is temporarily deferred. As a correct process, however, q will send the shared fork after finite time, when it executes Action 6 to stop eating.

In all three cases, each correct neighbor immediately or eventually sends the requested fork back to p. Upon receiving each fork in Action 4, the value of M(p) either increases by 1 or remains unchanged. The latter outcome occurs only when the corresponding neighbor is already suspected at p.

Next, we consider the faulty neighbors of p. If a faulty neighbor q receives a fork request from p prior to crashing, then q sends the shared fork either immediately or (potentially) after eating, just like a correct process. In either case, the value of M(p) increases by 1 or remains unchanged when p receives the fork in Action 4. Alternatively, a faulty neighbor q might crash, either before receiving the fork request, or possibly while eating with the request deferred. At this point, we invoke the strong completeness property of ◇P₁, which guarantees that every crashed neighbor is eventually and permanently suspected by p. Let t′ be the

unique time at which crashed neighbor q is permanently added to the suspect list at p. If p already holds the shared fork at this time, then M(p) remains unchanged; otherwise, the value of M(p) increases by 1.

We conclude that p is guaranteed to collect every shared fork from each of its correct neighbors, and that p either collects the shared fork from, or permanently suspects, each of its faulty neighbors. The union of these two sets includes all δ neighbors of p. Consequently, M(p) reaches the value δ after finite time, and so p is guaranteed to eat by Action 5.

Remark 1: The base case actually proves that a correct hungry process at any depth will not wait indefinitely on its lower-priority neighbors. In the special case of depth 0, this suffices to establish progress, because all neighbors have lower priority. This result will be applied again in Case 2(a) of the inductive step, which is the case where all higher-priority neighbors are correct.

Remark 2: We observe that the metric M(p) can actually decrease while p is hungry. This is due to the eventual strong accuracy of ◇P₁, which permits the oracle to wrongfully suspect live neighbors finitely many times during any execution. Suppose that ◇P₁ suspects a live neighbor q while p is hungry, and that p does not currently hold the shared fork. If q is then removed from the suspect list, the value of M(p) decreases by 1. Is this a problem for progress? As demonstrated above, it is not. If the live neighbor q is actually correct, then p will eventually acquire the shared fork anyway. Alternatively, the live neighbor q is faulty, in which case it will eventually crash and become permanently suspected by p after finite time. In either case, the value of M(p) will increase by 1 and nullify the effect of the earlier decrease.

To frame this argument differently, we can introduce a submetric for the oracle itself. While p is hungry, the original metric M(p) can decrease only if its oracle ◇P₁ makes a mistake.
By eventual strong accuracy, however, ◇P₁ makes only finitely many mistakes during any run. After each new mistake, the number of remaining mistakes decreases by 1, until some unknown time after which the oracle never suspects live processes. Thereafter, if p is still hungry, the value of M(p) only increases.

Remark 3: The mistakes permitted by eventual strong accuracy may enable p to eat earlier than usual. For example, ◇P₁ could initially suspect every neighbor, thereby enabling p to proceed directly to eating. Although such mistakes may have implications for safety, they cannot forestall progress. In particular, we note that, with respect to ◇P₁, the base case for progress depends only on the local strong completeness of this oracle.

Inductive Hypothesis: Suppose, for every time t ≥ 0 and for every depth d > 0, that every correct hungry process p with depth(p, t) < d eventually eats. We will show that any correct process q that is hungry at time t with depth(q, t) = d eventually eats, regardless of how many processes crash.

Suppose that a correct process q is hungry at time t with depth(q, t) = d. In Algorithm 1, no process reduces its priority while hungry. Additionally, no process increases its priority at any time. As such, the depth of q will never become deeper than d so long as q remains hungry. Priority changes only when an eating process executes Action 6 to transition back to thinking. This is achieved by invoking the local procedure Lower(priority_i) on line 17, which reduces the height component of priority_i by an arbitrary positive integer.

Next, we partition the inductive step into two cases. In the first, some higher-priority live process is faulty. In the second, all higher-priority processes are correct.

Case 1: Suppose there exists a live faulty process r ∈ Π with depth(r, t) < d at time t. By definition of the function depth, the priority of r must be greater than the priority of q.
Since r is faulty, however, there is a finite time t′ > t at which r crashes. Crashed processes are irrelevant to depth, so the crash of r at time t′ actually causes the depth of q to rise, even though the priority of q remains unchanged. Consequently, we have depth(q, t′) < d at time t′, and so by application of the inductive

hypothesis, we conclude that q eventually eats. It may be of interest that this case is the same whether or not r is actually a neighbor of q.

Case 2: Suppose that every process in Π that is higher-priority than q at time t is correct. All future executions after time t can be divided into two subcases: futures in which every higher-priority process at time t eventually thinks forever, and futures in which some higher-priority process at time t becomes hungry infinitely often.

Case 2(a): There exists a time t′ > t after which all such processes remain thinking forever. As mentioned earlier in Remark 1, the situation for q with respect to its lower-priority neighbors reduces to the base case. It remains to show that q will also acquire all of its high forks. Suppose that p is a correct higher-priority neighbor that thinks forever after time t′, and that q does not currently hold the shared fork. If the fork is already in transit from p to q, then it will arrive after finite time. Otherwise, Action 2 remains enabled while q is hungry, so the fork request has been or will be sent to p. Since p thinks forever after time t′, the fork request will be granted when p executes Action 3. Once the fork arrives at q, it will never be relinquished to p again, because p never becomes hungry in the future. Consequently, q will collect and hold the shared fork of each of its high neighbors, and will collect the shared fork from, or permanently suspect, each of its low neighbors. This enables Action 5, so q eventually eats.

Case 2(b): There exists a correct higher-priority process r that becomes hungry infinitely often. Recall that depth(q, t) = d at time t. Since r is correct and has higher priority than q at time t, we know that depth(r, t) < d. Every time r becomes hungry with depth less than d, we can apply the inductive hypothesis to conclude that r eventually eats. Since r is a correct process, it eventually transits from eating back to thinking by executing Action 6.
This action reduces the integer-valued height component of r's priority by at least 1 each time it is executed. Suppose dist(r, q) = k at time t. Then r can eat at most k times while q remains hungry before the priority of r must become lower than that of q. This suggests the following progress metric for q: let t_i denote the i-th time that r executes Action 6 after time t. Then at time t_k, process r has reduced its priority k times, and must now have lower priority than q. Consequently, the depth of q at time t_k must be less than d, so we can apply the inductive hypothesis to q to conclude that q eventually eats.

5 Conclusions

This paper explores the possibility of wait-free dining philosophers in partially synchronous environments subject to process crash faults. Although dining is unsolvable under stronger safety constraints, we present a wait-free solution under the more flexible requirement of eventual weak exclusion (◇WX). Our algorithm uses a local refinement of the eventually perfect failure detector, ◇P₁. This oracle can be implemented in sparse networks that are partitionable by crash faults. As such, our solution is also practical in the sense that it can scale to large networks.

From a broader perspective, dining philosophers is essentially a scheduling problem subject to local, overlapping constraints on shared-resource utilization. As such, wait-free dining under eventual weak exclusion defines a class of schedulers that is permitted to make finitely many scheduling mistakes. By way of motivation, we illustrated some potential applications of this safety model to duty-cycle scheduling for wireless sensor networks and daemon refinement for stabilizing algorithms. It remains to be seen how this model may be applied to these and other potential scenarios where resources can be recovered from crashed nodes and/or scheduling violations precipitate only transient, repairable faults.
"Error is acceptable as long as we are young, but one must not drag it along into old age."
Johann Wolfgang von Goethe, Art and Antiquity, III, 1 (1821)



I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS I R I P U B L I C A T I O N I N T E R N E N o 1599 S INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTÈMES ALÉATOIRES A THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS ROY FRIEDMAN, ACHOUR MOSTEFAOUI,

More information

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report # Degradable Agreement in the Presence of Byzantine Faults Nitin H. Vaidya Technical Report # 92-020 Abstract Consider a system consisting of a sender that wants to send a value to certain receivers. Byzantine

More information

Distributed Consensus

Distributed Consensus Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort in distributed transactions Reaching agreement

More information

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems ACM SIGACT News Distributed Computing Column 17 Sergio Rajsbaum Abstract The Distributed Computing Column covers the theory of systems that are composed of a number of interacting computing elements. These

More information

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,

More information

Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft)

Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft) Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft) Jayadev Misra December 18, 2015 Contents 1 Introduction 3 2 Program and Execution Model 4 2.1 Program Structure..........................

More information

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model?

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Jaap-Henk Hoepman Department of Computer Science, University of Twente, the Netherlands hoepman@cs.utwente.nl

More information

Self-Similar Algorithms for Dynamic Distributed Systems

Self-Similar Algorithms for Dynamic Distributed Systems Self-Similar Algorithms for Dynamic Distributed Systems K. Mani Chandy and Michel Charpentier Computer Science Department, California Institute of Technology, Mail Stop 256-80 Pasadena, California 91125

More information

The Weighted Byzantine Agreement Problem

The Weighted Byzantine Agreement Problem The Weighted Byzantine Agreement Problem Vijay K. Garg and John Bridgman Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712-1084, USA garg@ece.utexas.edu,

More information

arxiv: v1 [cs.dc] 3 Oct 2011

arxiv: v1 [cs.dc] 3 Oct 2011 A Taxonomy of aemons in Self-Stabilization Swan ubois Sébastien Tixeuil arxiv:1110.0334v1 cs.c] 3 Oct 2011 Abstract We survey existing scheduling hypotheses made in the literature in self-stabilization,

More information

Correctness of Concurrent Programs

Correctness of Concurrent Programs Correctness of Concurrent Programs Trifon Ruskov ruskov@tu-varna.acad.bg Technical University of Varna - Bulgaria Correctness of Concurrent Programs Correctness of concurrent programs needs to be formalized:

More information

Verifying Randomized Distributed Algorithms with PRISM

Verifying Randomized Distributed Algorithms with PRISM Verifying Randomized Distributed Algorithms with PRISM Marta Kwiatkowska, Gethin Norman, and David Parker University of Birmingham, Birmingham B15 2TT, United Kingdom {M.Z.Kwiatkowska,G.Norman,D.A.Parker}@cs.bham.ac.uk

More information

Byzantine agreement with homonyms

Byzantine agreement with homonyms Distrib. Comput. (013) 6:31 340 DOI 10.1007/s00446-013-0190-3 Byzantine agreement with homonyms Carole Delporte-Gallet Hugues Fauconnier Rachid Guerraoui Anne-Marie Kermarrec Eric Ruppert Hung Tran-The

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

CSC501 Operating Systems Principles. Deadlock

CSC501 Operating Systems Principles. Deadlock CSC501 Operating Systems Principles Deadlock 1 Last Lecture q Priority Inversion Q Priority Inheritance Protocol q Today Q Deadlock 2 The Deadlock Problem q Definition Q A set of blocked processes each

More information

Generalized Consensus and Paxos

Generalized Consensus and Paxos Generalized Consensus and Paxos Leslie Lamport 3 March 2004 revised 15 March 2005 corrected 28 April 2005 Microsoft Research Technical Report MSR-TR-2005-33 Abstract Theoretician s Abstract Consensus has

More information

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra I.B.M Thomas J. Watson Research Center, Hawthorne, New York and Sam Toueg Cornell University, Ithaca, New York We introduce

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems Ordering events. Lamport and vector clocks. Global states. Detecting failures. Required reading for this topic } Leslie Lamport,"Time, Clocks, and the Ordering

More information

Randomized Protocols for Asynchronous Consensus

Randomized Protocols for Asynchronous Consensus Randomized Protocols for Asynchronous Consensus Alessandro Panconesi DSI - La Sapienza via Salaria 113, piano III 00198 Roma, Italy One of the central problems in the Theory of (feasible) Computation is

More information

Maximum Metric Spanning Tree made Byzantine Tolerant

Maximum Metric Spanning Tree made Byzantine Tolerant Algorithmica manuscript No. (will be inserted by the editor) Maximum Metric Spanning Tree made Byzantine Tolerant Swan Dubois Toshimitsu Masuzawa Sébastien Tixeuil Received: date / Accepted: date Abstract

More information

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 06: Synchronization Version: November 16, 2009 2 / 39 Contents Chapter

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

Gradient Clock Synchronization

Gradient Clock Synchronization Noname manuscript No. (will be inserted by the editor) Rui Fan Nancy Lynch Gradient Clock Synchronization the date of receipt and acceptance should be inserted later Abstract We introduce the distributed

More information

Section 6 Fault-Tolerant Consensus

Section 6 Fault-Tolerant Consensus Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing.

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 6 (version April 7, 28) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.2. Tel: (2)

More information

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont Benchmarking Model Checkers with Distributed Algorithms Étienne Coulouma-Dupont November 24, 2011 Introduction The Consensus Problem Consensus : application Paxos LastVoting Hypothesis The Algorithm Analysis

More information

Self-Stabilizing Silent Disjunction in an Anonymous Network

Self-Stabilizing Silent Disjunction in an Anonymous Network Self-Stabilizing Silent Disjunction in an Anonymous Network Ajoy K. Datta 1, Stéphane Devismes 2, and Lawrence L. Larmore 1 1 Department of Computer Science, University of Nevada Las Vegas, USA 2 VERIMAG

More information

Consensus and Universal Construction"

Consensus and Universal Construction Consensus and Universal Construction INF346, 2015 So far Shared-memory communication: safe bits => multi-valued atomic registers atomic registers => atomic/immediate snapshot 2 Today Reaching agreement

More information

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George)

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George) Time Lakshmi Ganesh (slides borrowed from Maya Haridasan, Michael George) The Problem Given a collection of processes that can... only communicate with significant latency only measure time intervals approximately

More information

Consensus when failstop doesn't hold

Consensus when failstop doesn't hold Consensus when failstop doesn't hold FLP shows that can't solve consensus in an asynchronous system with no other facility. It can be solved with a perfect failure detector. If p suspects q then q has

More information

6.852: Distributed Algorithms Fall, Class 10

6.852: Distributed Algorithms Fall, Class 10 6.852: Distributed Algorithms Fall, 2009 Class 10 Today s plan Simulating synchronous algorithms in asynchronous networks Synchronizers Lower bound for global synchronization Reading: Chapter 16 Next:

More information

Replication predicates for dependent-failure algorithms

Replication predicates for dependent-failure algorithms Replication predicates for dependent-failure algorithms Flavio Junqueira and Keith Marzullo Department of Computer Science and Engineering University of California, San Diego La Jolla, CA USA {flavio,

More information

Time Synchronization

Time Synchronization Massachusetts Institute of Technology Lecture 7 6.895: Advanced Distributed Algorithms March 6, 2006 Professor Nancy Lynch Time Synchronization Readings: Fan, Lynch. Gradient clock synchronization Attiya,

More information

A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems

A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems François Bonnet, Michel Raynal To cite this version: François Bonnet, Michel Raynal.

More information

The State Explosion Problem

The State Explosion Problem The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis

More information

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour Distributed Algorithms (CAS 769) Week 1: Introduction, Logical clocks, Snapshots Dr. Borzoo Bonakdarpour Department of Computing and Software McMaster University Dr. Borzoo Bonakdarpour Distributed Algorithms

More information

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Martin Hutle martin.hutle@epfl.ch André Schiper andre.schiper@epfl.ch École Polytechnique Fédérale de Lausanne

More information

A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set

A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set Sayaka Kamei and Hirotsugu Kakugawa Tottori University of Environmental Studies Osaka University Dept. of

More information

How to solve consensus in the smallest window of synchrony

How to solve consensus in the smallest window of synchrony How to solve consensus in the smallest window of synchrony Dan Alistarh 1, Seth Gilbert 1, Rachid Guerraoui 1, and Corentin Travers 2 1 EPFL LPD, Bat INR 310, Station 14, 1015 Lausanne, Switzerland 2 Universidad

More information