Wait-Free Dining Under Eventual Weak Exclusion

Scott M. Pike, Yantao Song, and Kaustav Ghoshal
Texas A&M University, Department of Computer Science
College Station, TX, USA
{pike, yantao,

Technical Report: TAMU-CS-TR

Abstract

We explore dining philosophers under eventual weak exclusion in environments subject to permanent crash faults. Eventual weak exclusion permits neighboring diners to eat concurrently only finitely many times, but requires that, for each run, there exists a (potentially unknown) time after which live neighbors never eat simultaneously. This safety property models systems where resources can be recovered from crashed processes or where sharing violations precipitate only transient faults. Although dining under eventual weak exclusion can be solved in synchronous environments, the problem is unsolvable in asynchronous systems, where crash faults can precipitate permanent starvation among other diners. We present the first wait-free solution to this problem under partial synchrony, along with a careful proof of correctness. Our solution is related to the hygienic algorithm of Chandy and Misra insofar as we use forks for safety and a dynamic ordering of process priorities for progress. Additionally, we use a local refinement ◇P1 of the eventually perfect failure detector ◇P to guarantee wait-freedom in the presence of crash faults. This localized oracle provides information only about immediate neighbors and, as such, is fundamental to the scalability of our approach, since it is implementable in sparse communication graphs that are partitionable by crash faults. Our results have potential applications to duty-cycle scheduling in sensor networks, as well as distributed daemon refinement for self-stabilizing algorithms.
Keywords: Dining Philosophers, Failure Detectors, Fault Tolerance, Wait-Freedom, Stabilization

Technical Report: TAMU-CS-TR, Texas A&M University, May. This work was supported by the Advanced Research Program of the Texas Higher Education Coordinating Board under Project Number

1 Introduction

Distributed processes require access to a variety of run-time resources such as memory, bandwidth, ports, and CPU cycles. A resource is a generic concept that can denote any entity necessary for computation or communication. Resources can be physical (e.g., routers, wireless access points), logical (e.g., time slots, channels), or virtual (e.g., file handles, virtual machines). Resources are often finite in supply and non-trivial in cost. In distributed environments, demand for resource utilization often exceeds supply, thereby creating a scarcity of resources that must be shared or allocated among competing processes.

Problem Statement. We consider the dining philosophers problem [8, 15] in environments subject to permanent crash faults, and explore its solvability in the presence of unreliable failure detectors [4]. Dining is a fundamental model of static resource allocation, where processes in a distributed system require periodic access to a fixed subset of mutually exclusive resources. Processes with overlapping resource needs are connected as neighbors in an abstract conflict graph. Additionally, each diner is either thinking, hungry, or eating. These states correspond to three basic phases of computation: executing independently, requesting shared resources, and utilizing resources in a critical section, respectively.

The progress specification for dining requires that every hungry process eventually eats. Various forms of the safety specification restrict concurrent eating among neighbors. The traditional safety constraint of strong exclusion requires that no two neighbors ever eat simultaneously. Strong exclusion is trivially unsolvable in faulty environments: if any diner crashes while eating, all adjacent hungry neighbors must starve, since otherwise they would violate safety. This result is independent of whether crashes can be detected reliably.
As such, strong exclusion is best suited only to resources that can be permanently corrupted by crashes. The more flexible model of weak exclusion requires only that no two live neighbors eat simultaneously; in particular, live diners may eat simultaneously with crashed neighbors. Recent results on fault-tolerant mutual exclusion indicate that this variation of dining is solvable in environments with Trusting oracles, a relatively powerful class of failure detectors that can reliably detect certain crashes [7]. Weak exclusion remains unsolvable, however, with less powerful detectors that are implementable in weaker models of partial synchrony. For example, with only an eventually perfect failure detector ◇P, neighbors of a crashed process can still starve [17].

Primary Results. We explore dining under the safety constraint of eventual weak exclusion, and show that it is solvable using the oracle ◇P in asynchronous environments subject to arbitrarily many crashes. Eventual weak exclusion, ◇WX, permits neighboring diners to eat concurrently only finitely many times. More specifically, ◇WX requires that, for each run, there exists a time after which no two live neighbors eat simultaneously. The time to convergence may be unknown, and it can vary from run to run. Still, this model of safety is sufficiently powerful to serve as a scheduling primitive for several practical applications. In particular, ◇WX models systems where resources can be recovered from crashed processes, and/or where sharing violations precipitate only transient faults. We sketch some relevant examples in Section 2.

Our approach is based on the eventually perfect failure detector ◇P from the original hierarchy of Chandra-Toueg oracles [4]. Detectors in this class always suspect crashed processes, and eventually stop suspecting correct processes. As originally defined in [4], the scope of ◇P is global, insofar as it provides information about every process in the system.
One drawback of global oracles is that their communication overhead limits their practicality for large-scale networks. As such, other researchers have investigated scope-restricted variations of Chandra-Toueg detectors which only require information from or about a subset of processes [1, 12, 18]. Our dining solution uses a variant of ◇P defined in [3, 14].

This variant requires information only about immediate neighbors. The resulting local refinement, ◇P1, can be implemented efficiently in sparse, large-scale, and even partitionable networks. As such, it underscores the practicality of our solution as a scalable approach to fault-tolerant scheduling.

Significance. Dining is essentially a scheduling problem subject to local, overlapping constraints on shared resource utilization. As such, dining under eventual weak exclusion defines a class of schedulers that are permitted to make finitely many scheduling mistakes. Our solution is wait-free [13] insofar as no correct process will starve (i.e., cease to make progress) even if some or all other processes fault by crashing. In the presence of crash faults under partial synchrony, our work demonstrates a trade-off between schedulers that make finitely many transient mistakes (by scheduling live neighbors to eat simultaneously), and schedulers that make permanent mistakes (by never scheduling correct processes to eat in the wake of crash faults). On a practical level, our results provide a new technique for isolating the impact of crash faults. These results are demonstrated by presenting a wait-free dining algorithm and a careful proof of correctness.

2 Background and Technical Framework

The dining philosophers problem is a classic model of static resource allocation in distributed systems. Originally proposed by Dijkstra for a ring topology [8], it was later generalized by Lynch to overlapping mutual exclusion problems on arbitrary graphs [15]. A dining instance is modeled by an undirected conflict graph DP = (Π, E), where each vertex p ∈ Π represents a diner, and each edge (p, q) ∈ E represents a set of resources shared between neighbors p and q. Each diner is either thinking, hungry, or eating. These abstract states correspond to three basic phases of an ordinary process: executing independently, requesting shared resources, and utilizing shared resources in a critical section, respectively.
Initially, every process is thinking. Although processes may think forever, they are also permitted to become hungry at any time. By contrast, eating is always finite (but not necessarily bounded). Hungry neighbors are said to be in conflict, because they compete for shared but mutually exclusive resources. A correct solution to wait-free dining under eventual weak exclusion (◇WX) is a conflict-resolution algorithm that schedules diner transitions from hungry to eating, subject to the following two requirements:

Safety: For every run, there exists a time after which no two live neighbors eat simultaneously.

Progress: Every correct hungry process eventually eats, regardless of how many processes crash.

The progress requirement guarantees fairness among hungry neighbors. In particular, dining solutions are not permitted to starve hungry processes by never scheduling them to eat. This property is also known elsewhere as starvation freedom or no lockout. In the presence of arbitrarily many crash faults, a dining algorithm that satisfies progress is called wait-free [13]. The safety requirement of eventual weak exclusion, ◇WX, permits finitely many scheduling mistakes. A mistake occurs when two live neighbors are scheduled to eat simultaneously. We motivate two scenarios for which this model of exclusion is useful.

Example 1: Consider a wireless network of sensors that must cover a given surveillance environment. Such sensors typically have finite power supplies; as such, every node will eventually crash due to battery depletion. Ideally, the life-span of the network should greatly exceed the life-span of its constituent nodes. With node redundancy, sensors can coordinate their activities so that only a subset are required to cover the environment during so-called active duty cycles, while the remainder can sleep to conserve energy.

In this scenario, the resources being shared are time slices, and the eating nodes are on duty. Under stronger models of exclusion, the crash of a single sensor may precipitate starvation among its neighbors, leading to a loss of coverage, because nearby sensors are no longer scheduled to go on duty. By contrast, wait-free eventual weak exclusion can maintain coverage, but may schedule redundant neighbors for duty, even when fewer would have sufficed. This may have implications for energy conservation in the short run, but it does not affect the correctness of surveillance coverage. Since ◇WX permits only finitely many scheduling mistakes, the dining algorithm eventually converges to a more energy-efficient long-term duty schedule without compromising short-term coverage.

Example 2: Self-stabilization is a non-masking technique for tolerating transient faults [9]. A stabilizing algorithm automatically converges from any configuration to a closed set of safe states, and then continues ordinary execution thereafter. Stabilizing algorithms are often easier to design under interleaving semantics via a centralized daemon, for which at most one process is activated to take a step at any given time. In practice, however, many stabilizing algorithms need to execute correctly under parallel semantics as well. This can often be achieved via automatic model conversions, say, from read/write atomicity to message passing. An underlying distributed daemon schedules consistent sets of processes to take steps concurrently. Many existing transformations [2, 16] use dining algorithms to implement such daemons. Although such approaches have considered daemons that tolerate transient faults (such as data corruption), they do not address the need to provide wait-free scheduling guarantees in the presence of permanent crashes. In this scenario, diners become hungry in order to execute subsequent actions of the stabilization protocol.
If crash faults precipitate starvation at the dining layer, however, correct processes will fail to take infinitely many steps, a key requirement for convergence in stabilization. By contrast, wait-free dining under eventual weak exclusion guarantees that correct nodes will be scheduled infinitely often, regardless of crashes. This advantage comes at the expense of finitely many scheduling errors, which may precipitate transient faults such as sharing violations. After the final scheduling mistake of any run, the stabilizing protocol may be left in an arbitrary configuration. But since stabilization guarantees convergence, the application will eventually recover to a safe state and then continue execution thereafter without further scheduling errors.

Computational Model. We consider asynchronous environments where message delay, clock drift, and relative process speeds are unbounded. A system is modeled by a set of n distributed processes Π = {p1, p2, ..., pn} that communicate only by asynchronous message passing. We assume that the communication graph is the same as the underlying conflict graph. Each pair of neighbors in the conflict graph is connected by a reliable channel, such that every message sent to a correct process is eventually received by that process, and messages are neither lost, duplicated, nor corrupted. Additionally, we posit a discrete global clock T, with the set of natural numbers ℕ as its range of clock ticks. T is merely a conceptual device to simplify our presentation; since T is inaccessible to processes in Π, it cannot be used to advantage by any algorithm executed by the system.

Fault Patterns. Processes may fault only by crashing. Processes cannot crash at will, but only as the result of a crash fault, which occurs when a process ceases execution (without warning) and never recovers [6]. A fault pattern F models the occurrence of crash faults in a given run.
Specifically, F is a function from the global time range T to the powerset of processes 2^Π, where F(t) denotes the subset of processes that have crashed by time t. Since crash faults are permanent, F is monotonically non-decreasing. We say that p is faulty in F if p ∈ F(t) for some time t; otherwise, we say that p is correct in F. Additionally, a process p is live at time t if p has not crashed by time t; that is, p ∉ F(t). Thus, correct processes are always live, and faulty processes are live only prior to crashing.
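These definitions lend themselves to a direct encoding. The sketch below represents a fault pattern sparsely by crash times; the process names and crash times are our own illustration, not part of the model.

```python
# A fault pattern F maps each global clock tick t to the set of
# processes that have crashed by time t. Since crash times never
# change, F is monotonically non-decreasing by construction.

PROCESSES = {"p1", "p2", "p3", "p4"}

def make_fault_pattern(crash_times):
    """crash_times: process -> tick at which that process crashes."""
    def F(t):
        return {p for p, tc in crash_times.items() if tc <= t}
    return F

F = make_fault_pattern({"p3": 5})   # p3 is faulty; all others correct

def live(t):
    """live(t) = Pi - F(t): processes that have not crashed by t."""
    return PROCESSES - F(t)
```

Note that a faulty process (here p3) is live at every tick before its crash, matching the definition above.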

Failure Detectors. An unreliable failure detector can be viewed as a distributed oracle that can be queried for (possibly incorrect) information about crashes in Π. Each process has access to its own local detector module, which outputs the set of processes currently suspected of having crashed. Unreliable failure detectors are characterized by the kinds of mistakes they can make. Mistakes include false negatives (i.e., not suspecting a crashed process), as well as false positives (i.e., wrongfully suspecting a correct process). In Chandra and Toueg's original definition [4], each detector class is defined by two properties: completeness and accuracy. Completeness restricts false negatives, while accuracy restricts false positives. Eventually perfect failure detectors (denoted by the class ◇P) are defined by strong completeness and eventual strong accuracy. Informally, ◇P detectors always suspect crashed processes, and eventually stop suspecting correct processes. Our algorithm uses a locally scope-restricted refinement of this detector class called ◇P1. It satisfies the foregoing properties, but only with respect to immediate neighbors [3, 14]:

Local Strong Completeness: Every crashed process is eventually and permanently suspected by all correct neighbors.

Local Eventual Strong Accuracy: For every run, there exists a time after which no correct process is suspected by any correct neighbor.

Thus, ◇P1 may wrongfully suspect correct neighbors, but only finitely many times, during a finite prefix of any run. After some point, however, ◇P1 detectors converge to being well-founded, after which they provide reliable information about neighbors. Although the time to convergence is not known and may vary from run to run, this detector is still sufficiently powerful to mask crash faults under eventual weak exclusion, as we present in the next section.
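Detectors of this kind are commonly approximated under partial synchrony with adaptive time-outs. The toy single-process sketch below illustrates the idea; the class name, heartbeat interface, and timeout-doubling policy are our own assumptions, not part of the specification.

```python
# Toy adaptive time-out suspicion for immediate neighbors. A missed
# deadline adds the neighbor to the suspect set; a late-but-received
# heartbeat removes it and doubles the timeout. Once the timeout
# exceeds the de facto (unknown) delay bound, a correct neighbor is
# never wrongfully suspected again, giving finitely many mistakes.

class LocalDetector:
    def __init__(self, neighbors, initial_timeout=1):
        self.timeout = {q: initial_timeout for q in neighbors}
        self.deadline = {q: initial_timeout for q in neighbors}
        self.suspects = set()

    def tick(self, now):
        """Suspect every neighbor whose heartbeat deadline has passed."""
        for q, deadline in self.deadline.items():
            if now > deadline:
                self.suspects.add(q)

    def heartbeat(self, q, now):
        """A heartbeat from q arrived; back off if we misjudged q."""
        if q in self.suspects:          # false positive: adapt timeout
            self.suspects.discard(q)
            self.timeout[q] *= 2
        self.deadline[q] = now + self.timeout[q]
```

A crashed neighbor stops sending heartbeats, so it is eventually and permanently suspected, matching local strong completeness.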
Additionally, ◇P1 can be implemented in many practical models of partial synchrony by using adaptive time-out mechanisms to estimate, and eventually converge to, the de facto but unknown timing bounds on parameters such as message delay and relative process speed [4, 10, 11].

3 A Wait-Free Dining Algorithm Under ◇WX

Our solution is related to the classic hygienic dining algorithm of Chandy and Misra [5]. In hygienic dining, a unique fork is associated with each edge in the conflict graph. Processes connected by a conflict edge share the corresponding fork, which is used to resolve conflicts over the overlapping set of resources they both need. In order to eat, a hungry process must collect and hold all of its shared forks. This provides a simple basis for safety, since at most one neighbor can hold a given fork at any time.

We emphasize that the forks are not the resources. This popular misconception results from drawing conclusions about the dining metaphor that do not apply to the actual problem. In particular, it leads to the incorrect conclusion that each resource can be shared by at most two processes. Note that forks are not even mentioned in the problem specification; they are simply introduced as an implementation strategy to guarantee safety. For clarification, each resource can be shared by arbitrarily many processes, and will form a unique clique in the conflict graph. Neighbors that share many resources in common will be in several conflict cliques together, but will share only one fork to resolve their overlapping set of conflicts.

In the original hygienic solution, any process that crashes while holding forks will cause its hungry neighbors to starve, because the forks necessary for eating can never be acquired. In our solution, suspicion by ◇P1 serves as a proxy for permanently missing forks. Any process that crashes while holding a fork will never send that fork to its requesting neighbor.
As such, the neighbor will not be able to eat unless a proxy for the missing fork can be supplied. The oracle ◇P1 serves exactly this purpose.

The local strong completeness property guarantees that any crashed process will be eventually and permanently suspected by all correct neighbors. As such, these neighbors will be able to use suspicion in place of the missing fork to proceed to eating. Unfortunately, the local eventual strong accuracy of ◇P1 also allows the oracle to make mistakes. In particular, it can wrongfully suspect live neighbors finitely many times during any run. This creates a potential safety hazard, since two neighbors that suspect each other may both proceed to eating, regardless of which neighbor actually holds the shared fork. Such violations of mutual exclusion, however, can only occur during the finite prefix in which ◇P1 makes mistakes. In the infinite suffix thereafter, no two correct neighbors can begin eating simultaneously, because they can no longer suspect each other, and at most one of them can hold the shared fork.

Next, we describe the variables and actions of the algorithm. The algorithmic pseudo-code is presented thereafter as an action system of guarded commands.

3.1 Algorithm Variables

Our algorithm guarantees safety using forks and the local eventual strong accuracy of ◇P1. It guarantees wait-free progress using a dynamic ordering on process priorities, plus the local strong completeness of ◇P1. In addition to the oracle module itself, the following variables are associated with each process. The trivalent variable state_i denotes the current dining phase: thinking, hungry, or eating. Each process also has a local integer-valued variable height_i and a globally unique process identifier id_i. Taken together as an ordered pair, (height_i, id_i) determines the priority_i of process i. As we describe later, priority will be used to resolve conflicts among hungry neighbors.
Note that since process identifiers are unique, every pair of priorities P1 and P2 can be totally ordered lexicographically as follows:

P1 > P2  ≡  (P1.height > P2.height) ∨ ((P1.height = P2.height) ∧ (P1.id > P2.id))

To implement the forks, we introduce two local variables for each pair of neighbors. For process i, we associate a boolean variable fork_ij with each neighbor j. Symmetrically, each process j has a boolean variable fork_ji corresponding to neighbor i. We interpret these variables as follows: fork_ij is true exactly when process i holds the unique fork that it shares with neighbor j. Alternatively, fork_ji is true exactly when j holds the fork. When the fork is in transit from one neighbor to the other, both local variables are false. Since the fork is unique and exclusive, it is never the case that both variables are true.

In addition to the forks, we also introduce a request token for each pair of neighbors. In general, if a process holds the request token, but needs the fork, then it can request the missing fork by sending the request token to the corresponding neighbor. The tokens are implemented and interpreted in the same way as the forks. For process i, we associate a unique boolean variable token_ij with each neighbor j. Symmetrically, each process j has a boolean variable token_ji corresponding to neighbor i.

3.2 Algorithm Actions

A thinking process can become hungry at any time by executing Action 1 and selecting the corresponding alternative. While hungry, Action 2 is always enabled. When executed, it has the effect of requesting every missing fork for which no previous request is currently pending. This is achieved by sending the request token to the corresponding neighbor, along with the current priority of the requesting process. As a result, the local token variable becomes false to indicate that a request has been sent.
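Since identifiers are unique, the lexicographic order on priorities is total. Representing a priority as a (height, id) pair makes the comparison a one-liner; the sketch below simply relies on Python's built-in lexicographic tuple comparison.

```python
# A priority is the ordered pair (height, id). Tuple comparison in
# Python is lexicographic: heights are compared first, and ties are
# broken by the globally unique process identifier.

def higher_priority(p1, p2):
    """True iff priority p1 > priority p2 in the lexicographic order."""
    return p1 > p2
```

Because no two processes share an id, distinct priorities are never incomparable, so conflicts between hungry neighbors always have a well-defined winner.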

Code for process i, with unique identifier id_i and local set of neighbors N(i):

var state_i : {thinking, hungry, eating}          initially state_i = thinking
    height_i : integer                            initially height_i = 0
    priority_i : (height, process-id)             initially priority_i = (0, id_i)
    fork_ij : boolean, for each neighbor j        initially fork_ij = (i > j)
    token_ij : boolean, for each neighbor j       initially token_ij = (i < j)
    ◇P1 : local eventually perfect failure detector

Action 1 (Become Hungry):
    {state_i = thinking} →
        state_i := (thinking or hungry)

Action 2 (Request Missing Forks):
    {state_i = hungry} →
        for each j ∈ N(i) where (token_ij ∧ ¬fork_ij) do
            send-request priority_i to j
            token_ij := false

Action 3 (Send Fork or Defer Request):
    {receive-request priority_j from j ∈ N(i)} →
        token_ij := true
        if (state_i = thinking ∨ (state_i = hungry ∧ (priority_i < priority_j)))
            then send-fork i to j
                 fork_ij := false

Action 4 (Obtain Shared Fork):
    {receive-fork j from j ∈ N(i)} →
        fork_ij := true

Action 5 (Enter Critical Section):
    {state_i = hungry ∧ (∀ j ∈ N(i) :: (fork_ij ∨ j ∈ ◇P1))} →
        state_i := eating

Action 6 (Exit Critical Section, Send Deferred Forks):
    {state_i = eating} →
        Lower(priority_i)
        state_i := thinking
        for each j ∈ N(i) where (token_ij ∧ fork_ij) do
            send-fork i to j
            fork_ij := false

procedure Lower(p : priority)
    ensures p′ = Lower(p) where (p′.id = p.id) ∧ (p′ < p)
    (integer height lowered; process id unchanged)

Algorithm 1: Wait-Free Dining under Eventual Weak Exclusion
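To make the action semantics concrete, here is a small executable sketch of Actions 3 and 5 for a single process, with the oracle's suspect set stubbed as a plain set. The class and method names are ours, and message passing is elided; this is an illustration of the guards, not the full distributed algorithm.

```python
THINKING, HUNGRY, EATING = "thinking", "hungry", "eating"

class Diner:
    def __init__(self, pid, neighbors):
        self.pid, self.state = pid, THINKING
        self.priority = (0, pid)
        self.fork = {q: pid > q for q in neighbors}    # initial forks
        self.token = {q: pid < q for q in neighbors}   # initial tokens
        self.suspects = set()                          # stub for the oracle

    def on_request(self, q, q_priority):
        """Action 3: send the fork now, or defer by holding fork and token."""
        self.token[q] = True
        if self.state == THINKING or (self.state == HUNGRY
                                      and self.priority < q_priority):
            self.fork[q] = False       # fork sent to q
            return "sent"
        return "deferred"

    def try_eat(self):
        """Action 5: eat when each neighbor's fork is held or suspected."""
        if self.state == HUNGRY and all(self.fork[q] or q in self.suspects
                                        for q in self.fork):
            self.state = EATING
        return self.state
```

Note how suspicion substitutes for a missing fork in try_eat: the guard is satisfied per neighbor by either the fork or the oracle, which is exactly the wait-freedom mechanism of Action 5.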

Any process that receives a request token via Action 3 decides whether or not to send the corresponding fork. If the recipient is thinking, or is hungry but has lower priority than the requester, then the fork is sent immediately. Otherwise, the request is deferred until a later time, specifically, until after eating. Eating processes always defer requests, as do higher-priority hungry neighbors. Deferred requests are represented by holding both the shared fork and the request token. Note that if a hungry process loses a requested fork to a higher-priority neighbor in Action 3, the relinquished fork will be re-requested by a subsequent execution of Action 2, which is always enabled while hungry.

Action 4 simply receives forks, and Action 5 determines when a hungry process can transit to eating. A hungry process i can begin eating if, for each neighbor j, process i either holds the shared fork or currently suspects j. This is the only action that utilizes the local oracle ◇P1, and it is central to the wait-freedom of the algorithm. In essence, suspicion serves as a local proxy for missing forks.

Processes transit back to thinking by executing Action 6, which reduces priority and sends forks for any requests that were previously deferred while hungry or eating. This action invokes a local procedure called Lower that reduces the height component of priority by some positive integer. The magnitude of the reduction is up to the algorithm designer, and can be either statically fixed or dynamically chosen at runtime. Although much of the algorithm so far is fairly straightforward, this action masks a deeper subtlety that arises from an intrinsic limitation on local knowledge. In hygienic dining, a process must reduce its priority below that of all neighbors after eating. This absolute reduction forms the basis for progress, because diners cannot remain high priority after eating and continually overtake lower-priority hungry neighbors.
Hygienic solutions typically encode the relative difference in priority in the forks themselves. Since no process changes priority while thinking or hungry, and since every eating process holds all of its shared forks, it is easy to reduce priority absolutely after eating. In our algorithm, by contrast, a process may eat without holding all of its shared forks if it suspects a neighbor while hungry. As such, the process may not know the priority of such neighbors after eating. Our proof of progress shows that reducing priority by an arbitrary positive amount is still sufficient for wait-freedom.

4 Proof of Correctness

Safety: Algorithm 1 satisfies eventual weak exclusion ◇WX. That is, for every execution there exists a time after which no two live neighbors eat simultaneously.

Proof: The safety proof is by direct construction and depends only on the local eventual strong accuracy property of ◇P1. This property guarantees that for each run there exists a time t after which no correct process is suspected by any correct neighbor. Recall that t is unknown and may vary from run to run. We observe that faulty processes cannot prevent safety from being established. Since faulty processes are live for only a finite prefix before crashing, they can eat simultaneously as live neighbors only finitely many times in any run. Consequently, we can restrict our focus to the correct processes.

Consider any execution α of Algorithm 1. Let t denote the time in α after which ◇P1 never suspects correct neighbors. Let p be any correct process that begins eating after time t. By Action 5, process p can only transit from hungry to eating if, for each neighbor q, either p holds the shared fork or p suspects q. Since ◇P1 never suspects correct neighbors after time t in execution α, process p must hold every fork it shares with its correct neighbors in order to begin eating. So long as p remains eating, Actions 3 and 6 guarantee that p will defer all fork requests. Thus, p will not relinquish any forks while eating.
Furthermore, ◇P1 has already converged in α, so no correct neighbor can suspect p either. Consequently, Action 5 remains disabled for every correct hungry neighbor of p until after p transits back to thinking. We conclude that no pair of correct neighbors can begin overlapping eating sessions after time t.
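On finite traces, this convergence claim can be sanity-checked mechanically: ◇WX holds on a trace iff joint eating by live neighbors stops recurring. The toy checker below is our own construction; the trace encoding (one set of eating processes per tick) and all names are assumptions for illustration.

```python
# Finite-trace check for eventual weak exclusion: find the last tick
# at which two live neighbors eat simultaneously. On a finite trace,
# -1 means no violation at all; any finite value is consistent with
# eventual weak exclusion as long as violations do not recur forever.

def last_violation(trace, edges, crashed_by):
    """trace: list of sets of eating processes, one set per tick.
    edges: neighbor pairs in the conflict graph.
    crashed_by: process -> tick by which it has crashed."""
    last = -1
    for t, eating in enumerate(trace):
        for p, q in edges:
            p_live = crashed_by.get(p, float("inf")) > t
            q_live = crashed_by.get(q, float("inf")) > t
            if p in eating and q in eating and p_live and q_live:
                last = t
    return last
```

A trace with one early mistake and a clean suffix is accepted in this sense, while a trace whose violations involve an already-crashed neighbor records no violation at all, mirroring the proof's dismissal of faulty processes.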

Progress: Algorithm 1 satisfies wait-free progress: every correct hungry process eventually eats.

Proof: We prove progress by complete induction on the absolute depth of live processes in the total order of priorities. Recall that the set of live processes at time t is the set of correct processes plus the set of faulty processes that have not crashed by time t. Thus, for any fault pattern F over a set of processes Π, we define live(t) = Π − F(t). Next, we define the depth of a process p at time t as the number of processes in live(t) that have priorities greater than the priority of p. We define depth(p, t) formally as follows:

depth(p, t) = |{ q ∈ live(t) : priority_q > priority_p }|    (1)

Figure 1: Absolute depth of live processes in a conflict graph at time t. (a) Conflict Graph; (b) Depth Ordering.

Figure 1 illustrates the depth function. Subfigure 1(a) shows a particular conflict graph with priorities for each process. Recall that priority is an ordered pair consisting of an integer-valued height and a globally unique process identifier id. In this configuration, process F is crashed and all other processes are currently live. Subfigure 1(b) shows the depth ordering of the live processes. Note that the crashed process F is not included. Process D has the highest global priority, so its corresponding depth is 0. Although process C has a relative depth of 2 in the conflict graph, its absolute depth is 3 in the global depth ordering. Finally, process C is ordered below process G because, although both have equal heights, ties are broken in favor of the higher process id; G has the greater priority (and hence lower depth) because G > C.

Next, we define a measure of the priority distance between processes. For any two processes p and q, the function dist(p, q) measures the maximum number of times that p can eat while q is hungry, before the priority of p must become less than that of q.

dist(p, q) = 0,                             if (priority_p < priority_q)
dist(p, q) = p.height − q.height,           if (priority_p > priority_q) ∧ (p.id < q.id)
dist(p, q) = p.height − q.height + 1,       if (priority_p > priority_q) ∧ (p.id > q.id)    (2)

For example, in Figure 1 the priority distance between processes C and A is 4. Note how this value can differ from the distance in the depth ordering, which in this case is only 1. Still, priority and depth are related. For example, suppose that depth(p, t) < depth(q, t) at some time t, and that dist(p, q) = k. Let t′ denote the time after which p eats for the k-th time following t. Assuming that q does not eat during the interval [t, t′], we know that depth(p, t′) > depth(q, t′). In other words, p can overtake q at most k times before the depth of p must become deeper than the depth of q.

We now prove that every correct hungry process eventually eats, by complete induction on the depth of live processes.

Base Case: Suppose a correct process p is hungry at time t and that depth(p, t) = 0. By Action 3, hungry processes only relinquish forks to higher-priority hungry neighbors. Since depth(p, t) = 0, process p has no higher-priority neighbors; consequently, p will keep every fork it either holds or collects until after it eats. By Action 5, p can only eat if, for each of its δ neighbors, p either holds the shared fork or currently suspects that neighbor. Accordingly, let Forks_p denote the set of neighbors for which p currently holds the shared fork, and let Suspects_p denote the set of neighbors currently suspected by the local oracle ◇P1. This suggests the following progress metric M(p):

M(p) = |Forks_p ∪ Suspects_p|    (3)

Note that M(p) is bounded above by δ, the total number of neighbors. Moreover, when M(p) = δ, Action 5 is enabled, which, upon execution, allows p to eat and thereby satisfy progress. It remains to show that M(p) eventually reaches δ. We partition the neighbors of p into two disjoint sets: correct and faulty.
Since p is hungry, Action 2 is enabled; its execution results in p sending requests for all of its missing forks. Since channels are assumed to be reliable, each fork request sent to a correct neighbor q will eventually be received, causing q to execute Action 3. There are three possible outcomes:

1. If q is thinking, then q sends the shared fork immediately.

2. If q is hungry, then q still sends the shared fork immediately. By hypothesis, p has depth 0, so q must have lower priority.

3. If q is eating, then the fork request is temporarily deferred. As a correct process, however, q will send the shared fork after finite time, when it executes Action 6 to stop eating.

In all three cases, each correct neighbor immediately or eventually sends the requested fork back to p. Upon receiving each fork in Action 4, the value of M(p) either increases by 1 or remains unchanged. The latter outcome occurs only when the corresponding neighbor is already suspected at p.

Next, we consider the faulty neighbors of p. If a faulty neighbor q receives a fork request from p prior to crashing, then q sends the shared fork either immediately or (potentially) after eating, just like a correct process. In either case, the value of M(p) increases by 1 or remains unchanged when p receives the fork in Action 4. Alternatively, a faulty neighbor q might crash, either before receiving the fork request, or possibly while eating with the request deferred. At this point, we invoke the strong completeness property of ◇P₁, which guarantees that every crashed neighbor is eventually and permanently suspected by p. Let t′ be the

unique time at which crashed neighbor q is permanently added to the suspect list at p. If p already holds the shared fork at this time, then M(p) remains unchanged; otherwise, the value of M(p) increases by 1.

We conclude that p is guaranteed to collect every shared fork from each of its correct neighbors, and that p either collects the shared fork from, or permanently suspects, each of its faulty neighbors. The union of these two sets includes all δ neighbors of p. Consequently, M(p) reaches the value δ after finite time, and so p is guaranteed to eat by Action 5.

Remark 1: The base case actually proves that a correct hungry process at any depth will not wait indefinitely on its lower-priority neighbors. In the special case of depth 0, this suffices to establish progress, because all neighbors have lower priority. This result will be applied again in Case 2(a) of the inductive step, which is the case where all higher-priority neighbors are correct.

Remark 2: We observe that the metric M(p) can actually decrease while p is hungry. This is due to the eventual strong accuracy of ◇P₁, which permits the oracle to wrongfully suspect live neighbors finitely many times during any execution. Suppose that ◇P₁ suspects a live neighbor q while p is hungry, and that p does not currently hold the shared fork. If q is then removed from the suspect list, the value of M(p) decreases by 1. Is this a problem for progress? As demonstrated above, it is not. If the live neighbor q is actually correct, then p will eventually acquire the shared fork anyway. Alternatively, the live neighbor q is faulty, in which case it will eventually crash and become permanently suspected by p after finite time. In either case, the value of M(p) will increase by 1 and nullify the effect of the earlier decrease.

To frame this argument differently, we can introduce a submetric for the oracle itself. While p is hungry, the original metric M(p) can decrease only if its oracle ◇P₁ makes a mistake.
By eventual strong accuracy, however, ◇P₁ makes only finitely many mistakes during any run. After each new mistake, the number of remaining mistakes decreases by 1, until some unknown time after which the oracle never suspects live processes. Thereafter, if p is still hungry, the value of M(p) only increases.

Remark 3: The mistakes permitted by eventual strong accuracy may enable p to eat earlier than usual. For example, ◇P₁ could initially suspect every neighbor, thereby enabling p to proceed directly to eating. Although such mistakes may have implications for safety, they cannot forestall progress. In particular, we note that, with respect to ◇P₁, the base case for progress depends only on the local strong completeness of this oracle.

Inductive Hypothesis: Suppose, for every time t ≥ 0 and for every depth d > 0, that every correct hungry process p with depth(p, t) < d eventually eats. We will show that any correct process q that is hungry at time t with depth(q, t) = d eventually eats, regardless of how many processes crash.

Suppose that a correct process q is hungry at time t with depth(q, t) = d. In Algorithm 1, no process reduces its priority while hungry. Additionally, no process increases its priority at any time. As such, the depth of q will never become deeper than d so long as q remains hungry. Priority changes only when an eating process executes Action 6 to transition back to thinking. This is achieved by invoking the local procedure Lower(priority_i) on line 17, which reduces the height component of priority_i by an arbitrary positive integer.

Next, we partition the inductive step into two cases. In the first, some higher-priority live process is faulty. In the second, all higher-priority processes are correct.

Case 1: Suppose there exists a live faulty process r ∈ Π with depth(r, t) < d at time t. By definition of the function depth, the priority of r must be greater than the priority of q.
Since r is faulty, however, there is a finite time t′ > t at which r crashes. Crashed processes are irrelevant to depth, so the crash of r at time t′ actually causes the depth of q to rise, even though the priority of q remains unchanged. Consequently, we have depth(q, t′) < d at time t′, and so by application of the inductive

hypothesis, we conclude that q eventually eats. It may be of interest that this case is the same whether or not r is actually a neighbor of q.

Case 2: Suppose that every process in Π that is higher-priority than q at time t is correct. All future executions after time t can be divided into two subcases: futures in which every higher-priority process at time t eventually thinks forever, and futures in which some higher-priority process at time t becomes hungry infinitely often.

Case 2(a): There exists a time t′ > t after which all such processes remain thinking forever. As mentioned earlier in Remark 1, the situation for q with respect to its lower-priority neighbors reduces to the base case. It remains to show that q will also acquire all of its high forks. Suppose that p is a correct higher-priority neighbor that thinks forever after time t′, and that q does not currently hold the shared fork. If the fork is already in transit from p to q, then it will arrive after finite time. Otherwise, Action 2 remains enabled while q is hungry, so the fork request has been or will be sent to p. Since p thinks forever after time t′, the fork request will be granted when p executes Action 3. Once the fork arrives at q, it will never be relinquished to p again, because p never becomes hungry in the future. Consequently, q will collect and hold the shared fork of each of its high neighbors, and will collect the shared fork from, or permanently suspect, each of its low neighbors. This enables Action 5, so q eventually eats.

Case 2(b): There exists a correct higher-priority process r that becomes hungry infinitely often. Recall that depth(q, t) = d at time t. Since r is correct and has higher priority than q at time t, we know that depth(r, t) < d. Every time r becomes hungry with depth less than d, we can apply the inductive hypothesis to conclude that r eventually eats. Since r is a correct process, it eventually transits from eating back to thinking by executing Action 6.
This action reduces the integer-valued height component of r's priority by at least 1 each time it is executed. Suppose dist(r, q) = k at time t. Then r can eat at most k times while q remains hungry before the priority of r must become lower than that of q. This suggests the following progress metric for q: let t_i denote the i-th time that r executes Action 6 after time t. Then at time t_k, process r has reduced its priority k times, and must now have lower priority than q. Consequently, the depth of q at time t_k must be less than d, so we can apply the inductive hypothesis to q to conclude that q eventually eats.

5 Conclusions

This paper explores the possibility of wait-free dining philosophers in partially synchronous environments subject to process crash faults. Although dining is unsolvable under stronger safety constraints, we present a wait-free solution under the more flexible requirement of eventual weak exclusion (◇WX). Our algorithm uses a local refinement of the eventually perfect failure detector, ◇P₁. This oracle can be implemented in sparse networks that are partitionable by crash faults. As such, our solution is also practical in the sense that it can scale to large networks.

From a broader perspective, dining philosophers is essentially a scheduling problem subject to local, overlapping constraints on shared-resource utilization. As such, wait-free dining under eventual weak exclusion defines a class of schedulers that is permitted to make finitely many scheduling mistakes. By way of motivation, we illustrated some potential applications of this safety model to duty-cycle scheduling for wireless sensor networks and daemon refinement for stabilizing algorithms. It remains to be seen how this model may be applied to these and other potential scenarios where resources can be recovered from crashed nodes and/or scheduling violations precipitate only transient, repairable faults.
"Error is acceptable as long as we are young, but one must not drag it along into old age."
Johann Wolfgang von Goethe, Art and Antiquity, III, 1 (1821)



I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS I R I P U B L I C A T I O N I N T E R N E N o 1599 S INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTÈMES ALÉATOIRES A THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS ROY FRIEDMAN, ACHOUR MOSTEFAOUI,

More information

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report # Degradable Agreement in the Presence of Byzantine Faults Nitin H. Vaidya Technical Report # 92-020 Abstract Consider a system consisting of a sender that wants to send a value to certain receivers. Byzantine

More information

Distributed Consensus

Distributed Consensus Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort in distributed transactions Reaching agreement

More information

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems ACM SIGACT News Distributed Computing Column 17 Sergio Rajsbaum Abstract The Distributed Computing Column covers the theory of systems that are composed of a number of interacting computing elements. These

More information

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,

More information

Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft)

Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft) Bilateral Proofs of Safety and Progress Properties of Concurrent Programs (Working Draft) Jayadev Misra December 18, 2015 Contents 1 Introduction 3 2 Program and Execution Model 4 2.1 Program Structure..........................

More information

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model?

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model? Jaap-Henk Hoepman Department of Computer Science, University of Twente, the Netherlands hoepman@cs.utwente.nl

More information

Self-Similar Algorithms for Dynamic Distributed Systems

Self-Similar Algorithms for Dynamic Distributed Systems Self-Similar Algorithms for Dynamic Distributed Systems K. Mani Chandy and Michel Charpentier Computer Science Department, California Institute of Technology, Mail Stop 256-80 Pasadena, California 91125

More information

The Weighted Byzantine Agreement Problem

The Weighted Byzantine Agreement Problem The Weighted Byzantine Agreement Problem Vijay K. Garg and John Bridgman Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712-1084, USA garg@ece.utexas.edu,

More information

arxiv: v1 [cs.dc] 3 Oct 2011

arxiv: v1 [cs.dc] 3 Oct 2011 A Taxonomy of aemons in Self-Stabilization Swan ubois Sébastien Tixeuil arxiv:1110.0334v1 cs.c] 3 Oct 2011 Abstract We survey existing scheduling hypotheses made in the literature in self-stabilization,

More information

Correctness of Concurrent Programs

Correctness of Concurrent Programs Correctness of Concurrent Programs Trifon Ruskov ruskov@tu-varna.acad.bg Technical University of Varna - Bulgaria Correctness of Concurrent Programs Correctness of concurrent programs needs to be formalized:

More information

Verifying Randomized Distributed Algorithms with PRISM

Verifying Randomized Distributed Algorithms with PRISM Verifying Randomized Distributed Algorithms with PRISM Marta Kwiatkowska, Gethin Norman, and David Parker University of Birmingham, Birmingham B15 2TT, United Kingdom {M.Z.Kwiatkowska,G.Norman,D.A.Parker}@cs.bham.ac.uk

More information

Byzantine agreement with homonyms

Byzantine agreement with homonyms Distrib. Comput. (013) 6:31 340 DOI 10.1007/s00446-013-0190-3 Byzantine agreement with homonyms Carole Delporte-Gallet Hugues Fauconnier Rachid Guerraoui Anne-Marie Kermarrec Eric Ruppert Hung Tran-The

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

CSC501 Operating Systems Principles. Deadlock

CSC501 Operating Systems Principles. Deadlock CSC501 Operating Systems Principles Deadlock 1 Last Lecture q Priority Inversion Q Priority Inheritance Protocol q Today Q Deadlock 2 The Deadlock Problem q Definition Q A set of blocked processes each

More information

Generalized Consensus and Paxos

Generalized Consensus and Paxos Generalized Consensus and Paxos Leslie Lamport 3 March 2004 revised 15 March 2005 corrected 28 April 2005 Microsoft Research Technical Report MSR-TR-2005-33 Abstract Theoretician s Abstract Consensus has

More information

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra I.B.M Thomas J. Watson Research Center, Hawthorne, New York and Sam Toueg Cornell University, Ithaca, New York We introduce

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems Ordering events. Lamport and vector clocks. Global states. Detecting failures. Required reading for this topic } Leslie Lamport,"Time, Clocks, and the Ordering

More information

Randomized Protocols for Asynchronous Consensus

Randomized Protocols for Asynchronous Consensus Randomized Protocols for Asynchronous Consensus Alessandro Panconesi DSI - La Sapienza via Salaria 113, piano III 00198 Roma, Italy One of the central problems in the Theory of (feasible) Computation is

More information

Maximum Metric Spanning Tree made Byzantine Tolerant

Maximum Metric Spanning Tree made Byzantine Tolerant Algorithmica manuscript No. (will be inserted by the editor) Maximum Metric Spanning Tree made Byzantine Tolerant Swan Dubois Toshimitsu Masuzawa Sébastien Tixeuil Received: date / Accepted: date Abstract

More information

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 06: Synchronization Version: November 16, 2009 2 / 39 Contents Chapter

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

Gradient Clock Synchronization

Gradient Clock Synchronization Noname manuscript No. (will be inserted by the editor) Rui Fan Nancy Lynch Gradient Clock Synchronization the date of receipt and acceptance should be inserted later Abstract We introduce the distributed

More information

Section 6 Fault-Tolerant Consensus

Section 6 Fault-Tolerant Consensus Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing.

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 6 (version April 7, 28) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.2. Tel: (2)

More information

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont Benchmarking Model Checkers with Distributed Algorithms Étienne Coulouma-Dupont November 24, 2011 Introduction The Consensus Problem Consensus : application Paxos LastVoting Hypothesis The Algorithm Analysis

More information

Self-Stabilizing Silent Disjunction in an Anonymous Network

Self-Stabilizing Silent Disjunction in an Anonymous Network Self-Stabilizing Silent Disjunction in an Anonymous Network Ajoy K. Datta 1, Stéphane Devismes 2, and Lawrence L. Larmore 1 1 Department of Computer Science, University of Nevada Las Vegas, USA 2 VERIMAG

More information

Consensus and Universal Construction"

Consensus and Universal Construction Consensus and Universal Construction INF346, 2015 So far Shared-memory communication: safe bits => multi-valued atomic registers atomic registers => atomic/immediate snapshot 2 Today Reaching agreement

More information

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George)

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George) Time Lakshmi Ganesh (slides borrowed from Maya Haridasan, Michael George) The Problem Given a collection of processes that can... only communicate with significant latency only measure time intervals approximately

More information

Consensus when failstop doesn't hold

Consensus when failstop doesn't hold Consensus when failstop doesn't hold FLP shows that can't solve consensus in an asynchronous system with no other facility. It can be solved with a perfect failure detector. If p suspects q then q has

More information

6.852: Distributed Algorithms Fall, Class 10

6.852: Distributed Algorithms Fall, Class 10 6.852: Distributed Algorithms Fall, 2009 Class 10 Today s plan Simulating synchronous algorithms in asynchronous networks Synchronizers Lower bound for global synchronization Reading: Chapter 16 Next:

More information

Replication predicates for dependent-failure algorithms

Replication predicates for dependent-failure algorithms Replication predicates for dependent-failure algorithms Flavio Junqueira and Keith Marzullo Department of Computer Science and Engineering University of California, San Diego La Jolla, CA USA {flavio,

More information

Time Synchronization

Time Synchronization Massachusetts Institute of Technology Lecture 7 6.895: Advanced Distributed Algorithms March 6, 2006 Professor Nancy Lynch Time Synchronization Readings: Fan, Lynch. Gradient clock synchronization Attiya,

More information

A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems

A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems François Bonnet, Michel Raynal To cite this version: François Bonnet, Michel Raynal.

More information

The State Explosion Problem

The State Explosion Problem The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis

More information

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour Distributed Algorithms (CAS 769) Week 1: Introduction, Logical clocks, Snapshots Dr. Borzoo Bonakdarpour Department of Computing and Software McMaster University Dr. Borzoo Bonakdarpour Distributed Algorithms

More information

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Martin Hutle martin.hutle@epfl.ch André Schiper andre.schiper@epfl.ch École Polytechnique Fédérale de Lausanne

More information

A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set

A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set A Self-Stabilizing Distributed Approximation Algorithm for the Minimum Connected Dominating Set Sayaka Kamei and Hirotsugu Kakugawa Tottori University of Environmental Studies Osaka University Dept. of

More information

How to solve consensus in the smallest window of synchrony

How to solve consensus in the smallest window of synchrony How to solve consensus in the smallest window of synchrony Dan Alistarh 1, Seth Gilbert 1, Rachid Guerraoui 1, and Corentin Travers 2 1 EPFL LPD, Bat INR 310, Station 14, 1015 Lausanne, Switzerland 2 Universidad

More information