CONSENSUS IN THE CRASH-RECOVERY MODEL


Rheinisch-Westfälische Technische Hochschule Aachen
Department of Computer Science

CONSENSUS IN THE CRASH-RECOVERY MODEL
DAS CONSENSUS PROBLEM IN VERTEILTEN SYSTEMEN MIT ANHALTE-AUSFÄLLEN UND NEUSTARTS

Diploma Thesis of CHRISTIAN LAMBERTZ

Thesis advisor: Prof. Dr. Felix Freiling
Second examiner: Prof. Dr. Klaus Wehrle


I hereby declare that I have written this thesis entirely on my own, that I have used no sources or tools other than those listed, and that I have marked all citations accordingly.

Aachen, December 17th, 2007
(Christian Lambertz)


Contents

Abstract
Zusammenfassung
Acknowledgements

1 Introduction
    Motivation
    Problem Statement
    Result Overview
    Roadmap
    Conventions
        Definitions
        Algorithms
2 Model
    Environment
    Failures
    Stable Storage
    Algorithms
    Communication
        Fair Loss Links
        Stubborn Links
        Quiescence
    Failure Detection
    Synchronization
    The Consensus Protocol
        Consensus Definition
        Uniform Consensus
        Quiescence of Consensus
    Space-Time Diagrams
    Summary
3 Related Work
    Complexity of Consensus
    Solving Consensus in the Crash-Stop Model
        Perfect Failure Detector and One Correct Process
        Eventually Perfect Failure Detector and Correct Majority
        Weakest Failure Detector for Crash-Stop Consensus
    Crash-Recovery Consensus
        Oliveira, Guerraoui, and Schiper
        Hurfin, Mostéfaoui, and Raynal
        Aguilera, Chen, and Toueg
        Lamport's Paxos Algorithm
    Summary
4 New Consensus Algorithms
    Necessary Process State Assumption in the Absence of Stable Storage
    Consensus Algorithms with Synchronizers
        Crash-Stop Model and Synchronizers
        Crash-Recovery Model and Synchronizers
        Summary
    Emulation Technique
        Idea
        The CS-consensus Interface
    Solutions using the Emulation Technique without Stable Storage
        Perfect Failure Detector and One Always Up Process
        Eventually Perfect Failure Detector and Always Up Majority
        Eventually Perfect Failure Detector, Correct Majority, and One Always Up Process
        Eventually Perfect Failure Detector and More Always Up Than Incorrect Processes
        Summary
    Solutions using the Emulation Technique with Stable Storage
        Failure Detector Extension
        Eventually Perfect Failure Detectors still need a Correct Majority of Processes
        Eventually Perfect Failure Detector, Correct Majority, and Last Message Storage
        Weakening of the Process State Assumption
        Summary
5 Repeated Consensus
    Definition
    Adjusting the Emulation Technique
    Repeated Consensus Algorithm
    Combining the Stubborn Links Modules
    Summary
6 Conclusion
    New Results
    Further Ideas
A Stubborn Links
    A.1 Asynchronous Timer
    A.2 Implementation
    A.3 Comparison with Other Definitions
B Recurrent Completeness
C The More Always Up Than Incorrect Processes Assumption
Bibliography
Index


List of Algorithms

3.1 Crash-stop consensus algorithm that uses P
Crash-stop consensus algorithm that uses ◇P
Round based crash-stop consensus algorithm that uses Z
Tick based crash-stop consensus algorithm that uses Z
Consensus algorithm with Z and at least one always up assumption
Emulator with P and at least one always up assumption
Emulator with ◇P and always up majority assumption
Emulator with ◇P, stable storage, and correct majority assumption
Repeated emulator with ◇P, stable storage, and correct majority assumption
A.1 Stubborn Links Implementation
C.1 Consensus algorithm with ◇P and more always up than incorrect assumption


List of Definitions

2.1 Asynchronous System
Protocol
Failure Pattern
Crash-Stop Model
Crash-Recovery Model
Stable Storage
Algorithm
Interface
Links
Fair Loss Links
Interface of Fair Loss Links
Stubborn Links
Interface of Stubborn Links
Quiescence
Failure Detector
Interface of Failure Detectors
2.17 Perfect Failure Detector
Eventually Perfect Failure Detector
Synchronizer
Interface of Synchronizers
Consensus Protocol
Interface of Consensus
Uniform Consensus Protocol
Quiescent Consensus
Interface of CS-consensus
Repeated Consensus
Interface of Repeated Consensus
Interface of CR-consensus
A.1 Asynchronous Timer
A.2 Interface of Asynchronous Timers

List of Figures

1.1 Mobile Phone Example
Consensus Solvability Results without Stable Storage
Consensus Solvability Results with Stable Storage
The Computation Layers of Algorithm 1.1
Process Speed Assumption
Unreliability of Communication Channels
Process Failure Classification in the Crash-Stop Model
Process Failure Classification in the Crash-Recovery Model
Fair Loss Links Usage
Stubborn Sending of a Message
Meaning of Stubbornness
Legend of Space-Time Diagrams
The FLP Impossibility
Example Run 1 of Algorithm
Example Run 2 of Algorithm
Example Run of Algorithm
3.5 Comparison of Crash-Recovery Solutions
Result Overview of Section
Example Run of Algorithm
Example Run of Algorithm
Result Overview of Subsection
Example Run of Algorithm
The Consensus Interface
The CS-consensus Interface
Usage of the CS-consensus Interface
Result Overview of Subsection
Example Run 1 of Algorithm
Example Run 2 of Algorithm
Example Run 3 of Algorithm
Result Overview of Subsection
Result Overview of Subsection
Correct Majority and One Always Up Process Impossibility
Result Overview of Subsection
Summary of the Consensus Results without Stable Storage
Result Overview of Subsection
Eventually Perfect Failure Detector, Stable Storage, and One Always Up Process Impossibility
Result Overview of Subsection
4.21 All Messages of Algorithm
Example Run of Algorithm
Result Overview of Subsection
One Correct Process Impossibility
Usage of the CR-consensus Interface
Example Run 1 of Emulation Algorithm
Example Run 2 of Emulation Algorithm
B.1 Insufficiency of Original Failure Detectors in the Crash-Recovery Model


Abstract

This diploma thesis examines the solvability of the consensus problem in asynchronous distributed systems under a specific failure assumption called the crash-recovery model. Processes can crash and recover, but otherwise behave benignly, and message losses are fair. Roughly, to solve consensus, every process in the system proposes a value, and all processes that eventually stop crashing have to decide on a common value, which must be one of the proposed values. Research on the consensus problem under a weaker failure assumption, the crash-stop model, in which processes stop participating after a crash, is already well advanced. Therefore, instead of developing completely new algorithms, this thesis investigates how already known crash-stop algorithms can be reused in the crash-recovery model with the help of emulation algorithms. Throughout this research, different assumptions are made on the number of faulty processes, the availability of stable storage, and the accessibility of means of synchrony such as failure detectors and synchronizers.


Zusammenfassung

This diploma thesis examines the solvability of the consensus problem in asynchronous distributed systems under the assumption of a particular failure model, called the crash-recovery model. In this model, processes can fail by crashing and restart later, but otherwise behave benignly, and message losses in the network are fair. To solve consensus, every process has to propose a value, and all processes that eventually stop crashing have to agree on a common value that is among the proposed ones. Research on the consensus problem under a weaker failure model, called the crash-stop model, in which processes do not restart after a failure, is well advanced. Instead of developing new algorithms for the crash-recovery model, this thesis investigates the reuse of the already known crash-stop algorithms in the crash-recovery model. This reuse is made possible by so-called emulation algorithms. During the investigation, different assumptions are made about the number of faulty processes, the presence of stable storage, and the accessibility of means of synchronization such as failure detectors and synchronizers.


Acknowledgements

I would like to thank all the people who supported me with technical advice and personal encouragement while I was writing my diploma thesis. I thank my advisor, Prof. Dr. Felix Freiling, for the introduction to the topic of reliable distributed computing and for the time he spent reading and discussing preliminary versions of this work. Furthermore, I thank Prof. Dr. Klaus Wehrle for evaluating my diploma thesis as second examiner.


Chapter 1
Introduction

Distributed systems can be found everywhere nowadays, from small personal networks up to big computing environments. The most famous and biggest one connects billions of computers and is known as the Internet. As more and more important computations are executed on large groups of connected machines, reliable communication and agreement among the individual systems become essential. Booking reservations, online payment, and banking are only three examples that rely on agreement between the participating parties. Such agreements often take place among distributed databases, which have to either commit or abort transactions that change important data, e.g., the withdrawal of money from bank accounts.

In this thesis, the basic abstraction behind all these forms of agreement is studied: the consensus problem. Before the exact problem is stated, an example of consensus in the world of mobile phones is presented to motivate the problem and to raise interest in studying it.

1.1 Motivation

Consider five people equipped with mobile phones who want to agree on a meeting point and time. They communicate via SMS (short message service) messages only, i.e., they do not call each other and talk directly. Everybody suggests a preferred meeting point and time in a message and is allowed to send any number of additional messages in order to reach agreement on the meeting. Unfortunately, all of them bought the same mobile phone, which was very cheap and sometimes becomes extremely slow, i.e., the input of text can take a long time. Even worse, occasionally a phone freezes completely and its owner must reset it. But this reset does not always help, i.e., a phone can freeze forever, and the owner can no longer attempt any communication with the others. Furthermore, their mobile provider is currently doing heavy work on the network. Thus, the network can be slow and occasionally fails, i.e., it loses a message without notifying the sender or receiver.

The intention of the persons is that everybody with a working phone eventually knows the same meeting point and time. A person whose phone froze forever is not required to arrive; he or she has a good excuse. But actually, all persons want to come to the meeting. Figure 1.1 illustrates the situation: the nodes depict the persons' mobile phones, and the lines between the nodes depict the mobile network.

[Figure 1.1: Communication situation of the five persons. Annotations: Phones can be slow. Sometimes they freeze. Resets are a remedy, but do not always help. The network can be slow. Messages can get lost.]

Interestingly, the persons are unable to solve the problem, i.e., to ensure that everybody with a working mobile phone arrives at the meeting, even if all persons have a working phone. But why are they unable to solve this simple task, and which additional assumptions are required to solve it? The research on these assumptions is the topic of this thesis. The mobile phones of the persons are abstracted as processes and the mobile network as communication channels between these processes. The possible failures of the processes are abstracted as failure models, which provide a precise definition of the faulty behavior.

1.2 Problem Statement

Processes in a distributed system have to agree on a common value to solve the consensus problem. To this end, each process provides one input value, called the proposal. The agreed value, called the decision, has to be in the set of input values. The processes are allowed to send messages over unreliable communication channels in order to reach a decision. Of course, the channels do not lose all messages; the distributed system provides some delivery guarantees, called fair loss links.

Two failure models abstract the faulty behavior of the processes. The first one, called the crash-stop model, allows a process to crash only once. After this crash, the process is not allowed to take any further step. The second one, called the crash-recovery model, allows several crashes with recoveries in between. A crashed process is allowed to recover, but it loses all pre-crash information that was not previously stored in a special location, called stable storage. The failure models group the processes into sets of correct and incorrect processes. For example, in the crash-recovery model, the correct processes either never crash (always up processes) or eventually stop crashing (eventually up processes), and the incorrect ones either are eventually crashed permanently (eventually down processes) or oscillate forever between crashes and recoveries (unstable processes).

Consensus is studied in distributed system abstractions that consist of a failure model, an assumption on how many processes are correct and incorrect, and a means of synchrony, which provides the processes with additional information. Note that without such a means, consensus is not solvable, as already previewed in the mobile phone example. The means of synchrony available in this thesis are failure detectors and synchronizers. A failure detector provides the processes with the ability to suspect any process of belonging to the set of incorrect processes. Two classes of failure detectors are studied: the perfect failure detector, which makes no suspicion mistakes, and the eventually perfect failure detector, which is allowed to make finitely many suspicion mistakes.

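The informal requirements stated at the beginning of this section, namely that the decision is common and is one of the proposals, can be expressed as a small check. The following Python sketch is illustrative only: the function name `check_decisions` and the dictionary representation of proposals and decisions are assumptions made for this example, and the requirement that every process that eventually stops crashing actually decides is deliberately not covered.

```python
from typing import Dict, Hashable


def check_decisions(proposals: Dict[int, Hashable],
                    decisions: Dict[int, Hashable]) -> bool:
    """Check the two informal requirements on decided values.

    proposals maps each process id to its proposed value.
    decisions maps each process id that has decided to its decision.
    (Both representations are assumptions made for this sketch.)
    """
    decided_values = set(decisions.values())
    # All processes that decided must have decided the same value.
    agreement = len(decided_values) <= 1
    # Every decided value must be one of the proposed values.
    validity = decided_values <= set(proposals.values())
    return agreement and validity


# Hypothetical meeting proposals in the spirit of the mobile phone example.
proposals = {1: "cafe, 18:00", 2: "park, 17:00", 3: "cafe, 18:00"}
common_decision = {1: "cafe, 18:00", 3: "cafe, 18:00"}
split_decision = {1: "cafe, 18:00", 2: "park, 17:00"}
invented_decision = {1: "station, 19:00"}
```

With these inputs, `check_decisions(proposals, common_decision)` holds, whereas the split and the invented decisions are rejected.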
A synchronizer provides the processes with synchronization points at which all previously sent messages are delivered. Of course, with the help of these synchronization points, the processes can also detect incorrect processes, e.g., if no more "I am alive" messages are received from a certain process, this process has presumably crashed.

The following section provides an overview of the results of this thesis. The results focus on the crash-recovery model only, because the research on the crash-stop model is already well advanced, and the ideas of the crash-stop research are reused in the new ideas of this thesis.

1.3 Result Overview

The following two figures 1.2 and 1.3 summarize the studied cases of consensus solvability in the crash-recovery model.

no stable storage                                 | eventually perfect failure detector | perfect failure detector | synchronizer
--------------------------------------------------+-------------------------------------+--------------------------+----------------------
one correct (n crashes)                           | impossible                          | impossible               | impossible
correct majority (n crashes)                      | impossible                          | impossible (see 4.1)     | impossible (see 4.1)
one always up (n - 1 crashes)                     | impossible                          | emulator (see 4.4.1)     | algorithm (see 4.2.2)
correct majority & one always up (n - 1 crashes)  | impossible (see 4.4.3)              | solvable                 | solvable
more always up than incorrect (n_bad crashes)     | algorithm (see app. C)              | solvable                 | solvable
always up majority ((n - 1)/2 crashes)            | emulator (see 4.4.2)                | solvable                 | solvable

Figure 1.2: Consensus solvability results in the crash-recovery model without the availability of stable storage.

In this thesis, three parameters can be adjusted to study the consensus solvability. These are the availability of stable storage, a process state assumption, and a means of synchrony, i.e., failure detectors and synchronizers. Since stable storage is either available or not, two tables are used to present the results clearly. The topmost row of each table determines the means of synchrony used, and the leftmost column determines the assumed presence of correct processes. The number of processes that are allowed to crash simultaneously is enclosed in parentheses after the presence assumption. The red arrows that connect some cells depict a logical implication, i.e., if the source cell of an arrow contains a particular result, the result in the destination cell of the arrow logically follows from it.

stable storage                                    | eventually perfect failure detector  | perfect failure detector             | synchronizer
--------------------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
one correct (n crashes)                           | impossible                           | impossible (see 4.5.4)               | impossible (see 4.5.4)
correct majority (n crashes)                      | emulator (see 4.5.3)                 | solvable                             | solvable
one always up (n - 1 crashes)                     | impossible (see 4.5.2)               | trivial (see w/o stable storage)     | trivial (see w/o stable storage)
correct majority & one always up (n - 1 crashes)  | emulator (see 4.5.3)                 | solvable                             | solvable
more always up than incorrect (n_bad crashes)     | trivial (see w/o stable storage)     | solvable                             | solvable
always up majority ((n - 1)/2 crashes)            | trivial (see w/o stable storage)     | solvable                             | solvable

Figure 1.3: Consensus solvability results in the crash-recovery model with the availability of stable storage.

The individual results fall into four groups. Impossibility results are simply denoted by impossible, and solutions that follow trivially from another one are denoted by solvable. The cases that are solvable with a special algorithm, which uses another algorithm from the crash-stop model, are denoted by emulator, and direct algorithms are simply denoted by algorithm. These emulation algorithms are the main idea of this thesis, and most of the work is spent on their study. The two figures 1.2 and 1.3 are later used in a condensed form to guide the reader through this research. In all cases, the research favors simplicity over efficiency, i.e., the solutions sacrifice performance rather than comprehensibility, although some performance improvements are obvious and could easily be added.

1.4 Roadmap

In chapter 2, the complete computational model and environment are precisely defined. The above-mentioned crash-stop and crash-recovery failure models, communication channels, failure detectors, and synchronizers are explained in more detail. Of course, the consensus problem is also defined. The chapter forms the foundation for the whole research in the succeeding chapters. Thus, all important definitions are presented both in a regular sentence and in a logical notation in order to emphasize their exact meaning.

The reason for the impossibility of consensus without a means of synchrony is explained in chapter 3, and several solutions in the crash-stop and crash-recovery models are discussed. Two crash-stop algorithms are presented in detail, because they are important for the mentioned emulation idea. The complexity of the described crash-recovery solutions motivates the idea of emulation in the next chapter.

In chapter 4, new approaches to consensus solvability are studied that are based on the idea of emulating the crash-stop consensus solutions in the crash-recovery model. The whole research is oriented along the two figures 1.2 and 1.3, and all cases are discussed step by step, e.g., first the processes are denied the use of stable storage, and later they are allowed to use it.

The emulation idea is further developed in chapter 5 in order to solve an extension of the consensus problem, called repeated consensus, in which the processes have to solve several, possibly simultaneous, instances of consensus. Note that chapters 4 and 5 contain the original contributions of this thesis. The last chapter 6 concludes the thesis and discusses further ideas.

Three appendices contain additional information that was taken out of the main part of the thesis. The reason for this is given in the text where the corresponding appendix is mentioned for the first time. An index at the end of the thesis provides quick access to the properties of the definitions of chapter 2, whose exact meaning becomes important in several proofs.

1.5 Conventions

Throughout this thesis, the following conventions are used for the definitions and for the notation of algorithms in pseudo code. Pseudo code is used as a high-level description of algorithms, because it allows unnecessary details of real programming languages, such as special variable declarations and obscure syntax, to be neglected.

Definitions

Definitions are presented in a regular explanation and in a logical notation. This logical notation is similar to first-order logic, but with the addition of several available domains for the variables. The two available quantifiers bind variables to these domains with the element symbol, i.e., with the following terms: $\forall n \in \mathbb{N}$ and $\exists n \in \mathbb{N}$, respectively. All quantifiers appear first in any logical formula in order to avoid mixing the important part with the quantification of variables. Note that the end of a definition is indicated by more vertical space between the last line of the definition and the next paragraph, but the end also becomes clear from the context. A list of all definitions of this thesis is provided after the table of contents.

Algorithms

Algorithms are written in pseudo code similar to the notation used in [Cormen et al., 2001]. The following pseudo code conventions are used:

1. Indentation indicates a block structure in order to increase the readability of the code and to reduce clutter. In real programming languages, this is typically done by bracketing or special begin and end statements; such marks are completely omitted here.
2. The symbol ▷ indicates that the remainder of the line is a comment.
3. If a line number is missing in a code listing, the unnumbered line belongs to the previous numbered one.
4. Variables are uncapitalized, e.g., message $m$. A set of variables is capitalized, e.g., set of messages $M$. Tuples of variables, e.g., $(a, b)$, are also possible. Assignments are depicted by the symbol $\leftarrow$, e.g., $m \leftarrow 0$ and $M \leftarrow \{0\}$.
5. The symbol $\bot$ is the default value of any variable.
6. The number of elements in a set $M$ can be determined by $|M|$. The empty set is denoted by $\emptyset$ with $|\emptyset| = 0$. Sets are duplicate free, i.e., $|\{0, 1, 1\}| = |\{0, 1\}| = 2$.
7. Text strings are written in small capitals, e.g., TEXT.
8. Arrays are denoted by $a[]$, and their elements can be accessed by $a[i]$, with $i \in \mathbb{N}$. If an element has not been set before, the value $\bot$ is returned. An array can be reset by the statement $a[] \leftarrow \bot$.
9. A multiple assignment of the form $a \leftarrow b \leftarrow c$ assigns to both variables $a$ and $b$ the value of expression $c$.
10. Because this thesis studies distributed systems and therefore message exchanges, a special syntax for messages is used. The content of a message $m$ can be accessed by $m = \langle v_1, v_2, \ldots, v_i \rangle$, where $v_1, v_2, \ldots, v_i$ are variables that reference the values of the corresponding slots in message $m$.
11. The boolean operators and and or are short-circuiting, i.e., if the expression a and b is evaluated, a is evaluated first and b is only evaluated if a is true, because if a is false, the whole expression can only be false. If the expression c or d is evaluated, c is evaluated first and d is only evaluated if c is false, because if c is true, the whole expression is true regardless of the evaluation of d.

The listing of algorithms is similar to the notation of [Guerraoui and Rodrigues, 2006]. Algorithm 1.1 shows the general structure of algorithms in this thesis. Algorithms always implement protocols, which offer request and indication events. A caller of a protocol, denoted by the term application, can ask for information with request events, and the protocol answers with corresponding indication events. Figure 1.4 depicts the layers of algorithm 1.1. Note that the terms algorithm, protocol, and event are all defined precisely in the next chapter.

Algorithm 1.1 Example Algorithm
Implements: Protocol X (start, stop)
Uses: Message Exchange (send, receive)
Assumption: at least one correct process.
The same algorithm runs on every process $p \in \{1, \ldots, n\}$.
1: upon start do
2:     send message EXAMPLE to all
3: upon receive message EXAMPLE do
4:     stop

The header of the pseudo code listing contains information about the protocol that is implemented by the algorithm, a list of other protocols used and their events, and possible assumptions that must hold in order to satisfy all properties of the implemented protocol.

The event model of the algorithms works as follows. At every computational step and each time the input changes, the listing is executed top down, i.e., the first upon statement that matches the new input is executed. Here, input refers to any new information, e.g., received messages, calls from higher application layers, and notifications from lower protocols.

[Figure 1.4: Illustration of the computation layers of algorithm 1.1. From top to bottom: the application (caller), Protocol X with the request start and the indication stop, and Message Exchange with the request send and the indication receive. The red arrows illustrate the events occurring in algorithm 1.1.]

Note that a list of all algorithms and a list of all figures of this thesis are provided after the table of contents.
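The event model described above can be sketched in Python for algorithm 1.1. This is a minimal illustration and not the implementation used in the thesis: the classes `Process` and `Network`, the tuple encoding of events, and the loss-free in-memory network without crashes are all assumptions made for this example.

```python
from collections import deque

EXAMPLE = "EXAMPLE"  # message content, written in small capitals in the listing


class Network:
    """Hypothetical loss-free message exchange (an assumption of this sketch)."""

    def __init__(self):
        self.processes = {}
        self.in_transit = deque()

    def send_to_all(self, message):
        # Request event "send": queue one copy of the message per process.
        for pid in self.processes:
            self.in_transit.append((pid, message))

    def deliver_all(self):
        # Indication event "receive": hand every queued message to its target.
        while self.in_transit:
            pid, message = self.in_transit.popleft()
            self.processes[pid].step(("receive", message))


class Process:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network
        self.stopped = False
        # The upon statements in listing order; the first match is executed.
        self.handlers = [
            (lambda event: event == ("start",), self.upon_start),
            (lambda event: event == ("receive", EXAMPLE), self.upon_example),
        ]

    def upon_start(self):
        self.network.send_to_all(EXAMPLE)  # line 2 of algorithm 1.1

    def upon_example(self):
        self.stopped = True  # stands in for the indication "stop" (line 4)

    def step(self, event):
        """Execute the first upon statement that matches the new input."""
        for matches, action in self.handlers:
            if matches(event):
                action()
                return True
        return False  # no upon statement matches this input


def run_example(n):
    network = Network()
    for pid in range(1, n + 1):
        network.processes[pid] = Process(pid, network)
    for process in list(network.processes.values()):
        process.step(("start",))  # the application issues the request "start"
    network.deliver_all()
    return {pid: p.stopped for pid, p in network.processes.items()}
```

Calling `run_example(3)` lets three processes start, exchange the EXAMPLE message, and stop. Keeping the handlers in an ordered list mirrors the top-down execution of the listing.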


Chapter 2
Model

In this chapter, the computational model is introduced. The model abstracts real distributed systems and failures so that the solvability of consensus under certain failure models can be studied in the next chapters. All assumptions and properties are defined in a full sentence and by a logical term. This helps to clarify special cases and to emphasize the exact meaning of properties.

2.1 Environment

The first definition introduces the basic abstraction of this thesis, the asynchronous system. Henceforth, all definitions rely on this system.

Definition 2.1 (Asynchronous System): An asynchronous system is a distributed system that consists of a set of $n$ processes, denoted by $\Pi = \{1, \ldots, n\}$. This set is static, i.e., no additional processes are present, and all processes are aware of each other. The computation speed of each process is unknown; while one process takes a step, another process can take any finite number of steps. The system has a discrete global clock, $T \subseteq \mathbb{N}$. Every step in the system corresponds to a tick, $t \in T$, of this clock. The processes do not have access to the clock; it just simplifies the representation of the system. Furthermore, all processes are connected by a network of so-called links and communicate by message exchange, i.e., the system provides methods for sending and receiving messages over the network. The channels of the network are unreliable, i.e., messages can get lost and transmission delays are unknown.

Figure 2.1 illustrates the process speed assumption in asynchronous systems, which states that no process can take infinitely many steps while any other process takes only one.

[Figure 2.1: Process speed assumption in asynchronous systems. While one process takes a single step, $p_1$ takes at most a finite number of steps.]

Figure 2.2 depicts the unreliability of the communication channels in asynchronous systems. A process that waits for a message cannot distinguish between an unknown transmission delay and the loss of the message.

[Figure 2.2: Illustration of the unreliability of the assumed communication channels in asynchronous systems. The transmission delay $\Delta t$ between the sending time $t_s$ and the receiving time $t_r$ is unknown, as in the case of message $m_1$, and messages can get lost, as in the case of message $m_2$. The problem is the uncertainty whether the message is lost or just needs more time until its delivery.]

The next task is to define the purpose of an asynchronous system. As mentioned in the introductory chapter, the processes in such a system should solve problems. This is accomplished by algorithms, which describe the actions of each process. A precise definition of algorithms follows in section 2.4 of this chapter. The problems that processes in a distributed computing environment have to solve are called protocols in this thesis and are defined as follows.

Definition 2.2 (Protocol): A protocol is a task that a set of processes has to solve in an asynchronous system. Usually, the protocol defines properties that have to be satisfied in order to solve the task. Two classes of properties exist:

- Liveness properties: Something good happens. Liveness properties typically describe anything that has to happen eventually during a computation, e.g., termination of an algorithm.
- Safety properties: Nothing bad happens. Safety properties typically describe anything that must not happen during a computation, e.g., wrong output of an algorithm.

This classification of properties was formalized by Alpern and Schneider [1985]. They also proved that any problem specification, which is called a protocol in this thesis, can be written as a conjunction of safety and liveness properties.

An algorithm implements a protocol if it solves the task and meanwhile fulfills all properties. If the processes are not caught in a deadlock situation, i.e., the algorithms are designed well enough to avoid deadlocks, and all processes work flawlessly, every basic protocol could be solved. Assuming message delivery guarantees, the processes could just wait long enough until all communication problems have died out, i.e., until all messages are delivered. Since no failures occur inside the processes, all liveness properties are fulfilled eventually.

2.2 Failures

A flawless distributed system is unrealistic. Failures can happen at any time; therefore, the remaining system should react in some way in order to complete its tasks. In order to understand possible failures, the following definition abstracts a common failure, the crash of a process, which heavily influences the solvability of protocols, as studied later.

Definition 2.3 (Failure Pattern): A failure pattern $F$ is a function that determines the set of processes that are not functioning (are down) at a particular time of the global clock. Thus, $F: T \to 2^{\Pi}$ and $F(t) = \{p \in \Pi \mid p \text{ is down at } t\}$. If process $p \notin F(t)$, $p$ is up at time $t$. Furthermore, if process $p \notin F(t-1)$ and $p \in F(t)$, $p$ crashes at time $t$, denoted by $p \in F_c(t)$. If process $p \in F(t-1)$ and $p \notin F(t)$, $p$ recovers at time $t$, denoted by $p \in F_r(t)$.

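Definition 2.3 can be made concrete with a short Python sketch. The representation is an assumption made for this example: only a finite prefix of the failure pattern is stored, as a list in which entry `t` is the set $F(t)$, and the function names `is_up`, `crashes_at`, and `recovers_at` are hypothetical. The last two correspond to $F_c(t)$ and $F_r(t)$.

```python
from typing import List, Set

# F[t] is the set of processes that are down at tick t (finite prefix only).
FailurePattern = List[Set[int]]


def is_up(F: FailurePattern, p: int, t: int) -> bool:
    """p is up at time t if p is not in F(t)."""
    return p not in F[t]


def crashes_at(F: FailurePattern, t: int) -> Set[int]:
    """F_c(t): processes that are up at t - 1 and down at t."""
    if t < 1:
        raise ValueError("a crash at time t needs a previous tick t - 1")
    return F[t] - F[t - 1]


def recovers_at(F: FailurePattern, t: int) -> Set[int]:
    """F_r(t): processes that are down at t - 1 and up at t."""
    if t < 1:
        raise ValueError("a recovery at time t needs a previous tick t - 1")
    return F[t - 1] - F[t]


# Hypothetical pattern: process 2 is down at ticks 1 and 2,
# process 3 is down at ticks 2 and 3.
F = [set(), {2}, {2, 3}, {3}, set()]
```

In this example, process 2 crashes at tick 1 and recovers at tick 3, while process 3 crashes at tick 2 and recovers at tick 4.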
This definition enables the description of failures that occur in a computation. But as mentioned before, if no failure occurs in the asynchronous system, the processes can solve protocols easily. Such algorithms only have to deal with message losses and can wait for all processes to finish their computation, since no process crashes. But this scenario is unrealistic. Therefore, the following two failure models are introduced, which extend the asynchronous system model with possible failures.

The first failure model allows processes to crash once, after which they no longer participate in the computation of the system.

Definition 2.4 (Crash-Stop Model): In the crash-stop model, processes can fail by crashing and do not recover. According to a failure pattern $F$, this means that if $p \in F(t_c)$, i.e., process p crashes at time $t_c$, then $\forall t \in T\colon t \geq t_c \Rightarrow p \in F(t)$. Thus, the processes can be classified into two groups:

Correct or good processes that do not crash. These processes are denoted by $good(F) = \{p \in \Pi \mid \forall t \in T\colon p \notin F(t)\}$.

Incorrect or bad processes that crash. They are denoted by $bad(F) = \{p \in \Pi \mid \exists t \in T\colon p \in F(t)\}$.

Figure 2.3 illustrates the possible failure behavior of the processes in the crash-stop model.

Figure 2.3: Classification of process failure behavior in the crash-stop model. (Timeline diagram: $p_1$ stays up and is correct; $p_2$ goes down and is incorrect.)

The second failure model allows processes to crash and later recover.

Definition 2.5 (Crash-Recovery Model): In the crash-recovery model, processes can fail by crashing, but can later recover. Once they have recovered, they can crash again, recover again, and so on. Because of this behavior, four groups of processes can be distinguished (according to a failure pattern $F$):

Always up processes that never crash.

Eventually up processes that crash and recover finitely often, but eventually remain up.

Eventually down processes that crash and recover finitely often, but eventually remain down.

Unstable processes that crash and recover infinitely often.

The always up and eventually up processes together are named correct or good processes and are denoted by:

$$good(F) = \{p \in \Pi \mid \forall t \in T\colon p \notin F(t)\} \cup \{p \in \Pi \mid \exists s \in T\ \forall t \in T\colon t > s \Rightarrow p \notin F(t)\}$$

Similarly, the eventually down and unstable processes are named incorrect or bad processes. They are denoted by:

$$bad(F) = \{p \in \Pi \mid \exists s \in T\ \forall t \in T\colon t > s \Rightarrow p \in F(t)\} \cup \{p \in \Pi \mid \forall t \in T\ \exists t' \in T\colon t' > t \land \bigl( (p \in F(t) \land p \notin F(t')) \lor (p \notin F(t) \land p \in F(t')) \bigr)\}$$

Figure 2.4 illustrates the possible failure behavior of the processes in the crash-recovery model.

Figure 2.4: Classification of process failure behavior in the crash-recovery model. (Timeline diagram: $p_1$ is always up and $p_2$ is eventually up, both correct; $p_3$ is eventually down and $p_4$ is unstable, both incorrect.)

The two definitions of failure models are similar; they even use the same terminology for sets of processes with the same faulty behavior. Thus, to avoid confusion, the following text always makes clear which model is used. On the other hand, this similarity helps to differentiate between the models directly. Note that the two failure models differ greatly in algorithmic complexity. In the crash-recovery model, the algorithms have to deal with recoveries and, even worse, with unstable processes, i.e., there may be no point in time after which no more crashes occur and the system stabilizes.
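The four groups of the crash-recovery model refer to infinite behavior, so they cannot be decided from a finite observation. The following Python sketch therefore assumes a hypothetical finite encoding that is not part of the thesis: each process is described by a finite `prefix` of up/down states followed by a non-empty `loop` that repeats forever. Under this assumption the classification reduces to inspecting the loop.

```python
from typing import List


def classify(prefix: List[bool], loop: List[bool]) -> str:
    """Classify a process whose up-state (True = up) follows `prefix`
    once and then repeats `loop` forever (assumed encoding)."""
    if not loop:
        raise ValueError("loop must be non-empty")
    if all(loop):
        # The process remains up forever after the prefix.
        return "always up" if all(prefix) else "eventually up"
    if not any(loop):
        # The process remains down forever after the prefix.
        return "eventually down"
    # The loop contains both states, so crashes and recoveries never end.
    return "unstable"


def is_good(prefix: List[bool], loop: List[bool]) -> bool:
    """good(F) consists of the always up and the eventually up processes."""
    return classify(prefix, loop) in {"always up", "eventually up"}
```

For example, `classify([True, False], [True])` yields `"eventually up"`, whereas a loop such as `[True, False]` describes an unstable process regardless of the prefix.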

2.3 Stable Storage

If an algorithm runs in the crash-recovery model and a process crashes and later recovers, this process loses all its variables, and therefore its whole state, due to the crash. However, some of these variables may be important for the whole run of the algorithm, so processes in the crash-recovery model can save variables on stable storage, which is preserved during a crash period.

Definition 2.6 (Stable Storage): The processes have access to stable storage in order to store values and to retrieve them after a recovery. For this purpose, the system provides special save and load functions, which are defined in the upcoming definition 2.8.

Note that access to stable storage is typically expensive. Thus, the use of storage operations should be minimized, and stored values should be as small as possible.

2.4 Algorithms

In this section, the frequently used term algorithm is defined precisely. This is important in order to understand the behavior of runs of these algorithms in asynchronous systems.

Definition 2.7 (Algorithm): An algorithm in a distributed system is a set of n automata, one for each process. These automata proceed in atomic steps, which change the state (a representation of the local variables) of each process. Every step corresponds to a tick $t \in T$ of the global clock. According to a failure pattern $F$, the automaton at a process p takes a step (a so-called normal step) at a time t only if $p \notin F(t)$. Otherwise, p is in a special state at time t, the crash state. In this state, all local variables that are not on stable storage are lost. At the time t at which a process p crashes ($p \notin F(t-1) \land p \in F(t)$), p takes a so-called crash step at time $t-1$. If p recovers at time t, it first takes a special recovery step and can then take a normal step again at time $t+1$. At one normal step a process can perform two of the following six actions:

Send/receive a message to/from the network.

Set/get an external output/input.

Save/load a set of variables on/from its stable storage.

These actions are denoted by the set $A = \{send, receive, set, get, save, load\}$. Furthermore, in addition to these actions, each automaton can perform any finite number of local operations at every step, which do not affect other processes.

Crashes are assumed to happen fairly, i.e., a process can always complete a step. A crash happens either before or after a step, but not in between. Of course, this is only valid if a process does not compute for too long at one step, i.e., although the number of local operations is arbitrary, a process can only perform a reasonable number of them per step in order to avoid crashing in the middle of a step.

As mentioned before, an algorithm implements a protocol if it fulfills all its properties. The following definition explicates how an algorithm can achieve this, i.e., how it can access the input of a calling application and output the results of its computation.

Definition 2.8 (Interface): An interface of a protocol provides events to processes. The set of all events of a protocol is denoted by $E$. Furthermore, two types of events are distinguished:

Request event: The algorithm has to handle a request of its caller.

Indication event: The algorithm informs its caller about the event.

If an algorithm implements a protocol, it has to implement all request events and trigger indication events to its caller. Additionally, every protocol provides the following four events:

Request init: Used to initialize variables at the beginning of the computation.

Request recover: Used to handle recoveries of crashed processes.

Indication save variables $v_1, \ldots, v_i$ at register LOCATION: Used to save variables $v_1, \ldots, v_i$ on stable storage at register LOCATION.

Indication load variables $v_1, \ldots, v_j$ at register LOCATION: Used to load data into variables $v_1, \ldots, v_j$ from stable storage at register LOCATION.

Note that if an algorithm does not implement all request events and one of them occurs, nothing bad happens, but the algorithm misses the occurring event. If an algorithm uses neither save nor load events, it does not use stable storage at all.
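The interplay of the init, recover, save, and load events can be sketched in Python as follows. This is only an illustration, not an algorithm of the thesis: the classes `StableStorage` and `CounterProcess`, the register name `"COUNTER"`, and the method names are hypothetical, and a crash is simulated by discarding all attributes except the handle to stable storage.

```python
class StableStorage:
    """Toy stable storage: registers survive the crash of a process."""

    def __init__(self):
        self._registers = {}

    def save(self, location, **variables):
        # Overwrite the register with a copy of the given variables.
        self._registers[location] = dict(variables)

    def load(self, location):
        # Return a copy so that callers cannot mutate the stored values.
        return dict(self._registers[location])


class CounterProcess:
    """Counts its normal steps; only `count` is kept on stable storage."""

    def __init__(self, storage):
        self.storage = storage
        self.on_init()

    def on_init(self):
        # Request init: initialize the variables and persist the important one.
        self.count = 0
        self.scratch = []  # volatile, deliberately never saved
        self.storage.save("COUNTER", count=self.count)

    def step(self):
        # Normal step: local operations followed by a save action.
        self.count += 1
        self.scratch.append(self.count)
        self.storage.save("COUNTER", count=self.count)

    def crash(self):
        # Crash state: every local variable is lost.
        storage = self.storage
        self.__dict__.clear()
        self.storage = storage

    def on_recover(self):
        # Request recover: reload the saved variables, reset the volatile ones.
        self.count = self.storage.load("COUNTER")["count"]
        self.scratch = []
```

After three calls to `step()`, a `crash()`, and `on_recover()`, the process has `count == 3` again, while `scratch` is empty because it was never saved. Saving at every step is the simplest choice here; since storage access is expensive, a real algorithm would save less often.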

2.5 Communication

As already mentioned in definition 2.1 of asynchronous systems, the processes are able to communicate through a message exchange network, whose connections between the processes are called links. The following definition formalizes these links.

Definition 2.9 (Links): The processes in an asynchronous system are connected by links and communicate via message exchange, which is unreliable, i.e., messages can get lost and transmission delays are unknown. The set of all messages that are sent through the network is denoted by $M$. Every message $m \in M$ is labeled with a unique identifier, which is provided by the system automatically. The sending of a message by a process to another one at a time is denoted by the function $S\colon \Pi \times \Pi \times T \to M$ with:

$$S(p, q, t) = \begin{cases} m, & \text{if process } p \text{ sends message } m \text{ to process } q \text{ at time } t \\ \bot, & \text{else} \end{cases}$$

The receiving of a message by a process from another one at a time is denoted by the function $R\colon \Pi \times \Pi \times T \to M$ with:

$$R(q, p, t) = \begin{cases} m, & \text{if process } q \text{ receives message } m \text{ from } p \text{ at time } t \\ \bot, & \text{else} \end{cases}$$

Note that this definition excludes the receiving of two different messages at the same time, and that no process has access to the set $M$. The basic links provide no delivery guarantees; fairness of message loss is therefore added in the next subsection.

Fair Loss Links

The unreliability of the links makes it impossible to solve protocols. With such links, all messages could get lost, and no two-sided communication would be possible. In order to solve this problem, some minimal delivery guarantees of the links are assumed. The message losses should be fair, i.e., not all messages should get lost. The following definition explicates such fair loss links, which were introduced by Lynch [1996] as the strong loss limitation property of communication channels. Henceforth, these links are assumed to be present in every asynchronous system.

Definition 2.10 (Fair Loss Links): The links of an asynchronous system are fair loss links if the following properties are satisfied:

Property 2.1 (Fair Loss): If a process p infinitely often sends a message m to a good process q, then q receives m an infinite number of times.
Formal: $\forall m \in M\ \forall p, q \in \Pi\ \forall t \in T\ \exists t_s, t_r \in T\colon q \in good(F) \land S(p, q, t) = m \land S(p, q, t_s) = m \land t_s > t \Rightarrow R(q, p, t_r) = m \land t_r > t$

Property 2.2 (Finite Duplication): If a process p only finitely often sends a message m to a process q, then q receives m only a finite number of times.
Formal: $\forall m \in M\ \forall p, q \in \Pi\ \exists s \in T\ \forall t, t' \in T\colon t' > t > s \land S(p, q, t) \neq m \Rightarrow R(q, p, t') \neq m$

Property 2.3 (No Creation): If a process q receives a message m from a process p, then m was previously sent to q by p.
Formal: $\forall m \in M\ \forall p, q \in \Pi\ \forall t_r \in T\ \exists t_s \in T\colon R(q, p, t_r) = m \Rightarrow S(p, q, t_s) = m \land t_s < t_r$

Algorithms can use fair loss links via the following interface.

Definition 2.11 (Interface of Fair Loss Links): If an algorithm wants to use fair loss links, it has to implement the following indication event and can use the following request event:

Request FL-send message m to process q: Used to send message m to process q.

Indication FL-receive message m from process q: Used to receive messages.

Figure 2.5 shows how fair loss links can be used in algorithms in order to guarantee message delivery. The sender has to transmit the message infinitely often.

Figure 2.5: Usage of the fair loss property: process $p_1$ sends message $m$ to process $p_2$ infinitely often.
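The retransmission idea of Figure 2.5 can be sketched with a deterministic toy link in Python. The class `FairLossLink`, its parameter `deliver_every`, and the helper `send_until_received` are assumptions made for this illustration: the link drops every transmission of a message except each k-th one, so a message that is sent infinitely often is also received infinitely often. The helper inspects the receiver's inbox only to end the simulation; a real sender cannot do this and needs acknowledgements instead.

```python
from collections import Counter


class FairLossLink:
    """Toy fair loss link towards one receiver q (deterministic losses)."""

    def __init__(self, deliver_every=3):
        self.deliver_every = deliver_every
        self.transmissions = Counter()  # transmissions per (sender, message)
        self.inbox = []                 # messages FL-received by q

    def fl_send(self, sender, message):
        # Request FL-send: only every k-th transmission of a message survives.
        key = (sender, message)
        self.transmissions[key] += 1
        if self.transmissions[key] % self.deliver_every == 0:
            self.inbox.append(key)      # indication FL-receive at q


def send_until_received(link, sender, message, max_attempts=100):
    """Retransmit until the message shows up; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        link.fl_send(sender, message)
        if (sender, message) in link.inbox:
            return attempt
    return None  # gave up; the simulation bound was too small
```

Because the inbox only ever contains pairs that were passed to `fl_send`, the toy link also respects the no creation property.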

Stubborn Links

Definition 2.10 provides delivery guarantees to the processes. Thus, algorithms have some guarantee that communication is possible. However, if a process sends a message to another one and wants to be sure that the message arrives, it has to send the message an infinite number of times. Since this certainty is needed in many algorithms, Guerraoui et al. [1996] introduced stubborn links. These links build upon fair loss links and ensure reception by retransmitting infinitely often. The advantage of stubborn links is that they hide the retransmission process. Thus, an algorithm does not need to care about the fair loss behavior of the communication network. Figure 2.6 illustrates the technique. The sender issues one stubborn send command, and the link abstraction takes care of the retransmission to avoid message losses.

Figure 2.6: Illustration of stubborn links: $p_1$ stubbornly sends $m$ to $p_2$.

Originally, Guerraoui et al. [1996] defined the retransmission for the last sent message of each process only. Later, Oliveira et al. [1997] introduced k-stubborn links, which deal with the last k messages. Fortunately, 1-stubborn links are sufficient for the consensus problem, which is defined later, as shown by the correctness proof of a consensus algorithm in the paper of Guerraoui et al. [1996]. Thus, the following definition, which is used in this thesis, is similar to the original one.

Definition 2.12 (Stubborn Links): The links of an asynchronous system are stubborn links if the following properties are satisfied:

Property 2.3 (No Creation), confer page 19.

Property 2.4 (Stubbornness): If a process p sends a message m to a good process q, does not crash, and indefinitely delays sending any further message to q, then q eventually receives m.
Formal: $\forall m \in M\ \forall p, q \in \Pi\ \exists t_s, t_r \in T\ \forall t \in T\colon q \in good(F) \land S(p, q, t_s) = m \land t_r > t > t_s \land S(p, q, t) = \bot \land p \notin F(t) \Rightarrow R(q, p, t_r) = m$

Remark: Note that the requirement of the indefinite delay in property 2.4 does not mean that a process p is never allowed to send a new message $m'$ after sending a message $m$ in order to ensure the reception of $m$. In fact, no fixed delay exists that process p has to wait before it can send $m'$ without compromising the reception of $m$. This uncertainty is expressed in property 2.4 by the part $t_r > t > t_s \land S(p, q, t) = \bot$ of the formal term, which means that p sends no message ($S(p, q, t) = \bot$) until q receives $m$ ($t_r > t > t_s$), where t ranges over all times between the sending time $t_s$ and the receiving time $t_r$.

Figure 2.7 illustrates the meaning of the stubbornness property 2.4. The indefiniteness of the delay that a process has to wait before sending the next message can cause message loss, as in the case of $m_2$, where process $p_1$ did not wait long enough before sending $m_3$. Processes can overcome this indefiniteness if they acknowledge the reception of messages. This technique is also used in the algorithms of this thesis.

Figure 2.7: Illustration of stubbornness: $p_1$ stubbornly sends $m_1$, $m_2$, and $m_3$ to $p_2$.

Algorithms can use stubborn links via the following interface.

Definition 2.13 (Interface of Stubborn Links): If an algorithm wants to use stubborn links, it has to implement the following indication event and can use the following request events:

Request send message m to process q: Used to send message m to process q stubbornly.

Indication receive message m from process q: Used to receive messages.

Request single-send message m to process q: Used to send message m to process q once.

Request stop-retransmit: Stops the stubborn sending of any message.

As mentioned before, stubborn links can be built upon fair loss links. An algorithm that uses fair loss links and implements stubborn links is shown in appendix A, which also explicates the differences to other definitions of stubborn links. The reason for the last request event, which is called finalsend in the original definition by Guerraoui et al. [1996], is presented in the following subsection.
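The situation of Figure 2.7 can be replayed with a small Python sketch of a 1-stubborn link on top of a toy fair loss link. This is not the algorithm of appendix A; the classes `FairLossLink` and `StubbornLink`, the `tick` method that models one retransmission round, and the parameter `deliver_every` are assumptions made for this illustration.

```python
from collections import Counter


class FairLossLink:
    """Toy fair loss link: delivers every k-th transmission of a message."""

    def __init__(self, deliver_every=3):
        self.deliver_every = deliver_every
        self.transmissions = Counter()
        self.inbox = []

    def fl_send(self, sender, message):
        key = (sender, message)
        self.transmissions[key] += 1
        if self.transmissions[key] % self.deliver_every == 0:
            self.inbox.append(key)


class StubbornLink:
    """1-stubborn link: only the last sent message is retransmitted."""

    def __init__(self, link, sender):
        self.link = link
        self.sender = sender
        self.pending = None  # last stubbornly sent message, if any

    def send(self, message):
        # Request send: replaces the previously retransmitted message.
        self.pending = message
        self.link.fl_send(self.sender, message)

    def single_send(self, message):
        # Request single-send: one transmission, no retransmission.
        self.link.fl_send(self.sender, message)

    def stop_retransmit(self):
        # Request stop-retransmit: needed for quiescence.
        self.pending = None

    def tick(self):
        # One retransmission round of the link abstraction.
        if self.pending is not None:
            self.link.fl_send(self.sender, self.pending)


def figure_2_7_scenario():
    """p1 sends m1, m2, m3 but does not wait long enough after m2."""
    link = FairLossLink(deliver_every=3)
    p1 = StubbornLink(link, "p1")
    p1.send("m1"); p1.tick(); p1.tick()  # third transmission of m1 arrives
    p1.send("m2"); p1.tick()             # only two transmissions of m2
    p1.send("m3"); p1.tick(); p1.tick()  # m3 replaces m2 and arrives
    p1.stop_retransmit(); p1.tick()      # nothing is sent anymore
    return [message for _, message in link.inbox]
```

In this run the receiver obtains `m1` and `m3` but never `m2`, because `m2` was replaced before the link had delivered it. An acknowledgement from the receiver before the next `send` would avoid the loss.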

Quiescence

Quiescence in the context of distributed systems means that a point in time exists after which all communication is finished. Aguilera et al. [1997] first studied quiescence for failure detectors, but the idea is suited to all communication problems, because no more communication is desired after a problem is solved.

Definition 2.14 (Quiescence): An algorithm is quiescent if all processes eventually stop sending messages.

If an algorithm uses stubborn links, it should be able to stop the retransmission of the last message. The stop-retransmit event of definition 2.13 can be used for this task.

Note that quiescence can never be satisfied if an algorithm that solves an agreement problem runs in the crash-recovery model and unstable processes are present. No point in time exists after which no more "I recovered" messages occur, because freshly recovered processes typically also have to agree on some result, and the algorithm cannot know at the time of a recovery that a process is unstable. Nevertheless, all algorithms of this thesis satisfy quiescence in the absence of unstable processes. If unstable processes are present, the communication is reduced to a minimum to get close to quiescence.

2.6 Failure Detection

An algorithm never terminates if it waits for the delivery of messages from crashed processes. In order to prevent such situations, failure detectors are introduced to ensure the liveness of protocols. The following failure detector definition is valid for both failure models.

Definition 2.15 (Failure Detector): A failure detector $D$ is a function $D\colon \Pi \times T \to 2^{\Pi}$. Each process $p \in \Pi$ is equipped with the same failure detector, but the outputs of these local failure detectors can differ. The failure detector at process p suspects process q of being down at time t if $q \in D(p, t)$.

Algorithms can use failure detectors via the following interface.

Definition 2.16 (Interface of Failure Detectors): If an algorithm wants to use a failure detector, it has to implement the following indication event:

Indication suspect set of processes Q: The failure detector suspects all processes $q \in Q$.
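The definition above only fixes the output of a failure detector, not how it is computed. As an illustration, the following Python sketch produces the suspect set of one process p from heartbeats and a timeout. The heartbeat mechanism, the class `TimeoutFailureDetector`, and its parameter `timeout` are assumptions for this example and are not taken from the thesis.

```python
class TimeoutFailureDetector:
    """Local failure detector of one process p (heartbeat/timeout sketch)."""

    def __init__(self, monitored, timeout):
        self.timeout = timeout
        # Time of the last heartbeat received from each monitored process.
        self.last_heard = {q: 0 for q in monitored}

    def heartbeat(self, q, t):
        # p received a heartbeat message from q at time t.
        self.last_heard[q] = t

    def suspects(self, t):
        # D(p, t): processes whose last heartbeat is older than the timeout.
        return {q for q, last in self.last_heard.items()
                if t - last > self.timeout}
```

For example, if `q1` and `q2` both send a heartbeat at time 1 and only `q1` sends another one at time 4, a detector with `timeout=2` suspects exactly `q2` at time 5. A later heartbeat from `q2` removes it from the suspect set again, which shows that suspicions of such a detector can be wrong and are revised over time.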

Easy Consensus Algorithms for the Crash-Recovery Model

Easy Consensus Algorithms for the Crash-Recovery Model Reihe Informatik. TR-2008-002 Easy Consensus Algorithms for the Crash-Recovery Model Felix C. Freiling, Christian Lambertz, and Mila Majster-Cederbaum Department of Computer Science, University of Mannheim,

More information

Failure detectors Introduction CHAPTER

Failure detectors Introduction CHAPTER CHAPTER 15 Failure detectors 15.1 Introduction This chapter deals with the design of fault-tolerant distributed systems. It is widely known that the design and verification of fault-tolerent distributed

More information

Failure Detection and Consensus in the Crash-Recovery Model

Failure Detection and Consensus in the Crash-Recovery Model Failure Detection and Consensus in the Crash-Recovery Model Marcos Kawazoe Aguilera Wei Chen Sam Toueg Department of Computer Science Upson Hall, Cornell University Ithaca, NY 14853-7501, USA. aguilera,weichen,sam@cs.cornell.edu

More information

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1. Coordination Failures and Consensus If the solution to availability and scalability is to decentralize and replicate functions and data, how do we coordinate the nodes? data consistency update propagation

More information

Failure detection and consensus in the crash-recovery model

Failure detection and consensus in the crash-recovery model Distrib. Comput. (2000) 13: 99 125 c Springer-Verlag 2000 Failure detection and consensus in the crash-recovery model Marcos Kawazoe Aguilera 1, Wei Chen 2, Sam Toueg 1 1 Department of Computer Science,

More information

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications: AGREEMENT PROBLEMS (1) AGREEMENT PROBLEMS Agreement problems arise in many practical applications: agreement on whether to commit or abort the results of a distributed atomic action (e.g. database transaction)

More information

Eventually consistent failure detectors

Eventually consistent failure detectors J. Parallel Distrib. Comput. 65 (2005) 361 373 www.elsevier.com/locate/jpdc Eventually consistent failure detectors Mikel Larrea a,, Antonio Fernández b, Sergio Arévalo b a Departamento de Arquitectura

More information

Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links

Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links Jialin Zhang Tsinghua University zhanggl02@mails.tsinghua.edu.cn Wei Chen Microsoft Research Asia weic@microsoft.com

More information

Finally the Weakest Failure Detector for Non-Blocking Atomic Commit

Finally the Weakest Failure Detector for Non-Blocking Atomic Commit Finally the Weakest Failure Detector for Non-Blocking Atomic Commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory EPFL Abstract Recent papers [7, 9] define the weakest failure detector

More information

Failure Detectors. Seif Haridi. S. Haridi, KTHx ID2203.1x

Failure Detectors. Seif Haridi. S. Haridi, KTHx ID2203.1x Failure Detectors Seif Haridi haridi@kth.se 1 Modeling Timing Assumptions Tedious to model eventual synchrony (partial synchrony) Timing assumptions mostly needed to detect failures Heartbeats, timeouts,

More information

Asynchronous Models For Consensus

Asynchronous Models For Consensus Distributed Systems 600.437 Asynchronous Models for Consensus Department of Computer Science The Johns Hopkins University 1 Asynchronous Models For Consensus Lecture 5 Further reading: Distributed Algorithms

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems. Required reading for this topic } Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson for "Impossibility of Distributed with One Faulty Process,

More information

Early consensus in an asynchronous system with a weak failure detector*

Early consensus in an asynchronous system with a weak failure detector* Distrib. Comput. (1997) 10: 149 157 Early consensus in an asynchronous system with a weak failure detector* André Schiper Ecole Polytechnique Fe dérale, De partement d Informatique, CH-1015 Lausanne, Switzerland

More information

Unreliable Failure Detectors for Reliable Distributed Systems

Unreliable Failure Detectors for Reliable Distributed Systems Unreliable Failure Detectors for Reliable Distributed Systems A different approach Augment the asynchronous model with an unreliable failure detector for crash failures Define failure detectors in terms

More information

Shared Memory vs Message Passing

Shared Memory vs Message Passing Shared Memory vs Message Passing Carole Delporte-Gallet Hugues Fauconnier Rachid Guerraoui Revised: 15 February 2004 Abstract This paper determines the computational strength of the shared memory abstraction

More information

The Weakest Failure Detector to Solve Mutual Exclusion

The Weakest Failure Detector to Solve Mutual Exclusion The Weakest Failure Detector to Solve Mutual Exclusion Vibhor Bhatt Nicholas Christman Prasad Jayanti Dartmouth College, Hanover, NH Dartmouth Computer Science Technical Report TR2008-618 April 17, 2008

More information

Genuine atomic multicast in asynchronous distributed systems

Genuine atomic multicast in asynchronous distributed systems Theoretical Computer Science 254 (2001) 297 316 www.elsevier.com/locate/tcs Genuine atomic multicast in asynchronous distributed systems Rachid Guerraoui, Andre Schiper Departement d Informatique, Ecole

More information

A Realistic Look At Failure Detectors

A Realistic Look At Failure Detectors A Realistic Look At Failure Detectors C. Delporte-Gallet, H. Fauconnier, R. Guerraoui Laboratoire d Informatique Algorithmique: Fondements et Applications, Université Paris VII - Denis Diderot Distributed

More information

CS505: Distributed Systems

CS505: Distributed Systems Department of Computer Science CS505: Distributed Systems Lecture 10: Consensus Outline Consensus impossibility result Consensus with S Consensus with Ω Consensus Most famous problem in distributed computing

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems Ordering events. Lamport and vector clocks. Global states. Detecting failures. Required reading for this topic } Leslie Lamport,"Time, Clocks, and the Ordering

More information

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program

More information

Section 6 Fault-Tolerant Consensus

Section 6 Fault-Tolerant Consensus Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing.

More information

Termination Detection in an Asynchronous Distributed System with Crash-Recovery Failures

Termination Detection in an Asynchronous Distributed System with Crash-Recovery Failures Termination Detection in an Asynchronous Distributed System with Crash-Recovery Failures Technical Report Department for Mathematics and Computer Science University of Mannheim TR-2006-008 Felix C. Freiling

More information

THE WEAKEST FAILURE DETECTOR FOR SOLVING WAIT-FREE, EVENTUALLY BOUNDED-FAIR DINING PHILOSOPHERS. A Dissertation YANTAO SONG

THE WEAKEST FAILURE DETECTOR FOR SOLVING WAIT-FREE, EVENTUALLY BOUNDED-FAIR DINING PHILOSOPHERS. A Dissertation YANTAO SONG THE WEAKEST FAILURE DETECTOR FOR SOLVING WAIT-FREE, EVENTUALLY BOUNDED-FAIR DINING PHILOSOPHERS A Dissertation by YANTAO SONG Submitted to the Office of Graduate Studies of Texas A&M University in partial

More information

Distributed Consensus

Distributed Consensus Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort in distributed transactions Reaching agreement

More information

Eventual Leader Election with Weak Assumptions on Initial Knowledge, Communication Reliability, and Synchrony

Eventual Leader Election with Weak Assumptions on Initial Knowledge, Communication Reliability, and Synchrony Eventual Leader Election with Weak Assumptions on Initial Knowledge, Communication Reliability, and Synchrony Antonio FERNÁNDEZ Ernesto JIMÉNEZ Michel RAYNAL LADyR, GSyC, Universidad Rey Juan Carlos, 28933

More information

Approximation of δ-timeliness

Approximation of δ-timeliness Approximation of δ-timeliness Carole Delporte-Gallet 1, Stéphane Devismes 2, and Hugues Fauconnier 1 1 Université Paris Diderot, LIAFA {Carole.Delporte,Hugues.Fauconnier}@liafa.jussieu.fr 2 Université

More information

Relations between Asynchronous Systems with Crash-Stop and Asynchronous Systems with Omission Failures

Relations between Asynchronous Systems with Crash-Stop and Asynchronous Systems with Omission Failures Rheinisch-Westfälische Technische Hochschule Aachen Department of Computer Science Relations between Asynchronous Systems with Crash-Stop and Asynchronous Systems with Omission Failures (Zusammenhänge

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication Stavros Tripakis Abstract We introduce problems of decentralized control with communication, where we explicitly

More information

On the weakest failure detector ever

On the weakest failure detector ever On the weakest failure detector ever The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Guerraoui, Rachid

More information

Consensus. Consensus problems

Consensus. Consensus problems Consensus problems 8 all correct computers controlling a spaceship should decide to proceed with landing, or all of them should decide to abort (after each has proposed one action or the other) 8 in an

More information

Impossibility of Distributed Consensus with One Faulty Process

Impossibility of Distributed Consensus with One Faulty Process Impossibility of Distributed Consensus with One Faulty Process Journal of the ACM 32(2):374-382, April 1985. MJ Fischer, NA Lynch, MS Peterson. Won the 2002 Dijkstra Award (for influential paper in distributed

More information

Finite-Delay Strategies In Infinite Games

Finite-Delay Strategies In Infinite Games Finite-Delay Strategies In Infinite Games von Wenyun Quan Matrikelnummer: 25389 Diplomarbeit im Studiengang Informatik Betreuer: Prof. Dr. Dr.h.c. Wolfgang Thomas Lehrstuhl für Informatik 7 Logik und Theorie

More information

Consensus when failstop doesn't hold

Consensus when failstop doesn't hold Consensus when failstop doesn't hold FLP shows that can't solve consensus in an asynchronous system with no other facility. It can be solved with a perfect failure detector. If p suspects q then q has

More information

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures Xianbing Wang 1, Yong-Meng Teo 1,2, and Jiannong Cao 3 1 Singapore-MIT Alliance, 2 Department of Computer Science,

More information

Clocks in Asynchronous Systems

Clocks in Asynchronous Systems Clocks in Asynchronous Systems The Internet Network Time Protocol (NTP) 8 Goals provide the ability to externally synchronize clients across internet to UTC provide reliable service tolerating lengthy

More information

Automatic Synthesis of Distributed Protocols

Automatic Synthesis of Distributed Protocols Automatic Synthesis of Distributed Protocols Rajeev Alur Stavros Tripakis 1 Introduction Protocols for coordination among concurrent processes are an essential component of modern multiprocessor and distributed

More information

Fault-Tolerant Consensus

Fault-Tolerant Consensus Fault-Tolerant Consensus CS556 - Panagiota Fatourou 1 Assumptions Consensus Denote by f the maximum number of processes that may fail. We call the system f-resilient Description of the Problem Each process

More information

Model Checking of Fault-Tolerant Distributed Algorithms

Model Checking of Fault-Tolerant Distributed Algorithms Model Checking of Fault-Tolerant Distributed Algorithms Part I: Fault-Tolerant Distributed Algorithms Annu Gmeiner Igor Konnov Ulrich Schmid Helmut Veith Josef Widder LOVE 2016 @ TU Wien Josef Widder (TU

More information

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems Xianbing Wang, Yong-Meng Teo, and Jiannong Cao Singapore-MIT Alliance E4-04-10, 4 Engineering Drive 3, Singapore 117576 Abstract

More information

Asynchronous Leasing

Asynchronous Leasing Asynchronous Leasing Romain Boichat Partha Dutta Rachid Guerraoui Distributed Programming Laboratory Swiss Federal Institute of Technology in Lausanne Abstract Leasing is a very effective way to improve

More information

The State Explosion Problem

The State Explosion Problem The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis

More information

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Viveck R. Cadambe EE Department, Pennsylvania State University, University Park, PA, USA viveck@engr.psu.edu Nancy Lynch

More information
