Booting Clock Synchronization in Partially Synchronous Systems with Hybrid Process and Link Failures


Research Report 56/2007, Technische Universität Wien, Institut für Technische Informatik (revised version of Technical Report 183/1-126, Department of Automation, Technische Universität Wien, Jan. 2003). Originally published in Distributed Computing, © Springer Verlag.

Josef Widder · Ulrich Schmid

Booting Clock Synchronization in Partially Synchronous Systems with Hybrid Process and Link Failures

Abstract  This paper provides a description and analysis of a new clock synchronization algorithm for synchronous and partially synchronous systems with unknown upper and lower bounds on delays. It is purely message-driven, timer-free, and relies on a hybrid failure model incorporating both process and link failures, in both the time and the value domain. Unlike existing solutions, our algorithm works during both system start-up and normal operation: Whereas bounded precision (the mutual deviation of any two clocks) can always be guaranteed, accuracy (clocks being within a linear envelope of real-time) and hence progress is only ensured when sufficiently many correct processes are eventually up and running. By means of a detailed analysis, we provide formulas for resilience, precision and envelope bounds.

Keywords  Fault-Tolerant Distributed Algorithms · Initial Clock Synchronization · System Start-up · Hybrid Failure Models · Link Failures · Partially Synchronous Systems

1 Introduction

Clock synchronization is an important service in distributed systems [29]. It assumes that every process p owns a discrete clock C_p(t), which is periodically adjusted by a fault-tolerant clock synchronization algorithm. We will focus on deterministic clock synchronization here, which must guarantee the following properties:

(P) Precision: The simultaneous reading of any two correct clocks may deviate by at most some D_max.

This work has been supported by the Austrian START programme Y41-MAT, the BM:vit FIT-IT Embedded Systems project DCBA (proj.no ), and the FWF project Theta (proj.no. P17757-N04).
Josef Widder · Ulrich Schmid: Technische Universität Wien, Embedded Computing Systems Group (E182/2), Treitlstraße 3, 1040 Vienna, Austria, EU, {widder,s}@ecs.tuwien.ac.at
Josef Widder: Laboratoire d'Informatique LIX, École Polytechnique, Palaiseau Cedex, France, EU

(A) Accuracy: Any correct clock remains within a linear envelope of real-time.

Many different clock synchronization algorithms have been proposed in the literature [49,35,38,1]. Most of these assume systems with known bounds on the durations of computing steps and communication delays, and cannot handle system start-up. In real systems, however, each process may start independently at some unpredictable time. Thus, processes may not have completed booting when some earlier process starts sending messages. During start-up, even messages from correct processes may hence be lost, and failure assumptions like the one that less than a third of the processes are Byzantine faulty (which is necessary [13] for achieving (P) and (A) in the presence of Byzantine failures) do not hold. Moreover, many real networks (like the Internet) cannot be modeled properly as synchronous systems [53,11,22]. To implement clock synchronization¹ in such systems, a timer-free start-up mechanism is required.
In [55], we provided a solution to this problem that, unlike naïve startup algorithms, avoids increasing the required number of processes and/or adding a priori timing assumptions to the system model: By modifying the clock synchronization algorithm by Srikanth and Toueg [51], which is based on the well-known consistent broadcasting primitive, we derived a simple and efficient clock synchronization algorithm that requires just n ≥ 3f + 1 processes for coping with up to f Byzantine faulty processes and works both during normal operation and system startup: Whereas some precision D_max is guaranteed during the whole system lifetime, progress of the clocks, i.e., accuracy, is only guaranteed when sufficiently many correct processes are eventually up and running. In this paper, we will present and analyze a variant of the algorithm of [55] under a powerful hybrid failure model. Since less severe failures can be handled with fewer processes than more severe ones, a hybrid failure model leads to

¹ Although clock synchronization is traditionally studied in synchronous systems with hardware clocks, it is a useful service in partially synchronous systems with software clocks (counters) as well; see Sect. 5 for details.

a decreased system size in the presence of realistic failures [54,3]: Given the maximum number of failures of certain types, the required number of processes is smaller than that of a solution where all failures, even benign ones, are treated as Byzantine. As demonstrated by Powell [34], fewer processors in the system that could fail may actually increase the overall system's dependability. Apart from the refined treatment of process failures, our perception-based analysis also shows that the algorithm tolerates a large number of communication failures, i.e., moving link failures. It can therefore be applied even in typical wireless settings, where link failure rates up to 10^-2 are common [47,41].

Designed for synchronous or partially synchronous systems with unknown lower and upper bounds on delays, our round-based algorithm is completely message-driven, does not employ timers, and can hence be run on systems with different timing characteristics without recompilation. In fact, it is only the achieved precision and the envelope bounds, but not the algorithm itself, that depend on the underlying system's timing behavior: If the lower and upper bounds τ−, τ+ on delays used for computing D_max hold during an execution, the algorithm's actual precision in this execution will not exceed D_max. Note that, in contrast to classic clock synchronization algorithms, D_max does not depend on the delay uncertainty ε = τ+ − τ− but rather on the delay ratio Θ = τ+/τ−. Moreover, it is elastic with respect to timing assumptions: If the assumed bounds are (temporarily) violated, the actual precision might (but need not necessarily) exceed D_max; it eventually returns to within D_max, however, when the assumed bounds hold again.

The remainder of this paper is organized as follows: After a survey of existing work in Sect. 2, we provide a glossary of all our notation in Sect. 3 for further referencing. In Sect. 4, we introduce our system and failure model.
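To make the distinction between the delay uncertainty ε and the delay ratio Θ concrete, consider a small sketch (ours, not from the paper; the numeric bounds are made up and the D_max formula itself is not reproduced here): uniformly slowing a system down leaves Θ, and hence a Θ-based precision bound, unchanged, whereas ε grows proportionally.

```python
# Illustration (our own, made-up bounds): the delay uncertainty
# eps = tau_plus - tau_minus grows when all delays are scaled up uniformly,
# while the delay ratio Theta = tau_plus / tau_minus stays invariant.
def delay_uncertainty(tau_minus: float, tau_plus: float) -> float:
    assert 0 < tau_minus <= tau_plus
    return tau_plus - tau_minus

def delay_ratio(tau_minus: float, tau_plus: float) -> float:
    assert 0 < tau_minus <= tau_plus
    return tau_plus / tau_minus

# A slow system with the same relative timing behavior as a fast one:
fast = (1.0, 3.0)    # tau-, tau+ (arbitrary time units, made-up values)
slow = (10.0, 30.0)  # everything 10x slower

assert delay_ratio(*fast) == delay_ratio(*slow) == 3.0            # Theta unchanged
assert delay_uncertainty(*slow) == 10 * delay_uncertainty(*fast)  # eps grows
```

This is why a Θ-based bound can be stated without knowing the absolute delays: it only constrains their ratio.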
It is a refinement of the generic perception-based failure model of [42], extended to the system startup phase. In Sect. 5, we present our clock synchronization algorithm and its operation principles. Sect. 6 and Sect. 7 contain the analysis of our algorithm in early and degraded mode, respectively, when not sufficiently many processes have completed booting. In Sect. 8, we investigate our algorithm's performance in normal mode, when sufficiently many processes are eventually up and running. A discussion of certain impacts of our synchrony and failure assumptions in Sect. 9 and a summary of our accomplishments in Sect. 10 round off the paper.

2 Related Work

Although clock synchronization in synchronous systems [1,13,49,35,48,38,32], as well as in partially synchronous systems [15,33], is a very well-researched subject, there are only a few papers [51,31,26,32,52] that deal with initial synchronization. Rather than considering a full system startup, however, most of those papers are devoted to integrating a new process into an already running system. The only exceptions known to us are [52,10], which deal with solutions to the startup problem in very specific TDMA system architectures, and [26], which considers an approach for initialization in the MAFT architecture under stronger system assumptions. None of these solutions is message-driven and time(r)-free and works in partially synchronous systems. This is also true for the self-stabilizing Byzantine clock synchronization algorithms of [14,12]. Although self-stabilization obviously solves the booting problem as well, it is overkill here, in that booting nodes do not start from an arbitrary internal state. Consequently, in sharp contrast to our solution, none of the algorithms of [14,12] can provide constant booting (i.e., stabilization) time.

Clock synchronization in the presence of link failures has been studied in [36] and in some of our previous work [39,45,44,40,27]. None of these papers considered initial startup and, with the exception of [40,27], all of those papers consider synchronous systems only. In [27], we introduced a time(r)-free implementation of the perfect failure detector in partially synchronous systems based on consistent broadcasting. In [23], it was shown how such an algorithm can be implemented in an architecture based on broadcast bus networks. In both [27,23], however, we assumed that all processes are up right from the start. This assumption is relaxed in the implementation of the eventually perfect failure detector in [57], which employs the clock synchronization algorithm of [55] to cope with the booting problem.

3 Notational Conventions

Throughout the paper, we use the following notation: Processors are denoted by lowercase letters like p and q; round numbers by lowercase letters like k and l. Process names are upper case letters but usually represented by their processors. Processor subscripts denote the process where a quantity like V_q^{p,k} and t_q^{p,k} is locally available; processor superscripts denote the remote source of a quantity. Calligraphic variables like V_p^k denote sets or vectors; bold variables like τ denote intervals. Real-time values and variables are denoted by lower case letters like t, t′; logical time values are just round numbers k, k′. An overview of the most important terms and variables, including a reference to their definition, is provided in Table 1.

4 System Model

In this section, we will define our system model. Starting from the basic execution and timing model in Sect. 4.1 and Sect. 4.2, we introduce a hybrid failure model for round-based algorithms in Sect. 4.3, which distinguishes different classes of process and communication failures, both in the time and in the value domain. In Sect. 4.4, we define a convenient abstraction, originally proposed in [40,47,42], which considerably simplifies the analysis of round-based algorithms.
It is based on how a process locally perceives

the behavior of the other processes in a round, i.e., the presence and absence of this round's messages. In Sect. 4.5, we will extend this perception-based model to incorporate the system booting phase.

Table 1  Glossary of our Notation

∅                        empty perception value
be_p^k = V_p^k(t_p^k)    process p's round k broadcast event
k = C_p(t)               p's logical clock value (i.e., round number) at real-time t (Sect. 5)
C_max(t)                 maximum clock value in the system at real-time t
D_max                    maximum precision during whole operation (Theorem 5)
D_MCB                    maximum precision from degraded mode on (Theorem 4)
D_MCB                    maximum precision in normal mode
δ_q^{p,k}                end-to-end delay of round k message from p to q
init                     synchronization latency (Theorem 8)
E_p^k                    process p's perception vector for (echo,k)
(echo,k)                 algorithm's echo-message for clock value k
ε = τ+ − τ−              correct end-to-end delay uncertainty
f_a                      maximum number of arbitrary faulty processes (Definition 7)
f_s                      maximum number of symmetric faulty processes (Definition 7)
f_i                      maximum number of omission faulty processes (Definition 7)
f_c                      maximum number of clean crash faulty processes (Definition 7)
f_l^s                    maximum number of outbound link failures per round and process (Definition 10)
f_l^sa                   maximum number of malign outbound link failures per round and process (Definition 10)
f_l^r                    maximum number of inbound link failures per round and process (Definition 10)
f_l^ra                   maximum number of malign inbound link failures per round and process (Definition 10)
σ_first^{k−1}            real-time when the first process's clock reaches k
I_p^k                    process p's perception vector for (init,k)
(init,k)                 algorithm's init-message for clock value k
I_σ(t)                   indicator function for non-synchrony with real-time t
M_s                      set of messages created by obedient processes during a step
(N1)                     n ≥ f_l^s + 2 f_l^ra + 2 f_l^r + 3 f_a + 2 f_s + 2 f_i + f_c + 1
(N2)                     n ≥ 2 f_l^s + 2 f_l^ra + 2 f_l^r + 3 f_a + 3 f_s + 2 f_i + 2 f_c + 1
n_up(k)                  number of processes that see all round k messages
P_up(k)                  set of processes that see all round k messages
Π = {1, ..., n}          set of all processes
pe_q^{p,k} = V_q^{p,k}(t_q^{p,k})  process q's round k perception event for sender p
R^k                      matrix of round k messages actually received system-wide (Equation (3))
S^k                      matrix of round k messages sent system-wide (Equation (2))
S_s                      set of messages sent by obedient processes during a step
σ_p^k                    process p's round switching time from k to k+1
tick_k                   abbreviation for (init,k) and/or (echo,k) messages
t_p^k                    occurrence time of process p's round k broadcast event
t_q^{p,k}                occurrence time of process q's round k perception event for sender p
τ−, τ+                   lower and upper bounds on end-to-end delay
t_up                     real-time when a late starter gets up (Theorem 5)
t_sync                   real-time when a late starter gets synchronized (Theorem 6)
Θ = τ+/τ−                correct end-to-end delay ratio
V_p^k                    value sent by p in round k message
V_q^{p,k}                process q's perception of the round k value from sender p
V^k = V^k(t)             round k perception matrix of the system (calligraphic)
V_p^k = V_p^k(t)         round k perception vector of process p (calligraphic)

4.1 Execution Model

We consider a distributed system of n processors linked by a fully connected point-to-point network. Every processor is identified by a unique processor id p ∈ Π = {1, ..., n}. Processors are down initially and boot at unpredictable times, after which they are called up. When up, every processor executes one or more concurrent processes, appropriately scheduled on the processor's CPU. Processes are uniquely identified system-wide by the tuple ⟨p, N⟩, where the process name N is chosen from a suitable name space. Since a process will usually communicate with processes of the same name, we will distinguish processes primarily by their processor ids and suppress process names when they are clear from the context.
Actually, we will define our process failure model for a single process per processor only. If a distributed algorithm consists of several processes per processor, the failure model must be applied at the processor level, in conjunction with the assumption that all processes executed by a single processor may commit failures at most as severe as the failure mode of the processor allows.

Concerning the communication between processes, we assume that every pair of processes is connected via a pair of dedicated unidirectional links. Links are considered independent² of each other and need not be FIFO.

Executions are modeled as sequences of atomic computing steps. We assume a message-driven model [24], where all steps of processes that faithfully execute their algorithms, except the initial one (which is triggered when the processor gets up), are triggered by some message reception (possibly from itself). More specifically, a computing step s consists of the reception of exactly one message³, a state change (depending on the former local state of the process and the received message), and the generation and sending of a (possibly empty) set M_s of messages. We call a step s correct if the process both changes its state and creates M_s according to its algorithm and successfully sends M_s. A step s is called obedient if the process changes its state and creates M_s according to its algorithm, but succeeds only in sending a (possibly empty) subset of messages S_s ⊆ M_s. A step that is neither correct nor obedient will be called faulty. The following related terms will be used in the sequel:

Definition 1 (Correct process) A process p is correct up to step s in an execution if it performs a correct step on every message reception up to step s. A process p is correct in an execution if it performs a correct step on every message reception.

Note that we assume that the algorithm also contains code that handles (typically via a no-operation) the reception of any illegal message. An illegal message is a message that is detectably malformed w.r.t. the protocol, e.g.
bad checksum, obviously wrong message content and/or message format.

Definition 2 (Obedient process) A process p is obedient up to step s in an execution if it performs an obedient step on every message reception up to step s. A process p is obedient in an execution, if it performs an obedient step on every message reception.

Note that, according to this definition, a correct process is also obedient. In Sect. 4.3, we will use the notion of correct, obedient and faulty steps to define the exact semantics of process failures. According to Definition 6, benign faulty processes (crash and send omissions⁴) are obedient, whereas non-benign faulty processes (symmetric and arbitrary) may also take faulty steps.

² Hence, we do not model the fact that all links between processes residing on the same pair of processors share the single physical link connecting the two processors. Link failure budgets will also be assigned on a per-process basis, rather than on a per-processor basis.
³ Formerly, as in [17], such algorithms were denoted asynchronous.
⁴ Receive omissions, and hence general omissions, are incorporated via communication link failures, rather than via process failures. See Sect. 4.3 for details and Sect. 9.2 for some discussion.

4.2 Timing Assumptions

The execution model introduced above is based on atomic steps, which are executed in zero time. Since real processors and communication links have finite speed, computation and communication delays (resulting from processing, transmission and queuing) are modeled via some non-zero time interval between computing steps. Unlike the classic partially synchronous model by Dwork, Lynch, and Stockmeyer [15], which postulates the existence of (possibly unknown) bounds on processing speed ratio and communication delays, we employ a timing model that is solely based on the end-to-end delays of successfully received messages.
In the following definitions, we take a sender-centric view, in that we consider the transmission of some particular message m that is sent by some process p. While the receiver may get multiple messages with the same content as m (e.g., due to failures or due to some algorithm which does not use unique messages), we focus on the question of whether/when a particular message m sent by p is received.

Definition 3 (Successful reception) A message m ∈ S_s sent by process p to an obedient process q in some step s is successfully received by q if q eventually takes a step that is triggered by m.

Successfully received messages will be further classified according to their end-to-end delay, which is defined as follows:

Definition 4 (End-to-end delay) Let process p send message m ∈ S_s to process q in some step s at time t_s, and let m be successfully received by the obedient process q in some step r at time t_r. The end-to-end delay of the successfully received message m is defined as δ_q^{p,m} = t_r − t_s.

In the following sections, we will consider executions E = E(τ−, τ+), which are implicitly parametrized by two (unvalued) timing parameters τ− and τ+. Informally, τ− and τ+ represent the lower respectively upper bound on the end-to-end delay of successfully received messages in E, i.e., the values of these bounds may differ for different executions. They must satisfy 0 < τ− ≤ τ+ < ∞ and hold also for the self-reception delay δ_p^{p,m} (cf. Sect. 9.3 for how to remove this simplifying assumption). Let τ = [τ−, τ+] be the closed interval spanned by those bounds.

Definition 5 (Timely reception) A message m that is successfully received by obedient process q and sent by process p (including p = q) is timely received at q if δ_q^{p,m} ∈ τ, i.e., τ− ≤ δ_q^{p,m} ≤ τ+.

The resulting bounds on the delay uncertainty and delay ratio, which will play a central role in our analysis, are given by ε = τ+ − τ− resp. Θ = τ+/τ−. Obviously, ε and Θ solely depend on the τ− and τ+ of the execution under consideration.
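Definitions 4 and 5 amount to a simple classification of each successfully received message by its end-to-end delay. A minimal sketch (ours; recall that in the paper τ− and τ+ are analysis-only parameters, not known to the algorithm):

```python
# Sketch (our own illustration) of Definitions 4 and 5: compute the
# end-to-end delay of a successfully received message and test whether it
# was timely, i.e., whether it lies in the closed interval [tau-, tau+].
def end_to_end_delay(t_sent: float, t_received: float) -> float:
    return t_received - t_sent

def is_timely(delta: float, tau_minus: float, tau_plus: float) -> bool:
    return tau_minus <= delta <= tau_plus

delta = end_to_end_delay(t_sent=0.0, t_received=2.5)
assert is_timely(delta, tau_minus=1.0, tau_plus=3.0)    # within bounds
assert not is_timely(0.5, tau_minus=1.0, tau_plus=3.0)  # too early: untimely
assert not is_timely(3.5, tau_minus=1.0, tau_plus=3.0)  # too late: untimely
```

Note that both early and late messages count as untimely; lost messages never trigger a reception step and hence have no end-to-end delay at all.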
It is important to note that τ− and τ+ are not known to the algorithm analyzed in this paper. Rather, they are unvalued variables that are solely used for analysis purposes:

The formulas derived for quantities like precision or accuracy involve τ− and τ+. Obviously, those formulas can be used to actually compute those quantities when the algorithm is employed in a certain system: For example, all executions of a system S adhering to the synchronous model obey some a priori known delay bounds. Hence, plugging those known bounds in for τ− and τ+ in, e.g., our precision formula (D_max) provides the worst-case precision of our algorithm in any execution of S. Our formulas can also be used for computing those quantities in other system models, however, in particular for the partially synchronous system model with unknown delay bounds from [15]. See Sect. 9.1 for details.

4.3 Hybrid Failure Model for Round-based Algorithms

In this section, we will specify our comprehensive hybrid failure model for round-based algorithms. Such algorithms execute in a sequence of (non-lockstep) rounds, where every process broadcasts the same message⁵ to all processes. Following the detailed definition of our round-based algorithms, we introduce several classes of process failures and associated failure bounds. Next, we provide various classes of communication failures (link failures) and associated failure bounds. Link failures, which may be moving (transient) here, are defined on a per-round basis, and are orthogonal to process failures. To simplify the analysis of distributed algorithms under our detailed failure model, we proceed with a high-level abstraction based on how processes perceive each other in a round [47,42], i.e., based on the set of received messages. Finally, we extend the validity of the resulting perception-based failure model to the system booting phase.

In this paper, we consider asynchronous round-based algorithms only.
Every process executes a sequence of consecutive rounds k = 0, 1, ... here, which are asynchronous in the sense that different processes may be in different rounds at the same time. In round k, obedient process p receives round k messages, performs some computation based on the received messages and its local state, and broadcasts (i.e., sends to all processes in Π, including itself) the resulting round k+1 message. The round k message sent from obedient process p to q in such a broadcast step has the form (V_p^k, k)_{p,q}, where V_p^k represents an algorithm-specific value and k the current round number. The sender identifier p and the receiver identifier q are not explicitly included in the message, since they are implicitly determined according to our point-to-point network assumption (we assume that the network prevents masquerading). These subscripts are hence usually omitted for brevity.

In more detail, obedient process p's initial round k = −1 consists of a single step s_p^{−1}, triggered when the process gets up, where p's round 0 message (V_p^0, 0) is broadcast. Process p's round k ≥ 0 consists of some number ℓ_k ≥ 1 of round k steps, which are triggered by the successful reception of the round k messages from the processes. Steps 1, ..., ℓ_k − 1 are pure receiving steps, where no messages are sent by p (such that M_s = ∅). Process p's round k is terminated by the round k step ℓ_k, termed p's round k switching step s_p^k, in which p computes and broadcasts its round k+1 message (V_p^{k+1}, k+1). Hence, |M_{s_p^k}| = n here.

⁵ Only broadcast-based algorithms can benefit from our class of symmetric failures (cf. Definition 6), which allow a faulty process to broadcast an erroneous value. As long as such erroneous values are received consistently by all receivers, symmetric failures are less severe than Byzantine failures.
Typically (but depending on the particular algorithm), the round switching step occurs when sufficiently many round k messages (from distinct peers) have arrived. Let the round switching time σ_p^k be the real-time when the round switching step s_p^k occurs, that is, when p switches from round k to k+1. We consider consecutive rounds only, such that σ_p^{−1} ≤ σ_p^0 ≤ σ_p^1 ≤ ... for every obedient process p. Note that round k messages arriving after σ_p^k are discarded, i.e., we consider a communication-closed model.

From the above, it is apparent that both the local state reached by process p after its round k switching step s_p^k and the content V_p^{k+1} of the round k+1 message broadcast in step s_p^k are based on p's local state at the beginning of (the first step of) round k and all⁶ the round k messages received during round k. Nevertheless, since rounds are asynchronous, it may of course happen that some steps triggered by the reception of a round k′ message, with k′ > k, from some peer occur at p when it is actually in round k. Note that such early messages are of course not illegal ones; recall our comment following Definition 1.

In Sect. 4.1, we described the behavior of correct and obedient processes, both of which faithfully execute their algorithm. Here we add the behavior of non-benign faulty processes: A non-benign faulty process may take faulty steps, which are not in accordance with its algorithm and may thus send messages of arbitrary content to arbitrary receivers. We distinguish processes which exhibit arbitrary failures here, where there is no restriction on the set of messages sent in any step, and symmetric failures (similar to identical Byzantine failures from [2]), where, for any step s of process p in an execution, it holds that either no message is sent in s at all, or the same (possibly erroneous) message m is broadcast.

Definition 6 (Process Behaviors) The behavior of a process in an execution is classified as follows:

Correct: See Definition 1.
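The round structure just described can be sketched in a few lines. This is our own simplification, not the paper's algorithm: the switching rule "enough distinct senders heard from" stands in for the algorithm-specific condition, `broadcast` is an assumed network primitive, and early messages are simply ignored here (a real implementation would typically buffer them).

```python
# Sketch (ours) of the message-driven round structure of Sect. 4.3.
class RoundProcess:
    def __init__(self, pid, n, threshold, broadcast):
        self.pid, self.n = pid, n
        self.threshold = threshold  # round-k messages needed to switch rounds
        self.broadcast = broadcast  # assumed primitive: broadcast((value, round))
        self.round = -1             # initial round k = -1
        self.senders = set()        # distinct peers heard from in current round

    def on_up(self):
        # Initial round: a single step that broadcasts the round 0 message.
        self.round = 0
        self.broadcast((self.compute_value(), 0))

    def on_receive(self, sender, value, k):
        if k < self.round:   # late message: discarded (communication-closed)
            return
        if k > self.round:   # early message: legal; handling is algorithm-
            return           # specific (usually buffered in practice)
        self.senders.add(sender)
        if len(self.senders) >= self.threshold:
            # Round switching step: compute and broadcast round k+1 message.
            self.round += 1
            self.senders = set()
            self.broadcast((self.compute_value(), self.round))

    def compute_value(self):
        return 0  # placeholder for the algorithm-specific value V_p^k

sent = []
p = RoundProcess(pid=1, n=4, threshold=3, broadcast=sent.append)
p.on_up()
for peer in (1, 2, 3):
    p.on_receive(peer, 0, 0)
assert sent == [(0, 0), (0, 1)]  # round 0 broadcast, then round 1 broadcast
assert p.round == 1
```

Note that no timer appears anywhere: every state change after the initial one is triggered by a message reception, which is exactly what makes the model time(r)-free.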
Clean Crash: A process that is correct up to some step and then does not take any further steps.

Crash: A process that is obedient up to some step and then does not take any further steps.

Symmetric Omission: An obedient process where all steps s satisfy S_s = ∅ or S_s = M_s.

Asymmetric Omission: An obedient process as introduced in Definition 2.

⁶ If multiple round k messages arrive from the same process q, which can happen if q is faulty, for example, the handling of these messages may be determined by the algorithm.

Symmetric: A process p where, for all steps s in the execution, either S_s = ∅ or ∃(V,k) : ∀q ∈ Π : (V,k)_{p,q} ∈ S_s and |S_s| = |Π|.

Arbitrary: A process with no restriction on the recipients and content of the messages S_s sent in step s.

Processes exhibiting failures ranging from clean crashes up to asymmetric omissions are called benign and are obedient. Both symmetric and arbitrary faulty processes are non-benign and not obedient, such that no assumption is made on how steps are triggered and performed on such processes. Our hybrid process failure model rests on the following upper bounds on the number of faulty processes during an execution.

Definition 7 (Maximum number of process failures) We assume that, during every execution of an algorithm, at most f_c processes are either clean crash or symmetric omission faulty (obedient processes that either perform complete broadcasts or full omissions), f_i processes are either crash faulty or asymmetric omission faulty (obedient processes that may perform incomplete broadcasts), f_s processes are symmetric faulty, and f_a processes are arbitrary faulty.

In addition to process failures, our failure model also provides communication failures (i.e., link failures). In sharp contrast to process failures, link failures may also hit messages sent and received by correct processes, and are typically moving and transient [37], in the sense that they can affect different links in different rounds of an execution. The following Definition 8 classifies the communication failures provided by our failure model, by considering their effects on a single message. Note that we only have to consider obedient receivers in our specifications, since non-obedient receiver processes need not follow their algorithm anyway.
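The failure bounds of Definition 7 feed directly into the resilience conditions (N1) and (N2) listed in Table 1, which can be checked mechanically for concrete budgets. A small sketch (ours; parameter names are ad hoc):

```python
# Sketch (our own): evaluating the resilience conditions (N1) and (N2) from
# Table 1 for a given system size n and the failure bounds of Definition 7
# (fa, fs, fi, fc) and Definition 10 (fls, flsa, flr, flra).
def satisfies_n1(n, fls, flra, flr, fa, fs, fi, fc):
    return n >= fls + 2*flra + 2*flr + 3*fa + 2*fs + 2*fi + fc + 1

def satisfies_n2(n, fls, flra, flr, fa, fs, fi, fc):
    return n >= 2*fls + 2*flra + 2*flr + 3*fa + 3*fs + 2*fi + 2*fc + 1

# With only f = f_a arbitrary (Byzantine) processes and no link failures,
# both conditions reduce to the classic n >= 3f + 1 bound:
assert satisfies_n1(4, 0, 0, 0, 1, 0, 0, 0)      # n = 4, f_a = 1: holds
assert not satisfies_n1(3, 0, 0, 0, 1, 0, 0, 0)  # n = 3 is too small
```

This also makes the hybrid model's benefit visible: a clean crash costs only one term (f_c) in (N1), whereas treating the same failure as arbitrary would cost three (3 f_a).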
Definition 8 (Link behaviors) The communication behavior when sending a message (V_p^k, k)_{p,q} ∈ S_s ⊆ M_s in step s of process p to some obedient process q is classified as follows:

Correct: The end-to-end delay δ_q^{p,k} satisfies δ_q^{p,k} ∈ τ (see Definition 5) and the message is received unaltered.

Lost: During the whole execution, no step is triggered by the reception of message (V_p^k, k)_{p,q} at q.

Untimely: The end-to-end delay δ_q^{p,k} satisfies δ_q^{p,k} ∉ τ (i.e., early or late message) and the message is received unaltered.

Corrupted: The message content (value V_q^{p,k} and/or round number k′) received at q differs from the message content (value V_p^k and round number k) sent.

Spurious: A message that has not been sent by p is received by q.

Lost, untimely, corrupted or spurious communication behavior is termed a link failure; a link failure which is not a lost message is termed a malign link failure.

Since it is impossible to solve any representative distributed computing problem in the presence of unrestricted communication failures [20,37], the power of link failures must be restricted somehow. As in [40,47,42], we will restrict the admissible link failure patterns by requiring that, for every process, only a certain fraction of its outbound links and a certain fraction of its inbound links suffer from link failures. Note that a single link failure hitting the link from process p to q affects both the outbound link to receiver q of the sender p and the inbound link from sender p of the receiver q. The links actually hit by link failures can change from round to round, i.e., may be moving [37]. In [41,47], we have shown that this model has good coverage even in typical wireless settings, where link failure rates up to 10^-2 are common. A comprehensive collection of impossibility results and lower bounds for consensus in this model can be found in [46].
To formally define our link failure model, we start with the set S_p^k(q) of round k messages actually sent by process p to process q in all⁷ steps s of an execution E:

    S_p^k(q) = { m_{p,q} = (·, k)_{p,q} : m_{p,q} ∈ S_s, s ∈ E }    (1)

Clearly, S_p^k(q) = ∅ or S_p^k(q) = {(V_p^k, k)_{p,q}} for every obedient sender process p, depending on whether p suffered from a send omission in its round k−1 switching step s_p^{k−1} or not. For any non-benign faulty process p, however, S_p^k(q) can be arbitrary, since there are no restrictions on how many round k messages are sent to q in an execution. The sets of round k messages sent system-wide in an execution can hence be represented by the following matrix:

    S^k = [ S_1^k(1)  S_2^k(1)  ...  S_n^k(1)
            S_1^k(2)  S_2^k(2)  ...  S_n^k(2)
             ...       ...            ...
            S_1^k(n)  S_2^k(n)  ...  S_n^k(n) ]    (2)

Note that the p-th column in this matrix provides all the round k messages sent by p (to any receiver). Similarly, the q-th row contains all the round k messages sent to q (from any sender). Our link failures act on the above matrices (2), for any k, thereby leading to possibly modified matrices

    R^k = [ R_1^k(1)  R_2^k(1)  ...  R_n^k(1)
            R_1^k(2)  R_2^k(2)  ...  R_n^k(2)
             ...       ...            ...
            R_1^k(n)  R_2^k(n)  ...  R_n^k(n) ]    (3)

with entry R_p^k(q) denoting the set of round k messages actually received by q over the inbound link from p. Extending the single-message link semantics given in Definition 8 to

⁷ Since we are dealing with a communication-closed model, we could even restrict our attention to round k messages sent before the round k switching step of q, i.e., to all steps s ∈ pref_q^k(E) in the finite prefix pref_q^k(E) of E up to step s_q^k in (1).

the behavior relevant for round k messages, we arrive at the following Definition 9.

Definition 9 (Round k link behaviors) Given the system-wide matrices of round k messages (2) and (3), we distinguish the following round k link behavior between sender p and obedient receiver q:

Correct: R_p^k(q) = S_p^k(q), and all receptions are timely.

Lost: R_p^k(q) ⊊ S_p^k(q), and additionally all successful receptions are timely.

Untimely: At least one message m ∈ R_p^k(q) is untimely.

Corrupted: R_p^k(q) ≠ S_p^k(q), due to at least one message m ∈ R_p^k(q) that is corrupted.

Spurious: R_p^k(q) ≠ S_p^k(q), due to at least one message m ∈ R_p^k(q) that is spurious.

Lost, untimely, corrupted or spurious round k link behavior is termed a round k link failure. Note that a single malign link failure that changes a message (V_p^k, k) to (V_p^k, l) affects both R^k and R^l. Finally, our link failure model given in Definition 10 below just restricts the number of entries in the matrix (3) that are affected by link failures as introduced above.

Definition 10 (Maximum number of round k link failures) For any k ≥ 0, for any process p ∈ Π and any obedient process q ∈ Π, all entries in matrix R^k are correct, except that

(R) at most f_ℓ^r entries in row q may be affected by round k link failures, with at most f_ℓ^{ra} ≤ f_ℓ^r of those caused by malign link failures, and

(S) at most f_ℓ^s entries in column p may be affected by round k link failures, with at most f_ℓ^{sa} ≤ f_ℓ^s of those caused by malign link failures.

Obviously, (R) bounds the number of failures per round at the inbound links of the obedient process q, while (S) bounds the number of failures per round at the outbound links of process p. The particular links actually hit by link failures may be different in different rounds.
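The row and column bounds of Definition 10 can be expressed as a simple admissibility check over a grid of per-link behaviors as in Definition 9. The grid layout and the bound names (mirroring the paper's f_ℓ^r, f_ℓ^{ra}, f_ℓ^s, f_ℓ^{sa}) are our own illustrative assumptions.

```python
# Sketch of Definition 10: behavior[q][p] classifies the round-k link behavior
# from sender p to receiver q. A pattern is admissible if every row q respects
# the inbound bounds (R) and every column p respects the outbound bounds (S).

MALIGN = {"untimely", "corrupted", "spurious"}  # link failures that are not losses
FAILURES = MALIGN | {"lost"}

def pattern_admissible(behavior, f_lr, f_lra, f_ls, f_lsa):
    n = len(behavior)
    for q in range(n):                           # (R): inbound links of receiver q
        row = behavior[q]
        if sum(b in FAILURES for b in row) > f_lr:
            return False
        if sum(b in MALIGN for b in row) > f_lra:
            return False
    for p in range(n):                           # (S): outbound links of sender p
        col = [behavior[q][p] for q in range(n)]
        if sum(b in FAILURES for b in col) > f_ls:
            return False
        if sum(b in MALIGN for b in col) > f_lsa:
            return False
    return True
```

Note that the grid (and hence the set of links actually hit) may differ from round to round, matching the moving-failure semantics.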
In addition, since the effects of process failures and link failures are orthogonal, it can of course happen that a link failure hits a message from a faulty sender process. Note that this actually increases the power of the adversary: In case of a clean crash or a symmetric faulty sender process p, for example, malign link failures can create spurious messages or messages with erroneous content from p at some receivers. Further details and consequences of our failure assumptions will be provided in Sect. 4.4.

4.4 Perception-based Analysis

In Sect. 4.3, we defined our hybrid failure model by looking into the details of the execution model: how (faulty) processes may perform their steps and how messages may be affected by communication failures. We will now introduce a higher level of abstraction, which considerably simplifies the analysis of distributed algorithms in our model. This abstraction rests on how processes perceive their peers at the end of a round, i.e., whether and which round k messages they have received. In Lemma 1 below, we will derive some properties of those perceptions from our physical failure model.

The round-based model introduced in Sect. 4.1 can be viewed at a higher level of abstraction as follows: For every round number k, we assume that every (obedient) process q collects the values received in round k in a local array V_q^k = (V_q^{1,k}, ..., V_q^{n,k}) called the perception vector. The multiset V_q^{p,k} ∈ V_q^k (called a perception) is either the special value ∅ if no round k message from process p has been received yet, or it contains the values of all^8 round k messages received over the link from process p so far. Obviously, V_q^k = V_q^k(t) as well as its entries V_q^{p,k} = V_q^{p,k}(t) are time-dependent (we will usually suppress the observation real-time t in order not to overload our notation). Initially, all V_q^{p,k}(0) = ∅.
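The perception vector can be modeled directly as an array of multisets that only ever grows. The class name and method names below are our own; the paper defines the structure mathematically only.

```python
# Illustrative model of a perception vector V_q^k for one round k at process q:
# entry V[p] is the multiset of round-k values received from p so far (the
# empty list plays the role of the special value "empty perception").

class PerceptionVector:
    def __init__(self, n):
        self.V = [[] for _ in range(n)]   # initially all perceptions are empty

    def receive(self, p, value):
        # The first append corresponds to the perception event pe_q^{p,k};
        # later round-k messages from p (failures, retransmissions) accumulate.
        self.V[p].append(value)

    def nonempty(self):
        # Number of peers from which at least one round-k message arrived.
        return sum(1 for entry in self.V if entry)
```

Multiple entries per peer are possible precisely because failures and late-booting retransmissions may deliver several round k messages over the same link.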
In the first step where process q receives a round k message from process p, at some time t_q^{p,k} > 0 (called the perception event pe_q^{p,k} here), the received value is put into the perception V_q^{p,k}; thus V_q^{p,k}(t′) = ∅ if t′ < t_q^{p,k} and V_q^{p,k}(t′) ≠ ∅ otherwise. Note that t_q^{p,k} = ∞ in case of complete message loss.

Process p's round k terminates at the round switching time σ_p^k, which is the real-time when the round k switching step s_p^k occurs (where p's round k+1 messages are sent). This instant is called the broadcast event be_p^{k+1} here and occurs at real-time t_p^{k+1} = σ_p^k. In case of synchronous algorithms, round switching at time σ_p^k is enforced by some means external to the algorithm. In case of message-driven algorithms like the one of this paper, σ_p^k is determined by the algorithm itself, typically when sufficiently many messages have been received. The value V_p^{k+1} broadcast in p's round k+1 message is computed from the round k perceptions available^9 in V_p^k = V_p^k(σ_p^k) and p's local state at time σ_p^k. Process p's broadcast event be_p^k = V_p^k(t_p^k) and process q's perception event pe_q^{p,k} = V_q^{p,k}(t_q^{p,k}) are related via their values V_p^k, V_q^{p,k} and their occurrence times t_p^k and t_q^{p,k} = t_p^k + δ_q^{p,k}, with the end-to-end delay δ_q^{p,k}. (In case of no failure, V_q^{p,k} = {V_p^k} and t_q^{p,k} ≤ t_p^k + τ⁺.)

Our perception-based analysis rests on the n × n perception matrix V^k(t) of all round k perceptions observed at

^8 Multiple round k messages via the same link can occur in case of failures and/or late booting retransmissions.
^9 More specifically, in an algorithm-specific way, a single value may be chosen from V_p^k and used to compute the next state.

the same real-time t at all processes, typically the round switching time of some process:

V^k(t) = | V_1^{1,k}(t)  V_1^{2,k}(t)  ...  V_1^{n,k}(t) |
         | V_2^{1,k}(t)  V_2^{2,k}(t)  ...  V_2^{n,k}(t) |
         |     ...           ...       ...      ...      |    (4)
         | V_n^{1,k}(t)  V_n^{2,k}(t)  ...  V_n^{n,k}(t) |

Its row q is just process q's perception vector V_q^k(t). Note that V^k(t) and the (in time and space) global matrix R^k of round k messages defined in (3) are of course closely related: V_q^{p,k}(t) is the subset of the messages in R_p^k(q) with reception times before or at time t, or ∅ if there is no such message.

The above perception matrix V^k(t) is a quite flexible basis for the analysis of distributed algorithms, since it provides a system-global view of all the processes' local views (i.e., perception vectors) at an arbitrary observation time t. The primary way of using the perception matrix in the analysis of agreement-type algorithms like the clock synchronization algorithm of this paper is the following: Given the perception vector V_q^k(σ_q^k) of some specific obedient receiver process q at its round switching time σ_q^k, it allows us to determine how many non-empty perceptions will at least be present in any other obedient process r's perception vector V_r^k(σ_q^k + ε) shortly thereafter. The following Lemma 1, developed in [4,42], formalizes this fact.

Lemma 1 (Difference in Perceptions) At any time t, the perception vector V_q^k(t) of any obedient receiver q may contain at most f_ℓ^{ra} + f_a + f_s time- and/or value-faulty perceptions V_q^{p,k} ≠ ∅. Moreover, at most f = f_ℓ^{ra} + f_ℓ^r + f_a + f_i perceptions V_r^{p,k}, for any p where V_q^{p,k} ≠ ∅ in V_q^k(t), may be empty in any other obedient receiver's V_r^k(t + Δt) for any Δt ≥ ε.

Proof The first statement of our lemma is an obvious consequence of Definition 7 and Definition 10.
To prove the second statement, we note that at most f_ℓ^{ra} + f_a + f_i messages may have been available (partly too early) at q without being available yet at r, an additional f_ℓ^{ra} perceptions may be late at r, and f_ℓ^r − f_ℓ^{ra} ones could suffer from a loss at the inbound links to r. All messages from symmetric faulty processes present in V_q(t) must also be present in V_r(t + Δt), however. Summing up all the differences, the expression for f given in Lemma 1 follows.

The above perception-based model for a single process per processor is easily generalized to distributed algorithms consisting of multiple processes per processor: Every processor can host multiple perception vectors, one per process. Any process (but at most one per processor) may send messages for a specific perception vector, i.e., to some specific process. Only one process per processor may process the messages from a specific perception vector, however.

4.5 System Booting

In classic models, it is assumed that all processes boot simultaneously at time t = 0 and are hence able to receive each other's messages right from the start. In this section, we will show how to get rid of this assumption. At the very beginning, all processes are down. Every message that arrives at a process while it is down is lost, and no messages are sent by such a process. Note carefully that we do not even allow spurious messages from a down process here. Obedient processes boot at times that are not known a priori, while non-obedient processes are assumed to be up right from the beginning at t = 0. During start-up, an obedient process goes through the following sequence of operating modes:

Down: A process is down when it has not been created yet or has not completed booting. Messages which drop in while a process is down are lost, i.e., are not successfully received and do not generate a corresponding perception.

Up: During booting, process q's perception vectors are initialized and the receive processing is set up.
When q completes booting at some time σ_q^{-1}, we say that it gets up. Messages that are sent to process q by process p before time σ_q^{-1} − τ⁻ may be lost, since q need not be up at the time of reception. The relation between the round k perception matrix V^k(t) and the global matrix R^k if booting is considered is as follows: V_q^{p,k}(t) contains all the messages in R_p^k(q) received within the time interval [σ_q^{-1}, t], or ∅ if there is no such message. Note carefully that such late booting losses do not contribute to the failure bounds in Definition 7 and Definition 10.

Passive: Upon getting up, a process performs an initialization phase, during which it is called passive. The actions to be performed in passive mode are of course algorithm-specific. Typically, a process broadcasts a special join message as its round 0 message when it gets up. The first reception of a join message from some process p causes every already-up receiver to send an algorithm-dependent response message back to p (a point-to-point reply is sufficient here); subsequent join messages from the same sender are ignored. When p has received sufficiently many replies for constructing a sufficiently accurate view of the instantaneous system state, it terminates passive mode.^10

Active: A process that has completed its initialization phase and thus left passive mode is called active. It operates as described in the execution model in Sect. 4.1. In case of the clock synchronization algorithm given in Fig. 1, for example, a passive process broadcasts join, namely (init, 0), in order to get the last (init, ·) and (echo, ·) message of every peer, and participates in the algorithm as in active mode. It need not satisfy the clock synchronization conditions (P) and (A) while passive, however. The transition to active mode occurs when the process can be sure that it is within the synchronization precision D_max. This happens when sufficiently many messages with certain properties have been obtained.
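The down/passive/active mode sequence above can be sketched as a small state machine. The class, the reply-counting termination condition, and its threshold parameter are our own placeholder assumptions; the actual transition condition of the algorithm is message-property based and analyzed later in the paper.

```python
# Hedged sketch of the operating-mode sequence during start-up:
# down -> passive (gets up, broadcasts join) -> active.

class Process:
    def __init__(self, replies_needed):
        self.mode = "down"
        self.replies = 0
        self.replies_needed = replies_needed   # placeholder threshold

    def boot(self):
        if self.mode == "down":
            self.mode = "passive"   # gets up; would broadcast join = (init, 0)

    def on_reply(self):
        if self.mode == "passive":
            self.replies += 1
            if self.replies >= self.replies_needed:
                self.mode = "active"   # view of system state accurate enough
```

While passive, the process already participates in the algorithm but need not yet satisfy (P) and (A).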

The communication pattern for joining is slightly different from ordinary rounds: Whereas join is broadcast by a single joining process p as its round 0 message as usual, response messages are sent back immediately (maybe even point-to-point) upon reception of the first join message from p. Fortunately, response messages can also be accommodated easily in our framework: We just consider such a message as q's retransmission of some (recent) round k message, triggered by the reception of p's join. Since the joiner p must be provided with up-to-date information to construct an internal state that satisfies problem-specific invariants (such as the precision of all active clocks), this is a sensible approach.

In fact, all definitions in Sect. 4.3 remain valid when we allow processes to retransmit round k messages (with a minor caveat, see below): Process failures have been defined at the level of single computing steps; adding retransmission steps hence does not affect the process failure model in Definition 7. Communication failures are defined for single rounds; since all related definitions, in particular Equation (1), are based on sets of messages, they transparently extend to retransmissions of round k messages. Hence, the link failure model in Definition 10 applies without changes as well. Note carefully, however, that this binds together the original round k broadcast and all round k retransmissions w.r.t. link failure bounds, as they are all contained in the round k matrices S^k and R^k. A response message retransmitting the round k message of some process q upon p's join is considered as belonging to the original round k broadcast, with the actual transmission deferred until the joining receiver p is up.
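The join/response pattern described above can be sketched as follows: the first join from a given sender triggers a retransmission of the responder's most recently sent messages, and later joins from the same sender are ignored. The class and field names are our own illustrative assumptions.

```python
# Sketch of the response side of the join protocol: a point-to-point reply
# retransmits the most recent round-k broadcast(s), deferred until the joiner
# is up. State layout is hypothetical.

class Responder:
    def __init__(self):
        self.answered = set()   # senders whose join was already answered
        self.last_init = None   # most recently broadcast ("init", k), if any
        self.last_echo = None   # most recently broadcast ("echo", k'), if any

    def on_join(self, sender):
        if sender in self.answered:
            return []           # subsequent join messages are ignored
        self.answered.add(sender)
        # The reply counts as a retransmission of the original round-k
        # broadcast w.r.t. the link failure bounds of Definition 10.
        return [m for m in (self.last_init, self.last_echo) if m is not None]
```

Because the reply is bound to the original broadcast in S^k and R^k, it shares that round's link failure budget rather than consuming a fresh one.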
This deferred-retransmission peculiarity creates a minor problem with respect to timely messages, however, which must be considered by the algorithm: A retransmitted round k message is of course not sent at the same time as the original round k broadcast. Hence, the former cannot be considered timely w.r.t. the latter, but may suffer from a (late) timing failure. Consequently, Lemma 1 does not apply to (just joined) processes that have received at least one response message. This case will be treated extensively in our analysis.

5 Clock Synchronization

In the following, we define the clock synchronization problem and discuss its properties.

Definition 11 (Clock synchronization properties) Every obedient process p is equipped with an adjustable discrete clock C_p(t) that can be read at arbitrary real-times t. A correct clock synchronization algorithm guarantees the following properties:

(P) Uniform Precision: In each execution, there is some precision D_max > 0 that is independent of real-time t such that

|C_p(t) − C_q(t)| ≤ D_max    (5)

for any two obedient active processes p and q and any real-time t.

(A) Uniform Accuracy: In each execution, there exist some bounds R⁻, O⁻, R⁺, O⁺ > 0 that are independent of real-time t such that

O⁻(t_2 − t_1) − R⁻ ≤ C_p(t_2) − C_p(t_1) ≤ O⁺(t_2 − t_1) + R⁺    (6)

for any obedient active process p and any two real-times t_2 ≥ t_1.

The precision requirement (P) just states that the difference of any two correct clocks in the system must be bounded, whereas (A) guarantees some relation of the progress of clock time with respect to the progress of real-time; (A) is also called an envelope requirement in the literature. Note that (P) and (A) are uniform [21] here, i.e., they hold not only for correct but also for active benign faulty (obedient) processes. Any such process must hence satisfy the clock synchronization conditions until it (possibly) crashes.
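Conditions (5) and (6) can be checked mechanically on recorded clock traces. The following harness is our own illustration (function names and the trace representation are assumptions), not part of the paper's machinery.

```python
# Checkers for Definition 11 on clock traces. clocks maps a process name to a
# callable t -> C_p(t); times is a finite set of observation real-times.

def precision_holds(clocks, times, d_max):
    # (P): |C_p(t) - C_q(t)| <= D_max for all pairs and observation times.
    return all(abs(clocks[p](t) - clocks[q](t)) <= d_max
               for t in times for p in clocks for q in clocks)

def accuracy_holds(clock, t1, t2, o_minus, r_minus, o_plus, r_plus):
    # (A): the clock's progress over [t1, t2] stays within the linear envelope.
    progress = clock(t2) - clock(t1)
    return (o_minus * (t2 - t1) - r_minus
            <= progress
            <= o_plus * (t2 - t1) + r_plus)
```

For instance, a drift-free clock C_p(t) = 2t trivially satisfies (A) for any envelope with O⁻ ≤ 2 ≤ O⁺.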
Traditional research on clock synchronization (see [49,35,48,38] for an overview) considers synchronous systems equipped with hardware clocks with high time resolution and small drift ρ (in the range of µs/s). All executions in such systems must obey some (usually a priori known) lower and upper bounds τ⁻, τ⁺ on the end-to-end delay of successfully transmitted messages between correct processes. Consequently, in synchronous systems, the bounds D_max, R⁻, O⁻, R⁺ and O⁺ in Definition 11 are indeed constants that hold in all executions, i.e., are worst-case bounds. Optimal clock synchronization algorithms for synchronous systems like [51,16,45] guarantee O⁻ = 1 − ρ and O⁺ = 1 + ρ and very small R⁻, R⁺. The achievable precision depends primarily on the transmission delay uncertainty ε = τ⁺ − τ⁻ [30], which is in the ms-range for typical local area networks.

By contrast, in partially synchronous systems [15,33] like ours, different executions may obey different lower and upper bounds τ⁻, τ⁺ for the end-to-end delays. Every execution E(τ⁻, τ⁺) is hence parametrized by τ⁻, τ⁺, and D_max, R⁻, O⁻, R⁺ and O⁺ in Definition 11 depend on the τ⁻ and τ⁺ of the current execution. Clocks are usually implemented as simple software counters here. For example, in our paper, process p's clock will be the round number of the clock synchronization algorithm running on p: C_p(t) is incremented by 1 when process p switches to the next round. The time resolution of a clock is hence determined by the number of round switches within a given real-time interval, which is about once per round trip: Theorem 11 will reveal that our algorithm ensures O⁻ = 1/(2τ⁺) and O⁺ = 1/(2τ⁻) with reasonably small R⁻, R⁺, where the lower bound holds only if sufficiently many processes are eventually up and running.^11

Remarkably, it will turn out that the precision D_max as well as R⁻, R⁺ depend only on the delay ratio Θ = τ⁺/τ⁻, rather than on τ⁺ and τ⁻ themselves. Hence, our algorithm guarantees some constant precision even in systems with arbitrary delays, provided that Θ remains bounded by a constant in every execution. See Sect. 9.1 for further details.

5.1 The Algorithm

The clock synchronization algorithm considered in this paper is a hybrid variant of the algorithm of [55]. It is based on the well-known non-authenticated^12 clock synchronization algorithm of [51], which employs consistent broadcasting as a primitive for generating nearly simultaneous global resynchronization events in the system. Its pseudo-code is given in Fig. 1. Note carefully that our algorithm is completely time(r)-free in that it does not incorporate τ⁺ and τ⁻, and not even ε or Θ, and that it is completely message-driven, as every step is triggered by a message reception (and not by the progress of time, as in time-driven execution models).

The algorithm is given in an event-based style that consists of the 6 outermost if statements, numbered from the zeroth if (line 4) up to the fifth if (line 25). The variable k provides the clock value C_p(t) of the processor p that executes the algorithm. The second and the third if are just the instances of the fourth and fifth if for l = k, which have been incorporated explicitly for ease of explanation only and could entirely be omitted from the code. The presence of the third if in Fig. 1 implies, however, that the only part of the fifth if that is ever executed is setting mode to active when in passive mode (line 26). Note carefully that this also happens when l < k. The clock value k is never changed in the fifth if, since k is already set to l in the fourth if, which always triggers before the fifth if.

Comparison of the algorithm of Fig. 1 with hybrid variants [40,28] of the original consistent broadcasting primitive (which do not incorporate system start-up handling) shows that the first three if-clauses^13 (see lines 7 to 17) are the same: Each round k is started by sending an (init, k) message to all. When an obedient process achieves sufficient evidence that at least one obedient process has sent a round k message (line 7 or line 10), it sends (echo, k) to all. When a process can be sure that there are enough round k messages in the system, such that every obedient process will eventually reach sufficient evidence (line 13), it advances to round k+1 and hence sends (init, k+1) (line 16). This guarantees both (P) and (A) if sufficiently many correct processes are up and running right from the beginning.

In reality, however, processes complete booting at unpredictable times. Late starters could hence miss the (echo, k) and/or (init, k) message(s) of early starters. Consequently, three consecutive modes of system operation must be distinguished to properly handle system start-up:

Early mode, where the first few correct processes have completed booting and started exchanging messages. This mode terminates when the first obedient process advances its clock to 1.

Degraded mode, where sufficiently many correct processes are up such that some clocks may advance when assisted by faulty processes and transmissions.

Normal mode, where enough correct processes are up and synchronized to guarantee progress for all clocks.

^11 When implementing this algorithm, it is possible to stretch τ⁻ to obtain τ⁺ ≤ τ⁻(1 + ρ) + ε by using local bounded-drift interval timers at every sender process. This way, a situation comparable to the synchronous clock synchronization setting can be established using our message-driven algorithm (originally designed for partially synchronous systems). Note that this also reduces the message load imposed on the system; see [27] for details.

^12 We do not consider authenticated algorithms here. Besides the disadvantages of computational and communication overhead, it is never guaranteed that malicious processes cannot break the authentication scheme. Using the algorithm of Srikanth and Toueg [51], our correctness proofs cannot be invalidated by this event.

^13 We do not count the booting-related zeroth if (line 4) here.
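The init/echo/advance core of consistent broadcasting can be sketched as follows. This is our own reconstruction, not the paper's Fig. 1: the thresholds th_echo and th_advance are placeholders (classic Srikanth-Toueg consistent broadcasting uses f+1 and 2f+1 with distinct rules for init and echo evidence; the paper derives its own hybrid-failure bounds), and the catch-up and start-up handling are omitted here.

```python
# Skeleton of the first three if-clauses: (init, k) starts round k, sufficient
# evidence triggers (echo, k), and enough echoes advance the clock to k+1.

class ConsistentBroadcast:
    def __init__(self, th_echo, th_advance, send):
        self.k = 0                    # clock value = current round number
        self.echo_sent = False
        self.evidence = {}            # (tag, round) -> set of distinct senders
        self.send = send
        self.th_echo, self.th_advance = th_echo, th_advance

    def start(self):
        self.send(("init", self.k))   # each round k starts with (init, k)

    def on_message(self, sender, tag, l):
        if l != self.k:
            return                    # catch-up handling omitted in this sketch
        self.evidence.setdefault((tag, l), set()).add(sender)
        seen = lambda t: len(self.evidence.get((t, l), set()))
        if not self.echo_sent and seen("init") + seen("echo") >= self.th_echo:
            self.echo_sent = True
            self.send(("echo", self.k))   # evidence of an obedient round-k sender
        if seen("echo") >= self.th_advance:
            self.k += 1                   # advance the clock: next round begins
            self.echo_sent = False
            self.send(("init", self.k))
```

Counting distinct senders (sets, not message counts) is what makes the evidence thresholds robust against a bounded number of faulty peers.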
Note that the system remains in normal mode forever, i.e., does not return to degraded mode. Unfortunately, it is impossible for any process in the system to delimit the exact borders between those modes from local information.

In order to add the handling of system start-up to the original consistent broadcasting algorithm, join messages and two additional if-clauses are required. First, a newly booted process must tell all others that it is up now and wants to learn their current clock values. This is accomplished by means of join messages, as introduced in Sect. 4.5: Every process p sends join = (init, 0) as the very first message after having completed booting. Every process q that receives this message replies by retransmitting its previously sent (init, k) and (echo, k′) message (lines 3-6). The latter message is omitted if q did not yet send (echo, 0) in round k = 0; otherwise, k′ = k or k′ = k − 1, depending on whether (echo, k) has already been sent. This ensures that p will eventually get sufficiently many messages (which may have been lost while it was down) to trigger the catch-up rule described below.

The major problem in degraded mode is the impossibility to guarantee (P) solely via the third if-clause (line 13): There are not sufficiently many correct processes to guarantee that every obedient process will eventually advance its clock when the first one does so. Here is where the fourth if (line 18), our pivotal catch-up rule, comes into play: It allows an obedient process to advance its clock to round l when sufficiently many (echo, l) or (echo, l+1) messages have been received. Hence, eventually, a sufficiently large group of correct processes can be ensured to be within two rounds of each other. Note that the algorithm of Fig. 1 cannot guarantee that the clock of an obedient process takes on every integer value: It may leap forward due to catch-up.
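The catch-up rule just described can be isolated in a few lines. The function name, the data layout, and the threshold th_catchup are our own placeholder assumptions; the paper's exact bound is derived in its analysis.

```python
# Sketch of the catch-up rule (fourth if): jump the clock to round l when
# enough distinct processes have sent (echo, l) or (echo, l+1).

def catch_up(k, echoes, th_catchup):
    """echoes maps a round number l to the set of distinct processes from
    which (echo, l) was received; returns the (possibly advanced) clock."""
    for l in sorted(echoes, reverse=True):
        support = len(echoes.get(l, set()) | echoes.get(l + 1, set()))
        if l > k and support >= th_catchup:
            return l        # the clock leaps forward, skipping rounds
    return k
```

This is exactly why an obedient clock need not take on every integer value: a late starter jumps directly to the highest sufficiently supported round.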
The catch-up rule could cause two other problems: First, the second and third if-clauses (line 10 and line 13) must trigger when sufficiently many (echo) messages from different processes within 2 rounds have been received. Since messages from two consecutive rounds k−1, k could trigger round switching, which is not directly supported by the round model introduced in Sect. 4.3, the following convention is used: The reception of an (echo, k) message must cause perceptions for both round k and k−1. More specifically, since the reception of (echo, k) at process q implies


More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 6 (version April 7, 28) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.2. Tel: (2)

More information

Valency Arguments CHAPTER7

Valency Arguments CHAPTER7 CHAPTER7 Valency Arguments In a valency argument, configurations are classified as either univalent or multivalent. Starting from a univalent configuration, all terminating executions (from some class)

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems Ordering events. Lamport and vector clocks. Global states. Detecting failures. Required reading for this topic } Leslie Lamport,"Time, Clocks, and the Ordering

More information

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications: AGREEMENT PROBLEMS (1) AGREEMENT PROBLEMS Agreement problems arise in many practical applications: agreement on whether to commit or abort the results of a distributed atomic action (e.g. database transaction)

More information

Asynchronous Models For Consensus

Asynchronous Models For Consensus Distributed Systems 600.437 Asynchronous Models for Consensus Department of Computer Science The Johns Hopkins University 1 Asynchronous Models For Consensus Lecture 5 Further reading: Distributed Algorithms

More information

Reliable Broadcast for Broadcast Busses

Reliable Broadcast for Broadcast Busses Reliable Broadcast for Broadcast Busses Ozalp Babaoglu and Rogerio Drummond. Streets of Byzantium: Network Architectures for Reliable Broadcast. IEEE Transactions on Software Engineering SE- 11(6):546-554,

More information

Fault-Tolerant Consensus

Fault-Tolerant Consensus Fault-Tolerant Consensus CS556 - Panagiota Fatourou 1 Assumptions Consensus Denote by f the maximum number of processes that may fail. We call the system f-resilient Description of the Problem Each process

More information

Gradient Clock Synchronization

Gradient Clock Synchronization Noname manuscript No. (will be inserted by the editor) Rui Fan Nancy Lynch Gradient Clock Synchronization the date of receipt and acceptance should be inserted later Abstract We introduce the distributed

More information

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults Martin Hutle martin.hutle@epfl.ch André Schiper andre.schiper@epfl.ch École Polytechnique Fédérale de Lausanne

More information

Easy Consensus Algorithms for the Crash-Recovery Model

Easy Consensus Algorithms for the Crash-Recovery Model Reihe Informatik. TR-2008-002 Easy Consensus Algorithms for the Crash-Recovery Model Felix C. Freiling, Christian Lambertz, and Mila Majster-Cederbaum Department of Computer Science, University of Mannheim,

More information

Recap. CS514: Intermediate Course in Operating Systems. What time is it? This week. Reminder: Lamport s approach. But what does time mean?

Recap. CS514: Intermediate Course in Operating Systems. What time is it? This week. Reminder: Lamport s approach. But what does time mean? CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA Recap We ve started a process of isolating questions that arise in big systems Tease out an abstract issue Treat

More information

Distributed Consensus

Distributed Consensus Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort in distributed transactions Reaching agreement

More information

CS505: Distributed Systems

CS505: Distributed Systems Department of Computer Science CS505: Distributed Systems Lecture 10: Consensus Outline Consensus impossibility result Consensus with S Consensus with Ω Consensus Most famous problem in distributed computing

More information

Time. To do. q Physical clocks q Logical clocks

Time. To do. q Physical clocks q Logical clocks Time To do q Physical clocks q Logical clocks Events, process states and clocks A distributed system A collection P of N single-threaded processes (p i, i = 1,, N) without shared memory The processes in

More information

Distributed Computing. Synchronization. Dr. Yingwu Zhu

Distributed Computing. Synchronization. Dr. Yingwu Zhu Distributed Computing Synchronization Dr. Yingwu Zhu Topics to Discuss Physical Clocks Logical Clocks: Lamport Clocks Classic paper: Time, Clocks, and the Ordering of Events in a Distributed System Lamport

More information

Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation

Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation Logical Time Nicola Dragoni Embedded Systems Engineering DTU Compute 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation 2013 ACM Turing Award:

More information

Time Synchronization

Time Synchronization Massachusetts Institute of Technology Lecture 7 6.895: Advanced Distributed Algorithms March 6, 2006 Professor Nancy Lynch Time Synchronization Readings: Fan, Lynch. Gradient clock synchronization Attiya,

More information

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication Stavros Tripakis Abstract We introduce problems of decentralized control with communication, where we explicitly

More information

Proving Safety Properties of the Steam Boiler Controller. Abstract

Proving Safety Properties of the Steam Boiler Controller. Abstract Formal Methods for Industrial Applications: A Case Study Gunter Leeb leeb@auto.tuwien.ac.at Vienna University of Technology Department for Automation Treitlstr. 3, A-1040 Vienna, Austria Abstract Nancy

More information

Chapter 11 Time and Global States

Chapter 11 Time and Global States CSD511 Distributed Systems 分散式系統 Chapter 11 Time and Global States 吳俊興 國立高雄大學資訊工程學系 Chapter 11 Time and Global States 11.1 Introduction 11.2 Clocks, events and process states 11.3 Synchronizing physical

More information

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report # Degradable Agreement in the Presence of Byzantine Faults Nitin H. Vaidya Technical Report # 92-020 Abstract Consider a system consisting of a sender that wants to send a value to certain receivers. Byzantine

More information

Our Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering

Our Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering Our Problem Global Predicate Detection and Event Ordering To compute predicates over the state of a distributed application Model Clock Synchronization Message passing No failures Two possible timing assumptions:

More information

Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong

Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms CS 249 Project Fall 2005 Wing Wong Outline Introduction Asynchronous distributed systems, distributed computations,

More information

The Heard-Of Model: Computing in Distributed Systems with Benign Failures

The Heard-Of Model: Computing in Distributed Systems with Benign Failures The Heard-Of Model: Computing in Distributed Systems with Benign Failures Bernadette Charron-Bost Ecole polytechnique, France André Schiper EPFL, Switzerland Abstract Problems in fault-tolerant distributed

More information

Do we have a quorum?

Do we have a quorum? Do we have a quorum? Quorum Systems Given a set U of servers, U = n: A quorum system is a set Q 2 U such that Q 1, Q 2 Q : Q 1 Q 2 Each Q in Q is a quorum How quorum systems work: A read/write shared register

More information

Dynamic Group Communication

Dynamic Group Communication Dynamic Group Communication André Schiper Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland e-mail: andre.schiper@epfl.ch Abstract Group communication is the basic infrastructure

More information

Agreement. Today. l Coordination and agreement in group communication. l Consensus

Agreement. Today. l Coordination and agreement in group communication. l Consensus Agreement Today l Coordination and agreement in group communication l Consensus Events and process states " A distributed system a collection P of N singlethreaded processes w/o shared memory Each process

More information

Byzantine agreement with homonyms

Byzantine agreement with homonyms Distrib. Comput. (013) 6:31 340 DOI 10.1007/s00446-013-0190-3 Byzantine agreement with homonyms Carole Delporte-Gallet Hugues Fauconnier Rachid Guerraoui Anne-Marie Kermarrec Eric Ruppert Hung Tran-The

More information

Efficient Construction of Global Time in SoCs despite Arbitrary Faults

Efficient Construction of Global Time in SoCs despite Arbitrary Faults Efficient Construction of Global Time in SoCs despite Arbitrary Faults Christoph Lenzen Massachusetts Institute of Technology Cambridge, MA, USA clenzen@csail.mit.edu Matthias Függer, Markus Hofstätter,

More information

Automatic Synthesis of Distributed Protocols

Automatic Synthesis of Distributed Protocols Automatic Synthesis of Distributed Protocols Rajeev Alur Stavros Tripakis 1 Introduction Protocols for coordination among concurrent processes are an essential component of modern multiprocessor and distributed

More information

Section 6 Fault-Tolerant Consensus

Section 6 Fault-Tolerant Consensus Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing.

More information

Genuine atomic multicast in asynchronous distributed systems

Genuine atomic multicast in asynchronous distributed systems Theoretical Computer Science 254 (2001) 297 316 www.elsevier.com/locate/tcs Genuine atomic multicast in asynchronous distributed systems Rachid Guerraoui, Andre Schiper Departement d Informatique, Ecole

More information

Causality and Time. The Happens-Before Relation

Causality and Time. The Happens-Before Relation Causality and Time The Happens-Before Relation Because executions are sequences of events, they induce a total order on all the events It is possible that two events by different processors do not influence

More information

Gradient Clock Synchronization in Dynamic Networks Fabian Kuhn, Thomas Locher, and Rotem Oshman

Gradient Clock Synchronization in Dynamic Networks Fabian Kuhn, Thomas Locher, and Rotem Oshman Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2009-022 May 29, 2009 Gradient Clock Synchronization in Dynamic Networks Fabian Kuhn, Thomas Locher, and Rotem Oshman

More information

Consensus. Consensus problems

Consensus. Consensus problems Consensus problems 8 all correct computers controlling a spaceship should decide to proceed with landing, or all of them should decide to abort (after each has proposed one action or the other) 8 in an

More information

Failure Detectors. Seif Haridi. S. Haridi, KTHx ID2203.1x

Failure Detectors. Seif Haridi. S. Haridi, KTHx ID2203.1x Failure Detectors Seif Haridi haridi@kth.se 1 Modeling Timing Assumptions Tedious to model eventual synchrony (partial synchrony) Timing assumptions mostly needed to detect failures Heartbeats, timeouts,

More information

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007 Lecture 1 Introduction 1 Introduction 1.1 The Problem Domain Today, we are going to ask whether a system can recover from perturbation. Consider a children s top: If it is perfectly vertically, you can

More information

Optimal Clock Synchronization

Optimal Clock Synchronization Optimal Clock Synchronization T. K. SRIKANTH AND SAM TOUEG Cornell University, Ithaca, New York Abstract. We present a simple, efficient, and unified solution to the problems of synchronizing, initializing,

More information

Time. Today. l Physical clocks l Logical clocks

Time. Today. l Physical clocks l Logical clocks Time Today l Physical clocks l Logical clocks Events, process states and clocks " A distributed system a collection P of N singlethreaded processes without shared memory Each process p i has a state s

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

Slides for Chapter 14: Time and Global States

Slides for Chapter 14: Time and Global States Slides for Chapter 14: Time and Global States From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, Addison-Wesley 2012 Overview of Chapter Introduction Clocks,

More information

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures Xianbing Wang 1, Yong-Meng Teo 1,2, and Jiannong Cao 3 1 Singapore-MIT Alliance, 2 Department of Computer Science,

More information

Abstract. The paper considers the problem of implementing \Virtually. system. Virtually Synchronous Communication was rst introduced

Abstract. The paper considers the problem of implementing \Virtually. system. Virtually Synchronous Communication was rst introduced Primary Partition \Virtually-Synchronous Communication" harder than Consensus? Andre Schiper and Alain Sandoz Departement d'informatique Ecole Polytechnique Federale de Lausanne CH-1015 Lausanne (Switzerland)

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 & Clocks, Clocks, and the Ordering of Events in a Distributed System. L. Lamport, Communications of the ACM, 1978 Notes 15: & Clocks CS 347 Notes

More information

Failure Detection and Consensus in the Crash-Recovery Model

Failure Detection and Consensus in the Crash-Recovery Model Failure Detection and Consensus in the Crash-Recovery Model Marcos Kawazoe Aguilera Wei Chen Sam Toueg Department of Computer Science Upson Hall, Cornell University Ithaca, NY 14853-7501, USA. aguilera,weichen,sam@cs.cornell.edu

More information

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1 Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication 1 Stavros Tripakis 2 VERIMAG Technical Report TR-2004-26 November 2004 Abstract We introduce problems of decentralized

More information

A Weakest Failure Detector for Dining Philosophers with Eventually Bounded Waiting and Failure Locality 1

A Weakest Failure Detector for Dining Philosophers with Eventually Bounded Waiting and Failure Locality 1 A Weakest Failure Detector for Dining Philosophers with Eventually Bounded Waiting and Failure Locality 1 Hyun Chul Chung, Jennifer L. Welch Department of Computer Science & Engineering Texas A&M University

More information

A Realistic Look At Failure Detectors

A Realistic Look At Failure Detectors A Realistic Look At Failure Detectors C. Delporte-Gallet, H. Fauconnier, R. Guerraoui Laboratoire d Informatique Algorithmique: Fondements et Applications, Université Paris VII - Denis Diderot Distributed

More information

CptS 464/564 Fall Prof. Dave Bakken. Cpt. S 464/564 Lecture January 26, 2014

CptS 464/564 Fall Prof. Dave Bakken. Cpt. S 464/564 Lecture January 26, 2014 Overview of Ordering and Logical Time Prof. Dave Bakken Cpt. S 464/564 Lecture January 26, 2014 Context This material is NOT in CDKB5 textbook Rather, from second text by Verissimo and Rodrigues, chapters

More information

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems Xianbing Wang, Yong-Meng Teo, and Jiannong Cao Singapore-MIT Alliance E4-04-10, 4 Engineering Drive 3, Singapore 117576 Abstract

More information

Immediate Detection of Predicates in Pervasive Environments

Immediate Detection of Predicates in Pervasive Environments Immediate Detection of redicates in ervasive Environments Ajay Kshemkalyani University of Illinois at Chicago November 30, 2010 A. Kshemkalyani (U Illinois at Chicago) Immediate Detection of redicates......

More information

On the weakest failure detector ever

On the weakest failure detector ever On the weakest failure detector ever The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Guerraoui, Rachid

More information

Towards optimal synchronous counting

Towards optimal synchronous counting Towards optimal synchronous counting Christoph Lenzen Joel Rybicki Jukka Suomela MPI for Informatics MPI for Informatics Aalto University Aalto University PODC 5 July 3 Focus on fault-tolerance Fault-tolerant

More information

Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H

Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H Cuts Cuts A cut C is a subset of the global history of H C = h c 1 1 hc 2 2...hc n n A cut C is a subset of the global history of H The frontier of C is the set of events e c 1 1,ec 2 2,...ec n n C = h

More information

The Weakest Failure Detector for Wait-Free Dining under Eventual Weak Exclusion

The Weakest Failure Detector for Wait-Free Dining under Eventual Weak Exclusion The Weakest Failure Detector for Wait-Free Dining under Eventual Weak Exclusion Srikanth Sastry Computer Science and Engr Texas A&M University College Station, TX, USA sastry@cse.tamu.edu Scott M. Pike

More information

DISTRIBUTED COMPUTER SYSTEMS

DISTRIBUTED COMPUTER SYSTEMS DISTRIBUTED COMPUTER SYSTEMS SYNCHRONIZATION Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Topics Clock Synchronization Physical Clocks Clock Synchronization Algorithms

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

The Weighted Byzantine Agreement Problem

The Weighted Byzantine Agreement Problem The Weighted Byzantine Agreement Problem Vijay K. Garg and John Bridgman Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712-1084, USA garg@ece.utexas.edu,

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking Real-Time Course Clock synchronization 1 Clocks Processor p has monotonically increasing clock function C p (t) Clock has drift rate For t1 and t2, with t2 > t1 (1-ρ)(t2-t1)

More information

Meeting the Deadline: On the Complexity of Fault-Tolerant Continuous Gossip

Meeting the Deadline: On the Complexity of Fault-Tolerant Continuous Gossip Meeting the Deadline: On the Complexity of Fault-Tolerant Continuous Gossip Chryssis Georgiou Seth Gilbert Dariusz R. Kowalski Abstract In this paper, we introduce the problem of Continuous Gossip in which

More information

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG Why Synchronization? You want to catch a bus at 6.05 pm, but your watch is

More information

Optimal and Player-Replaceable Consensus with an Honest Majority Silvio Micali and Vinod Vaikuntanathan

Optimal and Player-Replaceable Consensus with an Honest Majority Silvio Micali and Vinod Vaikuntanathan Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2017-004 March 31, 2017 Optimal and Player-Replaceable Consensus with an Honest Majority Silvio Micali and Vinod Vaikuntanathan

More information

Impossibility Results for Universal Composability in Public-Key Models and with Fixed Inputs

Impossibility Results for Universal Composability in Public-Key Models and with Fixed Inputs Impossibility Results for Universal Composability in Public-Key Models and with Fixed Inputs Dafna Kidron Yehuda Lindell June 6, 2010 Abstract Universal composability and concurrent general composition

More information

Time is an important issue in DS

Time is an important issue in DS Chapter 0: Time and Global States Introduction Clocks,events and process states Synchronizing physical clocks Logical time and logical clocks Global states Distributed debugging Summary Time is an important

More information

Failure detection and consensus in the crash-recovery model

Failure detection and consensus in the crash-recovery model Distrib. Comput. (2000) 13: 99 125 c Springer-Verlag 2000 Failure detection and consensus in the crash-recovery model Marcos Kawazoe Aguilera 1, Wei Chen 2, Sam Toueg 1 1 Department of Computer Science,

More information

Synchrony Weakened by Message Adversaries vs Asynchrony Restricted by Failure Detectors

Synchrony Weakened by Message Adversaries vs Asynchrony Restricted by Failure Detectors Synchrony Weakened by Message Adversaries vs Asynchrony Restricted by Failure Detectors Michel RAYNAL, Julien STAINER Institut Universitaire de France IRISA, Université de Rennes, France Message adversaries

More information

Concurrent Non-malleable Commitments from any One-way Function

Concurrent Non-malleable Commitments from any One-way Function Concurrent Non-malleable Commitments from any One-way Function Margarita Vald Tel-Aviv University 1 / 67 Outline Non-Malleable Commitments Problem Presentation Overview DDN - First NMC Protocol Concurrent

More information

Signature-Free Broadcast-Based Intrusion Tolerance: Never Decide a Byzantine Value

Signature-Free Broadcast-Based Intrusion Tolerance: Never Decide a Byzantine Value Signature-Free Broadcast-Based Intrusion Tolerance: Never Decide a Byzantine Value Achour Mostefaoui, Michel Raynal To cite this version: Achour Mostefaoui, Michel Raynal. Signature-Free Broadcast-Based

More information

Consensus when failstop doesn't hold

Consensus when failstop doesn't hold Consensus when failstop doesn't hold FLP shows that can't solve consensus in an asynchronous system with no other facility. It can be solved with a perfect failure detector. If p suspects q then q has

More information