Section 6 Fault-Tolerant Consensus

Similar documents
Fault-Tolerant Consensus

Asynchronous Models For Consensus

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

CS505: Distributed Systems

Distributed Consensus

CS505: Distributed Systems

Early stopping: the idea. TRB for benign failures. Early Stopping: The Protocol. Termination

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:

Valency Arguments CHAPTER7

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures

Impossibility of Distributed Consensus with One Faulty Process

Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links

Finally the Weakest Failure Detector for Non-Blocking Atomic Commit

Failure detectors Introduction CHAPTER

Distributed Computing in Shared Memory and Networks

Consensus and Universal Construction"

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems

Distributed Systems Byzantine Agreement

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.

Shared Memory vs Message Passing

Consensus when failstop doesn't hold

Unreliable Failure Detectors for Reliable Distributed Systems

Do we have a quorum?

Reliable Broadcast for Broadcast Busses

I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS

The Weakest Failure Detector to Solve Mutual Exclusion

On Equilibria of Distributed Message-Passing Games

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont

Eventually consistent failure detectors

arxiv: v2 [cs.dc] 18 Feb 2015

On the Minimal Synchronism Needed for Distributed Consensus

Byzantine agreement with homonyms

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Randomized Protocols for Asynchronous Consensus

Easy Consensus Algorithms for the Crash-Recovery Model

Consensus. Consensus problems

Failure Detection and Consensus in the Crash-Recovery Model

Tolerating Permanent and Transient Value Faults

Early consensus in an asynchronous system with a weak failure detector*

How to solve consensus in the smallest window of synchrony

On the weakest failure detector ever

A subtle problem. An obvious problem. An obvious problem. An obvious problem. No!

A Realistic Look At Failure Detectors

Optimal Resilience Asynchronous Approximate Agreement

Atomic m-register operations

Failure detection and consensus in the crash-recovery model

Agreement. Today. l Coordination and agreement in group communication. l Consensus

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H

Model Checking of Fault-Tolerant Distributed Algorithms

Effectively Nonblocking Consensus Procedures Can Execute Forever a Constructive Version of FLP

Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong

Asynchronous Leasing

On the weakest failure detector ever

Genuine atomic multicast in asynchronous distributed systems

Abstract. The paper considers the problem of implementing \Virtually. system. Virtually Synchronous Communication was rst introduced

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

Colorless Wait-Free Computation

Causality and Time. The Happens-Before Relation

The Heard-Of Model: Computing in Distributed Systems with Benign Failures

THE chase for the weakest system model that allows

On Stabilizing Departures in Overlay Networks

Our Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering

Dynamic Group Communication

(Leader/Randomization/Signature)-free Byzantine Consensus for Consortium Blockchains

How to Elect a Leader Faster than a Tournament

Theoretical Computer Science

On the Number of Synchronous Rounds Required for Byzantine Agreement

DEXON Consensus Algorithm

Combining Shared Coin Algorithms

Gracefully Degrading Consensus and k-set Agreement in Directed Dynamic Networks

CS3110 Spring 2017 Lecture 21: Distributed Computing with Functional Processes

Information and Computation

Nontrivial and Universal Helping for Wait-Free Queues and Stacks

6.852: Distributed Algorithms Fall, Class 24

Today. Vector Clocks and Distributed Snapshots. Motivation: Distributed discussion board. Distributed discussion board. 1. Logical Time: Vector clocks

Decentralized Control of Discrete Event Systems with Bounded or Unbounded Delay Communication

ROBUST & SPECULATIVE BYZANTINE RANDOMIZED CONSENSUS WITH CONSTANT TIME COMPLEXITY IN NORMAL CONDITIONS

Theoretical Computer Science. Partially dynamic efficient algorithms for distributed shortest paths

FAIRNESS FOR DISTRIBUTED ALGORITHMS

Uniform consensus is harder than consensus

6.852: Distributed Algorithms Fall, Class 10

LOWER BOUNDS FOR RANDOMIZED CONSENSUS UNDER A WEAK ADVERSARY

Can an Operation Both Update the State and Return a Meaningful Value in the Asynchronous PRAM Model?

TECHNICAL REPORT YL DISSECTING ZAB

arxiv:cs/ v1 [cs.dc] 29 Dec 2004

How can one get around FLP? Around FLP in 80 Slides. How can one get around FLP? Paxos. Weaken the problem. Constrain input values

Byzantine Agreement. Gábor Mészáros. Tatracrypt 2012, July 2 4 Smolenice, Slovakia. CEU Budapest, Hungary

Byzantine Agreement. Gábor Mészáros. CEU Budapest, Hungary

Termination Detection in an Asynchronous Distributed System with Crash-Recovery Failures

Byzantine behavior also includes collusion, i.e., all byzantine nodes are being controlled by the same adversary.

Byzantine Agreement. Chapter Validity 190 CHAPTER 17. BYZANTINE AGREEMENT

CS505: Distributed Systems

Partition is reducible to P2 C max. c. P2 Pj = 1, prec Cmax is solvable in polynomial time. P Pj = 1, prec Cmax is NP-hard

Slides for Chapter 14: Time and Global States

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Time Free Self-Stabilizing Local Failure Detection

Self-stabilizing Byzantine Agreement

Synchrony Weakened by Message Adversaries vs Asynchrony Restricted by Failure Detectors

Transcription:

Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing. All non-faulty processes are required to produce outputs from the value set V, subject to simple agreement and validity. Correcteness Conditions Agreement: No two processes decide on different values. Validity: If all processes start with the same initial value v V, then v is the only decision value. Termination: All non-faulty processes eventually decide. Motivation Processes in a database system may need to agree whether a transaction should commit or abort. Processes in a communication system may need to agree on whether or not a message has been received. Processes in a control system may need to agree on whether or not a particular other process is faulty. CS586 - Panagiota Fatourou 2 1

Synchronous Shared Memory System Is there an algorithm that solves consensus in a synchronous shared-memory system? CS586 - Panagiota Fatourou 3 Modeling Message Passing Systems n: number of processes (p 0,, p n-1 ) Each process is modeled as a (possibly infinite) state machine. The state of process p i contains 2r special components, where r is the number of edges that are incident to p i : outbuf i [l], 1 l r, holds messages that p i has sent to its neighbor over its lth incident channel but that have not yet been delivered to the neighbor; inbuf i [l], 1 l r, holds messages that have been delivered to p i on its lth incident channel but that p i has not yet processed with an internal computation step. The state e.g., of p 0 consists of p 0 s local variables, and of six arrays: inbuf 0 [1],, inbuf 0 [3]: messages that have been sent to p 0, and p 0 has not yet processed. outbuf 0 [1],, outbuf 0 [3]: messages that have been sent by p 0 to each of the processes p 1, p 2, p 3, respectively, and which have not yet been delivered to p 1, p 2, p 3. CS586 - Panagiota Fatourou 4 2

Modeling Message Passing Systems The state of a process p i, excluding the outbuf i [l] components, comprises the accessible state of p i. Each process has a state at which all inbuf arrays are empty. In each step executed by p 0, p 0 processes all messages stored in its inbuf arrays, the state of p 0 changes and at most one message is sent to each of its neighboring processes. CS586 - Panagiota Fatourou 5 Modeling Message Passing Systems Events in Message-Passing Systems Delivery event, del(k,j,m): delivery of message m from process p k to process p j ; just before the event occurs, m must be an element of outbuf k [l], where l is p k s label for channel {p k,p j }. The event causes m to be deleted from outbuf k [l] and be inserted to inbuf j [l ], where l is p j s label for channer {p k,p j }: A message is delivered only if it is in transit and the only change is to move he message from the sender s outgoing buffer to the recepient s incoming buffer. Computational event by p j, comp(j): computation step of process pj in which pj s transition function is applied to its current accessible state; p j changes state according to its transition function operating on p j s accessible state and the set of messages specified by p j s transition function are added to the outbuf j variables. CS586 - Panagiota Fatourou 6 3

Modeling Message Passing Systems Admissible execution Each process has an infinite number of computation events and every message sent is eventually delivered. Message complexity Maximum number of messages that are sent in any execution. CS586 - Panagiota Fatourou 7 A Simple Algorithm for Synchronous Message-Passing Systems Each process maintains a set of the values it knows to exist in the system; initially, this set contains only its own input. At the first round, each process broadcasts its own input to all processes. For the subsequent f rounds, each process takes the following actions: updates its set by joining it with the sets received from other processes, and broadcasts any new additions to the set to all processes. After f+1 rounds, the process decides on the smallest value in its set. CS586 - Panagiota Fatourou 8 4

A Simple Algorithm for Synchronous Message-Passing Systems f = 3, n = 4 CS586 - Panagiota Fatourou 9 A Simple Algorithm for Synchronous Message-Passing Systems Termination? Validity? Intuition for Agreement: Assume that a process p i decides on a value x smaller than that decided by some other process p j. Then, x has remain hidden from p j for (f+1) rounds. We have at most f faulty processes. A contradiction!!! Number of processes? Round complexity? n > f (f+1) rounds Message Complexity? n 2 * V messages, where V is the set of input values. CS586 - Panagiota Fatourou 10 5

Impossibility of Consensus in Asynchronous Shared-Memory Systems Theorem 1:. For n 2, there is no algorithm in the read/write shared memory model that solves the agreement problem and guarantees wait-free termination. CS586 - Panagiota Fatourou 11 Useful Definitions The valence of a configuration C is the set of all values decided upon in any configuration reachable from C. C is univalent if this set contains one value; it is 0- valent if this value is 0 and 1-valent if this value is 1. If the set contains two values then C is bivalent. If C is bivalent and the configuration resulting by letting some process p take a step is univalent, we say that p is critical in C. Recall that: Two configurations C 1 and C 2 are similar to a process p, denoted C 1 ~ p C 2, if the values of all shared variables and the state of p are the same in C 1 and C 2. CS586 - Panagiota Fatourou 12 6

Impossibility of Consensus - Proof Assume, by the way of contradiction, that A is a wait-free consensus algorithm. Main Ideas of the Proof We construct an infinite execution in which: every process takes an infinite number of steps, yet every configuration is bivalent, and thus no process can decide. This contradicts the fact that the algorithm is wait-free. CS586 - Panagiota Fatourou 13 Impossibility of Consensus Lemma 2: Let C 1 and C 2 be two univalent configurations. If C 1 p C 2, for some process p, then C 1 is v-valent, if C 2 is also v-valent, where v {0,1}. Proof: Suppose C 1 is v-valent. Consider an infinite execution α from C 1 in which only p takes steps. Since the algorithm is supposed to be wait-free a is admissible and eventually p must decide in α. Since C 1 is v-valent pmust decide v in α. The schedule of a can be applied from C 2 Since C 1 p C 2 and only p takes steps, it follows that p decides v in this execution as well. Thus, C 2 is v-valent, as needed. CS586 - Panagiota Fatourou 14 7

Impossibility of Consensus Lemma 3: There exists a bivalent initial configuration. Proof: By contradiction. Let Ι 0 be the initial configuration in which all processes start with 0 Ι 0 is 0-valent. Let Ι 1 be initial configuration in which all processes start with 1 Ι 1 is 1-valent. Let Ι 01 be the initial configuration in which p 0 starts with 0 and the remaining processes start with 1. Ι 01 p0 Ι 0 (by Lemma 2) Ι 01 is 0-valent Ι 01 p1 Ι 1 (by Lemma 2) Ι 01 cannot be 0-valent. This is a contradiction! CS586 - Panagiota Fatourou 15 Impossibility of Consensus Lemma 4: If C is a bivalent configuration, then at least one processor is not critical in C. Proof: By the way of contradiction. Assume that all processes are critical in C. Since C is bivalent and all processes are critical in C there exists two process p j and p k such that: if p j takes a step from C, then the resulting configuration C is 0-valent, and if p k takes a step from C the resulting configuration C is 1-valent. CS586 - Panagiota Fatourou 16 8

Impossibility of Consensus Proof of Lemma 4 (continued) Consider the following cases. 1. The first step of process p j from C is a read. The case where the first step of p k from C is a read is symmetric. CS586 - Panagiota Fatourou 17 Impossibility of Consensus Proof of Lemma 4 (continued) 2. The first steps of p j and p k from C are both writes and they are to different variables. CS586 - Panagiota Fatourou 18 9

Impossibility of Consensus Proof of Lemma 4 (continued) 2. The first steps of p j and p k from C are both writes and they are to the same variable. CS586 - Panagiota Fatourou 19 Impossibility of Consensus Proof of Theorem 1 We inductively create an admissible execution C 0 i 1 C 1 i 2 in which the configurations remain bivalent forever. By Lemma 3, there is an initial bivalent configuration; let it be C 0. Suppose the execution has been created up to bivalent configuration C k. By Lemma 4, some process is not critical in C k ; denote this process by p ik. Then, p ik can take a step without resulting in a univalent configuration. We apply the event i k to C k to obtain C k+1 which is also bivalent. If we repeat this procedure forever, we will construct an execution in which all the configurations are bivalent. Thus, no process ever decides, contradicting the termination property of the algorithm and implying Theorem 1. CS586 - Panagiota Fatourou 20 10