CS505: Distributed Systems

Similar documents
Asynchronous Models For Consensus

CS505: Distributed Systems

Distributed Consensus

Section 6 Fault-Tolerant Consensus

Consensus when failstop doesn't hold

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links

Finally the Weakest Failure Detector for Non-Blocking Atomic Commit

Lower Bounds for Achieving Synchronous Early Stopping Consensus with Orderly Crash Failures

Eventually consistent failure detectors

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.

Simple Bivalency Proofs of the Lower Bounds in Synchronous Consensus Problems

Early consensus in an asynchronous system with a weak failure detector*

Failure detectors Introduction CHAPTER

Fault-Tolerant Consensus

Impossibility of Distributed Consensus with One Faulty Process

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:

Easy Consensus Algorithms for the Crash-Recovery Model

Unreliable Failure Detectors for Reliable Distributed Systems

Valency Arguments CHAPTER7

The Weakest Failure Detector to Solve Mutual Exclusion

Consensus and Universal Construction"

I R I S A P U B L I C A T I O N I N T E R N E THE NOTION OF VETO NUMBER FOR DISTRIBUTED AGREEMENT PROBLEMS

Asynchronous Leasing

Distributed Computing in Shared Memory and Networks

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

Model Checking of Fault-Tolerant Distributed Algorithms

Consensus. Consensus problems

On the weakest failure detector ever

Failure Detection and Consensus in the Crash-Recovery Model

Genuine atomic multicast in asynchronous distributed systems

Abstract. The paper considers the problem of implementing \Virtually. system. Virtually Synchronous Communication was rst introduced

Shared Memory vs Message Passing

A Realistic Look At Failure Detectors

How to solve consensus in the smallest window of synchrony

Eventual Leader Election with Weak Assumptions on Initial Knowledge, Communication Reliability, and Synchrony

Crash-resilient Time-free Eventual Leadership

Benchmarking Model Checkers with Distributed Algorithms. Étienne Coulouma-Dupont

Approximation of δ-timeliness

Early stopping: the idea. TRB for benign failures. Early Stopping: The Protocol. Termination

Failure detection and consensus in the crash-recovery model

How can one get around FLP? Around FLP in 80 Slides. How can one get around FLP? Paxos. Weaken the problem. Constrain input values

Signature-Free Broadcast-Based Intrusion Tolerance: Never Decide a Byzantine Value

Weakening Failure Detectors for k-set Agreement via the Partition Approach

Failure Detectors. Seif Haridi. S. Haridi, KTHx ID2203.1x

Effectively Nonblocking Consensus Procedures Can Execute Forever a Constructive Version of FLP

arxiv: v2 [cs.dc] 18 Feb 2015

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems

Byzantine Agreement in Polynomial Expected Time

CS3110 Spring 2017 Lecture 21: Distributed Computing with Functional Processes

Do we have a quorum?

arxiv: v2 [cs.dc] 21 Apr 2017

Byzantine agreement with homonyms

Optimal Resilience Asynchronous Approximate Agreement

Distributed Systems Byzantine Agreement

On the weakest failure detector ever

Atomic m-register operations

Generic Broadcast. 1 Introduction

Synchrony Weakened by Message Adversaries vs Asynchrony Restricted by Failure Detectors

Replication predicates for dependent-failure algorithms

Protocol for Asynchronous, Reliable, Secure and Efficient Consensus (PARSEC)

The Heard-Of Model: Computing in Distributed Systems with Benign Failures

ROBUST & SPECULATIVE BYZANTINE RANDOMIZED CONSENSUS WITH CONSTANT TIME COMPLEXITY IN NORMAL CONDITIONS

Tolerating Permanent and Transient Value Faults

Generalized Consensus and Paxos

CS505: Distributed Systems

Termination Detection in an Asynchronous Distributed System with Crash-Recovery Failures

Signature-Free Asynchronous Byzantine Consensus with t < n/3 and O(n 2 ) Messages

Unreliable Failure Detectors for Reliable Distributed Systems

(Leader/Randomization/Signature)-free Byzantine Consensus for Consortium Blockchains

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

The Weakest Failure Detector for Wait-Free Dining under Eventual Weak Exclusion

Reliable Broadcast for Broadcast Busses

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George)

Faster Agreement via a Spectral Method for Detecting Malicious Behavior

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

THE chase for the weakest system model that allows

Combining Shared Coin Algorithms

Early-Deciding Consensus is Expensive

Irreducibility and Additivity of Set Agreement-oriented Failure Detector Classes

Simultaneous Consensus Tasks: A Tighter Characterization of Set-Consensus

DEXON Consensus Algorithm

Wait-Free Dining Under Eventual Weak Exclusion

Time Free Self-Stabilizing Local Failure Detection

Technical Report. Automatic classification of eventual failure detectors. Piotr Zieliński. Number 693. July Computer Laboratory

A Connectivity Model for Agreement in Dynamic Systems

A Weakest Failure Detector for Dining Philosophers with Eventually Bounded Waiting and Failure Locality 1

Randomized Protocols for Asynchronous Consensus

Learning from the Past for Resolving Dilemmas of Asynchrony

A simple proof of the necessity of the failure detector Σ to implement a register in asynchronous message-passing systems

THE WEAKEST FAILURE DETECTOR FOR SOLVING WAIT-FREE, EVENTUALLY BOUNDED-FAIR DINING PHILOSOPHERS. A Dissertation YANTAO SONG

How to Elect a Leader Faster than a Tournament

Information and Computation

Dynamic Group Communication

On the Minimal Synchronism Needed for Distributed Consensus

Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults

Round-Efficient Multi-party Computation with a Dishonest Majority

On Equilibria of Distributed Message-Passing Games

FAIRNESS FOR DISTRIBUTED ALGORITHMS

Quantum Algorithms for Leader Election Problem in Distributed Systems

Transcription:

Department of Computer Science CS505: Distributed Systems Lecture 10: Consensus

Outline Consensus impossibility result Consensus with S Consensus with Ω

Consensus Most famous problem in distributed computing Intuitively: a group of processes need to reach agreement on a common value E.g., total order broadcast: next message(s) to deliver Still light form of agreement; harder problems exist, e.g., Non-blocking atomic commit (for distributed transactions) Leader election (for passive replication) Mutual exclusion (for distributed critical sections)

Definition of (Binary) Consensus Defined by two primitives propose decide Processes propose either 0 or 1 Processes decide on same value

Properties I. Validity Any value decided is a value proposed II. III. IV. Integrity No process decides twice Agreement No two correct processes decide differently Termination Every correct process eventually decides

FLP Impossibility [FLP 87] states there is no deterministic solution to consensus in asynchronous systems if even one single process can fail Any fault-tolerant algorithm solving consensus has runs that never terminate These runs can be very unlikely ( probability zero ) Yet they imply that we can t find a totally correct solution And so consensus is impossible ( not always possible )

Execution, Configuration, Events Set of processes p i, each process with a state Configuration Set of states of all processes at some moment Events send and receive Can change the state at a process In particular latter can be delayed locally Execution Sequence of configuration and events; schedules

C Configuration C Event e =(p,m ) C C Event e =(p,m ) Schedule s=(e,e ) C C Equivalent

Commutative Schedules Schedule s1 C s2 s1 and s2 involve disjoint sets of receiving processes C Schedule s2 C s1

Valency We can classify configurations as Bivalent Decision is not already (pre-)determined, outcome is unpredictable; can be 0 or 1 Univalent 0 - valent, will result in deciding 0 1 - valent, will result in deciding 1

Proof Sketch The goal is to construct an execution that does not decide, showing that the protocol remains forever indecisive Start with an initially bivalent state, identify an execution that would lead to a univalent state, let s say 0-valent The switch from bivalent to univalent is due to an event e = (p,m) in which some process p receives some message m

Proof Sketch (2) We will delay the e event for a while Delivery of m would make the run univalent but m is delayed (fair-game in an asynchronous system) Since the protocol is indeed fault-tolerant (p can fail) there must be a run that leads to the other univalent state e can be lost if p fails Since the configuration is bivalent, both outcomes must still be possible Now let m be delivered, this will bring the system back in a bivalent state

Proof: More Details Initial bivalent configuration I can reach Bivalent configuration Lemma 1: There exists an initial configuration that is bivalent Lemma 2: Starting from a bivalent configuration C and an event e = (p, m) applicable to C,consider C the set of all configurations reachable from C without applying e and D the set of all configurations obtained by applying e to the configurations from C, then D contains a bivalent configuration Theorem: There is always a sequence of events in an asynchronous distributed system such that the group of processes never reach consensus

Lemma 1: Proof Sketch Lemma 1: There exists an initial configuration that is bivalent Assume by contradiction that there is no bivalent initial configuration. List all initial configurations. There must be both 0-valent and 1-valent initial configurations. (Why?) Consider a 0-valent initial configuration C 0 adjacent to a 1- valent configuration C 1 : they differ only in the value corresponding to process p

Lemma 1 (2) Lemma 1: There exists an initial configuration that is bivalent Let this process p crash Note that both C 0 and C 1 will lead to the same final configuration with the exception of internal state of p (they were identical, the only difference was determined by p) If decision reached is 1, then C 0 must be bivalent, if decision is 0 then C 1 must be bivalent Thus, there exists an initial configuration that is bivalent

Lemma 2 A bivalent config. let e=(p,m) be an applicable event to this config. Let C be the set of configs. reachable without applying e

Lemma 2 (2) A bivalent config. let e=(p,m) be an applicable event to the config. e e e e e Let C be the set of configs. reachable without applying e Let D be the set of configs. obtained by applying e to a config. in C

Lemma 2 (3) Claim. D contains a bivalent config. Proof. By contradiction. => assume there is no bivalent config in D There are adjacent configs. C 0 and C 1 in C such that C 1 = C 0 followed by e and e =(p,m ) D 0 = C 0 followed by e=(p,m) D 1 =C 1 followed by e=(p,m) D 0 is 0-valent, D 1 is 1-valent (if not, C 0 would be univalent D) i-valent config E i reachable from C exists (because C is bivalent) If E i in C, then F i = e(e i ) Else e was applied reaching E i Either way there exists F i in D for i=0 and 1 both

Lemma 2 (4) Proof. (contd.) e C 0 e Case I: p is not p D 0 C 1 Case II: p same as p e D 1 e bivalent Why? (commutativity) But D 0 is then bivalent! C [don t apply event e=(p,m)] e e e e e D

Lemma 2 (5) Proof. (contd.) Case I: p is not p e C 0 e C 1 D 0 s e Case II: p same as p s e A (e,e) s D 1 E 0 schedule s finite E 1 deciding run from C 0 p takes no steps But A is then bivalent!

Consensus with S Rotating coordinator paradigm [CT 96] Algorithm proceeds in unsynchronized rounds Requires majority of correct processes S only eventually (weakly) accurate False suspicions could lead to two or more subsets of with different decisions Early termination [MR 99]

For Process p i upon propose(v): r 0 // current round while not decided do c (r mod n) + 1 // current coordinator u // value received from coordinator p c or if none if i = c then send(propose, r, v) to all wait for receive(propose, r, v ) from p c or p c D i if (PROPOSE, r, v ) was received then u v send(vote, r, u) to all wait for receive(vote, r, u ) from majority of processes U set of values received in VOTE messages if U = {u } for some u then send(decide, u ) to all else if U = {u, } then v u r r + 1 upon receive(decide, v): if (not decided) decided true send(decide, v) to all decide(v)

Agreement 1. If two servers decide in the same round, then they decide the same value 2. Suppose some server decides v in round r. Then the value v is contained in the propose message of round r and has been locked in the sense that it is not possible for any server in round r > r to decide u v or to assign u v to its v because every two sets of a majority of processes intersect

Termination 1. If some server decides, then every other correct server eventually decides Cf. Reliable Broadcast, and 2. 2. There is some round in which the coordinator p c is not suspected by any server By the failure detector properties All correct servers decide in this round

Discussion How about uniformity? V. Uniform Agreement No two (correct or faulty) processes decide differently Fairness? Reliable Broadcast?

Consensus with Ω Leader-based consensus [MR 01] Uses leader oracle Idea Rotating coordinator tries to look for a leader If oracle is a corresponding abstraction solution can be more effective

For Process p i upon propose(v): r 0 // current round u v // current estimate while not decided do r r + 1 send(phase1, r, u) to all // phase 1 wait for (receive(phase1, r, v ) from p l s.t. l=leader i ) u v send(phase2, r, u) to all // phase 2 wait for (receive(phase2, r, u ) from majority of processes) U set of values u received in vote messages if U = {u } for some u then aux u else aux send(phase3, r, aux) to all // phase 3 wait for (receive(phase3, r, aux ) from majority of processes) if (received (PHASE3, r, aux ) with aux = v ) then u v if (all (PHASE3, r, aux ) messages are such that aux ) then broadcast(decide, u); decided true upon deliver(decide, v): decided true decide(v)

Termination 1. No correct process blocks forever in a round Phase 1: since there is eventually a correct leader Phase 2 and 3: every process sends to all, so majority is received 2. Every correct process decides Eventual leader and 1. imply that there is round r such that a correct process is leader and is seen by every correct process Phase 1: all get estimate of leader Phase 2 and 3: they exchange that estimate, and thus decide

Agreement No two processes decide different values A process decides v during r At the end of phase two, aux must have been v or for any process (there can only be one majority) A process only decides if it received same aux from majority; since sets overlap, at least one such message was received by other (non-faulty at that time) processes, which thus updated its u

Performance [UHSK 04] Paxos (leader-based) outperforms rotating coordinator (four phases [CT 96]) with at least one crash Also with large number of processes and no crashes

References Impossibility of Distributed Consensus with One Faulty Process. M. Fischer, N. Lynch, and M. Paterson. JACM 32(2): 217-246, 1985. Unreliable Failure Detectors for Reliable Distributed Systems. T.D. Chandra and S. Toueg, JACM, 43(2): 225-267, 1996. Solving Consensus using Chandra-Toueg s Unreliable Failure Detectors: A General Quorum-based Approach, A. Mostfaoui and M. Raynal, DISC 99, 49-63, 1999. Leader-based Consensus. A. Mostefaoui and M. Raynal. IPL 11(1): 95-107, 2001. Performance Comparison of a Rotating Coordinator and a Leader Based Consensus Algorithm, P. Urban and N. Hayashibara and A. Schiper and T. Katayama. SRDS 04, 4-17, 2004.