Unreliable Failure Detectors for Reliable Distributed Systems


A different approach

- Augment the asynchronous model with an unreliable failure detector for crash failures
- Define failure detectors in terms of abstract properties, not specific implementations
- Identify classes of failure detectors that make it possible to solve Consensus

The Model

General
- asynchronous system
- processes fail by crashing
- a failed process does not recover

Failure Detectors
- each outputs the set of processes that it currently suspects to have crashed
- the set may be different for different processes

Completeness

- Strong Completeness: eventually every process that crashes is permanently suspected by every correct process
- Weak Completeness: eventually every process that crashes is permanently suspected by some correct process

Accuracy

- Strong Accuracy: no correct process is ever suspected
- Weak Accuracy: some correct process is never suspected
- Eventual Strong Accuracy: there is a time after which no correct process is ever suspected
- Eventual Weak Accuracy: there is a time after which some correct process is never suspected
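The completeness and accuracy properties can be checked mechanically on a recorded run. The following is a minimal sketch, assuming a finite trace in which `trace[t][p]` is the set of processes that correct process `p` suspects at step `t`; "eventually" is read as "on some suffix of the trace". The function names and the trace encoding are illustrative assumptions, not part of the slides.

```python
def suffix(trace, pred):
    """True if pred holds at every step from some index to the end of the trace."""
    return any(all(pred(s) for s in trace[i:]) for i in range(len(trace)))


def unsuspected(step, c, correct):
    """True if no correct process suspects c at this step."""
    return all(c not in step[p] for p in correct)


def check(trace, correct, crashed):
    """Evaluate the six properties on a finite trace (hypothetical encoding)."""
    always = lambda pred: all(pred(s) for s in trace)
    return {
        "strong completeness": suffix(
            trace, lambda s: all(crashed <= s[p] for p in correct)),
        "weak completeness": all(
            any(suffix(trace, lambda s: x in s[p]) for p in correct)
            for x in crashed),
        "strong accuracy": always(
            lambda s: all(unsuspected(s, c, correct) for c in correct)),
        "weak accuracy": any(
            always(lambda s: unsuspected(s, c, correct)) for c in correct),
        "eventual strong accuracy": suffix(
            trace, lambda s: all(unsuspected(s, c, correct) for c in correct)),
        "eventual weak accuracy": any(
            suffix(trace, lambda s: unsuspected(s, c, correct)) for c in correct),
    }


# Processes 1 and 2 are correct, process 3 crashes; 2 wrongly suspects 1 early on.
trace = [
    {1: set(), 2: {1}},
    {1: {3}, 2: {1, 3}},
    {1: {3}, 2: {3}},
]
result = check(trace, correct={1, 2}, crashed={3})
```

On this trace the detector is strongly complete and weakly accurate (process 2 is never suspected) but not strongly accurate, because process 1 is suspected at the first two steps.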

Failure detectors

Completeness | Accuracy: Strong | Weak     | Eventual strong | Eventual weak
Strong       | Perfect P        | Strong S | ◇P              | ◇S
Weak         | Quasi... Q       | Weak W   | ◇Q              | ◇W

Reducibility

- T_{D→D'} transforms failure detector D into failure detector D'
- If we can transform D into D', then we say that D is stronger than D' (D ⪰ D') and that D' is reducible to D
- Any algorithm A that uses D' can use D together with T_{D→D'} instead
- If D ⪰ D' and D' ⪰ D, then we say that D and D' are equivalent: D ≡ D'

Simplify, Simplify!

- All weakly complete failure detectors are reducible to strongly complete failure detectors:
  P ⪰ Q, S ⪰ W, ◇P ⪰ ◇Q, ◇S ⪰ ◇W
- All strongly complete failure detectors are reducible to weakly complete failure detectors (!):
  Q ⪰ P, W ⪰ S, ◇Q ⪰ ◇P, ◇W ⪰ ◇S
- Weakly and strongly complete failure detectors are equivalent!

From Weak Completeness to Strong Completeness

Every process p executes the following:

output_p := ∅
cobegin
    Task 1: repeat forever
        {p queries its local failure detector module D_p}
        suspects_p := D_p
        send (p, suspects_p) to all
    Task 2: when receive (q, suspects_q) from some q
        output_p := (output_p ∪ suspects_q) - {q}
coend
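The transformation above can be sketched as a small lock-step simulation. This is only an illustration under simplifying assumptions: messages are delivered in rounds and in a fixed order, each local detector output is held constant, and crashed processes simply send nothing. The function name `strengthen` and its parameters are hypothetical.

```python
def strengthen(local_suspects, correct, rounds=3):
    """Gossip local suspicions so that every correct process learns of every crash.

    local_suspects[p] is the (weakly complete) output of p's local module D_p.
    """
    order = sorted(correct)
    output = {p: set() for p in order}
    for _ in range(rounds):
        # Task 1: every correct process sends (p, suspects_p) to all.
        messages = [(p, set(local_suspects[p])) for p in order]
        # Task 2: on receiving (q, suspects_q), merge it and stop suspecting q,
        # since a message from q shows that q had not crashed when it sent it.
        for p in order:
            for q, suspects_q in messages:
                output[p] = (output[p] | suspects_q) - {q}
    return output


# Process 4 has crashed. Only process 1 suspects it (weak completeness),
# and process 2 wrongly suspects the correct process 3.
out = strengthen({1: {4}, 2: {3}, 3: set()}, correct={1, 2, 3})
```

In this run every correct process ends up suspecting process 4, and the false suspicion of process 3 is dropped as soon as a message from 3 is processed.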

The Theorems

Theorem 1. In an asynchronous system with W, consensus can be solved as long as f ≤ n - 1.

Theorem 2. There is no f-resilient consensus protocol using ◇P for f ≥ n/2.

Theorem 3. In asynchronous systems in which processes can use ◇W, consensus can be solved as long as f < n/2.

Theorem 4. A failure detector can solve consensus only if it satisfies weak completeness and eventual weak accuracy, i.e. ◇W is the weakest failure detector that can solve consensus.

Solving consensus using S

- S: Strong Completeness, Weak Accuracy (at least some correct process c is never suspected)
- Each process p has its own failure detector
- Input values are chosen from the set {0, 1}

Notation

We introduce the operators ⊕, ⊗, ⊖. They operate element-wise on vectors whose entries have values from the set {0, 1, ⊥}.

    v ⊕ ⊥ = v    ⊥ ⊕ v = v    v ⊕ v = v    ⊥ ⊕ ⊥ = ⊥
    v ⊗ v = v    v ⊗ ⊥ = ⊥    ⊥ ⊗ v = ⊥    ⊥ ⊗ ⊥ = ⊥
    v ⊖ ⊥ = v    v ⊖ v = ⊥    ⊥ ⊖ v = ⊥    ⊥ ⊖ ⊥ = ⊥

Given two vectors A and B, we write A ≤ B if A[i] ≠ ⊥ implies B[i] ≠ ⊥.
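As a concrete reading of this notation, the sketch below treats the three operators as element-wise union, intersection, and difference over vectors whose empty entries are ⊥. That reading is an assumption taken from how the consensus protocol uses them (merge received values, echo only new values, intersect final vectors); the Python names and the use of `None` for ⊥ are illustrative.

```python
BOTTOM = None  # stands for the empty entry ⊥


def union(a, b):
    """Keep a's entry if present, otherwise take b's."""
    return [x if x is not BOTTOM else y for x, y in zip(a, b)]


def intersection(a, b):
    """Keep an entry only if both vectors have it."""
    return [x if x is not BOTTOM and y is not BOTTOM else BOTTOM
            for x, y in zip(a, b)]


def difference(a, b):
    """Keep a's entry only where b has none."""
    return [x if y is BOTTOM else BOTTOM for x, y in zip(a, b)]


def leq(a, b):
    """a ≤ b: every entry present in a is also present in b."""
    return all(y is not BOTTOM for x, y in zip(a, b) if x is not BOTTOM)


merged = union([0, BOTTOM, BOTTOM], [BOTTOM, 1, BOTTOM])
common = intersection([0, 1, BOTTOM], [0, BOTTOM, BOTTOM])
fresh = difference([0, 1, BOTTOM], [0, BOTTOM, BOTTOM])
```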

Solving Consensus using any D ∈ S

V_p := (⊥, ..., ⊥, v_p, ⊥, ..., ⊥)    {p's estimate of the proposed values}
Δ_p := (⊥, ..., ⊥, v_p, ⊥, ..., ⊥)
{phase 1} {asynchronous rounds r_p, 1 ≤ r_p ≤ n - 1}
for r_p := 1 to n - 1
    send (r_p, Δ_p, p) to all
    wait until [∀q: received (r_p, Δ_q, q) or q ∈ D_p]    {query the failure detector}
    O_p := V_p
    V_p := V_p ⊕ (⊕_{q received} Δ_q)
    Δ_p := V_p ⊖ O_p    {value is only echoed the first time it is seen}
{phase 2}
send (r_p, V_p, p) to all
wait until [∀q: received (r_p, V_q, q) or q ∈ D_p]
V_p := ⊗_{q received} V_q    {computes the intersection, including V_p}
{phase 3}
decide on leftmost non-⊥ coordinate of V_p
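The three phases can be exercised with a small simulation. This is a sketch under strong simplifying assumptions that are not in the slides: processes run in lock step, faulty processes crash before sending anything, and the failure detector suspects exactly the crashed processes (which satisfies the requirements of S). The function name `consensus_with_S` and the encoding of ⊥ as `None` are illustrative.

```python
BOTTOM = None  # stands for ⊥


def consensus_with_S(proposals, crashed=frozenset()):
    """Lock-step run of the three-phase protocol; returns each live process's decision."""
    n = len(proposals)
    alive = [p for p in range(n) if p not in crashed]
    V = {p: [proposals[p] if i == p else BOTTOM for i in range(n)] for p in alive}
    delta = {p: list(V[p]) for p in alive}

    # Phase 1: n - 1 rounds of forwarding newly learned values.
    for _ in range(n - 1):
        sent = {q: list(delta[q]) for q in alive}  # snapshot of this round's messages
        for p in alive:
            old = list(V[p])
            for q in alive:  # crashed processes are suspected, so p does not wait for them
                V[p] = [x if x is not BOTTOM else y for x, y in zip(V[p], sent[q])]
            # Echo only the values seen for the first time in this round.
            delta[p] = [x if o is BOTTOM else BOTTOM for x, o in zip(V[p], old)]

    # Phase 2: exchange full vectors and intersect them.
    sent = {q: list(V[q]) for q in alive}
    for p in alive:
        for q in alive:
            V[p] = [x if x is not BOTTOM and y is not BOTTOM else BOTTOM
                    for x, y in zip(V[p], sent[q])]

    # Phase 3: decide on the leftmost non-⊥ coordinate.
    return {p: next(x for x in V[p] if x is not BOTTOM) for p in alive}


no_crash = consensus_with_S([1, 0, 1, 0])
one_crash = consensus_with_S([1, 0, 1, 0], crashed={0})
```

With no crashes every process decides the value proposed by process 0; when process 0 crashes before sending, the survivors agree on the proposal of process 1.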

A useful Lemma

Lemma 1. After phase 1 is complete, V_c ≤ V_p for all processes p that complete phase 1.

Proof. We show that V_c[i] = v_i ∧ v_i ≠ ⊥ ⇒ ∀p: V_p[i] = v_i.
Let r be the first round when c sees v_i.
- r ≤ n - 2: c will send Δ_c with v_i to all in round r + 1. By weak accuracy, all correct processes receive v_i by the next round.
- r = n - 1: v_i has been forwarded n - 1 times, so every other process has seen v_i.

Two additional cool lemmas

Lemma 2. After phase 2 is complete, V_c = V_p for each p that completes phase 2.

Proof. All processes that completed phase 2 have received V_c.
Since V_c is the smallest V vector, V_c[i] ≠ ⊥ ⇒ V_p[i] ≠ ⊥ ∀p.
By the definition of ⊗, V_c[i] = ⊥ ⇒ V_p[i] = ⊥ ∀p after phase 2.

Lemma 3. V_c ≠ (⊥, ⊥, ..., ⊥)

Solving consensus

Theorem. The protocol above satisfies Validity, Agreement, and Termination.

Proof. Left as an exercise.

A lower bound - I

Theorem. Consensus with ◇P requires f < n/2.

Proof. Suppose n is even, and a protocol exists that solves consensus when f = n/2.
Divide the set of processes into two sets of size n/2, P_1 and P_2.

A lower bound - II

Consider three executions:

Execution 1: P_1 proposes 0; P_2 proposes 0
- All processes in P_2 crash before they can propose
- Detectors work perfectly
- P_1 decides 0 after t_1

Execution 2: P_1 proposes 0; P_2 proposes 1
- No process crashes
- Detectors make mistakes: until max(t_1, t_2), P_1 believes P_2 crashed, and vice versa
- P_1 decides 0; P_2 decides 1

Execution 3: P_1 proposes 1; P_2 proposes 1
- All processes in P_1 crash before they can propose
- Detectors work perfectly
- P_2 decides 1 after t_2

In Execution 2, P_1 decides 0 and P_2 decides 1, so Agreement is violated.
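The argument depends on being able to split the processes into two disjoint groups, each large enough to decide on its own: a protocol that tolerates f crashes must let any n - f processes decide. The brute-force sketch below, with the hypothetical helper `disjoint_groups`, searches for such a split and finds one exactly when f ≥ n/2.

```python
from itertools import combinations


def disjoint_groups(n, f):
    """Return two disjoint groups of n - f processes, or None if every pair overlaps."""
    procs = range(n)
    for a in combinations(procs, n - f):
        for b in combinations(procs, n - f):
            if not set(a) & set(b):
                return set(a), set(b)
    return None


split = disjoint_groups(4, 2)     # f = n/2: the two halves can decide independently
no_split = disjoint_groups(4, 1)  # f < n/2: any two groups of n - f processes overlap
```

When f < n/2, any two groups of n - f processes share at least one member, which is the property the rotating-coordinator protocol relies on when it waits for a majority.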

The case of the Rotating Coordinator

Solving consensus with ◇W (actually, ◇S)

- Asynchronous rounds
- In round r, only messages timestamped r are sent and processed (except for DECIDE messages)
- Each process p has an opinion v_p ∈ {0, 1}
- Each opinion has a time of adoption t_p (initially, t_p = 0)
- Each round has a coordinator c such that c_id = (r mod n) + 1
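The coordinator rule can be checked with a few lines of code. This sketch assumes process ids run from 1 to n and rounds start at 1; the function name is illustrative.

```python
def coordinator(r, n):
    """Id of the coordinator of round r among processes 1..n."""
    return (r % n) + 1


# With n = 3 the coordinators of rounds 1..6 cycle through every process.
schedule = [coordinator(r, 3) for r in range(1, 7)]
```

Every process becomes coordinator once in any n consecutive rounds, so a correct process that is eventually never suspected gets its turn infinitely often.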

One round, four phases

Phase 1
- Each process, including c, sends its opinion timestamped r to c

Phase 2
- c waits for the first ⌊n/2⌋ + 1 opinions with timestamp r
- c selects v, one of the most recently adopted opinions
- v becomes c's suggestion for round r
- c sends its suggestion to all

Phase 3
- Each p waits for a suggestion, or for the failure detector to signal that c is faulty
- If a suggestion is received, it is adopted: v_p := v; t_p := r; ACK to c
- Otherwise, NACK to c

Phase 4
- c waits for the first ⌊n/2⌋ + 1 responses
- If all are ACKs, then c decides on v and sends DECIDE to all
- If p receives DECIDE, then p decides on v

Consensus using ◇S

v_p := input bit; r := 0; t_p := 0; state_p := undecided
while state_p = undecided do
    r := r + 1
    c := (r mod n) + 1
    {phase 1: all processes send their opinion to the current coordinator}
    p sends (p, r, v_p, t_p) to c
    {phase 2: the current coordinator gathers a majority of opinions}
    c waits for first ⌊n/2⌋ + 1 opinions (q, r, v_q, t_q)
    c selects among them the value v_q with the largest t_q
    c sends (c, r, v_q) to all
    {phase 3: all processes wait for a new suggestion from the current coordinator}
    p waits until suggestion (c, r, v) arrives or c ∈ ◇S_p
    if the suggestion is received then { v_p := v; t_p := r; p sends (r, ACK) to c }
    else p sends (r, NACK) to c
    {phase 4: the coordinator waits for a majority of replies. If a majority adopted the coordinator's suggestion, then the coordinator sends a request to decide}
    c waits for first ⌊n/2⌋ + 1 (r, ACK) or (r, NACK)
    if c receives ⌊n/2⌋ + 1 ACKs, then c sends (r, DECIDE, v) to all
when p delivers (r, DECIDE, v) then { p decides v; state_p := decided }
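The round structure can be illustrated with a lock-step simulation. It is a sketch under assumptions that go beyond the slides: faulty processes crash before round 1, a crashed coordinator is eventually suspected by everyone, the "first" majority of opinions is taken to be the one from the lowest live ids, and ACKs are assumed to reach the coordinator before NACKs. The `false_suspicions` parameter, which lists the processes that wrongly suspect the coordinator in a given round, is a hypothetical device for modelling detector mistakes.

```python
def rotating_coordinator(inputs, crashed=frozenset(), false_suspicions=None, max_rounds=50):
    """inputs maps ids 1..n to bits; returns (deciding round, decisions) or None."""
    false_suspicions = false_suspicions or {}
    n = len(inputs)
    majority = n // 2 + 1
    alive = [p for p in sorted(inputs) if p not in crashed]
    if len(alive) < majority:
        raise ValueError("the protocol needs f < n/2")
    v = dict(inputs)              # current opinions
    t = {p: 0 for p in inputs}    # times of adoption

    for r in range(1, max_rounds + 1):
        c = (r % n) + 1
        if c in crashed:
            continue  # phase 3: c is eventually suspected, everyone NACKs
        # Phases 1-2: c gathers a majority of opinions and picks the most recent one.
        opinions = [(t[q], v[q]) for q in alive][:majority]
        suggestion = max(opinions, key=lambda o: o[0])[1]
        # Phase 3: processes that do not suspect c adopt the suggestion and ACK.
        nackers = false_suspicions.get(r, set())
        ackers = [p for p in alive if p not in nackers]
        for p in ackers:
            v[p], t[p] = suggestion, r
        # Phase 4: with a majority of ACKs, c broadcasts DECIDE.
        if len(ackers) >= majority:
            return r, {p: suggestion for p in alive}
    return None


clean = rotating_coordinator({1: 0, 2: 1, 3: 1})
crash = rotating_coordinator({1: 0, 2: 1, 3: 1}, crashed={2})
noisy = rotating_coordinator({1: 1, 2: 0, 3: 0}, false_suspicions={1: {1, 3}})
```

In the last run only process 2 adopts the round-1 suggestion, so no decision is reached until round 2, where the new coordinator sees the opinion with the larger time of adoption and suggests the same value again.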

Validity

- The value decided upon must have been suggested by the coordinator in some round
- A coordinator suggests a value only by selecting it among the participants' opinions
- From the algorithm, it is clear that each opinion corresponds to a value proposed by some process

Agreement

Strong Agreement: all processes that decide, decide the same value.

Proof.
- Trivially true if no process decides
- If some process decides, it has delivered (-, DECIDE, -) from a coordinator
- The coordinator has received a majority of (-, ACK)
- Let r be the earliest round in which a majority of (-, ACK) have been sent to the coordinator c of r
- Let v_c be the value suggested by c in Phase 2 of round r

Enter the Locking Lemma!

The Locking Lemma - I

Locking Lemma. For all rounds r' ≥ r, if a coordinator c' sends v_c', then v_c' = v_c.

Proof.
- Trivially holds for r' = r
- Assume it holds for all r' with r ≤ r' < k
- Let c_k be the coordinator for round k
- If c_k suggests v_ck, it must have received opinions from a majority of processes
- There exists some p that sent an ACK in Phase 3 of round r and whose opinion has been received by c_k
- Consider the time of adoption t_p:
  - In Phase 3 of round r, t_p = r
  - In Phase 2 of round k, t_p ≥ r
- For any t_q collected in round k, t_q < k; at least one of them (p's) satisfies t_q ≥ r

The Locking Lemma - II

- Consider t, the largest time of adoption collected by c_k. Clearly, r ≤ t < k
- c_k adopted its suggestion from q, where q is the process that sent (q, k, v_q, t)
- q adopted v_q in round t from the coordinator of round t, which sent its suggestion in Phase 2 of round t, where r ≤ t < k
- By the Induction Hypothesis, that coordinator sent v_c!
- Then, c_k sets v_ck to v_c

Been there, done that?

Agreement

All processes that decide, decide v_c.

Proof.
- Suppose p delivers (r', DECIDE, v_c')
- The coordinator c' for round r' has sent (r', DECIDE, v_c') in Phase 4 of round r'
- To do so, c' must have received a majority of (r', ACK) in Phase 4 of r'
- r is the earliest round in which a majority of (r, ACK) have been sent to a round's coordinator
- Clearly, r' ≥ r
- By the Locking Lemma, c' must have suggested the locked value: v_c' = v_c

Termination

- No correct process is blocked forever at a wait statement
- By eventual weak accuracy, there is a correct process c and a time t such that no process suspects c after t
- There is a round r such that:
  - all correct processes reach r after time t (no one suspects c)
  - c is the coordinator for round r
- In round r every correct process receives c's suggestion, adopts it, and sends ACK, so c gathers a majority of ACKs and sends DECIDE
- If some correct process decides, eventually all do, on the same value by Agreement