Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong


Outline
- Introduction
- Asynchronous distributed systems, distributed computations, consistency
- Two different strategies to construct global states
  - Monitor passively observes the system (reactive architecture)
  - Monitor actively interrogates the system (snapshot protocol)
- Properties of global predicates
- Sample applications: deadlock detection and debugging

Introduction
- global state = union of the local states of the individual processes
- many problems in distributed computing require constructing a global state and evaluating whether that state satisfies some predicate $\Phi$
- difficulties: uncertainty in message delays and in the relative speeds of computations
- the global state obtained can be obsolete, incomplete, or inconsistent

Distributed Systems
- collection of sequential processes $p_1, p_2, \ldots, p_n$
- unidirectional communication channels between pairs of processes
- reliable channels, but messages may be delivered out of order
- network is strongly connected (not necessarily completely connected)

Asynchronous Distributed Systems
- no bounds on relative process speeds
- no bounds on message delays
- no synchronized local clocks
- communication is the only possible mechanism for synchronization

Distributed Computations
- a distributed program is executed by a collection of processes
- each process executes a sequence of events
- processes communicate through the events send(m) and receive(m), where m is the message identifier

Distributed Computations
- $h_i = e_i^1 e_i^2 \ldots$: local history of process $p_i$; the canonical enumeration follows the total order imposed by sequential execution
- $h_i^k = e_i^1 e_i^2 \ldots e_i^k$: initial prefix of $h_i$ containing the first $k$ events
- $H = h_1 \cup \cdots \cup h_n$: global history containing all events; it does not specify the relative timing between events

Distributed Computations
- to order events, define a binary relation $\to$ that captures cause and effect: $e \to e'$ if and only if $e$ causally precedes $e'$
- concurrent events: neither $e \to e'$ nor $e' \to e$; written $e \parallel e'$
- distributed computation = partially ordered set defined by $(H, \to)$

Distributed Computations
- example (figure): $e_2^1 \to e_3^6$; $e_2^2 \parallel e_3^6$

Global States, Cuts and Runs
- $\sigma_i^k$: local state of process $p_i$ after event $e_i^k$
- $\Sigma = (\sigma_1, \ldots, \sigma_n)$: global state of the distributed computation, an $n$-tuple of local states
- cut $C = h_1^{c_1} \cup \cdots \cup h_n^{c_n}$, or $(c_1, \ldots, c_n)$: a subset of the global history $H$

Global States, Cuts and Runs
- $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$: global state corresponding to cut $C$
- $(e_1^{c_1}, \ldots, e_n^{c_n})$: frontier of cut $C$, the set of last events
- run: a total ordering $R$ of all events in the global history that is consistent with each local history

Global States, Cuts and Runs
- cut $C = (5,2,4)$; cut $C' = (3,2,6)$
- a consistent run: $R = e_3^1\, e_1^1\, e_3^2\, e_2^1\, e_3^3\, e_3^4\, e_2^2\, e_1^2\, e_3^5\, e_1^3\, e_1^4\, e_1^5\, e_3^6\, e_2^3\, e_1^6$

Consistency
- cut $C$ is consistent if, for all events $e$ and $e'$: $(e \in C) \wedge (e' \to e) \Rightarrow e' \in C$, i.e. $C$ is closed under the causal precedence relation
- a consistent global state corresponds to a consistent cut
- run $R$ is consistent if, for all events, $e \to e'$ implies that $e$ appears before $e'$ in $R$
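As a rough illustration of the closure condition, the following Python sketch checks whether a set of events forms a consistent cut. The encoding of events as (process, index) tuples, the `precedes` edge set, and the function name are assumptions made for this example only.

```python
def is_consistent_cut(cut, precedes):
    """Return True if `cut` is closed under causal precedence.

    cut:      set of events, each a hypothetical (process, index) tuple
    precedes: set of (earlier, later) pairs covering the local order and
              every send/receive pair; closure under these direct edges
              implies closure under their transitive closure
    """
    return all(earlier in cut for (earlier, later) in precedes if later in cut)


# p1 executes two events; its second event sends a message that p2
# receives as its first event.
precedes = {
    (("p1", 1), ("p1", 2)),
    (("p1", 2), ("p2", 1)),
}
cut_ok = {("p1", 1), ("p1", 2), ("p2", 1)}
cut_bad = {("p1", 1), ("p2", 1)}  # contains the receive but not the send
```

Here `cut_bad` is inconsistent because it includes a receive event without the send event that caused it.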

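Consistency
- run $R = e^1 e^2 \ldots$ results in a sequence of global states $\Sigma^0 \Sigma^1 \Sigma^2 \ldots$
- $\Sigma^i$ is obtained from $\Sigma^{i-1}$ by some process executing event $e^i$; we say that $\Sigma^{i-1}$ leads to $\Sigma^i$
- denote the transitive closure of the leads-to relation by $\leadsto_R$
- $\Sigma'$ is reachable from $\Sigma$ in run $R$ iff $\Sigma \leadsto_R \Sigma'$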

Lattice of Global States
- lattice = set of all consistent global states, together with the leads-to relation
- $\Sigma^{k_1 \ldots k_n}$ = shorthand for the global state $(\sigma_1^{k_1}, \ldots, \sigma_n^{k_n})$
- $k_1 + \cdots + k_n$ = level of the state in the lattice

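Lattice of Global States
- path = sequence of global states of increasing level (downwards in the figure)
- each path corresponds to a consistent run
- a possible path: $\Sigma^{00}\, \Sigma^{01}\, \Sigma^{11}\, \Sigma^{21}\, \Sigma^{31}\, \Sigma^{32}\, \Sigma^{42}\, \Sigma^{43}\, \Sigma^{44}\, \Sigma^{54}\, \Sigma^{64}\, \Sigma^{65}$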

Observing Distributed Computations (reactive architecture)
- processes notify the monitor process $p_0$ whenever they execute an event
- the monitor constructs an observation as the sequence of events corresponding to the notification messages
- problem: the observation may be inconsistent because notification message delays vary


Observing Distributed Computations
- any permutation of run $R$ is a possible observation
- we need a delivery rule at the monitor process to restore message order
- we have First-In-First-Out (FIFO) delivery using sequence numbers: for every source-destination pair $p_i, p_j$: $send_i(m) \to send_i(m') \Rightarrow deliver_j(m) \to deliver_j(m')$
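FIFO delivery with sequence numbers can be sketched as follows. This is a minimal illustration, assuming each sender numbers its messages from 0; the class and method names are hypothetical.

```python
class FifoReceiver:
    """Buffers out-of-order messages and releases them in send order."""

    def __init__(self):
        self.next_seq = {}  # source -> next expected sequence number
        self.pending = {}   # source -> {sequence number: message}

    def receive(self, source, seq, message):
        """Record an arriving message; return the messages now deliverable."""
        buffered = self.pending.setdefault(source, {})
        buffered[seq] = message
        expected = self.next_seq.get(source, 0)
        delivered = []
        # Deliver the longest gap-free run starting at the expected number.
        while expected in buffered:
            delivered.append(buffered.pop(expected))
            expected += 1
        self.next_seq[source] = expected
        return delivered
```

A message that arrives ahead of its predecessor is held back until the gap is filled, so delivery order per source always matches send order.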

Delivery Rule 1
- assume: a global real-time clock, and message delays bounded by $\delta$
- each process includes a timestamp (real-time clock value) when notifying $p_0$ of a local event $e$
- DR1: At time $t$, deliver all received messages with timestamps up to $t - \delta$ in increasing timestamp order

Delivery Rule 1
- let $RC(e)$ denote the value of the global clock when $e$ is executed
- the real-time clock satisfies the Clock Condition: $e \to e' \Rightarrow RC(e) < RC(e')$
- but logical clocks also satisfy the Clock Condition

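Logical Clocks
- event orderings are based on increasing clock values
- $LC(e_i)$ denotes the value of the logical clock when $e_i$ is executed by $p_i$
- each sent message $m$ contains a timestamp $TS(m)$
- update rules applied by $p_i$ at the occurrence of $e_i$:
  - if $e_i$ is an internal or send event: $LC(e_i) := LC + 1$
  - if $e_i = receive(m)$: $LC(e_i) := \max\{LC, TS(m)\} + 1$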
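The standard Lamport update rules (increment on an internal or send event; take the maximum of the local value and the message timestamp, then increment, on a receive event) can be sketched in a few lines. The class and method names are assumptions for illustration.

```python
class LamportClock:
    """Scalar logical clock for a single process."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Internal or send event: advance the clock by one.

        For a send event, the returned value is the timestamp TS(m)
        to attach to the message.
        """
        self.value += 1
        return self.value

    def receive(self, timestamp):
        """Receive event: jump past both the local value and TS(m)."""
        self.value = max(self.value, timestamp) + 1
        return self.value
```

With these rules, a receive event always gets a larger clock value than the matching send event, which is what the Clock Condition requires.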


Delivery Rule 2
- replace the real-time clock by a logical clock
- need the gap-detection property: given events $e$, $e'$ where $LC(e) < LC(e')$, determine whether some event $e''$ exists such that $LC(e) < LC(e'') < LC(e')$
- message $m$ is stable at $p$ if no future messages with timestamps smaller than $TS(m)$ can be received by $p$

Delivery Rule 2
- with FIFO, when $p_0$ receives $m$ from $p_i$ with timestamp $TS(m)$, it can be certain that no other message $m'$ from $p_i$ with $TS(m') \le TS(m)$ will arrive
- message $m$ at $p_0$ is guaranteed stable once $p_0$ has received at least one message from every other process with a timestamp greater than $TS(m)$
- DR2: Deliver all received messages that are stable at $p_0$ in increasing timestamp order
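A minimal sketch of DR2 at the monitor, assuming FIFO channels and Lamport timestamps on every notification. The class name, the heap-based buffer, and the use of 0 as "nothing received yet" are choices made for this example.

```python
import heapq


class StableDeliveryMonitor:
    """Delivers notifications in timestamp order once they are stable."""

    def __init__(self, processes):
        # Largest timestamp received so far from each process (0 = none yet).
        self.latest = {p: 0 for p in processes}
        self.pending = []  # min-heap of (timestamp, sender, event)

    def receive(self, sender, timestamp, event):
        self.latest[sender] = max(self.latest[sender], timestamp)
        heapq.heappush(self.pending, (timestamp, sender, event))
        return self._deliver_stable()

    def _deliver_stable(self):
        delivered = []
        while self.pending:
            ts, sender, event = self.pending[0]
            # Stable: every other process has already sent something newer.
            others = (t for p, t in self.latest.items() if p != sender)
            if not all(t > ts for t in others):
                break
            heapq.heappop(self.pending)
            delivered.append(event)
        return delivered
```

Note the cost of this rule: a process that stays silent blocks delivery of every message with a timestamp at or above its last one.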

Strong Clock Condition
- DR1 and DR2 behave as if $RC(e) < RC(e')$ (or $LC(e) < LC(e')$) implied $e \to e'$
- recall that RC and LC only guarantee the Clock Condition: $e \to e' \Rightarrow RC(e) < RC(e')$
- as a result, DR1 and DR2 can unnecessarily delay delivery
- we want a timing mechanism TC that gives the Strong Clock Condition: $e \to e' \equiv TC(e) < TC(e')$

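Timing Mechanism 1 - Causal Histories
- use the causal history $\theta(e)$ of an event as its clock value
- $\theta(e)$ = set of all events that causally precede event $e$ (including $e$ itself): the smallest consistent cut that includes $e$
- projection of $\theta(e)$ on process $p_i$: $\theta_i(e) = \theta(e) \cap h_i$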


Timing Mechanism 1 - Causal Histories
To maintain causal histories:
- initially empty
- if $e_i$ is an internal or send event: $\theta(e_i) = \{e_i\} \cup \theta(\text{previous local event of } p_i)$
- if $e_i$ is the receive of message $m$ by $p_i$ from $p_j$: $\theta(e_i) = \{e_i\} \cup \theta(\text{previous local event of } p_i) \cup \theta(\text{corresponding send event at } p_j)$

Timing Mechanism 1 - Causal Histories
- (figure) new send event: $e_1^5$; new receive event: $e_2^3$

Timing Mechanism 1 - Causal Histories
- can interpret clock comparison as set inclusion: $e \to e' \equiv \theta(e) \subset \theta(e')$
- (why not set membership, $e \to e' \equiv e \in \theta(e')$?)
- unfortunately, causal histories grow too rapidly
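The maintenance rules for causal histories and the set-inclusion test can be sketched with Python frozensets. Event labels such as "a1" and the two helper names are hypothetical.

```python
def local_or_send_event(event, previous_history):
    """Causal history of an internal or send event."""
    return frozenset({event}) | previous_history


def receive_event(event, previous_history, send_history):
    """Causal history of a receive event: also merges the sender's history."""
    return frozenset({event}) | previous_history | send_history


EMPTY = frozenset()

# Process a: internal event a1, then send event a2.
theta_a1 = local_or_send_event("a1", EMPTY)
theta_a2 = local_or_send_event("a2", theta_a1)
# Process b: its first event b1 receives the message sent at a2.
theta_b1 = receive_event("b1", EMPTY, theta_a2)
# Process c: an unrelated internal event.
theta_c1 = local_or_send_event("c1", EMPTY)

# Proper-subset comparison (<) plays the role of "causally precedes".
a2_precedes_b1 = theta_a2 < theta_b1
b1_c1_concurrent = not (theta_b1 < theta_c1) and not (theta_c1 < theta_b1)
```

The histories only ever grow, and every message has to carry the sender's whole set, which is the growth problem noted above.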

Timing Mechanism 2 - Vector Clocks
- note: the projection $\theta_i(e) = h_i^k$ for some unique $k$, and $e_i^r \in \theta_i(e)$ for all $r < k$
- so a single number $k$ can represent $\theta_i(e)$
- $\theta(e) = \theta_1(e) \cup \cdots \cup \theta_n(e)$
- represent the entire causal history by an $n$-dimensional vector clock $VC(e)$ where, for all $1 \le i \le n$, $VC(e)[i] = k$ if and only if $\theta_i(e) = h_i^k$


Timing Mechanism 2 - Vector Clocks
To maintain the vector clock:
- each process $p_i$ initializes VC to contain all zeros
- update rules applied by $p_i$ at the occurrence of $e_i$ ensure that:
  - $VC(e_i)[i]$ = number of events $p_i$ has executed up to and including $e_i$
  - $VC(e_i)[j]$ = number of events of $p_j$ that causally precede event $e_i$ of $p_i$
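One common way to obtain these component meanings is to increment the local entry on every event and to take a component-wise maximum with the message timestamp on a receive. The sketch below assumes 0-based process indices; the class and method names are hypothetical.

```python
class VectorClock:
    """Vector clock of process `index` in a system of `n` processes."""

    def __init__(self, n, index):
        self.index = index
        self.v = [0] * n

    def local_or_send(self):
        """Internal or send event; the result is TS(m) for a send."""
        self.v[self.index] += 1
        return list(self.v)

    def receive(self, timestamp):
        """Receive event: merge the sender's knowledge, then count this event."""
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.index] += 1
        return list(self.v)
```

The returned lists are copies, so a timestamp attached to a message is not changed by later events at the sender.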

Timing Mechanism 2 - Vector Clocks
- (figure) causal histories compared with vector clocks for a new send event and a new receive event

Vector Clock Comparison
- define the less-than relation: $V < V' \equiv (V \ne V') \wedge (\forall k : 1 \le k \le n : V[k] \le V'[k])$

Properties of Vector Clocks
1. Strong Clock Condition: $e \to e' \equiv VC(e) < VC(e')$
2. Simple Strong Clock Condition: given event $e_i$ of $p_i$ and event $e_j$ of $p_j$, $i \ne j$: $e_i \to e_j \equiv VC(e_i)[i] \le VC(e_j)[i]$

Properties of Vector Clocks
3. Test for Concurrency: given event $e_i$ of $p_i$ and event $e_j$ of $p_j$: $e_i \parallel e_j \equiv (VC(e_i)[i] > VC(e_j)[i]) \wedge (VC(e_j)[j] > VC(e_i)[j])$
4. Pairwise Inconsistent: given event $e_i$ of $p_i$ and $e_j$ of $p_j$, $i \ne j$: $e_i$ and $e_j$ cannot belong to the frontier of the same consistent cut if and only if $(VC(e_i)[i] < VC(e_j)[i]) \vee (VC(e_j)[j] < VC(e_i)[j])$ (concurrent)
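The less-than relation and the tests in properties 3 and 4 translate directly into code. In this sketch vector clocks are plain Python lists and process indices are 0-based; the function names are assumptions.

```python
def vc_less(v, w):
    """V < W: V differs from W and is component-wise no larger."""
    return v != w and all(a <= b for a, b in zip(v, w))


def concurrent(vi, i, vj, j):
    """Property 3: vi is the clock of an event of p_i, vj of an event of p_j."""
    return vi[i] > vj[i] and vj[j] > vi[j]


def pairwise_inconsistent(vi, i, vj, j):
    """Property 4: the two events cannot share the frontier of a consistent cut."""
    return vi[i] < vj[i] or vj[j] < vi[j]
```

For example, clocks [1, 0] at process 0 and [2, 1] at process 1 are pairwise inconsistent: the second event already knows of two events at process 0, but the first is only event number one there.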

Properties of Vector Clocks
5. Consistent Cut: the frontier contains no pairwise inconsistent events: $VC(e_i^{c_i})[i] \ge VC(e_j^{c_j})[i]$, $\forall\, 1 \le i, j \le n$
6. Counting the number of events that causally precede $e_i$: $\#(e_i) = \left(\sum_{j=1}^{n} VC(e_i)[j]\right) - 1$; in the example, number of events $= 4 + 1 + 3 - 1 = 7$
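Properties 5 and 6 can be checked mechanically. The sketch below assumes the frontier is given as a list whose entry i is the vector clock of the last event of process i in the cut (0-based indices); the function names are hypothetical.

```python
def is_consistent_frontier(frontier):
    """Property 5: no event in the frontier knows of a later event of p_i
    than the one p_i itself contributes to the cut."""
    n = len(frontier)
    return all(
        frontier[i][i] >= frontier[j][i]
        for i in range(n)
        for j in range(n)
    )


def count_preceding(vc):
    """Property 6: number of events that causally precede the event."""
    return sum(vc) - 1
```

The vector [4, 1, 3] reproduces the arithmetic of the example above: 4 + 1 + 3 - 1 = 7.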

Properties of Vector Clocks
7. Weak Gap-Detection: given event $e_i$ of $p_i$ and $e_j$ of $p_j$, if $VC(e_i)[k] < VC(e_j)[k]$ for some $k \ne j$, there exists an event $e_k$ such that $\neg(e_k \to e_i) \wedge (e_k \to e_j)$

Causal Delivery and Vector Clocks
- assume processes increment the local component of VC only for events that are notified to the monitor $p_0$
- $p_0$ maintains a set $M$ of messages received but not yet delivered
- suppose we have: message $m$ from $p_j$, and $m'$ = last message delivered from process $p_k$, $k \ne j$

Causal Delivery and Vector Clocks
To deliver $m$, $p_0$ must verify:
1. no earlier message from $p_j$ is undelivered (i.e. $TS(m)[j] - 1$ messages have been delivered from $p_j$)
2. there is no undelivered message $m''$ from $p_k$ such that $send_k(m') \to send_k(m'') \to send_j(m)$, $k \ne j$ (i.e. whether $TS(m')[k] \ge TS(m)[k]$ for all $k \ne j$)

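Causal Delivery and Vector Clocks
- $p_0$ maintains an array $D[1 \ldots n]$ where $D[i] = TS(m_i)[i]$, $m_i$ being the last message delivered from $p_i$
- e.g. in the figure, delivery of $m$ (timestamp $(2, 1)$, sent by $p_j$) is delayed until the pending message from $p_k$ that causally precedes it has been received and delivered
- (figure: timelines of $P_0$, $P_k$ and $P_j$ with vector timestamps $(0, 0)$, $(1, 0)$, $(2, 0)$, $(2, 1)$)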

Delivery Rule 3
- Causal Delivery: for all messages $m$, $m'$, sending processes $p_i$, $p_j$ and destination process $p_k$: $send_i(m) \to send_j(m') \Rightarrow deliver_k(m) \to deliver_k(m')$
- DR3 (Causal Delivery): Deliver message $m$ from process $p_j$ as soon as $D[j] = TS(m)[j] - 1$ and $D[k] \ge TS(m)[k]$, $\forall k \ne j$
- $p_0$ sets $D[j]$ to $TS(m)[j]$ after delivery of $m$
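A minimal sketch of DR3 at the monitor, assuming vector timestamps that count only notified events and 0-based process indices. The class name and the list used as the pending set are assumptions for illustration.

```python
class CausalMonitor:
    """Delivers notifications to the monitor in causal order (DR3)."""

    def __init__(self, n):
        self.D = [0] * n   # D[i]: own component of the last message delivered from p_i
        self.pending = []  # received but undelivered (sender, timestamp, payload)

    def _deliverable(self, j, ts):
        next_from_sender = self.D[j] == ts[j] - 1
        nothing_missing = all(
            self.D[k] >= ts[k] for k in range(len(ts)) if k != j
        )
        return next_from_sender and nothing_missing

    def receive(self, j, ts, payload):
        """Buffer a message from p_j; return everything now deliverable."""
        self.pending.append((j, ts, payload))
        delivered = []
        progress = True
        while progress:  # one delivery may unblock others
            progress = False
            for item in list(self.pending):
                sender, stamp, body = item
                if self._deliverable(sender, stamp):
                    self.pending.remove(item)
                    self.D[sender] = stamp[sender]
                    delivered.append(body)
                    progress = True
        return delivered
```

Unlike DR2, a message is held back only while a message that causally precedes it is still missing, not while an unrelated process is silent.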

Causal Delivery and Hidden Channels
- causal delivery should be applied to closed systems
- hidden channels (communication channels external to the system) can lead to incorrect conclusions

Active Monitoring - Distributed Snapshots
- monitor $p_0$ requests the states of the other processes and combines them into a global state
- assume channels implement FIFO delivery
- channel state $\chi_{i,j}$ for the channel from $p_i$ to $p_j$: messages sent by $p_i$ and not yet received by $p_j$

Distributed Snapshots
- notation: $IN_i$ = set of processes having direct channels to $p_i$; $OUT_i$ = set of processes to which $p_i$ has a channel
- for each execution of the snapshot protocol, process $p_i$ records its local state $\sigma_i$ and the states of its incoming channels ($\chi_{j,i}$ for all $p_j \in IN_i$)

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
1. $p_0$ starts the protocol by sending itself a "take snapshot" message
2. when receiving the "take snapshot" message for the first time, from process $p_f$:
   - $p_i$ records its local state $\sigma_i$ and relays the "take snapshot" message along all outgoing channels
   - channel state $\chi_{f,i}$ is set to empty
   - $p_i$ starts recording messages on its other incoming channels

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
3. when receiving the "take snapshot" message beyond the first time, from process $p_s$:
   - $p_i$ stops recording messages along the channel from $p_s$
   - channel state $\chi_{s,i}$ is the set of messages that have been recorded
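The three protocol steps can be sketched as a small single-threaded simulation. Everything here is an assumption made for illustration: the in-memory `channels` dictionary stands in for FIFO channels, the local state is just a counter of received application messages, and `p1` plays the initiator role.

```python
from collections import deque

MARKER = "take snapshot"
channels = {}  # (src, dst) -> FIFO queue of in-flight messages


def send(src, dst, msg):
    channels.setdefault((src, dst), deque()).append(msg)


class SnapshotProcess:
    def __init__(self, name, incoming, outgoing):
        self.name, self.incoming, self.outgoing = name, incoming, outgoing
        self.state = 0              # toy local state: application messages received
        self.recorded_state = None  # local state saved by the snapshot
        self.recording = {}         # sender -> messages recorded so far
        self.channel_state = {}     # sender -> final recorded channel state

    def on_message(self, sender, msg):
        if msg != MARKER:
            self.state += 1
            if sender in self.recording:  # channel still being recorded
                self.recording[sender].append(msg)
        elif self.recorded_state is None:  # step 2: first marker
            self.recorded_state = self.state
            self.recording = {p: [] for p in self.incoming if p != sender}
            if sender in self.incoming:
                self.channel_state[sender] = []  # channel from p_f is empty
            for dst in self.outgoing:
                send(self.name, dst, MARKER)
        else:                              # step 3: later marker
            self.channel_state[sender] = self.recording.pop(sender, [])


def deliver(procs, src, dst):
    """Deliver the oldest in-flight message on the channel src -> dst."""
    procs[dst].on_message(src, channels[(src, dst)].popleft())


procs = {
    "p1": SnapshotProcess("p1", ["p2"], ["p2"]),
    "p2": SnapshotProcess("p2", ["p1"], ["p1"]),
}
send("p2", "p1", "m")                 # m is in flight when the snapshot starts
procs["p1"].on_message("p1", MARKER)  # step 1: initiator sends itself the marker
deliver(procs, "p1", "p2")            # p2 sees its first marker
deliver(procs, "p2", "p1")            # p1 receives m while recording
deliver(procs, "p2", "p1")            # p1 receives p2's marker, stops recording
```

In this run the channel from p2 to p1 is recorded as containing `m`, while the channel from p1 to p2 is recorded as empty.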

Distributed Snapshots
- (figure: snapshot run in which $p_1$ and $p_2$ each reach "done"; dashed arrows indicate "take snapshot" messages)
- constructed global state: $\Sigma^{23}$; $\chi_{1,2}$ empty; $\chi_{2,1} = \{m\}$

Properties of Snapshots
- let $\Sigma^s$ = global state constructed, $\Sigma^a$ = global state when the protocol was initiated, $\Sigma^f$ = global state when the protocol terminated
- $\Sigma^s$ is guaranteed to be consistent
- the actual run that the system followed may not pass through $\Sigma^s$
- but there exists a run $R$ such that $\Sigma^a \leadsto_R \Sigma^s \leadsto_R \Sigma^f$

Properties of Snapshots
- $\Sigma^a = \Sigma^{21}$, $\Sigma^f = \Sigma^{55}$
- the actual run $r$ does not pass through $\Sigma^s$ ($= \Sigma^{23}$)

Properties of Snapshots
- but $\Sigma^{21} \leadsto \Sigma^{23} \leadsto \Sigma^{55}$

Properties of Global Predicates
- we now have two methods for global predicate evaluation:
  - monitor passively observing runs
  - monitor actively constructing snapshots
- the utility of either approach depends (in part) on the properties of the predicate

Stable Predicates
- communication delays mean that $\Sigma^s$ can only reflect some past state of the system
- stable predicate: once it becomes true, it remains true
- e.g. deadlock, termination, loss of all tokens, unreachable storage
- if $\Phi$ is stable, then ($\Phi$ is true in $\Sigma^s$) $\Rightarrow$ ($\Phi$ is true in $\Sigma^f$) and ($\Phi$ is false in $\Sigma^s$) $\Rightarrow$ ($\Phi$ is false in $\Sigma^a$)

Stable Predicates deadlock detection through snapshots (p.29, 30)

Stable Predicates deadlock detection using reactive protocol (p.31, 32)

Nonstable Predicates
- e.g. debugging, checking whether queue lengths exceed some thresholds
- two problems:
  1. the condition may not persist long enough for it to be true when the predicate is evaluated
  2. if a predicate $\Phi$ is found true, we do not know whether $\Phi$ ever held during the actual run

Nonstable Predicates
- e.g. monitoring condition $(x = y)$: there are 7 states where $(x = y)$ holds, but it no longer holds after state $\Sigma^{54}$
- e.g. $(y - x) = 2$: the condition holds only in $\Sigma^{31}$ and $\Sigma^{41}$
- the monitor might detect $(y - x) = 2$ even if the actual run never goes through $\Sigma^{31}$ or $\Sigma^{41}$

Nonstable Predicates
- detecting a nonstable predicate in this way has very little value

Nonstable Predicates
With observations, we can extend predicates:
- Possibly($\Phi$): there exists a consistent observation $O$ of the computation such that $\Phi$ holds in a global state of $O$
- Definitely($\Phi$): for every consistent observation $O$ of the computation, there exists a global state of $O$ in which $\Phi$ holds
- e.g. Possibly($(y - x) = 2$), Definitely($x = y$)

Nonstable Predicates
- use of extended predicates in debugging: if $\Phi$ = some erroneous state, then Possibly($\Phi$) indicates a bug, even if it is not observed during an actual run
- if the predicate is stable, then Possibly($\Phi$) $\equiv$ Definitely($\Phi$)

Detecting Possibly and Definitely
- detection is based on the lattice of consistent global states
- if any global state in the lattice satisfies $\Phi$, then Possibly($\Phi$) holds
- Definitely($\Phi$) requires all possible runs to pass through a global state that satisfies $\Phi$

Detecting Possibly and Definitely
- (figure) Possibly($(y - x) = 2$); Definitely($y = x$) (why?)

Detecting Possibly and Definitely
- maintain a set of global states, current, at progressively increasing levels
- if any member of current satisfies $\Phi$, then Possibly($\Phi$) is true

Detecting Possibly and Definitely
- iteratively construct the set of global states of level $l$ that can be reached without passing through a state that satisfies $\Phi$
- if the set becomes empty, Definitely($\Phi$) is true
- if the set contains the final state, Definitely($\Phi$) is false
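The level-by-level search can be sketched as follows. This is an illustrative sketch under several assumptions: a global state is a tuple $(k_1, \ldots, k_n)$ of event counts, `vcs[i][k-1]` holds the vector clock of the k-th event of process i (0-based process indices), and the predicate `phi` is a Python function of the state tuple. All function names are hypothetical.

```python
def consistent(state, vcs):
    """A state is consistent if no included event knows of an excluded one."""
    for j, k in enumerate(state):
        if k > 0 and any(c > s for c, s in zip(vcs[j][k - 1], state)):
            return False
    return True


def successors(state, vcs):
    """Consistent states one level down: exactly one process takes a step."""
    for i, k in enumerate(state):
        if k < len(vcs[i]):
            nxt = state[:i] + (k + 1,) + state[i + 1:]
            if consistent(nxt, vcs):
                yield nxt


def possibly(phi, vcs):
    current = {tuple(0 for _ in vcs)}
    while current:
        if any(phi(s) for s in current):
            return True
        current = {t for s in current for t in successors(s, vcs)}
    return False


def definitely(phi, vcs):
    final = tuple(len(h) for h in vcs)
    # States reachable without ever passing through a state satisfying phi.
    current = {s for s in [tuple(0 for _ in vcs)] if not phi(s)}
    while current:
        if final in current:
            return False  # some run avoids phi all the way to the end
        current = {
            t for s in current for t in successors(s, vcs) if not phi(t)
        }
    return True


# Two processes with two events each and no messages between them.
independent = [[(1, 0), (2, 0)], [(0, 1), (0, 2)]]
# One message: the only event of p0 is a send, the only event of p1 the receive.
one_message = [[(1, 0)], [(1, 1)]]
```

For `independent`, the predicate "state is (1, 1)" holds Possibly but not Definitely, because the run that executes both events of one process first never visits that state.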

Conclusions
- many distributed system problems require recognizing certain global conditions
- two approaches to constructing global states: reactive-architecture based and snapshot based
- a timing mechanism that captures the causal precedence relation
- applications to distributed deadlock detection and debugging
- solutions can be adapted to deal with nonstable predicates, multiple observations, and failures