Our Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering

Similar documents
Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H

A subtle problem. An obvious problem. An obvious problem. An obvious problem. No!

Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong

Distributed Algorithms Time, clocks and the ordering of events

Slides for Chapter 14: Time and Global States

Time is an important issue in DS

Clocks in Asynchronous Systems

Figure 10.1 Skew between computer clocks in a distributed system

Agreement. Today. l Coordination and agreement in group communication. l Consensus

CS505: Distributed Systems

Chapter 11 Time and Global States

CS505: Distributed Systems

Time. Today. l Physical clocks l Logical clocks

Clock Synchronization

Distributed Systems Fundamentals

Snapshots. Chandy-Lamport Algorithm for the determination of consistent global states <$1000, 0> <$50, 2000> mark. (order 10, $100) mark

Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation

CS 347 Parallel and Distributed Data Processing

CptS 464/564 Fall Prof. Dave Bakken. Cpt. S 464/564 Lecture January 26, 2014

Time. To do. q Physical clocks q Logical clocks

Do we have a quorum?

Time in Distributed Systems: Clocks and Ordering of Events

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed systems Lecture 4: Clock synchronisation; logical clocks. Dr Robert N. M. Watson

Distributed Systems Principles and Paradigms

7680: Distributed Systems

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG

DISTRIBUTED COMPUTER SYSTEMS

Distributed Systems Time and Global State

Absence of Global Clock

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour

Today. Vector Clocks and Distributed Snapshots. Motivation: Distributed discussion board. Distributed discussion board. 1. Logical Time: Vector clocks

Distributed Systems 8L for Part IB

arxiv: v1 [cs.dc] 22 Oct 2018

AGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:

416 Distributed Systems. Time Synchronization (Part 2: Lamport and vector clocks) Jan 27, 2017

Time, Clocks, and the Ordering of Events in a Distributed System

Chandy-Lamport Snapshotting

On Equilibria of Distributed Message-Passing Games

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.

Time. Lakshmi Ganesh. (slides borrowed from Maya Haridasan, Michael George)

Ordering and Consistent Cuts Nicole Caruso

CS5412: REPLICATION, CONSISTENCY AND CLOCKS

Efficient Dependency Tracking for Relevant Events in Concurrent Systems

Distributed Algorithms

Early consensus in an asynchronous system with a weak failure detector*

Causality and Time. The Happens-Before Relation

High Performance Computing

Section 6 Fault-Tolerant Consensus

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

How to deal with uncertainties and dynamicity?

Distributed Systems. Time, Clocks, and Ordering of Events

Distributed Systems. 06. Logical clocks. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Computing. Synchronization. Dr. Yingwu Zhu

Asynchronous Models For Consensus

Efficient Dependency Tracking for Relevant Events in Shared-Memory Systems

Failure detectors Introduction CHAPTER

6.852: Distributed Algorithms Fall, Class 24

Fault-Tolerant Consensus

arxiv: v2 [cs.dc] 18 Feb 2015

Causality and physical time

Dynamic Group Communication

The Weakest Failure Detector to Solve Mutual Exclusion

6.852: Distributed Algorithms Fall, Class 10

MAD. Models & Algorithms for Distributed systems -- 2/5 -- download slides at

Shared Memory vs Message Passing

DES. 4. Petri Nets. Introduction. Different Classes of Petri Net. Petri net properties. Analysis of Petri net models

Distributed Mutual Exclusion Based on Causal Ordering

A Communication-Induced Checkpointing Protocol that Ensures Rollback-Dependency Trackability

Automatic Synthesis of Distributed Protocols

Seeking Fastness in Multi-Writer Multiple- Reader Atomic Register Implementations

1 Introduction During the execution of a distributed computation, processes exchange information via messages. The message exchange establishes causal

1 Introduction During the execution of a distributed computation, processes exchange information via messages. The message exchange establishes causal

Parallel Performance Evaluation through Critical Path Analysis

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

Byzantine Agreement. Gábor Mészáros. CEU Budapest, Hungary

Efficient Notification Ordering for Geo-Distributed Pub/Sub Systems

Genuine atomic multicast in asynchronous distributed systems

Asynchronous Communication 2

Degradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #

TTA and PALS: Formally Verified Design Patterns for Distributed Cyber-Physical

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking

CS505: Distributed Systems

7. Queueing Systems. 8. Petri nets vs. State Automata

Valency Arguments CHAPTER7

Distributed Systems. Time, clocks, and Ordering of events. Rik Sarkar. University of Edinburgh Spring 2018

Causal Consistency for Geo-Replicated Cloud Storage under Partial Replication

Unreliable Failure Detectors for Reliable Distributed Systems

Implementing Uniform Reliable Broadcast with Binary Consensus in Systems with Fair-Lossy Links

The Weighted Byzantine Agreement Problem

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

Using Happens-Before Relationship to debug MPI non-determinism. Anh Vo and Alan Humphrey

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version

Distributed Deadlock-Avoidance. IMDEA Software Institute, Spain

S1 S2. checkpoint. m m2 m3 m4. checkpoint P checkpoint. P m5 P

Overview: Synchronous Computations

I Can t Believe It s Not Causal! Scalable Causal Consistency with No Slowdown Cascades

Eventually consistent failure detectors

THE WEAKEST FAILURE DETECTOR FOR SOLVING WAIT-FREE, EVENTUALLY BOUNDED-FAIR DINING PHILOSOPHERS. A Dissertation YANTAO SONG

Scheduling of Frame-based Embedded Systems with Rechargeable Batteries

Transcription:

Our Problem Global Predicate Detection and Event Ordering To compute predicates over the state of a distributed application Model Clock Synchronization Message passing No failures Two possible timing assumptions: 1. Synchronous System 2. Asynchronous System No upper bound on message delivery No bound on relative process speeds No centralized clock External Clock Synchronization: keep processor clock within some maximum deviation from an external source. can exchange of info about timing events of different systems can take actions at real- deadlines synchronization within 0.1 ms Internal Clock Synchronization: keep processor clocks within some maximum deviation from each other. can measure duration of distributed activities that start on one process and terminate on another can totally order events that occur on a distributed system

Synchronizion clocks: Take 1 Assume an upper bound max and a lower bound min on message delivery Guarantee that processes stay synchronized within max - min Synchronizion clocks: Take 1 Assume an upper bound max and a lower bound min on message delivery Guarantee that processes stay synchronized within max - min Problem: % of messages 5000 message run (IBM Almaden) 4.22 4.48 Time (ms) 93.17 Clock Synchronization: Take 2 Probabilistic Clock Synchronization (Cristian) No upper bound on message delivery......but lower bound min on message delivery Use out maxp to detect process failures slaves send messages to master Master averages slaves value; computes fault-tolerant average Precision: 4 maxp - min Master-Slave architecture Master is connected to external source Slaves read master s clock and adjust their own How accurately can a slave read the master s clock?

The Idea Asynchronous systems Clock accuracy depends on message roundtrip if roundtrip is small, master and slave cannot have drifted by much! Since no upper bound on message delivery, no certainty of accurate enough reading... but very accurate reading can be achieved by repeated attempts Weakest possible assumptions cfr. finite progress axiom Weak assumptions less vulnerabilities Asynchronous! slow Interesting model wrt failures (ah ah ah!) Client-Server Client-Server Processes exchange messages using Remote Procedure Call (RPC) Processes exchange messages using Remote Procedure Call (RPC) A client requests a service by sending the server a message. The client blocks while waiting for a response A client requests a service by sending the server a message. The client blocks while waiting for a response The server computes the response (possibly asking other servers) and returns it to the client #!?%! c s c s

Deadlock! Goal Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds Wait-For Graphs Wait-For Graphs Draw arrow from p i to p j if p j has received Draw arrow from p i to p j if p j has received a request but has not responded yet a request but has not responded yet Cycle in WFG Deadlock deadlock cycle in WFG

The protocol An execution sends a message to... On receipt of s message, replies with its state and wait-for info p i An execution An execution Ghost Deadlock!

Houston, we have a problem... Asynchronous system no centralized clock, etc. etc. Synchrony useful to coordinate actions order events Mmmmhhh... Events and Histories Processes execute sequences of events Events can be of 3 types: local, send, and receive e i p is the i-th event of process p h p The local history of process p is the sequence of events executed by process p h k p h 0 p : prefix that contains first k events : initial, empty sequence The history H is the set h p0 h p1... h pn 1 NOTE: In H, local histories are interpreted as sets, rather than sequences, of events Ordering events Observation 1: Events in a local history are totally ordered Happened-before (Lamport[1978]) A binary relation defined over events p i 1. if e k i, e l i h i and k < l, then e k i el i Observation 2: For every message m, send(m) precedes receive(m) 2. if e i = send(m) and e j = receive(m), then e i e j p i m 3. if e e and e e then e e p j

Space-Time diagrams Space-Time diagrams A graphic representation of a distributed execution A graphic representation of a distributed execution Space-Time diagrams Space-Time diagrams A graphic representation of a distributed execution A graphic representation of a distributed execution

Space-Time diagrams Space-Time diagrams A graphic representation of a distributed execution A graphic representation of a distributed execution H and impose a partial order H and impose a partial order Space-Time diagrams Space-Time diagrams A graphic representation of a distributed execution A graphic representation of a distributed execution H and impose a partial order H and impose a partial order

Runs and Consistent Runs A run is a total ordering of the events in H that is consistent with the local histories of the processors Ex: h 1, h 2,..., h n is a run A run is consistent if the total order imposed in the run is an extension of the partial order induced by A single distributed computation may correspond to several consistent runs! Cuts A cut C is a subset of the global history of H C = h c 1 1 hc 2 2... hc n n Cuts Global states and cuts A cut C is a subset of the global history of H The frontier of C is the set of events e c 1 1, ec 2 2,... ec n n C = h c 1 1 hc 2 2... hc n n The global state of a distributed computation is an n-tuple of local states Σ = (σ 1,... σ n ) To each cut (c 1... c n ) corresponds a global state (σ c 1 1,... σc n n )

Consistent cuts and consistent global states What sees A cut is consistent if e i, e j : e j C e i e j e i C A consistent global state is one corresponding to a consistent cut What sees Our task Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by but not the corresponding send event Develop a protocol by which a processor can build a consistent global state Informally, we want to be able to take a snapshot of the computation Not obvious in an asynchronous system...

Our approach Snapshot I Develop a simple synchronous protocol Refine protocol as we relax assumptions Record: processor states channel states Assumptions: FIFO channels Each m stamped with with T (send(m)) i. selects t ss ii. sends take a snapshot at to all processes iii. when clock of p i reads t ss then p a. records its local state σ i b. starts recording messages received on each of incoming channels c. stops recording a channel when it receives first message with stamp greater than or equal to t ss t ss Snapshot I Correctness i. selects t ss Theorem Snapshot I produces a consistent cut ii. sends take a snapshot at to all processes iii. when clock of p i reads t ss then p a. records its local state σ i b. sends an empty message along its outgoing channels c. starts recording messages received on each of incoming channels d. stops recording a channel when it receives first message with stamp greater than or equal to t ss t ss Proof Need to prove e j C e i e j e i C < Definition > < 0 and 1> 0. e j C T (e j ) < t ss 3. T (e j ) < t ss < Assumption > 1. e j C < Property of real > 4. e i e j T (e i ) < T (e j ) < Assumption > < 2 and 4> 2. e i e j 5. T (e i ) < T (e j ) < 5 and 3> 6. T (e i ) < t ss < Definition > 7. e i C

Clock Condition Lamport Clocks Each process maintains a local variable LC LC(e) value of LC for event e p e i p e i+1 p LC(e i p) < LC(e i+1 p ) < Property of real > 4. e i e j T (e i ) < T (e j ) p e i p Can the Clock Condition be implemented some other way? q e j q LC(e i p) < LC(e j q) Increment Rules Space-Time Diagrams and Logical Clocks p e i p e i+1 p LC(e i+1 p ) = LC(e i p) + 1 2 3 6 7 8 p e i p 1 7 8 q e j q LC(e j q) = max(lc(e j 1 q ), LC(e i p)) + 1 4 5 6 9 Timestamp m with T S(m) = LC(send(m))

A subtle problem An obvious problem when LC = t do S doesn t make sense for Lamport clocks! there is no guarantee that LC will ever be t S is anyway executed after LC = t t ss No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks Fixes: if e is internal/send and LC = t 2 execute e and then S if e = receive(m) (T S(m) t) (LC t 1) put message back in channel re-enable e ; set LC = t 1; execute S An obvious problem An obvious problem t ss No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks mmmmhhhh... t ss No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks mmmmhhhh... Doing so assumes upper bound on message delivery upper bound relative process speeds We better relax it...

Snapshot II Relaxing synchrony processor selects Ω sends take a snapshot at Ω to all processes and sets its logical clock to Ω p i when clock of reads Ω then records its local state σ i sends an empty message along its outgoing channels starts recording messages received on each incoming channel stops recording a channel when receives first message with stamp greater than or equal to Ω p i p i take a snapshot at Ω Process does nothing for the protocol during this! empty message: T S(m) Ω records local state σ i sends empty message: T S(m) Ω monitors channels Use empty message to announce snapshot! Snapshot III Snapshots: a perspective processor sends itself take a snapshot when p i receives take a snapshot for the first from p j : records its local state σ i sends take a snapshot along its outgoing channels sets channel from p j to empty starts recording messages received over each of its other incoming channels when p i receives take a snapshot beyond the first from p k : stops recording channel from p k p i when has received take a snapshot on all channels, it sends collected state to and stops. The global state saved by the snapshot protocol is a consistent global state

Snapshots: a perspective Snapshots: a perspective The global state saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that could have occurred The global state saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that could have occurred We are evaluating predicates on states that may have never occurred!

Σ 10 Σ 11 Σ 11

Σ 12 Σ 21 Σ 12 Σ 21 Σ 22 Σ 12

Σ 21 Σ 22 Σ 12 Σ 21 Σ 22 Σ 12 Σ 32 Σ 42 Σ 32 Σ 32 Σ 42 Σ 21 Σ 22 Σ 12 Σ 41 Σ 63 Σ 31 Σ 42 53 Σ Σ 64 Σ 21 Σ 22 Σ 32 Σ 43 Σ 33 Σ 54 Σ 65 Σ 12 Σ 13 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 55 Σ 45 Σ 03 Σ 04 Σ 14

Reachability Reachability Σ kl is reachable from if there is a path from Σ kl to Σ ij in the lattice Σ ij Σ 41 Σ 63 Σ 31 Σ 42 53 Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 54 Σ 45 Σ 55 Σ 65 Σ 03 Σ 04 Σ 14 Σ kl is reachable from if there is a path from Σ kl to Σ ij in the lattice Σ ij Σ 41 Σ 63 Σ 31 Σ 42 53 Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 54 Σ 45 Σ 55 Σ 65 Σ 03 Σ 04 Σ 14 Reachability Reachability Σ kl is reachable from if there is a path from Σ kl to Σ ij in the lattice Σ ij Σ 41 Σ 63 Σ 31 Σ 42 53 Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 54 Σ 45 Σ 55 Σ 65 Σ 03 Σ 04 Σ 14 Σ kl is reachable from if there is a path from Σ kl to Σ ij in the lattice Σ ij Σ kl Σ ij Σ 41 Σ 63 Σ 31 Σ 42 53 Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 54 Σ 45 Σ 55 Σ 65 Σ 03 Σ 04 Σ 14

So, why do we care about again? Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in and terminates in, then Σ i R Σ f Σ i Σ f So, why do we care about again? Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in and terminates in, then Σ i R Σ f Σ i Σ f Deadlock in implies deadlock in Σ f So, why do we care about again? Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in and terminates in, then Σ i R Σ f Σ i Σ f Deadlock in implies deadlock in Σ f No deadlock in implies no deadlock in Σ i