Ordering and Consistent Cuts Nicole Caruso

Ordering and Consistent Cuts Nicole Caruso Cornell University Dept. of Computer Science

Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport Stanford Research Institute

About the Author Leslie Lamport Stanford Research Institute

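Introduction. Our concept of time versus a distributed system's concept of time. [Figure: two space-time diagrams with the actors Person, Cars, and Observer and the events "See green light", "Cross street", and "Read news"; in the distributed-system diagram the first event appears as "See green light?".]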

Introduction Coordination of Distributed Systems Lack of Understanding Partial Ordering of Events Total Ordering of Events

Outline Partial Ordering of Events Logical Clocks Total Ordering of Events Anomalous Behavior Physical Clocks

Partial Ordering of Events: System Definition. The system contains spatially separated processes. Each process consists of a sequence of events. What counts as an event is arbitrary, but sending a message and receiving a message must each be events.

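Partial Ordering of Events: Mathematical Properties. Antisymmetric: if $a \le b$ and $b \le a$, then $a = b$. Transitive: if $a \le b$ and $b \le c$, then $a \le c$. Reflexive: $a \le a$. For the strict form $<$, the corresponding properties are asymmetry (if $a < b$, then $b \not< a$) and irreflexivity ($a \not< a$).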

Partial Ordering of Events: Happened-Before Relation ($\to$). If $a$ and $b$ are in the same process and $a$ occurs before $b$, then $a \to b$. If $a$ is the sending of a message by one process and $b$ is the receipt of that message by another process, then $a \to b$. Transitive: if $a \to b$ and $b \to c$, then $a \to c$. Asymmetric: if $a \to b$, then $b \not\to a$. Irreflexive: $a \not\to a$. Events $a$ and $b$ are concurrent if $a \not\to b$ and $b \not\to a$.

Partial Ordering of Events: Typical Space-Time Diagram for a Distributed System. [Figure: space-time diagram with the process lines Person, Cars, and Observer and the events "See green light", "Cross street", and "Read news".]

Logical Clocks. Process clock $C_i$: belongs to one process $P_i$ and assigns a number $C_i(a)$ to each event $a$ in $P_i$ to represent its time. System clock $C$: $C(a) = C_i(a)$ if $a$ is an event in $P_i$.

Logical Clocks: Clock Condition. If $a \to b$, then $C(a) < C(b)$. If $a$ and $b$ are events in the same process $P_i$ and $a$ occurs before $b$, then $C_i(a) < C_i(b)$. If $a$ is the sending of a message by $P_i$ and $b$ is the receipt of that message by $P_j$, then $C_i(a) < C_j(b)$.
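
The Clock Condition can be met with two simple update rules: a process increments its clock between successive events, and a receiver advances its clock past the timestamp carried by the message. The following is a minimal sketch of that idea; the class name `LamportClock` and its method names are illustrative assumptions, not taken from the slides.

```python
class LamportClock:
    """Logical clock for a single process (hypothetical helper class)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event: increment the clock so successive events
        # in the same process get increasing numbers.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the message carries its timestamp.
        return self.tick()

    def receive(self, msg_time):
        # Advance past both the local clock and the message timestamp,
        # so C(send) < C(receive) always holds.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
a = p.tick()       # local event in P
t = p.send()       # send event in P
b = q.receive(t)   # receive event in Q
print(a, t, b)     # 1 2 3
```

Here `a`, the send, and `b` are related by happened-before, and their clock values increase in the same order.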

Total Ordering of Events. A total ordering eliminates concurrency: every pair of events is ordered. Each message is identified with the event that sends it. The following example has multiple processes competing for a single resource and is told from the point of view of one process $P_i$.
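
Lamport's paper obtains a total order by sorting events on their logical timestamps and breaking ties with an arbitrary fixed ordering of the processes. The sketch below assumes that integer process ids serve as that fixed ordering; the event tuples are made-up illustration data.

```python
# Each event is (timestamp, process_id, label). The data is hypothetical.
events = [
    (2, 1, "P1 sends"),
    (1, 2, "P2 local"),
    (2, 0, "P0 local"),
    (3, 2, "P2 receives"),
]

# Sort by timestamp first; equal timestamps are ordered by process id.
total_order = sorted(events, key=lambda e: (e[0], e[1]))
labels = [label for _, _, label in total_order]
print(labels)
```

The two events with timestamp 2 are concurrent under the partial order; the process-id tie-break is what forces them into a definite sequence.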

Total Ordering of Events. Process $P_i$ is granted the resource when two conditions hold. First, Request $T_m : P_i$ is in $P_i$'s request queue and is time-stamped before every other request in the queue. Second, $P_i$ has received a message (such as an acknowledgement) from every other process $P_j$ time-stamped later than $T_m$.

Total Ordering of Events: Step 1, $P_i$ Requests the Resource. $P_i$ sends Request $T_m : P_i$ to every other process $P_j$. $P_i$ puts Request $T_m : P_i$ on its own request queue. [Figure: processes $P_1$, $P_2$, $P_3$ with the queue entries $T_0 : P_1$ and $T_1 : P_1$; $P_1$ sends "request" messages.]

Total Ordering of Events: Step 2, $P_j$ Adds the Request. $P_j$ puts Request $T_m : P_i$ on its request queue. $P_j$ sends a time-stamped Acknowledgement to $P_i$. [Figure: processes $P_1$, $P_2$, $P_3$ with the queue entries $T_0 : P_1$ and $T_1 : P_1$; "ack" messages return to $P_1$.]

Total Ordering of Events: Step 3, $P_i$ Releases the Resource. $P_i$ removes Request $T_m : P_i$ from its request queue. $P_i$ sends a time-stamped Release message to each $P_j$. [Figure: processes $P_1$, $P_2$, $P_3$ with the queue entries $T_0 : P_1$ and $T_1 : P_1$; $P_1$ sends "release" messages.]

Total Ordering of Events: Step 4, $P_j$ Removes the Request. $P_j$ receives the Release message from $P_i$. $P_j$ removes Request $T_m : P_i$ from its request queue. [Figure: processes $P_1$, $P_2$, $P_3$.]
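
The four steps can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the presenter's code: messages are delivered immediately by direct method calls (which trivially gives the in-order delivery the algorithm relies on), each process has at most one outstanding request, and the names `Process`, `net`, and `may_enter` are hypothetical.

```python
import heapq


class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = 0
        self.queue = []                     # heap of (timestamp, pid) requests
        self.last_seen = {j: 0 for j in range(n) if j != pid}
        self.my_request = None

    def _stamp(self):
        self.clock += 1
        return self.clock

    def request(self, net):
        # Step 1: queue our own request and send it to every other process.
        self.my_request = (self._stamp(), self.pid)
        heapq.heappush(self.queue, self.my_request)
        for j in self.last_seen:
            net[j].on_message("request", self.my_request, net)

    def release(self, net):
        # Step 3: drop our request and tell every other process.
        self.queue.remove(self.my_request)
        heapq.heapify(self.queue)
        self.my_request = None
        stamp = (self._stamp(), self.pid)
        for j in self.last_seen:
            net[j].on_message("release", stamp, net)

    def on_message(self, kind, stamp, net):
        ts, sender = stamp
        self.clock = max(self.clock, ts) + 1
        self.last_seen[sender] = ts
        if kind == "request":
            # Step 2: queue the request and acknowledge it.
            heapq.heappush(self.queue, stamp)
            net[sender].on_message("ack", (self._stamp(), self.pid), net)
        elif kind == "release":
            # Step 4: remove the sender's request from the queue.
            self.queue = [r for r in self.queue if r[1] != sender]
            heapq.heapify(self.queue)

    def may_enter(self):
        # Granted when our request heads the queue and every other process
        # has sent us something time-stamped later than our request.
        return (self.my_request is not None
                and self.queue[0] == self.my_request
                and all(t > self.my_request[0] for t in self.last_seen.values()))


net = [Process(i, 3) for i in range(3)]
net[0].request(net)
net[1].request(net)
first = (net[0].may_enter(), net[1].may_enter())
net[0].release(net)
second = (net[0].may_enter(), net[1].may_enter())
print(first, second)
```

In this run process 0 requests first and is granted the resource; process 1 is granted it only after process 0 releases.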

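Anomalous Behavior. The system's ordering can disagree with the outside world because the two have different event sets and happened-before relations. Universal event set (system events together with relevant external events): $a$ happened before $b$. System event set $S$: $a \not\to b$ and $b \not\to a$, so the system treats them as concurrent. [Figure: two space-time diagrams over the processes $P_A$, $P_B$, $P_C$.]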

Anomalous Behavior: Strong Clock Condition. For any events $a$ and $b$ in the system event set $S$: if $a$ happened before $b$ in the universal event set, then $C(a) < C(b)$. This is attainable via physical clocks.

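Physical Clocks. $C_i(t)$ is the reading of physical clock $C_i$ at physical time $t$. PC1: there is a constant $\kappa \ll 1$ such that for all $i$: $|dC_i(t)/dt - 1| < \kappa$. PC2: there is a constant $\epsilon$ such that for all $i, j$: $|C_i(t) - C_j(t)| < \epsilon$. Also, let $\mu$ be less than the smallest transmission time.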

Physical Clocks. To prevent anomalous behavior, we must ensure that $C_j(t) < C_i(t + \mu)$ for all $i$, $j$, and $t$. How small must $\kappa$ and $\epsilon$ be? It suffices that $\epsilon / (1 - \kappa) < \mu$.
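
The bound on $\kappa$ and $\epsilon$ follows from PC1 and PC2 in two steps. The derivation below is a sketch that assumes the clocks are differentiable over the interval $[t, t + \mu]$, so that PC1 can be integrated.

```latex
\begin{align*}
C_i(t+\mu) - C_i(t) &> (1-\kappa)\,\mu
  && \text{PC1: each clock runs at a rate above } 1-\kappa \\
C_i(t) - C_j(t) &> -\epsilon
  && \text{PC2: any two clocks differ by less than } \epsilon \\
C_i(t+\mu) - C_j(t) &> (1-\kappa)\,\mu - \epsilon
  && \text{sum of the two lines above}
\end{align*}
```

The right-hand side of the last line is nonnegative as soon as $\epsilon / (1 - \kappa) \le \mu$, which gives $C_j(t) < C_i(t + \mu)$.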

Discussion Partial ordering Total ordering Anomalous Behavior Physical clocks

Conclusions Coordination of Distributed Systems Partial Ordering of Events Total Ordering of Events

Distributed Snapshots: Determining Global States of Distributed Systems K. Mani Chandy University of Texas at Austin Leslie Lamport Stanford Research Institute

About the Authors K. Mani Chandy University of Texas at Austin Leslie Lamport Stanford Research Institute

Introduction. A panoramic, dynamic scene cannot be captured with a single snapshot; multiple snapshots must be pieced together. Questions: How should the snapshots be taken? What criteria must the overall picture satisfy?

Introduction. Each process can record its own state. The states of all processes and channels form the global state. Goals: record a valid global system state and detect stable properties. A property $y$ is stable if $y(S) = \text{true}$ implies $y(S') = \text{true}$ for every global state $S'$ reachable from $S$.

Outline Distributed System Model Global State Detection Algorithm Recorded Global State Properties Stability Detection

Distributed System Model. A process is described by its states and events; a channel is described by the messages sent along it. An event is defined by: the process $P$ in which it occurs; the state $S$ of $P$ before the event; the state $S'$ of $P$ after the event; the channel $C$ (incoming to or outgoing from $P$) whose state the event changes; and the message (received by $P$ or sent from $P$) along $C$.

Distributed System Model: Example 1, Single-Token System. Processes P and Q pass one token over the channels C (from P to Q) and C' (from Q to P). Global states:
in-p: P: si, C: empty, C': empty, Q: so
in-c: P: so, C: token, C': empty, Q: so (after P sends)
in-q: P: so, C: empty, C': empty, Q: si (after Q receives)
in-c': P: so, C: empty, C': token, Q: so (after Q sends)

Distributed System Model: Example 2, Nondeterministic System. Global states and the events that lead to them:
S0: P: si, C: empty, C': empty, Q: si
S1: P: so, C: M, C': empty, Q: si (after P sends M)
S2: P: so, C: M, C': M', Q: so (after Q sends M')
S3: P: si, C: M, C': empty, Q: so (after P receives M')

Algorithm. Marker-sending rule for a process P: P records its state, then sends one marker along each outgoing channel C before sending any further messages along C. Marker-receiving rule for a process Q: on receiving a marker along incoming channel C, if Q has not yet recorded its state, Q records its state and records the state of C as the empty sequence; otherwise, Q records the state of C as the sequence of messages received along C after Q recorded its state and before Q received the marker.

Algorithm: Example, Process P Obtains the Global State from Process Q. Q receives P's marker along channel C; Q records its state. Computation continues. Q receives P's marker along channel C; Q records C's state.
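
The marker rules can be sketched with two processes that transfer money over FIFO channels. This is a minimal simulation under stated assumptions, not the authors' code: process state is a single integer balance, channels are in-memory queues, and the names `Node`, `MARKER`, `record`, and `deliver` are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.inbox = {}        # sender name -> FIFO channel into this node
        self.peers = {}        # receiver name -> Node
        self.recorded = None   # recorded local state, once taken
        self.chan_state = {}   # sender name -> recorded in-flight messages
        self.open = set()      # incoming channels still being recorded

    def connect(self, other):
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, to, amount):
        self.balance -= amount
        self.peers[to].inbox[self.name].append(amount)

    def record(self):
        # Marker-sending rule: record own state, then send a marker
        # on every outgoing channel.
        self.recorded = self.balance
        self.open = set(self.inbox)
        self.chan_state = {s: [] for s in self.inbox}
        for peer in self.peers.values():
            peer.inbox[self.name].append(MARKER)

    def deliver(self, sender):
        msg = self.inbox[sender].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded is None:
                self.record()          # this channel is recorded as empty
            self.open.discard(sender)  # recording of this channel is done
        else:
            self.balance += msg
            if sender in self.open:    # after recording, before the marker
                self.chan_state[sender].append(msg)


p, q = Node("P", 100), Node("Q", 50)
p.connect(q)
q.connect(p)
q.send("P", 20)    # 20 is in flight from Q to P
p.record()         # P initiates the snapshot
p.send("Q", 10)    # sent after the marker, so not part of the snapshot
p.deliver("Q")     # P receives 20 and records it as channel state
q.deliver("P")     # Q receives the marker and records its state
q.deliver("P")     # Q receives 10
p.deliver("Q")     # P receives Q's marker
total = p.recorded + q.recorded + sum(p.chan_state["Q"]) + sum(q.chan_state["P"])
print(p.recorded, q.recorded, p.chan_state, q.chan_state, total)
```

The recorded balances plus the recorded in-flight transfer add up to the 150 units the system started with, even though no single instant of the run had P at 100 and Q at 30 with exactly that channel content observed by anyone.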

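Recorded Global State Properties. Let $S_i$ be the global state when the algorithm starts and $S_o$ the global state when it terminates. The recorded state $S^*$ is assembled from concurrent events, so $S^*$ may not have occurred in the actual computation. Reordering those concurrent events has no effect on the preceding or following events. $S^*$ is reachable from $S_i$, and $S_o$ is reachable from $S^*$.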

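Recorded Global State Properties: Theorem 1. For a computation $(e_0, \dots, e_n)$ with global states $(S_0, \dots, S_n)$, there exists a computation $(e'_0, \dots, e'_n)$ with global states $(S'_0, \dots, S'_n)$ such that the following hold. Events: $(e'_0, \dots, e'_{i-1})$ equals $(e_0, \dots, e_{i-1})$; $(e'_i, \dots, e'_{o-1})$ is a permutation of $(e_i, \dots, e_{o-1})$; $(e'_o, \dots, e'_n)$ equals $(e_o, \dots, e_n)$. States: $(S'_0, \dots, S'_i)$ equals $(S_0, \dots, S_i)$; for some $k$ with $i \le k \le o$, $S'_k = S^*$; $(S'_o, \dots, S'_n)$ equals $(S_o, \dots, S_n)$.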

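Stability Detection. Algorithm: initialize definite = false; then record a global state $S^*$ and set definite = $y(S^*)$. Required output: $y(S_i)$ implies definite, and definite implies $y(S_o)$. Implications of definite: definite == false means the stable property did not hold when the algorithm started; definite == true means the stable property holds when the algorithm terminates. Correctness: $S^*$ is reachable from $S_i$ and $S_o$ is reachable from $S^*$; because $y$ is stable, $y(S_j)$ implies $y(S_{j+1})$ for all $j$.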
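
Stability detection reduces to evaluating the property on one recorded global state. The sketch below assumes a snapshot is available as a plain dictionary and uses termination (no busy process, no message in flight) as the stable property; `detect_stable`, `terminated`, and the dictionary layout are hypothetical names chosen for illustration.

```python
def detect_stable(y, record_global_state):
    """Return `definite`: the property y evaluated on a recorded state S*."""
    s_star = record_global_state()
    return y(s_star)


def terminated(state):
    # Stable property: once nothing is busy and nothing is in flight,
    # no later state can become active again.
    idle = all(not busy for busy in state["busy"].values())
    quiet = all(len(msgs) == 0 for msgs in state["channels"].values())
    return idle and quiet


# Two made-up recorded states standing in for the output of a snapshot.
running = {"busy": {"P": True, "Q": False}, "channels": {"PQ": [], "QP": ["m"]}}
done = {"busy": {"P": False, "Q": False}, "channels": {"PQ": [], "QP": []}}

r1 = detect_stable(terminated, lambda: running)
r2 = detect_stable(terminated, lambda: done)
print(r1, r2)   # False True
```

A result of `False` only says the property did not hold when the snapshot began, so a monitor would typically record a new state and evaluate again.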

Discussion Partial Ordering Recorded Global State Global State Detection Algorithm Stable Property Detection

Conclusions Processes form recorded global state Record its own state Piece together multiple records Questions addressed How should the snapshots be taken? What criteria must overall picture satisfy?