Distributed Systems Fundamentals

Similar documents
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong

Slides for Chapter 14: Time and Global States

Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H

Agreement. Today. l Coordination and agreement in group communication. l Consensus

Time in Distributed Systems: Clocks and Ordering of Events

Our Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering

Clocks in Asynchronous Systems

Logical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation

Chapter 11 Time and Global States

A subtle problem. An obvious problem. An obvious problem. An obvious problem. No!

CptS 464/564 Fall Prof. Dave Bakken. Cpt. S 464/564 Lecture January 26, 2014

416 Distributed Systems. Time Synchronization (Part 2: Lamport and vector clocks) Jan 27, 2017

Time is an important issue in DS

Distributed Computing. Synchronization. Dr. Yingwu Zhu

Figure 10.1 Skew between computer clocks in a distributed system

Absence of Global Clock

DISTRIBUTED COMPUTER SYSTEMS

Distributed Systems Principles and Paradigms

Distributed Algorithms Time, clocks and the ordering of events

Time, Clocks, and the Ordering of Events in a Distributed System

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

CS505: Distributed Systems

Causality and Time. The Happens-Before Relation

CS505: Distributed Systems

Causality and physical time

Ordering and Consistent Cuts Nicole Caruso

Time. Today. l Physical clocks l Logical clocks

Chandy-Lamport Snapshotting

Today. Vector Clocks and Distributed Snapshots. Motivation: Distributed discussion board. Distributed discussion board. 1. Logical Time: Vector clocks

Time. To do. q Physical clocks q Logical clocks

Distributed Systems 8L for Part IB

Distributed Systems. 06. Logical clocks. Paul Krzyzanowski. Rutgers University. Fall 2017

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG

Snapshots. Chandy-Lamport Algorithm for the determination of consistent global states <$1000, 0> <$50, 2000> mark. (order 10, $100) mark

Distributed systems Lecture 4: Clock synchronisation; logical clocks. Dr Robert N. M. Watson

CS 347 Parallel and Distributed Data Processing

CS5412: REPLICATION, CONSISTENCY AND CLOCKS

Recap. CS514: Intermediate Course in Operating Systems. What time is it? This week. Reminder: Lamport s approach. But what does time mean?

MAD. Models & Algorithms for Distributed systems -- 2/5 -- download slides at

Clock Synchronization

TECHNICAL REPORT YL DISSECTING ZAB

An Indian Journal FULL PAPER. Trade Science Inc. A real-time causal order delivery approach in largescale ABSTRACT KEYWORDS

Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour

Abstract. The paper considers the problem of implementing \Virtually. system. Virtually Synchronous Communication was rst introduced

Distributed Mutual Exclusion Based on Causal Ordering

Causal Broadcast Seif Haridi

requests/sec. The total channel load is requests/sec. Using slot as the time unit, the total channel load is 50 ( ) = 1

1 Introduction During the execution of a distributed computation, processes exchange information via messages. The message exchange establishes causal

arxiv: v1 [cs.dc] 22 Oct 2018

cs/ee/ids 143 Communication Networks

Appendix A Prototypes Models

S1 S2. checkpoint. m m2 m3 m4. checkpoint P checkpoint. P m5 P

arxiv: v1 [cs.dc] 8 Mar 2015

Strong Performance Guarantees for Asynchronous Buffered Crossbar Schedulers

1 Introduction During the execution of a distributed computation, processes exchange information via messages. The message exchange establishes causal

Simulation & Modeling Event-Oriented Simulations

Networked Control Systems

Distributed Systems Time and Global State

6.852: Distributed Algorithms Fall, Class 10

A VP-Accordant Checkpointing Protocol Preventing Useless Checkpoints

Causal Consistency for Geo-Replicated Cloud Storage under Partial Replication

Design of a Sliding Window over Asynchronous Event Streams

MAD. Models & Algorithms for Distributed systems -- 2/5 -- download slides at h9p://people.rennes.inria.fr/eric.fabre/

7680: Distributed Systems

These are special traffic patterns that create more stress on a switch

Efficient Notification Ordering for Geo-Distributed Pub/Sub Systems

Efficient Dependency Tracking for Relevant Events in Concurrent Systems

Byzantine Agreement. Gábor Mészáros. CEU Budapest, Hungary

Genuine atomic multicast in asynchronous distributed systems

Coordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.

Actors for Reactive Programming. Actors Origins

Early consensus in an asynchronous system with a weak failure detector*

Intro Refresher Reversibility Open networks Closed networks Multiclass networks Other networks. Queuing Networks. Florence Perronnin

NICTA Short Course. Network Analysis. Vijay Sivaraman. Day 1 Queueing Systems and Markov Chains. Network Analysis, 2008s2 1-1

Routing Algorithms. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

TIME BOUNDS FOR SHARED OBJECTS IN PARTIALLY SYNCHRONOUS SYSTEMS. A Thesis JIAQI WANG

Do we have a quorum?

On Equilibria of Distributed Message-Passing Games

Congestion Control. Need to understand: What is congestion? How do we prevent or manage it?

The Weakest Failure Detector to Solve Mutual Exclusion

Exclusive Access to Resources in Distributed Shared Memory Architecture

Computer Networks ( Classroom Practice Booklet Solutions)

Wireless Internet Exercises

cs/ee/ids 143 Communication Networks

Online Packet Routing on Linear Arrays and Rings

A Calculus for Concurrent Update

Shared Memory vs Message Passing

Early stopping: the idea. TRB for benign failures. Early Stopping: The Protocol. Termination

Congestion Control. Topics

Lecture on Sensor Networks

On queueing in coded networks queue size follows degrees of freedom

Asynchronous group mutual exclusion in ring networks

Interplay of security and clock synchronization"

Fast and Accurate Transaction-Level Model of a Wormhole Network-on-Chip with Priority Preemptive Virtual Channel Arbitration

Causality & Concurrency. Time-Stamping Systems. Plausibility. Example TSS: Lamport Clocks. Example TSS: Vector Clocks

Distributed Algorithms

Efficient Dependency Tracking for Relevant Events in Shared-Memory Systems

Akihito NAKAMURA and Makoto TAKIZAWA. Tokyo Denki University. Ishizaka, Hatoyama, Hiki-gun, Saitama , JAPAN

Random Access Protocols ALOHA

CS505: Distributed Systems

Transcription:

February 17, 2000 ECS 251 Winter 2000 Page 1 Distributed Systems Fundamentals 1. Distributed system? a. What is it? b. Why use it? 2. System Architectures a. minicomputer mode b. workstation model c. processor pool 3. Issues a. global knowledge b. naming c. scalability d. compatibility e. process synchronization, communication f. security g. structure 4. Networks a. goals b. message, packet, subnet, session c. switching: circuit, store-and-forward, message, packet, virtual circuit, dynamic routing d. OSI model: PDUs, layering i. physical: ethernet, aloha, etc. ii. data link layer: frames, parity checks, link encryption iii. network layer: virtual circult vs. datagram, routing via flooding, static routes, dynamic routes, centralized routing vs. distributed routing; congestion solutions (packet discarding, isarithmic, choke packets) iv. transport: services provided (UDP vs. TCP), functions to higher layers, addressing schemes (flat, DNS, etc.), gateway fragmentation and reassembly v. session: adds session characteristics like authentication vi. presentation: compression, end-to-end encryption, virtual terminal vii. application: user-level programs 5. Clocks a. happened-before relation b. Lamport s distributed clocks: a b means C(a) < C(b) c. where C(a) < C(b) does not mean a b d. Vector clocks and causal relation e. ordering of messages so you receive them in the order sent i. why ii. for broadcast (ISIS): Birman-Schiper-Stephenson iii. for point to point: Schiper-Eggli-Sandoz 6. Global state a. Show problem of slicing state when something is in transit b. Define local state; send(m ij ) LS i iff time of send(m ij ) < current time of LS i ; similar for receive c. transit(ls i, LS j ); inconsistent(ls i, LS j ); consistent state is one with inconsistent set empty for all pairs LS i, LS j d. Consistent global state: Chandry-Lamport 7. Termination detection a. Haung

February 17, 2000 ECS 251 Winter 2000 Page 2 Lamport s Clocks Lamport s clocks keep a virtual time among distributed systems. The goal is to provide an ordering upon events within the system. Notation P i process C i. clock associated with process P i 1. Increment clock C i between any two successive events in process P i : C i C i + d (d > 0) 2. Let event a be the sending of a message by process P i ; it is given the timestamp t a = C i (a). Let b be the receipt of that message by P j. Then when P j receives the message, C j max(c j, t a + d) (d > 0) P 1 e11 e12 e13 P 2 e21 e22 e23 e24 P 3 e31 e32 Assume all clocks start at 0, and d is 1 (that is, each event incrememts the clock by 1). At event e 12, C 1 (e 12 ) = 2. Event e 12 is the sending of a message to P 2. When P 2 receives the message (event e 23 ), its clock C 2 = 2. The clock is reset to 3. Event e 24 is P 2 s sending a message to P 3. That message is received at e 32. C 3 is 1 (as one event has passed). By rule 2, C 3 is reset to the maximum of C 2 (c 24 )+1 and the current value of C 3, so C 3 becomes 5. Problem Clearly, if a b, then C(a) < C(b). But if C(a) < C(b), does a b? The answer, surprisingly, is not necessarily. In the above example, C 3 (e 31 ) = 1 < 2 = C 1 (e 12 ). But e 31 and e 12 are causally unrelated; that is, e 31 / e 12. However, C 1 (e 11 ) < C 3 (e 32 ), and clearly e 11 e 32. Hence one cannot say one way or the other.

February 17, 2000 ECS 251 Winter 2000 Page 3 Vector Clocks This is based upon Lamport s clocks, but each process keeps track of what is believes the other processes interrnal clocks are (hence the name, vector clocks). The goal is to provide an ordering upon events within the system. Notation n processes P i process C i. vector clock associated with process P i ; jth element is C i [j] and contains P i s latest value for the current time in process P k. 1. Increment clock C i between any two successive events in process P i : C i [i] C i [i] + d (d > 0) 2. Let event a be the sending of a message by process P i ; it is given the vector timestamp t a = C i (a). Let b be the receipt of that message by P j. Then when P j receives the message, it updates its vector clock for all k = 1,, n: C j [k] max(c j [k], t a [k] + d) (d > 0) P 1 e11 e12 e13 P 2 e21 e22 e23 e24 P 3 e31 e32 Here is the progression of time for the three processes: e 11 : C 1 = (1, 0, 0) e 31 : C 3 = (0, 0, 1) e 21 : C 2 = (0, 0, 1) as t a = C 3 (e 31 ) = (0, 0, 1) and previously, C 3 was (0, 0, 1) e 22 : C 2 = (0, 1, 1) e 12 : C 1 = (2, 0, 0) e 23 : C 2 = (2, 1, 1) as t a = C 1 (e 12 ) = (2, 0, 0) and previously, C 2 was (0, 1, 1) e 24 : C 2 = (2, 2, 1) e 13 : C 1 = (2, 1, 1) as t a = C 2 (e 22 ) = (0, 1, 1) and previously, C 1 was (2, 0, 0) e 32 : C 3 = (2, 2, 1) as t a = C 2 (e 24 ) = (2, 2, 1) and previously, C 3 was (0, 0, 1) Notice that C 1 (e 11 ) < C 3 (e 32 ), so e 11 e 32, but C 1 (e 11 ) and C 3 (e 31 ) are incomparable, so e 11 and e 31 are concurrent.

February 17, 2000 ECS 251 Winter 2000 Page 4 Birman-Schiper-Stephenson The goal of this protocol is to preserve ordering in the sending of messages. For example, if send(m 1 ) send(m 2 ), then for all processes that receive both m 1 and m 2, receive(m 1 ) receive(m 2 ). The basic idea is that m 2 is not given to the process until m 1 is given. This means a buffer is needed for pending deliveries. Also, each message has an associated vector that contains information for the recipient to determine if another message preceded it. Also, we shall assume all messages are broadcast. Clocks are updated only when messages are sent. Notation n processes P i process C i. vector clock associated with process P i ; jth element is C i [j] and contains P i s latest value for the current time in process P k. t m vector timestamp for message m (stamped after local clock is incremented) P i sends a message to P j 1. P i increments C i [i] and sets the timestamp t m = C i [i] for message m. P j receives a message from P i 1. When P j, j i, receives m with timestamp t m, it delays the message s delivery until both: a. C j [i] = t m [i] 1; and b. for all k n and k i, C j [k] t m [k]. 2. When the message is delivered to P j, update P j s vector clock 3. Check buffered messages to see if any can be delivered. P 1 e11 e12 P 2 e21 e22 P 3 e31 e32 Here is the protocol applied to the above situation: e 31 : P 3 sends message a; C 3 = (0, 0, 1); t a = (0, 0, 1) e 21 : P 2 receives message a. As C 2 = (0, 0, 0), C 2 [3] = t a [3] 1 = 1 1 = 0 and C 2 [1] t a [1] and C 2 [2] t a [2] = 0. So the message is accepted, and C 2 is set to (0, 0, 1) e 11 : P 1 receives message a. As C 1 = (0, 0, 0), C 1 [3] = t a [3] 1 = 1 1 = 0 and C 1 [1] t a [1] and C 1 [2] t a [2] = 0. So the message is accepted, and C 1 is set to (0, 0, 1) e 22 : P 2 sends message b; C 2 = (0, 1, 1); t b = (0, 1, 1)

February 17, 2000 ECS 251 Winter 2000 Page 5 e 12 : P 1 receives message b. As C 1 = (0, 0, 1), C 1 [2] = t b [2] 1 = 1 1 = 0 and C 1 [1] t b [1] and C 1 [3] t b [2] = 0. So the message is accepted, and C 1 is set to (0, 1, 1) e 32 : P 3 receives message b. As C 3 = (0, 0, 1), C 3 [2] = t b [2] 1 = 1 1 = 1 and C 1 [1] t b [1] and C 1 [3] t b [2] = 0. So the message is accepted, and C 3 is set to (0, 1, 1) Now, suppose t a arrived as event e 12, and t b as event e 11. Then the progression of time in P 1 goes like this: e 11 : P 1 receives message b. As C 1 = (0, 0, 0), C 1 [2] = t b [2] 1 = 1 1 = 0 and C 1 [1] t b [1], but C 1 [3] < t b [3], so the message is held until another message arrives. The vector clock updating algorithm is not run. e 12 : P 1 receives message a. As C 1 = (0, 0, 0), C 1 [3] = t a [3] 1 = 1 1 = 0, C 1 [1] t a [1], and C 1 [2] t a [2]. The message is accepted and C 1 is set to (0, 0, 1). Now the queue is checked. As C 1 [2] = t b [2] 1 = 1 1 = 0, C 1 [1] t b [1], and C 1 [3] t b [3], that message is accepted and C 1 is set to (0, 1, 1).

February 17, 2000 ECS 251 Winter 2000 Page 6 Schiper-Eggli-Sandoz The goal of this protocol is to ensure that messages are given to the receiving processes in order of sending. Unlike the Birman-Schiper-Stephenson protocol, it does not require using broadcast messages. Each message has an associated vector that contains information for the recipient to determine if another message preceded it. Clocks are updated only when messages are sent. Notation n processes P i process C i. vector clock associated with process P i ; jth element is C i [j] and contains P i s latest value for the current time in process P k. t m vector timestamp for message m (stamped after local clock is incremented) t i current time at process P i V i vector of P i s previously sent messages; V i [j] = t m, where P j is the destination process and t m the vector timestamp of the message; V i [j][k] is the kth component of V i [j]. V m vector accompanying message m P i sends a message to P j 1. P i sends message m, timestamped t m, and V i, to process P j. 2. P i sets V i [j] = t m. P j receives a message from P i 1. When P j, j i, receives m, it delays the message s delivery if both: a. V m [j] is set; and b. V m [j] < t j 2. When the message is delivered to P j, update all set elements of V j with the corresponding elements of V m, except for Vj[j], as follows: a. If Vj[k] and Vm[k] are uninitialized, do nothing. b. If Vj[k] is uninitialized and Vm[k] is initialized, set Vj[k] = Vm[k]. c. If both Vj[k] and Vm[k] are initialized, set Vj[k][k ] = max(vj[k][k ], Vm[k][k ]) for all k = 1,, n 3. Update P j s vector clock. 4. Check buffered messages to see if any can be delivered.

February 17, 2000 ECS 251 Winter 2000 Page 7 P 1 e11 e12 e13 P 2 e21 e22 e23 P 3 e31 e32 Here is the protocol applied to the above situation: e 31 : P 3 sends message a to P 2. C 3 = (0, 0, 1); t a = (0, 0, 1), V a = (?,?,?); V 3 = (?, (0, 0, 1),?) e 21 : P 2 receives message a from P 1. As V a [2] is uninitialized, the message is accepted. V 2 is set to (?,?,?) and C 2 is set to (0, 0, 1). e 22 : P 2 sends message b to P 1. C 2 = (0, 1, 1); t b = (0, 1, 1), V b = (?,?,?); V 2 = ((0, 1, 1),?,?) e 11 : P 1 sends message c to P 3. C 1 = (1, 0, 0); t c = (1, 0, 0), V c = (?,?,?); V 1 = (?,?, (1, 0, 0)), e 12 : P 1 receives message b from P 2. As V b [1] is uninitialized, the message is accepted. V 1 is set to (?,?,?) and C 1 is set to (1, 1, 1). e 32 : P 3 receives message c from P 1. As V c [3] is uninitialized, the message is accepted. V 3 is set to (?,?,?) and C 3 is set to (1, 0, 1). e11 e12 e13 P e 23 : P 2 sends 1 message d to P 1. C 2 = (0, 2, 1); t d = (0, 2, 1), V d = ((0, 1, 1),?,?); V 2 = ((0, 2, 1),?, (0, 0, 1)), e 13 : P 1 receives message d from P 2. As V d [1] < C 1 [1], so the message is accepted. V 1 is set to ((0, 1, 1),?,?) and C 1 is e21 e22 e23 e24 set to (1, P2, 2 1). Now, suppose t b arrived as event e 13, and t d as event e 12. Then the progression in P 1 goes like this: e 12 : P 1 receives message d from P 2. But V d [1] = (0, 1, 1) </ (1, 0, 0) = C 1, so the message is queued for later delivery. e31 e 13 : P 1 receives message b from P 2. As V b e32 P 3 [1] is uninitialized, the message is accepted. V 1 is set to (?,?,?) and C 1 is set to (1, 1, 1). The message on the queue is now checked. As V d [1] = (0, 1, 1) < (1, 1, 1) = C 1, the message is now accepted. V 1 is set to ((0, 1, 1),?,?) and C 1 is set to (1, 2, 1).

February 17, 2000 ECS 251 Winter 2000 Page 8 Chandy-Lamport Global State Recording The goal of this distributed algorithm is to capture a consistent global state. It assumes all communication channels are FIFO. It uses a distinguished message called a marker to start the algorithm. P i sends marker 1. Pi records its local state 2. For each C ij on which P i has not already sent a marker, P i sends a marker before sending other messages. P i receives marker from Pj 1. If P i has not recorded its state: a. Record the state of C ji as empty b. Send the marker as described above 2. If P i has recorded its state LS i a. Record the state of C ji to be the sequence of messages received between the computation of LS i and the marker from C ji. s1 s2 Here, all processes are connected by communications channels Cij. Messages being sent over the channels are represented by arrows between the processes. Snapshot s 1 : P 1 records LS 1, sends markers on C 12 and C 13 P 2 receives marker from P 1 on C 12 ; it records its state LS 2, records state of C 12 as empty, and sends marker on C 21 and C 23 P 3 receives marker from P 1 on C 13 ; it records its state LS 3, records state of C 13 as empty, and sends markers on C 31 and C 32. P 1 receives marker from P 2 on C 21 ; as LS 1 is recorded, it records the state of C 21 as empty. P 1 receives marker from P 3 on C 31 ; as LS 1 is recorded, it records the state of C 31 as empty. P 2 receives marker from P 3 on C 32 ; as LS 2 is recorded, it records the state of C 32 as empty. P 3 receives marker from P 2 on C 23 ; as LS 3 is recorded, it records the state of C 23 as empty. Snapshot s 2 : now a message is in transit on C 12 and C 21. P 1 records LS 1, sends markers on C 12 and C 13 P 2 receives marker from P 1 on C 12 after the message from P 1 arrives; it records its state LS 2, records state of C 12 as empty, and sends marker on C 21 and C 23 P 3 receives marker from P 1 on C 13 ; it records its state LS 3, records state of C 13 as empty, and sends markers on C 31 and C 32.

February 17, 2000 ECS 251 Winter 2000 Page 9 P 1 receives marker from P 2 on C 21 ; as LS 1 is recorded, and a message has arrived since LS 1 was recorded, it records the state of C 21 as containing that message. P 1 receives marker from P 3 on C 31 ; as LS 1 is recorded, it records the state of C 31 as empty. P 2 receives marker from P 3 on C 32 ; as LS 2 is recorded, it records the state of C 32 as empty. P 3 receives marker from P 2 on C 23 ; as LS 3 is recorded, it records the state of C 23 as empty.

February 17, 2000 ECS 251 Winter 2000 Page 10 Huang s Termination Detection The goal of this protocol is to detect when a distributed computation terminates. Notation n processes P i process; without loss of generality, let P 0 be the controlling agent W i. weight of process P i ; initially, W 0 = 1 and for all other i, W i = 0. B(W) computation message with assigned weight W C(W) control message sent from process to controlling agent with assigned weight W P i sends a computation message to P j 1. Set W i and W j to values such that W i + W j = W i, W i > 0, W j > 0. (W i is the new weight of P i.) 2. Send B(W j ) to P j P j receives a computation message B(W) from P i 1. W j = W j + W 2. If P j is idle, P j becomes active P i becomes idle: 1. Send C(W i ) to P 0 2. W i = 0 3. P i becomes idle P i receives a control message C(W): 1. W i = W i + W 2. If W i = 1, the computation has completed. P1 The picture shows a process P 0, designated the controlling agent, with W 0 = 1. It asks P 1 and P 2 to do some computation. It sets W 1 to 0.2, W 2 to P3 0.3, and W 3 to 0.5. P 2 in turn asks P 3 and P 4 to do some computations. It sets W 3 to 0.1 and W 4 to 0.1. P0 P2 When P 3 terminates, it sends C(W 3 ) = C(0.1) to P 2, which changes W 2 to 0.1 + 0.1 = 0.2. P4 When P 2 terminates, it sends C(W 2 ) = C(0.2) to P 0, which changes W 0 to 0.5 + 0.2 = 0.7. When P 4 terminates, it sends C(W 4 ) = C(0.1) to P 0, which changes W 0 to 0.7 + 0.1 = 0.8. When P 1 terminates, it sends C(W 1 ) = C(0.2) to P 0, which changes W 0 to 0.8 + 0.2 = 1. P 0 thereupon concludes that the computation is finished. Total number of messages passed: 8 (one to start each computation, one to return the weight).