Distributed Systems Principles and Paradigms


Distributed Systems Principles and Paradigms, Chapter 6 (version April 7, 2008). Maarten van Steen, Vrije Universiteit Amsterdam, Faculty of Science, Dept. Mathematics and Computer Science. E-mail: steen@cs.vu.nl, URL: www.cs.vu.nl/~steen/
Outline: 1 Introduction, 2 Architectures, 3 Processes, 4 Communication, 5 Naming, 6 Synchronization, 7 Consistency and Replication, 8 Fault Tolerance, 9 Security, 10 Distributed Object-Based Systems, 11 Distributed File Systems, 12 Distributed Web-Based Systems, 13 Distributed Coordination-Based Systems

Chapter overview: clock synchronization, physical clocks, logical clocks, vector clocks.

Physical Clocks (1/3)
Problem: sometimes we simply need the exact time, not just an ordering.
Solution: Universal Coordinated Time (UTC):
- Based on the number of transitions per second of the cesium-133 atom (pretty accurate).
- At present, the real time is taken as the average of some 50 cesium clocks around the world.
- Introduces a leap second from time to time to compensate for the fact that days are getting longer.
UTC is broadcast through short-wave radio and satellite. Satellites can give an accuracy of about ±0.5 ms.

Physical Clocks (2/3)
Problem: suppose we have a distributed system with a UTC receiver somewhere in it; we still have to distribute its time to each machine.
Basic principle:
- Every machine has a timer that generates an interrupt H times per second.
- There is a clock in machine p that ticks on each timer interrupt. Denote the value of that clock by C_p(t), where t is UTC time.
- Ideally, we have that for each machine p, C_p(t) = t; in other words, dC/dt = 1.

Physical Clocks (3/3)
[Figure: clock time C plotted against UTC time t. A fast clock has dC/dt > 1, a perfect clock has dC/dt = 1, a slow clock has dC/dt < 1.]
In practice: 1 − ρ ≤ dC/dt ≤ 1 + ρ.
Goal: never let two clocks in any system differ by more than δ time units ⇒ synchronize at least every δ/(2ρ) seconds.
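Since two clocks can drift apart at up to 2ρ seconds per second (one fast, one slow), the resynchronization period follows directly. A minimal sketch of this arithmetic (function name is mine, not the book's):

```python
# Two clocks with maximum drift rate rho can diverge at up to 2*rho per
# second, so keeping any pair within delta requires resyncing this often.
def resync_interval(delta: float, rho: float) -> float:
    """Worst-case resynchronization period: delta / (2 * rho)."""
    return delta / (2 * rho)

# e.g. delta = 1 ms, rho = 1e-5 (clock gains or loses up to 10 us per second)
print(resync_interval(1e-3, 1e-5))  # 50.0 (seconds)
```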

Global Positioning System (1/2)
Basic idea: you can get an accurate account of the time as a side effect of GPS.
[Figure: position estimation by intersecting circles of known radius around reference points; the height dimension and one of the two intersection points are ignored.]
Problem: even assuming that the clocks of the satellites are accurate and synchronized,
- it takes a while before a signal reaches the receiver;
- the receiver's clock is definitely out of sync with the satellite's.

Global Positioning System (2/2)
- Δr is the unknown deviation of the receiver's clock.
- x_r, y_r, z_r are the unknown coordinates of the receiver.
- T_i is the timestamp on a message from satellite i.
- Δ_i = (T_now − T_i) + Δr is the measured delay of the message sent by satellite i.
- Measured distance to satellite i: c·Δ_i (c is the speed of light).
- Real distance: d_i = √((x_i − x_r)² + (y_i − y_r)² + (z_i − z_r)²)
4 satellites ⇒ 4 equations in 4 unknowns (with Δr as one of them): d_i + c·Δr = c·Δ_i
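A quick numeric sanity check of the relation d_i + c·Δr = c·Δ_i: the "measured distance" overshoots the real distance by exactly c·Δr. All positions and the clock deviation below are made-up illustrative values:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

receiver = (1000.0, 2000.0, 500.0)       # true (unknown) receiver position
dr = 2.5e-6                              # true (unknown) clock deviation, s
sat = (20_000_000.0, 5_000_000.0, 10_000_000.0)

d = dist(sat, receiver)                  # real distance d_i
delta = d / C + dr                       # measured delay: (T_now - T_i) + dr
measured = C * delta                     # "measured distance" c * Delta_i

# measured distance = real distance + c * dr, as the slide's equation states
assert abs(measured - (d + C * dr)) < 1e-6
```

With four satellites this yields four such equations, enough to solve for x_r, y_r, z_r, and Δr.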

Clock Synchronization Principles
Principle I: every machine asks a time server for the accurate time at least once every δ/(2ρ) seconds (Network Time Protocol). Okay, but you need an accurate measure of the round-trip delay, including interrupt handling and the processing of incoming messages.
Principle II: let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time. Okay, you'll probably get every machine in sync. Note: you don't even need to propagate UTC time.
Fundamental: you'll have to take into account that setting the time back is never allowed ⇒ smooth adjustments.
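For Principle I, a standard way (as in NTP) to estimate the offset and round-trip delay uses four timestamps: T1 = client send, T2 = server receive, T3 = server send, T4 = client receive. A generic sketch, not code from the slides:

```python
def estimate(t1, t2, t3, t4):
    """NTP-style offset/delay estimate from the four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the client lags the server
    delay = (t4 - t1) - (t3 - t2)          # round trip minus server hold time
    return offset, delay

# Server clock 100 ms ahead, 20 ms each way, no server processing time:
offset, delay = estimate(0.000, 0.120, 0.120, 0.040)
print(offset, delay)  # approximately 0.1 and 0.04
```

The offset is then applied gradually (e.g. by slowing or speeding the clock), never by stepping the clock backwards.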

The Happened-Before Relationship
Problem: we first need to introduce a notion of ordering before we can order anything.
The happened-before relation → on the set of events in a distributed system:
- If a and b are two events in the same process, and a comes before b, then a → b.
- If a is the sending of a message, and b is the receipt of that message, then a → b.
- If a → b and b → c, then a → c.
Note: this introduces a partial ordering of events in a system with concurrently operating processes.

Logical Clocks (1/2)
Problem: how do we maintain a global view of the system's behavior that is consistent with the happened-before relation?
Solution: attach a timestamp C(e) to each event e, satisfying the following properties:
- P1: if a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).
- P2: if a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).
Problem: how to attach a timestamp to an event when there's no global clock ⇒ maintain a consistent set of logical clocks, one per process.

Logical Clocks (2/2)
Solution: each process P_i maintains a local counter C_i and adjusts it according to the following rules:
1. For any two successive events that take place within P_i, C_i is incremented by 1.
2. Each time a message m is sent by process P_i, the message receives a timestamp ts(m) = C_i.
3. Whenever a message m is received by a process P_j, P_j adjusts its local counter C_j to max{C_j, ts(m)}, then executes rule 1 before passing m to the application.
Property P1 is satisfied by (1); property P2 by (2) and (3).
Note: it can still occur that two events get the same timestamp. Avoid this by breaking ties through process IDs.
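The three rules can be sketched in a few lines (class and method names are mine, not the book's):

```python
class LamportClock:
    """Per-process logical clock implementing the three rules above."""
    def __init__(self):
        self.c = 0

    def event(self):            # rule 1: any local event advances the clock
        self.c += 1
        return self.c

    def send(self):             # rule 2: timestamp the outgoing message
        self.c += 1
        return self.c

    def receive(self, ts):      # rule 3: take the max, then count the event
        self.c = max(self.c, ts)
        self.c += 1
        return self.c

p, q = LamportClock(), LamportClock()
m = p.send()          # p's clock becomes 1; ts(m) = 1
q.event(); q.event()  # q's clock is now 2
print(q.receive(m))   # max(2, 1) + 1 = 3
```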

Logical Clocks Example
[Figure: three processes P_1, P_2, P_3 whose clocks tick at different rates (6, 8, and 10 units per step). In (a), messages m_3 and m_4 arrive carrying timestamps larger than the receivers' local clocks, violating P2. In (b), P_2 and P_1 adjust their clocks forward on receipt (e.g. to 61 and 70) so that every receive timestamp exceeds the corresponding send timestamp.]
Note: the adjustments take place in the middleware layer, between the application layer and the network layer. On sending, the middleware adjusts the local clock and timestamps the message; on receipt, it adjusts the local clock before delivering the message to the application.

Example: Totally Ordered Multicast (1/2)
Problem: we sometimes need to guarantee that concurrent updates on a replicated database are seen in the same order everywhere:
- P_1 adds $100 to an account (initial value: $1000).
- P_2 increments the account by 1%.
- There are two replicas. At one replica, update 1 is performed before update 2; at the other, update 2 is performed before update 1.
Result, in the absence of proper synchronization: replica #1 ends with $1111 (add first, then interest), while replica #2 ends with $1110 (interest first, then add).

Example: Totally Ordered Multicast (2/2)
Solution:
- Process P_i sends the timestamped message msg_i to all others. The message itself is put in a local queue queue_i.
- Any incoming message at P_j is queued in queue_j according to its timestamp, and acknowledged to every other process.
P_j passes a message msg_i to its application if:
(1) msg_i is at the head of queue_j, and
(2) for each process P_k, there is a message msg_k in queue_j with a larger timestamp.
Note: we are assuming that communication is reliable and FIFO-ordered.
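A hypothetical sketch of the delivery rule (class and method names are my own): the head of the queue is delivered once every other process has been heard from, via a message or acknowledgment, with a larger timestamp:

```python
import heapq

class TOQueue:
    """Per-process queue for totally ordered multicast delivery."""
    def __init__(self, processes):
        self.queue = []                          # min-heap of (timestamp, sender)
        self.latest = {p: 0 for p in processes}  # largest timestamp seen per sender

    def on_message(self, ts, sender):
        heapq.heappush(self.queue, (ts, sender))
        self.latest[sender] = max(self.latest[sender], ts)

    def on_ack(self, ts, sender):                # acknowledgments also count
        self.latest[sender] = max(self.latest[sender], ts)

    def deliverable(self):
        """Pop all head messages that may now be passed to the application."""
        out = []
        while self.queue:
            ts, s = self.queue[0]
            if all(self.latest[p] > ts for p in self.latest if p != s):
                out.append(heapq.heappop(self.queue))
            else:
                break
        return out

q = TOQueue(["A", "B"])
q.on_message(1, "A")           # A's update, timestamp 1
q.on_message(2, "B")           # B's update, timestamp 2
print(q.deliverable())         # [(1, 'A')]: B was heard with a larger timestamp
q.on_ack(3, "A")               # A acknowledges B's message
print(q.deliverable())         # [(2, 'B')]
```

Because all queues sort by (timestamp, sender) and the delivery condition is the same everywhere, every process delivers the updates in the same order.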

Vector Clocks (1/2)
Observation: Lamport clocks do not guarantee that if C(a) < C(b), then a causally preceded b.
[Figure: the adjusted Lamport-clock run with processes P_1, P_2, P_3 and messages m_1 through m_5.]
Observation: take event a: m_1 is received at T = 16; event b: m_2 is sent at T = 20. From C(a) < C(b) alone we cannot conclude that a causally precedes b.

Vector Clocks (2/2)
Solution: each process P_i maintains an array VC_i[1..n], where VC_i[j] denotes the number of events that process P_i knows have taken place at process P_j.
- When P_i sends a message m, it adds 1 to VC_i[i], and sends VC_i along with m as the vector timestamp vt(m). Result: upon arrival, the recipient knows P_i's timestamp.
- When a process P_j receives a message m from P_i with vector timestamp ts(m), it (1) updates each VC_j[k] to max{VC_j[k], ts(m)[k]}, and (2) increments VC_j[j] by 1.
Question: what does VC_i[j] = k mean in terms of messages sent and received?
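The two rules can be sketched directly, with the vector clock as a plain list indexed by process number (function names are mine):

```python
def vc_send(vc, i):
    """P_i sends a message: increment own entry, attach a copy as ts(m)."""
    vc[i] += 1
    return list(vc)

def vc_receive(vc, j, ts):
    """P_j receives ts(m): entrywise max, then count the receive event."""
    for k in range(len(vc)):
        vc[k] = max(vc[k], ts[k])
    vc[j] += 1

p0, p1 = [0, 0], [0, 0]
m = vc_send(p0, 0)        # p0 becomes [1, 0]; ts(m) = [1, 0]
vc_receive(p1, 1, m)      # p1 becomes [1, 1]
print(p1)                 # [1, 1]
```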

Causally Ordered Multicasting (1/2)
Observation: we can now ensure that a message is delivered only if all causally preceding messages have already been delivered.
Adjustment: P_i increments VC_i[i] only when sending a message, and P_j adjusts VC_j only when receiving a message (i.e., it does not increment VC_j[j]).
P_j postpones delivery of m (sent by P_i) until:
- ts(m)[i] = VC_j[i] + 1, and
- ts(m)[k] ≤ VC_j[k] for all k ≠ i.
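The delivery test is a one-liner (function name is mine; indices are 0-based here):

```python
def can_deliver(ts, vc_j, i):
    """May P_j deliver a message from P_i carrying vector timestamp ts?
    True iff it is the next message expected from P_i and P_j has already
    seen every message the sender had seen from the other processes."""
    return (ts[i] == vc_j[i] + 1 and
            all(ts[k] <= vc_j[k] for k in range(len(ts)) if k != i))

# P_j has VC_j = [0, 2, 2]; a message from the first process carries [1, 3, 0]:
print(can_deliver([1, 3, 0], [0, 2, 2], 0))   # False: a causally preceding
                                              # message is still missing
```

Delivery of the message is postponed until the missing message arrives and is delivered first.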

Causally Ordered Multicasting (2/2)
Example 1:
[Figure: P_0 sends m with timestamp (1,0,0); P_1 receives m, bringing VC_1 to (1,0,0), and then sends m* with timestamp (1,1,0); P_2, still at VC_2 = (0,0,0), receives m* before m and postpones its delivery until m has arrived and been delivered.]
Example 2: take VC_3 = [0,2,2] and ts(m) = [1,3,0] from P_1. What information does P_3 have, and what will it do when it receives m from P_1?

Mutual Exclusion
Problem: a number of processes in a distributed system want exclusive access to some resource.
Basic solutions:
- via a centralized server;
- completely decentralized, using a peer-to-peer system;
- completely distributed, with no topology imposed;
- completely distributed along a (logical) ring.
Centralized: really simple.
[Figure: (a) process 1 asks the coordinator for permission and is granted OK; (b) process 2's request gets no reply and is queued while process 1 holds the resource; (c) when process 1 releases, the coordinator grants the queued request with OK.]

Decentralized Mutual Exclusion
Principle: assume every resource is replicated n times, with each replica having its own coordinator ⇒ access requires a majority vote from m > n/2 coordinators. A coordinator always responds immediately to a request.
Assumption: when a coordinator crashes, it recovers quickly, but will have forgotten about permissions it had granted.
Issue: how robust is this system? Let p denote the probability that a coordinator crashes and recovers in a period Δt. A violation requires that at least 2m − n out of the m coordinators reset:

  P[violation] = p_v = Σ_{k=2m−n}^{m} (m choose k) · p^k · (1 − p)^(m−k)

With Δt = 10 s, n = 32, and m = 0.75n, we get p_v < 10^−40.
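The sum is easy to evaluate directly. The crash probability p below is an illustrative assumption (roughly one crash per three hours observed over a 10-second window), not a value from the slides:

```python
from math import comb

def p_violation(n, m, p):
    """Probability that at least 2m - n of the m coordinators reset."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k)
               for k in range(2 * m - n, m + 1))

n, m = 32, 24                  # m = 0.75 * n
p = 10.0 / (3 * 3600)          # assumed: one crash per 3 hours, 10 s window
print(p_violation(n, m, p))    # astronomically small, far below 1e-40
```

Even with frequently crashing coordinators, the chance that enough of them forget a grant within one access period is negligible.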

Mutual Exclusion: Ricart & Agrawala
Principle: the same as Lamport's algorithm, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:
- the receiving process has no interest in the shared resource, or
- the receiving process is waiting for the resource but has lower priority (known through comparison of timestamps).
In all other cases the reply is deferred, implying some more local administration.
[Figure: (a) two processes request access with timestamps 8 and 12; (b) the process with timestamp 8 wins, collects OK from everyone, and accesses the resource; (c) when it is done, it sends OK to the process with timestamp 12, which then accesses the resource.]
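The reply decision can be sketched as a pure function (the state encoding and names are mine, not the book's); requests are (timestamp, process-id) pairs so that ties are broken deterministically:

```python
def grants(state, own_req, incoming_req):
    """Does a process reply OK immediately to an incoming request?
    state: 'idle' (no interest), 'wanted' (waiting), or 'held' (using it)."""
    if state == 'idle':
        return True                # no interest in the shared resource
    if state == 'held':
        return False               # defer the reply until we release
    return incoming_req < own_req  # 'wanted': lower (ts, pid) has priority

print(grants('wanted', (12, 1), (8, 0)))   # True: 8 < 12, requester wins
print(grants('wanted', (8, 0), (12, 1)))   # False: reply is deferred
```

A process enters the critical region once it has received OK from all other processes, and sends the deferred replies when it leaves.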

Mutual Exclusion: Token Ring Algorithm
Essence: organize the processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to).
[Figure: (a) an unordered group of processes; (b) the same processes organized in a logical ring along which the token circulates.]
Comparison:

  Algorithm     | Messages per entry/exit | Delay before entry | Problems
  Centralized   | 3                       | 2                  | coordinator crash
  Decentralized | 3mk, k = 1, 2, ...      | 2m                 | starvation, low efficiency
  Distributed   | 2(n − 1)                | 2(n − 1)           | crash of any process
  Token ring    | 1 to ∞                  | 0 to n − 1         | lost token, process crash

Global Positioning of Nodes
Problem: how can a single node efficiently estimate the latency between any two other nodes in a distributed system?
Solution: construct a geometric overlay network in which the distance d(P, Q) reflects the actual latency between P and Q.

Computing Position (1/2)
Observation: a node P needs d + 1 landmarks to compute its own position in a d-dimensional space. Consider the two-dimensional case:
[Figure: three landmarks at (x_1, y_1), (x_2, y_2), (x_3, y_3), with measured distances d_1, d_2, d_3 to node P.]
Solution: P needs to solve three equations in two unknowns (x_P, y_P):

  d_i = √((x_i − x_P)² + (y_i − y_P)²)

Computing Position (2/2)
Problems: measured latencies to landmarks fluctuate, so the computed distances will not even be consistent.
[Figure: nodes P, Q, R on a line with pairwise measured latencies 3.2, 1.0, and 2.0, which no placement on the line can satisfy.]
Solution: let the L landmarks measure their pairwise latencies d(b_i, b_j), and let each node P minimize the relative error

  Σ_{i=1}^{L} [ (d(b_i, P) − d̂(b_i, P)) / d(b_i, P) ]²

where d̂(b_i, P) denotes the distance to landmark b_i given a computed coordinate for P.

Election Algorithms
Principle: an algorithm requires that some process act as coordinator. The question is how to select this special process dynamically.
Note: in many systems the coordinator is chosen by hand (e.g., file servers). This leads to centralized solutions ⇒ a single point of failure.
Question: if a coordinator is chosen dynamically, to what extent can we speak of a centralized or distributed solution?
Question: is a fully distributed solution, i.e. one without a coordinator, always more robust than any centralized/coordinated solution?

Election by Bullying (1/2)
Principle: each process has an associated priority (weight). The process with the highest priority should always be elected as the coordinator.
Issue: how do we find the heaviest process?
- Any process can just start an election by sending an election message to all other processes (assuming you don't know the weights of the others).
- If a process P_heavy receives an election message from a lighter process P_light, it sends a take-over message to P_light; P_light is out of the race.
- If a process doesn't get a take-over message back, it wins and sends a victory message to all other processes.

Election by Bullying (2/2)
[Figure: the previous coordinator (7) has crashed. (a) Process 4 starts an election by messaging 5, 6, and 7; (b) 5 and 6 answer OK, knocking 4 out of the race; (c) 5 and 6 each start their own election; (d) 6 answers 5 with OK; (e) 6 receives no take-over message and announces itself as coordinator to everyone.]
Question: we're assuming something very important here. What is it?

Election in a Ring
Principle: process priority is obtained by organizing the processes into a (logical) ring. The process with the highest priority should be elected as coordinator.
- Any process can start an election by sending an election message to its successor. If a successor is down, the message is passed on to the next successor.
- If a message is passed on, the sender adds itself to the list. When it gets back to the initiator, everyone has had a chance to make its presence known.
- The initiator sends a coordinator message around the ring containing a list of all living processes. The one with the highest priority is elected as coordinator.
Question: does it matter if two processes initiate an election?
Question: what happens if a process crashes during the election?
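A toy, single-threaded run of the steps above (names and the alive-map encoding are mine): live processes append their IDs as the election message travels the ring, and the initiator picks the highest ID from the collected list:

```python
def ring_election(ids, alive, initiator):
    """Simulate one pass of the ring election started by `initiator`.
    ids: processes in ring order; alive: id -> bool; returns the coordinator."""
    n = len(ids)
    start = ids.index(initiator)
    members = [initiator]                 # the list carried by the message
    i = (start + 1) % n
    while ids[i] != initiator:
        if alive[ids[i]]:                 # dead successors are skipped
            members.append(ids[i])
        i = (i + 1) % n
    return max(members)                   # coordinator message announces this

ids = [3, 7, 1, 6, 5]
alive = {3: True, 7: False, 1: True, 6: True, 5: True}
print(ring_election(ids, alive, 3))   # 6: highest ID among living processes
```

If two processes initiate concurrently, both passes collect the same set of living processes, so both coordinator messages name the same winner; the extra round only costs messages.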

Superpeer Election
Issue: how can we select superpeers such that:
- normal nodes have low-latency access to superpeers;
- superpeers are evenly distributed across the overlay network;
- there is a predefined fraction of superpeers;
- no superpeer needs to serve more than a fixed number of normal nodes.
DHT: reserve a fixed part of the ID space for superpeers. Example: if S superpeers are needed for a system that uses m-bit identifiers, simply reserve the k = ⌈log_2 S⌉ leftmost bits for superpeers. With N nodes, we'll have, on average, 2^(k−m)·N superpeers.
Routing to a superpeer: send the message for key p to the node responsible for p AND 11⋯1100⋯00 (k ones followed by m − k zeros).