Distributed Algorithms (CAS 769) Dr. Borzoo Bonakdarpour

Week 1: Introduction, Logical Clocks, Snapshots
Department of Computing and Software, McMaster University

Presentation outline
- Introduction
- Logical Clocks
- Snapshots (Global States)

Acknowledgments
Most of the content of these slides is drawn from the following books:
- Distributed Algorithms: An Intuitive Approach, by Wan Fokkink
- Elements of Distributed Computing, by Vijay K. Garg

Distributed Systems
Some Definitions
There is no universally accepted definition of a distributed system. What makes a system distributed?
- "One man's constant is another man's variable." - Alan Perlis
- "A distributed system is a system where I can't get my work done because a computer has failed that I've never even heard of." - Leslie Lamport
- "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." - Leslie Lamport

Distributed Systems
Some Definitions
A distributed system is one that:
- has multiple machines,
- is connected by a network, and
- is cooperating on some task.
Communication in Distributed Systems
- Message passing
- Shared memory

Distributed Systems
We begin with message passing systems.

Preliminaries
Message Passing Framework
In a message passing framework, a distributed system consists of a finite graph of N processes (a process is a running program, and each process has its own local state).
- Each process carries a unique ID.
- Processes communicate through FIFO channels.
Characteristics of Communication
- Communication is asynchronous; i.e., sending and receiving a message are distinct events.
- Delay in channels is arbitrary but finite.
- There are no garbled, duplicated, or lost messages.

Preliminaries
Other Assumptions
- Absence of a shared clock
- Absence of shared memory
- Absence of accurate failure detection

Example

{x1 = 0}
Process P1() {
    e_1^0: send(p2, m1);
    e_1^1: x1 = 5;
    e_1^2: x1 = 10;
    e_1^3: recv(m2);
}

{x2 = 0}
Process P2() {
    e_2^0: recv(m1);
    e_2^1: x2 = 15;
    e_2^2: x2 = 20;
    e_2^3: send(p1, m2);
}

Preliminaries
Transition Systems
The behavior of a distributed algorithm running on a distributed system is often captured by a transition system, which consists of:
- A set $C$ of configurations (i.e., the composition of the local states of its processes plus the messages in transit)
- A binary transition relation $\to$ on $C$
- A set $I \subseteq C$ of initial configurations
A configuration $\gamma$ is terminal if there does not exist $\gamma' \in C$ such that $\gamma \to \gamma'$.
An execution of the distributed system is a sequence $\gamma_0 \, \gamma_1 \, \gamma_2 \cdots$ of configurations such that:
- $\gamma_0 \in I$
- for all $i \geq 0$, we have $\gamma_i \to \gamma_{i+1}$
A configuration $\delta$ is reachable if there is a $\gamma_0 \in I$ and a finite sequence $\gamma_0 \to \gamma_1 \to \cdots \to \gamma_k$ such that $\gamma_k = \delta$.

Example
In the two-process example shown earlier (processes P1 and P2):
- Configuration $(x_1 = 0, x_2 = 0)$ is the only initial configuration.
- Configuration $(x_1 = 10, x_2 = 20)$ is the only terminal configuration.
- $(x_1 = 0, x_2 = 0) \to (x_1 = 5, x_2 = 0) \to (x_1 = 10, x_2 = 0) \to (x_1 = 10, x_2 = 15) \to (x_1 = 10, x_2 = 20)$ is a valid execution.
- So is $(x_1 = 0, x_2 = 0) \to (x_1 = 5, x_2 = 0) \to (x_1 = 5, x_2 = 15) \to (x_1 = 10, x_2 = 15) \to (x_1 = 10, x_2 = 20)$.
(Only the variable values are shown; send and receive steps, which do not change $x_1$ or $x_2$, are omitted.)

Preliminaries
Question: Is configuration reachability decidable?

Preliminaries
- A transition between two configurations is associated with an event.
- A process can perform an internal event (i.e., a change of its local state), a send event, or a receive event.
- A process is called an initiator if its first event is either internal or send.
- An assertion is a predicate on the configurations of an algorithm (e.g., $x \leq y + 1$). We use assertions to define safety properties.
- An assertion $P$ is an invariant if:
  - $P(\gamma)$ holds for all $\gamma \in I$, and
  - if $\gamma \to \gamma'$ and $P(\gamma)$, then $P(\gamma')$.

Example
In the two-process example shown earlier (processes P1 and P2):
- Instruction x1 = 5 is an internal event.
- Process P1 is an initiator.
- $(x_1 \leq 100 \land x_2 \leq 50)$ is an invariant.
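The transition-system view can be made concrete with a small sketch. The code below is an illustration only, not part of the course material: it models the P1/P2 example as configurations `(pc1, pc2, x1, x2, ch12, ch21)`, where the program counters and the two channel tuples are modeling assumptions, and explores all reachable configurations by breadth-first search. The invariant is assumed to read x1 <= 100 and x2 <= 50.

```python
from collections import deque

def successors(cfg):
    """All configurations reachable from cfg in one transition (one event)."""
    pc1, pc2, x1, x2, ch12, ch21 = cfg
    out = []
    # Events of P1: send(p2, m1); x1 = 5; x1 = 10; recv(m2)
    if pc1 == 0:
        out.append((1, pc2, x1, x2, ch12 + ("m1",), ch21))
    elif pc1 == 1:
        out.append((2, pc2, 5, x2, ch12, ch21))
    elif pc1 == 2:
        out.append((3, pc2, 10, x2, ch12, ch21))
    elif pc1 == 3 and ch21:          # receive is enabled only if a message is in transit
        out.append((4, pc2, x1, x2, ch12, ch21[1:]))
    # Events of P2: recv(m1); x2 = 15; x2 = 20; send(p1, m2)
    if pc2 == 0 and ch12:
        out.append((pc1, 1, x1, x2, ch12[1:], ch21))
    elif pc2 == 1:
        out.append((pc1, 2, x1, 15, ch12, ch21))
    elif pc2 == 2:
        out.append((pc1, 3, x1, 20, ch12, ch21))
    elif pc2 == 3:
        out.append((pc1, 4, x1, x2, ch12, ch21 + ("m2",)))
    return out

def reachable(initial):
    """Breadth-first exploration of the (finite) configuration graph."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        cfg = queue.popleft()
        for nxt in successors(cfg):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invariant(cfg):
    """Assumed assertion: x1 <= 100 and x2 <= 50."""
    return cfg[2] <= 100 and cfg[3] <= 50

INITIAL = (0, 0, 0, 0, (), ())
REACHABLE = reachable(INITIAL)
TERMINAL = [c for c in REACHABLE if not successors(c)]
INVARIANT_HOLDS = all(invariant(c) for c in REACHABLE)
```

Because this particular example has finitely many configurations, exhaustive search terminates; the sketch says nothing about reachability in systems with unbounded state.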

Preliminaries
Properties
A property is a set of executions.
Safety Properties
A safety property typically expresses that something bad will never happen. For example:
- The temperature of a boiler never reaches 100 degrees.
- If an interrupt occurs, a message will be printed within one second.
Formally, a safety property is a set $S$ of infinite executions where:
$\forall \gamma \notin S : \exists \alpha \sqsubseteq \gamma : \forall \gamma' : \alpha \sqsubseteq \gamma' \Rightarrow \gamma' \notin S$
where $\alpha \sqsubseteq \gamma$ denotes that $\alpha$ is a finite prefix of $\gamma$.

Preliminaries
Liveness Properties
A liveness property typically expresses that something good will eventually happen.
Formally, if $L$ is a liveness property, then the following holds:
$\forall \alpha : \exists \gamma : \alpha\gamma \in L$
where $\alpha$ is a finite execution and $\gamma$ is an infinite execution.
Examples of liveness properties:
- Non-starvation.
- If an interrupt occurs, a message will be printed.


Causal Order
In an asynchronous distributed system, in each configuration, different events can occur at different processes. Such occurrences of events are independent.
The causal order $\prec$ is a binary relation on the events in an execution, such that $a \prec b$ iff event $a$ happened before event $b$; i.e., the events of the execution cannot be reordered so that $a$ happens after $b$.

Causal Order
Causal Order (Happened Before)
Formally, the causal order (also called happened before) is the smallest binary relation $\prec$ where:
- if $a$ and $b$ are events at the same process and $a$ occurs before $b$, then $a \prec b$,
- if $a$ is a send event and $b$ the corresponding receive event, then $a \prec b$, and
- if $a \prec b$ and $b \prec c$, then $a \prec c$.
Notice that the happened before relation is a partial order.
We write $a \preceq b$ if either $a \prec b$ or $a = b$.
If $a \not\prec b$ and $b \not\prec a$, then we say that $a$ and $b$ are concurrent events.
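As an illustration of the three rules, the sketch below computes happened before for the earlier P1/P2 example as the transitive closure of the local order and the send/receive pairs. Events are encoded as hypothetical `(process, index)` pairs; the encoding and the function names are assumptions made for this example.

```python
from itertools import product

def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (rule 3)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

# Four events per process, in program order.
P1 = [("P1", i) for i in range(4)]
P2 = [("P2", i) for i in range(4)]

# Rule 1: consecutive events at the same process.
local_order = list(zip(P1, P1[1:])) + list(zip(P2, P2[1:]))
# Rule 2: send event -> corresponding receive event (m1 and m2).
messages = [(("P1", 0), ("P2", 0)), (("P2", 3), ("P1", 3))]

HB = transitive_closure(local_order + messages)

def concurrent(a, b):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in HB and (b, a) not in HB
```

For instance, `concurrent(("P1", 1), ("P2", 2))` holds here, since the two assignments are not linked by any chain of local steps and messages.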

Computation
A permutation of concurrent events in an execution does not affect the result of the execution.
[Figure: space-time diagram of processes P1 and P2 with events e_1^0, ..., e_1^3 and e_2^0, ..., e_2^3.]

Computation
The set of all permutations forms the computation lattice.
[Figure: computation lattice for P1 and P2, ranging from the empty state {} at the bottom to (e_1^3, e_2^3) at the top.]

Happened Before vs. Physical Time
Question: If a safety property holds in the happened before relation, does it hold in physical time as well?

Logical Clocks
Since a physical shared clock does not exist in a distributed system, we use logical clocks.
A logical clock $C$ maps occurrences of events in a computation to a partially ordered set such that
$a \prec b \Rightarrow C(a) < C(b)$
Lamport's clock $LC$ assigns to each event $a$ the length $k$ of a longest causality chain $a_1 \prec \cdots \prec a_k = a$ in the computation. Obviously,
$a \prec b \Rightarrow LC(a) < LC(b)$

Logical Clocks
Algorithm for Handling Lamport's Clocks
Consider an event $a$, and let $k$ be the clock value of the previous event at the same process ($k = 0$ if there is no such previous event).
- If $a$ is an internal or send event, then $LC(a) = k + 1$.
- If $a$ is a receive event and $b$ the corresponding send event, then $LC(a) = \max\{k, LC(b)\} + 1$.
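The two rules can be sketched in a few lines. The class below is a minimal illustration, with the class and method names chosen for this example; it assumes the sender piggybacks its clock value on each message. It is then applied to the earlier P1/P2 example.

```python
class LamportClock:
    """Lamport clock of a single process; `value` is k, the clock of the last event."""

    def __init__(self):
        self.value = 0  # k = 0 before the first event

    def internal(self):
        # Internal event: LC(a) = k + 1
        self.value += 1
        return self.value

    def send(self):
        # Send event: LC(a) = k + 1; the result is piggybacked on the message.
        self.value += 1
        return self.value

    def receive(self, sender_timestamp):
        # Receive event: LC(a) = max{k, LC(b)} + 1, with b the matching send.
        self.value = max(self.value, sender_timestamp) + 1
        return self.value


p1, p2 = LamportClock(), LamportClock()
ts_m1 = p1.send()            # e_1^0: send m1
lc_e20 = p2.receive(ts_m1)   # e_2^0: receive m1
p1.internal()                # e_1^1: x1 = 5
p1.internal()                # e_1^2: x1 = 10
p2.internal()                # e_2^1: x2 = 15
p2.internal()                # e_2^2: x2 = 20
ts_m2 = p2.send()            # e_2^3: send m2
lc_e13 = p1.receive(ts_m2)   # e_1^3: receive m2
```

In this run the receive of m2 gets clock value 6, one more than the send of m2 (value 5), even though the previous event at P1 had value 3.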

Vector Clocks
The vector clock $VC$ has the property
$a \prec b \Leftrightarrow VC(a) < VC(b)$
Let a distributed system consist of processes $p_0, \ldots, p_{N-1}$. The vector clock assigns to events in a computation values in $\mathbb{N}^N$, where this set is equipped with the partial order defined by:
$(k_0, \ldots, k_{N-1}) \leq (l_0, \ldots, l_{N-1}) \Leftrightarrow k_i \leq l_i$ for all $i \in \{0, \ldots, N-1\}$
The vector clock is defined as follows: $VC(a) = (k_0, \ldots, k_{N-1})$, where $k_i$ is the length of a longest causality chain $a^i_1 \prec \cdots \prec a^i_{k_i}$ of events at process $p_i$ with $a^i_{k_i} \preceq a$.
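One common way to maintain such vectors at run time is sketched below: each process increments its own component at every event and, on a receive, first takes the componentwise maximum with the vector piggybacked on the message. The class and helper names are assumptions for this illustration, and the run uses the earlier P1/P2 example with P1 at index 0 and P2 at index 1.

```python
class VectorClock:
    """Vector clock of the process with the given index among n processes."""

    def __init__(self, index, n):
        self.index = index
        self.vector = [0] * n

    def event(self):
        # Internal or send event: count one more event at this process.
        self.vector[self.index] += 1
        return tuple(self.vector)

    def receive(self, sender_vector):
        # Merge the sender's knowledge, then count the receive event itself.
        self.vector = [max(a, b) for a, b in zip(self.vector, sender_vector)]
        return self.event()


def leq(u, v):
    """Componentwise order on vectors."""
    return all(a <= b for a, b in zip(u, v))

def less(u, v):
    return leq(u, v) and u != v

def concurrent(u, v):
    return not leq(u, v) and not leq(v, u)


p1, p2 = VectorClock(0, 2), VectorClock(1, 2)
e10 = p1.event()        # send m1
e20 = p2.receive(e10)   # receive m1
e11 = p1.event()        # x1 = 5
e12 = p1.event()        # x1 = 10
e21 = p2.event()        # x2 = 15
e22 = p2.event()        # x2 = 20
e23 = p2.event()        # send m2
e13 = p1.receive(e23)   # receive m2
```

Here `e11` and `e21` are incomparable vectors, which matches the fact that the two assignments are concurrent events.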

Example
Demonstrate the evolution of the vector clock for this computation:
[Figure: space-time diagram of processes P1 and P2 with events e_1^0, ..., e_1^3 and e_2^0, ..., e_2^3.]


Snapshot (Global State)
Definitions
- A snapshot cannot be defined in terms of physical time (e.g., as the composition of all local states at the same time instant).
- We use the happened before relation to compute concurrent local states and, hence, snapshots.
- A (global) snapshot of an execution of a distributed algorithm is a configuration of this execution, consisting of the local states of the processes and the messages in transit.
- Intuitively, a snapshot is consistent if it represents a configuration of the current execution or a configuration of an execution in the same computation.
- Snapshots are useful for determining stable properties of a distributed system (i.e., properties that, once they become true, remain true), e.g., deadlock, termination, or loss of a token.

Snapshot
The Challenge
Why is it difficult to compute a snapshot of a distributed system at run time?
- Taking a global snapshot is like taking a picture of the sky: the scene is so big that it cannot be captured by a single photograph.
- The challenge is that taking multiple photographs at the same time is not quite possible.

Snapshot
Terminology
- Suppose we design an algorithm that takes a snapshot of another distributed algorithm. We call the messages of the underlying algorithm basic messages and the messages of the snapshot algorithm control messages.
- An event is called presnapshot if it occurs at a process before the local snapshot at this process is taken. Otherwise, it is called postsnapshot.
Consistent Snapshot
A snapshot is consistent if:
- for each presnapshot event $a$, all events that are causally before $a$ are also presnapshot, and
- a basic message is included in a channel state iff the corresponding send event is presnapshot while the corresponding receive event is postsnapshot.
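The two conditions can be checked mechanically for a given cut. In the sketch below, a cut is described by the number of presnapshot events at each process, so the presnapshot events form a prefix of each process's local order; it therefore suffices to check that no message is received presnapshot but sent postsnapshot. The encoding and function names are assumptions for this illustration, and the data is taken from the earlier P1/P2 example.

```python
def is_consistent(cut, messages):
    """cut[p] = number of presnapshot events at process p.
    messages = list of (sender, send_index, receiver, receive_index)."""
    for sender, s_idx, receiver, r_idx in messages:
        sent_pre = s_idx < cut[sender]
        received_pre = r_idx < cut[receiver]
        if received_pre and not sent_pre:
            return False  # a presnapshot receive with a postsnapshot send
    return True

def channel_state(cut, messages):
    """Messages in transit: send is presnapshot, receive is postsnapshot."""
    return [
        (sender, receiver)
        for sender, s_idx, receiver, r_idx in messages
        if s_idx < cut[sender] and r_idx >= cut[receiver]
    ]

# m1: sent by P1 at event 0, received by P2 at event 0.
# m2: sent by P2 at event 3, received by P1 at event 3.
MESSAGES = [("P1", 0, "P2", 0), ("P2", 3, "P1", 3)]

GOOD_CUT = {"P1": 1, "P2": 0}  # after send(m1), before recv(m1)
BAD_CUT = {"P1": 0, "P2": 1}   # recv(m1) recorded, send(m1) not
```

With `GOOD_CUT`, message m1 is in transit and appears in the channel state from P1 to P2; `BAD_CUT` records a receive whose send is missing and is rejected.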

Example
[Figure: space-time diagram of processes P1, P2, and P3 exchanging messages m1, m2, and m3, with two cuts G1 and G2.]
G1 is not a consistent snapshot, but G2 is.

Chandy-Lamport Algorithm
Assumption
- All channels are FIFO.
Challenges
- All recorded local states must be mutually concurrent.
- The states of all channels must be captured correctly.

Chandy-Lamport Algorithm
Solution
- We associate with each process a variable called color that is either red or white. All processes are initially white.
- Intuitively, the computed global snapshot corresponds to the state of the system just before the processes turn red.
- The algorithm relies on special control messages called markers.
- Once a process turns red, it sends a marker along all its outgoing channels before it sends out any other message.
- A process turns red on receiving a marker if it has not already done so.
- No white process ever receives a basic message sent by a red process. Why?
- This guarantees that the recorded local states are mutually concurrent. Why?
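The marker rules can be tried out on a toy system. The sketch below is a simplified, single-threaded illustration and not the full algorithm as published: FIFO channels are modeled as deques, the local state of each process is a hypothetical integer balance, basic messages transfer an amount, and a red process records basic messages on an incoming channel until the marker arrives on that channel. All names are assumptions for this example.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, state, incoming, outgoing):
        self.name = name
        self.state = state            # local state (here: a balance)
        self.incoming = incoming      # {sender name: FIFO channel into this process}
        self.outgoing = outgoing      # {receiver name: FIFO channel out of this process}
        self.color = "white"
        self.recorded_state = None
        self.channel_state = {}       # {sender name: recorded in-transit messages}
        self.recording = set()        # incoming channels still being recorded

    def turn_red(self):
        # Record the local state, then send a marker on every outgoing channel
        # before any further basic message.
        self.color = "red"
        self.recorded_state = self.state
        self.channel_state = {s: [] for s in self.incoming}
        self.recording = set(self.incoming)
        for channel in self.outgoing.values():
            channel.append(MARKER)

    def send(self, receiver, amount):
        self.state -= amount
        self.outgoing[receiver].append(amount)

    def receive(self, sender):
        msg = self.incoming[sender].popleft()
        if msg == MARKER:
            if self.color == "white":
                self.turn_red()
            self.recording.discard(sender)   # this channel's state is now final
        else:
            if self.color == "red" and sender in self.recording:
                self.channel_state[sender].append(msg)  # a wr message
            self.state += msg

def connect(states):
    names = list(states)
    channels = {(s, r): deque() for s in names for r in names if s != r}
    return {
        n: Process(n, states[n],
                   {s: channels[(s, n)] for s in names if s != n},
                   {r: channels[(n, r)] for r in names if r != n})
        for n in names
    }

procs = connect({"A": 100, "B": 50})
a, b = procs["A"], procs["B"]
a.send("B", 10)     # basic message, sent while A is white
b.send("A", 5)      # basic message, sent while B is white
a.turn_red()        # A initiates the snapshot
b.receive("A")      # B receives 10 while white (ww message)
b.receive("A")      # B receives the marker and turns red
a.receive("B")      # A receives 5 while red (wr message, recorded)
a.receive("B")      # A receives B's marker; recording of channel BA ends
total = (a.recorded_state + b.recorded_state
         + sum(sum(msgs) for p in (a, b) for msgs in p.channel_state.values()))
```

In this run the recorded balances plus the recorded in-transit amount add up to the initial total of 150, which is what one would expect of a consistent snapshot of a system that only moves money around.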


Chandy-Lamport Algorithm
Classification of Basic Messages
- ww messages: These messages are sent by a white process to a white process. They correspond to the messages sent and received before the global snapshot.
- rr messages: These messages are sent by a red process to a red process. They correspond to the messages sent and received after the global snapshot.
- rw messages: These messages cross the global snapshot in the backward direction. Such a message would make the snapshot inconsistent. Such messages cannot occur if markers are used. Why?
- wr messages: These messages cross the global snapshot in the forward direction and form the state of the channel in the snapshot, because they are in transit when the snapshot is taken.

Chandy-Lamport Algorithm
[Figure: space-time diagram of P1 and P2 illustrating ww, rw, wr, and rr messages relative to the snapshot.]

Chandy-Lamport Algorithm (Example)
[Figures: a sequence of diagrams of three processes A, B, and C exchanging basic messages m1 and m2 and markers (mkr).]
- B computes the state of channel AB as {}.
- C computes the state of channel AC as {}.
- B computes the state of channel CB as {m2}.

Chandy-Lamport Algorithm (Example)
Question: Is the computed snapshot a configuration of the actual execution?

Lai-Yang Algorithm
Assumptions
- This algorithm does not assume FIFO channels.
- It does assume message piggybacking, i.e., that extra information can be attached to basic messages.

Lai-Yang Algorithm
The Algorithm
- Any initiator can decide to take a local snapshot.
- As long as a process has not taken a local snapshot, it appends false to its outgoing basic messages.
- Once a process has taken its local snapshot, it appends true to each outgoing basic message.
- When a process that has not yet taken a snapshot receives a message tagged true, or a control message (see next slide), for the first time, it takes a local snapshot of its state before the reception of this message.
- A process q computes the state of channel pq as the set of basic messages without the tag true that it receives via pq after its local snapshot.

Lai-Yang Algorithm
The Algorithm
Question: How does q know when it can determine the channel state of pq?
- p sends a control message to q, informing q how many basic messages without the tag true p sent into pq.
- These control messages also ensure that all processes eventually take a local snapshot.
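The tagging and counting rules can be illustrated with a small, single-threaded sketch. It is a simplification under stated assumptions rather than the algorithm as published: the local state is a hypothetical integer balance, messages are plain tuples handed to `receive` in whatever order the caller chooses (which models non-FIFO delivery), and the names `LYProcess`, `send_basic`, and `channel_done` are invented for this example.

```python
class LYProcess:
    def __init__(self, name, state, peers):
        self.name = name
        self.state = state
        self.taken = False                          # local snapshot taken yet?
        self.recorded_state = None
        self.sent_false = {p: 0 for p in peers}     # basic messages sent with tag false
        self.recv_false = {p: 0 for p in peers}     # basic messages received with tag false
        self.expected = {p: None for p in peers}    # counts announced by control messages
        self.channel_state = {p: [] for p in peers}

    def take_snapshot(self):
        """Record the local state; return the control messages to send."""
        self.taken = True
        self.recorded_state = self.state
        return [(p, ("control", self.name, n)) for p, n in self.sent_false.items()]

    def send_basic(self, dest, amount):
        """Send a basic message tagged with the current snapshot status."""
        self.state -= amount
        if not self.taken:
            self.sent_false[dest] += 1
        return (dest, ("basic", self.name, self.taken, amount))

    def receive(self, msg):
        """Handle one message; return any control messages triggered by it."""
        out = []
        kind, sender = msg[0], msg[1]
        if kind == "control":
            if not self.taken:
                out = self.take_snapshot()
            self.expected[sender] = msg[2]
            return out
        _, _, tag, amount = msg
        if tag and not self.taken:
            out = self.take_snapshot()      # snapshot the state before reception
        if not tag:
            self.recv_false[sender] += 1
            if self.taken:
                self.channel_state[sender].append(amount)  # in transit in the snapshot
        self.state += amount
        return out

    def channel_done(self, sender):
        """True once every false-tagged message announced by sender has arrived."""
        return (self.expected[sender] is not None
                and self.recv_false[sender] == self.expected[sender])


p = LYProcess("p", 100, ["q"])
q = LYProcess("q", 50, ["p"])
m1 = p.send_basic("q", 10)       # tagged false
controls = p.take_snapshot()     # p is the initiator; it announces 1 false-tagged message
m2 = p.send_basic("q", 7)        # tagged true
replies = q.receive(m2[1])       # delivered out of order: q snapshots before processing m2
done_early = q.channel_done("p")
q.receive(m1[1])                 # false-tagged message arriving after q's snapshot
q.receive(controls[0][1])        # control message: q now knows the expected count
for _dest, msg in replies:
    p.receive(msg)
total = p.recorded_state + q.recorded_state + sum(q.channel_state["p"])
```

In this run q records m1 as the state of channel pq even though it arrives after the later, true-tagged m2, and the recorded balances plus the recorded channel content again add up to the initial total of 150.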