Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H

Size: px

Start display at page:

Download "Cuts. Cuts. Consistent cuts and consistent global states. Global states and cuts. A cut C is a subset of the global history of H"

Noel O’Connor’
6 years ago
Views:

1 Cuts Cuts A cut C is a subset of the global history of H C = h c 1 1 hc hc n n A cut C is a subset of the global history of H The frontier of C is the set of events e c 1 1,ec 2 2,...ec n n C = h c 1 1 hc hc n n Global states and cuts Consistent cuts and consistent global states The global state of a distributed computation is an n-tuple of local states To each cut state (σ c 1 1,...σc n Σ =(σ 1,...σ n ) (c 1...c n ) n ) corresponds a global A cut is consistent if e i,e j : e j C e i e j e i C A consistent global state is one corresponding to a consistent cut

What sees What sees Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by but not the corresponding send event Our task Our approach Develop a

2 What sees What sees Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by but not the corresponding send event Our task Our approach Develop a protocol by which a processor can build a consistent global state Informally, we want to be able to take a snapshot of the computation Not obvious in an asynchronous system... Develop a simple synchronous protocol Refine protocol as we relax assumptions Record: processor states channel states Assumptions: FIFO channels Each m timestamped with with T (send(m))

3 Snapshot I Snapshot I i. selects i. selects ii. sends take a snapshot at to all processes iii. when clock of p i reads then p σ i a. records its local state b. starts recording messages received on each of incoming channels c. stops recording a channel when it receives first message with timestamp greater than or equal to ii. sends take a snapshot at to all processes iii. when clock of p i reads then p σ i a. records its local state b. sends an empty message along its outgoing channels c. starts recording messages received on each of incoming channels d. stops recording a channel when it receives first message with timestamp greater than or equal to Correctness Clock Condition Theorem Snapshot I produces a consistent cut Proof Need to prove e j C e i e j e i C < Definition > < 0 and 1> 0. e j C T (e j ) < 3.T(e j ) < < 5 and 3> 6. T(e i ) < < Assumption > 1. e j C < Property of real time> 4. e i e j T (e i ) <T(e j ) < Definition > 7. e i C < Property of real time> 4. e i e j T (e i ) <T(e j ) < Assumption > < 2 and 4> 2. e i e j 5. T(e i ) <T(e j ) Can the Clock Condition be implemented some other way?

4 Lamport Clocks Increment Rules Each process maintains a local variable LC p e i p e i+1 p LC(e) value of LC for event e LC(e i+1 p )=LC(e i p)+1 p e i p e i+1 p LC(e i p) <LC(e i+1 p ) p e i p p e i p LC(e i p) <LC(e j q) q LC(e j q)=max(lc(e j 1 q ), LC(e i p)) + 1 e j q q e j q Timestamp m with TS(m) =LC(send(m)) Space-Time Diagrams and Logical Clocks A subtle problem when LC = t do S doesn t make sense for Lamport clocks! there is no guarantee that LC will ever be S is anyway executed after LC = t Fixes: if e is internal/send and LC = t 2 execute e and then S if e = receive(m) (TS(m) t) (LC t 1) put message back in channel re-enable e ; set LC = t 1; execute S t

5 An obvious problem An obvious problem No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks mmmmhhhh... An obvious problem Snapshot II No! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks mmmmhhhh... Doing so assumes upper bound on message delivery time upper bound relative process speeds We better relax it... processor selects Ω sends take a snapshot at Ω to all processes; it waits for all of them to reply and then sets its logical clock to Ω p i when clock of reads Ω then records its local state σ i sends an empty message along its outgoing channels starts recording messages received on each incoming channel p i stops recording a channel when receives first message with timestamp greater than or equal to Ω

6 Relaxing synchrony Snapshot III p i take a snapshot at Ω Process does nothing for the protocol during this time! empty message: TS(m) Ω records local state σ i sends empty message: TS(m) Ω Use empty message to announce snapshot! monitors channels processor sends itself take a snapshot when receives take a snapshot for the first time from : p i records its local state sends take a snapshot along its outgoing channels sets channel from σ i to empty starts recording messages received over each of its other incoming channels when receives take a snapshot beyond the first time from : p i stops recording channel from p i p j p k when has received take a snapshot on all channels, it sends! collected state to and stops. p j p k Snapshots: a perspective Snapshots: a perspective The global state saved by the snapshot protocol is a consistent global state The global state saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that could have occurred

7 Snapshots: a perspective The global state saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that could have occurred We are evaluating predicates on states that may have never occurred! Σ 10

8 Σ 11 Σ 11

9 Σ 12 Σ 21 Σ 12 Σ 21 Σ 22 Σ 12 Σ 21 Σ 22 Σ 12 Σ 32

10 Σ 21 Σ 22 Σ 12 Σ 21 Σ 22 Σ 12 Σ 42 Σ 32 Σ 42 Σ 32 Σ 41 Σ 63 Σ 31 Σ Σ Σ 64 Σ 21 Σ 32 Σ 22 Σ 43 Σ 33 Σ 54 Σ 65 Σ 12 Σ 13 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 55 Σ 45 Σ 03 Σ 04 Σ 14 Σ kl is reachable from there is a path from in the lattice Reachability Σ ij Σ ij if to Σ kl Σ 41 Σ 63 Σ 31 Σ Σ Σ 64 Σ 21 Σ 32 Σ 22 Σ 43 Σ 33 Σ 54 Σ 65 Σ 12 Σ 13 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 55 Σ 45 Σ 03 Σ 04 Σ 14

11 Reachability Reachability Σ kl is reachable from there is a path from in the lattice Σ ij Σ ij if to Σ kl Σ 41 Σ 63 Σ 31 Σ Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 54 Σ 45 Σ 55 Σ 65 Σ 03 Σ 04 Σ 14 Σ kl is reachable from there is a path from in the lattice Σ ij Σ ij if to Σ kl Σ 41 Σ 63 Σ 31 Σ Σ Σ 64 Σ 21 Σ 12 Σ 22 Σ 13 Σ 32 Σ 43 Σ 33 Σ 54 Σ 55 Σ 65 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 45 Σ 03 Σ 04 Σ 14 Σ kl is reachable from there is a path from in the lattice Σ ij Σ kl Reachability Σ ij Σ ij if to Σ kl Σ 41 Σ 63 Σ 31 Σ Σ Σ 64 Σ 21 Σ 32 Σ 22 Σ 43 Σ 33 Σ 54 Σ 55 Σ 65 Σ 12 Σ 13 Σ 23 Σ 24 Σ 34 Σ 44 Σ 35 Σ 45 Σ 03 Σ 04 Σ 14 So, why do we care about again? Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in Σ i and terminates in Σ f, then Σ i R Σ f

12 So, why do we care about again? So, why do we care about again? Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in Σ i and terminates in Σ f, then Σ i R Σ f Deadlock is a stable property Deadlock Deadlock If a run R of the snapshot protocol starts in Σ i and terminates in Σ f, then Σ i R Σ f Deadlock in implies deadlock in Σ f Deadlock in implies deadlock in Σ f No deadlock in implies no deadlock in Σ i Same problem, different approach Monitor process does not query explicitly Instead, it passively collects information and uses it to build an observation. (reactive architectures, Harel and Pnueli [1985]) An observation is an ordering of event of the distributed computation based on the order in which the receiver is notified of the events. Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications e 1 1

13 Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications e 1 1 e 2 1 e 1 1 e 2 1 Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications e 1 1 e 2 1 e 1 1 e 2 1 To obtain a run, messages must be delivered to the monitor in FIFO order To obtain a run, messages must be delivered to the monitor in FIFO order What about consistent runs?

14 Causal delivery Causal delivery FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) Causal delivery Causal delivery FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) m send event receive event m send event receive event deliver event deliver event

15 Causal delivery Causal delivery FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) m send event receive event m send event receive event deliver event deliver event m m 1 Causal delivery FIFO delivery guarantees: send i (m) send i (m ) deliver j (m) deliver j (m ) Causal delivery generalizes FIFO: send i (m) send k (m ) deliver j (m) deliver j (m ) Causal Delivery in Synchronous Systems We use the upper bound on message delivery time m send event receive event deliver event m 1 2

16 Causal Delivery in Synchronous Systems We use the upper bound on message delivery time Causal Delivery with Lamport Clocks! DR1.1:! Deliver all received messages in! increasing (logical clock) timestamp order. DR1: At time t, delivers all messages it received with timestamp up to t in increasing timestamp order Causal Delivery with Lamport Clocks! DR1.1:! Deliver all received messages in! increasing (logical clock) timestamp order. Causal Delivery with Lamport Clocks! DR1.1:! Deliver all received messages in! increasing (logical clock) timestamp order Should deliver?

17 Causal Delivery with Lamport Clocks! DR1.1:! Deliver all received messages in! increasing (logical clock) timestamp order. 1 4 Should deliver? Stability DR2:!Deliver all received stable messages in increasing (logical clock) timestamp order. Problem: Lamport Clocks don t provide gap detection Given two events e and e and their clock values LC(e) and LC(e ) where LC(e) < LC(e ) determine whether some event e exists s.t. LC(e) < LC(e ) < LC(e ) A message m received by p is stable at p if will never receive a future message m s.t. TS(m ) <TS(m) p Implementing Stability Implementing Stability Real-time clocks wait for time units Real-time clocks wait for time units Lamport clocks wait on each channel for m s.t. TS(m) >LC(e) Design better clocks!

18 Clocks and STRONG Clocks Causal Histories Lamport clocks implement the clock condition: e e LC(e) <LC(e ) The causal history of an event e in (H, ) is the set θ(e) ={e H e e} {e} We want new clocks that implement the strong clock condition: e e SC(e) <SC(e ) Causal Histories Causal Histories The causal history of an event e in (H, ) is the set θ(e) ={e H e e} {e} The causal history of an event e in (H, ) is the set θ(e) ={e H e e} {e} e 1 1 e 2 1 e 3 1 e 4 1 e 5 1 e 1 1 e 2 1 e 3 1 e 4 1 e 5 1 e 1 3 e 2 3 e 3 3 e 4 3 e 1 3 e 2 3 e 3 3 e 4 3 e e θ(e) θ(e )

19 How to build θ(e) Pruning causal histories p i Each process : initializes θ : if e k i e k i θ := is an internal or send event, then θ(e k i ):={e k i } θ(e k 1 i ) if is a receive event for message m, then θ(e k i ):={e k i } θ(e k 1 i ) θ(send(m)) Prune segments of history that are known to all processes (Peterson, Bucholz and Schlichting) Use a more clever way to encode θ(e) Vector Clocks Update rules Consider θ i (e), the projection of θ(e) on p i θ i (e) is a prefix of h i : θ i (e) =h k i i it can be encoded using k i p i e i VC(e i )[i] := VC[i]+1 θ(e) =θ 1 (e) θ 2 (e)... θ n (e) encoded using k 1,k 2,...,k n can be Represent θ using an n-vector VCsuch that VC(e)[i] =k θ i (e) =h k i i p i VC(e i ) := max(vc,ts(m)) VC(e i )[i] := VC[i]+1 m e i Message m is timestamped with TS(m) =VC(send(m))

20 Example Operational interpretation [1,0,0] [2,1,0] [3,1,2] [4,1,2] [5,1,2] [1,0,0] [2,1,0] [3,1,2] [4,1,2] [5,1,2] [0,1,0] [1,2,3] [4,3,3] [0,1,0] [1,2,3] [4,3,3] [1,0,1] [1,0,2] [1,0,3] [5,1,4] VC(e i )[i] = [1,0,1] [1,0,2] [1,0,3] [5,1,4] VC(e i )[j] = Operational interpretation Operational interpretation [1,0,0] [2,1,0] [3,1,2] [4,1,2] [5,1,2] [1,0,0] [2,1,0] [3,1,2] [4,1,2] [5,1,2] [0,1,0] [1,2,3] [4,3,3] [0,1,0] [1,2,3] [4,3,3] [1,0,1] [1,0,2] [1,0,3] [5,1,4] [1,0,1] [1,0,2] [1,0,3] [5,1,4] VC(e i )[i] = no. of events executed by p i up to and including VC(e i )[j] = e i VC(e i )[i] = no. of events executed by p i up to and including VC(e i )[j] = no. of events executed by that happen before of p j e i e i p i

A subtle problem. An obvious problem. An obvious problem. An obvious problem. No!

A subtle problem. An obvious problem. An obvious problem. An obvious problem. No! A subtle problem An obvious problem when LC = t do S doesn t make sense for Lamport clocks! there is no guarantee that LC will ever be S is anyway executed after LC = t Fixes: if e is internal/send and