Causal Consistency for Geo-Replicated Cloud Storage under Partial Replication


Causal Consistency for Geo-Replicated Cloud Storage under Partial Replication
Min Shen, Ajay D. Kshemkalyani, TaYuan Hsu
University of Illinois at Chicago

Outline
1. Introduction
2. System Model
3. Algorithms: Full-Track Algorithm (partially replicated memory); Opt-Track Algorithm (partially replicated memory); Opt-Track-CRP (fully replicated memory)
4. Complexity measures of causal memory algorithms
5. Conclusion
6. References

Introduction
Data replication is a technique for fault tolerance in distributed systems; it also reduces access latency in cloud and geo-replicated systems. Consistency of data is a core issue in distributed shared memory.
Consistency models represent a trade-off between cost and convenient semantics. From strongest to weakest: linearizability, sequential consistency, causal consistency [1], pipelined RAM, slow memory, eventual consistency.
Industry interest: for example, Google, Amazon, Microsoft, Facebook, LinkedIn.

Introduction: Geo-Replicated Cloud Storage Features
By the CAP theorem (Brewer, 2000), a system cannot provide all three of: consistency of replicas, availability of writes, and partition tolerance. Further desirable features are low latency and high scalability.

Introduction: Related Work
Causal consistency in distributed shared memory systems (Ahamad et al.). Causal consistency has also been studied by Baldoni et al., Mahajan et al., Belaramani et al., and Petersen et al. In the past four years: ChainReaction (S. Almeida et al.), Bolt-on causal consistency (P. Bailis et al.), Orbe and GentleRain (J. Du et al.), Wide-Area Replicated Storage (K. Lady et al.), and COPS and Eiger (W. Lloyd et al.). All of the above works assume full replication.

Introduction: Partial Replication
Figure: Case for Partial Replication.

Introduction: Case for Partial Replication
Partial replication is more natural for some applications, as shown in the previous case. With p replicas placed at some p of the total of n DCs, each write operation that would have triggered an update broadcast to all n DCs now becomes a multicast to just p of the n DCs. For write-intensive workloads, partial replication therefore gives a direct saving in the number of messages. Allowing flexibility in the number of DCs required in causally consistent replication remains an interesting direction for future work. The supposedly higher cost of tracking dependency metadata is relatively small for applications such as Facebook.

System Model
A system with n application processes ap_1, ap_2, ..., ap_n interacting through a shared memory composed of q variables x_1, x_2, ..., x_q. Each ap_i can perform either a read or a write operation on any of the q variables.
r_i(x_j)v: a read operation performed by ap_i on variable x_j which returns value v.
w_i(x_j)v: a write operation performed by ap_i on variable x_j which writes value v.
Each variable has an initial value.
Local history h_i: the series of read and write operations generated by process ap_i.
Global history H: the set of local histories h_i from all n application processes.

System Model: Causally Consistent Memory [1]
Program Order: a local operation o1 precedes another operation o2 of the same process, denoted o1 ->po o2.
Read-from Order: there exist a variable x and a value v such that read operation o2 = r(x)v retrieves the value v written by write operation o1 = w(x)v from a distinct process, denoted o1 ->ro o2. For any operation o2, there is at most one operation o1 such that o1 ->ro o2; if o2 = r(x)v for some x and there is no operation o1 such that o1 ->ro o2, then v is the initial value of x.
Causality Order: for two operations o1 and o2 in O_H, o1 ->co o2 if and only if one of the following conditions holds:
- there exists ap_i s.t. o1 ->po o2 (program order)
- there exist ap_i, ap_j s.t. o1 ->ro o2 (read-from order)
- there exists o3 in O_H s.t. o1 ->co o3 and o3 ->co o2 (transitive closure)
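To make the definition concrete, here is a minimal Python sketch (not from the paper; the Op tuple and function names are illustrative) that builds the causality order as the transitive closure of program order and read-from order over a tiny history.

```python
# Minimal sketch: operations as tuples, causality order as the transitive
# closure of program order and read-from order (illustrative names only).
from collections import namedtuple
from itertools import product

# kind: 'r' or 'w'; proc: issuing process; var: variable; val: value;
# seq: position of the operation in its local history
Op = namedtuple("Op", "kind proc var val seq")

def program_order(o1, o2):
    # o1 precedes o2 at the same application process
    return o1.proc == o2.proc and o1.seq < o2.seq

def read_from(o1, o2):
    # o2 reads the value written by o1 at a distinct process
    # (toy model: assumes each written value is unique, so the match is unambiguous)
    return (o1.kind == 'w' and o2.kind == 'r' and o1.proc != o2.proc
            and o1.var == o2.var and o1.val == o2.val)

def causality_order(ops):
    """Return the set of pairs (o1, o2) with o1 ->co o2 (transitive closure)."""
    co = {(a, b) for a, b in product(ops, ops)
          if a != b and (program_order(a, b) or read_from(a, b))}
    changed = True
    while changed:                      # simple fixpoint closure
        changed = False
        for a, b in product(ops, ops):
            if (a, b) not in co and any((a, c) in co and (c, b) in co for c in ops):
                co.add((a, b))
                changed = True
    return co

if __name__ == "__main__":
    w1 = Op('w', 1, 'x', 5, 0)   # ap_1 writes x := 5
    r2 = Op('r', 2, 'x', 5, 0)   # ap_2 reads x and sees 5  => w1 ->ro r2
    w2 = Op('w', 2, 'y', 7, 1)   # ap_2 then writes y := 7  => r2 ->po w2
    assert (w1, w2) in causality_order([w1, r2, w2])   # transitivity: w1 ->co w2
```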

System Model: Underlying Distributed Communication System
The shared memory abstraction and its causal consistency model are implemented on top of a distributed message-passing system with n sites connected by FIFO channels. Each site s_i hosts an application process ap_i and holds only a subset of the variables x_h.
When an application process ap_i performs a write operation w(x_1)v, it invokes Multicast(m) to deliver the message m containing w(x_1)v to all sites replicating x_1.
When an application process ap_i performs a read operation r(x_2)v, it invokes RemoteFetch(m) to deliver the message m containing r(x_2)v to a pre-designated site replicating x_2 in order to fetch its value.
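The routing decision this layer makes can be sketched as follows (a toy sketch under stated assumptions: the replica map, the "lowest site id" choice of pre-designated replica, and the local-read shortcut are illustrative, not prescribed by the paper).

```python
# Sketch of write/read routing in the partially replicated message-passing layer.
# `replicas` maps each variable id to the set of site ids holding a replica of it.
def sites_for_write(var, replicas):
    """A write on `var` is multicast to every site replicating `var`."""
    return sorted(replicas[var])

def site_for_read(var, my_site, replicas):
    """A read on `var` is served from the local replica if one exists here;
    otherwise it is fetched from a pre-designated replica (assumed here to be
    the lowest site id)."""
    if my_site in replicas[var]:
        return my_site
    return min(replicas[var])

if __name__ == "__main__":
    replicas = {"x1": {0, 2}, "x2": {1, 3}}
    print(sites_for_write("x1", replicas))                     # multicast targets: [0, 2]
    print(site_for_read("x2", my_site=0, replicas=replicas))   # remote fetch from site 1
```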

System Model: Events Generated at Each Site
Send event: Multicast(m) by ap_i generates event send_i(m).
Fetch event: RemoteFetch(m) by ap_i generates event fetch_i(m).
Message receipt event: the receipt of a message m at site s_i generates event receipt_i(m).
Apply event: applying the value written by w_j(x_h)v to x_h's local replica at ap_i generates event apply_i(w_j(x_h)v).
Remote return event: after the occurrence of receipt_i(m) corresponding to a remote read r_j(x_h)u performed by ap_j, an event remote_return_i(r_j(x_h)u) is generated to transmit x_h's value u to site s_j.
Return event: event return_i(x_h, v) corresponds to the return of x_h's value v, obtained either through a previous fetch_i(f) event or by reading the local replica.

System Model: Activation Predicate
Baldoni et al. [2] defined a new relation, ->co, on send events. Let w(x)a and w(y)b be two write operations in O_H. For their corresponding send events, send_i(m_{w(x)a}) ->co send_j(m_{w(y)b}) iff one of the following conditions holds:
1. i = j and send_i(m_{w(x)a}) locally precedes send_j(m_{w(y)b})
2. i != j and return_j(x, a) locally precedes send_j(m_{w(y)b})
3. there exists send_k(m_{w(z)c}) s.t. send_i(m_{w(x)a}) ->co send_k(m_{w(z)c}) ->co send_j(m_{w(y)b})
Note that ->co is contained in -> (Lamport's happened-before relation).
With the ->co relation, the optimal activation predicate is:
A_OPT(m_w, e) ≡ ∄ m_{w'} : send_j(m_{w'}) ->co send_k(m_w) ∧ apply_i(w') ∉ E_i|_e   (1)
where E_i|_e denotes the events at site s_i up to and including e. It is optimal because the moment A_OPT(m_w, e) becomes true is the earliest instant at which the update m_w can be applied.
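A minimal sketch of what the predicate checks, independent of how the dependency metadata is encoded (the function name and the (write_id, dest_sites) encoding are assumptions for illustration; Full-Track and Opt-Track differ precisely in how this metadata is represented).

```python
# Illustrative check of A_OPT at site `my_site`: the incoming update may be
# applied as soon as every causally preceding write that is destined to this
# site has already been applied here.
def a_opt_holds(deps, applied, my_site):
    """deps: iterable of (write_id, dest_sites) describing writes whose send
    causally precedes the send of the incoming update;
    applied: set of write_ids already applied locally."""
    return all(w_id in applied
               for w_id, dest_sites in deps
               if my_site in dest_sites)
```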

Algorithms
Two algorithms implement causal consistency in a partially replicated distributed shared memory system: Full-Track, and Opt-Track (a message- and space-optimal algorithm). Both adopt the optimal activation predicate A_OPT.
A special case of Opt-Track for full replication, Opt-Track-CRP, is optimal and has lower message size, time, and space complexity than the Baldoni et al. algorithm [2].

Algorithms: Full-Track Algorithm (partially replicated memory)
Algorithm 1 (Full-Track) is for a non-fully replicated system: each application process performing a write operation writes only to a subset of all the sites. Each site s_i needs to track the number of write operations performed by every ap_j to every site s_k, denoted Write_i[j][k]. The Write clock piggybacked on messages generated by Multicast(m) must not be merged with the local Write clock at message reception, but only at a later read operation that reads the value carried by the message. Full-Track is optimal in terms of the activation predicate.

Algorithms: Full-Track Algorithm (partially replicated memory)
Algorithm 1: Full-Track. Data structures:
1. Write_i - the Write clock. Write_i[j][k]: the number of updates sent by application process ap_j to site s_k that causally happened before, under the ->co relation.
2. Apply_i - an array of integers. Apply_i[j]: the total number of updates written by application process ap_j that have been applied at site s_i.
3. LastWriteOn_i (variable id -> Write) - a hash map of Write clocks. LastWriteOn_i[h]: the Write clock value associated with the last write operation on variable x_h locally replicated at site s_i.
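To make the bookkeeping concrete, a small Python sketch of this per-site state and a plausible applicability check over piggybacked Write clocks follows; the class name, field layout, and the exact form of the check are illustrative assumptions, not the paper's pseudocode.

```python
# Illustrative sketch only: Full-Track-style state at site s_i and a plausible
# delivery check over piggybacked Write clocks (not the paper's exact pseudocode).
class FullTrackState:
    def __init__(self, n, site_id):
        self.n = n
        self.site_id = site_id
        self.Write = [[0] * n for _ in range(n)]  # Write[j][k]: writes by ap_j sent to s_k
        self.Apply = [0] * n                      # Apply[j]: updates by ap_j applied here
        self.LastWriteOn = {}                     # variable id -> piggybacked Write clock

    def can_apply(self, W, writer):
        """W: Write clock piggybacked on the update; writer: index j of its author.
        Assumed semantics: the update is applicable when it is the next write from
        its author destined to this site, and every other causally preceding write
        destined to this site has already been applied here."""
        if self.Apply[writer] < W[writer][self.site_id] - 1:
            return False
        return all(self.Apply[j] >= W[j][self.site_id]
                   for j in range(self.n) if j != writer)
```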

Algorithms: Full-Track Algorithm (partially replicated memory)
Algorithm 1: Full-Track (pseudocode figure).

Algorithms: Full-Track Algorithm (partially replicated memory)
Algorithm 1: Full-Track (pseudocode figure, continued). The activation predicate A_OPT is implemented.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Algorithm 2: Opt-Track. In Algorithm 1, each message corresponding to a write operation piggybacks an O(n^2) matrix; Algorithm 2 further reduces the message size and storage cost. It exploits the transitive dependency of causal deliveries of messages, as given by the KS algorithm [3][4]: each site keeps a record of the most recently received message from each other site, along with the list of destinations of that message. Opt-Track is optimal in terms of the activation predicate, optimal in terms of log space and message space overhead, and achieves a further optimality in that no redundant destination information is recorded.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Two situations for destination information to be redundant. Figure: s_2 is a destination of M; the causal future of the relevant message delivery events is shown in dotted lines.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Two situations for destination information to be redundant. Figure: s_2 is a destination of m; the causal future of the relevant apply and return events is shown in dotted lines.

Algorithms: Opt-Track Algorithm (partially replicated memory)
If the destination list becomes empty, then... Figure: illustration of why it is important to keep a record even if its destination list becomes empty.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Algorithm 2: Opt-Track. Data structures:
1. clock_i - local counter at site s_i for write operations performed by application process ap_i.
2. Apply_i - an array of integers. Apply_i[j]: the total number of updates written by application process ap_j that have been applied at site s_i.
3. LOG_i = { (j, clock_j, Dests) } - the local log. Each entry indicates a write operation in the causal past.
4. LastWriteOn_i (variable id -> LOG) - a hash map of LOGs. LastWriteOn_i[h]: the piggybacked LOG from the most recent update applied at site s_i for locally replicated variable x_h.
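A minimal sketch of this per-site state (field and class names are assumptions for illustration): each dependency is a log entry carrying an explicit destination list, and the piggybacked log of the last applied update is kept per locally replicated variable.

```python
# Sketch of Opt-Track-style per-site state (illustrative names).
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    site: int                                   # j: site whose process issued the write
    clock: int                                  # clock_j value of that write
    dests: set = field(default_factory=set)     # sites still recorded as destinations

@dataclass
class OptTrackState:
    n: int
    site_id: int
    clock: int = 0                              # clock_i: local write counter

    def __post_init__(self):
        self.apply = [0] * self.n               # Apply_i[j]
        self.log = []                           # LOG_i: list of LogEntry
        self.last_write_on = {}                 # variable id -> list of LogEntry
```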

Algorithms: Opt-Track Algorithm (partially replicated memory)
Algorithm 2: Opt-Track. Figure: write processing at site s_i.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Algorithm 2: Opt-Track. Figure: read and receive processing at site s_i.

Algorithms: Opt-Track Algorithm (partially replicated memory)
Procedures used in Opt-Track. Figure: PURGE and MERGE functions at site s_i.
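A hedged sketch of the MERGE idea (names and encoding assumed; this is not the paper's exact pseudocode): when a piggybacked log is merged into the local log, entries describing the same write keep only the intersection of their destination lists, since a destination missing from either copy is already known to be redundant, while entries seen in only one log are kept as-is.

```python
# Sketch of merging a piggybacked log into the local log with destination pruning.
def merge_logs(local, piggybacked):
    """local, piggybacked: dict (site, clock) -> set of destination site ids."""
    merged = dict(local)
    for key, dests in piggybacked.items():
        if key in merged:
            merged[key] = merged[key] & dests   # keep only destinations both logs still list
        else:
            merged[key] = set(dests)
    return merged

if __name__ == "__main__":
    local = {(1, 4): {2, 3}}
    piggy = {(1, 4): {3}, (2, 7): {0, 2}}
    print(merge_logs(local, piggy))             # {(1, 4): {3}, (2, 7): {0, 2}}
```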

Algorithms: Opt-Track-CRP (fully replicated memory)
Algorithm 3: Opt-Track-CRP. A special case of Algorithm 2 (Opt-Track) for full replication, with the same optimizations. Since in the full replication case every write operation is sent to exactly the same set of sites, there is no need to keep a list of destination information with each write operation. Each time a write operation is sent, all the write operations it piggybacks as its dependencies share the same set of destinations as the one being sent, so their destination lists will be pruned. When a write operation is received, all the write operations it piggybacks also have the receiver as one of their destinations. We can therefore represent each individual write operation using only a 2-tuple (i, clock_i) at site s_i, reducing the cost of representing a write operation from O(n) down to O(1).
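A minimal sketch of this bookkeeping (class and method names are illustrative assumptions): every dependency is just the 2-tuple (site, clock), and the local log is reset after a write, since that write transitively covers all earlier dependencies under full replication (the point of the next slide).

```python
# Illustrative Opt-Track-CRP-style state: O(1) dependency records, log reset on write.
class OptTrackCRPState:
    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = 0
        self.log = []                                  # list of (site, clock) 2-tuples

    def on_write(self):
        """Issue a write: piggyback the current log as its dependencies, then reset
        the log to contain only this write."""
        self.clock += 1
        piggyback = list(self.log)
        self.log = [(self.site_id, self.clock)]        # log reset after the write
        return (self.site_id, self.clock), piggyback

    def on_read_applied_write(self, write_id):
        """Reading a value makes its write a dependency of subsequent local writes."""
        if write_id not in self.log:
            self.log.append(write_id)
```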

Algorithms: Opt-Track-CRP (fully replicated memory)
Further improved scalability: in Algorithm 2, keeping entries with an empty destination list is important; in the fully replicated case, we can also decrease this cost. Figure: in fully replicated systems, the local log is reset after each write operation.

Algorithms: Opt-Track-CRP (fully replicated memory)
Algorithm 3: Opt-Track-CRP. Figure: there is no need to maintain the destination list for each write operation in the local log.

Complexity Measures of Causal Memory Algorithms: Parameters
n: the number of sites in the system
q: the number of variables in the distributed shared memory system
p: the replication factor, i.e., the number of sites at which each variable is replicated
w: the number of write operations performed in the distributed shared memory system
r: the number of read operations performed in the distributed shared memory system
d: the number of write operations stored in the local log (used only in the Opt-Track-CRP algorithm)

Complexity Measures of Causal Memory Algorithms: Complexity
Figure: complexity measures of causal memory algorithms for fully replicated memory. OptP: the optimal propagation-based protocol proposed by Baldoni et al. [2].

Complexity Measures of Causal Memory Algorithms: Comparisons
Compared with OptP, our algorithms also adopt the optimal activation predicate A_OPT but incur lower message-size, space, and time complexities (for read and write operations). Compared with other causal consistency algorithms, ours can additionally implement causal consistency in partially replicated distributed shared memory systems. Our algorithms provide scalability without resorting to log serialization and exchange to implement causal consistency.

Complexity Measures of Causal Memory Algorithms: The Benefit of Partial Replication vs. Full Replication
Partial replication reduces the number of messages sent with each write operation. The overall number of messages can be lower if the replication factor is low and readers tend to read variables from the local replica rather than a remote one (e.g., Hadoop HDFS and MapReduce). It also reduces the total size of messages transmitted within the system, considering the size of the data actually being replicated; modern social multimedia networks are such examples. Finally, it decreases the cost incurred by full replication under write-intensive workloads.

Complexity Measures of Causal Memory Algorithms: Message Count as a Function of w_rate
Message count is the most important metric. Partial replication gives a lower message count than full replication if
pw + 2r < nw, i.e., (n - p)w > 2r.   (2)
Writing w_rate = w / (w + r), this condition becomes
w_rate > 2 / (n - p + 2).   (3)
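As a quick sanity check of conditions (2) and (3), a small numeric sketch (the parameter values are made up for illustration):

```python
# Illustrative check of conditions (2) and (3); parameter values are invented.
def partial_beats_full(n, p, w, r):
    """True when partial replication sends fewer messages than full replication,
    assuming a write costs p (resp. n) messages and a remote read costs 2."""
    return p * w + 2 * r < n * w

def w_rate_threshold(n, p):
    """Write-rate threshold from condition (3)."""
    return 2 / (n - p + 2)

if __name__ == "__main__":
    n, p = 10, 3
    print(w_rate_threshold(n, p))             # 2/9 ~= 0.222
    w, r = 30, 70                             # w_rate = 0.3 > 0.222
    print(partial_beats_full(n, p, w, r))     # True: 90 + 140 = 230 < 300
```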

Complexity Measures of Causal Memory Algorithms: Partial Replication versus Full Replication
Figure: message count for partial replication vs. full replication, plotted as a function of w_rate.

Complexity Measures of Causal Memory Algorithms: Simulation Parameters of the KS Algorithm
n: number of processes.
MIMT (mean inter-message time): the average period of time between two message send events at any process.
M/T (multicast frequency): the ratio of the number of send events at which data is multicast to more than one process (M) to the total number of message send events (T).
MTT (mean transmission time): the transmission time of a message, i.e., message size / bandwidth + propagation delay.
B/T: the fraction of send events that broadcast messages.
The baseline is the n^2 matrix size of the RST algorithm.

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of n
Figure: simulations performed for fixed (MTT, MIMT, M/T).

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of MTT
Figure: simulations performed for fixed (MIMT, M/T, n).

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of MIMT
Figure: simulations performed for fixed (MTT, M/T, n).

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of M/T
Figure: simulations performed for fixed (MTT, MIMT, n).

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of Dests/n
Figure: simulations performed for fixed (MTT, MIMT, n).

Complexity Measures of Causal Memory Algorithms: Average Message Space Overhead as a Function of B/T
Figure: simulations performed for fixed (MTT, MIMT, n).

Conclusion: Future Work
For applications where the data size is small (e.g., wall posts on Facebook), the size of the meta-data can be a problem: it is quadratic in n in the worst case, even for Algorithm Opt-Track. Future work aims to reduce the size of the meta-data for maintaining causal consistency in partially replicated systems.

Conclusion: Notion of Credits
Figure: reduce the meta-data at the cost of some possible violations of causal consistency; the number of violations can be made arbitrarily small by controlling a tunable parameter (the credit).

Conclusion: Instantiation of Credits
Integrate the notion of credits into the Opt-Track algorithm to obtain an algorithm that can fine-tune the degree of causal consistency by trading off the size of the meta-data overhead. Three instantiations of the notion of credits are given: hop count, time-to-live, and metric distance.
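A purely illustrative sketch of the hop-count instantiation (the record layout and function name are assumptions, not the paper's design): each dependency record carries a remaining credit, and once the credit is exhausted the record is dropped from the piggybacked meta-data, bounding the meta-data size at the cost of possible (tunably rare) consistency violations.

```python
# Illustrative hop-count credit: drop a dependency record once it has travelled
# its allowed number of hops.
from dataclasses import dataclass

@dataclass
class DepRecord:
    site: int
    clock: int
    credit: int          # remaining hops this record may still travel

def propagate(deps):
    """Spend one hop of credit on each record and drop exhausted ones."""
    out = []
    for d in deps:
        if d.credit > 1:
            out.append(DepRecord(d.site, d.clock, d.credit - 1))
    return out

if __name__ == "__main__":
    deps = [DepRecord(1, 4, credit=2), DepRecord(3, 9, credit=1)]
    deps = propagate(deps)   # the second record is exhausted after this hop
    print(deps)              # [DepRecord(site=1, clock=4, credit=1)]
```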

Conclusion: Conclusions
A suite of algorithms implementing causal consistency in large-scale geo-replicated storage under partial replication:
Full-Track Algorithm (partially replicated scenario): adopts the optimal activation predicate, in the sense that each update is applied at the earliest instant while removing false causality.
Opt-Track Algorithm (partially replicated scenario): further minimizes the size of the meta-information carried on messages and stored in local logs, providing lower overhead (transmission and storage) than the full replication case.
Opt-Track-CRP Algorithm (fully replicated scenario): an optimized algorithm derived from Opt-Track that reduces the message overhead, the processing time, and the local storage cost at each site.

References
[1] M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: Definitions, implementation and programming. Distributed Computing, 9(1):37-49, 1994.
[2] R. Baldoni, A. Milani, and S. Tucci-Piergiovanni. Optimal propagation-based protocols implementing causal memories. Distributed Computing, 18(6):461-474, 2006.
[3] P. Chandra, P. Gambhire, and A. D. Kshemkalyani. Performance of the optimal causal multicast algorithm: A statistical analysis. IEEE Transactions on Parallel and Distributed Systems, 15(1):40-52, 2004.
[4] A. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distributed Computing, 11(2):91-111, April 1998.