Allocate-On-Use Space Complexity of Shared-Memory Algorithms


Allocate-On-Use Space Complexity of Shared-Memory Algorithms. James Aspnes, Bernhard Haeupler, Alexander Tong, Philipp Woelfel. DISC 2018.

Asynchronous shared-memory model. Processes p_1, p_2, ..., p_n apply atomic operations to objects O_1, O_2, ..., O_m; the schedule is chosen by an adversary.

Time complexity. Many popular time measures: total step complexity (how many operations did we do?), per-process step complexity (how many operations did I do?), RMR complexity (how many times did I see a register change?). All of these measures are per execution: expected step complexity, high-probability step complexity, adaptive step complexity, etc.

Space complexity (traditional version). Space complexity = number of objects m; it does not change from one execution to the next. Linear lower bounds are known for mutex (Burns-Lynch), perturbable objects (Jayanti-Tan-Toueg), and consensus (Zhu). This is trouble for both theory and practice: in theory it hides the effect of randomness, and in practice it hides the effect of memory management. Real systems don't charge you for pages you don't touch.

Space complexity (improved version). Space complexity = number of objects used in some execution, where an object is used when an operation is applied to it. This represents an allocate-on-use policy and gives a per-execution measure.
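As a toy illustration of the measure, the sketch below counts only the objects an execution actually touches. The Register class and the driver are hypothetical, invented for illustration; they are not from the paper.

```python
# Hypothetical sketch: an object counts toward allocate-on-use space
# only once an operation (read or write) is applied to it.

class Register:
    def __init__(self, pool):
        self._pool = pool          # shared set of objects used so far
        self._value = 0

    def _touch(self):
        self._pool.add(id(self))   # "allocated" on first use

    def read(self):
        self._touch()
        return self._value

    def write(self, v):
        self._touch()
        self._value = v

used = set()
regs = [Register(used) for _ in range(1000)]   # m = 1000 objects declared

# An execution that only ever touches 3 of the 1000 registers:
regs[0].write(1)
regs[1].write(regs[0].read())
regs[2].read()

print(len(used))   # allocate-on-use space of this execution: 3
```

Declared space is m = 1000 throughout, but the allocate-on-use measure charges this execution only for the 3 registers it touched.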

Example: RatRace (Alistarh et al., DISC 2010). Adaptive test-and-set on a binary tree of depth 3 log₂ n. Splitters allow descending processes to claim nodes; three-process consensus objects allow ascending processes to escape subtrees. A process that escapes the whole tree wins the test-and-set. Requires Θ(n³) objects, but with k participants only Θ(k) are used w.h.p.

A hidden trade-off for randomized test-and-set? Two algorithms for randomized test-and-set against an oblivious adversary: Alistarh-Aspnes (DISC 2011), with Θ(log log n) time and Θ(log log n) space; and Giakkoupis-Woelfel (PODC 2012), with Θ(log n) time and Θ(log n) space. Both use sifter objects to get rid of losing processes quickly, and both use Θ(n³) worst-case space for the backup RatRace. It is not clear whether a space-time trade-off is necessary, but without allocate-on-use space complexity it is not even visible.
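The general sifter idea can be sketched with a toy one-shot round; this is a hedged sketch of the pattern (write your id to a shared register, read it back, survive only if you still see your own id), not the exact object from either cited paper, and the random-interleaving scheduler is an assumption of the sketch.

```python
import random

def sift(n, seed=0):
    """Toy one-shot sifter round: each process writes its id to one
    shared register and later reads it; a process survives only if it
    still sees its own id.  Write/read times are randomly interleaved,
    with each process writing before it reads."""
    rng = random.Random(seed)
    events = []
    for p in range(n):
        w, r = sorted(rng.sample(range(10 * n), 2))  # write before read
        events.append((w, p, 'write'))
        events.append((r, p, 'read'))
    events.sort()

    reg = None
    survivors = []
    for _, p, op in events:
        if op == 'write':
            reg = p                  # overwrite the shared register
        elif reg == p:               # read back own id: still in the race
            survivors.append(p)
    return survivors

print(len(sift(64, seed=3)))         # typically far fewer than 64 survive
```

A process is sifted out whenever some other write lands between its own write and read, so most of the n processes drop out in a single round; the last writer always survives, so the round never eliminates everyone.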

Mutual exclusion. Mutual exclusion: at most one process at a time in the critical section. Deadlock freedom: some process reaches the critical section eventually. Burns-Lynch (1993): n registers are needed in the worst case, shown by constructing a single bad execution for any given algorithm. We want to beat this bound for expected space complexity.

Monte Carlo mutual exclusion. Processes climb a slippery ladder of one-bit register rungs to reach the critical section (CS). A process flips a coin at each rung. Heads: write 1 and climb. Tails: read the rung; on 0, stay at the same rung; on 1, fall to a holding pen. About half the processes fall from each rung, so O(log n) rungs leave one process in the CS with high probability. After finishing the CS, the winner resets the rungs and increments a gate.
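The ladder's coin-flipping rule can be illustrated with a toy sequential simulation. All names, and the round-robin scheduling, are illustrative assumptions of this sketch, not the paper's algorithm or its adversarial schedules.

```python
import random

def climb(n, rungs, seed=0):
    """Toy round-robin simulation of the slippery-ladder entry protocol.

    Each process flips a fair coin at its current rung: heads writes 1
    and climbs; tails reads the rung, staying on 0 and falling to the
    holding pen on 1.  Returns how many processes clear every rung.
    """
    rng = random.Random(seed)
    reg = [0] * rungs                 # one one-bit register per rung
    height = {p: 0 for p in range(n)}
    active = list(range(n))
    winners = 0
    while active:
        p = active.pop(0)
        h = height[p]
        if h == rungs:                # cleared the whole ladder: enter CS
            winners += 1
            continue
        if rng.random() < 0.5:        # heads: write 1 and climb
            reg[h] = 1
            height[p] = h + 1
            active.append(p)
        elif reg[h] == 1:             # tails, read 1: fall to holding pen
            continue
        else:                         # tails, read 0: stay and retry
            active.append(p)
    return winners

winners = climb(n=64, rungs=18, seed=1)
print(winners)                        # at least one process reaches the CS
```

At least one process always clears the ladder (the highest climber can never read a 1 above every other process), and since roughly half the remaining processes fall at each rung, a ladder of a few multiples of log₂ n rungs leaves a single winner with high probability.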

Monte Carlo mutual exclusion: analysis. Deadlock freedom: some process is first to write 1. Mutual exclusion: a potential function Φ sums 2^height over processes and w^height over registers holding 1, plus a few extra terms. Φ increases slowly on average but is big whenever two processes are in the CS, so w.h.p. mutual exclusion holds in polynomially-long executions. O(n) amortized RMRs per critical section.

Mutual exclusion in O(log n) expected space. The construction combines the Monte Carlo algorithm with a splitter, a two-process mutex, and a backup mutex. The splitter detects when the Monte Carlo algorithm violates mutual exclusion; in that case, processes switch to the O(n)-space backup mutex. This gives mutual exclusion always, O(log n) space w.h.p., and O(n) amortized RMRs per critical section.

Allocate-on-update space complexity. Many systems allocate pages only on write; the analogous notion is allocate-on-update: reading an object is free, but changing an object is not. How does this compare to allocate-on-use?

Simulating allocate-on-update with allocate-on-use. Idea: use one-bit registers to mark which ranges have changed. A balanced binary tree over the objects gives O(log m) overhead; an unbalanced tree gives O(log(max address updated)). So the two models are equivalent up to a logarithmic factor.
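A minimal sketch of the balanced-tree marking idea, assuming the m objects sit at the leaves of a complete binary tree stored heap-style in an array of one-bit registers; all names here are illustrative, not from the paper.

```python
# Complete binary tree over m = 8 objects, stored as a heap-ordered
# array: marks[1] is the root, leaves occupy indices m .. 2m-1.

m = 8
marks = [0] * (2 * m)          # one-bit "changed" registers

def record_update(obj_index):
    """Mark the path from object obj_index's leaf up to the root,
    touching O(log m) one-bit registers."""
    node = m + obj_index
    touched = 0
    while node >= 1:
        marks[node] = 1
        touched += 1
        node //= 2
    return touched

print(record_update(5))        # path length = log2(m) + 1 = 4
```

Under allocate-on-use, each update thus charges only the O(log m) mark bits on its leaf-to-root path, which is how the simulation keeps the overhead to a logarithmic factor.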

Conclusion and open problems. Allocate-on-use space complexity reveals differences between algorithms that are hidden by worst-case space complexity. Open problems: What other problems allow low allocate-on-use space? Space lower bounds for allocate-on-use? What happens with an adaptive adversary? Low-space mutex with better RMR complexity? Can allocate-on-use be implemented in a model that doesn't provide it?