Notes on Logarithmic Lower Bounds in the Cell Probe Model

Kevin Zatloukal
November 10, 2010

1 Overview

The paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.) It was published in the SIAM Journal on Computing, 2006, based on papers in STOC and SODA 2004.

The topic is lower bounds for data structure problems. What is a data structure problem? We want to compute a sequence of functions

$$f_i(x_i, x_{i-1}, \dots, x_1), \quad \text{for } i = 1, \dots, n,$$

where the $f_i$'s are drawn from a fixed set Ops. Most importantly, only $x_i$ is explicitly passed to the $i$-th function. The algorithm must store $x_1, \dots, x_{i-1}$ (or some functions thereof) in memory so it can access them when computing $f_i$. The idea is to arrange the values in some clever way (the data structure) so that computing $f_i$ (and updating memory to incorporate $x_i$) is efficient.

What do we mean by efficient? We will use the cell probe model. In particular, this means that we only count the number of memory accesses (reads and writes of $b$-bit words) made by the algorithm; all computation is free. Obviously, this is unrealistic, but it gives genuine lower bounds: any lower bound on the number of memory accesses is a lower bound on the actual running time. Finally, we consider the amortized cost of the operations, that is, the average cost over the whole sequence (the sum of all individual costs divided by $n$).

The paper proves lower bounds for two problems, partial sums and dynamic connectivity (both defined below). For each, it proves a lower bound of $\Omega(\log n)$ on the amortized cost of each operation (i.e., a bound of $\Omega(n \log n)$ for the whole sequence).
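To make the cost model concrete, here is a minimal sketch (the names are mine, not the paper's) of a memory whose only charged events are reads and writes of $b$-bit words; everything computed between probes is free:

    class CellProbeMemory:
        """Word memory whose only cost is the number of cell probes (reads/writes)."""
        def __init__(self, word_bits=64):
            self.word_bits = word_bits
            self.cells = {}      # address -> word value
            self.probes = 0      # the only quantity the model charges for

        def read(self, addr):
            self.probes += 1
            return self.cells.get(addr, 0)

        def write(self, addr, value):
            assert 0 <= value < 2 ** self.word_bits, "values must fit in one b-bit word"
            self.probes += 1
            self.cells[addr] = value

    mem = CellProbeMemory(word_bits=64)
    mem.write(0, 42)
    _ = mem.read(0)
    print(mem.probes)   # -> 2

Any data structure built on top of read and write is then measured purely by the probes counter.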

2 Lower Bounds for Partial Sums

Definition

The partial sums problem asks us to maintain a function $F : [n] \to G$, with values in an arbitrary group $G$, under two operations. We start with $F(i) = 0$ (the identity of $G$) for all $i \in [n]$. The first operation, Update$(i, \Delta)$ with $i \in [n]$ and $\Delta \in G$, changes $F$ so that $F(i) \leftarrow F(i) + \Delta$. The second operation, Sum$(i)$, returns $\sum_{j=1}^{i} F(j)$. (We write the group operation as $+$ even though it need not be commutative.)

A naive solution is to maintain the function values in an array (i.e., write $F(j)$ into index $j$ in memory). This takes $O(1)$ time for an Update but $O(n)$ time for a Sum.

A better solution is to also store a perfect binary tree over $[n]$. [[Picture]] Each node $v$ stores the sum of all the $F(j)$ in its subtree. An Update requires us to change only the $\log n$ nodes above the updated index $i$, which takes $O(\log n)$ time. To compute Sum$(i)$, we walk the path from the leaf at $i$ up to the root, maintaining the sum of all values to the left within the current subtree. If we are the left child of our parent, there is nothing to do; if we are the right child, we can account for the entire left subtree by adding in the value stored at its root. Again, this takes just $O(\log n)$ time. (Note that this binary tree can also be stored in an array if we want.) A short code sketch of this solution appears below, after the outline of the lower-bound approach.

History

There were existing lower bounds of $\Omega(\log n / \log\log n)$, but even these were in more restrictive models, such as allowing only group operations on the data, only oblivious algorithms, etc. (There was an $\Omega(\log n)$ lower bound, but it was in the semigroup model, i.e., no subtraction was allowed!) This was a well-known open problem for 15 years prior to this paper.

Lower Bound

The general idea will be to prove relations $c_1 \cdot IL(\cdot) \le H(\cdot) \le c_2 \cdot IT(\cdot)$, for some constants $c_1, c_2$. The information transfer, $IT$, will be a lower bound on the running time. The interleaving number, $IL$, will be a function of the input instance $\pi$.
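Returning to the tree-based upper bound described above, here is a minimal sketch in Python (the class and method names are mine; indices are 0-based rather than the 1-based $[n]$ of the definition, and plain integer addition stands in for the group operation):

    class PartialSums:
        """Perfect binary tree over the indices 0..n-1 (n rounded up to a power of two).
        Each node covers a contiguous range of indices and stores the sum of F over it."""

        def __init__(self, n):
            self.size = 1
            while self.size < n:               # pad to a power of two so the tree is perfect
                self.size *= 2
            self.tree = [0] * (2 * self.size)  # leaves live at positions size..2*size-1

        def update(self, i, delta):
            """Update(i, delta): F(i) += delta.  Touches the O(log n) ancestors of leaf i."""
            v = self.size + i
            while v >= 1:
                self.tree[v] += delta
                v //= 2

        def prefix_sum(self, i):
            """Sum(i): return F(0) + F(1) + ... + F(i)."""
            v = self.size + i
            total = self.tree[v]
            while v > 1:
                if v % 2 == 1:                 # right child: prepend the entire left sibling,
                    total = self.tree[v - 1] + total   # keeping order for non-commutative groups
                v //= 2
            return total

    ps = PartialSums(8)
    ps.update(2, 5)
    ps.update(5, 7)
    print(ps.prefix_sum(4), ps.prefix_sum(5))  # -> 5 12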

We will show that $\pi$ can be chosen so that the interleaving is $\Omega(\log n)$ per operation, which gives a lower bound on the running time by the above relation.

We will call time $t$ the start of the $t$-th operation. Consider two adjacent time intervals $[t_0, t_1]$ and $[t_1, t_2]$, with $t_0 < t_1 < t_2$, and the quantity $H(A_{t_1,t_2} \mid I_{0,t_0}, I_{t_1,t_2})$, where the $A$'s are the answers (function values) and the $I$'s are the inputs. This is a natural quantity to consider if we think of entropy as a bound on data compression: the algorithm must store some function of the inputs $I_{t_0,t_1}$ in order to compute the answers $A_{t_1,t_2}$, and this entropy is a lower bound on how much memory that must take.

We define the information transfer $IT(t_0, t_1, t_2)$ to be the set of all memory addresses read in $[t_1, t_2]$ that were last written in $[t_0, t_1]$, along with their associated values at time $t_1$.

Proposition 2.1. $H(A_{t_1,t_2} \mid I_{0,t_0}, I_{t_1,t_2}) \le b\,(2\,\mathbf{E}[|IT(t_0, t_1, t_2)|] + 1)$.

Proof. We can express the entropy in terms of $IT$ as follows:

$$H(A_{t_1,t_2} \mid I_{0,t_0}, I_{t_1,t_2}) \le H(A_{t_1,t_2}, IT(t_0,t_1,t_2) \mid I_{0,t_0}, I_{t_1,t_2}) = H(IT(t_0,t_1,t_2) \mid I_{0,t_0}, I_{t_1,t_2}) + H(A_{t_1,t_2} \mid IT(t_0,t_1,t_2), I_{0,t_0}, I_{t_1,t_2}).$$

Let $\ell = |IT(t_0, t_1, t_2)|$. Then, in the same manner as above, we have

$$H(IT(t_0,t_1,t_2) \mid I_{0,t_0}, I_{t_1,t_2}) \le H(\ell \mid I_{0,t_0}, I_{t_1,t_2}) + H(IT(t_0,t_1,t_2) \mid \ell, I_{0,t_0}, I_{t_1,t_2}).$$

Since the support of $\ell$ has size at most $2^b$, we have $H(\ell \mid I_{0,t_0}, I_{t_1,t_2}) \le b$. The second term is the expectation over $k$ of $H(IT(t_0,t_1,t_2) \mid \ell = k, I_{0,t_0}, I_{t_1,t_2})$, which is at most the entropy of a uniform distribution over $k$ address-value pairs, namely $2bk$ bits; taking the expectation gives at most $2b\,\mathbf{E}[\ell]$. (Note that $IT$ includes both the address and its value, hence $2b$ bits per cell.)

On the other hand, $H(A_{t_1,t_2} \mid IT(t_0,t_1,t_2), I_{0,t_0}, I_{t_1,t_2})$ is zero, because the answers in $[t_1, t_2]$ are a deterministic function of the fixed inputs and the information transfer. Specifically, we can simulate the algorithm over $[t_1, t_2]$ in order to compute the answers. We simulate a write to address $i$ by recording it, and a read of address $i$ according to when that address was last written: if it was written after $t_1$, then we recorded the value earlier in the simulation; if it was written in $[t_0, t_1]$, then the value is in the information transfer; otherwise, it was written before $t_0$, and since we have all the inputs from the start up to $t_0$, we can simulate the algorithm from the beginning to obtain that value as well.
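Collecting the three bounds from the proof in one display (with $C$ abbreviating the conditioning on $I_{0,t_0}, I_{t_1,t_2}$; this is just a restatement of the steps above):

$$H(A_{t_1,t_2} \mid C) \;\le\; \underbrace{H(\ell \mid C)}_{\le\, b} \;+\; \underbrace{H(IT \mid \ell, C)}_{\le\, 2b\,\mathbf{E}[\ell]} \;+\; \underbrace{H(A_{t_1,t_2} \mid IT, C)}_{=\,0} \;\le\; b\,(2\,\mathbf{E}[|IT|] + 1).$$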

Note that Proposition 2.1 did not depend at all on the input distribution. Hence, we are free to pick the distribution as we see fit. We choose all operations to come in pairs, Sum$(j_i)$ followed by Update$(j_i, \Delta)$, where $\Delta \in G$ is uniformly random. Let $\delta = \log|G|$. Next, we pick the indices from a permutation, $j_i = \pi(i)$, for some $\pi \in S_{[n]}$ yet to be chosen.

We define the interleaving number as follows. Sort the indices $\pi(t_0), \dots, \pi(t_2)$ accessed during $[t_0, t_2]$ into increasing order to get a list $\pi(j_1) < \pi(j_2) < \dots < \pi(j_L)$, and define $IL(t_0, t_1, t_2)$ to be the number of positions $i$ with $j_{i-1} < t_1 \le j_i$.

Proposition 2.2. $H(A_{t_1,t_2} \mid I_{0,t_0}, I_{t_1,t_2}) = \delta \cdot IL(t_0, t_1, t_2)$.

Proof. If we let $S = \{ j_i : t_1 \le j_i \}$, then we can write this entropy as

$$H(A_{t_1,t_2} \mid I_{0,t_0}, I_{t_1,t_2}) = \sum_{i :\, j_i \in S} H(A_{j_i} \mid A_{t_1, j_i - 1}, I_{0,t_0}, I_{t_1,t_2})$$

using the chain rule. We can then analyze the individual terms by cases. If $j_{i-1} < t_1 \le j_i$, then $A_{j_i}$ is $A_{j_{i-1}}$ plus a uniformly random $\Delta$; whatever the value of $A_{j_{i-1}}$, this answer is uniformly random over $G$. Finally, if $t_1 \le j_{i-1}$ (and $t_1 \le j_i$), then $A_{j_i}$ is $A_{j_{i-1}}$ plus a known value, so there is no entropy. Hence, the entropy is exactly that of the answers counted by $IL$, and since we chose the $\Delta$'s to be uniform, we get exactly $\delta$ bits of entropy per interleave.

Theorem 2.3. The total cost of any algorithm is $\Omega\!\left(\frac{\delta}{b}\, n \log n\right)$ on some input sequence.

Proof. Consider a perfect binary tree $T$ over the $n$ time steps. For a node $v \in T$, let $t_0 < t_1 < t_2$ be the times at which its left subtree begins, its right subtree begins, and its subtree ends, and define $IT(v)$ to be the set of reads in $[t_1, t_2]$ that were last written in $[t_0, t_1]$. Each write of an address $i$ followed by a read of $i$ is counted in exactly one $IT(v)$: the $v$ that is the lowest common ancestor of the times of that write and read. No other node has the read on one side and the write on the other. We can define $IL(v)$ similarly: the number of interleaves between the indices accessed in the two subtrees. Each interleaved pair of accesses is counted in exactly one $IL(v)$: again, the lowest common ancestor of the two times. Hence, the total cost of the algorithm is at least $\sum_{v \in T} |IT(v)|$, which by Propositions 2.1 and 2.2 is at least $\frac{\delta}{2b} \sum_{v \in T} IL(v)$, up to lower-order terms.

It remains only to show that this final sum can be made large by our choice of $\pi$. If we choose $\pi$ uniformly at random, then the probability of an interleave at any given adjacent pair $(i-1, i)$ is $\frac{1}{4} - o(1)$. (The $o(1)$ term is roughly $\frac{1}{2k}$ for an interval of size $k$; it does not affect the total sum by a significant amount.) So we expect about $\frac{k}{4}$ interleaves for each interval of size $k$, i.e., about $\frac{n}{4}$ per level of the tree, for a total of $\Omega(n \log n)$ interleaves.
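As a sanity check on the counting in the proof of Theorem 2.3, here is a small sketch (my own illustration, not from the paper) that computes $\sum_{v} IL(v)$ for a permutation $\pi$ by recursing over the perfect binary tree of time steps; for a uniformly random $\pi$ the total comes out near $(n/4)\log_2 n$:

    import random

    def total_interleaves(pi):
        """Sum of IL(v) over all nodes v of the perfect binary tree on the time
        steps 0..n-1, where pi[t] is the index accessed at time t (n a power of two)."""
        n = len(pi)

        def recurse(lo, hi):
            # Node covering time steps [lo, hi): count its interleaves plus those below it.
            if hi - lo <= 1:
                return 0
            mid = (lo + hi) // 2
            left = set(pi[lo:mid])             # indices touched by the left subtree
            order = sorted(pi[lo:hi])
            # An interleave: adjacent indices in sorted order where the smaller one was
            # accessed in the left half of the times and the larger one in the right half.
            il = sum(1 for a, b in zip(order, order[1:]) if a in left and b not in left)
            return il + recurse(lo, mid) + recurse(mid, hi)

        return recurse(0, n)

    n = 1024
    pi = list(range(n))
    random.shuffle(pi)
    print(total_interleaves(pi), n // 4 * (n.bit_length() - 1))  # empirical count vs. the n/4-per-level heuristic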

Notes

In fact, choosing a random $\pi$ is not necessary; a fixed permutation works: the bit-reversal permutation, defined by letting $\pi(i)$ be the reversal of the bits of $i$. This means, for example, that the lowest-order bit of $i$ determines whether $\pi(i)$ falls in the left or the right half of the index range, which means the two halves interleave perfectly, and this continues recursively. (For $n = 8$, the bit-reversal permutation is $0, 4, 2, 6, 1, 5, 3, 7$: the first four time steps touch the even indices and the last four touch the odd indices.)

The proof in the paper is somewhat different from the one above. It uses the fact that entropy is a lower bound on how well data can be compressed, so it upper bounds the entropy by showing that the answers can be encoded and decoded given the information transfer. That proof is a bit shorter, but the essence is the same idea: the answers are a deterministic function of the information transfer.

Our proof showed a uniform bound on the costs of Update and Sum. The paper shows a more general trade-off between these two costs. Specifically, if we call them $t_u$ and $t_s$, respectively, then $t_s \log\frac{t_u}{t_s} = \Omega\!\left(\frac{\log n}{\log(b/\delta)}\right)$, and vice versa. The proof works by constructing a $B$-ary tree, with $B = 2\max\{t_u/t_s,\, t_s/t_u\}$, instead of a binary tree, and then considering the information transfer between the first $B-1$ children and the last child. The two parts then have roughly the same cost, and the proof goes through much as before.

3 Lower Bounds for Dynamic Connectivity

Definition

The dynamic connectivity problem asks us to maintain a graph $G = ([n], E)$ under three operations. We start with $E = \emptyset$. The first two operations, Insert$(u, v)$ and Delete$(u, v)$ with $u, v \in [n]$, add or remove an edge of $E$. The last operation, Connected$(u, v)$, queries whether $u$ and $v$ are currently connected in the graph.

History

The lower bound was open for a long time, as above. A classic upper bound of $O(\log n)$ for trees was given by Sleator and Tarjan in STOC 1981. Several other graph problems can be reduced to this one: deciding whether the graph is connected, whether it is planar, maintaining a minimum spanning forest, etc.

Lower Bound

The main idea of the lower bound is a reduction from partial sums. Recall that our bound holds for any group $G$. We choose $G = S_{\sqrt{n}}$, the symmetric group on $\sqrt{n}$ letters. Note that, for any $i$, the composition $\sigma_1 \circ \cdots \circ \sigma_i$ is a permutation as well.

How can we model this with a graph? Suppose we have a graph on a $\sqrt{n} \times \sqrt{n}$ grid, where we restrict all edges to go between adjacent columns, and where we further restrict the edges between any two adjacent columns to form a permutation. [[Picture]] Then the question of where element $j$ is taken by the composition of the permutations between the first $i+1$ columns is the same as asking which node in the $(i+1)$-st column is connected to node $j$ in the first column.

We can implement Update$(i, \sigma)$ by performing $\sqrt{n}$ edge Deletes followed by $\sqrt{n}$ Inserts. We have $\delta = \log|S_{\sqrt{n}}| = \Theta(\sqrt{n}\log n)$, while $b = \log n$, so our lower bound for partial sums implies that these operations must take $\Omega(\sqrt{n}\log n)$ time. Since we performed $\sqrt{n}$ Inserts and Deletes per operation, this is $\Omega(\log n)$ per graph operation. Unfortunately, there is no obvious way to implement Sum using Connected queries; it would seem to require about $(\sqrt{n})^2 = n$ of them.

The Trick, part 1: Suppose we only had to implement Verify-Sum$(i, s)$ rather than Sum. Now the reduction is fine, because we can verify with $\sqrt{n}$ Connected queries. We just need to show that Verify-Sum has the same lower bound. Unfortunately, the original proof method falls apart there.

The Trick, part 2: Suppose we were working with nondeterministic algorithms. Now Sum can be reduced to Verify-Sum with only a $\Theta(\log n)$ additive term, since we can just guess the sum. (And clearly, we can reduce in the other direction as well.) We just need to show that our original bound for Sum holds for nondeterministic algorithms.

What exactly is a nondeterministic algorithm? The algorithm performs any number of reads and nondeterministic branches. Then it decides whether to accept or reject. If it accepts, it may then perform writes (and more reads). The cost is defined to be only that of the accepting thread.
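Here is a small sketch of the reduction (my own illustration; all names are hypothetical). Columns are 0-indexed, $k$ stands for $\sqrt{n}$, and a naive breadth-first search stands in for the dynamic connectivity structure, since any correct structure suffices to show how Update and Verify-Sum translate into Insert, Delete, and Connected calls:

    from collections import defaultdict, deque

    class GridReduction:
        """Partial sums over G = S_k encoded as a k x k grid graph: the edges between
        column c and column c+1 always form the permutation sigma_{c+1}."""

        def __init__(self, k):
            self.k = k
            self.perm = [list(range(k)) for _ in range(k - 1)]  # identity between columns
            self.adj = defaultdict(set)
            for c in range(k - 1):                              # insert the initial edges
                for r in range(k):
                    self._insert((c, r), (c + 1, r))

        # -- dynamic connectivity stand-in: Insert / Delete / Connected --------
        def _insert(self, u, v):
            self.adj[u].add(v); self.adj[v].add(u)

        def _delete(self, u, v):
            self.adj[u].discard(v); self.adj[v].discard(u)

        def _connected(self, u, v):            # naive BFS; a real structure would be fast
            seen, queue = {u}, deque([u])
            while queue:
                x = queue.popleft()
                if x == v:
                    return True
                for y in self.adj[x]:
                    if y not in seen:
                        seen.add(y); queue.append(y)
            return False

        # -- partial sums operations, expressed via the graph ------------------
        def update(self, i, sigma):
            """Update(i, sigma): replace the permutation between columns i-1 and i.
            Costs k Deletes followed by k Inserts."""
            for r in range(self.k):
                self._delete((i - 1, r), (i, self.perm[i - 1][r]))
            self.perm[i - 1] = list(sigma)
            for r in range(self.k):
                self._insert((i - 1, r), (i, sigma[r]))

        def verify_sum(self, i, s):
            """Verify-Sum(i, s): check that the composition (sigma_1 applied first,
            then sigma_2, ...) maps j to s[j] for every j, using k Connected queries."""
            return all(self._connected((0, j), (i, s[j])) for j in range(self.k))

    g = GridReduction(4)                       # a 4 x 4 grid, i.e., n = 16
    g.update(1, [1, 0, 3, 2])                  # sigma_1
    g.update(2, [2, 3, 0, 1])                  # sigma_2
    print(g.verify_sum(2, [3, 2, 1, 0]))       # sigma_2 after sigma_1 = [3,2,1,0] -> True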

Most of the proof goes through unchanged, since entropy does not depend on how the algorithm computes. But Proposition 2.1 assumed that the answers in $[t_1, t_2]$ were a deterministic function of the fixed inputs and the information transfer, and it is no longer obvious that this holds in the presence of nondeterminism. The accepting thread is fine, since we define $IT$ for nondeterministic algorithms in terms of what the accepting thread does. But there is one problem with rejecting threads: they may read a cell that was last written in $[t_0, t_1]$ but was not read by the accepting thread (hence, its value is not in $IT$). Our existing simulation would supply an earlier written value, which might cause the thread to accept instead of reject.

In order to fix the proof, all we need is the ability to detect that this case is about to occur. Since it is a rejecting thread, there is no harm in simply rejecting at that point; and since a rejecting thread cannot write any values, quitting early has no other effect.

The Trick, part 3: Suppose we had a data structure that could record a function $f : S \to \{0, 1\}$, for some subset $S \subseteq U = [2^b]$, using only $|S| + O(b)$ bits of space. That this can be done is an easy application of perfect hashing. (Note that $\log\log U = \log b \le b$.) We choose $S = (W(t_0, t_1) \cup R(t_1, t_2)) \setminus IT(t_0, t_1, t_2)$, where $W(t_0, t_1)$ is the set of cells written in $[t_0, t_1]$ and $R(t_1, t_2)$ is the set of cells read in $[t_1, t_2]$, and have the function value indicate whether the cell was written to, i.e., is in $W(t_0, t_1)$, or read from, i.e., is in $R(t_1, t_2)$. (If it were both, then it would be in $IT$.) Storing this structure for each node of the tree lets the simulation detect when a rejecting thread is about to read a cell that was written in $[t_0, t_1]$ but is missing from $IT$, at which point it simply rejects.

The total size of the $R$'s plus $W$'s over each level of the tree is at most $\mathcal{T}$, the total number of probes, so these structures contribute at most about $\mathrm{height}(T) \cdot \mathcal{T}$ bits. However, the other term in our bound is already $2b \cdot \mathcal{T}$, and $\mathrm{height}(T) \approx \log n \le b$ under the standard assumption that an index fits in one word. So our bound is still $\Theta(b \cdot \mathcal{T}) \ge \Omega(\delta\, n \log n)$, as before.
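To make the perfect-hashing idea concrete, here is a rough FKS-style (two-level hashing) sketch of a structure storing one bit per element of $S \subseteq [2^b]$. It is my own illustration, not the paper's construction, and being ordinary Python it does not achieve the $|S| + O(b)$-bit space bound; the point is the interface: a query for $x \in S$ returns $f(x)$, while a query outside $S$ may return an arbitrary bit, which is all the proof needs.

    import random

    P = (1 << 61) - 1                              # a Mersenne prime larger than any key used here

    def _rand_hash(m):
        """Random universal hash h(x) = ((a*x + c) mod P) mod m."""
        a, c = random.randrange(1, P), random.randrange(P)
        return lambda x: ((a * x + c) % P) % m

    class TwoLevelBitmap:
        """FKS-style structure storing f: S -> {0,1}.  Queries for keys in S return f(x);
        queries outside S may return arbitrary bits."""

        def __init__(self, f):                     # f: dict mapping each key of S to 0 or 1
            keys = list(f)
            m = max(1, len(keys))
            # First level: hash into m buckets, retrying until the sum of squared bucket
            # sizes is O(|S|) (this happens with constant probability for a universal family).
            while True:
                self.h1 = _rand_hash(m)
                buckets = [[] for _ in range(m)]
                for x in keys:
                    buckets[self.h1(x)].append(x)
                if sum(len(bkt) ** 2 for bkt in buckets) <= 4 * len(keys) + 1:
                    break
            # Second level: for each bucket of size s, a collision-free hash into s^2 slots.
            self.h2 = [None] * m
            self.bits = [None] * m
            for i, bkt in enumerate(buckets):
                size = max(1, len(bkt) ** 2)
                while True:
                    h = _rand_hash(size)
                    if len({h(x) for x in bkt}) == len(bkt):   # injective on this bucket
                        break
                self.h2[i] = h
                table = [0] * size
                for x in bkt:
                    table[h(x)] = f[x]
                self.bits[i] = table

        def query(self, x):
            i = self.h1(x)
            return self.bits[i][self.h2[i](x)]

    # Example: S is a set of cell addresses; the bit says "written in [t0,t1]" (1)
    # versus "read in [t1,t2]" (0), as in the argument above.
    f = {12345: 1, 777: 0, 2 ** 40 + 3: 1}
    d = TwoLevelBitmap(f)
    print([d.query(x) for x in f])                 # -> [1, 0, 1]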