CSE533: Information Theory in Computer Science                September 8, 2010

Lecture 6

Lecturer: Anup Rao                                          Scribe: Lukas Svec

1 A lower bound for perfect hash functions

Today we shall use graph entropy to improve the obvious lower bound on good hash functions.

Definition 1 (k-perfect hash functions). Given a family of functions $H = \{h : [N] \to [b]\}$, we say that $H$ is a $k$-perfect hash family if for every $S \subseteq [N]$ with $|S| = k$, there exists $h \in H$ that is injective on $S$.

Such families are very useful in computer science. Imagine that $[N]$ is the set of all possible videos of a certain size (a huge set), and think of $k$ as a much smaller number: the total number of videos we want to store in a database. Now suppose $k-1$ videos have already been stored. One way to check whether a new video is in the database is to compare the bits of the new video with every stored video, which would be computationally expensive. If instead we have access to a $k$-perfect hash family, we can have a different server maintain a hash table for each hash function in the family, and each such server can check whether the new video matches a stored video. The $k$-perfect property guarantees that some server in the family will identify the video as being new. In addition, all of these computations can be done in parallel.

Let $t = |H|$ be the size of the $k$-perfect family. How small can $t$ be?

Claim 2. $t \geq \log N / \log b$.

Proof. This is an application of the pigeonhole principle. Suppose $t < \log N / \log b$. Then $b^t < N$, so two elements of $[N]$ must be hashed in exactly the same way by every hash function. Thus the family is not even 2-perfect in this case.

Claim 3. If $b \geq 100k$, then there is a $k$-perfect hash family of size $t = O(k \log N)$.

Proof idea. Pick $t$ random functions and let them be the family. For any fixed set $S$ of $k$ elements, each function is injective on $S$ with constant probability. By the Chernoff bound, the probability that no function in the family is injective on $S$ is at most $\exp(-\Omega(t))$. The total number of such sets $S$ is at most $N^k$, so by the union bound, the probability that some set does not get an injective function is at most $N^k \exp(-\Omega(t))$, which can be made less than 1 for $t$ as small as in the claim.

The following theorem improves the bound given by the pigeonhole principle:

Theorem 4. $t \geq \dfrac{b^{k-1}}{b(b-1) \cdots (b-k+2)} \cdot \dfrac{\log(N-k+2)}{\log(b-k+2)}$.
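The construction in Claim 3 is easy to check by brute force for small parameters. The following Python sketch (a toy illustration; the parameters N, k, b, t and all names in it are our own choices, not from the lecture) samples $t$ random functions and verifies the $k$-perfect property directly:

import itertools
import random

N, k, b = 20, 3, 300          # b >= 100k, as Claim 3 requires
t = 8 * k * N.bit_length()    # t = O(k log N); the constant 8 is arbitrary

def random_family():
    """Sample t uniformly random functions h : [N] -> [b]."""
    return [[random.randrange(b) for _ in range(N)] for _ in range(t)]

def is_k_perfect(H):
    """Check directly that every k-subset S has some h injective on S."""
    return all(
        any(len({h[x] for x in S}) == k for h in H)
        for S in itertools.combinations(range(N), k)
    )

print(is_k_perfect(random_family()))  # True except with tiny probability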

The theorem was first proved by Fredman and Komlós [1]. The graph-entropy interpretation of the proof that we discuss here is due to Körner [3]. Today we shall prove the theorem in the case that $b$ divides $N$, though essentially the same proof works in the general setting.

Let $G$ denote the following graph:

Vertices of $G$: $\{(D,x) : D \subseteq [N] \text{ with } |D| = k-2, \ x \in [N] \setminus D\}$.
Edges of $G$: $\{\{(D,x_1),(D,x_2)\} : x_1 \neq x_2\}$.

Figure 1: The graph $G$: one complete graph on $N-k+2$ vertices for each set $D$.

$G$ consists of several connected components, one for every value of $D$, and each of these connected components is a complete graph on $N-k+2$ vertices. Thus the graph entropy of each component is $\log(N-k+2)$, and since the graph entropy of the union is a convex combination of the graph entropies of the parts, $H(G) = \log(N-k+2)$.

Figure 2: The structure of $G_h$: (a) when $h$ is not injective on $D_1$, the component of $D_1$ has no edges; (b) when $h$ is injective on $D_2$, the component of $D_2$ is a $(b-k+2)$-partite graph.

Fix a $k$-perfect hash family $H$. For every $h \in H$, define a subgraph $G_h \subseteq G$ with the following subset of edges:

Edges of $G_h$: $\{\{(D,x_1),(D,x_2)\} : h \text{ is injective on } D \cup \{x_1\} \cup \{x_2\}\}$.

If $D$ is such that $h$ is not injective on $D$, then $G_h$ has no edges in the component corresponding to $D$. On the other hand, if $h$ is injective on $D$, we can partition the vertices $(D,x)$ of the component corresponding to $D$ into $b-k+3$ sets depending on the value of $h(x)$: for every $i \notin h(D)$, define the set $A_i = \{(D,x) : h(x) = i\}$, and place the remaining vertices (those with $h(x) \in h(D)$, which are isolated in $G_h$) into one additional set. The edges of the component corresponding to $D$ thus form the $(b-k+2)$-partite graph obtained by partitioning the vertices into the sets $A_i$.
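To make the definitions concrete, here is a toy Python sketch (our own illustration, with made-up parameters) that builds the edge sets of $G$ and of $G_h$ for one arbitrary $h$:

import itertools

N, k, b = 6, 3, 4

def edges_of_G():
    """All edges {(D,x1),(D,x2)}: D a (k-2)-subset, x1, x2 outside D."""
    E = []
    for D in itertools.combinations(range(N), k - 2):
        outside = [x for x in range(N) if x not in D]
        for x1, x2 in itertools.combinations(outside, 2):
            E.append((D, x1, x2))
    return E

def edges_of_G_h(h):
    """Keep an edge iff h is injective on D together with {x1, x2}."""
    return [(D, x1, x2) for (D, x1, x2) in edges_of_G()
            if len({h[y] for y in D + (x1, x2)}) == k]

h = [0, 1, 2, 3, 0, 1]   # one arbitrary function h : [6] -> [4]
print(len(edges_of_G()), len(edges_of_G_h(h)))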

Figure 3: The regions of $[N]$ are partitioned by $h^{-1}$; an element $x$ with $h(x) \in h(D)$ collides with $D$.

The graph entropy of each such component is thus bounded by $\log(b-k+2)$. Further, the fact that $H$ is $k$-perfect translates to

$G = \bigcup_{h \in H} G_h$.

This already gives us a lower bound: since $H(G) \leq \sum_{h \in H} H(G_h)$, it must be the case that $H(G) \leq t \log(b-k+2)$, in other words, $t \geq \log(N-k+2)/\log(b-k+2)$.

To get a better lower bound, we shall give a better upper bound on $H(G_h)$. The fact that we have not yet exploited is that many of the vertices of $G_h$ are isolated. A vertex $(D,x)$ is isolated in $G_h$ exactly when $h$ is not injective on $D \cup \{x\}$. By the lemma about the graph entropy of a union of connected components, we know that

$H(G_h) \leq \Pr[h \text{ is injective on } D \cup \{x\}] \cdot \log(b-k+2)$,

where the probability is taken over a uniformly random vertex $(D,x)$. This probability is exactly equal to $\Pr[h \text{ is injective on } S]$ for a uniformly random set $S$ of size $k-1$. Recall that we are assuming here that $b$ divides $N$.

Claim 5. $\Pr[h \text{ is injective on } S]$ is maximized when $|h^{-1}(1)| = |h^{-1}(2)| = \cdots = |h^{-1}(b)|$.

Sketch of proof. Suppose $|h^{-1}(1)| > |h^{-1}(2)| + 1$. Let $x$ be an element with $h(x) = 1$, and let us see what happens to $\Pr_S[h \text{ is injective on } S]$ when we change the value of $h(x)$ from 1 to 2. We can write

$\Pr[h \text{ inj. on } S] = \Pr[x \in S] \cdot \Pr[h \text{ inj. on } S \mid x \in S] + \Pr[x \notin S] \cdot \Pr[h \text{ inj. on } S \mid x \notin S]$.

Observe that the second term in the sum is unaffected by changing the value of $h(x)$, and the first term can only increase. Thus, $\Pr[h \text{ is injective on } S]$ is maximized when all preimages have the same size.
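Claim 5 can be verified exactly by enumeration on small parameters. In this Python sketch (toy values of our choosing), a balanced $h$ beats a skewed one:

import itertools

N, k, b = 12, 3, 4

def pr_injective(h):
    """Exact Pr[h injective on S] over all (k-1)-subsets S of [N]."""
    subsets = list(itertools.combinations(range(N), k - 1))
    hits = sum(1 for S in subsets if len({h[x] for x in S}) == k - 1)
    return hits / len(subsets)

balanced = [i % b for i in range(N)]        # all preimages of size N/b
skewed = [0] * 6 + [1, 1, 2, 2, 3, 3]       # preimage of 0 has size 6
print(pr_injective(balanced) >= pr_injective(skewed))  # True (Claim 5)
# Compare with the bound of Claim 6 below: the balanced probability
# exceeds b(b-1)/b^2 for small N and approaches it as N grows.
print(pr_injective(balanced), b * (b - 1) / b ** 2)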

Next we bound the probability in the case that all preimages have the same size.

Claim 6. $\Pr[h \text{ is injective on } S] \leq (1 + O(k^2/N)) \cdot \dfrac{b(b-1) \cdots (b-k+2)}{b^{k-1}}$.

Proof. By Claim 5, it suffices to prove the bound when all preimages of $h$ have size $N/b$. We can think of the elements of $[N]$ as partitioned into $b$ buckets of size $N/b$. We need to bound the probability that a random subset of $k-1$ elements gets split into $k-1$ different buckets. The total number of ways of picking $k-1$ elements is $\binom{N}{k-1}$. The number of ways of picking the elements from separate buckets is $\binom{b}{k-1} (N/b)^{k-1}$. Thus,

$\Pr[h \text{ is injective on } S] \leq \frac{\binom{b}{k-1} (N/b)^{k-1}}{\binom{N}{k-1}} = \frac{b(b-1) \cdots (b-k+2)}{b^{k-1}} \cdot \frac{N^{k-1}}{N(N-1) \cdots (N-k+2)}$,

and the second factor is $1 + O(k^2/N)$, which is negligible since $N$ is much larger than $k$. Thus, ignoring this negligible factor,

$H(G_h) \leq \frac{b(b-1) \cdots (b-k+2)}{b^{k-1}} \cdot \log(b-k+2)$,

proving the theorem.

2 A sorting algorithm

Suppose we are given $n$ integers $x_1, \ldots, x_n$. By the pigeonhole principle, we need to make at least $\log(n!) = \Theta(n \log n)$ comparisons to sort them and rearrange them into ascending order: if we use fewer comparisons, two different permutations of the integers can give the same outcome for every comparison, so we would not be able to distinguish the two. Indeed, we know of several ways to sort using $\Theta(n \log n)$ queries.

Recall that a partial order is a relation $\preceq$ on the elements such that $x \preceq x$ for all elements, $x \preceq y \preceq z$ implies $x \preceq z$, and $x \preceq y$ together with $y \preceq x$ implies $x = y$. A total order is a partial order in which any two elements are comparable. Every partial order can be completed to a total order.

Question: Given a partial order $S$ on $x_1, \ldots, x_n$, how many queries are necessary to complete the partial order to a total order?

Define $e(S)$ to be the number of total orders consistent with $S$. Just as before, it is immediate from the pigeonhole principle that at least $\log(e(S))$ queries are necessary. Kahn and Kim [2] gave a sorting algorithm matching this lower bound.

Theorem 7. There is a polynomial-time algorithm that, given $S$ and $x_1, \ldots, x_n$, makes $O(\log(e(S)))$ queries to determine the total order of $x_1, \ldots, x_n$.
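For intuition, $e(S)$ can be computed by brute force on tiny posets. The following Python sketch (our own toy example) counts the total orders consistent with a small partial order, which gives the $\log(e(S))$ lower bound on the number of queries:

import itertools
from math import log2

n = 4
S = {(0, 1), (0, 2)}   # the partial order x0 < x1 and x0 < x2

def e(S):
    """Count the total orders (permutations) consistent with S."""
    total = 0
    for perm in itertools.permutations(range(n)):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[a] < pos[b] for (a, b) in S):
            total += 1
    return total

# 8 consistent total orders, so any algorithm needs >= log2(8) = 3 queries.
print(e(S), log2(e(S)))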

Their algorithm uses graph entropy. Note that unlike the other examples we have seen, they use graph entropy to obtain an algorithm, rather than a lower bound. Here we sketch the high-level outline of their algorithm.

Let $G_S$ denote the comparability graph of $S$, namely the graph whose vertices are $x_1, \ldots, x_n$ and whose edges are the pairs of elements that are comparable in $S$. Thus, if $S$ is a total order, $G_S$ is the complete graph and $H(G_S) = \log n$. On the other hand, if $G_S$ is the empty graph, then $S$ is the empty order.

Kahn and Kim proceed by proving two theorems about the graph $G_S$. The first gives an estimate of the distance between our graph $G_S$ and the complete graph:

Theorem 8. There is a constant $C$ such that $n(\log n - H(G_S)) \leq \log(e(S)) \leq C n(\log n - H(G_S))$.

The second theorem states that it is always possible to increase the entropy of the graph $G_S$ by $c/n$ if one picks the right vertices $x, y$ of the graph. Let $S(x \preceq y)$ denote the partial order obtained by extending $S$ with the relation $x \preceq y$ (and all its consequences).

Theorem 9. There is a constant $c$ such that if $S$ is not a total order, then there exist $x, y$ with

$\min\{H(G_{S(x \preceq y)}), H(G_{S(y \preceq x)})\} \geq H(G_S) + \frac{c}{n}$.

$H(G)$ can be computed using a convex program, and there is a polynomial-time algorithm that can approximate its value up to a polynomially small error.

The algorithm for sorting operates as follows. In each step, the algorithm computes the left-hand side of the inequality in Theorem 9 for every pair $x, y$, using the current partial order $S$, and compares the pair that maximizes this value. Each such query increases the graph entropy by at least $c/n$, so $n(\log n - H(G_S))$ decreases by at least $c$. By Theorem 8, after at most $\log(e(S))/c$ steps the algorithm must reach a total order.

3 Locally Decodable Codes

Here we sketch a beautiful lower bound for 2-query locally decodable codes, discovered by Alex Samorodnitsky. Rather than give a formal definition, let us start with an example, the famous Hadamard code.

Suppose we want to store an $n$-bit string $x \in \mathbb{F}_2^n$, but we are afraid that some of the bits we store may get corrupted. In the Hadamard code, we would store the value of $\langle a, x \rangle = \sum_{i=1}^{n} a_i x_i \pmod 2$ for every $a \in \mathbb{F}_2^n$. This takes a lot of space: we are storing $2^n$ bits instead of just $n$ bits. The advantage of this scheme is that it can tolerate many errors and has a fast decoding algorithm.

3.1 Decoding $x_i$

To decode a particular bit $x_i$:

1. Pick $a \in \mathbb{F}_2^n$ uniformly at random.

2. Query $q_1 = \langle a, x \rangle$ and $q_2 = \langle a + e_i, x \rangle$, where $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ is the unit vector in the $i$-th direction.

3. Output $q_1 + q_2$.
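Here is a runnable Python sketch of the whole scheme for a toy $n$ (the variable names are ours): encode, corrupt 1% of the stored bits, and decode with two queries.

import itertools
import random

n = 10
x = [random.randrange(2) for _ in range(n)]

def ip(a, y):
    """Inner product over F_2."""
    return sum(u * v for u, v in zip(a, y)) % 2

# Encode: one bit <a, x> for each of the 2^n vectors a.
code = {a: ip(a, x) for a in itertools.product((0, 1), repeat=n)}

# Corrupt 1% of the stored bits at random.
for a in random.sample(list(code), len(code) // 100):
    code[a] ^= 1

def decode(i):
    """Two queries: <a,x> + <a+e_i,x> = <e_i,x> = x_i if neither is corrupted."""
    a = tuple(random.randrange(2) for _ in range(n))
    a_plus_ei = tuple(bit ^ (j == i) for j, bit in enumerate(a))
    return code[a] ^ code[a_plus_ei]

print(x[3], decode(3))   # agree with probability at least 98% (Claim 10)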

3. Output q 1 +q. Claim 10. If at most 1% of the stored information is corrupted, we recover x i with probability at least 98%. Proof Since both a and a+e i are uniformly random vectors, the probability that either query is corrupted is at most %. If neither is corrupted a,x + a+e i,x = a+a+e i,x = x i. This code is a -query code, since we only make two queries to recover x i. The code is linear, since the encoding is a linear function of the input bits. Theorem 11 (Samorodnitsky). Every linear -query locally decodable code must store Ω(n) bits. Let us sketch the proof. Suppose the code encodes x into the vector a 1,x, a,x,..., a m,x. It is no loss of generality to assume that the decoder succeeds in decoding x i only when it queries the code at a j,a k such that a j +a k = e i. Indeed, if x is uniformly random and the span of a j,a k do not contain e i, then the value of x i is independent of the queried values, so the queries are useless. If the code is to tolerate a 1% fraction of errors, then it must be the case that there are at least m/00 pairs of bits that can be queried to recover x i. If not, the value of x i can be destroyed by randomly corrupting at most m/100 of the stored bits. Recall that the hypercube is the graph whose vertices are F n, and two vertices are adjacent if and only if they disagree in exactly one coordinate. We have just argued that if any linear -query code uses only m bits of storage, then there must be a set S = {a 1,...,a m } of vertices in the hypercube, such that for every i, S contains at least m/00 edges in direction i. The lower bound will be proved by showing the following claim. Claim 1. For every subset S of vertices in the hypercube, # of edges in S S log( S ) The bound is tight when S is a subcube. Given the claim, the fact that S has m/00 edges in each direction implies that nm 00 mlogm m n/100. The proof of the claim goes by a slick entropy argument. Proof Let X 1,...,X n be a uniformly random vertex in S. Then observe that for every x,...,x n in the support of X 1,...,X n, X 1 x,...,x n is a uniform bit if there is an edge in direction 1 at x,...,x n, and is constant if not. Thus, { 1 if there is an edge in direction 1 at x,...,x n, and H(X 1 x,...,x n ) = 0 else. Therefore, the number of edges in direction i is exactly S H(X i X i )/, where X i denotes all coordinate except for the i th one. The factor of comes from the fact that every edge gets a probability of / S in the expectation. 6-6

References

[1] Michael L. Fredman and János Komlós. On the size of separating systems and families of perfect hash functions. SIAM Journal on Algebraic and Discrete Methods, 5(1):61–68, March 1984.

[2] Jeff Kahn and Jeong Han Kim. Entropy and sorting. Journal of Computer and System Sciences, 51(3):390–399, December 1995.

[3] János Körner. Fredman–Komlós bounds and information theory. SIAM Journal on Algebraic and Discrete Methods, 7, 1986.