Hash tables


2 Dictionary

Definition: A dictionary is a data structure that stores a set S of elements, where each element has a unique key, and supports the following operations:
Search(S, k): return the element whose key is k.
Insert(S, x): add x to S.
Delete(S, x): remove x from S.
We assume that the keys come from the universe U = {0, ..., u-1}.

3 Direct addressing

S is stored in an array T[0..u-1]. The entry T[k] contains a pointer to the element with key k if such an element exists, and NULL otherwise.
Search(T, k): return T[k].
Insert(T, x): T[x.key] ← x.
Delete(T, x): T[x.key] ← NULL.
What is the problem with this structure?
(Figure: the universe U, the stored set S ⊆ U, and elements consisting of a key and satellite data.)
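A minimal sketch of direct addressing (class and attribute names are ours, not from the slides):

```python
# Direct-address table: one slot per possible key in U = {0, ..., u-1}.
# Search/insert/delete are all O(1), but the table uses Theta(u) space
# even when only a few keys are present -- the problem the slide asks about.

class DirectAddressTable:
    def __init__(self, u):
        self.T = [None] * u          # T[k] holds the element with key k, or None

    def insert(self, x):             # x is any object with a .key attribute
        self.T[x.key] = x

    def search(self, k):
        return self.T[k]

    def delete(self, x):
        self.T[x.key] = None


class Element:
    def __init__(self, key, data=None):
        self.key, self.data = key, data
```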

4 Hashing

In order to reduce the space complexity of direct addressing, map the keys to a smaller range {0, ..., m-1} using a hash function. This creates the problem of collisions: two or more keys may be mapped to the same value. There are several methods to handle collisions:
Chaining
Open addressing

5 Hash table with chaining

Let h : U → {0, ..., m-1} be a hash function (m < u). S is stored in a table T[0..m-1] of linked lists. The element x ∈ S is stored in the list T[h(x.key)].
Search(T, k): search the list T[h(k)].
Insert(T, x): insert x at the head of T[h(x.key)].
Delete(T, x): delete x from the list containing it.
(Figure: S = {6, 9, 19, 26, 30}, m = 5, h(x) = x mod 5.)
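A compact sketch of chaining, using Python lists for the chains and the slide's h(x) = x mod m (the class name is ours):

```python
# Hash table with chaining: T is an array of m lists, and key k is stored
# in the list T[h(k)].  Insertion is at the head of the list, as on the slide.

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m

    def insert(self, k):                    # insert at the head of the list
        self.T[self._h(k)].insert(0, k)

    def search(self, k):                    # scan the list T[h(k)]
        return k if k in self.T[self._h(k)] else None

    def delete(self, k):
        self.T[self._h(k)].remove(k)
```

With the slide's example S = {6, 9, 19, 26, 30} and m = 5, keys 9 and 19 collide in bucket 4, and 6 and 26 collide in bucket 1.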

6 Analysis

Assumption of simple uniform hashing: any element is equally likely to hash into any of the m slots, independently of where other elements have hashed.
The above assumption holds when the keys are chosen uniformly and independently at random (with repetitions), and the hash function satisfies |{k ∈ U : h(k) = i}| = u/m for every i ∈ {0, ..., m-1}.
We want to analyze the performance of hashing under the assumption of simple uniform hashing. This is the balls-into-bins problem: suppose we randomly place n balls into m bins, and let X be the number of balls in bin 1. The time complexity of a random search in a hash table is Θ(1 + X).

7 The expectation of X

Each ball has probability 1/m of landing in bin 1, so the random variable X has binomial distribution with parameters n and 1/m. Therefore, E[X] = n/m.

8 The distribution of X

Claim: Pr[X = r] ≈ e^(-α) α^r / r!, where α = n/m (i.e., X has approximately a Poisson distribution).
Example: If α = 1,
Pr[X = 0] ≈ 0.368    Pr[X = 4] ≈ 0.0153
Pr[X = 1] ≈ 0.368    Pr[X = 5] ≈ 0.0031
Pr[X = 2] ≈ 0.184    Pr[X = 6] ≈ 0.0005
Pr[X = 3] ≈ 0.061    Pr[X = 7] ≈ 0.00007

9 The distribution of X

Claim: Pr[X = r] ≈ e^(-α) α^r / r!, where α = n/m (i.e., X has approximately a Poisson distribution).
Proof.
Pr[X = r] = C(n, r) (1/m)^r (1 - 1/m)^(n-r)
          = [n(n-1)···(n-r+1) / r!] · (1/m^r) · (1 - 1/m)^(n-r).
If m and n are large, n(n-1)···(n-r+1) ≈ n^r and (1 - 1/m)^(n-r) ≈ e^(-n/m). Thus,
Pr[X = r] ≈ e^(-n/m) (n/m)^r / r!.
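A quick numerical check of this approximation (the helper names are ours):

```python
from math import comb, exp, factorial

# Numerical check of the claim: for n balls in m bins, the count in one bin
# is Binomial(n, 1/m), which is close to Poisson(alpha) with alpha = n/m.

def binomial_pmf(r, n, m):
    return comb(n, r) * (1 / m) ** r * (1 - 1 / m) ** (n - r)

def poisson_pmf(r, alpha):
    return exp(-alpha) * alpha ** r / factorial(r)
```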

10 The maximum number of balls in a bin

Suppose that n = m (so α = 1). Most bins contain 0-3 balls, and the probability that a specific bin contains at least 8 balls is tiny (about 10^-5). However, it is very likely that some bin will contain 8 balls.
Theorem: If m = Θ(n), then the size of the largest bin is Θ(log n / log log n) with probability at least 1 - 1/n^Θ(1).

11 Rehashing

We wish to maintain n = O(m) in order to have Θ(1) search time. This can be achieved by rehashing. Suppose we want n ≤ m. When the table has m elements and a new element is inserted, create a new table of size 2m, choose a new hash function h' for the larger table, and copy all elements into the new table. The cost of rehashing is Θ(n).
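A sketch of rehashing on top of a chained table (names and the initial size are ours):

```python
# Rehashing sketch: keep n <= m by doubling the table when it fills.
# Each rehash costs Theta(n), but amortized over the n inserts that
# preceded it, the cost per insert is O(1).

class RehashingTable:
    def __init__(self, m=4):
        self.m, self.n = m, 0
        self.T = [[] for _ in range(m)]

    def insert(self, k):
        if self.n == self.m:               # table is full: rehash into size 2m
            old = [k2 for bucket in self.T for k2 in bucket]
            self.m *= 2
            self.T = [[] for _ in range(self.m)]
            for k2 in old:                 # reinsert with the new modulus
                self.T[k2 % self.m].append(k2)
        self.T[k % self.m].append(k)
        self.n += 1
```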

13 Universal hash functions

Random keys + a fixed hash function (e.g. mod): the hashed keys are random numbers.
A fixed set of keys + a random hash function (selected from a universal collection): the hashed keys are semi-random numbers.

14 Universal hash functions

Definition: A collection H of hash functions is universal if for every pair of distinct keys x, y ∈ U,
Pr_{h∈H}[h(x) = h(y)] ≤ 1/m.
Example: Let p be a prime number larger than u, and define
f_{a,b}(x) = ((ax + b) mod p) mod m,
H_{p,m} = {f_{a,b} : a ∈ {1, 2, ..., p-1}, b ∈ {0, 1, ..., p-1}}.
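The family H_{p,m} can be sampled directly. The choice p = 2^31 - 1 (a Mersenne prime) is ours for illustration; any prime larger than every key works:

```python
import random

# Sample f_{a,b}(x) = ((a*x + b) mod p) mod m from the slide's family H_{p,m}.

P = 2**31 - 1                        # illustrative prime, larger than the keys used

def random_hash_function(m, p=P):
    a = random.randrange(1, p)       # a in {1, ..., p-1}
    b = random.randrange(0, p)       # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % m
```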

15 Universal hash functions

Theorem: Suppose that H is a universal collection of hash functions. If a hash table for S is built using a randomly chosen h ∈ H, then for every k ∈ U, the expected time of Search(S, k) is Θ(1 + n/m).
Proof. Let X = length of T[h(k)]. Then X = Σ_{y∈S} I_y, where I_y = 1 if h(y.key) = h(k) and I_y = 0 otherwise. By linearity of expectation,
E[X] = Σ_{y∈S} E[I_y] = Σ_{y∈S} Pr_{h∈H}[h(y.key) = h(k)] ≤ 1 + (n-1)/m,
since at most one element of S has key k, and by universality each of the other elements collides with k with probability at most 1/m.

16 Time complexity of chaining

Under the assumption of simple uniform hashing, the expected time of a search is Θ(1 + α).
If α = Θ(1), and under the assumption of simple uniform hashing, the worst-case time of a search is Θ(log n / log log n), with probability at least 1 - 1/n^Θ(1).
If the hash function is chosen at random from a universal collection, the expected time of a search is Θ(1 + α).
The worst-case time of insert is Θ(1) if there is no rehashing.

17 Interpreting keys as natural numbers

How can we convert floats or ASCII strings to natural numbers? An ASCII string can be interpreted as a number in base 256.
Example: For the string CLRS, the ASCII values of the characters are C = 67, L = 76, R = 82, S = 83. So CLRS is
(67 · 256^3) + (76 · 256^2) + (82 · 256^1) + (83 · 256^0) = 1,129,075,283.
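The conversion is a one-line loop (the function name is ours):

```python
# Interpret an ASCII string as a number in base 256,
# most significant character first.

def string_to_number(s):
    n = 0
    for ch in s:
        n = n * 256 + ord(ch)
    return n
```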

18 Horner's rule

Horner's rule:
a_d x^d + a_{d-1} x^{d-1} + ... + a_1 x + a_0 = ((...((a_d x + a_{d-1}) x + a_{d-2}) x + ...) x + a_1) x + a_0.
If d is large, the value y of the polynomial is too big to handle directly. Solution: evaluate the polynomial modulo p.
(1) y ← a_d
(2) for i = d-1 downto 0
(3)   y ← (a_i + x·y) mod p
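The loop above translates directly to code (the function name is ours; starting from y = 0 and folding a_d into the first iteration is equivalent to initializing y ← a_d):

```python
# Horner's rule modulo p: evaluate a_d x^d + ... + a_1 x + a_0 (mod p)
# without ever building a huge intermediate value.
# Coefficients are given most significant first.

def horner_mod(coeffs, x, p):
    y = 0
    for a in coeffs:            # y <- (a_i + x*y) mod p at each step
        y = (a + x * y) % p
    return y
```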

19 Rabin-Karp pattern matching algorithm

Suppose we are given strings P and T, and we want to find all occurrences of P in T. The Rabin-Karp algorithm is as follows: compute h(P); for every substring T' of T of length |P|, if h(T') = h(P), check whether T' = P.
The values h(T') for all T' can be computed in Θ(|T|) time using a rolling hash.
Example: Let T = BGUACC and P = GUAB, and consider the first two substrings of T of length |P|: T_1 = BGUA and T_2 = GUAC. Writing each letter for its ASCII value,
h(T_1) = (B·256^3 + G·256^2 + U·256 + A) mod p
h(T_2) = (G·256^3 + U·256^2 + A·256 + C) mod p = ((h(T_1) - B·256^3)·256 + C) mod p.
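A sketch of the algorithm with the rolling hash (the function name and the small prime p = 101 are ours):

```python
# Rabin-Karp with a rolling hash in base 256 modulo a prime p.
# On a hash match we verify the substring, so false hash collisions are harmless.

def rabin_karp(T, P, p=101):
    n, r = len(T), len(P)
    if r > n:
        return []
    hp = ht = 0
    for i in range(r):                     # hashes of P and of T[0:r]
        hp = (hp * 256 + ord(P[i])) % p
        ht = (ht * 256 + ord(T[i])) % p
    msb = pow(256, r - 1, p)               # weight of the leading character
    matches = []
    for i in range(n - r + 1):
        if ht == hp and T[i:i+r] == P:     # verify on hash match
            matches.append(i)
        if i + r < n:                      # roll: drop T[i], append T[i+r]
            ht = ((ht - ord(T[i]) * msb) * 256 + ord(T[i + r])) % p
    return matches
```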

20 Applications Data deduplication: Suppose that we have many files, and some files have duplicates. In order to save storage space, we want to store only one instance of each distinct file. Distributed storage: Suppose we have many files, and we want to store them on several servers.

21 Distributed storage

Let m denote the number of servers. The simple solution is to use a hash function h : U → {1, ..., m} and assign file x to server h(x). The problem with this solution is that if we add a server, we need to rehash, which moves most files between servers.

22 Distributed storage: Consistent hashing

Suppose that the servers have identifiers s_1, ..., s_m. Let h : U → [0, 1] be a hash function. Associate with each server i the point h(s_i) on the unit circle. Assign each file f to the server whose point is the first point encountered when traversing the unit circle anti-clockwise starting from h(f).
(Figure: three server points on the unit circle.)


24 Distributed storage: Consistent hashing

When a new server m+1 is added, let i be the server whose point is the first server point after h(s_{m+1}). We only need to reassign some of the files that were assigned to server i. The expected number of file reassignments is n/(m+1).
(Figure: a fourth server point added to the circle.)
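A sketch of consistent hashing. The use of SHA-1 for the point mapping, the traversal direction, and all names here are our illustrative choices; the key property is that adding a server only moves files onto the new server:

```python
import bisect
import hashlib

# Consistent hashing: servers and files map to points on the unit circle;
# a file goes to the first server point at or after the file's point,
# wrapping around the circle.

def point(name):
    """Map a string to a point in [0, 1)."""
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return (h % 10**8) / 10**8

class ConsistentHash:
    def __init__(self, servers):
        self.ring = sorted((point(s), s) for s in servers)

    def add_server(self, s):
        bisect.insort(self.ring, (point(s), s))

    def server_for(self, f):
        i = bisect.bisect(self.ring, (point(f), chr(0x10FFFF)))
        return self.ring[i % len(self.ring)][1]   # wrap around the circle
```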

25 Linear probing

In the following, we assume that the elements in the hash table are keys with no satellite information. To insert an element k, try inserting into T[h(k)]. If T[h(k)] is not empty, try T[(h(k) + 1) mod m], then T[(h(k) + 2) mod m], etc.
Insert(T, k)
(1) for i = 0 to m-1
(2)   j ← (h(k) + i) mod m
(3)   if T[j] = NULL or T[j] = DELETED
(4)     T[j] ← k
(5)     return
(6) error "hash table overflow"
(Example table with h(k) = k mod 10.)

26-31 Linear probing (continued)

(Figures: the slides repeat the insertion procedure and trace insert(T,14), insert(T,20), insert(T,34), insert(T,19), insert(T,55), insert(T,49) on a table with h(k) = k mod 10.)
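The insertion and search procedures can be sketched as follows (function names are ours; DELETED is the marker from delete method 1 below):

```python
# Linear probing insert and search, following the slides' pseudocode.

DELETED = object()                   # special marker for deleted slots

def lp_insert(T, k, h):
    m = len(T)
    for i in range(m):
        j = (h(k) + i) % m
        if T[j] is None or T[j] is DELETED:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")

def lp_search(T, k, h):
    m = len(T)
    for i in range(m):
        j = (h(k) + i) % m
        if T[j] == k:                # found k at slot j
            return j
        if T[j] is None:             # empty slot: k cannot be further along
            return None
    return None
```

Running the slides' example with h(k) = k mod 10: 34 collides with 14 at slot 4 and lands in slot 5, 55 then probes to slot 6, and 49 wraps past 19 and 20 into slot 1.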

32 Searching

Search(T, k)
(1) for i = 0 to m-1
(2)   j ← (h(k) + i) mod m
(3)   if T[j] = k
(4)     return j
(5)   if T[j] = NULL
(6)     return NULL
(7) return NULL
(Figure: search(T,55).)

33 Searching (continued)

(Figure: the same procedure traced on search(T,25).)

34 Deletion

Delete method 1: To delete an element k, store in the slot containing k the special value DELETED.
(Figure: the slot holding 34 is marked D after delete(34).)

35 Deletion

Delete method 2: Erase k from the table (replace it by NULL) and also erase the consecutive block of occupied slots following k. Then reinsert the latter elements into the table.
(Figure: delete(34).)

36-37 Deletion (continued)

(Figures: the slides trace delete method 2, showing the block of slots after the deleted key being erased and reinserted.)

38 Open addressing

Open addressing is a generalization of linear probing. Let h : U × {0, ..., m-1} → {0, ..., m-1} be a hash function such that (h(k,0), h(k,1), ..., h(k,m-1)) is a permutation of (0, ..., m-1) for every k ∈ U. The slots examined during search/insert are h(k,0), then h(k,1), h(k,2), etc.
In the example on the right, h(k,i) = (h_1(k) + i·h_2(k)) mod 13, where
h_1(k) = k mod 13
h_2(k) = 1 + (k mod 11).
(Figure: the probes h(14,0), h(14,1), h(14,2) during insert(T,14).)

39 Open addressing

Insertion is the same as in linear probing:
Insert(T, k)
(1) for i = 0 to m-1
(2)   j ← h(k, i)
(3)   if T[j] = NULL or T[j] = DELETED
(4)     T[j] ← k
(5)     return
(6) error "hash table overflow"
Deletion is done using delete method 1 defined above (using the special value DELETED).

40 Double hashing

In the double hashing method, h(k,i) = (h_1(k) + i·h_2(k)) mod m for some hash functions h_1 and h_2. The value h_2(k) must be relatively prime to m for the entire hash table to be searched. This can be ensured by either:
Taking m to be a power of 2, with the image of h_2 containing only odd numbers.
Taking m to be a prime number, with the image of h_2 contained in {1, ..., m-1}.
For example, h_1(k) = k mod m and h_2(k) = 1 + (k mod m'), where m' is prime and m' < m.
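The probe sequence for the slides' example (m = 13, m' = 11) can be sketched as follows; the function name is ours:

```python
# Double hashing probe sequence with h(k, i) = (h1(k) + i*h2(k)) mod m,
# using h1(k) = k mod m and h2(k) = 1 + (k mod m') with m' prime, m' < m.
# Since m = 13 is prime and h2(k) in {1, ..., 11}, h2(k) is coprime to m,
# so every probe sequence visits all m slots.

M, M_PRIME = 13, 11

def probe_sequence(k, m=M, mp=M_PRIME):
    h1 = k % m
    h2 = 1 + (k % mp)               # never 0, and coprime to the prime m
    return [(h1 + i * h2) % m for i in range(m)]
```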

41 Time complexity of open addressing

Assume uniform hashing: the probe sequence of each key is equally likely to be any of the m! permutations of (0, ..., m-1). Assuming uniform hashing and no deletions, the expected number of probes in a search is:
At most 1/(1-α) for an unsuccessful search.
At most (1/α) ln(1/(1-α)) for a successful search.

42-45 Comparison of hash table methods

(Figures: measured clock cycles as a function of log n for Cuckoo Hashing, Two-Way Chaining, Chained Hashing, and Linear Probing, on four operations: successful lookup, unsuccessful lookup, insert, and delete.)

46 Bloom Filter

47 Blocking malicious web pages

Suppose we want to implement blocking of malicious web pages in a web browser, and assume we have a list of 10,000,000 malicious pages. We can store all pages in a hash table, but this requires a large amount of memory. A Bloom filter is an implementation of a static dictionary that uses a small amount of memory. For a query key that is in the set, the Bloom filter always returns a positive answer. For a query key that is not in the set, the Bloom filter can return either a negative answer or a positive answer (a false positive).

48 Blocking malicious web pages

Suppose that the list of malicious pages is stored in a Bloom filter. For a query URL, if the answer is negative, we know that the URL is not a malicious page. If the answer is positive, we do not know the correct answer; in this case, we get the answer from a server. We want a small probability of a false positive.

49 Bloom filter: simple version

Suppose we want to store a set S. We use an array A of size m, initialized to 0. We then choose a hash function h and set A[h(x)] ← 1 for every x ∈ S. For a query y: if A[h(y)] = 0, we know that y ∉ S; if A[h(y)] = 1, y may or may not be in S.

50-53 Bloom filter: simple version (continued)

(Figures: the slides repeat the construction and illustrate filling and querying the array A.)

54 Probability of false positive

Assume simple uniform hashing: any element is equally likely to hash into any of the m slots, independently of where other elements have hashed. For a fixed i, the probability that A[i] = 0 is p = (1 - 1/m)^n. For y ∉ S, the probability that A[h(y)] = 1 is 1 - p.

55 Reducing the false positive probability

We choose k hash functions h_1, h_2, ..., h_k, and set A[h_j(x)] ← 1 for j = 1, 2, ..., k, for every x ∈ S. For a query y: if A[h_j(y)] = 0 for some j, we know that y ∉ S; if A[h_j(y)] = 1 for all j, y may or may not be in S.

56-59 Reducing the false positive probability (continued)

(Figures: the slides repeat the construction with k hash functions and illustrate filling and querying the array A.)

60 Probability of false positive

For a fixed i, the probability that A[i] = 0 is p = (1 - 1/m)^(kn). For y ∉ S, the probability that A[h_j(y)] = 1 for all j is (1 - p)^k.
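A sketch of the k-hash-function Bloom filter. The hash functions here are drawn from the f(x) = ((ax + b) mod p) mod m family of the universal-hashing slides; this choice, the prime p, and all names are ours:

```python
import random

# Bloom filter with k hash functions: insert sets k bits, query answers
# "possibly in S" only if all k bits are set.

P = 2**31 - 1                        # illustrative prime, larger than the keys used

class BloomFilter:
    def __init__(self, m, k, seed=0):
        rng = random.Random(seed)
        self.m = m
        self.A = [0] * m
        self.hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

    def _positions(self, x):
        return [((a * x + b) % P) % self.m for a, b in self.hashes]

    def insert(self, x):
        for j in self._positions(x):
            self.A[j] = 1

    def query(self, x):
        # True may be a false positive; False is always correct.
        return all(self.A[j] == 1 for j in self._positions(x))
```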

61 Optimizing the probability of false positive

For the previous example (n = 2, m = 10):
For k = 1, p = 0.81,   (1-p)^1 = 0.19.
For k = 2, p ≈ 0.656, (1-p)^2 ≈ 0.118.
For k = 3, p ≈ 0.531, (1-p)^3 ≈ 0.103.
For k = 4, p ≈ 0.430, (1-p)^4 ≈ 0.105.
For k = 5, p ≈ 0.349, (1-p)^5 ≈ 0.117.

62 Optimizing the probability of false positive

Let f(k) be the probability of a false positive.
p = (1 - 1/m)^(kn) ≈ e^(-kn/m)
f(k) = (1 - p)^k ≈ (1 - e^(-kn/m))^k = e^(k ln(1 - e^(-kn/m)))
Minimizing f(k) is equivalent to minimizing g(k) = k ln(1 - e^(-kn/m)).
dg/dk = ln(1 - e^(-kn/m)) + (kn/m) · e^(-kn/m) / (1 - e^(-kn/m)).
The minimum of g(k) is attained when dg/dk = 0, which gives k = (m/n) ln 2.
The minimum of f(k) is then (1/2)^((m/n) ln 2).
Example: For m = 8n, k = 8 ln 2 ≈ 5.55; f(5) ≈ 0.0217 and f(6) ≈ 0.0216.


More information

Hashing. Data organization in main memory or disk

Hashing. Data organization in main memory or disk Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison

More information

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Algorithms lecture notes 1. Hashing, and Universal Hash functions Algorithms lecture notes 1 Hashing, and Universal Hash functions Algorithms lecture notes 2 Can we maintain a dictionary with O(1) per operation? Not in the deterministic sense. But in expectation, yes.

More information

Lecture 4 Thursday Sep 11, 2014

Lecture 4 Thursday Sep 11, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)

More information

Round 5: Hashing. Tommi Junttila. Aalto University School of Science Department of Computer Science

Round 5: Hashing. Tommi Junttila. Aalto University School of Science Department of Computer Science Round 5: Hashing Tommi Junttila Aalto University School of Science Department of Computer Science CS-A1140 Data Structures and Algorithms Autumn 017 Tommi Junttila (Aalto University) Round 5 CS-A1140 /

More information

As mentioned, we will relax the conditions of our dictionary data structure. The relaxations we

As mentioned, we will relax the conditions of our dictionary data structure. The relaxations we CSE 203A: Advanced Algorithms Prof. Daniel Kane Lecture : Dictionary Data Structures and Load Balancing Lecture Date: 10/27 P Chitimireddi Recap This lecture continues the discussion of dictionary data

More information

Hashing Data Structures. Ananda Gunawardena

Hashing Data Structures. Ananda Gunawardena Hashing 15-121 Data Structures Ananda Gunawardena Hashing Why do we need hashing? Many applications deal with lots of data Search engines and web pages There are myriad look ups. The look ups are time

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments

More information

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to Lecture Overview Dictionaries and Python Motivation Hash functions Chaining Simple uniform hashing Good hash functions Readings CLRS Chapter,, 3 Dictionary Problem Abstract Data Type (ADT) maintains a

More information

Lecture 24: Bloom Filters. Wednesday, June 2, 2010

Lecture 24: Bloom Filters. Wednesday, June 2, 2010 Lecture 24: Bloom Filters Wednesday, June 2, 2010 1 Topics for the Final SQL Conceptual Design (BCNF) Transactions Indexes Query execution and optimization Cardinality Estimation Parallel Databases 2 Lecture

More information

Hashing. Why Hashing? Applications of Hashing

Hashing. Why Hashing? Applications of Hashing 12 Hashing Why Hashing? Hashing A Search algorithm is fast enough if its time performance is O(log 2 n) For 1 5 elements, it requires approx 17 operations But, such speed may not be applicable in real-world

More information

Randomized Load Balancing:The Power of 2 Choices

Randomized Load Balancing:The Power of 2 Choices Randomized Load Balancing: The Power of 2 Choices June 3, 2010 Balls and Bins Problem We have m balls that are thrown into n bins, the location of each ball chosen independently and uniformly at random

More information

A General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY

A General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY A General-Purpose Counting Filter: Making Every Bit Count Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY Approximate Membership Query (AMQ) insert(x) ismember(x)

More information

Independence, Variance, Bayes Theorem

Independence, Variance, Bayes Theorem Independence, Variance, Bayes Theorem Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ May 16, 2016 Resolving collisions with chaining Hash

More information

1 Hashing. 1.1 Perfect Hashing

1 Hashing. 1.1 Perfect Hashing 1 Hashing Hashing is covered by undergraduate courses like Algo I. However, there is much more to say on this topic. Here, we focus on two selected topics: perfect hashing and cockoo hashing. In general,

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn MULTIPOP top[s] = 6 top[s] = 2 3 2 8 5 6 5 S MULTIPOP(S, x). while not STACK-EMPTY(S) and k 0 2. do POP(S) 3. k k MULTIPOP(S, 4) Analysis

More information

CS 580: Algorithm Design and Analysis

CS 580: Algorithm Design and Analysis CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Announcements: Homework 6 deadline extended to April 24 th at 11:59 PM Course Evaluation Survey: Live until 4/29/2018

More information

Lecture 4: Two-point Sampling, Coupon Collector s problem

Lecture 4: Two-point Sampling, Coupon Collector s problem Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms

More information

Bloom Filters, general theory and variants

Bloom Filters, general theory and variants Bloom Filters: general theory and variants G. Caravagna caravagn@cli.di.unipi.it Information Retrieval Wherever a list or set is used, and space is a consideration, a Bloom Filter should be considered.

More information

Amortized analysis. Amortized analysis

Amortized analysis. Amortized analysis In amortized analysis the goal is to bound the worst case time of a sequence of operations on a data-structure. If n operations take T (n) time (worst case), the amortized cost of an operation is T (n)/n.

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

CSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13

CSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13 CSCB63 Winter 2019 Week 11 Bloom Filters Anna Bretscher March 30, 2019 1 / 13 Today Bloom Filters Definition Expected Complexity Applications 2 / 13 Bloom Filters (Specification) A bloom filter is a probabilistic

More information

Cuckoo Hashing and Cuckoo Filters

Cuckoo Hashing and Cuckoo Filters Cuckoo Hashing and Cuckoo Filters Noah Fleming May 7, 208 Preliminaries A dictionary is an abstract data type that stores a collection of elements, located by their key. It supports operations: insert,

More information

2. This exam consists of 15 questions. The rst nine questions are multiple choice Q10 requires two

2. This exam consists of 15 questions. The rst nine questions are multiple choice Q10 requires two CS{74 Combinatorics & Discrete Probability, Fall 96 Final Examination 2:30{3:30pm, 7 December Read these instructions carefully. This is a closed book exam. Calculators are permitted. 2. This exam consists

More information

CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 13th, 2017 Bloom Filter Approximate membership problem Highly space-efficient randomized data structure

More information

Application: Bucket Sort

Application: Bucket Sort 5.2.2. Application: Bucket Sort Bucket sort breaks the log) lower bound for standard comparison-based sorting, under certain assumptions on the input We want to sort a set of =2 integers chosen I+U@R from

More information

s 1 if xπy and f(x) = f(y)

s 1 if xπy and f(x) = f(y) Algorithms Proessor John Rei Hash Function : A B ALG 4.2 Universal Hash Functions: CLR - Chapter 34 Auxillary Reading Selections: AHU-Data Section 4.7 BB Section 8.4.4 Handout: Carter & Wegman, "Universal

More information

Tight Bounds for Sliding Bloom Filters

Tight Bounds for Sliding Bloom Filters Tight Bounds for Sliding Bloom Filters Moni Naor Eylon Yogev November 7, 2013 Abstract A Bloom filter is a method for reducing the space (memory) required for representing a set by allowing a small error

More information

Lecture 12: Hash Tables [Sp 15]

Lecture 12: Hash Tables [Sp 15] Insanity is repeating the same mistaes and expecting different results. Narcotics Anonymous (1981) Calvin: There! I finished our secret code! Hobbes: Let s see. Calvin: I assigned each letter a totally

More information

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261

More information

Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash

Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash Martin Aumu ller, Martin Dietzfelbinger, Philipp Woelfel? Technische Universita t Ilmenau,? University of Calgary ESA Ljubljana,

More information

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts) Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We

More information

Outline. CS38 Introduction to Algorithms. Max-3-SAT approximation algorithm 6/3/2014. randomness in algorithms. Lecture 19 June 3, 2014

Outline. CS38 Introduction to Algorithms. Max-3-SAT approximation algorithm 6/3/2014. randomness in algorithms. Lecture 19 June 3, 2014 Outline CS38 Introduction to Algorithms Lecture 19 June 3, 2014 randomness in algorithms max-3-sat approximation algorithm universal hashing load balancing Course summary and review June 3, 2014 CS38 Lecture

More information

Advanced Algorithm Design: Hashing and Applications to Compact Data Representation

Advanced Algorithm Design: Hashing and Applications to Compact Data Representation Advanced Algorithm Design: Hashing and Applications to Compact Data Representation Lectured by Prof. Moses Chariar Transcribed by John McSpedon Feb th, 20 Cucoo Hashing Recall from last lecture the dictionary

More information

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then String Matching Thanks to Piotr Indyk String Matching Input: Two strings T[1 n] and P[1 m], containing symbols from alphabet Σ Goal: find all shifts 0 s n-m such that T[s+1 s+m]=p Example: Σ={,a,b,,z}

More information

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM Hashing Hashing Start with an array that holds the hash table. Use a hash function to take a key and map it to some index in the array. If the desired record is in the location given by the index, then

More information

CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) =

CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) = CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14 1 Probability First, recall a couple useful facts from last time about probability: Linearity of expectation: E(aX + by ) = ae(x)

More information

CSCI Honor seminar in algorithms Homework 2 Solution

CSCI Honor seminar in algorithms Homework 2 Solution CSCI 493.55 Honor seminar in algorithms Homework 2 Solution Saad Mneimneh Visiting Professor Hunter College of CUNY Problem 1: Rabin-Karp string matching Consider a binary string s of length n and another

More information

6.842 Randomness and Computation Lecture 5

6.842 Randomness and Computation Lecture 5 6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its

More information

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: Oct 26. Homework #2. ( Due: Nov 8 )

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: Oct 26. Homework #2. ( Due: Nov 8 ) CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: Oct 26 Homework #2 ( Due: Nov 8 ) Task 1. [ 80 Points ] Average Case Analysis of Median-of-3 Quicksort Consider the median-of-3 quicksort algorithm

More information

Basics of hashing: k-independence and applications

Basics of hashing: k-independence and applications Basics of hashing: k-independence and applications Rasmus Pagh Supported by: 1 Agenda Load balancing using hashing! - Analysis using bounded independence! Implementation of small independence! Case studies:!

More information

Problem Set 4 Solutions

Problem Set 4 Solutions Introduction to Algorithms October 8, 2001 Massachusetts Institute of Technology 6.046J/18.410J Singapore-MIT Alliance SMA5503 Professors Erik Demaine, Lee Wee Sun, and Charles E. Leiserson Handout 18

More information

Bloom Filters and Locality-Sensitive Hashing

Bloom Filters and Locality-Sensitive Hashing Randomized Algorithms, Summer 2016 Bloom Filters and Locality-Sensitive Hashing Instructor: Thomas Kesselheim and Kurt Mehlhorn 1 Notation Lecture 4 (6 pages) When e talk about the probability of an event,

More information

Advanced Data Structures

Advanced Data Structures Simon Gog gog@kit.edu - Simon Gog: KIT The Research University in the Helmholtz Association www.kit.edu Predecessor data structures We want to support the following operations on a set of integers from

More information

Randomized Sorting Algorithms Quick sort can be converted to a randomized algorithm by picking the pivot element randomly. In this case we can show th

Randomized Sorting Algorithms Quick sort can be converted to a randomized algorithm by picking the pivot element randomly. In this case we can show th CSE 3500 Algorithms and Complexity Fall 2016 Lecture 10: September 29, 2016 Quick sort: Average Run Time In the last lecture we started analyzing the expected run time of quick sort. Let X = k 1, k 2,...,

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

In a five-minute period, you get a certain number m of requests. Each needs to be served from one of your n servers.

In a five-minute period, you get a certain number m of requests. Each needs to be served from one of your n servers. Suppose you are a content delivery network. In a five-minute period, you get a certain number m of requests. Each needs to be served from one of your n servers. How to distribute requests to balance the

More information

Lecture Lecture 3 Tuesday Sep 09, 2014

Lecture Lecture 3 Tuesday Sep 09, 2014 CS 4: Advanced Algorithms Fall 04 Lecture Lecture 3 Tuesday Sep 09, 04 Prof. Jelani Nelson Scribe: Thibaut Horel Overview In the previous lecture we finished covering data structures for the predecessor

More information

Rainbow Tables ENEE 457/CMSC 498E

Rainbow Tables ENEE 457/CMSC 498E Rainbow Tables ENEE 457/CMSC 498E How are Passwords Stored? Option 1: Store all passwords in a table in the clear. Problem: If Server is compromised then all passwords are leaked. Option 2: Store only

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

Ad Placement Strategies

Ad Placement Strategies Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January

More information

Introduction to Algorithms

Introduction to Algorithms Introducton to Algorthms 6.046J/8.40J Lecture 7 Prof. Potr Indyk Data Structures Role of data structures: Encapsulate data Support certan operatons (e.g., INSERT, DELETE, SEARCH) Our focus: effcency of

More information

Bloom Filters, Adaptivity, and the Dictionary Problem

Bloom Filters, Adaptivity, and the Dictionary Problem 2018 IEEE 59th Annual Symposium on Foundations of Computer Science Bloom Filters, Adaptivity, and the Dictionary Problem Michael A. Bender, Martín Farach-Colton, Mayank Goswami, Rob Johnson, Samuel McCauley,

More information

Electrical & Computer Engineering University of Waterloo Canada February 6, 2007

Electrical & Computer Engineering University of Waterloo Canada February 6, 2007 Lecture 9: Lecture 9: Electrical & Computer Engineering University of Waterloo Canada February 6, 2007 Hash tables Lecture 9: Recall that a hash table consists of m slots into which we are placing items;

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries and String Matching CS 240 - Data Structures and Data Management Sajed Haque Veronika Irvine Taylor Smith Based on lecture notes by many previous cs240 instructors David R. Cheriton School

More information

Cryptographic Hash Functions

Cryptographic Hash Functions Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction

More information

Lecture Note 2. 1 Bonferroni Principle. 1.1 Idea. 1.2 Want. Material covered today is from Chapter 1 and chapter 4

Lecture Note 2. 1 Bonferroni Principle. 1.1 Idea. 1.2 Want. Material covered today is from Chapter 1 and chapter 4 Lecture Note 2 Material covere toay is from Chapter an chapter 4 Bonferroni Principle. Iea Get an iea the frequency of events when things are ranom billion = 0 9 Each person has a % chance to stay in a

More information