Lecture 9: Hash Tables
Electrical & Computer Engineering, University of Waterloo, Canada
February 6, 2007


1 Hash tables

Recall that a hash table consists of:
- m slots into which we place items;
- a map h : K → [0, m − 1] from key values to slots.

We put n keys k_1, k_2, ..., k_n into locations h(k_1), h(k_2), ..., h(k_n). In the ideal situation we can then locate keys with O(1) operations.

2 Horner's Rule I

Horner's rule gives an efficient method for evaluating hash functions for sequences, e.g., strings. Consider a hash function of the form

    h(k) = k mod m

If we wish to hash a string such as "hello", we can interpret it as a long binary number: in ASCII, "hello" is

    01101000 01100101 01101100 01101100 01101111
       h        e        l        l        o

As a sequence of integers, "hello" is [104, 101, 108, 108, 111]. We want to compute

    (104·2^32 + 101·2^24 + 108·2^16 + 108·2^8 + 111) mod m

Horner's Rule II

Horner's rule is a general trick for evaluating a polynomial. We write

    ax^3 + bx^2 + cx + d = (ax^2 + bx + c)x + d = ((ax + b)x + c)x + d

so that instead of computing x^3, x^2, ... we have only multiplications:

    t_1 = ax + b
    t_2 = t_1·x + c
    t_3 = t_2·x + d

Trivia: some early CPUs included an instruction opcode for applying Horner's rule. May be making a comeback!
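The polynomial evaluation above can be sketched in a few lines; this is a minimal illustration (not from the lecture), with `horner` as an illustrative name:

```python
def horner(coeffs, x):
    """Evaluate a polynomial with the given coefficients (highest degree
    first) at x using Horner's rule: ((a*x + b)*x + c)*x + d."""
    result = 0
    for c in coeffs:
        result = result * x + c   # one multiply and one add per coefficient
    return result

# "hello" as the byte sequence [104, 101, 108, 108, 111] in radix 2^8:
value = horner([104, 101, 108, 108, 111], 2**8)
```

Evaluating with radix 2^8 recovers exactly the "long binary number" interpretation of the string.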

3 Horner's Rule III

To use Horner's rule for hashing: to compute (a·2^24 + b·2^16 + c·2^8 + d) mod m,

    t_1 = (a·2^8 + b) mod m
    t_2 = (t_1·2^8 + c) mod m
    t_3 = (t_2·2^8 + d) mod m

Note that multiplying by 2^k is simply a shift by k bits.

Why this works. In short, algebra. The integers Z form a ring under multiplication and addition. The hash function h(k) = k mod m can be interpreted as a homomorphism from the ring Z of integers to the ring Z/mZ of integers modulo m. Homomorphisms preserve structure in the following sense: if we write + for integer addition, and ⊕ for addition modulo m,

    h(a + b) = h(a) ⊕ h(b)

i.e., it doesn't matter whether we compute (a + b) mod m, or compute (a mod m) and (b mod m) and add with modular arithmetic: we get the same answer either way.

Horner's Rule IV

Similarly, if we write · for multiplication in Z, and ⊗ for multiplication in Z/mZ,

    h(a · b) = h(a) ⊗ h(b)

Horner's rule works precisely because h : Z → Z/mZ is a homomorphism:

    h(((a·2^8 + b)·2^8 + c)·2^8 + d)
        = ((h(a) ⊗ h(2^8) ⊕ h(b)) ⊗ h(2^8) ⊕ h(c)) ⊗ h(2^8) ⊕ h(d)

This can be optimized to use fewer applications of h, as above. In this form it is obvious why m = 2^8 is a horrible choice for a hash table size: 2^8 mod 2^8 = 0, so

    ((h(a) ⊗ 0 ⊕ h(b)) ⊗ 0 ⊕ h(c)) ⊗ 0 ⊕ h(d) = h(d)

i.e., the hash value depends only on the last byte. Similarly, if we used m = 2^16, we would have h(2^16) = 0, which would remove all but the last two bytes from the hash value computation. For background on algebra see, e.g., [1, 9, 7].
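The modular form of Horner's rule, and the m = 2^8 pitfall, can be demonstrated directly; this is an illustrative sketch (function name `horner_hash` is not from the lecture):

```python
def horner_hash(data, m, radix=2**8):
    """Hash a byte sequence with Horner's rule, reducing mod m at every
    step so intermediate values stay machine-word sized."""
    h = 0
    for byte in data:
        h = (h * radix + byte) % m   # t_{i+1} = (t_i * 2^8 + byte) mod m
    return h

# With m = 2^8, radix mod m == 0, so the hash collapses to the last byte:
assert horner_hash(b"hello", 2**8) == ord("o")
# With a prime m, every byte contributes to the hash value:
assert horner_hash(b"hello", 101) != horner_hash(b"hellp", 101)
```

Because h is a homomorphism, reducing at every step gives the same answer as hashing the whole "long binary number" at once.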

4 Collisions

A collision occurs when two keys map to the same location in the hash table, i.e., there are distinct x, y ∈ K such that h(x) = h(y). Strategies for handling collisions:

1. Pick a value of m large enough that collisions are rare and can be dealt with easily, e.g., by maintaining a short overflow list of items whose hash slot is already occupied.
2. Pick the hash function h to avoid collisions.
3. Put another data structure in each hash table slot (a list, a tree, or another hash table).
4. If a hash slot is full, try other slots in some fixed sequence (open addressing).

Collision Strategy 1: Pick m big I

Let's see how big m must be for the probability of collisions to be small. Two cases:
- n > m: there must be a collision, by the pigeonhole principle.¹
- n ≤ m: there may or may not be a collision.

The "birthday problem": what is the probability that amongst n people, at least two share the same birthday? This is a hashing problem: people are keys, days of the year are slots, and h maps people to their birthdays. If n ≥ 23, then the probability of two people having the same birthday is > 1/2. (Counterintuitive, but true.) The birthday problem analysis is straightforward to adapt to hashing.

5 Collision Strategy 1: Pick m big II

Suppose the hash function h and the distribution of keys cooperate to produce a uniform distribution of keys into hash table slots. Recall that with a uniform distribution, probability may be computed by simple counting:

    Pr(event E happens) = (# outcomes in which E happens) / (# outcomes)

First we count the number of collision-free arrangements. There are m choices of where to put the first key; m − 1 choices of where to put the second key; ...; m − n + 1 choices of where to put the n-th key. The number of arrangements with no collisions is the falling power²

    m(m − 1)···(m − n + 1) = m!/(m − n)!

Next we count the number of arrangements allowing collisions: there are m choices of where to put each of the n keys, so the number of arrangements allowing collisions is m^n.

Collision Strategy 1: Pick m big III

The probability of a collision-free arrangement is

    p = m! / ((m − n)! · m^n)

An asymptotic estimate of ln p, assuming m ≫ n:

    ln p ≈ −n²/(2m) + n/(2m) + O(n³/m²)    (1)

Here we have used Stirling's approximation and ln(m − n) = ln m − n/m − O(n²/m²). Two cases: if n² ≪ m then ln p → 0; if n² ≫ m then ln p → −∞.
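The exact probability p = m!/((m − n)! m^n) is easy to compute as a running product and to compare against the asymptotic estimate; a small sketch (names are illustrative, not from the lecture):

```python
import math

def p_no_collision(n, m):
    """Exact probability that n uniformly hashed keys land in m slots
    with no collision: m!/(m-n)! / m^n, computed as a product to avoid
    huge factorials."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m   # i-th key must avoid the i slots already used
    return p

# Birthday problem: with 23 people and 365 days, a shared birthday is
# more likely than not.
assert p_no_collision(23, 365) < 0.5
# The estimate e^{-n(n-1)/(2m)} from equation (1) is close when m >> n:
approx = math.exp(-23 * 22 / (2 * 365))
```

For n = 23, m = 365 the exact value is about 0.493 while the estimate gives about 0.500, matching the error terms in (1).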

6 Collision Strategy 1: Pick m big IV

Recall that if ln p = x + ε then

    p = e^{x+ε} = e^x · e^ε
      = e^x (1 + ε + ε²/2 + ···)    (Taylor series)
      = e^x (1 + O(ε))              if ε ∈ o(1)

The probability of a collision-free arrangement is

    p = e^{−n(n−1)/(2m)} + O(n³/m² · e^{−n(n−1)/(2m)}) ≈ e^{−n(n−1)/(2m)}

Collision Strategy 1: Pick m big V

Interpretation:
- If m ∈ ω(n²) there are no collisions (almost surely).
- If m ∈ o(n²) there is a collision (almost surely).

I.e., if we want a low probability of collisions, our hash table has to be quadratic (or more) in the number of items.

¹ If m + 1 pigeons are placed in m pigeonholes, there must be two pigeons in the same hole. (Replace "pigeons" with keys, and "pigeonholes" with hash slots.)
² The handy notation m^{\underline{n}} = m(m − 1)···(m − n + 1) is called a falling power [8].

7 Threshold functions

m = ½n² is an example of a threshold function: below the threshold, the asymptotic probability of the event is 0; above the threshold, the asymptotic probability of the event is 1.

[Figure: probability of no collision vs. hash table size m, rising from 0 to 1 around the threshold, with the m-axis marked n, n^{2−ε}, n², n^{2+ε}, n³.]

Collision Strategy 1: pick m big

Picking m big is not an effective strategy for handling collisions. For n = 1000 elements, inverting p ≈ e^{−n(n−1)/(2m)} gives m ≈ n(n−1)/(2 ln(1/p)), so to achieve the desired probability of no collisions:

    p       m
    0.5     ≈ 7.2 × 10^5
    0.9     ≈ 4.7 × 10^6
    0.99    ≈ 5.0 × 10^7

8 Collision Strategy 1: pick m big

The analysis of collisions in hashing demonstrates two pigeonhole principles. The simplest pigeonhole principle states that if you put m + 1 pigeons in m holes, there must be one hole with 2 pigeons. With respect to hash tables, the pigeonhole principle applies as follows: if a hash table with m slots is used to store m + 1 elements, there is a collision. The probability-of-collision analysis of the previous slide demonstrates a probabilistic pigeonhole principle: if you put ω(√n) pigeons in n holes, there is a hole with 2 pigeons almost surely (i.e., with probability converging to 1 as n → ∞).

Collision Strategy 2: pick h carefully I

Can we pick our hash function h to avoid collisions? For example, if we use hash functions of the form

    h(k) = ⌊m·{kφ}⌋

(where {x} denotes the fractional part of x), we could try random values of φ ∈ (0, 1) until we found one that was collision-free. We have a probability of success

    p ≈ e^{−n(n−1)/(2m)} (1 + o(1))

Geometric distribution:
- probability of success p, probability of failure 1 − p;
- each trial independent, identically distributed;
- probability that k tries are needed for success: (1 − p)^{k−1} p;
- mean: p^{−1}.
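The trial-and-error search for a collision-free φ can be sketched directly; this is an illustrative sketch (not the lecture's code), and the function names are assumptions:

```python
import random

def multiplicative_hash(phi, m):
    """Return h(k) = floor(m * frac(k * phi)) for a fixed phi in (0, 1)."""
    def h(k):
        return int(m * ((k * phi) % 1.0))
    return h

def find_collision_free(keys, m, max_tries=10000):
    """Try random phi until h is collision-free on the given keys.
    The number of tries needed is geometrically distributed with mean 1/p,
    where p ~ e^{-n(n-1)/(2m)}."""
    for tries in range(1, max_tries + 1):
        h = multiplicative_hash(random.random(), m)
        if len({h(k) for k in keys}) == len(keys):   # all slots distinct?
            return h, tries
    return None, max_tries
```

For m on the order of n² a collision-free φ turns up after a handful of tries; for m closer to n the expected number of tries blows up, which is why the next slide dismisses this strategy.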

9 Collision Strategy 2: pick h carefully II

The number of values of φ we expect to try before finding a collision-free hash table for n = 1000 (expected failures before success: (1 − p)/p) grows rapidly as m shrinks. Picking hash functions randomly in this manner is unlikely to be practical. There are better strategies: see [6, 2].

Collision Strategy 3: secondary data structures I

By far the most common technique for handling collisions is to put a secondary data structure in each hash table slot:
- a linked list ("chaining");
- a binary search tree (BST);
- another hash table.

Let α = n/m be the load factor: the average number of items per hash table slot. Assuming a uniform distribution of keys into slots:
- linked lists require 1 + α steps (on average) to find a key;
- suitable BSTs require 1 + max(c log α, 0) steps (on average);³
- using secondary hash tables of size quadratic in the number of elements in the slot, one can achieve O(1) lookups on average, and require only Θ(n) space.
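Chaining, the first of the secondary-structure options above, fits in a few lines; a minimal sketch (the class name and use of Python's built-in `hash` are illustrative assumptions):

```python
class ChainedHashTable:
    """Minimal hash table with chaining: each slot holds a list of
    (key, value) pairs. Expected search cost is about 1 + alpha."""

    def __init__(self, m=64):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def insert(self, key, value):
        slot = self.slots[hash(key) % self.m]
        for i, (k, _) in enumerate(slot):
            if k == key:                 # key present: overwrite
                slot[i] = (key, value)
                return
        slot.append((key, value))        # key absent: chain it

    def search(self, key):
        for k, v in self.slots[hash(key) % self.m]:
            if k == key:
                return v
        return None
```

Each slot's list is the "overflow" structure; its expected length is the load factor α.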

10 Collision Strategy 3: secondary data structures II

Analysis of secondary hash tables: let N_i be a random variable indicating the number of items landing in slot i. Then

    E[N_i] = α
    Var[N_i] = (n/m)(1 − 1/m)    (Bernoulli-trials variance)

The space required for the secondary hash tables is proportional to

    E[ Σ_{1≤i≤m} N_i² ] = Σ_{1≤i≤m} E[N_i²]
                        = m (Var[N_i] + α²)
                        = n(1 − 1/m) + n²/m
                        ≤ n + n²/m

Plus space Θ(m) for the primary hash table: Θ(m + n + n²/m) in total. Choosing m ∈ Θ(n) yields linear space.

³ The max(·) deals with the possibility that α < 1, in which case log α < 0.

Collision Strategy 4: open addressing I

Open addressing is a family of techniques for resolving collisions that do not require secondary data structures. This has the advantage of not requiring any dynamic memory allocation. In the simplest scenario we have a function s : H → H that is ideally a permutation of the hash values, for example the linear probing function

    s(x) = (x + 1) mod m

When we attempt to insert a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc., until an empty slot is found. To find a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc., until either k or an empty slot is found.
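The insert/search probe loops just described can be sketched with linear probing; an illustrative sketch (class name, sentinel scheme, and use of Python's `hash` are assumptions, not the lecture's code):

```python
class LinearProbingTable:
    """Open addressing with linear probing s(x) = (x + 1) mod m.
    No secondary structures: a collision probes the next slot."""
    EMPTY = object()   # sentinel distinct from any user key

    def __init__(self, m=16):
        self.m = m
        self.n = 0
        self.keys = [self.EMPTY] * m
        self.vals = [None] * m

    def insert(self, key, value):
        assert self.n < self.m - 1, "table full"   # keep one slot empty
        i = hash(key) % self.m
        while self.keys[i] is not self.EMPTY and self.keys[i] != key:
            i = (i + 1) % self.m                   # probe s(i)
        if self.keys[i] is self.EMPTY:
            self.n += 1
        self.keys[i], self.vals[i] = key, value

    def search(self, key):
        i = hash(key) % self.m
        while self.keys[i] is not self.EMPTY:      # empty slot ends search
            if self.keys[i] == key:
                return self.vals[i]
            i = (i + 1) % self.m
        return None
```

Keeping at least one slot empty guarantees the search loop terminates; deletion is deliberately omitted, since it needs tombstones under open addressing.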

11 Collision Strategy 4: open addressing II

However, linear probing performs badly as the hash table becomes fuller: we tend to get clumps (clusters), i.e., long sequences h(k), s(h(k)), s(s(h(k))), ... where all the slots are occupied (see, e.g., [10]). Performance can be good for not-very-full tables, e.g., α < 2/3. As α → 1, operations begin to take Θ(√n) time [5].

Quadratic probing offers less clumping: try slots h_0(k), h_1(k), ..., where

    h_i(k) = (h(k) + i²) mod m

and h(k) is an initial fixed hash function. If m is prime, the first ⌈m/2⌉ probe locations h_i(k) are distinct.

Double hashing uses two hash functions, h_1 and h_2:

    h_i(k) = (h_1(k) + i·h_2(k)) mod m

h_1(k) gives an initial slot to try; h_2(k) gives a stride (this reduces to linear probing when h_2(k) = 1).

Collision Strategy 4: open addressing III

Under favourable conditions, an open addressing scheme behaves like a geometric distribution when searching for an open slot: the probability of finding an empty slot is 1 − α, so the expected number of trials is 1/(1 − α). Note the catastrophe as α → 1.

12 Summary of collision strategies

    Strategy                E[access time]          Space
    Choose m big            O(1)                    Ω(n²)
    Linked list             1 + α                   O(n + m)
    Binary search tree      1 + max(c log α, 0)     O(n + m)
    Secondary hash tables   O(1)                    O(n + m)
    Open addressing         1/(1 − α)               O(m)

Open addressing can be quite effective if α ≪ 1, but fails catastrophically as α → 1.

Summary of collision strategies

If unexpectedly n ≫ m (e.g., we have far more data than we designed for), then α → ∞. For example, if m ∈ O(1) and n ∈ ω(1):
- a linked list has O(n) accesses;
- BSTs have O(log n) accesses, and so offer a gentler failure mode.

If the hash function is badly nonuniform:
- a linked list can be O(n);
- a BST will be O(log n);
- secondary hash tables may require O(n²) space.

To summarize: hash table + BST will give fast search times, and let you sleep at night.

To maintain O(1) access times as n → ∞, it is necessary to maintain m ∝ n. This can be done by choosing an allowable interval α ∈ [c_1, c_2]; when α > c_2, resize the hash table to make α = c_1. So long as c_2 > c_1, this strategy adds O(1) amortized time per insertion, as in dynamic arrays.
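The resize-on-threshold policy above can be sketched on top of chaining; an illustrative sketch (class name and the choice c_1 = 0.5, c_2 = 2.0 are assumptions):

```python
class ResizingTable:
    """Chained hash table that rehashes when the load factor alpha = n/m
    exceeds c2, restoring alpha = c1.  Since c2 > c1, growth is geometric
    and insertion is O(1) amortized, as in dynamic arrays."""

    def __init__(self, m=4, c1=0.5, c2=2.0):
        self.slots = [[] for _ in range(m)]
        self.n = 0
        self.c1, self.c2 = c1, c2

    def insert(self, key, value):
        # Assumes distinct keys, for brevity.
        self.slots[hash(key) % len(self.slots)].append((key, value))
        self.n += 1
        if self.n / len(self.slots) > self.c2:      # alpha > c2: resize
            self._rehash(int(self.n / self.c1))     # new m gives alpha = c1

    def _rehash(self, new_m):
        old = self.slots
        self.slots = [[] for _ in range(new_m)]
        for slot in old:                            # reinsert every item
            for k, v in slot:
                self.slots[hash(k) % new_m].append((k, v))

    def search(self, key):
        for k, v in self.slots[hash(key) % len(self.slots)]:
            if k == key:
                return v
        return None
```

Each rehash touches all n items, but because m grows by a constant factor, the cost averages out to O(1) per insertion.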

13 Applications of hashing I

Hashing is a ubiquitous concept, used not just for maintaining collections but also for:
- cryptography
- combinatorics
- data mining
- computational geometry
- databases
- router traffic analysis

An example: probabilistic counting.

Probabilistic Counting I

Problem: estimate the number of unique elements in a LARGE collection (e.g., a database, a data stream) without requiring much working space. Useful for query optimization in databases [11]: e.g., to evaluate the join A ⋈ B ⋈ C we can do either A ⋈ (B ⋈ C) or (A ⋈ B) ⋈ C; one of these might be very fast, one very slow. Rough estimates of the sizes of B ⋈ C vs. A ⋈ B let us decide which strategy will be faster.

14 Probabilistic Counting: a Shakespeare example

A less serious (but more readily understood) example. Shakespeare's complete works:
- N = 884,647 words (or so)
- n = 28,239 unique words (or so)
- w = average word length
- N_max ≥ n = prior estimate on n

Problem: estimate n, the number of unique words used. Approaches:

1. Sorting: put all 884,647 words in a list and sort, then count. (Time O(Nw log N), space O(Nw).)
2. Trie: scan through the words and build a trie, with counters at each node; requires O(nw) space (neglecting the size of the counters).
3. Super-LogLog probabilistic counting [3]: use 128 bytes of space to obtain an estimate of the number of unique words (error ≈ 9.4%).

Probabilistic Counting: the idea

Inputs: a multiset A of elements, possibly with many duplicates (e.g., Shakespeare's plays). Problem: estimate card(A), the number of unique elements in A (e.g., the number of distinct words Shakespeare used).

Simple starting idea: hash the objects into an m-element hash table. Instead of storing keys, just count the number of elements landing in each hash slot. Extreme cases to illustrate the principle:
- Elements of A are all different: we get an even distribution in the hash table.
- Elements of A are all the same: we get one hash table slot with all the elements!

The shape of the hash table distribution reflects the frequency of duplicates.

15 Probabilistic Counting: Linear Counting [11]

- Compute hash values in the range [0, N_max).
- Maintain a bitmap representing which elements of the hash table would be occupied, and estimate n from the sparsity of the hash table.
- Uses Θ(N_max) bits, e.g., on the order of card(A) bits.
- Room for improvement: the precise sparsity pattern doesn't matter, just the number of full vs. empty slots.

Probabilistic Counting [4]

- Compute hash values in the range [0, N_max).
- Instead of counting hash values directly, count the occurrences of hash values matching certain patterns:

    Pattern        Expected occurrences
    xxxxxxx1       2^{−1} card(A)
    xxxxxx10       2^{−2} card(A)
    xxxxx100       2^{−3} card(A)
    xxxx1000       2^{−4} card(A)
    ...            ...

- Use these counts to estimate card(A).
- To improve accuracy, use m different hash functions.
- Uses Θ(m log N_max) storage, and delivers accuracy of O(m^{−1/2}).
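The pattern-counting idea of [4] can be sketched in its simplest single-sketch form: the pattern "...1 followed by r zeros" occurs with probability 2^{−(r+1)}, so the largest r seen is close to log₂ card(A). This is an illustrative sketch (not the paper's algorithm; function names and the use of SHA-256 as the hash are assumptions):

```python
import hashlib

def trailing_zeros(x):
    """Number of trailing zero bits in x (32 for x == 0, our hash width)."""
    return (x & -x).bit_length() - 1 if x else 32

def estimate_cardinality(items):
    """Flajolet-Martin-style estimate: hash each item to 32 bits and
    record the largest run r of trailing zeros seen; since the pattern
    ...1 0^r occurs with probability 2^-(r+1), return 2^(max r + 1)."""
    max_r = 0
    for item in items:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:4], "big")
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** (max_r + 1)   # crude single-sketch estimate
```

A single sketch has high variance (the estimate is a power of two); the paper's remedy, as the slide says, is to average over m hash functions. Note that duplicates cannot change the estimate, which is the whole point.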

16 Probabilistic Counting: Super-LogLog

Super-LogLog [3] requires Θ(log log N_max) bits. With 1.28 kB of memory it can estimate card(A) to within an accuracy of 2.5% for N_max ≈ 130 million. Probabilistic counters count to N using log log N bits: we need log N states, which can be encoded in log log N bits.

References I

[1] Stanley Burris and H. P. Sankappanavar. A Course in Universal Algebra. Springer-Verlag, 1981.

[2] Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, and Friedhelm Meyer auf der Heide. Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4), 1994.

[3] Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, ESA, volume 2832 of Lecture Notes in Computer Science. Springer, 2003.

17 References II

[4] Philippe Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2), September 1985.

[5] Philippe Flajolet, Patricio V. Poblete, and Alfredo Viola. On the analysis of linear probing hashing. Algorithmica, 22(4), 1998.

[6] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3), 1984.

References III

[7] Joseph A. Gallian. Contemporary Abstract Algebra. D. C. Heath and Company, Toronto, 3rd edition.

[8] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, MA, second edition, 1994.

[9] Saunders Mac Lane and Garrett Birkhoff. Algebra. Chelsea Publishing Co., New York, third edition.

18 References IV

[10] Robert Sedgewick and Philippe Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, MA, 1996.

[11] Kyu-Young Whang, Brad T. Vander-Zanden, and Howard M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2), 1990.


More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn Dictionary problem Dictionary T holding n records: x records key[x] Other fields containing satellite data Operations on T: INSERT(T, x)

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 14 October 16, 2013 CPSC 467, Lecture 14 1/45 Message Digest / Cryptographic Hash Functions Hash Function Constructions Extending

More information

Cuckoo Hashing with a Stash: Alternative Analysis, Simple Hash Functions

Cuckoo Hashing with a Stash: Alternative Analysis, Simple Hash Functions 1 / 29 Cuckoo Hashing with a Stash: Alternative Analysis, Simple Hash Functions Martin Aumüller, Martin Dietzfelbinger Technische Universität Ilmenau 2 / 29 Cuckoo Hashing Maintain a dynamic dictionary

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based

More information

4.5 Applications of Congruences

4.5 Applications of Congruences 4.5 Applications of Congruences 287 66. Find all solutions of the congruence x 2 16 (mod 105). [Hint: Find the solutions of this congruence modulo 3, modulo 5, and modulo 7, and then use the Chinese remainder

More information

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Algorithms lecture notes 1. Hashing, and Universal Hash functions Algorithms lecture notes 1 Hashing, and Universal Hash functions Algorithms lecture notes 2 Can we maintain a dictionary with O(1) per operation? Not in the deterministic sense. But in expectation, yes.

More information

ALGEBRA AND ALGEBRAIC COMPUTING ELEMENTS OF. John D. Lipson. Addison-Wesley Publishing Company, Inc.

ALGEBRA AND ALGEBRAIC COMPUTING ELEMENTS OF. John D. Lipson. Addison-Wesley Publishing Company, Inc. ELEMENTS OF ALGEBRA AND ALGEBRAIC COMPUTING John D. Lipson University of Toronto PRO Addison-Wesley Publishing Company, Inc. Redwood City, California Menlo Park, California Reading, Massachusetts Amsterdam

More information

Module 1: Analyzing the Efficiency of Algorithms

Module 1: Analyzing the Efficiency of Algorithms Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu What is an Algorithm?

More information

Hashing. Why Hashing? Applications of Hashing

Hashing. Why Hashing? Applications of Hashing 12 Hashing Why Hashing? Hashing A Search algorithm is fast enough if its time performance is O(log 2 n) For 1 5 elements, it requires approx 17 operations But, such speed may not be applicable in real-world

More information

N/4 + N/2 + N = 2N 2.

N/4 + N/2 + N = 2N 2. CS61B Summer 2006 Instructor: Erin Korber Lecture 24, 7 Aug. 1 Amortized Analysis For some of the data structures we ve discussed (namely hash tables and splay trees), it was claimed that the average time

More information

Introduction to Hashtables

Introduction to Hashtables Introduction to HashTables Boise State University March 5th 2015 Hash Tables: What Problem Do They Solve What Problem Do They Solve? Why not use arrays for everything? 1 Arrays can be very wasteful: Example

More information

Problem Set 4 Solutions

Problem Set 4 Solutions Introduction to Algorithms October 8, 2001 Massachusetts Institute of Technology 6.046J/18.410J Singapore-MIT Alliance SMA5503 Professors Erik Demaine, Lee Wee Sun, and Charles E. Leiserson Handout 18

More information

Bloom Filters, general theory and variants

Bloom Filters, general theory and variants Bloom Filters: general theory and variants G. Caravagna caravagn@cli.di.unipi.it Information Retrieval Wherever a list or set is used, and space is a consideration, a Bloom Filter should be considered.

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments

More information

Chapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn

Chapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn Chapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn Randomization Randomized Algorithm: An algorithm that uses (or can use) random coin flips in order to make decisions We will see: randomization

More information

Asymptotic Analysis. Slides by Carl Kingsford. Jan. 27, AD Chapter 2

Asymptotic Analysis. Slides by Carl Kingsford. Jan. 27, AD Chapter 2 Asymptotic Analysis Slides by Carl Kingsford Jan. 27, 2014 AD Chapter 2 Independent Set Definition (Independent Set). Given a graph G = (V, E) an independent set is a set S V if no two nodes in S are joined

More information

? 11.5 Perfect hashing. Exercises

? 11.5 Perfect hashing. Exercises 11.5 Perfect hashing 77 Exercises 11.4-1 Consider inserting the keys 10; ; 31; 4; 15; 8; 17; 88; 59 into a hash table of length m 11 using open addressing with the auxiliary hash function h 0.k/ k. Illustrate

More information

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST Collision Kuan-Yu Chen ( 陳冠宇 ) 2018/12/17 @ TR-212, NTUST Review Hash table is a data structure in which keys are mapped to array positions by a hash function When two or more keys map to the same memory

More information

Lecture 4 Thursday Sep 11, 2014

Lecture 4 Thursday Sep 11, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among

More information

CS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing

CS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing CS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing Tim Roughgarden April 4, 204 otivation: Linear Probing and Universal Hashing This lecture discusses a very neat paper

More information

6.1 Occupancy Problem

6.1 Occupancy Problem 15-859(M): Randomized Algorithms Lecturer: Anupam Gupta Topic: Occupancy Problems and Hashing Date: Sep 9 Scribe: Runting Shi 6.1 Occupancy Problem Bins and Balls Throw n balls into n bins at random. 1.

More information

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261

More information

Electrical & Computer Engineering University of Waterloo Canada February 26, 2007

Electrical & Computer Engineering University of Waterloo Canada February 26, 2007 : Electrical & Computer Engineering University of Waterloo Canada February 26, 2007 We want to choose the best algorithm or data structure for the job. Need characterizations of resource use, e.g., time,

More information

Lecture 2 September 4, 2014

Lecture 2 September 4, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the

More information

Some notes on streaming algorithms continued

Some notes on streaming algorithms continued U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.

More information

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing Bloom filters and Hashing 1 Introduction The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of

More information

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to Lecture Overview Dictionaries and Python Motivation Hash functions Chaining Simple uniform hashing Good hash functions Readings CLRS Chapter,, 3 Dictionary Problem Abstract Data Type (ADT) maintains a

More information

CSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13

CSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13 CSCB63 Winter 2019 Week 11 Bloom Filters Anna Bretscher March 30, 2019 1 / 13 Today Bloom Filters Definition Expected Complexity Applications 2 / 13 Bloom Filters (Specification) A bloom filter is a probabilistic

More information

CSE525: Randomized Algorithms and Probabilistic Analysis April 2, Lecture 1

CSE525: Randomized Algorithms and Probabilistic Analysis April 2, Lecture 1 CSE525: Randomized Algorithms and Probabilistic Analysis April 2, 2013 Lecture 1 Lecturer: Anna Karlin Scribe: Sonya Alexandrova and Eric Lei 1 Introduction The main theme of this class is randomized algorithms.

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries and String Matching CS 240 - Data Structures and Data Management Sajed Haque Veronika Irvine Taylor Smith Based on lecture notes by many previous cs240 instructors David R. Cheriton School

More information

Grade 11/12 Math Circles Fall Nov. 5 Recurrences, Part 2

Grade 11/12 Math Circles Fall Nov. 5 Recurrences, Part 2 1 Faculty of Mathematics Waterloo, Ontario Centre for Education in Mathematics and Computing Grade 11/12 Math Circles Fall 2014 - Nov. 5 Recurrences, Part 2 Running time of algorithms In computer science,

More information

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make

More information

Divide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016

Divide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016 CS125 Lecture 4 Fall 2016 Divide and Conquer We have seen one general paradigm for finding algorithms: the greedy approach. We now consider another general paradigm, known as divide and conquer. We have

More information

Cryptographic Hash Functions

Cryptographic Hash Functions Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction

More information

Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks

Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks 1 Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks Michael Albert michael.albert@cs.otago.ac.nz 2 This week Arithmetic Knapsack cryptosystems Attacks on knapsacks Some

More information

Chapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn

Chapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn Chapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn Randomization Randomized Algorithm: An algorithm that uses (or can use) random coin flips in order to make decisions We will see: randomization

More information

Lecture 7: More Arithmetic and Fun With Primes

Lecture 7: More Arithmetic and Fun With Primes IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Advanced Course on Computational Complexity Lecture 7: More Arithmetic and Fun With Primes David Mix Barrington and Alexis Maciel July

More information

Lecture 6: Introducing Complexity

Lecture 6: Introducing Complexity COMP26120: Algorithms and Imperative Programming Lecture 6: Introducing Complexity Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2015 16 You need this book: Make sure you use the up-to-date

More information

Hashing. Data organization in main memory or disk

Hashing. Data organization in main memory or disk Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison

More information

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002) COMP251: Hashing Jérôme Waldispühl School of Computer Science McGill University Based on (Cormen et al., 2002) Table S with n records x: Problem DefiniNon X Key[x] InformaNon or data associated with x

More information

Lecture 8 HASHING!!!!!

Lecture 8 HASHING!!!!! Lecture 8 HASHING!!!!! Announcements HW3 due Friday! HW4 posted Friday! Q: Where can I see examples of proofs? Lecture Notes CLRS HW Solutions Office hours: lines are long L Solutions: We will be (more)

More information

CSE 190, Great ideas in algorithms: Pairwise independent hash functions

CSE 190, Great ideas in algorithms: Pairwise independent hash functions CSE 190, Great ideas in algorithms: Pairwise independent hash functions 1 Hash functions The goal of hash functions is to map elements from a large domain to a small one. Typically, to obtain the required

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract Finding Succinct Ordered Minimal Perfect Hash Functions Steven S. Seiden 3 Daniel S. Hirschberg 3 September 22, 1994 Abstract An ordered minimal perfect hash table is one in which no collisions occur among

More information

Randomized Algorithms, Spring 2014: Project 2

Randomized Algorithms, Spring 2014: Project 2 Randomized Algorithms, Spring 2014: Project 2 version 1 March 6, 2014 This project has both theoretical and practical aspects. The subproblems outlines a possible approach. If you follow the suggested

More information

On the average-case complexity of Shellsort

On the average-case complexity of Shellsort Received: 16 February 2015 Revised: 24 November 2016 Accepted: 1 February 2017 DOI: 10.1002/rsa.20737 RESEARCH ARTICLE On the average-case complexity of Shellsort Paul Vitányi 1,2 1 CWI, Science Park 123,

More information

Lecture 11: Hash Functions, Merkle-Damgaard, Random Oracle

Lecture 11: Hash Functions, Merkle-Damgaard, Random Oracle CS 7880 Graduate Cryptography October 20, 2015 Lecture 11: Hash Functions, Merkle-Damgaard, Random Oracle Lecturer: Daniel Wichs Scribe: Tanay Mehta 1 Topics Covered Review Collision-Resistant Hash Functions

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn MULTIPOP top[s] = 6 top[s] = 2 3 2 8 5 6 5 S MULTIPOP(S, x). while not STACK-EMPTY(S) and k 0 2. do POP(S) 3. k k MULTIPOP(S, 4) Analysis

More information

UNIFORM HASHING IN CONSTANT TIME AND OPTIMAL SPACE

UNIFORM HASHING IN CONSTANT TIME AND OPTIMAL SPACE UNIFORM HASHING IN CONSTANT TIME AND OPTIMAL SPACE ANNA PAGH AND RASMUS PAGH Abstract. Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e.,

More information

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

Lecture 1: Asymptotics, Recurrences, Elementary Sorting Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running

More information

data structures and algorithms lecture 2

data structures and algorithms lecture 2 data structures and algorithms 2018 09 06 lecture 2 recall: insertion sort Algorithm insertionsort(a, n): for j := 2 to n do key := A[j] i := j 1 while i 1 and A[i] > key do A[i + 1] := A[i] i := i 1 A[i

More information