Electrical & Computer Engineering, University of Waterloo, Canada
February 6, 2007
Lecture 9: Hash tables

Recall that a hash table consists of:
- m slots into which we are placing items;
- a map h : K → [0, m − 1] from key values to slots.

We put n keys k_1, k_2, ..., k_n into locations h(k_1), h(k_2), ..., h(k_n). In the ideal situation we can then locate keys with O(1) operations.
Horner's Rule I

Horner's rule gives an efficient method for evaluating hash functions for sequences, e.g., strings. Consider a hash function of the form

    h(k) = k mod m

If we wish to hash a string such as "hello", we can interpret it as a long binary number: in ASCII, the bytes of "hello" are the integers [104, 101, 108, 108, 111], so the string corresponds to the number

    104·2^32 + 101·2^24 + 108·2^16 + 108·2^8 + 111

and we want to compute this number mod m.

Horner's Rule II

Horner's rule is a general trick for evaluating a polynomial. We write

    ax^3 + bx^2 + cx + d = (ax^2 + bx + c)x + d = ((ax + b)x + c)x + d

so that instead of computing x^3, x^2, ... we have only multiplications:

    t_1 = ax + b
    t_2 = t_1·x + c
    t_3 = t_2·x + d

Trivia: some early CPUs included an instruction opcode for applying Horner's rule. It may be making a comeback!
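The rewriting above can be sketched in a few lines (the function name is ours, not from the lecture); note that evaluating the byte values of "hello" at x = 256 reproduces exactly the big base-256 number described above.

```python
def horner(coeffs, x):
    """Evaluate a polynomial with coefficients given highest-degree
    first at x using Horner's rule: one multiplication and one
    addition per coefficient, no explicit powers of x."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

# ax^3 + bx^2 + cx + d with (a, b, c, d) = (2, 3, 5, 7) at x = 10:
# ((2*10 + 3)*10 + 5)*10 + 7
print(horner([2, 3, 5, 7], 10))  # → 2357

# The ASCII bytes of "hello" evaluated at x = 256 give the integer
# value of the string interpreted as a base-256 number:
print(horner([104, 101, 108, 108, 111], 256) == int.from_bytes(b"hello", "big"))  # → True
```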
Horner's Rule III

To use Horner's rule for hashing: to compute (a·2^24 + b·2^16 + c·2^8 + d) mod m,

    t_1 = (a·2^8 + b) mod m
    t_2 = (t_1·2^8 + c) mod m
    t_3 = (t_2·2^8 + d) mod m

Note that multiplying by 2^k is simply a shift by k bits.

Why this works. In short, algebra. The integers Z form a ring under multiplication and addition. The hash function h(k) = k mod m can be interpreted as a homomorphism from the ring Z of integers to the ring Z/mZ of integers modulo m. Homomorphisms preserve structure in the following sense: if we write + for integer addition, and ⊕ for addition modulo m,

    h(a + b) = h(a) ⊕ h(b)

i.e., it doesn't matter whether we compute (a + b) mod m, or compute (a mod m) and (b mod m) and add with modular arithmetic: we get the same answer either way.

Horner's Rule IV

Similarly, if we write · for multiplication in Z, and ⊗ for multiplication in Z/mZ,

    h(a · b) = h(a) ⊗ h(b)

Horner's rule works precisely because h : Z → Z/mZ is a homomorphism:

    h(((a·2^8 + b)·2^8 + c)·2^8 + d)
      = ((h(a) ⊗ h(2^8) ⊕ h(b)) ⊗ h(2^8) ⊕ h(c)) ⊗ h(2^8) ⊕ h(d)

This can be optimized to use fewer applications of h, as above. In this form it is obvious why m = 2^8 is a horrible choice for a hash table size: 2^8 mod 2^8 = 0, so

    ((h(a) ⊗ 0 ⊕ h(b)) ⊗ 0 ⊕ h(c)) ⊗ 0 ⊕ h(d) = h(d)

i.e., the hash value depends only on the last byte. Similarly, if we used m = 2^16, we would have h(2^16) = 0, which would remove all but the last two bytes from the hash value computation. For background on algebra see, e.g., [1, 9, 7].
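A minimal sketch of the modular Horner hash (our own helper, base 256 = one byte per step), which also demonstrates the degenerate choice m = 2^8:

```python
def hash_string(s, m):
    """Hash an ASCII string by Horner's rule, reducing mod m at every
    step so intermediate values stay small."""
    h = 0
    for byte in s.encode("ascii"):
        h = (h * 256 + byte) % m
    return h

# Equivalent to interpreting the string as a base-256 number mod m:
print(hash_string("hello", 101) == int.from_bytes(b"hello", "big") % 101)  # → True

# Degenerate choice m = 2**8: since 256 mod 256 == 0, only the last
# byte survives -- every string ending in 'o' hashes to ord('o') = 111.
print(hash_string("hello", 256), hash_string("potato", 256))  # → 111 111
```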
Collisions

A collision occurs when two keys map to the same location in the hash table, i.e., there are distinct x, y ∈ K such that h(x) = h(y). Strategies for handling collisions:

1. Pick a value of m large enough so that collisions are rare, and can be easily dealt with, e.g., by maintaining a short overflow list of items whose hash slot is already occupied.
2. Pick the hash function h to avoid collisions.
3. Put another data structure in each hash table slot (a list, tree, or another hash table).
4. If a hash slot is full then try some other slots in some fixed sequence (open addressing).

Collision Strategy 1: Pick m big I

Let's see how big m must be for the probability of collisions to be small. Two cases:

- n > m: there must be a collision, by the pigeonhole principle.¹
- n ≤ m: there may or may not be a collision.

The "birthday problem": what is the probability that amongst n people, at least two share the same birthday? This is a hashing problem: people are keys, days of the year are slots, and h maps people to their birthdays. If n ≥ 23, then the probability of two people having the same birthday is > 1/2. (Counterintuitive, but true.) The birthday problem analysis is straightforward to adapt to hashing.
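The birthday numbers quoted above are easy to check numerically (a sketch; the function name is ours):

```python
def p_all_distinct(n, m):
    """Probability that n uniformly hashed keys occupy n distinct
    slots out of m -- the birthday problem with m 'days'."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p

# With 23 people and 365 days, the chance of at least one shared
# birthday already exceeds 1/2; with 22 people it does not.
print(1 - p_all_distinct(23, 365))  # ≈ 0.507
print(1 - p_all_distinct(22, 365))  # ≈ 0.476
```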
Collision Strategy 1: Pick m big II

Suppose the hash function h and the distribution of keys cooperate to produce a uniform distribution of keys into hash table slots. Recall that with a uniform distribution, probability may be computed by simple counting:

    Pr(event E happens) = (# outcomes in which E happens) / (# outcomes)

First we count the number of hash functions without collisions: there are m choices of where to put the first key; m − 1 choices of where to put the second key; ...; m − n + 1 choices of where to put the n-th key. The number of hash functions with no collisions is

    m↓n = m·(m − 1)···(m − n + 1) = m!/(m − n)!

(Note².)

Collision Strategy 1: Pick m big III

Next we count the number of hash functions allowing collisions: there are m choices of where to put the first key; m choices of where to put the second key; ...; m choices of where to put the n-th key. The number of hash functions allowing collisions is m^n. The probability of a collision-free arrangement is

    p = m! / ((m − n)!·m^n)

Asymptotic estimate of ln p, assuming m ≫ n:

    ln p = −n²/(2m) + n/(2m) + O(n³/m²)     (1)

Here we have used Stirling's approximation and ln(m − n) = ln m − n/m − O(n²/m²). Two cases: if n² ∈ o(m) then ln p → 0; if n² ∈ ω(m) then ln p → −∞.
Collision Strategy 1: Pick m big IV

Recall that if ln p = x + ε then

    p = e^(x+ε) = e^x·e^ε = e^x·(1 + ε + O(ε²))     (Taylor series)
      = e^x·(1 + O(ε))  if ε ∈ o(1)

so the probability of a collision-free arrangement is

    p = e^(−n(n−1)/(2m))·(1 + O(n³/m²)) ≈ e^(−n(n−1)/(2m))

Collision Strategy 1: Pick m big V

Interpretation:

- If m ∈ ω(n²) there are no collisions (almost surely).
- If m ∈ o(n²) there is a collision (almost surely).

i.e., if we want a low probability of collisions, our hash table has to be quadratic (or more) in the number of items.

¹ If m + 1 pigeons are placed in m pigeonholes, there must be two pigeons in the same hole. (Replace "pigeons" with "keys", and "pigeonholes" with "hash slots".)
² The handy notation m↓n is called a "falling power" [8].
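The exact product and the approximation e^(−n(n−1)/(2m)) can be compared directly (a sketch; function names are ours):

```python
import math

def p_exact(n, m):
    """Exact probability of a collision-free placement: m!/(m-n)!/m^n."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p

def p_approx(n, m):
    """The asymptotic approximation exp(-n(n-1)/(2m))."""
    return math.exp(-n * (n - 1) / (2 * m))

# For m >> n the two agree closely; as m grows, p approaches 1.
n = 100
for m in [5_000, 50_000, 500_000]:
    print(m, round(p_exact(n, m), 4), round(p_approx(n, m), 4))
```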
Threshold functions

m = (1/2)n² is an example of a threshold function: below the threshold, the asymptotic probability of the event is 0; above the threshold, the asymptotic probability of the event is 1.

[Figure: probability of no collision (0 to 1) vs. hash table size m, over m = n, n^(2−ɛ), n², n^(2+ɛ), n³; the probability rises sharply from 0 to 1 around m ≈ n².]

Collision Strategy 1: pick m big

Picking m big is not an effective strategy for handling collisions. For n = 1000 elements, a table of (p, m) pairs shows how big m must be to achieve each desired probability p of no collisions. [Table values not preserved in the transcription.]
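Under the approximation p ≈ e^(−n(n−1)/(2m)) from the previous slides, the required table size for a desired collision-free probability p can be computed directly (a sketch; `m_needed` is our own helper):

```python
import math

def m_needed(n, p):
    """Smallest table size m for which the approximate probability
    exp(-n(n-1)/(2m)) of a collision-free placement is at least p."""
    return math.ceil(n * (n - 1) / (2 * math.log(1 / p)))

# For n = 1000 keys, even a 50% chance of no collision needs a table
# of ~720,000 slots, and higher confidence pushes m into the millions.
n = 1000
for p in [0.5, 0.9, 0.99]:
    print(p, m_needed(n, p))
```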
Collision Strategy 1: pick m big (continued)

The analysis of collisions in hashing demonstrates two pigeonhole principles. The simplest pigeonhole principle states that if you put m + 1 pigeons in m holes, there must be one hole with 2 pigeons. With respect to hash tables, the pigeonhole principle applies as follows: if a hash table with m slots is used to store m + 1 elements, there is a collision.

The probability-of-collision analysis of the previous slide demonstrates a probabilistic pigeonhole principle: if you put ω(√n) pigeons in n holes, there is a hole with 2 pigeons almost surely (i.e., with probability converging to 1 as n → ∞).

Collision Strategy 2: pick h carefully I

Can we pick our hash function h to avoid collisions? For example, if we use hash functions of the form

    h(k) = ⌊m·{kφ}⌋

(where {x} denotes the fractional part of x), we could try random values of φ ∈ (0, 1) until we found one that was collision-free. We have a probability of success

    p ≈ e^(−n(n−1)/(2m))·(1 + o(1))

Geometric distribution:

- Probability of success p, probability of failure 1 − p.
- Each trial independent, identically distributed.
- Probability that k tries are needed for success: (1 − p)^(k−1)·p.
- Mean: 1/p.
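Combining the success probability p ≈ e^(−n(n−1)/(2m)) with the geometric mean 1/p gives the expected number of random hash functions we would have to try (a sketch; the helper name is ours):

```python
import math

def expected_tries(n, m):
    """Expected number of independent random hash functions tried
    until a collision-free one is found, assuming success probability
    p = exp(-n(n-1)/(2m)) per trial (geometric distribution, mean 1/p)."""
    p = math.exp(-n * (n - 1) / (2 * m))
    return 1 / p

# For n = 1000: unless m is on the order of n^2, the expected number
# of trials is astronomical.
n = 1000
for m in [10_000, 100_000, 1_000_000]:
    print(m, expected_tries(n, m))
```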
Collision Strategy 2: pick h carefully II

For n = 1000, the expected number of failures before we find a collision-free hash table is (1 − p)/p = 1/p − 1. [Table of (m, expected failures) values not preserved in the transcription.] Picking hash functions randomly in this manner is unlikely to be practical. There are better strategies: see [6, 2].

Collision Strategy 3: secondary data structures I

By far the most common technique for handling collisions is to put a secondary data structure in each hash table slot:

- a linked list ("chaining");
- a binary search tree (BST);
- another hash table.

Let α = n/m be the load factor: the average number of items per hash table slot. Assuming uniform distribution of keys into slots:

- Linked lists require 1 + α steps (on average) to find a key.
- Suitable BSTs require 1 + max(c log α, 0) steps (on average).³
- Using secondary hash tables of size quadratic in the number of elements in the slot, one can achieve O(1) lookups on average, and require only Θ(n) space.
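A minimal chaining table, as in the linked-list variant above (a sketch using Python lists as the chains; not production code):

```python
class ChainedHashTable:
    """Collision strategy 3 with chaining: each of the m slots holds a
    list of the keys that hashed there."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _slot(self, key):
        return hash(key) % self.m

    def insert(self, key):
        bucket = self.slots[self._slot(key)]
        if key not in bucket:      # keep keys unique within a chain
            bucket.append(key)

    def contains(self, key):       # cost ~ 1 + (chain length)
        return key in self.slots[self._slot(key)]

t = ChainedHashTable(8)
for k in [1, 9, 17, 4]:            # 1, 9, 17 all collide in slot 1 when m = 8
    t.insert(k)
print(t.contains(9), t.contains(5))  # → True False
```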
Collision Strategy 3: secondary data structures II

Analysis of secondary hash tables: let N_i be a random variable indicating the number of items landing in slot i. Then

    E[N_i] = α
    Var[N_i] = n·(1/m)(1 − 1/m)     (Binomial(n, 1/m) variance)

The space required for the secondary hash tables is proportional to

    E[ Σ_{1≤i≤m} N_i² ] = Σ_{1≤i≤m} E[N_i²]
                        = m·(Var[N_i] + α²)
                        = m·( n(1/m)(1 − 1/m) + n²/m² )
                        = n²/m + n − n/m

Plus space Θ(m) for the primary hash table: Θ(m + n²/m + n) in total. Choosing m = Θ(n) yields linear space.

³ The max(·) deals with the possibility that α < 1, in which case log α < 0.

Collision Strategy 4: open addressing I

Open addressing is a family of techniques for resolving collisions that do not require secondary data structures. This has the advantage of not requiring any dynamic memory allocation. In the simplest scenario we have a function s : H → H that is ideally a permutation of the hash values, for example the linear probing function

    s(x) = (x + 1) mod m

When we attempt to insert a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc. until an empty slot is found. To find a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc. until either k or an empty slot is found.
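Linear probing can be sketched as follows (h(k) = k mod m is a stand-in hash; the sketch assumes the table never completely fills, otherwise the probe loop would not terminate):

```python
def lp_insert(table, key):
    """Open addressing with linear probing s(x) = (x + 1) mod m:
    scan forward from h(key) until an empty slot (None) is found."""
    m = len(table)
    i = key % m                      # h(k) = k mod m, for this sketch
    while table[i] is not None:
        i = (i + 1) % m              # s(x) = (x + 1) mod m
    table[i] = key

def lp_find(table, key):
    """Probe the same sequence; an empty slot means 'not present'."""
    m = len(table)
    i = key % m
    while table[i] is not None:
        if table[i] == key:
            return i
        i = (i + 1) % m
    return None

table = [None] * 8
for k in [3, 11, 19]:                # all hash to slot 3; probing spills
    lp_insert(table, k)              # them into slots 3, 4, 5
print(table)
print(lp_find(table, 19), lp_find(table, 5))  # → 5 None
```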
Collision Strategy 4: open addressing II

However, linear probing performs badly as the hash table becomes fuller: we tend to get clumps/clusters, i.e., long sequences h(k), s(h(k)), s(s(h(k))), ... where all the slots are occupied (see e.g. [10]). Performance can be good for not-very-full tables, e.g. α < 2/3; as α → 1, operations begin to take Θ(√n) time [5].

Quadratic probing offers less clumping: try slots h_0(k), h_1(k), ..., where

    h_i(k) = (h(k) + i²) mod m

and h(k) is an initial fixed hash function. If m is prime, the first ⌈m/2⌉ slots of the sequence h_i(k) are guaranteed distinct.

Double hashing uses two hash functions, h_1 and h_2:

    h_i(k) = (h_1(k) + i·h_2(k)) mod m

h_1(k) gives an initial slot to try; h_2(k) gives a stride (this reduces to linear probing when h_2(k) = 1).

Collision Strategy 4: open addressing III

Under favourable conditions, an open addressing scheme behaves like a geometric distribution when searching for an open slot: the probability of finding an empty slot is 1 − α, so the expected number of trials is 1/(1 − α). Note the catastrophe as α → 1.
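The three probe sequences can be generated side by side (a sketch; the choices of h_1 and h_2 below are simple stand-ins, not from the lecture):

```python
def probe_sequence(key, m, kind):
    """First m probe indices for linear, quadratic, or double hashing."""
    h1 = key % m                     # initial slot (stand-in hash)
    h2 = 1 + (key % (m - 1))         # nonzero stride for double hashing
    seq = []
    for i in range(m):
        if kind == "linear":
            seq.append((h1 + i) % m)
        elif kind == "quadratic":
            seq.append((h1 + i * i) % m)
        else:                        # double hashing
            seq.append((h1 + i * h2) % m)
    return seq

m = 7                                # prime table size
# Linear and double hashing (stride coprime to m) reach every slot;
# quadratic probing may revisit slots after the first ~m/2 probes.
print(probe_sequence(10, m, "linear"))
print(probe_sequence(10, m, "double"))
print(probe_sequence(10, m, "quadratic"))
```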
Summary of collision strategies

    Strategy                E[access time]          Space
    -----------------------------------------------------------
    Choose m big            O(1)                    Ω(n²)
    Linked list             1 + α                   O(n + m)
    Binary search tree      1 + max(c log α, 0)     O(n + m)
    Secondary hash tables   O(1)                    O(n + m)
    Open addressing         1/(1 − α)               O(m)

Open addressing can be quite effective if α ≪ 1, but fails catastrophically as α → 1.

If unexpectedly n ≫ m (e.g. we have far more data than we designed for), then α → ∞. For example, if m ∈ O(1) and n ∈ ω(1):

- Linked lists have O(n) accesses;
- BSTs have O(log n) accesses, and so offer a gentler failure mode.

If the hash function is badly nonuniform:

- Linked lists can be O(n);
- BSTs will have O(log n);
- Secondary hash tables may require O(n²) space.

To summarize: hash table + BST will give fast search times, and let you sleep at night.

To maintain O(1) access times as n → ∞, it is necessary to maintain m = Θ(n). This can be done by choosing an allowable interval α ∈ [c_1, c_2]; when α > c_2, resize the hash table to make α = c_1. So long as c_2 > c_1, this strategy adds O(1) amortized time per insertion, as in dynamic arrays.
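The resizing policy can be sketched with a chaining table (our own class names; the constants c_1 = 0.25 and c_2 = 0.75 are illustrative choices, not from the lecture):

```python
class ResizingTable:
    """Keep the load factor α = n/m inside [c1, c2]: when an insert
    pushes α past c2, rebuild with a larger m so α returns to c1.
    Each rehash costs O(n) but is paid for by the Θ(n) inserts since
    the previous resize, giving O(1) amortized time per insertion."""

    def __init__(self, c1=0.25, c2=0.75, m=8):
        self.c1, self.c2, self.m, self.n = c1, c2, m, 0
        self.slots = [[] for _ in range(m)]

    def insert(self, key):
        self.slots[hash(key) % self.m].append(key)
        self.n += 1
        if self.n / self.m > self.c2:
            self._resize()

    def _resize(self):
        old = self.slots
        self.m = max(8, int(self.n / self.c1))   # new m so that α = c1
        self.slots = [[] for _ in range(self.m)]
        for bucket in old:                        # rehash everything
            for key in bucket:
                self.slots[hash(key) % self.m].append(key)

t = ResizingTable()
for k in range(100):
    t.insert(k)
print(t.m, t.n / t.m <= 0.75)   # load factor never exceeds c2
```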
Applications of hashing I

Hashing is a ubiquitous concept, used not just for maintaining collections but also for:

- cryptography
- combinatorics
- data mining
- computational geometry
- databases
- router traffic analysis

An example: probabilistic counting.

Probabilistic Counting I

Problem: estimate the number of unique elements in a LARGE collection (e.g., a database, a data stream) without requiring much working space.

This is useful for query optimization in databases [11]: e.g., to evaluate the join A ⋈ B ⋈ C we can do either A ⋈ (B ⋈ C) or (A ⋈ B) ⋈ C; one of these might be very fast, the other very slow. Rough estimates of the sizes of B ⋈ C vs. A ⋈ B let us decide which strategy will be faster.
Probabilistic Counting II

A less serious (but more readily understood) example: Shakespeare's complete works contain N = 884,647 words (or so), of which n = 28,239 are unique (or so). Let w be the average word length, and let N_max ≥ n be a prior estimate on n. Problem: estimate n, the number of unique words used. Approaches:

1. Sorting: put all 884,647 words in a list and sort, then count. (Time O(Nw log N), space O(Nw).)
2. Trie: scan through the words and build a trie, with counters at each node; requires O(nw) space (neglecting the size of counters).
3. Super-LogLog probabilistic counting [3]: use 128 bytes of space, and obtain an estimate of the number of unique words with error 9.4%.

Probabilistic Counting III

Inputs: a multiset A of elements, possibly with many duplicates (e.g., Shakespeare's plays). Problem: estimate card(A), the number of unique elements in A (e.g., the number of distinct words Shakespeare used).

A simple starting idea: hash the objects into an m-element hash table. Instead of storing keys, just count the number of elements landing in each hash slot. Extreme cases to illustrate the principle:

- Elements of A are all different: we will get an even distribution in the hash table.
- Elements of A are all the same: we will get one hash table slot with all the elements!

The shape of the hash table distribution reflects the frequency of duplicates.
Probabilistic Counting: Linear Counting

Linear Counting [11]: compute hash values in the range [0, N_max); maintain a bitmap representing which slots of the hash table would be occupied, and estimate n from the sparsity of the hash table. This uses Θ(N_max) bits, i.e., on the order of card(A) bits. Room for improvement: the precise sparsity pattern doesn't matter, just the number of full vs. empty slots.

Probabilistic Counting IV

Probabilistic Counting [4]: compute hash values in the range [0, N_max). Instead of counting hash values directly, count the occurrence of hash values matching certain bit patterns:

    Pattern      Expected occurrences
    xxxxxxx1     2⁻¹ card(A)
    xxxxxx10     2⁻² card(A)
    xxxxx100     2⁻³ card(A)
    xxxx1000     2⁻⁴ card(A)
    ...          ...

Use these counts to estimate card(A). To improve accuracy, use m different hash functions. This uses Θ(m log N_max) storage, and delivers accuracy of O(m^(−1/2)).
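The pattern-counting idea can be sketched as follows. The multiplicative hash and the correction constant 0.77351 (from the Flajolet-Martin analysis) are illustrative stand-ins, not the lecture's choices; the key property is that duplicates cannot change the sketch.

```python
def trailing_zeros(x):
    """rho(x): number of trailing zero bits, i.e., which pattern
    ...1, ...10, ...100 the value x matches."""
    if x == 0:
        return 32
    n = 0
    while x & 1 == 0:
        x >>= 1
        n += 1
    return n

def fm_estimate(items):
    """Flajolet-Martin-style cardinality estimate: hash each item,
    record which trailing-bit patterns occurred (one bitmap), and read
    off the first pattern never seen."""
    seen = 0                              # bit r set iff some hash had rho = r
    for x in items:
        h = (x * 2654435761) % (2 ** 32)  # fixed multiplicative hash (stand-in)
        seen |= 1 << trailing_zeros(h)
    r = 0                                 # first pattern that never occurred
    while seen & (1 << r):
        r += 1
    return int(2 ** r / 0.77351)          # bias-corrected estimate

# Duplicates do not change the bitmap, so only distinct items count:
print(fm_estimate(list(range(1000)) * 50) == fm_estimate(range(1000)))  # → True
```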
Probabilistic Counting: Super-LogLog

Super-LogLog [3] requires Θ(log log N_max) bits. With 1.28 kB of memory it can estimate card(A) to within an accuracy of 2.5% for N_max ≈ 130 million.

Probabilistic counters: count to N using log log N bits: we need log N states, which can be encoded in log log N bits.

References I

[1] Stanley Burris and H. P. Sankappanavar. A Course in Universal Algebra. Springer-Verlag.

[2] Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, and Friedhelm Meyer auf der Heide. Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4).

[3] Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, ESA, volume 2832 of Lecture Notes in Computer Science. Springer.
References II

[4] Philippe Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2).

[5] Philippe Flajolet, Patricio V. Poblete, and Alfredo Viola. On the analysis of linear probing hashing. Algorithmica, 22(4).

[6] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3).

References III

[7] Joseph A. Gallian. Contemporary Abstract Algebra. D. C. Heath and Company, Toronto, 3rd edition.

[8] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, MA, USA, second edition.

[9] Saunders MacLane and Garrett Birkhoff. Algebra. Chelsea Publishing Co., New York, third edition.
References IV

[10] Robert Sedgewick and Philippe Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, MA.

[11] Kyu-Young Whang, Brad T. Vander-Zanden, and Howard M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2).
More information1 Approximate Quantiles and Summaries
CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity
More informationCS483 Design and Analysis of Algorithms
CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments
More informationChapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn
Chapter 7 Randomization Algorithm Theory WS 2017/18 Fabian Kuhn Randomization Randomized Algorithm: An algorithm that uses (or can use) random coin flips in order to make decisions We will see: randomization
More informationAsymptotic Analysis. Slides by Carl Kingsford. Jan. 27, AD Chapter 2
Asymptotic Analysis Slides by Carl Kingsford Jan. 27, 2014 AD Chapter 2 Independent Set Definition (Independent Set). Given a graph G = (V, E) an independent set is a set S V if no two nodes in S are joined
More information? 11.5 Perfect hashing. Exercises
11.5 Perfect hashing 77 Exercises 11.4-1 Consider inserting the keys 10; ; 31; 4; 15; 8; 17; 88; 59 into a hash table of length m 11 using open addressing with the auxiliary hash function h 0.k/ k. Illustrate
More informationCollision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST
Collision Kuan-Yu Chen ( 陳冠宇 ) 2018/12/17 @ TR-212, NTUST Review Hash table is a data structure in which keys are mapped to array positions by a hash function When two or more keys map to the same memory
More informationLecture 4 Thursday Sep 11, 2014
CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)
More informationCPSC 467: Cryptography and Computer Security
CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among
More informationCS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing
CS369N: Beyond Worst-Case Analysis Lecture #6: Pseudorandom Data and Universal Hashing Tim Roughgarden April 4, 204 otivation: Linear Probing and Universal Hashing This lecture discusses a very neat paper
More information6.1 Occupancy Problem
15-859(M): Randomized Algorithms Lecturer: Anupam Gupta Topic: Occupancy Problems and Hashing Date: Sep 9 Scribe: Runting Shi 6.1 Occupancy Problem Bins and Balls Throw n balls into n bins at random. 1.
More informationINTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class
Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261
More informationElectrical & Computer Engineering University of Waterloo Canada February 26, 2007
: Electrical & Computer Engineering University of Waterloo Canada February 26, 2007 We want to choose the best algorithm or data structure for the job. Need characterizations of resource use, e.g., time,
More informationLecture 2 September 4, 2014
CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the
More informationSome notes on streaming algorithms continued
U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.
More informationLecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing
Bloom filters and Hashing 1 Introduction The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of
More informationAbstract Data Type (ADT) maintains a set of items, each with a key, subject to
Lecture Overview Dictionaries and Python Motivation Hash functions Chaining Simple uniform hashing Good hash functions Readings CLRS Chapter,, 3 Dictionary Problem Abstract Data Type (ADT) maintains a
More informationCSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13
CSCB63 Winter 2019 Week 11 Bloom Filters Anna Bretscher March 30, 2019 1 / 13 Today Bloom Filters Definition Expected Complexity Applications 2 / 13 Bloom Filters (Specification) A bloom filter is a probabilistic
More informationCSE525: Randomized Algorithms and Probabilistic Analysis April 2, Lecture 1
CSE525: Randomized Algorithms and Probabilistic Analysis April 2, 2013 Lecture 1 Lecturer: Anna Karlin Scribe: Sonya Alexandrova and Eric Lei 1 Introduction The main theme of this class is randomized algorithms.
More informationModule 9: Tries and String Matching
Module 9: Tries and String Matching CS 240 - Data Structures and Data Management Sajed Haque Veronika Irvine Taylor Smith Based on lecture notes by many previous cs240 instructors David R. Cheriton School
More informationGrade 11/12 Math Circles Fall Nov. 5 Recurrences, Part 2
1 Faculty of Mathematics Waterloo, Ontario Centre for Education in Mathematics and Computing Grade 11/12 Math Circles Fall 2014 - Nov. 5 Recurrences, Part 2 Running time of algorithms In computer science,
More informationMining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s
Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make
More informationDivide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016
CS125 Lecture 4 Fall 2016 Divide and Conquer We have seen one general paradigm for finding algorithms: the greedy approach. We now consider another general paradigm, known as divide and conquer. We have
More informationCryptographic Hash Functions
Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction
More informationCosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks
1 Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks Michael Albert michael.albert@cs.otago.ac.nz 2 This week Arithmetic Knapsack cryptosystems Attacks on knapsacks Some
More informationChapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn
Chapter 6 Randomization Algorithm Theory WS 2012/13 Fabian Kuhn Randomization Randomized Algorithm: An algorithm that uses (or can use) random coin flips in order to make decisions We will see: randomization
More informationLecture 7: More Arithmetic and Fun With Primes
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Advanced Course on Computational Complexity Lecture 7: More Arithmetic and Fun With Primes David Mix Barrington and Alexis Maciel July
More informationLecture 6: Introducing Complexity
COMP26120: Algorithms and Imperative Programming Lecture 6: Introducing Complexity Ian Pratt-Hartmann Room KB2.38: email: ipratt@cs.man.ac.uk 2015 16 You need this book: Make sure you use the up-to-date
More informationHashing. Data organization in main memory or disk
Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison
More informationCOMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)
COMP251: Hashing Jérôme Waldispühl School of Computer Science McGill University Based on (Cormen et al., 2002) Table S with n records x: Problem DefiniNon X Key[x] InformaNon or data associated with x
More informationLecture 8 HASHING!!!!!
Lecture 8 HASHING!!!!! Announcements HW3 due Friday! HW4 posted Friday! Q: Where can I see examples of proofs? Lecture Notes CLRS HW Solutions Office hours: lines are long L Solutions: We will be (more)
More informationCSE 190, Great ideas in algorithms: Pairwise independent hash functions
CSE 190, Great ideas in algorithms: Pairwise independent hash functions 1 Hash functions The goal of hash functions is to map elements from a large domain to a small one. Typically, to obtain the required
More informationIntroduction to Randomized Algorithms III
Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability
More informationFinding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract
Finding Succinct Ordered Minimal Perfect Hash Functions Steven S. Seiden 3 Daniel S. Hirschberg 3 September 22, 1994 Abstract An ordered minimal perfect hash table is one in which no collisions occur among
More informationRandomized Algorithms, Spring 2014: Project 2
Randomized Algorithms, Spring 2014: Project 2 version 1 March 6, 2014 This project has both theoretical and practical aspects. The subproblems outlines a possible approach. If you follow the suggested
More informationOn the average-case complexity of Shellsort
Received: 16 February 2015 Revised: 24 November 2016 Accepted: 1 February 2017 DOI: 10.1002/rsa.20737 RESEARCH ARTICLE On the average-case complexity of Shellsort Paul Vitányi 1,2 1 CWI, Science Park 123,
More informationLecture 11: Hash Functions, Merkle-Damgaard, Random Oracle
CS 7880 Graduate Cryptography October 20, 2015 Lecture 11: Hash Functions, Merkle-Damgaard, Random Oracle Lecturer: Daniel Wichs Scribe: Tanay Mehta 1 Topics Covered Review Collision-Resistant Hash Functions
More informationData Structures and Algorithm. Xiaoqing Zheng
Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn MULTIPOP top[s] = 6 top[s] = 2 3 2 8 5 6 5 S MULTIPOP(S, x). while not STACK-EMPTY(S) and k 0 2. do POP(S) 3. k k MULTIPOP(S, 4) Analysis
More informationUNIFORM HASHING IN CONSTANT TIME AND OPTIMAL SPACE
UNIFORM HASHING IN CONSTANT TIME AND OPTIMAL SPACE ANNA PAGH AND RASMUS PAGH Abstract. Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e.,
More informationLecture 1: Asymptotics, Recurrences, Elementary Sorting
Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running
More informationdata structures and algorithms lecture 2
data structures and algorithms 2018 09 06 lecture 2 recall: insertion sort Algorithm insertionsort(a, n): for j := 2 to n do key := A[j] i := j 1 while i 1 and A[i] > key do A[i + 1] := A[i] i := i 1 A[i
More information
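As a concrete illustration of the Horner's-rule hashing described in this lecture, here is a short sketch (not from the original notes; the function name and the choice of Python are illustrative). It interprets a string's bytes as a base-2^8 number and reduces mod m after every step, so intermediate values never grow large:

```python
def horner_hash(key: str, m: int) -> int:
    """Hash a string by Horner's rule: treat its bytes as digits of a
    base-256 number and take the result mod m, reducing at each step."""
    h = 0
    for byte in key.encode("ascii"):
        h = (h * 2**8 + byte) % m  # shift left 8 bits, add next byte, reduce
    return h

# The lecture's warning in action: with m = 2**8, every h * 2**8 term
# vanishes mod m, so the hash depends only on the last byte.
assert horner_hash("hello", 2**8) == ord("o")

# With a modulus that is not a power of the base (e.g. a prime),
# every byte contributes, and the result matches reducing the full
# base-256 number directly -- the homomorphism property h(a*b + c).
assert horner_hash("hello", 101) == int.from_bytes(b"hello", "big") % 101
```

The two assertions mirror the algebra in the notes: because k → k mod m is a ring homomorphism, reducing after each multiply-and-add gives the same answer as reducing the whole number at the end, and m = 2^8 is a degenerate choice because 2^8 mod 2^8 = 0.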