CSCB63 Winter 2019, Week 10, Lecture 2: Hashing. Anna Bretscher. March 21

Today

Hashing
- Open Addressing
- Hash functions
- Universal Hashing

Open Addressing

Open addressing: each entry in the hash table stores a fixed number $c$ of elements. This immediately implies that we can only use it when $n < cm$, where $n$ is the number of stored elements and $m$ is the table size. For simplicity, we assume the capacity $c$ is 1. We will talk about expanding the table later.

Q. How can we insert a new element if we get a collision?
A. Find a new location to store the new element. We also need to be able to find it again later, so we search a well-defined sequence of other locations in the hash table until we find one that is not full. This sequence is called a probe sequence.

Probe Sequences

We will look at each of the following methods for generating a probe sequence:
- linear probing: try $A[(h(k) + i) \bmod m]$, $i = 0, 1, 2, \ldots$
- quadratic probing: try $A[(h(k) + c_1 i + c_2 i^2) \bmod m]$
- double hashing: try $A[(h(k) + i \cdot h_2(k)) \bmod m]$, where $h_2$ is another hash function

Linear probing: the easiest open addressing strategy is linear probing. For a hash table of size $m$, key $k$ and hash function $h(k)$, the probe sequence is calculated as
\[ s_i = (h(k) + i) \bmod m \quad \text{for } i = 0, 1, 2, \ldots \]

Q. What is the value of $s_0$ (the home location for the item)?
A. $h(k)$
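The three probe-sequence formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the `count` parameter, and the default constants `c1 = c2 = 1` are assumptions made for the example and do not come from the slides. Each function takes the already-computed home location `h(k)` and returns the first `count` probe positions.

```python
def linear_probes(home, m, count):
    """First `count` positions of (home + i) mod m."""
    return [(home + i) % m for i in range(count)]


def quadratic_probes(home, m, count, c1=1, c2=1):
    """First `count` positions of (home + c1*i + c2*i^2) mod m.

    c1 = c2 = 1 are illustrative defaults, not values from the lecture.
    """
    return [(home + c1 * i + c2 * i * i) % m for i in range(count)]


def double_hash_probes(home, step, m, count):
    """First `count` positions of (home + i*step) mod m.

    `step` plays the role of the second hash value h_2(k).
    """
    if step % m == 0:
        raise ValueError("the step size must not be 0 (mod m)")
    return [(home + i * step) % m for i in range(count)]


if __name__ == "__main__":
    print(linear_probes(5, 7, 4))
    print(quadratic_probes(5, 7, 4))
    print(double_hash_probes(5, 3, 7, 4))
```

All three sequences start at the home location (the `i = 0` term) and differ only in how the offset grows with `i`.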

Linear Probing

Q. What is the problem with linear probing?
A. Clustering.

Q. What happens when we hash to something within a group of filled locations?
A.
- We have to probe the whole group until we reach an empty slot.
- We increase the size of the cluster.
As a result, two keys that did not necessarily share the same "home" location end up with almost identical probe sequences.

Non-Linear Probing

Non-linear probing includes schemes where the probe sequence does not involve steps of fixed size.

Example. Quadratic probing, where the probe sequence is calculated as
\[ s_i = (h(k) + c_1 i + c_2 i^2) \bmod m \quad \text{for } i = 0, 1, 2, \ldots \]

Q. Now what problem may occur?
A. Probe sequences will still be identical for elements that hash to the same home location.

Double Hashing

In double hashing we use a different hash function $h_2(k)$ to calculate the step size. The probe sequence is
\[ s_i = (h(k) + i \cdot h_2(k)) \bmod m \quad \text{for } i = 0, 1, 2, \ldots \]

Note that $h_2(k)$ should not be 0 for any $k$.

Also, we want to choose $h_2$ so that, if $h(k_1) = h(k_2)$ for two keys $k_1, k_2$, it won't be the case that $h_2(k_1) = h_2(k_2)$. That is, the two hash functions don't cause collisions on the same pairs of keys.

Analysis of Open Addressing

Notice that in open addressing, INSERT and SEARCH take the same amount of work. Let's consider the complexity of INSERT for a key $k$.

It is not hard to come up with worst-case situations where the above types of open addressing require $\Theta(n)$ time for INSERT.

To simplify the analysis of the average case, we make some assumptions:
- the hash table has $m$ locations;
- the hash table contains $n$ elements and we want to insert a new key $k$;
- the probe sequence for $k$ is random, that is, it is equally likely to be any permutation of $(0, 1, \ldots, m-1)$.

Average Insert Time under Open Addressing

Let $T$ denote the number of probes performed in the INSERT.

Q. The average-case time for INSERT is then the expected value $E(T)$. What is $E(T)$?
A.
\[ E(T) = \sum_{i=0}^{m-1} i \cdot \Pr(T = i) \]

Q. What is $\Pr(T = i)$?
A. This is hard to compute directly. Find another way to express $T = i$:
\[ \Pr(T = i) = \Pr(T \ge i) - \Pr(T \ge i + 1) \]

Let $A_i$ denote the event that every location up until the $i$-th probe is occupied. Then $T \ge i$ iff $A_1, A_2, \ldots, A_{i-1}$ all occur, so
\[ \Pr(T \ge i) = \Pr(A_1 \cap A_2 \cap \cdots \cap A_{i-1}) = \Pr(A_1) \cdot \Pr(A_2 \mid A_1) \cdot \Pr(A_3 \mid A_1 \cap A_2) \cdots \Pr(A_{i-1} \mid A_1 \cap \cdots \cap A_{i-2}) \]

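E(T)

So far we have: $A_i$ denotes the event that every location up until the $i$-th probe is occupied. Then $T \ge i$ iff $A_1, A_2, \ldots, A_{i-1}$ all occur, so
\[ \Pr(T \ge i) = \Pr(A_1) \cdot \Pr(A_2 \mid A_1) \cdot \Pr(A_3 \mid A_1 \cap A_2) \cdots \Pr(A_{i-1} \mid A_1 \cap \cdots \cap A_{i-2}) \]

Q. What is $\Pr(A_j \mid A_1 \cap \cdots \cap A_{j-1})$?
A. Intuition: we need the number of elements that we have not seen so far over the number of slots we have not seen so far. For $j \ge 1$,
\[ \Pr(A_j \mid A_1 \cap \cdots \cap A_{j-1}) = \frac{n - (j-1)}{m - (j-1)} \]
and therefore
\[ \Pr(T \ge i) = \frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-(i-2)}{m-(i-2)} \le \left(\frac{n}{m}\right)^{i-1} = a^{i-1}, \]
where $a = n/m$ is the load factor.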

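Average Case Complexity

Now we can calculate the expected value of $T$, or the average-case complexity of INSERT.

\begin{align*}
E(T) &= \sum_{i=0}^{m-1} i \Pr(T = i) \\
     &\le \sum_{i=1}^{\infty} i \Pr(T = i) \\
     &= \sum_{i=1}^{\infty} i \bigl(\Pr(T \ge i) - \Pr(T \ge i+1)\bigr) \\
     &= \sum_{i=1}^{\infty} i \Pr(T \ge i) - \sum_{i=1}^{\infty} i \Pr(T \ge i+1) \\
     &= \sum_{i=1}^{\infty} i \Pr(T \ge i) - \sum_{i=1}^{\infty} \bigl((i+1) \Pr(T \ge i+1) - \Pr(T \ge i+1)\bigr) \\
     &= \sum_{i=1}^{\infty} i \Pr(T \ge i) - \sum_{i=2}^{\infty} i \Pr(T \ge i) + \sum_{i=2}^{\infty} \Pr(T \ge i) \\
     &= \Pr(T \ge 1) + \sum_{i=2}^{\infty} i \Pr(T \ge i) - \sum_{i=2}^{\infty} i \Pr(T \ge i) + \sum_{i=2}^{\infty} \Pr(T \ge i)
\end{align*}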

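Average Case Complexity

\begin{align*}
E(T) &\le \Pr(T \ge 1) + \sum_{i=2}^{\infty} i \Pr(T \ge i) - \sum_{i=2}^{\infty} i \Pr(T \ge i) + \sum_{i=2}^{\infty} \Pr(T \ge i) \\
     &= \Pr(T \ge 1) + \sum_{i=2}^{\infty} \Pr(T \ge i) \\
     &= \sum_{i=1}^{\infty} \Pr(T \ge i) \\
     &\le \sum_{i=1}^{\infty} a^{i-1} \qquad \text{(from previous slides, } \Pr(T \ge i) \le a^{i-1}\text{)} \\
     &= \sum_{i=0}^{\infty} a^{i} \\
     &= \frac{1}{1-a}
\end{align*}

Note: $a < 1$ since $n < m$.
The bigger the load factor, the longer it takes to insert something. This is what we expect, intuitively.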
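The bound $E(T) \le 1/(1-a)$ can be sanity-checked with a small simulation under the same assumption the analysis makes: the probe sequence is a uniformly random permutation of the slots. The sketch below is only illustrative; the function names, the trial count, and the fixed seed are choices made for this example.

```python
import random


def probes_for_insert(m, n, rng):
    """Number of probes until a free slot is found.

    Slots 0..n-1 are treated as occupied; because the probe order is a
    uniformly random permutation, which slots are occupied does not matter.
    """
    order = list(range(m))
    rng.shuffle(order)
    for probes, slot in enumerate(order, start=1):
        if slot >= n:  # slot is free
            return probes
    raise ValueError("table is full (need n < m)")


def average_probes(m, n, trials=20000, seed=0):
    """Monte Carlo estimate of E(T) for a table of size m holding n keys."""
    rng = random.Random(seed)
    total = sum(probes_for_insert(m, n, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    m, n = 100, 50
    a = n / m
    print("simulated E(T):", average_probes(m, n))
    print("bound 1/(1-a): ", 1 / (1 - a))
```

For $m = 100$ and $n = 50$ the bound is $1/(1 - 0.5) = 2$, and the simulated average should come out at or slightly below that value, up to sampling noise.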

Remove, under Open Addressing

Under open addressing, there are two approaches for REMOVE:
- Find an existing key to fill the hole. Tricky for probing, impossible for double hashing.
- Mark the cell as deactivated. (Do not mark it as free!)

With the second approach, each cell has 3 possibilities:
- Free: can insert here, can stop searching here.
- Deactivated: can insert here, cannot stop searching here.
- Stores a key.

Deactivated cells accumulate as junk and slow down all operations. REMOVE is problematic under open addressing.
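The three cell states can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes linear probing, uses Python's built-in `hash` for the home location, and the class and sentinel names (`OpenAddressTable`, `FREE`, `DEACTIVATED`) are made up for the example.

```python
FREE = object()         # never used: a search may stop here
DEACTIVATED = object()  # removed key: reusable, but a search must continue


class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.slots = [FREE] * m

    def _probe(self, key):
        """Linear probe sequence starting at the key's home location."""
        home = hash(key) % self.m
        for i in range(self.m):
            yield (home + i) % self.m

    def search(self, key):
        """Return the slot index holding `key`, or None if absent."""
        for s in self._probe(key):
            cell = self.slots[s]
            if cell is FREE:
                return None  # a free cell ends the search
            if cell is not DEACTIVATED and cell == key:
                return s
        return None

    def insert(self, key):
        """Store `key` and return its slot index."""
        found = self.search(key)
        if found is not None:
            return found
        for s in self._probe(key):
            if self.slots[s] is FREE or self.slots[s] is DEACTIVATED:
                self.slots[s] = key
                return s
        raise RuntimeError("table is full")

    def remove(self, key):
        """Mark the key's cell as deactivated (not free)."""
        s = self.search(key)
        if s is not None:
            self.slots[s] = DEACTIVATED
```

Marking the cell `DEACTIVATED` instead of `FREE` is what keeps later keys in the same cluster reachable: a search for a key stored past the removed one must walk over the deactivated cell rather than stop at it.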

Hash Functions: Division Method

Assume each key is an integer.
\[ h(k) = k \bmod m \]
Simple, but susceptible to regular patterns in keys, which lead to more collisions.

Q. How can we improve this?
A. Using a prime number for $m$, the length of the array, reduces this problem.
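A small experiment makes the point about regular key patterns concrete. The key set below (multiples of 4) and the two table sizes are assumptions picked for illustration; the function names are likewise hypothetical.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def slots_used(keys, m):
    """Number of distinct table locations hit by the given keys."""
    return len({division_hash(k, m) for k in keys})


# Keys with a regular pattern: 0, 4, 8, ..., 60.
keys = [4 * i for i in range(16)]

if __name__ == "__main__":
    print("m = 8:", slots_used(keys, 8), "distinct slots")
    print("m = 7:", slots_used(keys, 7), "distinct slots")
```

With $m = 8$ every key lands in slot 0 or slot 4, because the key pattern shares a factor with the table size. With the prime $m = 7$ the same keys spread over all seven slots.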

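Hash Function: Multiplication Method

In theory: pick a real constant $A$ with $0 < A < 1$.
\[ h(k) = \lfloor m \cdot \mathrm{fraction}(k \cdot A) \rfloor \]

In practice: assume each key is a $w$-bit natural number.
- Define $A$ by picking a $w$-bit constant $s$ with $0 < s < 2^w$ and letting $A = s / 2^w$.
- Use $m = 2^p$ for some $0 \le p < w$.

\begin{align*}
h(k) &= \lfloor m \cdot \mathrm{fraction}(k \cdot A) \rfloor \\
     &= \lfloor 2^p \cdot \mathrm{fraction}(k \cdot s / 2^w) \rfloor \\
     &= \left\lfloor 2^p \cdot \frac{(k \cdot s) \bmod 2^w}{2^w} \right\rfloor \\
     &= \left\lfloor \frac{(k \cdot s) \bmod 2^w}{2^{w-p}} \right\rfloor
\end{align*}

Q. What does $(k \cdot s) \bmod 2^w$ return?
A. The lower $w$ bits of $k \cdot s$.

Q. What does dividing by $2^{w-p}$ do?
A. It returns the upper $p$ bits of the lower $w$ bits of $k \cdot s$.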

Multiplication Method in Practice

\[ h(k) = \lfloor m \cdot \mathrm{fraction}(k \cdot A) \rfloor = \left\lfloor \frac{(k \cdot s) \bmod 2^w}{2^{w-p}} \right\rfloor \]

To compute the hash function $h(k)$ we can simply:
1. Obtain $k \cdot s$ as a $2w$-bit integer.
2. Retain the lower $w$ bits of $k \cdot s$.
3. Retain the upper $p$ bits of the result of step 2.

Summary. $h(k) = ((k \cdot s) \bmod 2^w) \gg (w - p)$, where $\gg$ is the right-shift operator.

We want $A$ to be irrational; the golden ratio is often used for $A$, working backwards to define $s$.
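The three steps above translate almost directly into code. In this sketch the function names are hypothetical, and the constant `GOLDEN_S_32` is an assumption chosen for the example: it is $\lfloor A \cdot 2^{32} \rfloor$ for $A = (\sqrt{5} - 1)/2$, the fractional part of the golden ratio. A slow reference version using exact fractions is included only to check the shift-based one against the definition.

```python
from fractions import Fraction
from math import floor

# floor(((sqrt(5) - 1) / 2) * 2**32); an illustrative choice of s for w = 32.
GOLDEN_S_32 = 2654435769


def multiplication_hash(k, s, w, p):
    """h(k) = ((k * s) mod 2^w) >> (w - p), for table size m = 2^p."""
    low_w_bits = (k * s) & ((1 << w) - 1)  # step 2: lower w bits of k*s
    return low_w_bits >> (w - p)           # step 3: upper p bits of those


def multiplication_hash_reference(k, s, w, p):
    """Same value from the definition floor(m * fraction(k * A)), A = s / 2^w."""
    x = Fraction(k * s, 1 << w)
    fractional_part = x - floor(x)
    return floor((1 << p) * fractional_part)


if __name__ == "__main__":
    k, w, p = 123456, 32, 14
    print(multiplication_hash(k, GOLDEN_S_32, w, p))
    print(multiplication_hash_reference(k, GOLDEN_S_32, w, p))
```

Python integers are unbounded, so the product `k * s` plays the role of the $2w$-bit intermediate result; in a language with fixed-width integers, multiplying two unsigned $w$-bit values with wraparound gives the lower $w$ bits directly.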

Hash Function: Polynomial Hash

Use this when each key consists of multiple machine words (e.g., a string). Pick a constant $a$, not equal to zero or one. If your key is internally the machine words $x_0, \ldots, x_{k-1}$:
\[ h(\langle x_0, \ldots, x_{k-1} \rangle) = (x_0 a^{k-1} + x_1 a^{k-2} + \cdots + x_{k-2} a + x_{k-1}) \bmod m \]

Compute by:

    c = 0                    (some people use a non-zero constant here)
    for i in 0..k-1:
        c = c * a + x_i
    c = c mod m

In practice people use xor instead of +.
Every bit contributes. Order contributes too.
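A runnable version of the loop above might look like the following. The function name and the sample parameters are assumptions for illustration. The sketch reduces modulo $m$ at every step rather than once at the end; the result is the same, and intermediate values stay small.

```python
def polynomial_hash(words, a, m):
    """Evaluate (x_0*a^(k-1) + ... + x_(k-2)*a + x_(k-1)) mod m by Horner's rule."""
    c = 0
    for x in words:
        c = (c * a + x) % m  # reducing each step gives the same final value
    return c


if __name__ == "__main__":
    # A bytes object iterates as integers 0..255, one "word" per byte.
    print(polynomial_hash(b"ab", 31, 10**9 + 7))
    print(polynomial_hash(b"ba", 31, 10**9 + 7))
```

The two calls give different values for the same bytes in a different order, which is the "order contributes too" property noted above.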

Hash Function: FNV-1 (Fowler-Noll-Vo)

FNV-1 is a family of hash functions, one for each word size; e.g., there is one for 32 bits, one for 64 bits, etc. They work by chopping your key into 8-bit words (bytes).

32-bit FNV-1:

    hash = FNV offset basis           (a fixed 32-bit constant)
    for i in 0..k-1:                  # for each byte of data
        hash = hash * FNV prime       (a fixed prime number, given in hex)
        hash = hash XOR byte_i

Do your own "hash mod m" afterwards as needed (not part of FNV-1).

Links: Wikipedia article, Noll's FNV page.
On the down side: Problems with Hash Tables.

Hash Function: FNV-1a

FNV-1a is like FNV-1 but with xor before multiply.

32-bit FNV-1a:

    hash = FNV offset basis
    for i in 0..k-1:                  # for each byte of data
        hash = hash XOR byte_i
        hash = hash * FNV prime

Recommended over FNV-1 for being more random and uniform.
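Both 32-bit variants can be sketched as below. The numeric constants did not survive in this transcript; the values used here are the published 32-bit FNV offset basis and FNV prime, and the function names are chosen for this example. Python integers do not wrap around, so the sketch masks to 32 bits after each multiplication.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # published 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # published 32-bit FNV prime
MASK32 = 0xFFFFFFFF              # keep results within 32 bits


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1: multiply first, then xor in the byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & MASK32
        h ^= byte
    return h


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: xor in the byte first, then multiply."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & MASK32
    return h


if __name__ == "__main__":
    m = 1024  # table size; reducing mod m is not part of FNV itself
    print(hex(fnv1_32(b"a")), hex(fnv1a_32(b"a")))
    print(fnv1a_32(b"hashing") % m)
```

The only difference between the two functions is the order of the two statements inside the loop.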

Problems with Hashing

When the set $S$ of keys is unknown, we can no longer assume a uniform distribution. Further, regular patterns can be found, making any deterministic hashing scheme vulnerable to malicious slowdowns.

Links: oCERT advisory, LWN article.

Q. What might be a solution?
A. Create a family of hash functions and select one at random. This is called universal hashing.

Universal Hashing

Definition. A family $H$ of hash functions is universal iff:
for any two keys $j$ and $k$ with $j \ne k$, and a table of size $m$, at most $|H|/m$ functions satisfy $h(j) = h(k)$.

I.e., if we randomly pick $h$ from $H$ with uniform probability, then
\[ \Pr(h(j) = h(k)) \le 1/m \quad \text{(why?)} \]

We can think of this as being "equivalent" to a hash function that maps keys to hash codes randomly.

Note. Given a set of keys, we randomly select a hash function $h()$ from $H$ and use this function for every key in our set.

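Universal Hashing: Expected Number of Collisions

Q. What would you hope the expected number of collisions is?
A. With $n$ keys hashed to a table of size $m$, nicely spread out: $O(n/m)$.

Proof. Let $S$ be a set of $n$ keys. Let $j$ be a key not in $S$. Randomly pick $h$ from $H$.

Q. How many keys in $S$ does $j$ collide with?
A. Let the random variable $C$ be the number of such collisions. For each $k \in S$, let the indicator random variable $X_k$ be 1 when $j$ collides with $k$ (and 0 otherwise).
\begin{align*}
E(C) &= E\Bigl(\sum_{k \in S} X_k\Bigr) \\
     &= \sum_{k \in S} E(X_k) \\
     &= \sum_{k \in S} \Pr(h(j) = h(k)) \\
     &\le \sum_{k \in S} 1/m \\
     &= n/m
\end{align*}
If $j$ is itself one of the $n$ keys, the sums run over $k \in S \setminus \{j\}$ instead, and the bound becomes $(n-1)/m < n/m$.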

A Universal Family

Find a prime $p$ large enough such that $m < p$ and every key $k$ (assumed integer) satisfies $0 \le k < p$.

Define:
\[ f_{a,b}(k) = (a k + b) \bmod p \]
\[ h_{a,b}(k) = f_{a,b}(k) \bmod m = ((a k + b) \bmod p) \bmod m \]

Universal family:
\[ H = \{ h_{a,b} : (0 < a < p) \wedge (0 \le b < p) \} \]

Randomly pick $a$ from 1 to $p-1$, and pick $b$ from 0 to $p-1$.

Q. How many choices are there for $a$ and $b$?
A. $(p-1) \cdot p$ choices.
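Drawing one function $h_{a,b}$ from this family can be sketched as follows. The function name, the returned tuple, and the sample values $p = 101$, $m = 10$ are assumptions for illustration; the caller is responsible for supplying a prime $p$ larger than $m$ and larger than every key.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick a and b at random and return (h, a, b) with
    h(k) = ((a*k + b) mod p) mod m.

    Assumes p is prime, m < p, and all keys satisfy 0 <= k < p.
    """
    a = rng.randrange(1, p)  # 0 < a < p
    b = rng.randrange(0, p)  # 0 <= b < p

    def h(k):
        return ((a * k + b) % p) % m

    return h, a, b


if __name__ == "__main__":
    h, a, b = make_universal_hash(p=101, m=10)
    print("a =", a, "b =", b)
    print([h(k) for k in range(10)])
```

The random choice happens once, when the table is created; the same `h` is then used for every key, as the definition of universal hashing requires.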

Proving H is a Universal Family

In order to prove that $H$ is a universal family, we need to show:
for keys $k, j$ such that $k \ne j$, $\Pr(h(j) = h(k)) \le 1/m$.

I.e.,
\[ \Pr(h(j) = h(k)) = \frac{\text{number of } (a, b) \text{ pairs that cause a collision}}{\text{number of } (a, b) \text{ pairs}} \le 1/m. \]

Q. How many $(a, b)$ pairs are there in total?
A. We said $p(p-1)$.

Overview

1. Show that $f_{a,b}$ has no collisions.
2. Show that for keys $j, k$ with $j \ne k$, and for every $(r, s)$ with $0 \le r < p$, $0 \le s < p$, $r \ne s$, there exists a unique $(a, b)$ such that
   $f_{a,b}(j) = (a j + b) \bmod p = r$ and $f_{a,b}(k) = (a k + b) \bmod p = s$.
   This means there is a 1-1 correspondence between the pairs $(a, b)$ and the pairs $(r, s)$.
3. Because of this 1-1 correspondence, to show that the fraction of $(a, b)$ pairs that cause a collision $h_{a,b}(j) = h_{a,b}(k)$, for $j \ne k$, is at most $1/m$, we can count the number of $(r, s)$ pairs with $r \ne s$ such that $r \equiv s \pmod{m}$.

No Collisions from $f_{a,b}$

Recall: the prime $p$ is large enough such that $m < p$ and every key $k$ (assumed integer) satisfies $0 \le k < p$.
\[ f_{a,b}(k) = (a k + b) \bmod p \]
\[ h_{a,b}(k) = f_{a,b}(k) \bmod m = ((a k + b) \bmod p) \bmod m \]

Claim. Let $j$ and $k$ be different keys. Then $f_{a,b}(j) \ne f_{a,b}(k)$.

Proof. Assume otherwise. Without loss of generality, assume $k < j$. Suppose $f_{a,b}(j) = f_{a,b}(k)$. Then:
- $(a j + b) - (a k + b)$ is a multiple of $p$,
- so $a (j - k)$ is a multiple of $p$,
- so $a$ or $j - k$ is a multiple of $p$, because $p$ is prime.
But $0 < a < p$ and $0 < j - k < p$, so neither can be a multiple of $p$. Contradiction.

A One-One Correspondence

Claim. Given keys $j, k$ with $j \ne k$: for every $(r, s)$ with $0 \le r < p$, $0 \le s < p$, $r \ne s$, there exists a unique $(a, b)$ such that
$f_{a,b}(j) = (a j + b) \bmod p = r$ and $f_{a,b}(k) = (a k + b) \bmod p = s$.
In other words, there is a one-one correspondence between the $(a, b)$'s and the $(r, s)$'s.

Proof. Left as an exercise, or see the corresponding textbook chapter.

We want to count how many $(a, b)$'s cause $h_{a,b}(j) = h_{a,b}(k)$ (collisions). By the correspondence, from here on we instead count how many $(r, s)$'s satisfy $r \bmod m = s \bmod m$ but $r \ne s$.
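One way to get a feel for the claim is to construct the pair $(a, b)$ explicitly for a small prime and check it by brute force. The sketch below is an illustration, not the proof from the course: it uses the standard approach of solving the two congruences, $a = (r - s)(j - k)^{-1} \bmod p$ and $b = (r - a j) \bmod p$. The function name is hypothetical, and `pow(x, -1, p)` for the modular inverse needs Python 3.8 or later.

```python
def solve_ab(j, k, r, s, p):
    """Return (a, b) with (a*j + b) % p == r and (a*k + b) % p == s.

    Assumes p is prime, 0 <= j, k, r, s < p, j != k and r != s.
    """
    inverse = pow((j - k) % p, -1, p)  # (j - k)^(-1) mod p
    a = ((r - s) * inverse) % p
    b = (r - a * j) % p
    return a, b


def correspondence(j, k, p):
    """Map every (r, s) with r != s to its (a, b) pair."""
    return {
        (r, s): solve_ab(j, k, r, s, p)
        for r in range(p)
        for s in range(p)
        if r != s
    }


if __name__ == "__main__":
    pairs = correspondence(j=2, k=5, p=7)
    print(len(pairs), "pairs (r, s);", len(set(pairs.values())), "distinct (a, b)")
```

For $p = 7$ there are $7 \cdot 6 = 42$ pairs $(r, s)$ with $r \ne s$, and the same number of pairs $(a, b)$ with $0 < a < p$ and $0 \le b < p$, so hitting 42 distinct $(a, b)$ values means the map is a bijection in this case.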

How Many $(a, b)$'s Cause $h_{a,b}$ Collisions

We want to prove that, given keys $j, k$ with $j \ne k$ and random $a, b$, the number of collisions satisfies
\[ |\{(a, b) : h_{a,b}(j) = h_{a,b}(k)\}| \le p(p-1)/m = |H|/m. \]
In other words, $H$ is a universal family of hash functions.

\begin{align*}
|\{(a, b) : h_{a,b}(j) = h_{a,b}(k)\}|
  &= |\{(a, b) : f_{a,b}(j) \bmod m = f_{a,b}(k) \bmod m\}| \\
  &= |\{(r, s) : r \ne s \wedge r \bmod m = s \bmod m\}| \\
  &= \sum_{r=0}^{p-1} |\{s : r \ne s \wedge r \bmod m = s \bmod m\}| \\
  &= \sum_{r=0}^{p-1} \bigl(|\{s : r \bmod m = s \bmod m\}| - 1\bigr) \\
  &\le \sum_{r=0}^{p-1} \bigl(\lceil p/m \rceil - 1\bigr)
\end{align*}

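How Many $(a, b)$'s Cause $h_{a,b}$ Collisions

We have the bound
\[ \sum_{r=0}^{p-1} \bigl(\lceil p/m \rceil - 1\bigr). \]

Q. What is the largest value that $\lceil p/m \rceil$ can be?
A. Notice that $\lceil p/m \rceil \in \{p/m, (p+1)/m, \ldots, (p+m-1)/m\}$, so
\begin{align*}
\sum_{r=0}^{p-1} \bigl(\lceil p/m \rceil - 1\bigr)
  &\le \sum_{r=0}^{p-1} \bigl((p + m - 1)/m - m/m\bigr) \\
  &= \sum_{r=0}^{p-1} (p-1)/m \\
  &= p(p-1)/m \\
  &= |H|/m
\end{align*}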
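For small parameters the bound can be checked exhaustively: for each pair of distinct keys, count the $(a, b)$ pairs that make them collide and compare against $p(p-1)/m$. The function names and the sample values $p = 11$, $m = 4$ are assumptions for illustration; the loops are far too slow for realistic $p$.

```python
def colliding_pairs(j, k, p, m):
    """Number of (a, b) with 0 < a < p, 0 <= b < p and h_ab(j) == h_ab(k)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * j + b) % p) % m == ((a * k + b) % p) % m
    )


def max_collision_count(p, m):
    """Worst case of colliding_pairs over all keys 0 <= j < k < p."""
    return max(
        colliding_pairs(j, k, p, m)
        for j in range(p)
        for k in range(j + 1, p)
    )


if __name__ == "__main__":
    p, m = 11, 4
    print("worst-case colliding (a, b) pairs:", max_collision_count(p, m))
    print("bound p(p-1)/m:", p * (p - 1) / m)
```

For $p = 11$ and $m = 4$ the values $0, \ldots, 10$ fall into residue classes of sizes 3, 3, 3 and 2 modulo 4, which gives $3 \cdot 6 + 2 = 20$ ordered pairs $(r, s)$ with $r \ne s$ and $r \equiv s \pmod{m}$. That is below the bound $11 \cdot 10 / 4 = 27.5$.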

Rehashing

We need to enlarge the array when enough keys are inserted. Think dynamic array...

Changing the table size $m$ means changing the hash function, which means: move every key!
- Choose a new $m$, approximately twice the old one.
- Allocate a new array.
- For every key: re-compute the hash value and store the key in the new array.

We want twice the array size for the good amortized cost. Can you determine the amortized cost?

See also Problems with Hash Tables again. It seems many implementations do it wrong: they add rather than multiply.
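The difference between multiplying and adding can be seen by counting how many keys get moved in total over a run of inserts. This sketch only counts moves and does not build a real table; the function name, the "resize when full" trigger, and the sample growth rules are assumptions for illustration.

```python
def total_moves(n_inserts, initial_m=1, grow=lambda m: 2 * m):
    """Total number of keys re-hashed while performing n_inserts inserts.

    The table is enlarged (and every stored key moved) whenever it is
    full at the start of an insert; `grow` maps the old size to the new one.
    """
    m = initial_m
    n = 0      # keys currently stored
    moves = 0
    for _ in range(n_inserts):
        if n == m:
            moves += n  # rehash: every stored key is moved
            m = grow(m)
        n += 1
    return moves


if __name__ == "__main__":
    n = 1024
    print("doubling:", total_moves(n))
    print("adding 16:", total_moves(n, initial_m=16, grow=lambda m: m + 16))
```

With doubling, 1024 inserts move $1 + 2 + 4 + \cdots + 512 = 1023$ keys in total, about one move per insert. Growing by a fixed 16 slots moves $16 \cdot (1 + 2 + \cdots + 63) = 32256$ keys for the same inserts, which grows quadratically in the number of inserts.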


More information

CS 5321: Advanced Algorithms Amortized Analysis of Data Structures. Motivations. Motivation cont d

CS 5321: Advanced Algorithms Amortized Analysis of Data Structures. Motivations. Motivation cont d CS 5321: Advanced Algorithms Amortized Analysis of Data Structures Ali Ebnenasir Department of Computer Science Michigan Technological University Motivations Why amortized analysis and when? Suppose you

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter 6, part 2 R. Paul Wiegand George Mason University, Department of Computer Science March 22, 2006 Outline 1 Balanced Trees 2 Heaps &

More information

Module 1: Analyzing the Efficiency of Algorithms

Module 1: Analyzing the Efficiency of Algorithms Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu What is an Algorithm?

More information

Lecture 4: Two-point Sampling, Coupon Collector s problem

Lecture 4: Two-point Sampling, Coupon Collector s problem Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 14 October 16, 2013 CPSC 467, Lecture 14 1/45 Message Digest / Cryptographic Hash Functions Hash Function Constructions Extending

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 15 October 20, 2014 CPSC 467, Lecture 15 1/37 Common Hash Functions SHA-2 MD5 Birthday Attack on Hash Functions Constructing New

More information

6.842 Randomness and Computation Lecture 5

6.842 Randomness and Computation Lecture 5 6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its

More information

Divide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016

Divide and Conquer. Maximum/minimum. Median finding. CS125 Lecture 4 Fall 2016 CS125 Lecture 4 Fall 2016 Divide and Conquer We have seen one general paradigm for finding algorithms: the greedy approach. We now consider another general paradigm, known as divide and conquer. We have

More information

CSE 21 Practice Exam for Midterm 2 Fall 2017

CSE 21 Practice Exam for Midterm 2 Fall 2017 CSE 1 Practice Exam for Midterm Fall 017 These practice problems should help prepare you for the second midterm, which is on monday, November 11 This is longer than the actual exam will be, but good practice

More information

Proof by Contradiction

Proof by Contradiction Proof by Contradiction MAT231 Transition to Higher Mathematics Fall 2014 MAT231 (Transition to Higher Math) Proof by Contradiction Fall 2014 1 / 12 Outline 1 Proving Statements with Contradiction 2 Proving

More information

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models Contents Mathematical Reasoning 3.1 Mathematical Models........................... 3. Mathematical Proof............................ 4..1 Structure of Proofs........................ 4.. Direct Method..........................

More information

Lecture 5: Two-point Sampling

Lecture 5: Two-point Sampling Randomized Algorithms Lecture 5: Two-point Sampling Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Randomized Algorithms - Lecture 5 1 / 26 Overview A. Pairwise

More information

Factoring. there exists some 1 i < j l such that x i x j (mod p). (1) p gcd(x i x j, n).

Factoring. there exists some 1 i < j l such that x i x j (mod p). (1) p gcd(x i x j, n). 18.310 lecture notes April 22, 2015 Factoring Lecturer: Michel Goemans We ve seen that it s possible to efficiently check whether an integer n is prime or not. What about factoring a number? If this could

More information

CS 360, Winter Morphology of Proof: An introduction to rigorous proof techniques

CS 360, Winter Morphology of Proof: An introduction to rigorous proof techniques CS 30, Winter 2011 Morphology of Proof: An introduction to rigorous proof techniques 1 Methodology of Proof An example Deep down, all theorems are of the form If A then B, though they may be expressed

More information

An analogy from Calculus: limits

An analogy from Calculus: limits COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm

More information

Math 1302 Notes 2. How many solutions? What type of solution in the real number system? What kind of equation is it?

Math 1302 Notes 2. How many solutions? What type of solution in the real number system? What kind of equation is it? Math 1302 Notes 2 We know that x 2 + 4 = 0 has How many solutions? What type of solution in the real number system? What kind of equation is it? What happens if we enlarge our current system? Remember

More information

Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur

Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur Modern Algebra Prof. Manindra Agrawal Department of Computer Science and Engineering Indian Institute of Technology, Kanpur Lecture 02 Groups: Subgroups and homomorphism (Refer Slide Time: 00:13) We looked

More information

Hashing Data Structures. Ananda Gunawardena

Hashing Data Structures. Ananda Gunawardena Hashing 15-121 Data Structures Ananda Gunawardena Hashing Why do we need hashing? Many applications deal with lots of data Search engines and web pages There are myriad look ups. The look ups are time

More information

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then String Matching Thanks to Piotr Indyk String Matching Input: Two strings T[1 n] and P[1 m], containing symbols from alphabet Σ Goal: find all shifts 0 s n-m such that T[s+1 s+m]=p Example: Σ={,a,b,,z}

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based

More information

Computational complexity

Computational complexity COMS11700 Computational complexity Department of Computer Science, University of Bristol Bristol, UK 2 May 2014 COMS11700: Computational complexity Slide 1/23 Introduction If we can prove that a language

More information

Warm-up Quantifiers and the harmonic series Sets Second warmup Induction Bijections. Writing more proofs. Misha Lavrov

Warm-up Quantifiers and the harmonic series Sets Second warmup Induction Bijections. Writing more proofs. Misha Lavrov Writing more proofs Misha Lavrov ARML Practice 3/16/2014 and 3/23/2014 Warm-up Using the quantifier notation on the reference sheet, and making any further definitions you need to, write the following:

More information

1 Closest Pair of Points on the Plane

1 Closest Pair of Points on the Plane CS 31: Algorithms (Spring 2019): Lecture 5 Date: 4th April, 2019 Topic: Divide and Conquer 3: Closest Pair of Points on a Plane Disclaimer: These notes have not gone through scrutiny and in all probability

More information

CS1800: Strong Induction. Professor Kevin Gold

CS1800: Strong Induction. Professor Kevin Gold CS1800: Strong Induction Professor Kevin Gold Mini-Primer/Refresher on Unrelated Topic: Limits This is meant to be a problem about reasoning about quantifiers, with a little practice of other skills, too

More information

Theory of Computation

Theory of Computation Theory of Computation Dr. Sarmad Abbasi Dr. Sarmad Abbasi () Theory of Computation 1 / 33 Lecture 20: Overview Incompressible strings Minimal Length Descriptions Descriptive Complexity Dr. Sarmad Abbasi

More information

Lecture 4: Hashing and Streaming Algorithms

Lecture 4: Hashing and Streaming Algorithms CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 4: Hashing and Streaming Algorithms Lecturer: Shayan Oveis Gharan 01/18/2017 Scribe: Yuqing Ai Disclaimer: These notes have not been subjected

More information

Algorithms: Lecture 2

Algorithms: Lecture 2 1 Algorithms: Lecture 2 Basic Structures: Sets, Functions, Sequences, and Sums Jinwoo Kim jwkim@jjay.cuny.edu 2.1 Sets 2 1 2.1 Sets 3 2.1 Sets 4 2 2.1 Sets 5 2.1 Sets 6 3 2.1 Sets 7 2.2 Set Operations

More information

Finding Divisors, Number of Divisors and Summation of Divisors. Jane Alam Jan

Finding Divisors, Number of Divisors and Summation of Divisors. Jane Alam Jan Finding Divisors, Number of Divisors and Summation of Divisors by Jane Alam Jan Finding Divisors If we observe the property of the numbers, we will find some interesting facts. Suppose we have a number

More information

ECS 120 Lesson 18 Decidable Problems, the Halting Problem

ECS 120 Lesson 18 Decidable Problems, the Halting Problem ECS 120 Lesson 18 Decidable Problems, the Halting Problem Oliver Kreylos Friday, May 11th, 2001 In the last lecture, we had a look at a problem that we claimed was not solvable by an algorithm the problem

More information

You separate binary numbers into columns in a similar fashion. 2 5 = 32

You separate binary numbers into columns in a similar fashion. 2 5 = 32 RSA Encryption 2 At the end of Part I of this article, we stated that RSA encryption works because it s impractical to factor n, which determines P 1 and P 2, which determines our private key, d, which

More information

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019 CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2019 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis

More information

Hashing. Data organization in main memory or disk

Hashing. Data organization in main memory or disk Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison

More information

Lecture 5: Latin Squares and Magic

Lecture 5: Latin Squares and Magic Latin Squares Instructor: Padraic Bartlett Lecture 5: Latin Squares and Magic Week Mathcamp 0 Today s application is to magic! Not the friendship kind though ; instead we re going to talk about magic squares

More information

CSE 312, 2017 Winter, W.L.Ruzzo. 5. independence [ ]

CSE 312, 2017 Winter, W.L.Ruzzo. 5. independence [ ] CSE 312, 2017 Winter, W.L.Ruzzo 5. independence [ ] independence Defn: Two events E and F are independent if P(EF) = P(E) P(F) If P(F)>0, this is equivalent to: P(E F) = P(E) (proof below) Otherwise, they

More information

CSE 190, Great ideas in algorithms: Pairwise independent hash functions

CSE 190, Great ideas in algorithms: Pairwise independent hash functions CSE 190, Great ideas in algorithms: Pairwise independent hash functions 1 Hash functions The goal of hash functions is to map elements from a large domain to a small one. Typically, to obtain the required

More information

Problem Set 4 Solutions

Problem Set 4 Solutions Introduction to Algorithms October 8, 2001 Massachusetts Institute of Technology 6.046J/18.410J Singapore-MIT Alliance SMA5503 Professors Erik Demaine, Lee Wee Sun, and Charles E. Leiserson Handout 18

More information

Graduate Analysis of Algorithms Dr. Haim Levkowitz

Graduate Analysis of Algorithms Dr. Haim Levkowitz UMass Lowell Computer Science 9.53 Graduate Analysis of Algorithms Dr. Haim Levkowitz Fall 27 Lecture 5 Tuesday, 2 Oct 27 Amortized Analysis Overview Amortize: To pay off a debt, usually by periodic payments

More information

How to prove it (or not) Gerry Leversha MA Conference, Royal Holloway April 2017

How to prove it (or not) Gerry Leversha MA Conference, Royal Holloway April 2017 How to prove it (or not) Gerry Leversha MA Conference, Royal Holloway April 2017 My favourite maxim It is better to solve one problem in five different ways than to solve five problems using the same method

More information

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 8th, 2017 Universal hash family Notation: Universe U = {0,..., u 1}, index space M = {0,..., m 1},

More information