Round 5: Hashing. Tommi Junttila. Aalto University School of Science Department of Computer Science

1 Round 5: Hashing
Tommi Junttila, Aalto University School of Science, Department of Computer Science
CS-A1140 Data Structures and Algorithms, Autumn 2017
Tommi Junttila (Aalto University), Round 5, CS-A1140 / Autumn 2017

2 Material
- In the book Introduction to Algorithms, 3rd ed. (online via Aalto lib): the sections on hash tables
- Similar material elsewhere:
  - Section 3.4 in Algorithms, 4th ed., and these slides (quadratic probing etc. are not in the book)
  - the hashing chapter in the OpenDSA book
- External links:
  - MIT OCW video on hashing with chaining
  - MIT OCW video on open addressing and cryptographic hashing

3 In many applications, dictionaries with INSERT, SEARCH, and DELETE operations are enough
- With hashing we can perform these in $O(1)$ time on average
  - the worst-case time requirement can be $\Theta(n)$, but with good design this is extremely improbable
  - finding the smallest and largest elements takes $\Theta(n)$ time in the worst case
- Implementations:
  - C++11 standard library unordered_set and unordered_map
  - Java HashSet and HashMap
  - Scala HashSet and HashMap
  - ...

4 Intro: small key universe, bit sets and direct-access tables

5 Bit sets
- Let us first assume that the number $n$ of all possible keys is small, and that the keys are, or can easily be mapped to, the integers $0, \ldots, n-1$
- A set data structure on these keys is easy to implement as a bit set (also known as a bit array or bit vector):
  - Allocate an array $a = a_0 a_1 \ldots a_{m-1}$ of $m = \lceil n/8 \rceil$ bytes
  - A key $k \in \{0, \ldots, n-1\}$ belongs to the set if and only if bit $k \bmod 8$ is 1 in the byte $a_{\lfloor k/8 \rfloor}$
Example: The array below stores a subset of the keys $\{0, 1, \ldots, 999\}$. It includes the keys 1, 4, 806, and 999 but not, for instance, the key 805.
[Figure: the byte array a with its bit and byte indices]
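The byte-array layout described above can be sketched in Scala roughly as follows. This is only an illustration under the slide's assumptions (keys are the integers 0 to n-1); the class name `ByteBitSet` and its method names are hypothetical and not taken from the course material.

```scala
// A minimal bit set over the keys 0 until n, stored in ceil(n / 8) bytes.
// Illustrative sketch; ByteBitSet is a hypothetical name.
class ByteBitSet(n: Int) {
  require(n >= 0, "n must be non-negative")

  // All bits are initially 0, i.e. the set is empty.
  private val bytes = new Array[Byte]((n + 7) / 8)

  private def check(k: Int): Unit =
    require(0 <= k && k < n, s"key $k is outside 0 until $n")

  // The key k is in the set iff bit (k mod 8) of byte floor(k / 8) is 1.
  def contains(k: Int): Boolean = {
    check(k)
    (bytes(k / 8) & (1 << (k % 8))) != 0
  }

  def insert(k: Int): Unit = {
    check(k)
    bytes(k / 8) = (bytes(k / 8) | (1 << (k % 8))).toByte
  }

  def delete(k: Int): Unit = {
    check(k)
    bytes(k / 8) = (bytes(k / 8) & ~(1 << (k % 8))).toByte
  }
}
```

Each operation touches exactly one byte, which is why INSERT, SEARCH, and DELETE run in constant time. For the example keys one would create `new ByteBitSet(1000)` and insert 1, 4, 806, and 999.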

6 It is easy to implement the operations INSERT, SEARCH, and DELETE so that they operate in constant time
- Bit sets are a very memory-efficient way of representing dense sets (i.e., sets in which a large number of the keys are included)
- For sparse sets, space is wasted and, for instance, listing all the keys in the set becomes expensive
- Some implementations:
  - BitSet in Scala
  - bitset in the C++ standard library

7 Direct-access tables
- Suppose that we have a small set of possible keys
- For instance, in the two-letter country codes of ISO 3166-1 alpha-2, drawn from the alphabet {A, B, ..., Z} of 26 letters, there are $26 \cdot 26 = 676$ possible codes (of which 249, such as FI and UK, are actually assigned to some meaning)
- We can implement a dictionary mapping country codes to objects (e.g., the capital city name) by having a direct-access table with 676 entries
- The value $v$ of the code $c_1 c_2$ is simply stored in the entry $\mathrm{index}(c_1 c_2) = f(c_1) \cdot 26 + f(c_2)$ of the array, where $f(\mathrm{A}) = 0, f(\mathrm{B}) = 1, \ldots, f(\mathrm{Z}) = 25$
- INSERT, SEARCH, and DELETE are now easy to implement in $O(1)$ time

8 Example: Mapping country codes to the entries in a direct-access table arr

    Code:   AA  AB  AC  ...  DE  ...  FI   ...  UK   ...  US   ...  ZZ
    Index:  0   1   2   ...  82  ...  138  ...  530  ...  538  ...  675

9 Implementing a country code map in Scala:

    import scala.reflect.ClassTag

    class CountryMap[B >: Null]()(implicit tag: ClassTag[B]) {
      // One entry per possible two-letter code; null marks an unused entry
      private val arr = new Array[B](26 * 26)
      // Maps 'A' to 0, 'B' to 1, ..., 'Z' to 25
      private def f(c: Char) = c.toInt - 'A'.toInt
      private def isValidCode(code: String) =
        code.length == 2 &&
        0 <= f(code(0)) && f(code(0)) < 26 &&
        0 <= f(code(1)) && f(code(1)) < 26
      def index(code: String): Int = {
        require(isValidCode(code))
        f(code(0)) * 26 + f(code(1))
      }
      def apply(code: String): Option[B] = {
        val v = arr(index(code))
        if (v == null) None else Some(v)
      }
      def update(code: String, value: B) = {
        require(value != null)
        arr(index(code)) = value
      }
      def delete(code: String) = { arr(index(code)) = null }
    }

10 Note: use of nulls is generally discouraged in Scala (use Option instead) but excusable inside data structure implementations
- Extending the class with a constant-time size operation is easy
Example: Building a direct-access table mapping country codes to some capital city names.

    val capital = new CountryMap[String]()
    capital("DE") = "Berlin"
    capital("FI") = "Helsinki"
    capital("UK") = "London"
    capital("US") = "Washington"
    println(capital("FI"))

produces Some(Helsinki). The table arr then holds:

    Index  Code  Value
    82     DE    Berlin
    138    FI    Helsinki
    530    UK    London
    538    US    Washington

All other entries, from AA (index 0) to ZZ (index 675), are null.

11 Hashing and hash tables

12 Hashing and hash tables
- Extend the idea of direct-access tables to very large (or even infinite) key universes $U$
- At any given time, only a subset $K \subseteq U$ of the possible keys is used
- The main idea is to have a hash table of $m$ entries and then use a hash function $h : U \to \{0, 1, \ldots, m-1\}$ to map each key to an index in the table
- In the ideal case, each key maps to a different index...
- ...but in general this is difficult to obtain efficiently, and there will be collisions when two keys should be put in the same table index
- We will discuss the design of hash functions later

13 Example: Assume a hash table with $m = 13$ entries and a hash function for strings implemented in Scala 2.11.8 with

    def h(s: String) = math.abs(s.hashCode) % 13

- Many strings map to different indices and the idea works as is
[Figure: the keys Germany, Finland, United States, Denmark, United Kingdom, and Sweden map to distinct indices of arr, which hold Berlin, Helsinki, Washington, Copenhagen, London, and Stockholm]
- But some strings map to the same index, causing collisions
[Figure: the same table after adding the key Austria, which maps to the same index as Denmark]

14 How probable are collisions?
- Suppose a hash function $h$ under the simple uniform hashing assumption: any key is equally likely to hash to any value in $\{0, 1, \ldots, m-1\}$, independently of the other keys
- Under this assumption, the number of randomly drawn keys needed so that at least one collision is produced with probability $p$ is approximately $\sqrt{2m \ln\left(\frac{1}{1-p}\right)}$
- As an example, for a hash table with $m = 1\,000\,000$ entries we need to insert only 1178 random keys to produce at least one collision with probability 0.5
- As a consequence, collisions become rather likely quite soon
- A special case of this is called the birthday paradox: there are 365 days in a year, but if we have 23 randomly selected people in the same room, the probability that two of them have the same birthday is at least 0.5 (assuming that birthdays are evenly distributed over the year)
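The bound is easy to evaluate numerically. The following is a small sketch, assuming the approximation $\sqrt{2m \ln(1/(1-p))}$ for the number of random keys; the object and method names are made up for illustration.

```scala
object CollisionBound {
  // Approximate number of uniformly random keys after which at least one
  // collision has occurred with probability p in a table of m entries:
  // sqrt(2 * m * ln(1 / (1 - p))).
  def keysForCollision(m: Double, p: Double): Double = {
    require(m > 0 && 0 < p && p < 1, "need m > 0 and 0 < p < 1")
    math.sqrt(2.0 * m * math.log(1.0 / (1.0 - p)))
  }
}
```

Rounding up, `CollisionBound.keysForCollision(1e6, 0.5)` gives 1178 keys and `CollisionBound.keysForCollision(365, 0.5)` gives 23 people, matching the two figures quoted on the slide.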

15 Collision resolution

16 In order to handle the inevitable collisions, we need a collision resolution scheme
- In the following, we'll see chaining, and open addressing with some variants
- We need an additional definition: the load factor $\alpha$ of a hash table with $m$ entries when there are $n$ keys stored in it is $\alpha = n/m$

17 Chaining
- Also called separate chaining and open hashing
- The idea: each entry in the hash table starts a linked list, and the key/value pairs are stored in the list
- After finding the index $h(k)$, and thus the correct list for a key $k$, the rest is as with mutable linked lists:
  - search: traverse the list to see if the key is stored in an entry
  - insertion: traverse the list and insert a new entry at the end of the list if the key was not found
  - deletion: traverse the list and remove the entry if the key was found
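The three operations above can be sketched as follows. This is a simplified illustration and not the course's implementation: it keeps each chain as an immutable Scala `List` stored in an array slot instead of a hand-written mutable linked list, it never rehashes, and the class name `ChainedMap` and its method names are hypothetical.

```scala
// A small hash map with separate chaining (illustrative sketch only).
class ChainedMap[K, V](m: Int = 13) {
  require(m > 0, "the table needs at least one entry")

  // table(i) holds the chain of key/value pairs whose keys hash to index i.
  private val table = Array.fill[List[(K, V)]](m)(Nil)
  private var n = 0 // number of stored keys

  // floorMod keeps the index in 0 until m even for negative hash codes.
  private def index(key: K): Int = Math.floorMod(key.##, m)

  // search: traverse the chain and return the value if the key is found.
  def get(key: K): Option[V] =
    table(index(key)).collectFirst { case (k, v) if k == key => v }

  // insertion: replace the value if the key exists, else append to the chain.
  def put(key: K, value: V): Unit = {
    val i = index(key)
    val chain = table(i)
    if (chain.exists(_._1 == key))
      table(i) = chain.map { case (k, v) => if (k == key) (k, value) else (k, v) }
    else {
      table(i) = chain :+ (key -> value)
      n += 1
    }
  }

  // deletion: drop the entry from its chain if the key is found.
  def remove(key: K): Unit = {
    val i = index(key)
    val kept = table(i).filterNot(_._1 == key)
    if (kept.length != table(i).length) n -= 1
    table(i) = kept
  }

  def size: Int = n
  def loadFactor: Double = n.toDouble / m
}
```

Every operation costs time proportional to the length of one chain, which is why the load factor governs the average running time.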

18 Example: Collision handling with chaining
- The entries in the linked lists are drawn as key | value | next entry, where
  - key is the key (or actually a reference to it in the case of a non-primitive type),
  - value is the value (or a reference to it), and
  - next entry is a reference to the next entry in the list (null if last)
[Figure: the hash table arr with 13 entries; each occupied entry starts a list of key/value entries (Germany → Berlin, Finland → Helsinki, United States → Washington, United Kingdom → London, Sweden → Stockholm), and the colliding keys Denmark → Copenhagen and Austria → Vienna share one list]

19 Example: Implementing sets with hashing and chaining
- If we are not interested in maps but in sets, we just drop the value field and use entries of the form key | next entry in the lists.
- Inserting the numbers 131, 9833, 344, 6, 17, 434, 653 and -13 in a hash set of integers with the hash function $h(k) = k \bmod m$ produces the hash table below.
[Figure: the resulting hash table arr with its chains]

20 Unordered sets in C++11
- The GNU ISO C++ library implements open hashing
- Hash functions and key equality comparators are already implemented for basic types

    #include <iostream>
    #include <unordered_set>

    int main() {
      // A set of small prime numbers
      std::unordered_set<int> myset = {3, 5, 7, 11, 13, 17, 19, 23, 29};
      myset.erase(13);             // erasing by key
      myset.erase(myset.begin());  // erasing by iterator
      std::cout << "myset contains:";
      for (const int& x : myset) std::cout << " " << x;
      std::cout << std::endl;
      return 0;
    }

- The output starts with "myset contains:" followed by the remaining elements; note that the elements are not ordered
- Similarly for unordered multisets, maps, and multimaps

21 Analysis
- Suppose that we have inserted $n$ keys in a hash table with $m$ entries
- Under the simple uniform hashing assumption, the expected number of keys in the linked list at an index $i$ is $n/m$
- Assume that computing the hash value $h(k)$ for each key takes $O(1)$ time
- The cost of searching for a key in the hash table is $O(1 + n/m)$, i.e. $O(1 + \alpha)$, on average, both when the key is not found and when it is found in the table
- When $n$ is proportional to $m$, i.e. $n \le cm$ for some constant $c$, implying $\alpha \le c$, searching takes time $O(1 + cm/m) = O(1)$ on average
- If insertion and deletion operations first search whether the key is in the table, then they also take time $O(1)$ on average if $n \le cm$ for some constant $c$

22 The worst-case behaviour occurs when all the keys hash to the same index:
- the hash table effectively reduces to a linked list, and
- inserting, searching, and deleting keys require $\Theta(n)$ time
⇒ good design of hash functions is important

23 Rehashing
- How large a hash table should we allocate in the beginning if we do not know how many keys will be seen in the future?
- Or, what should we do when the load factor grows too large?
- The answer is rehashing: start with a smallish hash table and then grow its size when the load factor rises above a certain threshold
- When the hash table is resized, all the keys (or key/value pairs) must be reinserted into it, as their indices in the table have probably changed (hence the term "rehashing")
- What is a good value of the load factor for triggering rehashing? There is no single best answer; the GNU ISO C++ library version 4.6 triggers rehashing when the load factor is 1.0 and doubles the number of entries

24 Open addressing
- Another collision resolution scheme
- Also (very confusingly) called closed hashing
- Idea: use an array large enough to hold all the inserted keys
- Each array index stores one key/value pair (or simply the key if we are implementing a set instead of a map), or is null when it is free
- Thus the load factor is at most 1.0 in open addressing
- When a collision occurs during an insert, probe another index in the table until a free one is found
- Similarly, when searching for a key, one probes indices until the key is found or an empty slot is encountered

25 For probing, we need to define which index is tried next
- To do this, we define a hash function $h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$, where the second argument is the probe number
- For each key $k$, the probe sequence is thus $h(k,0), h(k,1), \ldots, h(k,m-1)$
- To ensure that every index in the table is probed at some point, the probe sequence must be a permutation of $\{0, 1, \ldots, m-1\}$ for every key $k$

26 Linear probing
- The simplest way to define probe sequences
- Given an auxiliary hash function $h' : U \to \{0, 1, \ldots, m-1\}$
  - $h'$ is usually the hash function provided by the class or the user (e.g., hashCode in Scala)
- Define the hash function $h(k,i) = (h'(k) + i) \bmod m$
- Clearly, the probe sequence $h(k,0), h(k,1), \ldots, h(k,m-1)$ is a permutation of $\{0, 1, \ldots, m-1\}$
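As a quick check of the definition, the sketch below lists the linear probe sequence for a given auxiliary hash value. The object and method names are illustrative assumptions; the auxiliary value is first reduced modulo m so that any `Int`, such as a raw hashCode, can be passed in.

```scala
object LinearProbing {
  // Probe sequence h(k,0), ..., h(k,m-1) with h(k,i) = (h'(k) + i) mod m,
  // where hPrime stands for the auxiliary hash value h'(k).
  def probeSequence(hPrime: Int, m: Int): Seq[Int] = {
    require(m > 0, "the table needs at least one entry")
    val start = Math.floorMod(hPrime, m) // start index in 0 until m
    (0 until m).map(i => (start + i) % m)
  }
}
```

For example, `LinearProbing.probeSequence(9, 13)` starts 9, 10, 11, 12, 0, 1 and visits every index exactly once.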

27 Example: Collision resolution with linear probing
- The entries in the table are drawn as key | value, where again key is the key (or actually a reference to it in the case of a non-primitive type), and value is the value (or a reference to it).
- Inserting the mappings Finland → Helsinki, United States → Washington, United Kingdom → London, Denmark → Copenhagen, Austria → Vienna, and Sweden → Stockholm in a table of size 13 in this order by using our earlier hash function and linear probing gives the hash table shown in the figure
- When inserting the mapping Austria → Vienna, we observe that index 8 is already taken by Denmark, probe the next one, see that it is free, and insert the key/value pair there
[Figure: the hash table arr holding Germany → Berlin, Finland → Helsinki, United States → Washington, Denmark → Copenhagen, Austria → Vienna, United Kingdom → London, and Sweden → Stockholm]

28 Example: Sets with hashing and linear probing
- When implementing sets instead of maps, the table entries consist of the key values (or references to them) only.
- For instance, inserting the numbers 131, 9833, 344, 6, 17, 434, 653 and -13 in a hash table of integers with $m = 13$ by using the auxiliary hash function $h'(k) = k$:
  1. 131 is inserted at $h(131, 0)$
  2. 9833 is inserted at $h(9833, 0)$
  3. 344 is inserted at $h(344, 0)$
  4. 6 is inserted at $h(6, 0) = (6 + 0) \bmod 13$
  5. 17 is inserted at $h(17, 0) = (17 + 0) \bmod 13 = 4$
  6. as $h(434, 0) = 9$ is occupied, the next index $h(434, 1) = 10$ is probed, found free, and 434 is inserted there
  7. as $h(653, 0) = 3$ is occupied, the next indices $h(653, 1) = 4$ and $h(653, 2) = 5$ are probed and found occupied, and finally 653 is inserted at $h(653, 3) = 6$
  8. as $h(-13, 0) = 0$ is occupied, -13 is inserted at the next free index $h(-13, 1) = 1$
[Figure: the resulting hash table arr]

29 In the following examples, for representational simplicity, we only consider hash sets for integers
- generalization to maps over arbitrary types is straightforward
- Linear probing suffers from the problem of primary clustering:
  - an empty slot preceded by $i$ occupied slots gets filled with probability $(i + 1)/m$ instead of $1/m$, and thus occupied slots start to cluster
Example: Suppose that $h'(k) = k$ for integer-valued keys and $m = 17$. Inserting the values 1, 50,, 0, 38, 35 in this order with linear probing produces the hash table in the figure, where the arrows show the probe sequence for the key 35. Note that only 1, 50, and 35 of the above keys hash to the same value.
[Figure: the resulting hash table, with arrows showing the probe sequence for the key 35]

30 Quadratic probing
- Eliminates primary clustering with more complex probing sequences
- Define $h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$, where $c_1$ and $c_2$ are some positive constants
- To make the probe sequence a permutation of $\{0, 1, \ldots, m-1\}$, the values $c_1$, $c_2$, and $m$ must be constrained
Example: Assume that $m = 11$ (i.e., a prime) and $h(k,i) = (h'(k) + i + i^2) \bmod m$. If $h'(k) = 0$, then the probe sequence is 0, 2, 6, 1, 9, 8, 9, 1, 6, 2, 0. This is not a permutation of $\{0, 1, \ldots, 10\}$ but probes only 6 of the 11 possible slots. Thus, if the load factor is high, this probe sequence may not find any free slot.
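The failure of the m = 11 example to cover the table can be reproduced with a few lines of code. The sketch below simply enumerates $h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$; the object, method, and parameter names are chosen for illustration only.

```scala
object QuadraticProbing {
  // Probe sequence for h(k,i) = (h'(k) + c1*i + c2*i^2) mod m,
  // where hPrime stands for the auxiliary hash value h'(k).
  def probeSequence(hPrime: Int, c1: Int, c2: Int, m: Int): Seq[Int] = {
    require(m > 0, "the table needs at least one entry")
    // Long arithmetic avoids overflow before the reduction modulo m.
    (0 until m).map { i =>
      Math.floorMod(hPrime.toLong + c1.toLong * i + c2.toLong * i * i, m.toLong).toInt
    }
  }
}
```

With `QuadraticProbing.probeSequence(0, 1, 1, 11)` the result is 0, 2, 6, 1, 9, 8, 9, 1, 6, 2, 0, which contains only 6 distinct slots.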

31 Example: Assume that $m$ is a power of two. In this case, it can be proven that $h(k,i) = (h'(k) + \frac{i(i+1)}{2}) \bmod m = (h'(k) + 0.5i + 0.5i^2) \bmod m$ produces probe sequences that are permutations of $\{0, 1, \ldots, m-1\}$. For instance, if $m = 2^3 = 8$ and $h'(k) = 0$, then the probe sequence is 0, 1, 3, 6, 2, 7, 5, 4.
Example: Suppose that $h'(k) = k$ for integer-valued keys and $m = 16$. Insert the values 1, 50,, 0, 38, 35 in this order with the quadratic probing hash function $h(k,i) = (h'(k) + \frac{i(i+1)}{2}) \bmod m$ into the hash table (again, arrows show the probe sequences).
[Figure: the hash table after each insertion, with arrows showing the probe sequences]
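The permutation property for power-of-two table sizes can be checked empirically for small cases. The following sketch enumerates $h(k,i) = (h'(k) + i(i+1)/2) \bmod m$; the names are illustrative, and the check is a sanity test rather than a proof.

```scala
object TriangularProbing {
  // Probe sequence for h(k,i) = (h'(k) + i(i+1)/2) mod m,
  // where hPrime stands for the auxiliary hash value h'(k).
  def probeSequence(hPrime: Int, m: Int): Seq[Int] = {
    require(m > 0, "the table needs at least one entry")
    (0 until m).map { i =>
      Math.floorMod(hPrime.toLong + i.toLong * (i + 1) / 2, m.toLong).toInt
    }
  }

  // True if the probe sequence visits every index in 0 until m exactly once.
  def isPermutation(hPrime: Int, m: Int): Boolean =
    probeSequence(hPrime, m).sorted == (0 until m)
}
```

`TriangularProbing.probeSequence(0, 8)` yields 0, 1, 3, 6, 2, 7, 5, 4 as on the slide, while for a table size that is not a power of two, such as m = 11, `isPermutation` returns false.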

38 Quadratic probing does not suffer from primary clustering
- But two keys $k_1$ and $k_2$ with the same hash value $h'(k_1) = h'(k_2)$ have the same probe sequence
- This (less severe) problem with quadratic probing is called secondary clustering
- Another shortcoming of linear and quadratic probing is the following:
  - For hash tables of size $m$, there are $m!$ possible probe sequences
  - But for any auxiliary hash function $h'$, both linear and quadratic probing (with fixed constants) use at most $m$ of those

39 Double hashing
- Produces more probe sequences than linear and quadratic probing by using another hash function for generating the probe sequence steps
- The general form is $h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$
- Double hashing can produce $m^2$ different probe sequences
- Again, to obtain permutation probe sequences, we must constrain the function $h_2$
- This can be done by forcing the value $h_2(k)$ to be relatively prime to the hash table size $m$

40 Example: One convenient way to make the probe sequence $h(k,0), h(k,1), \ldots, h(k,m-1)$ a permutation of $\{0, 1, \ldots, m-1\}$ is to require that $m$ is a power of two and $h_2(k)$ is an odd number.
Example: Another way to make the probe sequence $h(k,0), h(k,1), \ldots, h(k,m-1)$ a permutation of $\{0, 1, \ldots, m-1\}$ is to require that $m$ is a prime number and $h_2(k)$ is a positive number less than $m$. For instance, for integer keys we could choose $h_1(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod m')$ for some $m'$ slightly less than $m$.
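The second construction can be tried out with the sketch below, which uses $h_1(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod m')$. The concrete values m = 13 and m' = 11 in the usage note are assumptions picked for illustration, as are the object and parameter names.

```scala
object DoubleHashing {
  // Probe sequence for h(k,i) = (h1(k) + i * h2(k)) mod m with
  // h1(k) = k mod m and h2(k) = 1 + (k mod mPrime), where 0 < mPrime < m.
  def probeSequence(k: Int, m: Int, mPrime: Int): Seq[Int] = {
    require(0 < mPrime && mPrime < m, "need 0 < mPrime < m")
    val h1 = Math.floorMod(k, m)
    val h2 = 1 + Math.floorMod(k, mPrime) // in 1 .. mPrime, so 0 < h2 < m
    (0 until m).map(i => ((h1.toLong + i.toLong * h2) % m).toInt)
  }
}
```

For k = 14, m = 13, and m' = 11 we get $h_1 = 1$ and $h_2 = 4$, so `DoubleHashing.probeSequence(14, 13, 11)` starts 1, 5, 9, 0 and, because 13 is prime, covers all 13 indices.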

41 Removing keys
- Removing keys when using chaining was straightforward
- With open addressing, we must ensure that removing does not leave holes in the probe sequences of other keys
Example: Suppose that $h'(k) = k$ for integer-valued keys, $m = 16$, and we use quadratic probing with the hash function $h(k,i) = (h'(k) + \frac{i(i+1)}{2}) \bmod m$ to insert the keys 3, 20 and 35 in a hash table.
[Figure: the resulting table, with 3 in slot 3, 20 in slot 4, and 35 in slot 6]
If we now delete the key 20 by simply removing it, slot 4 becomes free.
[Figure: the table with slot 4 free]
But in this hash table the search operation does not find the key 35 anymore: it probes the slots 3 and 4 only, concluding that the key is not in the set when it sees the free slot at index 4.

42 One solution is to replace the deleted key with some special "tombstone" key value del in the hash table
- Search and deletion consider these values to be real keys, but insertion can overwrite them
- If the table has too many such tombstones, one should probably "garbage collect" them by rehashing all the values in the same table so that search operations do not perform unnecessary work
Example: Consider again the hash table of the previous example. Delete the key 20 by putting the tombstone value del in its place.
[Figure: the table with 3 in slot 3, del in slot 4, and 35 in slot 6]
Now the search operation can work unmodified and finds the key 35 after probing the slots 3, 4 and 6. If we now insert the key 19, we probe the slots 3 and 4 and then insert the key in slot 4.
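The tombstone scheme can be sketched as an open-addressing set of integers. The code below is a simplified illustration, not the course's implementation: the capacity is fixed (no rehashing), $h'(k) = k$, probing uses $h(k,i) = (h'(k) + i(i+1)/2) \bmod m$ with m a power of two, and the class name `ProbingIntSet` and the helper `slotOf` are hypothetical.

```scala
// Integer hash set with open addressing and tombstones (illustrative sketch).
class ProbingIntSet(m: Int = 16) {
  require(m > 0 && (m & (m - 1)) == 0, "m must be a power of two")

  // Slot states: free, holding a key, or deleted (tombstone).
  private val Free = 0
  private val Full = 1
  private val Deleted = 2
  private val state = new Array[Int](m) // all slots start as Free
  private val keys = new Array[Int](m)

  // h(k,i) = (k + i(i+1)/2) mod m
  private def probe(k: Int, i: Int): Int =
    Math.floorMod(k.toLong + i.toLong * (i + 1) / 2, m.toLong).toInt

  // Index of the slot holding k, or -1 if k is absent.
  // A free slot ends the search; a tombstone does not.
  private def find(k: Int): Int = {
    var i = 0
    while (i < m) {
      val j = probe(k, i)
      if (state(j) == Free) return -1
      if (state(j) == Full && keys(j) == k) return j
      i += 1
    }
    -1
  }

  def contains(k: Int): Boolean = find(k) >= 0
  def slotOf(k: Int): Int = find(k)

  // Insertion may reuse the first free slot or tombstone on the probe sequence.
  def insert(k: Int): Boolean = {
    if (contains(k)) return false
    var i = 0
    while (i < m) {
      val j = probe(k, i)
      if (state(j) != Full) {
        state(j) = Full
        keys(j) = k
        return true
      }
      i += 1
    }
    throw new IllegalStateException("the table is full")
  }

  // Deletion leaves a tombstone so that later searches keep probing.
  def delete(k: Int): Boolean = {
    val j = find(k)
    if (j < 0) false else { state(j) = Deleted; true }
  }
}
```

Replaying the slide's example: after inserting 3, 20, and 35 the key 35 sits in slot 6; after `delete(20)` the call `contains(35)` still succeeds, and a subsequent `insert(19)` reuses the tombstone in slot 4.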

43 Analysis
- For the analysis, assume uniform hashing: the probe sequence of each key is equally likely to be any of the $m!$ permutations of $\{0, 1, \ldots, m-1\}$
- Of the probing methods presented, double hashing is closest to this requirement
Theorem (11.6 in Introduction to Algorithms, 3rd ed. (online via Aalto lib)): Under the uniform hashing assumption, the expected number of probes in an unsuccessful search is at most $1/(1 - \alpha)$.
- Recall that the load factor $\alpha$ is less than 1 in a non-full hash table in open addressing, and thus $1/(1 - \alpha) = 1 + \alpha + \alpha^2 + \alpha^3 + \ldots$
- Informal intuitive explanation:
  - the term 1 comes from the fact that at least one probe is done
  - the first probed slot was occupied with probability $\alpha$, and thus a second probe is done with probability $\alpha$
  - the first and second probed slots were both occupied with probability $\alpha^2$, and thus a third probe is done with probability $\alpha^2$
  - ...
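To get a feeling for the bound, the sketch below evaluates $1/(1-\alpha)$ and compares it with a partial sum of the geometric series $1 + \alpha + \alpha^2 + \ldots$. The object and method names are illustrative assumptions.

```scala
object ProbeBound {
  // Upper bound 1 / (1 - alpha) on the expected number of probes in an
  // unsuccessful search under the uniform hashing assumption.
  def unsuccessful(alpha: Double): Double = {
    require(0 <= alpha && alpha < 1, "the load factor must be in [0, 1)")
    1.0 / (1.0 - alpha)
  }

  // Partial sum 1 + alpha + ... + alpha^(terms - 1) of the geometric series.
  def partialSum(alpha: Double, terms: Int): Double =
    (0 until terms).map(i => math.pow(alpha, i)).sum
}
```

At load factors 0.5, 0.75, and 0.9 the bound is 2, 4, and (up to rounding) 10 probes, which shows how quickly unsuccessful searches slow down as the table fills.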

44 Therefore, if we keep the load factor below some constant, then insertion, search, and deletion take constant time on average
- For instance, if we keep the load factor at or below 0.5, then the expected number of probes is at most 2
- Note: the deleted values (tombstones) in the table count toward the load factor in this case!
- If deletions are known to occur often, then separate chaining is probably a better collision resolution approach
- Again, when an open addressing hash table becomes too full, we perform rehashing: grow the size of the table and reinsert the keys into this new, larger table
- As in the case of dynamically grown arrays, the size of the hash table is usually approximately doubled
- What is a good value of the load factor for triggering rehashing? There is no single best answer, but usually one uses something like 0.50 or 0.75 for open addressing

45 Building hash functions

46 In simple uniform hashing, each key should be equally likely to hash to any of the $m$ values, independently of where any other key has hashed to
- But we do not necessarily know the distribution of the inserted keys, nor are they necessarily drawn independently
- A good approach computes hash values in a way that we expect to be independent of the patterns that appear in the keys
- As an example, in a compiler's symbol table construction we do not want the common strings i and j to hash to neighbouring values, especially if we use open addressing and linear probing
- As a rough guide: the more "random" the hash value looks, the better
- For map/set hash table use, the hash value should also be very efficient to compute

47 Hashing integers
- We first show some ways to produce hash functions for integer keys
- We have already seen the division method, in which the hash value is the remainder when dividing the key by the hash table size $m$: $h(k) = k \bmod m$
- This is fast to compute, as only one division operation is required
- But it may be a bad choice if $m$ is a power of two, $m = 2^p$: only the $p$ least significant bits of the key influence the hash value
- If possible, $m$ should rather be a prime number not too close to a power of two

48 The multiplication method produces a hash value for a $w$-bit integer key $k$ by
- multiplying it with another, well chosen constant $w$-bit integer $A$ to obtain a $2w$-bit integer $r = r_{2w-1} r_{2w-2} \ldots r_0 = kA$, and
- taking the hash value from the most significant bits of the least significant half $r_{w-1} \ldots r_0$
- Here $m$ is usually a power of two, $m = 2^p$, so that one can simply take the $p$ most significant bits of $r_{w-1} \ldots r_0$ by shifting and masking
- For instance, consider the Scala function

    def h(x: Int): Int = (x * 2654435769L).toInt

- Now

    h(1).toHexString = 9e3779b9
    h(2).toHexString = 3c6ef372
    h(3).toHexString = daa66d2b

  and so on, looking quite "random"
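The shifting step mentioned above can be sketched as follows for $w = 32$. The multiplier 2654435769 (0x9E3779B9 in hexadecimal) is the constant consistent with the hexadecimal outputs quoted on the slide; the `index` helper, which keeps the $p$ most significant bits, and the object name are assumptions added for illustration.

```scala
object MultiplicativeHash {
  // 2654435769 = 0x9E3779B9, a commonly used multiplier for 32-bit keys.
  private val A = 2654435769L

  // Low 32 bits of the 64-bit product x * A.
  def h(x: Int): Int = (x * A).toInt

  // Index in a table of size 2^p: the p most significant bits of h(x).
  def index(x: Int, p: Int): Int = {
    require(0 < p && p < 32, "need 0 < p < 32")
    h(x) >>> (32 - p)
  }
}
```

For a table with $2^4 = 16$ entries, `MultiplicativeHash.index(1, 4)` is 9, the top hexadecimal digit of 9e3779b9.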

49 Hashing strings
- To translate a string $s = c_0 \ldots c_{n-1}$ into an integer, the characters $c_i$ in the string can be processed one by one
- E.g., the current Java implementations use the hash function $h(s) = \sum_{i=0}^{n-1} c_i \cdot 31^{n-1-i}$
- From OpenJDK Java 8 (with software caching of the hash value):

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];
        /** Cache the hash code for the string */
        private int hash; // Default to 0
        ...
        public int hashCode() {
            int h = hash;
            if (h == 0 && value.length > 0) {
                char val[] = value;
                for (int i = 0; i < value.length; i++) {
                    h = 31 * h + val[i];
                }
                hash = h;
            }
            return h;
        }
    }
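The polynomial can be evaluated without computing powers of 31 by using Horner's rule, which is what the Java loop does. The sketch below restates it in Scala; the object and method names are illustrative, and the arithmetic wraps around in 32 bits exactly as in Java.

```scala
object StringHash {
  // h(s) = c_0 * 31^(n-1) + c_1 * 31^(n-2) + ... + c_(n-1),
  // evaluated with Horner's rule in wrapping 32-bit arithmetic.
  def polyHash(s: String): Int =
    s.foldLeft(0)((h, c) => 31 * h + c)
}
```

On the JVM, `StringHash.polyHash(s)` agrees with `s.hashCode`; for example, `polyHash("ab")` is $97 \cdot 31 + 98 = 3105$.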

50 Hashing compound objects
- Computing hash values for compound objects (objects with fields, arrays) can be done by combining the (hash) values of the components
- One of the simplest examples is computing the hash code for integer arrays in OpenJDK Java 6:

    public static int hashCode(int a[]) {
        if (a == null)
            return 0;
        int result = 1;
        for (int element : a)
            result = 31 * result + element;
        return result;
    }

51 Currently Scala uses a somewhat more complex hash function based on MurmurHash3
- It tries to produce good hash values (few collisions) and be fast
- From the Scala source code:

    final def arrayHash[@specialized T](a: Array[T], seed: Int): Int = {
      var h = seed
      var i = 0
      while (i < a.length) {
        h = mix(h, a(i).##)  // ## is hashCode
        i += 1
      }
      finalizeHash(h, a.length)
    }

    final def mix(hash: Int, data: Int): Int = {
      var h = mixLast(hash, data)
      h = rotl(h, 13)  // rotl is Integer.rotateLeft
      h * 5 + 0xe6546b64
    }

52 From the Scala source code (continued):

    final def mixLast(hash: Int, data: Int): Int = {
      var k = data
      k *= 0xcc9e2d51
      k = rotl(k, 15)
      k *= 0x1b873593
      hash ^ k
    }

    /** Finalize a hash to incorporate the length and make sure all bits avalanche. */
    final def finalizeHash(hash: Int, length: Int): Int = avalanche(hash ^ length)

    /** Force all bits of the hash to avalanche. Used for finalizing the hash. */
    private final def avalanche(hash: Int): Int = {
      var h = hash
      h ^= h >>> 16
      h *= 0x85ebca6b
      h ^= h >>> 13
      h *= 0xc2b2ae35
      h ^= h >>> 16
      h
    }

53 More
- Naturally, a lot of hash functions for different purposes have been proposed
- See, e.g., the following links: libsupc%b%b/hash_bytes.cc?view=markup#l74


More information

Hashing, Hash Functions. Lecture 7

Hashing, Hash Functions. Lecture 7 Hashing, Hash Functions Lecture 7 Symbol-table problem Symbol table T holding n records: x record key[x] Other fields containing satellite data Operations on T: INSERT(T, x) DELETE(T, x) SEARCH(T, k) How

More information

CSCB63 Winter Week10 - Lecture 2 - Hashing. Anna Bretscher. March 21, / 30

CSCB63 Winter Week10 - Lecture 2 - Hashing. Anna Bretscher. March 21, / 30 CSCB63 Winter 2019 Week10 - Lecture 2 - Hashing Anna Bretscher March 21, 2019 1 / 30 Today Hashing Open Addressing Hash functions Universal Hashing 2 / 30 Open Addressing Open Addressing. Each entry in

More information

Searching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9

Searching. Constant time access. Hash function. Use an array? Better hash function? Hash function 4/18/2013. Chapter 9 Constant time access Searching Chapter 9 Linear search Θ(n) OK Binary search Θ(log n) Better Can we achieve Θ(1) search time? CPTR 318 1 2 Use an array? Use random access on a key such as a string? Hash

More information

Hash tables. Hash tables

Hash tables. Hash tables Dictionary Definition A dictionary is a data-structure that stores a set of elements where each element has a unique key, and supports the following operations: Search(S, k) Return the element whose key

More information

Hash tables. Hash tables

Hash tables. Hash tables Dictionary Definition A dictionary is a data-structure that stores a set of elements where each element has a unique key, and supports the following operations: Search(S, k) Return the element whose key

More information

Searching, mainly via Hash tables

Searching, mainly via Hash tables Data structures and algorithms Part 11 Searching, mainly via Hash tables Petr Felkel 26.1.2007 Topics Searching Hashing Hash function Resolving collisions Hashing with chaining Open addressing Linear Probing

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn Dictionary problem Dictionary T holding n records: x records key[x] Other fields containing satellite data Operations on T: INSERT(T, x)

More information

1 Maintaining a Dictionary

1 Maintaining a Dictionary 15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition

More information

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Advanced Implementations of Tables: Balanced Search Trees and Hashing Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the

More information

Introduction to Hash Tables

Introduction to Hash Tables Introduction to Hash Tables Hash Functions A hash table represents a simple but efficient way of storing, finding, and removing elements. In general, a hash table is represented by an array of cells. In

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments

More information

Hashing. Why Hashing? Applications of Hashing

Hashing. Why Hashing? Applications of Hashing 12 Hashing Why Hashing? Hashing A Search algorithm is fast enough if its time performance is O(log 2 n) For 1 5 elements, it requires approx 17 operations But, such speed may not be applicable in real-world

More information

So far we have implemented the search for a key by carefully choosing split-elements.

So far we have implemented the search for a key by carefully choosing split-elements. 7.7 Hashing Dictionary: S. insert(x): Insert an element x. S. delete(x): Delete the element pointed to by x. S. search(k): Return a pointer to an element e with key[e] = k in S if it exists; otherwise

More information

Analysis of Algorithms I: Perfect Hashing

Analysis of Algorithms I: Perfect Hashing Analysis of Algorithms I: Perfect Hashing Xi Chen Columbia University Goal: Let U = {0, 1,..., p 1} be a huge universe set. Given a static subset V U of n keys (here static means we will never change the

More information

A Lecture on Hashing. Aram-Alexandre Pooladian, Alexander Iannantuono March 22, Hashing. Direct Addressing. Operations - Simple

A Lecture on Hashing. Aram-Alexandre Pooladian, Alexander Iannantuono March 22, Hashing. Direct Addressing. Operations - Simple A Lecture on Hashing Aram-Alexandre Pooladian, Alexander Iannantuono March 22, 217 This is the scribing of a lecture given by Luc Devroye on the 17th of March 217 for Honours Algorithms and Data Structures

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based

More information

Hashing. Martin Babka. January 12, 2011

Hashing. Martin Babka. January 12, 2011 Hashing Martin Babka January 12, 2011 Hashing Hashing, Universal hashing, Perfect hashing Input data is uniformly distributed. A dynamic set is stored. Universal hashing Randomised algorithm uniform choice

More information

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is: CS 24 Section #8 Hashing, Skip Lists 3/20/7 Probability Review Expectation (weighted average): the expectation of a random quantity X is: x= x P (X = x) For each value x that X can take on, we look at

More information

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Collision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST Collision Kuan-Yu Chen ( 陳冠宇 ) 2018/12/17 @ TR-212, NTUST Review Hash table is a data structure in which keys are mapped to array positions by a hash function When two or more keys map to the same memory

More information

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 8th, 2017 Universal hash family Notation: Universe U = {0,..., u 1}, index space M = {0,..., m 1},

More information

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Algorithms lecture notes 1. Hashing, and Universal Hash functions Algorithms lecture notes 1 Hashing, and Universal Hash functions Algorithms lecture notes 2 Can we maintain a dictionary with O(1) per operation? Not in the deterministic sense. But in expectation, yes.

More information

CSE 502 Class 11 Part 2

CSE 502 Class 11 Part 2 CSE 502 Class 11 Part 2 Jeremy Buhler Steve Cole February 17 2015 Today: analysis of hashing 1 Constraints of Double Hashing How does using OA w/double hashing constrain our hash function design? Need

More information

Cache-Oblivious Hashing

Cache-Oblivious Hashing Cache-Oblivious Hashing Zhewei Wei Hong Kong University of Science & Technology Joint work with Rasmus Pagh, Ke Yi and Qin Zhang Dictionary Problem Store a subset S of the Universe U. Lookup: Does x belong

More information

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002)

COMP251: Hashing. Jérôme Waldispühl School of Computer Science McGill University. Based on (Cormen et al., 2002) COMP251: Hashing Jérôme Waldispühl School of Computer Science McGill University Based on (Cormen et al., 2002) Table S with n records x: Problem DefiniNon X Key[x] InformaNon or data associated with x

More information

? 11.5 Perfect hashing. Exercises

? 11.5 Perfect hashing. Exercises 11.5 Perfect hashing 77 Exercises 11.4-1 Consider inserting the keys 10; ; 31; 4; 15; 8; 17; 88; 59 into a hash table of length m 11 using open addressing with the auxiliary hash function h 0.k/ k. Illustrate

More information

Hashing. Dictionaries Hashing with chaining Hash functions Linear Probing

Hashing. Dictionaries Hashing with chaining Hash functions Linear Probing Hashing Dictionaries Hashing with chaining Hash functions Linear Probing Hashing Dictionaries Hashing with chaining Hash functions Linear Probing Dictionaries Dictionary: Maintain a dynamic set S. Every

More information

Hash Tables. Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a

Hash Tables. Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a Hash Tables Given a set of possible keys U, such that U = u and a table of m entries, a Hash function h is a mapping from U to M = {1,..., m}. A collision occurs when two hashed elements have h(x) =h(y).

More information

Hashing. Data organization in main memory or disk

Hashing. Data organization in main memory or disk Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison

More information

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 32

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 32 CS 473: Algorithms Ruta Mehta University of Illinois, Urbana-Champaign Spring 2018 Ruta (UIUC) CS473 1 Spring 2018 1 / 32 CS 473: Algorithms, Spring 2018 Universal Hashing Lecture 10 Feb 15, 2018 Most

More information

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

Abstract Data Type (ADT) maintains a set of items, each with a key, subject to Lecture Overview Dictionaries and Python Motivation Hash functions Chaining Simple uniform hashing Good hash functions Readings CLRS Chapter,, 3 Dictionary Problem Abstract Data Type (ADT) maintains a

More information

Module 1: Analyzing the Efficiency of Algorithms

Module 1: Analyzing the Efficiency of Algorithms Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu What is an Algorithm?

More information

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261

More information

Cuckoo Hashing and Cuckoo Filters

Cuckoo Hashing and Cuckoo Filters Cuckoo Hashing and Cuckoo Filters Noah Fleming May 7, 208 Preliminaries A dictionary is an abstract data type that stores a collection of elements, located by their key. It supports operations: insert,

More information

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing)

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing) CS5314 Randomized Algorithms Lecture 15: Balls, Bins, Random Graphs (Hashing) 1 Objectives Study various hashing schemes Apply balls-and-bins model to analyze their performances 2 Chain Hashing Suppose

More information

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM

Hashing. Hashing DESIGN & ANALYSIS OF ALGORITHM Hashing Hashing Start with an array that holds the hash table. Use a hash function to take a key and map it to some index in the array. If the desired record is in the location given by the index, then

More information

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161 Discrete Math at CSU (Rosen book) Sets and Functions (Rosen, Sections 2.1,2.2, 2.3) TOPICS Discrete math Set Definition Set Operations Tuples 1 n CS 160 or CS122 n Sets and Functions n Propositions and

More information

Array-based Hashtables

Array-based Hashtables Array-based Hashtables For simplicity, we will assume that we only insert numeric keys into the hashtable hash(x) = x % B; where B is the number of 5 Implementation class Hashtable { int [B]; bool occupied[b];

More information

4.5 Applications of Congruences

4.5 Applications of Congruences 4.5 Applications of Congruences 287 66. Find all solutions of the congruence x 2 16 (mod 105). [Hint: Find the solutions of this congruence modulo 3, modulo 5, and modulo 7, and then use the Chinese remainder

More information

1 Basic Combinatorics

1 Basic Combinatorics 1 Basic Combinatorics 1.1 Sets and sequences Sets. A set is an unordered collection of distinct objects. The objects are called elements of the set. We use braces to denote a set, for example, the set

More information

Problem Set 4 Solutions

Problem Set 4 Solutions Introduction to Algorithms October 8, 2001 Massachusetts Institute of Technology 6.046J/18.410J Singapore-MIT Alliance SMA5503 Professors Erik Demaine, Lee Wee Sun, and Charles E. Leiserson Handout 18

More information

Advanced Algorithm Design: Hashing and Applications to Compact Data Representation

Advanced Algorithm Design: Hashing and Applications to Compact Data Representation Advanced Algorithm Design: Hashing and Applications to Compact Data Representation Lectured by Prof. Moses Chariar Transcribed by John McSpedon Feb th, 20 Cucoo Hashing Recall from last lecture the dictionary

More information

6.1 Occupancy Problem

6.1 Occupancy Problem 15-859(M): Randomized Algorithms Lecturer: Anupam Gupta Topic: Occupancy Problems and Hashing Date: Sep 9 Scribe: Runting Shi 6.1 Occupancy Problem Bins and Balls Throw n balls into n bins at random. 1.

More information

A Repetition Test for Pseudo-Random Number Generators

A Repetition Test for Pseudo-Random Number Generators Monte Carlo Methods and Appl., Vol. 12, No. 5-6, pp. 385 393 (2006) c VSP 2006 A Repetition Test for Pseudo-Random Number Generators Manuel Gil, Gaston H. Gonnet, Wesley P. Petersen SAM, Mathematik, ETHZ,

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 14 October 16, 2013 CPSC 467, Lecture 14 1/45 Message Digest / Cryptographic Hash Functions Hash Function Constructions Extending

More information

Testing a Hash Function using Probability

Testing a Hash Function using Probability Testing a Hash Function using Probability Suppose you have a huge square turnip field with 1000 turnips growing in it. They are all perfectly evenly spaced in a regular pattern. Suppose also that the Germans

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 15 October 20, 2014 CPSC 467, Lecture 15 1/37 Common Hash Functions SHA-2 MD5 Birthday Attack on Hash Functions Constructing New

More information

Section Summary. Sequences. Recurrence Relations. Summations. Examples: Geometric Progression, Arithmetic Progression. Example: Fibonacci Sequence

Section Summary. Sequences. Recurrence Relations. Summations. Examples: Geometric Progression, Arithmetic Progression. Example: Fibonacci Sequence Section 2.4 1 Section Summary Sequences. Examples: Geometric Progression, Arithmetic Progression Recurrence Relations Example: Fibonacci Sequence Summations 2 Introduction Sequences are ordered lists of

More information

1. Write a program to calculate distance traveled by light

1. Write a program to calculate distance traveled by light G. H. R a i s o n i C o l l e g e O f E n g i n e e r i n g D i g d o h H i l l s, H i n g n a R o a d, N a g p u r D e p a r t m e n t O f C o m p u t e r S c i e n c e & E n g g P r a c t i c a l M a

More information

Hashing. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing. Philip Bille

Hashing. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing. Philip Bille Hashing Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing Philip Bille Hashing Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing

More information

Basics of hashing: k-independence and applications

Basics of hashing: k-independence and applications Basics of hashing: k-independence and applications Rasmus Pagh Supported by: 1 Agenda Load balancing using hashing! - Analysis using bounded independence! Implementation of small independence! Case studies:!

More information

Authentication. Chapter Message Authentication

Authentication. Chapter Message Authentication Chapter 5 Authentication 5.1 Message Authentication Suppose Bob receives a message addressed from Alice. How does Bob ensure that the message received is the same as the message sent by Alice? For example,

More information

Hashing. Hashing. Dictionaries. Dictionaries. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing

Hashing. Hashing. Dictionaries. Dictionaries. Dictionaries Chained Hashing Universal Hashing Static Dictionaries and Perfect Hashing Philip Bille Dictionaries Dictionary problem. Maintain a set S U = {,..., u-} supporting lookup(x): return true if x S and false otherwise. insert(x): set S = S {x} delete(x): set S = S - {x} Dictionaries

More information

CS1800: Mathematical Induction. Professor Kevin Gold

CS1800: Mathematical Induction. Professor Kevin Gold CS1800: Mathematical Induction Professor Kevin Gold Induction: Used to Prove Patterns Just Keep Going For an algorithm, we may want to prove that it just keeps working, no matter how big the input size

More information

1 ListElement l e = f i r s t ; / / s t a r t i n g p o i n t 2 while ( l e. next!= n u l l ) 3 { l e = l e. next ; / / next step 4 } Removal

1 ListElement l e = f i r s t ; / / s t a r t i n g p o i n t 2 while ( l e. next!= n u l l ) 3 { l e = l e. next ; / / next step 4 } Removal Präsenzstunden Today In the same room as in the first week Assignment 5 Felix Friedrich, Lars Widmer, Fabian Stutz TA lecture, Informatics II D-BAUG March 18, 2014 HIL E 15.2 15:00-18:00 Timon Gehr (arriving

More information

Cryptographic Hash Functions

Cryptographic Hash Functions Cryptographic Hash Functions Çetin Kaya Koç koc@ece.orst.edu Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report December 9, 2002 Version 1.5 1 1 Introduction

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) =

CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) = CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14 1 Probability First, recall a couple useful facts from last time about probability: Linearity of expectation: E(aX + by ) = ae(x)

More information

Lecture 8 HASHING!!!!!

Lecture 8 HASHING!!!!! Lecture 8 HASHING!!!!! Announcements HW3 due Friday! HW4 posted Friday! Q: Where can I see examples of proofs? Lecture Notes CLRS HW Solutions Office hours: lines are long L Solutions: We will be (more)

More information

Quiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b)

Quiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b) Introduction to Algorithms October 14, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Srini Devadas and Constantinos (Costis) Daskalakis Quiz 1 Solutions Quiz 1 Solutions Problem

More information

CSE 312, 2017 Winter, W.L.Ruzzo. 5. independence [ ]

CSE 312, 2017 Winter, W.L.Ruzzo. 5. independence [ ] CSE 312, 2017 Winter, W.L.Ruzzo 5. independence [ ] independence Defn: Two events E and F are independent if P(EF) = P(E) P(F) If P(F)>0, this is equivalent to: P(E F) = P(E) (proof below) Otherwise, they

More information

Chapter Summary. Sets The Language of Sets Set Operations Set Identities Functions Types of Functions Operations on Functions Computability

Chapter Summary. Sets The Language of Sets Set Operations Set Identities Functions Types of Functions Operations on Functions Computability Chapter 2 1 Chapter Summary Sets The Language of Sets Set Operations Set Identities Functions Types of Functions Operations on Functions Computability Sequences and Summations Types of Sequences Summation

More information

0 Sets and Induction. Sets

0 Sets and Induction. Sets 0 Sets and Induction Sets A set is an unordered collection of objects, called elements or members of the set. A set is said to contain its elements. We write a A to denote that a is an element of the set

More information

Data Structures and Algorithms Winter Semester

Data Structures and Algorithms Winter Semester Page 0 German University in Cairo December 26, 2015 Media Engineering and Technology Faculty Prof. Dr. Slim Abdennadher Dr. Wael Abouelsadaat Data Structures and Algorithms Winter Semester 2015-2016 Final

More information

1 Cryptographic hash functions

1 Cryptographic hash functions CSCI 5440: Cryptography Lecture 6 The Chinese University of Hong Kong 24 October 2012 1 Cryptographic hash functions Last time we saw a construction of message authentication codes (MACs) for fixed-length

More information

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from http://www.mmds.org Distance Measures For finding similar documents, we consider the Jaccard

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

CS246 Final Exam. March 16, :30AM - 11:30AM

CS246 Final Exam. March 16, :30AM - 11:30AM CS246 Final Exam March 16, 2016 8:30AM - 11:30AM Name : SUID : I acknowledge and accept the Stanford Honor Code. I have neither given nor received unpermitted help on this examination. (signed) Directions

More information

CS 580: Algorithm Design and Analysis

CS 580: Algorithm Design and Analysis CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Announcements: Homework 6 deadline extended to April 24 th at 11:59 PM Course Evaluation Survey: Live until 4/29/2018

More information

Sets are one of the basic building blocks for the types of objects considered in discrete mathematics.

Sets are one of the basic building blocks for the types of objects considered in discrete mathematics. Section 2.1 Introduction Sets are one of the basic building blocks for the types of objects considered in discrete mathematics. Important for counting. Programming languages have set operations. Set theory

More information

The set of integers will be denoted by Z = {, -3, -2, -1, 0, 1, 2, 3, 4, }

The set of integers will be denoted by Z = {, -3, -2, -1, 0, 1, 2, 3, 4, } Integers and Division 1 The Integers and Division This area of discrete mathematics belongs to the area of Number Theory. Some applications of the concepts in this section include generating pseudorandom

More information

Counting Methods. CSE 191, Class Note 05: Counting Methods Computer Sci & Eng Dept SUNY Buffalo

Counting Methods. CSE 191, Class Note 05: Counting Methods Computer Sci & Eng Dept SUNY Buffalo Counting Methods CSE 191, Class Note 05: Counting Methods Computer Sci & Eng Dept SUNY Buffalo c Xin He (University at Buffalo) CSE 191 Discrete Structures 1 / 48 Need for Counting The problem of counting

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15) Problem 1: Chernoff Bounds via Negative Dependence - from MU Ex 5.15) While deriving lower bounds on the load of the maximum loaded bin when n balls are thrown in n bins, we saw the use of negative dependence.

More information

Lecture 4: Finite Automata

Lecture 4: Finite Automata Administrivia Lecture 4: Finite Automata Everyone should now be registered electronically using the link on our webpage. If you haven t, do so today! I dliketohaveteamsformedbyfriday,ifpossible,butnextmonday

More information

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts) Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We

More information

A General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY

A General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY A General-Purpose Counting Filter: Making Every Bit Count Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY Approximate Membership Query (AMQ) insert(x) ismember(x)

More information

Mutually Orthogonal Latin Squares: Covering and Packing Analogues

Mutually Orthogonal Latin Squares: Covering and Packing Analogues Squares: Covering 1 1 School of Computing, Informatics, and Decision Systems Engineering Arizona State University Mile High Conference, 15 August 2013 Latin Squares Definition A latin square of side n

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

Notes. Number Theory: Applications. Notes. Number Theory: Applications. Notes. Hash Functions I

Notes. Number Theory: Applications. Notes. Number Theory: Applications. Notes. Hash Functions I Number Theory: Applications Slides by Christopher M. Bourke Instructor: Berthe Y. Choueiry Fall 2007 Computer Science & Engineering 235 Introduction to Discrete Mathematics Sections 3.4 3.7 of Rosen cse235@cse.unl.edu

More information

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR) Databases DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR) References Hashing Techniques: Elmasri, 7th Ed. Chapter 16, section 8. Cormen, 3rd Ed. Chapter 11. Inverted indexing: Elmasri,

More information
