Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)
- Lorena Mason
2 References Hashing techniques: Elmasri, 7th ed., Chapter 16, Section 8; Cormen, 3rd ed., Chapter 11. Inverted indexing: Elmasri, 7th ed., Chapter 27, Section 5.
3 Hashing Techniques (1) Highly efficient indexes for equality search. They can be used for both internal and external files. The techniques fall into two classes: Static hashing: for fixed-size, non-mutable data (e.g., single-session CD-ROM/DVD/Blu-ray). Hashing for dynamic file expansion: when both the data and the data size can vary over time. As with tree indexes for secondary memory, hash indexes use blocks to store buckets.
4 Hashing Techniques (2) The information can be accessed fast: hashing is another type of primary file organization. Uses: value search; bucketing (see hash join). We want to access an arbitrary position of an array in O(1) time; for some hash data structures, the worst-case search time is Θ(n). Hashing allows us to implement the dictionary operations (Insert, Search, Delete).
5 Hashing Functions We want to store our records into a given number of blocks (grouped into buckets). The search condition involves a (single) hash field; in most cases this field is a key field of the file, in which case it is called the hash key. We define a hash function mapping each value into the range 0..M-1, e.g., h(k) = k mod M. If k is non-numeric, its byte representation is used instead.
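As a sketch (in Python, with illustrative names), the modulo hash function above, including the fallback to the byte representation for non-numeric keys, could look like:

```python
def h(key, M):
    """Map a key into the bucket range 0..M-1 via h(k) = k mod M."""
    if not isinstance(key, int):
        # Non-numeric key: hash its byte representation instead.
        key = int.from_bytes(str(key).encode("utf-8"), "big")
    return key % M

print(h(1100, 8))      # → 4, since 1100 mod 8 = 4
print(h("Gregor", 8))  # some bucket in 0..7
```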
6 Collision Resolution Collisions may happen (there exist k and k' such that h(k) = h(k')) and have to be resolved. The main collision-resolution strategies are: Chaining: the bucket array is extended with overflow positions. Open addressing: starting from the occupied position, we seek the next free position. Multiple hashing: if one hash function generates a collision, we try a second function. We are going to see only the first technique.
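A minimal Python sketch of chaining (names are illustrative; each bucket is a Python list standing in for a block with overflow positions):

```python
class ChainedHashTable:
    """Minimal hash table with collision resolution by chaining (illustrative)."""

    def __init__(self, M=8):
        self.M = M
        self.buckets = [[] for _ in range(M)]  # one chain per bucket

    def _h(self, k):
        return k % self.M

    def insert(self, k):
        self.buckets[self._h(k)].append(k)

    def search(self, k):
        return k in self.buckets[self._h(k)]

    def delete(self, k):
        bucket = self.buckets[self._h(k)]
        if k in bucket:
            bucket.remove(k)

t = ChainedHashTable()
t.insert(4)
t.insert(12)          # 4 and 12 collide: both hash to bucket 4
print(t.search(12))   # → True
```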
7 Static Hashing A fixed number M of buckets is allocated, together with a hash function mapping K to [0, ..., M-1]. Each bucket has at least one block, and each block holds m records. E.g., H(K) returns the first/last i bits of K's byte representation; in the figure, i = 1, M = 2^i = 2, and each bucket contains one block. [figure: record blocks and buckets]
8-9 Static Hashing for Internal Files [figures]
10-11 Static Hashing: Search(1100) The hash function computes the array index where the records with a given hash field value are stored. [figures: 1100 → H → array index]
12-14 Static Hashing: Insert() We use the chaining method to extend a block with overflow positions. [figures]
15 How many blocks are there? [quiz; options A-D refer to the figure]
16 How many buckets are there? [quiz; options A-D refer to the figure]
17-20 Static Hashing: Delete(1100) Overflows degrade static hashing's efficiency; deleting values can reduce the number of overflows. [figures]
21 Static Hashing: Efficiency When no overflow occurs, a bucket can be reached with a single access; then up to m records must be scanned to find a specific record. Overflows cause quick performance degradation. Efficiency depends on: the index size vs. data size ratio (the number of buckets); uniform hashing (e.g., H(k) must not be a constant function). H(K) can vary at run time in order to reduce the number of overflows.
22 Dynamic File Expansion Collision resolution can be avoided by changing the number of buckets dynamically. Extendible hashing: buckets are reached through an extendible directory that stores pointers to buckets; overflows are not used; the number of indexed buckets grows exponentially. Linear hashing: no directory is used; overflows are still used; the number of indexed buckets grows linearly. Dynamic hashing: a precursor of extendible hashing; this topic will not be covered here.
23 External Hashing: a general example [figure]
24 Extendible Hashing (1) Uses a level of indirection through an array of pointers to buckets (the directory). The directory grows by doubling its size: it has size 2^d, where d is the global depth. Each entry in the directory points to a single bucket, but each bucket can be reached from multiple entries. Each bucket stores a variable d' (local depth) that indicates how many bits are actually used to index it.
25 Extendible Hashing (2) The hash function changes together with d: it returns the d most significant bits of the binary-encoded search key. Extendible hashing doesn't use overflow blocks, since the data structure is updated dynamically.
26-27 Extendible Hashing: Search(1100) [figures: global depth d, hash function H, directory]
28 Extendible Hashing: Insert Retrieve the bucket where the value should be stored; if there is enough room, store it! Otherwise, compare the local depth d' with the global depth d. If d' < d: the block is split (halving); the keys are redistributed between the two blocks using d'+1 bits; d' is incremented to d'+1; the directory is updated with the pointer to the newly created block. If d' = d: d is incremented to d+1 (doubling); each directory entry is doubled, so that each entry w produces two entries, w0 and w1; then continue as in the previous case.
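The insertion procedure above can be sketched in Python (an illustrative toy, assuming fixed-width integer keys hashed on their most significant d bits; a repeated split after an unlucky redistribution is omitted for brevity):

```python
class ExtendibleHash:
    """Minimal extendible hashing sketch: a 2^d-entry directory of buckets."""

    def __init__(self, bucket_capacity=2, key_bits=4):
        self.cap = bucket_capacity
        self.bits = key_bits
        self.d = 1                                   # global depth
        self.dir = [{"depth": 1, "keys": []},        # 2^d directory entries
                    {"depth": 1, "keys": []}]

    def _index(self, key):
        return key >> (self.bits - self.d)           # d most significant bits

    def insert(self, key):
        bucket = self.dir[self._index(key)]
        if len(bucket["keys"]) < self.cap:
            bucket["keys"].append(key)
            return
        if bucket["depth"] == self.d:                # d' = d: double the directory
            self.d += 1
            self.dir = [b for b in self.dir for _ in (0, 1)]
        bucket["depth"] += 1                         # split the full bucket
        new = {"depth": bucket["depth"], "keys": []}
        old_keys, bucket["keys"] = bucket["keys"], []
        # entries pointing to the old bucket are contiguous; the upper half
        # (extra bit = 1) is redirected to the new bucket
        idxs = [i for i, b in enumerate(self.dir) if b is bucket]
        for i in idxs[len(idxs) // 2:]:
            self.dir[i] = new
        for k in old_keys + [key]:                   # redistribute with d'+1 bits
            self.dir[self._index(k)]["keys"].append(k)

    def search(self, key):
        return key in self.dir[self._index(key)]["keys"]

eh = ExtendibleHash()
for k in (0b1100, 0b1111, 0b0100, 0b1000):
    eh.insert(k)
print(eh.d)   # → 2: inserting 1000 forced a directory doubling
```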
29-30 Extendible Hashing: Insert() (1) Retrieve the bucket where the value should be stored. There is no room! [figures: d, H]
31 Extendible Hashing: Insert() (2) Compare d' and d: here d' = d. [figure]
32 Extendible Hashing: Insert() (3) Increase d by 1. [figure]
33 Extendible Hashing: Insert() (3) Each directory entry is doubled, so that each entry w produces two entries, w0 and w1. [figure]
34-38 Extendible Hashing: Insert() (4) The block is split: the keys are redistributed between the two blocks using d'+1 bits; d' is increased by one; the directory is updated with the pointer to the newly created block. [figures: SPLIT]
39-41 Extendible Hashing: Insert(0100) Retrieve the bucket where the value should be stored. There is enough room, so insert it! [figures]
42-43 Extendible Hashing: Insert() (1) Retrieve the bucket where the value should be stored. There is no room! [figures]
44 Extendible Hashing: Insert() (2) Compare d' and d: here d' < d. [figure]
45-49 Extendible Hashing: Insert() (3) The block is split: the keys are redistributed between the two blocks using d'+1 bits; d' is increased by one; the directory is updated with the pointer to the newly created block. [figures: SPLIT]
50-51 Extendible Hashing: Insert(1000) (1) Retrieve the bucket where the value should be stored. There is no room! [figures]
52 Extendible Hashing: Insert(1000) (2) Compare d' and d: here d' = d. [figure]
53-54 Extendible Hashing: Insert(1000) (3) Increase d by 1; each directory entry is doubled, so that each entry w produces two entries, w0 and w1. [figures: d = 3]
55-58 Extendible Hashing: Insert(1000) (4) The block is split: the keys are redistributed between the two blocks using d'+1 bits; d' is increased by one. [figures: SPLIT]
59-60 Extendible Hashing: Insert(1000) (5) The directory is updated with the pointer to the newly created block. [figures]
61 How many blocks are there? A) 3 B) 5 C) 6 D) 8 [figure]
62 How many buckets are there? A) 3 B) 5 C) 6 D) 8 [figure]
63 Extendible Hashing: discussion Pros: the space overhead of the directory is negligible; splitting causes only minor reorganizations, since only the records of one bucket are redistributed between the two new buckets (i.e., the records that start with the same bit sequence). Cons: the directory must be searched before accessing the buckets themselves, so two block accesses are needed (directory + data-file bucket); this penalty is considered minor. The exponential growth of the directory in main memory is somewhat inefficient (it may allocate more space than actually required).
64 Linear Hashing The hash file expands and shrinks without needing a directory. Overflow blocks are allowed. The number of buckets increases linearly. Such an increase happens (i) when an overflow block is inserted, or (ii) when a specific record-to-bucket ratio (the file load factor, r/n) exceeds a guard value l (e.g., l = 1.7). H_i(K) returns the i least significant bits of K.
65-67 Linear Hashing: Search() Given n buckets (2^(i-1) < n ≤ 2^i): if H_i(K) = m < n, K is in the m-th bucket; if H_i(K) = m ≥ n, K is in the (m - 2^(i-1))-th bucket. [figures: i = 2, n = 3, r = 4, r/n = 1.33; H_2(·) = 01_2 = 1 < 3]
68-70 Linear Hashing: Search(1111) H_2(1111) = 11_2 = 3 ≥ 3, so K is in bucket 3 - 2^(2-1) = 1 = 01_2. [figures: i = 2, n = 3, r = 4, r/n = 1.33]
71 Linear Hashing: Insert (1) With r records, n buckets, file load factor r/n, and guard value l: when the load factor exceeds l, a bucket is split and a new (n+1)-th bucket is created. While the hash function H_i is in use, the buckets are split in order, up to the 2^(i-1)-th. When n > 2^i, H_i is replaced by H_{i+1} and the buckets are split, once again, starting from the first one.
72 Linear Hashing: Insert (2) If H_i(k) = m < n: store k in the m-th bucket (using overflow blocks when necessary); otherwise, store k in the (m - 2^(i-1))-th bucket. Increase r; if r/n > l, then: if n = 2^i, increase i by 1; write (n)_2 = a_1 a_2 ... a_i with a_1 = 1; clear the first bit of n and store the result in m (a_1 a_2 ... a_i → 0 a_2 ... a_i); allocate the n-th block; move all the records of block m whose i-th rightmost bit is 1 into the n-th block; increase n.
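A compact Python sketch of the whole procedure (illustrative; overflow chains are modeled implicitly by unbounded Python lists):

```python
class LinearHash:
    """Minimal linear hashing sketch: n grows by one bucket per split,
    triggered when the load factor r/n exceeds the guard value l."""

    def __init__(self, guard=1.7):
        self.l = guard
        self.i = 1                # number of address bits in use
        self.n = 2                # current number of buckets
        self.r = 0                # number of stored records
        self.buckets = [[], []]

    def _addr(self, key):
        m = key & ((1 << self.i) - 1)        # H_i: i least significant bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, key):
        self.buckets[self._addr(key)].append(key)
        self.r += 1
        if self.r / self.n > self.l:         # split the next bucket in line
            if self.n == 1 << self.i:
                self.i += 1
            m = self.n - (1 << (self.i - 1))  # clear the leading bit of n
            self.buckets.append([])           # allocate the n-th block
            bit = 1 << (self.i - 1)
            stay = [k for k in self.buckets[m] if not (k & bit)]
            move = [k for k in self.buckets[m] if k & bit]
            self.buckets[m], self.buckets[self.n] = stay, move
            self.n += 1

    def search(self, key):
        return key in self.buckets[self._addr(key)]

lh = LinearHash()
for k in (0b01, 0b11, 0b00, 0b110):
    lh.insert(k)
print(lh.i, lh.n)   # → 2 3: the fourth insert pushed r/n over 1.7
```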
73-76 Linear Hashing: Insert() (1) H_1(·) = 1_2 = 1 < 2, so the value is stored in bucket 1. Increase r: now r = 4 and r/n = 2 > 1.7, so a split is needed. [figures: i = 1, n = 2, r = 3 → 4]
77-82 Linear Hashing: Insert() (2) n = 2^1, so increase i by 1 (i = 2); (n)_2 = 10, (m)_2 = 00. Allocate the n-th block, with (n)_2 = 10. Move all the records of block m whose second rightmost bit is 1 into the n-th block. Increase n. Afterwards: i = 2, n = 3, r = 4, r/n = 1.33 < 1.7. [figures]
83-88 Linear Hashing: Insert(0001) H_2(0001) = 01_2 = 1 < 3, so 0001 is stored in bucket 1. Increase r: r = 5 and r/n = 1.67 < 1.7, so no split is needed. [figures: i = 2, n = 3]
89-92 Linear Hashing: Insert(0110) (1) H_2(0110) = 10_2 = 2 < 3, so 0110 is stored in bucket 2. Increase r: r = 6 and r/n = 2 > 1.7, so a split is needed. [figures: i = 2, n = 3]
93-98 Linear Hashing: Insert(0110) (2) n ≤ 2^2, so i stays 2; (n)_2 = 11, (m)_2 = 01. Allocate the n-th block, with (n)_2 = 11. Move all the records of block m whose second rightmost bit is 1 into the n-th block. Increase n. Afterwards: i = 2, n = 4, r = 6, r/n = 1.5 < 1.7. [figures]
99 How many blocks are there? A) 2 B) 3 C) 4 D) … [figure: i = 2, n = 3]
100 How many buckets are there? A) 2 B) 3 C) 4 D) … [figure: i = 2, n = 3]
101 Information Retrieval Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need, from within large collections (usually stored on computers). These days we frequently think first of web search, but there are many other cases: searching your laptop; corporate knowledge bases; domain-specific search (e.g., legal information retrieval); graph databases.
102 Basic Assumptions Document: an unstructured file. Collection: a set of documents (assume it is a static, non-hypertext collection for the moment). Query: a set (or sequence) of keywords expressing an information need. Goal: retrieve documents with information that is relevant to the user's information need and helps the user complete a task.
103 Inverted indexes Inverted indexes are used to efficiently retrieve unstructured documents through full-text queries. Such indexes are inverted because they record, for each word, all the documents where it appears. A document collection is D = {d_1, ..., d_n}. A vocabulary V is the set of distinct terms in the document set. An inverted index IX of a document collection is a vocabulary that associates each distinct term with the list of all documents (and positions) that contain it: for each v in V, IX[v] = { (d_i, j) | d_i[j] = v }.
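The definition above maps directly to code. A Python sketch that builds a positional inverted index from pre-tokenized documents (tokenization and normalization are assumed done; only a fragment of the slides' D3 is used):

```python
def build_inverted_index(docs):
    """Build a positional inverted index: term -> list of (doc_id, position)."""
    index = {}
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens, start=1):
            index.setdefault(term, []).append((doc_id, pos))
    return index

docs = {
    "D3": "vidi un magnifico disegno rappresentava un serpente boa".split(),
}
ix = build_inverted_index(docs)
print(ix["un"])   # → [('D3', 2), ('D3', 6)]
```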
104 Generating the Inverted Index (1) Sample documents (in Italian): D1: "Una mattina, svegliandosi da sogni inquieti, Gregor Samsa si trovò nel suo letto trasformato in un insetto mostruoso". D2: "Voi che trovate tornando a casa il cibo caldo e visi amici Considerate se questo è un uomo". D3: "vidi un magnifico disegno. Rappresentava un serpente boa nell'atto di inghiottire un animale". [table: word/position pairs per document, e.g. da 4, Gregor 7, in 15, un 16 (D1); a 5, casa 6, che 2 (D2); animale 14, atto 10, boa 8, un 2, 6, 13 (D3)]
105 Generating the Inverted Index (2) Resulting index: a → (D2,5); animale → (D3,14); atto → (D3,10); boa → (D3,8); casa → (D2,6); che → (D2,2); da → (D1,4); Gregor → (D1,7); in → (D1,15); un → (D1,16), (D3,2), (D3,6), (D3,13).
106 Inverted Index: queries (1) Return the documents containing "un" AND "atto". Posting lists: atto → (D3,10); un → (D1,16), (D3,2), (D3,6), (D3,13). Intersecting the document sets, {D3} ∩ {D1, D3} = {D3}. Result: D3 ("vidi un magnifico disegno. Rappresentava un serpente boa nell'atto di inghiottire un animale").
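A Boolean AND over the index reduces to intersecting the document sets of the two posting lists; a Python sketch using the postings hand-copied from the slide:

```python
def and_query(index, term1, term2):
    """Boolean AND: documents whose posting lists contain both terms."""
    docs1 = {doc for doc, _ in index.get(term1, [])}
    docs2 = {doc for doc, _ in index.get(term2, [])}
    return sorted(docs1 & docs2)

ix = {
    "atto": [("D3", 10)],
    "un": [("D1", 16), ("D3", 2), ("D3", 6), ("D3", 13)],
}
print(and_query(ix, "un", "atto"))   # → ['D3']
```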
107 Stemming Reduce terms to their roots before indexing; this also reduces the size of the inverted index. Stemming usually performs crude suffix chopping and is language dependent. E.g., automate(s), automatic, automation are all reduced to automat; compressed and compression are both accepted as equivalent to compress. A fully stemmed sentence looks like: "for exampl compress and compress ar both accept as equival to compress".
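A deliberately crude suffix-chopping stemmer along these lines (this is not the Porter stemmer, just an illustration of the idea):

```python
def crude_stem(word):
    """Chop a few common English suffixes (crude and language dependent)."""
    for suffix in ("ion", "ing", "ic", "es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("automates", "automatic", "automation", "compressed", "compression"):
    print(w, "→", crude_stem(w))
```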
108 Stop words With a stop list, you exclude the commonest words from the dictionary entirely. Intuition: they have little semantic content (the, a, and, to, be) and they are very frequent (~30% of postings are for the top 30 words). Excluding them significantly reduces the size of the inverted index and allows query optimizations, because the machine representation of a query won't include such words. On the other hand, the trend is away from doing this, because you need stop words for: phrase queries ("King of Denmark"); various song titles, etc. ("Let it be", "To be or not to be"); relational queries ("flights to London").
109 Inverted Index: queries (2) Return the documents containing the phrase "un atto". Posting lists: atto → (D3,10); un → (D1,16), (D3,2), (D3,6), (D3,13). Candidate position pairs in D3: (2,10), (6,10), (13,10); a phrase match would require consecutive positions.
110 Inverted Index: queries (3) Return the documents containing the phrase "un animale". Posting lists: animale → (D3,14); un → (D1,16), (D3,2), (D3,6), (D3,13). Candidate position pairs in D3: (2,14), (6,14), (13,14); the pair (13,14) is consecutive, so D3 matches ("vidi un magnifico disegno. Rappresentava un serpente boa nell'atto di inghiottire un animale").
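Phrase queries like the two above are answered with the positions stored in the postings: the second term must occur at position p+1 whenever the first occurs at p. A Python sketch (postings hand-copied from the slides):

```python
def phrase_query(index, term1, term2):
    """Positional phrase query: documents where term2 appears right after term1."""
    hits = []
    for doc1, pos1 in index.get(term1, []):
        for doc2, pos2 in index.get(term2, []):
            if doc1 == doc2 and pos2 == pos1 + 1:
                hits.append(doc1)
    return sorted(set(hits))

ix = {
    "un": [("D1", 16), ("D3", 2), ("D3", 6), ("D3", 13)],
    "animale": [("D3", 14)],
}
print(phrase_query(ix, "un", "animale"))   # → ['D3'], via the pair (13, 14)
```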
Basic Probability Theory Two events A, B are independent if Conditional probability: Pr[A B] = Pr[A] Pr[B] Pr[A B] = Pr[A B] Pr[B] The expectation of a (discrete) random variable X is E[X ] = k k Pr[X
More informationModule 1: Analyzing the Efficiency of Algorithms
Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu What is an Algorithm?
More informationSo far we have implemented the search for a key by carefully choosing split-elements.
7.7 Hashing Dictionary: S. insert(x): Insert an element x. S. delete(x): Delete the element pointed to by x. S. search(k): Return a pointer to an element e with key[e] = k in S if it exists; otherwise
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic
More informationboolean queries Inverted index query processing Query optimization boolean model January 15, / 35
boolean model January 15, 2017 1 / 35 Outline 1 boolean queries 2 3 4 2 / 35 taxonomy of IR models Set theoretic fuzzy extended boolean set-based IR models Boolean vector probalistic algebraic generalized
More informationLecture Notes for Chapter 17: Amortized Analysis
Lecture Notes for Chapter 17: Amortized Analysis Chapter 17 overview Amortized analysis Analyze a sequence of operations on a data structure. Goal: Show that although some individual operations may be
More informationHash Tables. Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing. CS 5633 Analysis of Algorithms Chapter 11: Slide 1
Hash Tables Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing CS 5633 Analysis of Algorithms Chapter 11: Slide 1 Direct-Address Tables 2 2 Let U = {0,...,m 1}, the set of
More informationDictionary: an abstract data type
2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees
More informationAmortized Analysis (chap. 17)
Amortized Analysis (chap. 17) Not just consider one operation, but a sequence of operations on a given data structure. Average cost over a sequence of operations. Probabilistic analysis: Average case running
More informationOn Two Class-Constrained Versions of the Multiple Knapsack Problem
On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic
More informationConfiguring Spatial Grids for Efficient Main Memory Joins
Configuring Spatial Grids for Efficient Main Memory Joins Farhan Tauheed, Thomas Heinis, and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne (EPFL), Imperial College London Abstract. The performance
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Retrieval Models and Implementation Ulf Leser Content of this Lecture Information Retrieval Models Boolean Model Vector Space Model Inverted Files Ulf Leser: Maschinelle
More informationLecture 3: Probabilistic Retrieval Models
Probabilistic Retrieval Models Information Retrieval and Web Search Engines Lecture 3: Probabilistic Retrieval Models November 5 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme
More informationIR: Information Retrieval
/ 44 IR: Information Retrieval FIB, Master in Innovation and Research in Informatics Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldá Department of Computer Science, UPC
More informationHashing. Martin Babka. January 12, 2011
Hashing Martin Babka January 12, 2011 Hashing Hashing, Universal hashing, Perfect hashing Input data is uniformly distributed. A dynamic set is stored. Universal hashing Randomised algorithm uniform choice
More informationHash-based Indexing: Application, Impact, and Realization Alternatives
: Application, Impact, and Realization Alternatives Benno Stein and Martin Potthast Bauhaus University Weimar Web-Technology and Information Systems Text-based Information Retrieval (TIR) Motivation Consider
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationLecture 8 HASHING!!!!!
Lecture 8 HASHING!!!!! Announcements HW3 due Friday! HW4 posted Friday! Q: Where can I see examples of proofs? Lecture Notes CLRS HW Solutions Office hours: lines are long L Solutions: We will be (more)
More informationMining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s
Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make
More informationA General-Purpose Counting Filter: Making Every Bit Count. Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY
A General-Purpose Counting Filter: Making Every Bit Count Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro Stony Brook University, NY Approximate Membership Query (AMQ) insert(x) ismember(x)
More informationDivide-and-Conquer. Consequence. Brute force: n 2. Divide-and-conquer: n log n. Divide et impera. Veni, vidi, vici.
Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively. Combine solutions to sub-problems into overall solution. Most common usage. Break up problem of
More informationQuerying. 1 o Semestre 2008/2009
Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the
More informationCompressed Index for Dynamic Text
Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution
More informationCS4800: Algorithms & Data Jonathan Ullman
CS4800: Algorithms & Data Jonathan Ullman Lecture 22: Greedy Algorithms: Huffman Codes Data Compression and Entropy Apr 5, 2018 Data Compression How do we store strings of text compactly? A (binary) code
More informationN/4 + N/2 + N = 2N 2.
CS61B Summer 2006 Instructor: Erin Korber Lecture 24, 7 Aug. 1 Amortized Analysis For some of the data structures we ve discussed (namely hash tables and splay trees), it was claimed that the average time
More information1 Hashing. 1.1 Perfect Hashing
1 Hashing Hashing is covered by undergraduate courses like Algo I. However, there is much more to say on this topic. Here, we focus on two selected topics: perfect hashing and cockoo hashing. In general,
More information? 11.5 Perfect hashing. Exercises
11.5 Perfect hashing 77 Exercises 11.4-1 Consider inserting the keys 10; ; 31; 4; 15; 8; 17; 88; 59 into a hash table of length m 11 using open addressing with the auxiliary hash function h 0.k/ k. Illustrate
More informationLecture 5: Hashing. David Woodruff Carnegie Mellon University
Lecture 5: Hashing David Woodruff Carnegie Mellon University Hashing Universal hashing Perfect hashing Maintaining a Dictionary Let U be a universe of keys U could be all strings of ASCII characters of
More informationAdvanced Data Structures
Simon Gog gog@kit.edu - Simon Gog: KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu Predecessor data structures We want to support
More information14.1 Finding frequent elements in stream
Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering
More informationData Structures in Java
Data Structures in Java Lecture 20: Algorithm Design Techniques 12/2/2015 Daniel Bauer 1 Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways of
More informationProblem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)
Problem 1: Chernoff Bounds via Negative Dependence - from MU Ex 5.15) While deriving lower bounds on the load of the maximum loaded bin when n balls are thrown in n bins, we saw the use of negative dependence.
More informationmd5bloom: Forensic Filesystem Hashing Revisited
DIGITAL FORENSIC RESEARCH CONFERENCE md5bloom: Forensic Filesystem Hashing Revisited By Vassil Roussev, Timothy Bourg, Yixin Chen, Golden Richard Presented At The Digital Forensic Research Conference DFRWS
More informationAdvanced Data Structures
Simon Gog gog@kit.edu - Simon Gog: KIT The Research University in the Helmholtz Association www.kit.edu Predecessor data structures We want to support the following operations on a set of integers from
More informationChapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn
Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Priority Queue / Heap Stores (key,data) pairs (like dictionary) But, different set of operations: Initialize-Heap: creates new empty heap
More informationAnalysis of Algorithms I: Perfect Hashing
Analysis of Algorithms I: Perfect Hashing Xi Chen Columbia University Goal: Let U = {0, 1,..., p 1} be a huge universe set. Given a static subset V U of n keys (here static means we will never change the
More informationLecture 5: Web Searching using the SVD
Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially
More informationLecture 1b: Text, terms, and bags of words
Lecture 1b: Text, terms, and bags of words Trevor Cohn (based on slides by William Webber) COMP90042, 2015, Semester 1 Corpus, document, term Body of text referred to as corpus Corpus regarded as a collection
More informationHashing. Dictionaries Hashing with chaining Hash functions Linear Probing
Hashing Dictionaries Hashing with chaining Hash functions Linear Probing Hashing Dictionaries Hashing with chaining Hash functions Linear Probing Dictionaries Dictionary: Maintain a dynamic set S. Every
More informationCS483 Design and Analysis of Algorithms
CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments
More informationQuiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b)
Introduction to Algorithms October 14, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Srini Devadas and Constantinos (Costis) Daskalakis Quiz 1 Solutions Quiz 1 Solutions Problem
More informationIntroduction to Hashtables
Introduction to HashTables Boise State University March 5th 2015 Hash Tables: What Problem Do They Solve What Problem Do They Solve? Why not use arrays for everything? 1 Arrays can be very wasteful: Example
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More information4 Locality-sensitive hashing using stable distributions
4 Locality-sensitive hashing using stable distributions 4. The LSH scheme based on s-stable distributions In this chapter, we introduce and analyze a novel locality-sensitive hashing family. The family
More informationAmortized analysis. Amortized analysis
In amortized analysis the goal is to bound the worst case time of a sequence of operations on a data-structure. If n operations take T (n) time (worst case), the amortized cost of an operation is T (n)/n.
More information13 Searching the Web with the SVD
13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this
More information9 Searching the Internet with the SVD
9 Searching the Internet with the SVD 9.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this
More informationCSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13
CSCB63 Winter 2019 Week 11 Bloom Filters Anna Bretscher March 30, 2019 1 / 13 Today Bloom Filters Definition Expected Complexity Applications 2 / 13 Bloom Filters (Specification) A bloom filter is a probabilistic
More information6.854 Advanced Algorithms
6.854 Advanced Algorithms Homework Solutions Hashing Bashing. Solution:. O(log U ) for the first level and for each of the O(n) second level functions, giving a total of O(n log U ) 2. Suppose we are using
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 2: Distributed Database Design Logistics Gradiance No action items for now Detailed instructions coming shortly First quiz to be released
More informationQuery CS347. Term-document incidence. Incidence vectors. Which plays of Shakespeare contain the words Brutus ANDCaesar but NOT Calpurnia?
Query CS347 Which plays of Shakespeare contain the words Brutus ANDCaesar but NOT Calpurnia? Lecture 1 April 4, 2001 Prabhakar Raghavan Term-document incidence Incidence vectors Antony and Cleopatra Julius
More information6.830 Lecture 11. Recap 10/15/2018
6.830 Lecture 11 Recap 10/15/2018 Celebration of Knowledge 1.5h No phones, No laptops Bring your Student-ID The 5 things allowed on your desk Calculator allowed 4 pages (2 pages double sided) of your liking
More informationCPSC 467: Cryptography and Computer Security
CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among
More informationDictionary: an abstract data type
2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees
More informationCount-Min Tree Sketch: Approximate counting for NLP
Count-Min Tree Sketch: Approximate counting for NLP Guillaume Pitel, Geoffroy Fouquier, Emmanuel Marchand and Abdul Mouhamadsultane exensa firstname.lastname@exensa.com arxiv:64.5492v [cs.ir] 9 Apr 26
More informationAdapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts
Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Domenico Cantone Simone Faro Emanuele Giaquinta Department of Mathematics and Computer Science, University of Catania, Italy 1 /
More information