A Survey on Spatial-Keyword Search
|
|
- Dwain Fisher
- 5 years ago
- Views:
Transcription
1 A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012
2 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms Strengths and Weaknesses Conclusion 2
3 A Survey on Spatial-Keyword Search PROBLEM DEFINITION AND MOTIVATION 3
4 Problem Definition and Motivation Definition: Spatio-Textual Object (STO) A STO o is an object that consist of a textual description (o.text) and a spatial location (o.loc). Example o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 4
5 Problem Definition and Motivation Definition: Spatial-Keyword Search (SKQ) Given a spatial constraint Q.loc and a set of keywords Q.text, a SKQ Q returns a list (ranked or not) of the STOs whose location satisfies Q.loc and whose textual description is relevant to Q.text. o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 5
6 Problem Definition and Motivation Definition: Spatial-Keyword Search (SKQ) Given a spatial constraint Q.loc and a set of keywords Q.text, a SKQ Q returns a list (ranked or not) of the STOs whose location satisfies Q.loc and whose textual description is relevant to Q.text. o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) q o 2 ( Italian Restaurant ) Current Location: (x, y) Keywords: Restaurant 1-NN o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 6
7 A Survey on Spatial-Keyword Search QUERY TYPES 7
8 Query Types Basic Spatial-Keyword Range Query (SKR) Spatial-Keyword Nearest Neighbor Query (SKNN) Top-k Spatial Keyword Query (Top-k SK) Variations Approximation Spatial-Keyword Range Query (ASK Range) Top-k Spatial Boolean query (ksb) Spatial-Keyword Auto-Completion (SK AC) 8
9 Query Types (Basic) Spatial-Keyword Range Query (SKR) Given a region Q.loc and a set of keywords Q.text, a SKR query Q returns all STOs within Q.loc that contain all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. Ans(Q) = {o 1, o 6 } 9
10 Query Types (Basic) Spatial-Keyword Range Query (SKR) Given a region Q.loc and a set of keywords Q.text, a SKR query Q returns all STOs within Q.loc that contain all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. Ans(Q) = {o 1, o 6 } 10
11 Query Types (Basic) Spatial-Keyword Nearest Neighbor Query (SKNN) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a SKNN query Q returns the k nearest STOs to Q.loc, whose textual description contains all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (500, 200), Q.text = { Restaurant, Chinese }, k = 1. Ans(Q): {o 1 }. 11
12 Query Types (Basic) Spatial-Keyword Nearest Neighbor Query (SKNN) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a SKNN query Q returns the k nearest STOs to Q.loc, whose textual description contains all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) Q.loc o 3 ( Restaurante de France ) o 2 ( Italian Restaurant ) ( Chinese Restaurant ) o 6 o 4 ( Italian Ristorante ) Example Q.loc = (500, 200), Q.text = { Restaurant, Chinese }, k = 1. Ans(Q): {o 1 }. 12
13 Query Types (Basic) Top-k Spatial Keyword Query (Top-k SK) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a Top-k SK query Q returns a list of k STOs ranked according to a ranking function f based on their spatial proximity to Q.loc and on their textual similarity to Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example f = a f IR + (1-a) f SP, where a (0, 1), f IR : text similarity and f SP : spatial proximity. Q.loc = (500, 200), Q.text = { restaurant }, k = 2 and a = 0.5. Ans(Q) = {o 2, o 1 }. 13
14 Query Types (Basic) Top-k Spatial Keyword Query (Top-k SK) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a Top-k SK query Q returns a list of k STOs ranked according to a ranking function f based on their spatial proximity to Q.loc and on their textual similarity to Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) Q.loc o 3 ( Restaurante de France ) o 2 ( Italian Restaurant ) ( Chinese Restaurant ) o 6 o 4 ( Italian Ristorante ) Example f = a f IR + (1-a) f SP, where a (0, 1), f IR : text similarity and f SP : spatial proximity. Q.loc = (500, 200), Q.text = { restaurant }, k = 2 and a = 0.5. Ans(Q) = {o 2, o 1 }. 14
15 Query Types (Variations) Approximate Spatial-Keyword Range Query (ASK Range) Given a region Q.loc, a set of keywords Q.text and a positive integer t, an ASK Range query Q returns the STOs in Q.loc that contain similar strings to all keywords in Q.text with edit-distance threshold t o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)], Q.text = { Restaurant } and t = 1. Ans(Q) = {o 1, o 3, o 6 } 15
16 Query Types (Variations) Approximate Spatial-Keyword Range Query (ASK Range) Given a region Q.loc, a set of keywords Q.text and a positive integer t, an ASK Range query Q returns the STOs in Q.loc that contain similar strings to all keywords in Q.text with edit-distance threshold t o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)], Q.text = { Restaurant } and t = 1. Ans(Q) = {o 1, o 3, o 6 } 16
17 Query Types (Variations) Top-k Spatial Boolean query (ksb) Given a query point Q.loc, a Boolean expression e on some query keywords and a positive integer k, a ksb query Q returns the k nearest STOs whose textual description satisfies e o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (400, 300), e = restaurant Chinese and k = 2. Ans(Q): {o 5, o 2 }. 17
18 Query Types (Variations) Top-k Spatial Boolean query (ksb) Given a query point Q.loc, a Boolean expression e on some query keywords and a positive integer k, a ksb query Q returns the k nearest STOs whose textual description satisfies e o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) Q.loc o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (400, 300), e = restaurant Chinese and k = 2. Ans(Q): {o 5, o 2 }. 18
19 Query Types (Variations) Spatial-Keyword Auto-Completion (SK AC) Given a point Q.loc, a string Q.text and a positive integer k, a SK AC query Q returns a list of k STOs, where Q.text is a prefix of their textual description, ranked according to a ranking function f that is based on their spatial proximity to p and several other static scores like popularity and ratings o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Static scores are 1 for all STOs. Q.loc = (300, 0), Q.text = rest, k = 3. Ans(Q): {o 6, o 3, o 1 } 19
20 Query Types (Variations) Spatial-Keyword Auto-Completion (SK AC) Given a point Q.loc, a string Q.text and a positive integer k, a SK AC query Q returns a list of k STOs, where Q.text is a prefix of their textual description, ranked according to a ranking function f that is based on their spatial proximity to p and several other static scores like popularity and ratings o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) o 3 ( Restaurante de France ) Example 0 ( Chinese Restaurant ) o 6 Q.loc o 4 ( Italian Ristorante ) Static scores are 1 for all STOs. Q.loc = (300, 0), Q.text = rest, k = 3. Ans(Q): {o 6, o 3, o 1 } 20
21 Query Types Query Type Spatial Categorization IR Categorization ASK Range Proximity Query Range SKR SKNN Contains all keywords Joint SKNN ASK knn Proximity Query ksb knn Boolean Query Model SK AC Auto-completion Top-k SK LkPT MkSK Moving knn Ranked Query Model RSTkNN Reverse knn mck GSK min-sum GSK min-max Aggregate Contains at least one keyword 21
22 A Survey on Spatial-Keyword Search QUERY PROCESSING TECHNIQUES 22
23 Indices Indexing Approach Query Processing Techniques On Answering SKR Query Separate indices for spatial and textual attributes SI: Separate Indices Approach Word Occurrences [Martins et al. GIR, 2005] o 7 ( Shoes ) n 4 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) n 0 Athletic o 8 Bookstore o 10 Chinese o 1, o 6 Cinema o 9 France o 3 Greek o 5 n 3 o 9 ( Cinema ) n 1 o 1 ( Chinese Restaurant ) n 5 o 2 ( Italian Restaurant ) Italian o 2, o 4 Restaurant o 1, o 2, o 5, o 6 Restaurante o 3 Ristorante o 4 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 n 6 o 3 ( Restaurante de France ) n 2 o 4 ( Italian Ristorante ) Shoes o 7, o 8 n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 7 o 9 o 10 o 5 o 8 o 4 o 6 o 1 o 2 o 3 23
24 Query Processing Techniques On Answering SKR Query SI: Separate Indices Approach Query Processing 1 st Approach [Martins et al. GIR, 2005] Ans SP (Q.loc) = Answer to spatial constrains. Ans IR (Q.text) = Answer to textual constrains. Ans(Q) = Ans SP (Q.loc) Ans IR (Q.text) 2 nd Approach [Chen et al. SIGMOD, 2006] Ans SP (Q.loc) = Answer to spatial constrains. For each o Ans SP (Q.loc) check if o satisfies the textual constraints. Pros it is easy to maintain the two indices, it supports both textual, spatial and SKQ queries and updates and optimizations are performed in each index independently. Cons the pruning power of query processing algorithms is poor and leads to high I/O cost. 24
25 Indexing Approach Query Processing Techniques On Answering SKR Query Inverted Files + Grids: words occurrences contains spatial information. TS: Textual - Spatial Approach [Vaid et al. SSTD, 2005] o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) o 2 ( Italian Restaurant ) Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) Cell: c 5,1 25
26 Indexing Approach Query Processing Techniques On Answering SKR Query Inverted Files + Grids: words occurrences contains spatial information. TS: Textual - Spatial Approach [Vaid et al. SSTD, 2005] o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) o 2 ( Italian Restaurant ) Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) Cell: c 5,1 26
27 Query Processing Techniques On Answering SKR Query TS: Textual - Spatial Approach Query Processing Approach A = {c cid R c cid Q.loc } For each keyword t Q.text get the STOs such that their cell id is in A. Finally, get the intersection. Example: Q.loc = [(200, 0), (400, 200)] and Q.text = { Chinese, Restaurant }. A = {c 3,1, c 3,2, c 4,1, c 4,2 }. Evaluation: Better performance than SI approach. Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) 27
28 Indexing Approach Query Processing Techniques On Answering SKR Query Hybrid Index: Inverted File R*-Trees, R*-Tree Inverted Files [Zhou et al. CIKM, 2005] R*-Tree Inverted Files Inverted File R*-Trees n 0 Athletic..... Restaurant n 1 n 2 R*-Trees n 0 R*-Trees n 3 n 4 n 5 n 6 n 1 n 2 Bookstore Shoes Cinema {o 7 } {o 9 } {o 10 } Chinese Italian Restaurant Ristorante {o 6 } {o 4 } {o 6 } {o 4 } {o 1 } {o 2 } {o 5 } {o 6 } Athletic Greek Restaurant Shoes Chinese France Italian Restaurant Restaurante {o 8 } {o 5 } {o 8 } {o 5 } {o 3 } {o 2 } {o 1 } {o 1, o 2 } {o 3 } 28
29 Query Processing Techniques On Answering SKR Query Query Processing Example: Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. R*-Tree Inverted Files Inverted File R*-Trees n 0 Chinese..... Restaurant n 1 n 2 R*-Trees R*-Trees n 3 n 4 n 5 n 6 n 0 n 0 Bookstore Shoes Cinema Chinese Italian Restaurant Ristorante n 1 n 2 n 1 n 2 {o 7 } {o 9 } {o 10 } Athletic Greek Restaurant Shoes {o {o 4 } 6 } {o 6 } {o 4 } Chinese France Italian Restaurant Restaurante {o 1 } {o 6 } {o 1 } {o 2 } {o 5 } {o 6 } {o 8 } {o 5 } {o 8 } {o 5 } {o 3 } {o 2 } {o 1 } {o 1, o 2 } {o 3 } 29
30 Query Processing Techniques On Answering SKR Query R*-Tree Inverted Files & Inverted File R*-Trees Pros They reduce I/O cost and query processing. The Inverted File R*-Tree outperforms the R*-Tree Inverted File. Cons Inverted File R*-Tree: It does not exploit the spatial distribution of keywords. R*-Tree Inverted File: The pruning process is not powerful as we can get a lot of candidates from spatial restrictions. 30
31 Indexing Approach Query Processing Techniques On Answering SKR Query R*-Tree where the internal nodes are virtually augmented with textual information KR*-Tree: Keyword R*-Tree Approach [Hariharan et al. SSDBM, 2007] n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 1 o 2 o 3 o 7 o 9 o 10 o 5 o 8 o 4 o 6 Word Nodes Athletic n 0, n 1, n 4 Bookstore n 0, n 1, n 3 Chinese n 0, n 2, n 5, n 6 Cinema n 0, n 1, n 3 France n 0, n 2, n 5 Greek n 0, n 1, n 4 Italian n 0, n 2, n 5, n 6 Restaurant n 0, n 1, n 2, n 4, n 5, n 6 Restaurante n 0, n 2, n 5 Ristorante n 0, n 2, n 6 Shoes n 0, n 1, n 3, n 4 31
32 Query Processing Techniques On Answering SKR Query KR*-Tree Approach Query Processing Example: Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 1 o 2 o 3 o 7 o 9 o 10 o 5 o 8 o 4 o 6 Word Nodes Athletic n 0, n 1, n 4 Bookstore n 0, n 1, n 3 Chinese n 0, n 2, n 5, n 6 Cinema n 0, n 1, n 3 France n 0, n 2, n 5 Greek n 0, n 1, n 4 Italian n 0, n 2, n 5, n 6 Restaurant n 0, n 1, n 2, n 4, n 5, n 6 Restaurante n 0, n 2, n 5 Ristorante n 0, n 2, n 6 Shoes n 0, n 1, n 3, n 4 Experimental evaluation: More efficient than the previous approaches. 32
33 Indexing Approach Query Processing Techniques On Answering SKNN Query R-Tree & Signature Files IR 2 -Tree: Information Retrieval R-Tree Approach [De Felipe et al. ICDE, 2008] Block1: Signature Files A text has many words Hash(text) = Hash(many)= Hash(words)= Signature (Block1) = False Positives: Block2: OR A text has many Signature (Block2) = words Block n 0 n 1 n 2 n 3 n 4 n 5 n IR 2 -Tree: o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 33
34 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 0,
35 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 2,
36 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 5, 0.0 N 6,
37 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 6, o 1,
38 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: o 1, o 6,
39 Query Processing Techniques On Answering SKNN Query Multi-level IR 2 -Tree (MIR 2 -Tree) Uses different signature lengths for different levels. More complex update operations Fewer False Positives Experimental evaluation: MIR 2 -Tree is more efficient than IR 2 -Tree. False Positives. No comparison with other indices. 39
40 Indexing Approach Query Processing Techniques On Answering Top-k SK Query R-Tree where the internal nodes are augmented with textual & scoring information IR-Tree: Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Inverted file I 0 n 0 n 1 n 2 Inverted file I 1 Inverted file I 2 n 3 n 4 n 5 n 6 I 3 I 4 I 5 I 6 o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Inverted file I 3 Cinema: <o 9, 3> Bookstore: <o 10, 2> Shoes: <o 7, 2> Inverted file I 1 Cinema: <n 3, 3> Bookstore: <n 3, 2> Shoes: <n 3, 2>, <n 4, 3> Greek: <n 4, 7> Restaurant: <n 4, 2> 40
41 IR-Tree Approach Query Processing Techniques On Answering Top-k SK Query Query Processing Similar to the IR 2 -Tree query processing algorithm (Branch-and-Bound) They propose two metrics for scoring rectangles and objects. MIND ST (Q, N) D ST (Q, O) Both metrics are based on Euclidean distances and IR scores. Shortcoming n 1 Inverted file I 1 n 3 n 4 I 3 I 4 o 7 o 9 o 3 o 1 o 2 o 1 : restaurant, weight 5 o 2 : Chinese, weight 3 o 3 : restaurant, weight 4 o 3 : Chinese, weight 2 41
42 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: At construction time, take into account both the area parameter and the textual similarity between the indexed objects. DIR-Tree: Document Inverted file R-Tree Approach IR-Tree (similar to R-Tree) Area cost of inserting STO o into rectangle R: DIR-Tree [Cong et al. PVLDB, 2009] AreaCost(R) = area(r ) area(r) area(r ): area of R after inserting o SimAreaCost(R, o) = a AreaCost(R) maxarea + (1 a)(1 IRsim(R. text, o. text)) a (0, 1) Query Processing: Similar to IR-Tree. Experimental Evaluation: More efficient. 42
43 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: Enhance IR-Tree with clustering. CIR-Tree: Clustering Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Goal: Tighter bounds for rectangles i.e., MIND ST. STO where word w o.text STO where word w o.text N (node of IR-Tree) Query Processing: Similar to IR-Tree. Experimental Evaluation: Most efficient. 43
44 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: Enhance IR-Tree with clustering. CIR-Tree: Clustering Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Goal: Tighter bounds for rectangles i.e., MIND ST. C 1 C 2 STO where word w o.text STO where word w o.text N (node of IR-Tree) Query Processing: Similar to IR-Tree. Experimental Evaluation: Most efficient. 44
45 Indexing Approach Query Processing Techniques On Answering Top-k SK Query R-Tree where the internal nodes are augmented with textual & scoring information. IR-Tree2 d 0 n 0 [Li et al. TKDE, 2010] n 1 n 2 d 1 d 2 n 3 n 4 n 5 n 6 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I 10 max tf Inverted file I 1 Shoes: o 7 d 1 < A 1 = [(40, 120), (280, 390)] D 1 = 5 M 1 = Cinema: 1, 1 Bookstore: 1, 1 Shoes: 2, 1 Greek: 1, 1 Restaurant: 1, 1 > df Cinema: 1, 1 45
46 Query Processing Techniques On Answering Top-k SK Query IR-Tree2 Query Processing! Q.loc = region, spatial proximity to the center of Q.loc Step 1: Compute idf = D Q.loc /D Q.loc,t, where D Q.loc is the number of STOs located in Q.loc and D Q.loc,t is the number of STOs in Q.loc s.t. t o.text. Step 2: Similar to IR-Tree. 46
47 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Inverted File Aggregate Spatial Index S2I: Spatial Inverted Index Approach [Rocha-Junior et al. SSTD, 2011] Infrequent term Frequent term Athletic Restaurant o 10, o 10.loc, 9. o 1, o 1.loc, 4 n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 47
48 Query Processing Techniques On Answering Top-k SK Query S2I Approach Query Processing Threshold Algorithm Round Robin Strategy Q.text = 2, k = 1 Let S i be a source responsible for the i th query keyword. Given the Q.loc and a positive integer j, S i returns the STO o with the j th highest score. Iteration j S 1 S 2 UB 1 UB 2 UB unseen 1 o 1 o o 4 o o 3 o o 9 o Experimental Evaluation: More efficient than DIR-Tree. No comparison with CIR-Tree. 48
49 A Survey on Spatial-Keyword Search CONCLUSION 49
50 Conclusion Spatial-keyword queries. Indexing schemes and the corresponding algorithms for query processing. Pros and cons of each proposed framework. There are still much left to cover on this topic There is no experimental evaluation that compares the frameworks properly. Spatial-keyword search on road networks. Combination of two indices! 50
51 Questions 51
Nearest Neighbor Search with Keywords in Spatial Databases
776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,
More informationOptimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates
Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates X I A O YA N G W A N G 1, Y I N G Z H A N G 2, W E N J I E Z H A N G 1, X U E M I N L I N 1, M U H A M M A D A A M I R C H
More informationSpatial Keyword Search by using Point Regional QuadTree Algorithm
Spatial Keyword Search by using Point Regional QuadTree Algorithm Ms Harshitha A.R 1, Smt. Christy Persya A 2 PG Scholar, Department of ISE (SSE), BNMIT, Bangalore 560085, Karnataka, India 1 Associate
More informationIndexing Structures for Geographic Web Retrieval
Indexing Structures for Geographic Web Retrieval Leonardo Andrade 1 and Mário J. Silva 1 University of Lisboa, Faculty of Sciences Abstract. Context-aware search in mobile web environments demands new
More informationNatural Language Processing. Topics in Information Retrieval. Updated 5/10
Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background
More informationOutline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting
Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient
More informationNearest Neighbor Search with Keywords
Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,
More informationMulti-Approximate-Keyword Routing Query
Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA Outline 1 Introduction
More informationBoolean and Vector Space Retrieval Models
Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1
More informationRetrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1
Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency Srihari: CSE 626 1 Text Retrieval Retrieval of text-based information is referred to as Information Retrieval (IR)
More informationDealing with Text Databases
Dealing with Text Databases Unstructured data Boolean queries Sparse matrix representation Inverted index Counts vs. frequencies Term frequency tf x idf term weights Documents as vectors Cosine similarity
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic
More informationEfficient Continuously Moving Top-K Spatial Keyword Query Processing
Efficient Continuously Moving Top-K Spatial Keyword Query Processing Dingming Wu Man Lung Yiu 2 Christian S. Jensen 3 Gao Cong 4 Department of Computer Science, Aalborg University, Denmark, dingming@cs.aau.dk
More informationPrototype for Finding Hotels in the Neighborhood using FNNS
Prototype for Finding Hotels in the Neighborhood using FNNS Pradeep N 1 1M.Tech Scholar,Dept. of CSE, Dr. AIT, Bangalore, India -------------------------------------------------------------------***---------------------------------------------------------------------
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Retrieval Models and Implementation Ulf Leser Content of this Lecture Information Retrieval Models Boolean Model Vector Space Model Inverted Files Ulf Leser: Maschinelle
More informationVector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson
Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics Querying
More informationK-Nearest Neighbor Temporal Aggregate Queries
Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research,
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationVariable Latent Semantic Indexing
Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More informationSpatial Database. Ahmad Alhilal, Dimitris Tsaras
Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information
More informationCS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002
CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope
More informationP-LAG: Location-aware Group Recommendation for Passive Users
P-LAG: Location-aware Group Recommendation for Passive Users Yuqiu Qian, Ziyu Lu, Nikos Mamoulis, and David W. Cheung The University of Hong Kong {yqqian, zylu, dcheung}@cs.hku.hk University of Ioannina
More informationInformation Retrieval. Lecture 6
Information Retrieval Lecture 6 Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces This lecture
More informationEfficient Safe-Region Construction for Moving Top-K Spatial Keyword Queries
Efficient Safe-Region Construction for Moving Top-K Spatial Keyword Queries Weihuang Huang Guoliang Li Kian-Lee Tan Jianhua Feng Department of Computer Science and Technology, Tsinghua National Laboratory
More informationRETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic
More informationInformation Retrieval
Introduction to Information Retrieval Lecture 11: Probabilistic Information Retrieval 1 Outline Basic Probability Theory Probability Ranking Principle Extensions 2 Basic Probability Theory For events A
More informationTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 3 We are given a set of training examples, consisting of input-output pairs (x,y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some
More informationChap 2: Classical models for information retrieval
Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic
More informationQuerying. 1 o Semestre 2008/2009
Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the
More informationClassification. Team Ravana. Team Members Cliffton Fernandes Nikhil Keswaney
Email Classification Team Ravana Team Members Cliffton Fernandes Nikhil Keswaney Hello! Cliffton Fernandes MS-CS Nikhil Keswaney MS-CS 2 Topic Area Email Classification Spam: Unsolicited Junk Mail Ham
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationRecap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors?
CS276A Information Retrieval Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces Lecture 7 This
More informationCS540 ANSWER SHEET
CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationPredictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data
Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com
More informationQuery Processing in Spatial Network Databases
Temporal and Spatial Data Management Fall 0 Query Processing in Spatial Network Databases SL06 Spatial network databases Shortest Path Incremental Euclidean Restriction Incremental Network Expansion Spatial
More informationInformation Retrieval
Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture; IIR Sections
More informationSPATIAL INDEXING. Vaibhav Bajpai
SPATIAL INDEXING Vaibhav Bajpai Contents Overview Problem with B+ Trees in Spatial Domain Requirements from a Spatial Indexing Structure Approaches SQL/MM Standard Current Issues Overview What is a Spatial
More informationRanking Sequential Patterns with Respect to Significance
Ranking Sequential Patterns with Respect to Significance Robert Gwadera, Fabio Crestani Universita della Svizzera Italiana Lugano, Switzerland Abstract. We present a reliable universal method for ranking
More informationInformation Retrieval
Information Retrieval Online Evaluation Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Classification: k nearest neighbors Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 28 Introduction Classification = supervised method
More informationEvaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments
Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Reynold Cheng, Dmitri V. Kalashnikov Sunil Prabhakar The Hong Kong Polytechnic University, Hung Hom, Kowloon,
More informationLecture 2 September 4, 2014
CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the
More informationCAIM: Cerca i Anàlisi d Informació Massiva
1 / 21 CAIM: Cerca i Anàlisi d Informació Massiva FIB, Grau en Enginyeria Informàtica Slides by Marta Arias, José Balcázar, Ricard Gavaldá Department of Computer Science, UPC Fall 2016 http://www.cs.upc.edu/~caim
More informationHash-based Indexing: Application, Impact, and Realization Alternatives
: Application, Impact, and Realization Alternatives Benno Stein and Martin Potthast Bauhaus University Weimar Web-Technology and Information Systems Text-based Information Retrieval (TIR) Motivation Consider
More informationCarnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2
More informationPredicting Neighbor Goodness in Collaborative Filtering
Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:
More informationA Transformation-based Framework for KNN Set Similarity Search
SUBMITTED TO IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 A Transformation-based Framewor for KNN Set Similarity Search Yong Zhang Member, IEEE, Jiacheng Wu, Jin Wang, Chunxiao Xing Member, IEEE
More informationCOMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from
COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from http://www.mmds.org Distance Measures For finding similar documents, we consider the Jaccard
More informationEfficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks
Efficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks Ugur Demiryurek, Farnoush Banaei-Kashani, and Cyrus Shahabi University of Southern California Department of Computer Science Los Angeles,
More information16 The Information Retrieval "Data Model"
16 The Information Retrieval "Data Model" 16.1 The general model Not presented in 16.2 Similarity the course! 16.3 Boolean Model Not relevant for exam. 16.4 Vector space Model 16.5 Implementation issues
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationTASM: Top-k Approximate Subtree Matching
TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,
More informationarxiv: v2 [cs.db] 5 May 2011
Inverse Queries For Multidimensional Spaces Thomas Bernecker 1, Tobias Emrich 1, Hans-Peter Kriegel 1, Nikos Mamoulis 2, Matthias Renz 1, Shiming Zhang 2, and Andreas Züfle 1 arxiv:113.172v2 [cs.db] May
More informationMulti-Dimensional Top-k Dominating Queries
Noname manuscript No. (will be inserted by the editor) Multi-Dimensional Top-k Dominating Queries Man Lung Yiu Nikos Mamoulis Abstract The top-k dominating query returns k data objects which dominate the
More informationAlgorithms for Calculating Statistical Properties on Moving Points
Algorithms for Calculating Statistical Properties on Moving Points Dissertation Proposal Sorelle Friedler Committee: David Mount (Chair), William Gasarch Samir Khuller, Amitabh Varshney January 14, 2009
More informationRanked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy
Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-2018 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 2 1 Boolean Retrieval Thus
More informationVector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson
Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics
More informationMulti-objective branch-and-cut algorithm and multi-modal traveling salesman problem
Multi-objective branch-and-cut algorithm and multi-modal traveling salesman problem Nicolas Jozefowiez 1, Gilbert Laporte 2, Frédéric Semet 3 1. LAAS-CNRS, INSA, Université de Toulouse, Toulouse, France,
More informationTDDD43. Information Retrieval. Fang Wei-Kleiner. ADIT/IDA Linköping University. Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1
TDDD43 Information Retrieval Fang Wei-Kleiner ADIT/IDA Linköping University Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1 Outline 1. Introduction 2. Inverted index 3. Ranked Retrieval tf-idf
More informationRanked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy
Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-017 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 1 Boolean Retrieval Thus far,
More informationNearest Neighbor Search with Keywords: Compression
Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization
More informationMining Strong Positive and Negative Sequential Patterns
Mining Strong Positive and Negative Sequential Patter NANCY P. LIN, HUNG-JEN CHEN, WEI-HUA HAO, HAO-EN CHUEH, CHUNG-I CHANG Department of Computer Science and Information Engineering Tamang University,
More informationAlphabet Friendly FM Index
Alphabet Friendly FM Index Author: Rodrigo González Santiago, November 8 th, 2005 Departamento de Ciencias de la Computación Universidad de Chile Outline Motivations Basics Burrows Wheeler Transform FM
More informationPavel Zezula, Giuseppe Amato,
SIMILARITY SEARCH The Metric Space Approach Pavel Zezula Giuseppe Amato Vlastislav Dohnal Michal Batko Table of Content Part I: Metric searching in a nutshell Foundations of metric space searching Survey
More informationHub, Authority and Relevance Scores in Multi-Relational Data for Query Search
Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,
More informationTrip Oriented Search on Activity Trajectory
Chen W, Zhao L, Xu JJ et al. Trip oriented search on activity trajectory. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 3(4): 745 761 July 215. DOI 1.17/s1139-15-1558-6 Trip Oriented Search on Activity Trajectory
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More informationLecture 23 Branch-and-Bound Algorithm. November 3, 2009
Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal
More informationNearest Neighbor Search in Spatial
Nearest Neighbor Search in Spatial and Spatiotemporal Databases 1 Content 1. Conventional NN Queries 2. Reverse NN Queries 3. Aggregate NN Queries 4. NN Queries with Validity Information 5. Continuous
More informationBoolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).
Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval
More informationMining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data
Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional -Series Data Xiaolei Li, Jiawei Han University of Illinois at Urbana-Champaign VLDB 2007 1 Series Data Many applications produce time series
More informationProbabilistic Cardinal Direction Queries On Spatio-Temporal Data
Probabilistic Cardinal Direction Queries On Spatio-Temporal Data Ganesh Viswanathan Midterm Project Report CIS 6930 Data Science: Large-Scale Advanced Data Analytics University of Florida September 3 rd,
More informationLarge-Scale Behavioral Targeting
Large-Scale Behavioral Targeting Ye Chen, Dmitry Pavlov, John Canny ebay, Yandex, UC Berkeley (This work was conducted at Yahoo! Labs.) June 30, 2009 Chen et al. (KDD 09) Large-Scale Behavioral Targeting
More informationRobotics 2 Data Association. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard
Robotics 2 Data Association Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard Data Association Data association is the process of associating uncertain measurements to known tracks. Problem
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors Background and Two Challenges Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura McGill University, July 2007 1 / 29 Outline
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationcient and E ective Similarity Search based on Earth Mover s Distance
36th International Conference on Very Large Data Bases E cient and E ective Similarity Search over Probabilistic Data based on Earth Mover s Distance Jia Xu 1, Zhenjie Zhang 2, Anthony K.H. Tung 2, Ge
More informationTowards Collaborative Information Retrieval
Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationCompact Indexes for Flexible Top-k Retrieval
Compact Indexes for Flexible Top-k Retrieval Simon Gog Matthias Petri Institute of Theoretical Informatics, Karlsruhe Institute of Technology Computing and Information Systems, The University of Melbourne
More informationAlgorithms: COMP3121/3821/9101/9801
NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales TOPIC 4: THE GREEDY METHOD COMP3121/3821/9101/9801 1 / 23 The
More informationModern Information Retrieval
Text Classification, Modern Information Retrieval, Addison Wesley, 2009 p. 1/141 Modern Information Retrieval Chapter 5 Text Classification Introduction A Characterization of Text Classification Algorithms
More informationTerm Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan
Term Weighting and the Vector Space Model borrowing from: Pandu Nayak and Prabhakar Raghavan IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes
More informationMultimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data
1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
More informationAdapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts
Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Domenico Cantone Simone Faro Emanuele Giaquinta Department of Mathematics and Computer Science, University of Catania, Italy 1 /
More informationOn Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada
On Top-k Structural 1 Similarity Search Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China 2014/10/14 Pei
More informationThe Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan
More informationLecture 1b: Text, terms, and bags of words
Lecture 1b: Text, terms, and bags of words Trevor Cohn (based on slides by William Webber) COMP90042, 2015, Semester 1 Corpus, document, term Body of text referred to as corpus Corpus regarded as a collection
More informationTowards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann
Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich Moving Objects Intersection Finding Position at a future
More informationInformation Retrieval
Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture;
More informationHigh-Dimensional Indexing by Distributed Aggregation
High-Dimensional Indexing by Distributed Aggregation Yufei Tao ITEE University of Queensland In this lecture, we will learn a new approach for indexing high-dimensional points. The approach borrows ideas
More informationCMPT 310 Artificial Intelligence Survey. Simon Fraser University Summer Instructor: Oliver Schulte
CMPT 310 Artificial Intelligence Survey Simon Fraser University Summer 2017 Instructor: Oliver Schulte Assignment 3: Chapters 13, 14, 18, 20. Probabilistic Reasoning and Learning Instructions: The university
More informationINFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from
INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,
More information6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search
6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)
More information