A Survey on Spatial-Keyword Search

Size: px
Start display at page:

Download "A Survey on Spatial-Keyword Search"

Transcription

1 A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012

2 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms Strengths and Weaknesses Conclusion 2

3 A Survey on Spatial-Keyword Search PROBLEM DEFINITION AND MOTIVATION 3

4 Problem Definition and Motivation Definition: Spatio-Textual Object (STO) A STO o is an object that consist of a textual description (o.text) and a spatial location (o.loc). Example o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 4

5 Problem Definition and Motivation Definition: Spatial-Keyword Search (SKQ) Given a spatial constraint Q.loc and a set of keywords Q.text, a SKQ Q returns a list (ranked or not) of the STOs whose location satisfies Q.loc and whose textual description is relevant to Q.text. o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 5

6 Problem Definition and Motivation Definition: Spatial-Keyword Search (SKQ) Given a spatial constraint Q.loc and a set of keywords Q.text, a SKQ Q returns a list (ranked or not) of the STOs whose location satisfies Q.loc and whose textual description is relevant to Q.text. o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) q o 2 ( Italian Restaurant ) Current Location: (x, y) Keywords: Restaurant 1-NN o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Hong Kong Times Square 6

7 A Survey on Spatial-Keyword Search QUERY TYPES 7

8 Query Types Basic Spatial-Keyword Range Query (SKR) Spatial-Keyword Nearest Neighbor Query (SKNN) Top-k Spatial Keyword Query (Top-k SK) Variations Approximation Spatial-Keyword Range Query (ASK Range) Top-k Spatial Boolean query (ksb) Spatial-Keyword Auto-Completion (SK AC) 8

9 Query Types (Basic) Spatial-Keyword Range Query (SKR) Given a region Q.loc and a set of keywords Q.text, a SKR query Q returns all STOs within Q.loc that contain all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. Ans(Q) = {o 1, o 6 } 9

10 Query Types (Basic) Spatial-Keyword Range Query (SKR) Given a region Q.loc and a set of keywords Q.text, a SKR query Q returns all STOs within Q.loc that contain all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. Ans(Q) = {o 1, o 6 } 10

11 Query Types (Basic) Spatial-Keyword Nearest Neighbor Query (SKNN) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a SKNN query Q returns the k nearest STOs to Q.loc, whose textual description contains all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (500, 200), Q.text = { Restaurant, Chinese }, k = 1. Ans(Q): {o 1 }. 11

12 Query Types (Basic) Spatial-Keyword Nearest Neighbor Query (SKNN) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a SKNN query Q returns the k nearest STOs to Q.loc, whose textual description contains all keywords in Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) Q.loc o 3 ( Restaurante de France ) o 2 ( Italian Restaurant ) ( Chinese Restaurant ) o 6 o 4 ( Italian Ristorante ) Example Q.loc = (500, 200), Q.text = { Restaurant, Chinese }, k = 1. Ans(Q): {o 1 }. 12

13 Query Types (Basic) Top-k Spatial Keyword Query (Top-k SK) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a Top-k SK query Q returns a list of k STOs ranked according to a ranking function f based on their spatial proximity to Q.loc and on their textual similarity to Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example f = a f IR + (1-a) f SP, where a (0, 1), f IR : text similarity and f SP : spatial proximity. Q.loc = (500, 200), Q.text = { restaurant }, k = 2 and a = 0.5. Ans(Q) = {o 2, o 1 }. 13

14 Query Types (Basic) Top-k Spatial Keyword Query (Top-k SK) Given a query point Q.loc, a set of keywords Q.text and a positive integer k, a Top-k SK query Q returns a list of k STOs ranked according to a ranking function f based on their spatial proximity to Q.loc and on their textual similarity to Q.text o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) Q.loc o 3 ( Restaurante de France ) o 2 ( Italian Restaurant ) ( Chinese Restaurant ) o 6 o 4 ( Italian Ristorante ) Example f = a f IR + (1-a) f SP, where a (0, 1), f IR : text similarity and f SP : spatial proximity. Q.loc = (500, 200), Q.text = { restaurant }, k = 2 and a = 0.5. Ans(Q) = {o 2, o 1 }. 14

15 Query Types (Variations) Approximate Spatial-Keyword Range Query (ASK Range) Given a region Q.loc, a set of keywords Q.text and a positive integer t, an ASK Range query Q returns the STOs in Q.loc that contain similar strings to all keywords in Q.text with edit-distance threshold t o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)], Q.text = { Restaurant } and t = 1. Ans(Q) = {o 1, o 3, o 6 } 15

16 Query Types (Variations) Approximate Spatial-Keyword Range Query (ASK Range) Given a region Q.loc, a set of keywords Q.text and a positive integer t, an ASK Range query Q returns the STOs in Q.loc that contain similar strings to all keywords in Q.text with edit-distance threshold t o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = [(200, 0), (500, 300)], Q.text = { Restaurant } and t = 1. Ans(Q) = {o 1, o 3, o 6 } 16

17 Query Types (Variations) Top-k Spatial Boolean query (ksb) Given a query point Q.loc, a Boolean expression e on some query keywords and a positive integer k, a ksb query Q returns the k nearest STOs whose textual description satisfies e o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (400, 300), e = restaurant Chinese and k = 2. Ans(Q): {o 5, o 2 }. 17

18 Query Types (Variations) Top-k Spatial Boolean query (ksb) Given a query point Q.loc, a Boolean expression e on some query keywords and a positive integer k, a ksb query Q returns the k nearest STOs whose textual description satisfies e o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) Q.loc o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Q.loc = (400, 300), e = restaurant Chinese and k = 2. Ans(Q): {o 5, o 2 }. 18

19 Query Types (Variations) Spatial-Keyword Auto-Completion (SK AC) Given a point Q.loc, a string Q.text and a positive integer k, a SK AC query Q returns a list of k STOs, where Q.text is a prefix of their textual description, ranked according to a ranking function f that is based on their spatial proximity to p and several other static scores like popularity and ratings o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) Example Static scores are 1 for all STOs. Q.loc = (300, 0), Q.text = rest, k = 3. Ans(Q): {o 6, o 3, o 1 } 19

20 Query Types (Variations) Spatial-Keyword Auto-Completion (SK AC) Given a point Q.loc, a string Q.text and a positive integer k, a SK AC query Q returns a list of k STOs, where Q.text is a prefix of their textual description, ranked according to a ranking function f that is based on their spatial proximity to p and several other static scores like popularity and ratings o 7 ( Shoes ) o 9 ( Cinema ) o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 2 ( Italian Restaurant ) 100 o 10 ( bookstore ) o 3 ( Restaurante de France ) Example 0 ( Chinese Restaurant ) o 6 Q.loc o 4 ( Italian Ristorante ) Static scores are 1 for all STOs. Q.loc = (300, 0), Q.text = rest, k = 3. Ans(Q): {o 6, o 3, o 1 } 20

21 Query Types Query Type Spatial Categorization IR Categorization ASK Range Proximity Query Range SKR SKNN Contains all keywords Joint SKNN ASK knn Proximity Query ksb knn Boolean Query Model SK AC Auto-completion Top-k SK LkPT MkSK Moving knn Ranked Query Model RSTkNN Reverse knn mck GSK min-sum GSK min-max Aggregate Contains at least one keyword 21

22 A Survey on Spatial-Keyword Search QUERY PROCESSING TECHNIQUES 22

23 Indices Indexing Approach Query Processing Techniques On Answering SKR Query Separate indices for spatial and textual attributes SI: Separate Indices Approach Word Occurrences [Martins et al. GIR, 2005] o 7 ( Shoes ) n 4 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) n 0 Athletic o 8 Bookstore o 10 Chinese o 1, o 6 Cinema o 9 France o 3 Greek o 5 n 3 o 9 ( Cinema ) n 1 o 1 ( Chinese Restaurant ) n 5 o 2 ( Italian Restaurant ) Italian o 2, o 4 Restaurant o 1, o 2, o 5, o 6 Restaurante o 3 Ristorante o 4 o 10 ( bookstore ) ( Chinese Restaurant ) o 6 n 6 o 3 ( Restaurante de France ) n 2 o 4 ( Italian Ristorante ) Shoes o 7, o 8 n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 7 o 9 o 10 o 5 o 8 o 4 o 6 o 1 o 2 o 3 23

24 Query Processing Techniques On Answering SKR Query SI: Separate Indices Approach Query Processing 1 st Approach [Martins et al. GIR, 2005] Ans SP (Q.loc) = Answer to spatial constrains. Ans IR (Q.text) = Answer to textual constrains. Ans(Q) = Ans SP (Q.loc) Ans IR (Q.text) 2 nd Approach [Chen et al. SIGMOD, 2006] Ans SP (Q.loc) = Answer to spatial constrains. For each o Ans SP (Q.loc) check if o satisfies the textual constraints. Pros it is easy to maintain the two indices, it supports both textual, spatial and SKQ queries and updates and optimizations are performed in each index independently. Cons the pruning power of query processing algorithms is poor and leads to high I/O cost. 24

25 Indexing Approach Query Processing Techniques On Answering SKR Query Inverted Files + Grids: words occurrences contains spatial information. TS: Textual - Spatial Approach [Vaid et al. SSTD, 2005] o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) o 2 ( Italian Restaurant ) Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) Cell: c 5,1 25

26 Indexing Approach Query Processing Techniques On Answering SKR Query Inverted Files + Grids: words occurrences contains spatial information. TS: Textual - Spatial Approach [Vaid et al. SSTD, 2005] o 7 ( Shoes ) o 9 ( Cinema ) o 10 ( bookstore ) ( Chinese Restaurant ) o 6 o 8 ( Athletic Shoes ) o 5 ( Greek Restaurant ) o 1 ( Chinese Restaurant ) o 3 ( Restaurante de France ) o 4 ( Italian Ristorante ) o 2 ( Italian Restaurant ) Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) Cell: c 5,1 26

27 Query Processing Techniques On Answering SKR Query TS: Textual - Spatial Approach Query Processing Approach A = {c cid R c cid Q.loc } For each keyword t Q.text get the STOs such that their cell id is in A. Finally, get the intersection. Example: Q.loc = [(200, 0), (400, 200)] and Q.text = { Chinese, Restaurant }. A = {c 3,1, c 3,2, c 4,1, c 4,2 }. Evaluation: Better performance than SI approach. Word Occurrences Athletic (c 3,4, {o 8 }) Bookstore (c 1,2, {o 10 }) Chinese (c 4,3, {o 1 }), (c 3,1, {o 6 }) Cinema (c 1,3, {o 9 }) France (c 4,2, {o 3 }) Greek (c 3,4, {o 5 }) Italian (c 6,3, {o 2 }), (c 5,1, {o 4 }) Restaurant (c 4,3, {o 1 }), (c 6,3, {o 2 }), (c 3,4, {o 5 }), (c 3,1, {o 6 }) Restaurante (c 4,2, {o 3 }) Ristorante (c 5,1, {o 4 }) Shoes (c 1,4, {o 7 }), (c 3,4, {o 8 }) 27

28 Indexing Approach Query Processing Techniques On Answering SKR Query Hybrid Index: Inverted File R*-Trees, R*-Tree Inverted Files [Zhou et al. CIKM, 2005] R*-Tree Inverted Files Inverted File R*-Trees n 0 Athletic..... Restaurant n 1 n 2 R*-Trees n 0 R*-Trees n 3 n 4 n 5 n 6 n 1 n 2 Bookstore Shoes Cinema {o 7 } {o 9 } {o 10 } Chinese Italian Restaurant Ristorante {o 6 } {o 4 } {o 6 } {o 4 } {o 1 } {o 2 } {o 5 } {o 6 } Athletic Greek Restaurant Shoes Chinese France Italian Restaurant Restaurante {o 8 } {o 5 } {o 8 } {o 5 } {o 3 } {o 2 } {o 1 } {o 1, o 2 } {o 3 } 28

29 Query Processing Techniques On Answering SKR Query Query Processing Example: Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. R*-Tree Inverted Files Inverted File R*-Trees n 0 Chinese..... Restaurant n 1 n 2 R*-Trees R*-Trees n 3 n 4 n 5 n 6 n 0 n 0 Bookstore Shoes Cinema Chinese Italian Restaurant Ristorante n 1 n 2 n 1 n 2 {o 7 } {o 9 } {o 10 } Athletic Greek Restaurant Shoes {o {o 4 } 6 } {o 6 } {o 4 } Chinese France Italian Restaurant Restaurante {o 1 } {o 6 } {o 1 } {o 2 } {o 5 } {o 6 } {o 8 } {o 5 } {o 8 } {o 5 } {o 3 } {o 2 } {o 1 } {o 1, o 2 } {o 3 } 29

30 Query Processing Techniques On Answering SKR Query R*-Tree Inverted Files & Inverted File R*-Trees Pros They reduce I/O cost and query processing. The Inverted File R*-Tree outperforms the R*-Tree Inverted File. Cons Inverted File R*-Tree: It does not exploit the spatial distribution of keywords. R*-Tree Inverted File: The pruning process is not powerful as we can get a lot of candidates from spatial restrictions. 30

31 Indexing Approach Query Processing Techniques On Answering SKR Query R*-Tree where the internal nodes are virtually augmented with textual information KR*-Tree: Keyword R*-Tree Approach [Hariharan et al. SSDBM, 2007] n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 1 o 2 o 3 o 7 o 9 o 10 o 5 o 8 o 4 o 6 Word Nodes Athletic n 0, n 1, n 4 Bookstore n 0, n 1, n 3 Chinese n 0, n 2, n 5, n 6 Cinema n 0, n 1, n 3 France n 0, n 2, n 5 Greek n 0, n 1, n 4 Italian n 0, n 2, n 5, n 6 Restaurant n 0, n 1, n 2, n 4, n 5, n 6 Restaurante n 0, n 2, n 5 Ristorante n 0, n 2, n 6 Shoes n 0, n 1, n 3, n 4 31

32 Query Processing Techniques On Answering SKR Query KR*-Tree Approach Query Processing Example: Q.loc = [(200, 0), (500, 300)] and Q.text = { Chinese, Restaurant }. n 0 n 1 n 2 n 3 n 4 n 5 n 6 o 1 o 2 o 3 o 7 o 9 o 10 o 5 o 8 o 4 o 6 Word Nodes Athletic n 0, n 1, n 4 Bookstore n 0, n 1, n 3 Chinese n 0, n 2, n 5, n 6 Cinema n 0, n 1, n 3 France n 0, n 2, n 5 Greek n 0, n 1, n 4 Italian n 0, n 2, n 5, n 6 Restaurant n 0, n 1, n 2, n 4, n 5, n 6 Restaurante n 0, n 2, n 5 Ristorante n 0, n 2, n 6 Shoes n 0, n 1, n 3, n 4 Experimental evaluation: More efficient than the previous approaches. 32

33 Indexing Approach Query Processing Techniques On Answering SKNN Query R-Tree & Signature Files IR 2 -Tree: Information Retrieval R-Tree Approach [De Felipe et al. ICDE, 2008] Block1: Signature Files A text has many words Hash(text) = Hash(many)= Hash(words)= Signature (Block1) = False Positives: Block2: OR A text has many Signature (Block2) = words Block n 0 n 1 n 2 n 3 n 4 n 5 n IR 2 -Tree: o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 33

34 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 0,

35 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 2,

36 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 5, 0.0 N 6,

37 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: N 6, o 1,

38 Query Processing Techniques On Answering SKNN Query IR 2 -Tree Approach Query Processing n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Example: Q.loc = (500, 200), Q.text = { restaurant, Chinese } and k = 1. Signature(Q.text): Priority Queue: o 1, o 6,

39 Query Processing Techniques On Answering SKNN Query Multi-level IR 2 -Tree (MIR 2 -Tree) Uses different signature lengths for different levels. More complex update operations Fewer False Positives Experimental evaluation: MIR 2 -Tree is more efficient than IR 2 -Tree. False Positives. No comparison with other indices. 39

40 Indexing Approach Query Processing Techniques On Answering Top-k SK Query R-Tree where the internal nodes are augmented with textual & scoring information IR-Tree: Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Inverted file I 0 n 0 n 1 n 2 Inverted file I 1 Inverted file I 2 n 3 n 4 n 5 n 6 I 3 I 4 I 5 I 6 o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 Inverted file I 3 Cinema: <o 9, 3> Bookstore: <o 10, 2> Shoes: <o 7, 2> Inverted file I 1 Cinema: <n 3, 3> Bookstore: <n 3, 2> Shoes: <n 3, 2>, <n 4, 3> Greek: <n 4, 7> Restaurant: <n 4, 2> 40

41 IR-Tree Approach Query Processing Techniques On Answering Top-k SK Query Query Processing Similar to the IR 2 -Tree query processing algorithm (Branch-and-Bound) They propose two metrics for scoring rectangles and objects. MIND ST (Q, N) D ST (Q, O) Both metrics are based on Euclidean distances and IR scores. Shortcoming n 1 Inverted file I 1 n 3 n 4 I 3 I 4 o 7 o 9 o 3 o 1 o 2 o 1 : restaurant, weight 5 o 2 : Chinese, weight 3 o 3 : restaurant, weight 4 o 3 : Chinese, weight 2 41

42 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: At construction time, take into account both the area parameter and the textual similarity between the indexed objects. DIR-Tree: Document Inverted file R-Tree Approach IR-Tree (similar to R-Tree) Area cost of inserting STO o into rectangle R: DIR-Tree [Cong et al. PVLDB, 2009] AreaCost(R) = area(r ) area(r) area(r ): area of R after inserting o SimAreaCost(R, o) = a AreaCost(R) maxarea + (1 a)(1 IRsim(R. text, o. text)) a (0, 1) Query Processing: Similar to IR-Tree. Experimental Evaluation: More efficient. 42

43 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: Enhance IR-Tree with clustering. CIR-Tree: Clustering Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Goal: Tighter bounds for rectangles i.e., MIND ST. STO where word w o.text STO where word w o.text N (node of IR-Tree) Query Processing: Similar to IR-Tree. Experimental Evaluation: Most efficient. 43

44 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Enhanced indexing: Enhance IR-Tree with clustering. CIR-Tree: Clustering Inverted file R-Tree Approach [Cong et al. PVLDB, 2009] Goal: Tighter bounds for rectangles i.e., MIND ST. C 1 C 2 STO where word w o.text STO where word w o.text N (node of IR-Tree) Query Processing: Similar to IR-Tree. Experimental Evaluation: Most efficient. 44

45 Indexing Approach Query Processing Techniques On Answering Top-k SK Query R-Tree where the internal nodes are augmented with textual & scoring information. IR-Tree2 d 0 n 0 [Li et al. TKDE, 2010] n 1 n 2 d 1 d 2 n 3 n 4 n 5 n 6 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I 10 max tf Inverted file I 1 Shoes: o 7 d 1 < A 1 = [(40, 120), (280, 390)] D 1 = 5 M 1 = Cinema: 1, 1 Bookstore: 1, 1 Shoes: 2, 1 Greek: 1, 1 Restaurant: 1, 1 > df Cinema: 1, 1 45

46 Query Processing Techniques On Answering Top-k SK Query IR-Tree2 Query Processing! Q.loc = region, spatial proximity to the center of Q.loc Step 1: Compute idf = D Q.loc /D Q.loc,t, where D Q.loc is the number of STOs located in Q.loc and D Q.loc,t is the number of STOs in Q.loc s.t. t o.text. Step 2: Similar to IR-Tree. 46

47 Indexing Approach Query Processing Techniques On Answering Top-k SK Query Inverted File Aggregate Spatial Index S2I: Spatial Inverted Index Approach [Rocha-Junior et al. SSTD, 2011] Infrequent term Frequent term Athletic Restaurant o 10, o 10.loc, 9. o 1, o 1.loc, 4 n n 1 n n 3 n 4 n 5 n o 7 o 9 o 10 o 5 o 8 o 1 o 2 o 3 o 4 o 6 47

48 Query Processing Techniques On Answering Top-k SK Query S2I Approach Query Processing Threshold Algorithm Round Robin Strategy Q.text = 2, k = 1 Let S i be a source responsible for the i th query keyword. Given the Q.loc and a positive integer j, S i returns the STO o with the j th highest score. Iteration j S 1 S 2 UB 1 UB 2 UB unseen 1 o 1 o o 4 o o 3 o o 9 o Experimental Evaluation: More efficient than DIR-Tree. No comparison with CIR-Tree. 48

49 A Survey on Spatial-Keyword Search CONCLUSION 49

50 Conclusion Spatial-keyword queries. Indexing schemes and the corresponding algorithms for query processing. Pros and cons of each proposed framework. There are still much left to cover on this topic There is no experimental evaluation that compares the frameworks properly. Spatial-keyword search on road networks. Combination of two indices! 50

51 Questions 51

Nearest Neighbor Search with Keywords in Spatial Databases

Nearest Neighbor Search with Keywords in Spatial Databases 776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,

More information

Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates

Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates X I A O YA N G W A N G 1, Y I N G Z H A N G 2, W E N J I E Z H A N G 1, X U E M I N L I N 1, M U H A M M A D A A M I R C H

More information

Spatial Keyword Search by using Point Regional QuadTree Algorithm

Spatial Keyword Search by using Point Regional QuadTree Algorithm Spatial Keyword Search by using Point Regional QuadTree Algorithm Ms Harshitha A.R 1, Smt. Christy Persya A 2 PG Scholar, Department of ISE (SSE), BNMIT, Bangalore 560085, Karnataka, India 1 Associate

More information

Indexing Structures for Geographic Web Retrieval

Indexing Structures for Geographic Web Retrieval Indexing Structures for Geographic Web Retrieval Leonardo Andrade 1 and Mário J. Silva 1 University of Lisboa, Faculty of Sciences Abstract. Context-aware search in mobile web environments demands new

More information

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

Natural Language Processing. Topics in Information Retrieval. Updated 5/10 Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

Nearest Neighbor Search with Keywords

Nearest Neighbor Search with Keywords Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,

More information

Multi-Approximate-Keyword Routing Query

Multi-Approximate-Keyword Routing Query Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA Outline 1 Introduction

More information

Boolean and Vector Space Retrieval Models

Boolean and Vector Space Retrieval Models Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1

More information

Retrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1

Retrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1 Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency Srihari: CSE 626 1 Text Retrieval Retrieval of text-based information is referred to as Information Retrieval (IR)

More information

Dealing with Text Databases

Dealing with Text Databases Dealing with Text Databases Unstructured data Boolean queries Sparse matrix representation Inverted index Counts vs. frequencies Term frequency tf x idf term weights Documents as vectors Cosine similarity

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic

More information

Efficient Continuously Moving Top-K Spatial Keyword Query Processing

Efficient Continuously Moving Top-K Spatial Keyword Query Processing Efficient Continuously Moving Top-K Spatial Keyword Query Processing Dingming Wu Man Lung Yiu 2 Christian S. Jensen 3 Gao Cong 4 Department of Computer Science, Aalborg University, Denmark, dingming@cs.aau.dk

More information

Prototype for Finding Hotels in the Neighborhood using FNNS

Prototype for Finding Hotels in the Neighborhood using FNNS Prototype for Finding Hotels in the Neighborhood using FNNS Pradeep N 1 1M.Tech Scholar,Dept. of CSE, Dr. AIT, Bangalore, India -------------------------------------------------------------------***---------------------------------------------------------------------

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Retrieval Models and Implementation Ulf Leser Content of this Lecture Information Retrieval Models Boolean Model Vector Space Model Inverted Files Ulf Leser: Maschinelle

More information

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics Querying

More information

K-Nearest Neighbor Temporal Aggregate Queries

K-Nearest Neighbor Temporal Aggregate Queries Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research,

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal

More information

Spatial Database. Ahmad Alhilal, Dimitris Tsaras

Spatial Database. Ahmad Alhilal, Dimitris Tsaras Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information

More information

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002 CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope

More information

P-LAG: Location-aware Group Recommendation for Passive Users

P-LAG: Location-aware Group Recommendation for Passive Users P-LAG: Location-aware Group Recommendation for Passive Users Yuqiu Qian, Ziyu Lu, Nikos Mamoulis, and David W. Cheung The University of Hong Kong {yqqian, zylu, dcheung}@cs.hku.hk University of Ioannina

More information

Information Retrieval. Lecture 6

Information Retrieval. Lecture 6 Information Retrieval Lecture 6 Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces This lecture

More information

Efficient Safe-Region Construction for Moving Top-K Spatial Keyword Queries

Efficient Safe-Region Construction for Moving Top-K Spatial Keyword Queries Efficient Safe-Region Construction for Moving Top-K Spatial Keyword Queries Weihuang Huang Guoliang Li Kian-Lee Tan Jianhua Feng Department of Computer Science and Technology, Tsinghua National Laboratory

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 11: Probabilistic Information Retrieval 1 Outline Basic Probability Theory Probability Ranking Principle Extensions 2 Basic Probability Theory For events A

More information

Tailored Bregman Ball Trees for Effective Nearest Neighbors

Tailored Bregman Ball Trees for Effective Nearest Neighbors Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University 3 We are given a set of training examples, consisting of input-output pairs (x,y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some

More information

Chap 2: Classical models for information retrieval

Chap 2: Classical models for information retrieval Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic

More information

Querying. 1 o Semestre 2008/2009

Querying. 1 o Semestre 2008/2009 Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the

More information

Classification. Team Ravana. Team Members Cliffton Fernandes Nikhil Keswaney

Classification. Team Ravana. Team Members Cliffton Fernandes Nikhil Keswaney Email Classification Team Ravana Team Members Cliffton Fernandes Nikhil Keswaney Hello! Cliffton Fernandes MS-CS Nikhil Keswaney MS-CS 2 Topic Area Email Classification Spam: Unsolicited Junk Mail Ham

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Recap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors?

Recap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors? CS276A Information Retrieval Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces Lecture 7 This

More information

CS540 ANSWER SHEET

CS540 ANSWER SHEET CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com

More information

Query Processing in Spatial Network Databases

Query Processing in Spatial Network Databases Temporal and Spatial Data Management Fall 0 Query Processing in Spatial Network Databases SL06 Spatial network databases Shortest Path Incremental Euclidean Restriction Incremental Network Expansion Spatial

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture; IIR Sections

More information

SPATIAL INDEXING. Vaibhav Bajpai

SPATIAL INDEXING. Vaibhav Bajpai SPATIAL INDEXING Vaibhav Bajpai Contents Overview Problem with B+ Trees in Spatial Domain Requirements from a Spatial Indexing Structure Approaches SQL/MM Standard Current Issues Overview What is a Spatial

More information

Ranking Sequential Patterns with Respect to Significance

Ranking Sequential Patterns with Respect to Significance Ranking Sequential Patterns with Respect to Significance Robert Gwadera, Fabio Crestani Universita della Svizzera Italiana Lugano, Switzerland Abstract. We present a reliable universal method for ranking

More information

Information Retrieval

Information Retrieval Information Retrieval Online Evaluation Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Classification: k nearest neighbors Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 28 Introduction Classification = supervised method

More information

Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments

Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Reynold Cheng, Dmitri V. Kalashnikov Sunil Prabhakar The Hong Kong Polytechnic University, Hung Hom, Kowloon,

More information

Lecture 2 September 4, 2014

Lecture 2 September 4, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the

More information

CAIM: Cerca i Anàlisi d Informació Massiva

CAIM: Cerca i Anàlisi d Informació Massiva 1 / 21 CAIM: Cerca i Anàlisi d Informació Massiva FIB, Grau en Enginyeria Informàtica Slides by Marta Arias, José Balcázar, Ricard Gavaldá Department of Computer Science, UPC Fall 2016 http://www.cs.upc.edu/~caim

More information

Hash-based Indexing: Application, Impact, and Realization Alternatives

Hash-based Indexing: Application, Impact, and Realization Alternatives : Application, Impact, and Realization Alternatives Benno Stein and Martin Potthast Bauhaus University Weimar Web-Technology and Information Systems Text-based Information Retrieval (TIR) Motivation Consider

More information

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2

More information

Predicting Neighbor Goodness in Collaborative Filtering

Predicting Neighbor Goodness in Collaborative Filtering Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:

More information

A Transformation-based Framework for KNN Set Similarity Search

A Transformation-based Framework for KNN Set Similarity Search SUBMITTED TO IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 A Transformation-based Framewor for KNN Set Similarity Search Yong Zhang Member, IEEE, Jiacheng Wu, Jin Wang, Chunxiao Xing Member, IEEE

More information

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from http://www.mmds.org Distance Measures For finding similar documents, we consider the Jaccard

More information

Efficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks

Efficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks Efficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks Ugur Demiryurek, Farnoush Banaei-Kashani, and Cyrus Shahabi University of Southern California Department of Computer Science Los Angeles,

More information

16 The Information Retrieval "Data Model"

16 The Information Retrieval Data Model 16 The Information Retrieval "Data Model" 16.1 The general model Not presented in 16.2 Similarity the course! 16.3 Boolean Model Not relevant for exam. 16.4 Vector space Model 16.5 Implementation issues

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

TASM: Top-k Approximate Subtree Matching

TASM: Top-k Approximate Subtree Matching TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,

More information

arxiv: v2 [cs.db] 5 May 2011

arxiv: v2 [cs.db] 5 May 2011 Inverse Queries For Multidimensional Spaces Thomas Bernecker 1, Tobias Emrich 1, Hans-Peter Kriegel 1, Nikos Mamoulis 2, Matthias Renz 1, Shiming Zhang 2, and Andreas Züfle 1 arxiv:113.172v2 [cs.db] May

More information

Multi-Dimensional Top-k Dominating Queries

Multi-Dimensional Top-k Dominating Queries Noname manuscript No. (will be inserted by the editor) Multi-Dimensional Top-k Dominating Queries Man Lung Yiu Nikos Mamoulis Abstract The top-k dominating query returns k data objects which dominate the

More information

Algorithms for Calculating Statistical Properties on Moving Points

Algorithms for Calculating Statistical Properties on Moving Points Algorithms for Calculating Statistical Properties on Moving Points Dissertation Proposal Sorelle Friedler Committee: David Mount (Chair), William Gasarch Samir Khuller, Amitabh Varshney January 14, 2009

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-2018 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 2 1 Boolean Retrieval Thus

More information

Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson

Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics

More information

Multi-objective branch-and-cut algorithm and multi-modal traveling salesman problem

Multi-objective branch-and-cut algorithm and multi-modal traveling salesman problem Multi-objective branch-and-cut algorithm and multi-modal traveling salesman problem Nicolas Jozefowiez 1, Gilbert Laporte 2, Frédéric Semet 3 1. LAAS-CNRS, INSA, Université de Toulouse, Toulouse, France,

More information

TDDD43. Information Retrieval. Fang Wei-Kleiner. ADIT/IDA Linköping University. Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1

TDDD43. Information Retrieval. Fang Wei-Kleiner. ADIT/IDA Linköping University. Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1 TDDD43 Information Retrieval Fang Wei-Kleiner ADIT/IDA Linköping University Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1 Outline 1. Introduction 2. Inverted index 3. Ranked Retrieval tf-idf

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-017 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 1 Boolean Retrieval Thus far,

More information

Nearest Neighbor Search with Keywords: Compression

Nearest Neighbor Search with Keywords: Compression Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization

More information

Mining Strong Positive and Negative Sequential Patterns

Mining Strong Positive and Negative Sequential Patterns Mining Strong Positive and Negative Sequential Patter NANCY P. LIN, HUNG-JEN CHEN, WEI-HUA HAO, HAO-EN CHUEH, CHUNG-I CHANG Department of Computer Science and Information Engineering Tamang University,

More information

Alphabet Friendly FM Index

Alphabet Friendly FM Index Alphabet Friendly FM Index Author: Rodrigo González Santiago, November 8 th, 2005 Departamento de Ciencias de la Computación Universidad de Chile Outline Motivations Basics Burrows Wheeler Transform FM

More information

Pavel Zezula, Giuseppe Amato,

Pavel Zezula, Giuseppe Amato, SIMILARITY SEARCH The Metric Space Approach Pavel Zezula Giuseppe Amato Vlastislav Dohnal Michal Batko Table of Content Part I: Metric searching in a nutshell Foundations of metric space searching Survey

More information

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,

More information

Trip Oriented Search on Activity Trajectory

Trip Oriented Search on Activity Trajectory Chen W, Zhao L, Xu JJ et al. Trip oriented search on activity trajectory. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 3(4): 745 761 July 215. DOI 1.17/s1139-15-1558-6 Trip Oriented Search on Activity Trajectory

More information

Ranked Retrieval (2)

Ranked Retrieval (2) Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF

More information

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009 Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal

More information

Nearest Neighbor Search in Spatial

Nearest Neighbor Search in Spatial Nearest Neighbor Search in Spatial and Spatiotemporal Databases 1 Content 1. Conventional NN Queries 2. Reverse NN Queries 3. Aggregate NN Queries 4. NN Queries with Validity Information 5. Continuous

More information

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval

More information

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional -Series Data Xiaolei Li, Jiawei Han University of Illinois at Urbana-Champaign VLDB 2007 1 Series Data Many applications produce time series

More information

Probabilistic Cardinal Direction Queries On Spatio-Temporal Data

Probabilistic Cardinal Direction Queries On Spatio-Temporal Data Probabilistic Cardinal Direction Queries On Spatio-Temporal Data Ganesh Viswanathan Midterm Project Report CIS 6930 Data Science: Large-Scale Advanced Data Analytics University of Florida September 3 rd,

More information

Large-Scale Behavioral Targeting

Large-Scale Behavioral Targeting Large-Scale Behavioral Targeting Ye Chen, Dmitry Pavlov, John Canny ebay, Yandex, UC Berkeley (This work was conducted at Yahoo! Labs.) June 30, 2009 Chen et al. (KDD 09) Large-Scale Behavioral Targeting

More information

Robotics 2 Data Association. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard

Robotics 2 Data Association. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard Robotics 2 Data Association Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard Data Association Data association is the process of associating uncertain measurements to known tracks. Problem

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

Algorithms for Nearest Neighbors

Algorithms for Nearest Neighbors Algorithms for Nearest Neighbors Background and Two Challenges Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura McGill University, July 2007 1 / 29 Outline

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

cient and E ective Similarity Search based on Earth Mover s Distance

cient and E ective Similarity Search based on Earth Mover s Distance 36th International Conference on Very Large Data Bases E cient and E ective Similarity Search over Probabilistic Data based on Earth Mover s Distance Jia Xu 1, Zhenjie Zhang 2, Anthony K.H. Tung 2, Ge

More information

Towards Collaborative Information Retrieval

Towards Collaborative Information Retrieval Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

Compact Indexes for Flexible Top-k Retrieval

Compact Indexes for Flexible Top-k Retrieval Compact Indexes for Flexible Top-k Retrieval Simon Gog Matthias Petri Institute of Theoretical Informatics, Karlsruhe Institute of Technology Computing and Information Systems, The University of Melbourne

More information

Algorithms: COMP3121/3821/9101/9801

Algorithms: COMP3121/3821/9101/9801 NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales TOPIC 4: THE GREEDY METHOD COMP3121/3821/9101/9801 1 / 23 The

More information

Modern Information Retrieval

Modern Information Retrieval Text Classification, Modern Information Retrieval, Addison Wesley, 2009 p. 1/141 Modern Information Retrieval Chapter 5 Text Classification Introduction A Characterization of Text Classification Algorithms

More information

Term Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan

Term Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan Term Weighting and the Vector Space Model borrowing from: Pandu Nayak and Prabhakar Raghavan IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes

More information

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data 1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

More information

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Domenico Cantone Simone Faro Emanuele Giaquinta Department of Mathematics and Computer Science, University of Catania, Italy 1 /

More information

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada On Top-k Structural 1 Similarity Search Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China 2014/10/14 Pei

More information

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan

More information

Lecture 1b: Text, terms, and bags of words

Lecture 1b: Text, terms, and bags of words Lecture 1b: Text, terms, and bags of words Trevor Cohn (based on slides by William Webber) COMP90042, 2015, Semester 1 Corpus, document, term Body of text referred to as corpus Corpus regarded as a collection

More information

Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann

Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich Moving Objects Intersection Finding Position at a future

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture;

More information

High-Dimensional Indexing by Distributed Aggregation

High-Dimensional Indexing by Distributed Aggregation High-Dimensional Indexing by Distributed Aggregation Yufei Tao ITEE University of Queensland In this lecture, we will learn a new approach for indexing high-dimensional points. The approach borrows ideas

More information

CMPT 310 Artificial Intelligence Survey. Simon Fraser University Summer Instructor: Oliver Schulte

CMPT 310 Artificial Intelligence Survey. Simon Fraser University Summer Instructor: Oliver Schulte CMPT 310 Artificial Intelligence Survey Simon Fraser University Summer 2017 Instructor: Oliver Schulte Assignment 3: Chapters 13, 14, 18, 20. Probabilistic Reasoning and Learning Instructions: The university

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information