Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates

Size: px

Start display at page:

Download "Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates"

Camilla Bond
6 years ago
Views:

3 1. T H E U N I V E R S I T Y O F N E W S O U T H WA L E S, A U S T R A L I A 2.

1 Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates X I A O YA N G W A N G 1, Y I N G Z H A N G 2, W E N J I E Z H A N G 1, X U E M I N L I N 1, M U H A M M A D A A M I R C H E E M A 3 1. T H E U N I V E R S I T Y O F N E W S O U T H WA L E S, A U S T R A L I A 2. U N I V E R S I T Y O F T E C H N O L O G Y, S Y D N E Y, A U S T R A L I A 3. M O N A S H U N I V E R S I T Y, A U S T R A L I A

2 Outline Introduction Related Work Spatial Dominance Operators Experiments Conclusion 2

3 Introduction Objects with multiple instances are widely used. Uncertain object (instances are exclusive), e.g., uncertain spatial objects. Multi-valued object (instances are co-occurrence), e.g., NBA player records. Player C a a c Player D longitude a a rebound d c c d b d c latitude score 3

4 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. Easy for objects with single instance when metric is given. Query L2 4

5 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. Easy for objects with single instance when metric is given. Query L2 Query L2 5

6 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. Easy for objects with single instance when metric is given. Query L2 Query L2 6

7 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. Min/Max, Expected, Quantile. NN probability, Expected rank. Earth Mover s, Netflow. 7

8 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. N 1 : all pairs based NN for multi-valued or uncertain object E.g., Min/Max, Expected, Quantile. N 2 : possible world based NN for uncertain object E.g., NN probability, Expected rank. N 3 : selected pairs based NN for multi-valued or uncertain object E.g., Earth Mover s, Netflow. 8

9 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. N 1 : all pairs based NN for multi-valued or uncertain object E.g., Min/Max, Expected, Quantile. N 2 : possible world based NN for uncertain object E.g., NN probability, Expected rank. N 3 : selected pairs based NN for multi-valued or uncertain object E.g., Earth Mover s, Netflow. 9

10 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 1. Expected (all pairs based NN) q a a b A Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q E dis A, Q = = 5.75 b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q B Q E dis B, Q = =

11 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 2. NN probability (possible world based NN) a a q b A Q B Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q q

12 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 2. NN probability (possible world based NN) a a q q b A Q B Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q W 1 : a 1 b 1 q 1 -> p a,w1 = 1/8, p b,w1 = 0 12

13 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 2. NN probability (possible world based NN) a a q q b A Q B Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q W 1 : a 1 b 1 q 1 -> p a,w1 = 1/8, p b,w1 = 0 W 2 : a 2 b 1 q 1 -> p a,w2 = 0, p b,w2 = 1/8 13

14 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 2. NN probability (possible world based NN) a a q q b A Q B Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q W 1 : a 1 b 1 q 1 -> p a,w1 = 1/8, p b,w1 = 0 W 2 : a 2 b 1 q 1 -> p a,w2 = 0, p b,w2 = 1/8 => W 8 : a 2 b 2 q 2 -> p a,w8 = 1/8, p b,w8 = 0 NN p A, Q = p a,wi = 6/8 NN p B, Q = p b,wi = 2/8 14

15 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 3. Earth Mover s (selected pairs based NN ) a A Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q a b a 1 1, 0.5 q 1 q q a 2 9, 0.5 q 2 15

16 Introduction Nearest Neighbour (NN) search: Given a query object Q, return the nearest object to the query. There are a lot of NN ranking functions for objects with multiple instances. 3. Earth Mover s (selected pairs based NN ) a A Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q a b a 1 1, 0.5 q 1 q => EMD(A, Q) = 1*0.5+9*0.5 = 5 q a 2 9, 0.5 q 2 16

17 Introduction There are a lot of NN ranking functions for objects with multiple instances. The parameters in NN function can be set to infinite value, e.g., quantile. 17

18 Introduction There are a lot of NN ranking functions for objects with multiple instances. The parameters in NN function can be set to infinite value, e.g., quantile. Motivation A user may not have a specific NN function in mind, it is desirable to provide her with a set of NN candidates. 18

19 Introduction A spatial dominate (SD) operator is used to define the partial order between objects regarding a query Q, i.e., SD(A, B, Q) means A dominates B, otherwise SD(A, B, Q). Given a SD operator, the NN candidate set consists of the objects that are not dominated by any other objects. Optimal SD operator w.r.t a family F of NN functions Correctness: SD(A, B, Q) f F, that f (A) f (B) Completeness: SD(A, B, Q) f F, that f (A) > f (B) 19

20 Related Work Full dominance operator (F-SD) We have FSD(A,B,Q), if and only if q Q, max A, q min (B, q) (SIGMOD10, SIGMOD14) Cheng Long, et. al, The Hong Kong University of Science and Technology Tobias Emrich, et. al, Institute for Informatics, Ludwig-Maximilians-Universität München FSD B, A, Q FSD(B, C, Q) C B c1 q1 Q Object C b1 A Over pessimistic: contain many redundant objects (no completeness property) 20

21 Related Work NN Core (TKDE10) Sze Man Yuen, et. al, Chinese University of Hong Kong A B: if A has higher chance to be closer to the query than B; NN candidates : Minimal set of objects, each of which beats any object not in the NN candidates. NN-Core A Object C a q c b c B C b a Too aggressive: violate correctness property 21

22 Related Work Stochastic order has been widely used in various domains to compare the goodness of two random variables distributions. Stochastic Order. Given two independent random variables X and Y, we say X is smaller than Y in usual stochastic order, denoted by X st Y, if Pr(X λ) Pr(Y λ) for every λ R. A Q a 1 q 1 a 1 q 2 a 2 q 1 a 2 q B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q Pr A Q 1.5 = 0.25; Pr B Q 1.5 = 0 Pr A Q 10 = 1; Pr B Q 10 =

23 Problem Definition Objects with multiple instances An object U consists of a set {u i } of instances, and a discrete probability mass function assigns each instance u i a probability value, denoted by p(u i ), where p u i = 1. The multi-valued objects are treated as discrete random variable if their weight can be normalized. Problem statement Devise spatial dominance (SD) operators by carefully considering various NN function families. 23

24 Classify NN Functions N 1 : all pairs based NN Aggregation over pair-wise s g(a Q ), e.g., Min/Max, Expected, Quantile. N 2 : possible world based NN Aggregation over score on each possible world g(a W ), e.g., NN probability, Expected rank. N 3 : selected pairs based NN Aggregation over selected pairs g(s(a Q )), e.g., Earth Mover s, Netflow. 24

25 SD Operator Stochastic-SD (S-SD, opt. w.r.t N 1 ) Given two objects U and V, and the query Q, we have SSD(U,V,Q) if and only if U Q st V Q and U Q V Q. Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 25

26 SD Operator Stochastic-SD (S-SD, opt. w.r.t N 1 ) Given two objects U and V, and the query Q, we have SSD(U,V,Q) if and only if U Q st V Q and U Q V Q. Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 26

27 SD Operator Stochastic-SD (S-SD, opt. w.r.t N 1 ) Given two objects U and V, and the query Q, we have SSD(U,V,Q) if and only if U Q st V Q and U Q V Q. Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 27

28 SD Operator Stochastic-SD (S-SD, opt. w.r.t N 1 ) Given two objects U and V, and the query Q, we have SSD(U,V,Q) if and only if U Q st V Q and U Q V Q. Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 28

29 SD Operator Strict Stochastic-SD (SS-SD, opt. w.r.t N 1,2 ) Given two objects U and V, and the query Q, we have SSSD(U,V,Q) if and only if U q st V q for q Q and U Q V Q. A q1 A q2 Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 C q2 C q1 29

30 SD Operator Strict Stochastic-SD (SS-SD, opt. w.r.t N 1,2 ) Given two objects U and V, and the query Q, we have SSSD(U,V,Q) if and only if U q st V q for q Q and U Q V Q. A q1 A q2 Object C A Q a 1 q 1 a 2 q 1 a 1 q 2 a 2 q 2 a a b B Q b 1 q 1 b 1 q 2 b 2 q 1 b 2 q 2 q q c c C Q c 1 q 2 c 2 q 2 c 1 q 1 c 2 q 1 C q2 C q1 30

31 SD Operator Peer-SD (P-SD, opt. w.r.t N 1,2,3 ) Given two objects U and V, and the query Q, we have PSD(U,V,Q) if there is a mapping between U and V, for each pair <u,v>, u is always closer to Q than v with same weight. b b a a a a q q q q

32 SD Operator Peer-SD (P-SD, opt. w.r.t N 1,2,3 ) Given two objects U and V, and the query Q, we have PSD(U,V,Q) if there is a mapping between U and V, for each pair <u,v>, u is always closer to Q than v with same weight. b b b a a a a q q a q q

33 SD Operator Peer-SD (P-SD, opt. w.r.t N 1,2,3 ) Given two objects U and V, and the query Q, we have PSD(U,V,Q) if there is a mapping between U and V, for each pair <u,v>, u is always closer to Q than v with same weight. b b b a a a a q q a q q

34 P-SD Check P-SD check Naively have to check all the mapping. The P-SD check between U and V can be reduced to compute the network flow problem, PSD(U,V,Q) iff the network flow is 1 b a a s a 1 b 1 t q a 2 b 2 q

35 P-SD Check P-SD check Naively have to check all the mapping. The P-SD check between U and V can be reduced to compute the network flow problem, PSD(U,V,Q) iff the network flow is 1 b a a s a 1 b 1 t q a 2 b 2 q

36 SD Operator Properties SD operator containment NNC(O, Q, S-SD) NNC(O, Q, SS-SD) NNC(O, Q, P-SD) NNC(O, Q, F-SD) F-SD based NN P-SD based NN SS-SD based NN S-SD based NN N 1,2,3 N 1,2,3 N 1,2 N 1 NN candidate search Based on branch and bound framework like skyline search 36

37 Experiments Compare Algorithm SSD, SSSD, PSD, FSD and F + SD Datasets: real dataset: NBA, Gowalla; semi-real dataset: House, CA, USA; synthetic dataset. Evaluation parameter Values dimensionality d 2, 3, 4, 5 # of objects n 100k, 200k, 400k, 600k, 1M # of object instances m d 20, 40, 60, 80, 100 edge length of object h d 100, 200, 300, 400, 500 object center distribution anti (A), indep (E) # of query instances m q 10, 20, 30, 40, 50 edge length of query h q 100, 200, 300, 400,

38 Experiments (a) Candidate Size of Different Datasets (b) Response Time of Different Datasets 38

39 Experiments (a) Varying n (c) Varying n (b) Varying h d (d) Varying h d 39

40 Conclusion Formalize three families of NN functions that cover popular NN ranking mechanisms. Advocate three SD operator that are optimal to different family of NN functions. Propose efficient NN candidate search algorithm for three SD operators. 40

41 Thanks! Q&A 41

42 Experiments 42

43 Experiments 43

A Survey on Spatial-Keyword Search

A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms