Multi-Approximate-Keyword Routing Query

Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Introduction and motivation Approximate keyword search is important: GIS data has errors and uncertainty with it. GIS data is keeping evolving, routinely data cleaning and data integration is expensive People may make mistakes in query input (typos) Shortest path search has many applications: map service. strategic planning of resources Our work: Multi-Approximate-Keyword Routing (MAKR) query. A combination of shortest path search and approximate keyword search Given a source and destination pair (s, t) and a query keyword set ψ on a road network, the goal is to find the shortest path that passes through at least one matching object per keyword.

Multi-Approximate-Keyword Routing (MAKR) query v 1 p 1 : gym p 4 : Moe s Club p 2 : Bob s market p 3 : theate v 2 s p 5 : Museum p 6 : Spielburg t p 9 : theater v 3 v 4 p 7 : theatre p 8 : grocery v 8 v 7 Vertex Point Edge p 10 : Spielberg v 5 v 6

Multi-Approximate-Keyword Routing (MAKR) query v 1 p 1 : gym p 4 : Moe s Club p 2 : Bob s market p 3 : theate v 2 s p 5 : Museum p 6 : Spielburg t p 9 : theater v 3 v 4 p 7 : theatre p 8 : grocery v 8 Approximate string similarity: edit distance ɛ(δ 1, δ 2 ) = τ. v 7 Vertex Point Edge p 10 : Spielberg v 5 v 6

Multi-Approximate-Keyword Routing (MAKR) query v 1 p 1 : gym p 4 : Moe s Club p 2 : Bob s market p 3 : theate v 2 s p 5 : Museum p 6 : Spielburg t p 9 : theater Vertex Point Edge p 10 : Spielberg v 5 v 6 v 3 v 4 p 7 : theatre p 8 : grocery ɛ(δ 1, p 7.δ) = 2 ɛ(δ 2, p 10.δ) = 0 Example: s, t, ψ = {(δ 1 = theater, τ 1 = 2), (δ 2 = Spielberg, τ 2 = 1)}. v 8 c 1 v 7

Multi-Approximate-Keyword Routing (MAKR) query v 1 p 1 : gym p 4 : Moe s Club p 2 : Bob s market p 3 : theate v 2 s p 5 : Museum p 6 : Spielburg t c 2 p 9 : theater Vertex Point Edge p 10 : Spielberg v 5 v 6 v 3 v 4 p 7 : theatre p 8 : grocery ɛ(δ 1, p 9.δ) = 0 ɛ(δ 2, p 6.δ) = 1 Example: s, t, ψ = {(δ 1 = theater, τ 1 = 2), (δ 2 = Spielberg, τ 2 = 1)}. path length: d(c 2 ) < d(c 1 ) v 8 v 7

Multi-Approximate-Keyword Routing (MAKR) query v 1 p 1 : gym p 2 : Bob s market v 2 p 4 : Moe s Club c: candidate path p 3 : theate s p 5 : Museum p 6 : Spielburg t p 9 : theater v 3 v 4 p 7 : theatre p 8 : grocery v 8 v 7 Vertex Point Edge p 10 : Spielberg v 5 v 6 Example: s, t, ψ = {(δ 1 = theater, τ 1 = 2), (δ 2 = Spielberg, τ 2 = 1)}. ψ(c) = {δ 1 = theater}

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Data structure: Disk-based storage of the road network v 1 p 1 gym p 2 Bob s market p 3 theate v 2 v 8 v 7 v 5 v 6 v 3 v 4 d(v 1, v 2 ) = 6 d(v 1, v 8 ) = 8 d(v 1, p 1 ) = 1 d(v 1, p 2 ) = 3 d(v 1, p 3 ) = 5 v i : network vertex.

Data structure: Disk-based storage of the road network v 1 p 1 gym p 2 Bob s market p 3 theate v 2 v 8 v 7 v 5 v 6 v 3 v 4 d(v 1, v 2 ) = 6 d(v 1, v 8 ) = 8 d(v 1, p 1 ) = 1 d(v 1, p 2 ) = 3 d(v 1, p 3 ) = 5 Adjacency list B+-tree 1 2 3. Adjacency list file v 1, Rdist 1, 2 v 2 6 v 8 8 v 2, Rdist 2, 3 v 1 6 v 3 2 v 5. v i : network vertex. RDIST i : distances to the landmarks. Points file (v 1, v 2), 3 1 1 gym 2 3 Bob s market 3 5 theate (v 1, v 8), 2 4 3 Moe s Club 8 5 6 Museum.. 1 4 6 Points file B+-tree [sl97]: CCAM: A connectivity-clustered access method for networks and network computations. In IEEE TKDE, 1997.

Data structure: FilterTree for Approximate Keywords-Matching 1 aa 1 4 7 19. jo root 3 2 n zz Length filter Prefix filter Position filter An inverted list (point ids whose associated strings have length of 3 and the gram jo at position 2) [lll08]: Efficient merging and filtering algorithms for approximate string searches. In ICDE, 2008.

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Exact solution overview Intuition: PER Path Expansion and Refinement.

Exact solution overview Intuition: PER Path Expansion and Refinement. Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), add a point p from P(w) into current shortest candidate path, s.t. p P(w), ɛ(p.δ, w) τ w, to minimize the impact to d(c) L p 2 : ch {s, p 1, t} p 1 : ee s t {s, p 3, t} p 3 : yb p 4 : zb {s, p 2, t} IO efficient priority queue of candidate paths: initialized with c s tha each covers a dinstinct, single w ψ

Exact solution overview Intuition: PER Path Expansion and Refinement. Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), add a point p from P(w) into current shortest candidate path, s.t. p P(w), ɛ(p.δ, w) τ w, to minimize the impact to d(c) s p 1 : ee p 2 : ch t ψ(c) = {ef } p 3 : yb ψ ψ(c) = {ab, cd} p 4 : zb

Exact solution overview Intuition: PER Path Expansion and Refinement. Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), add a point p from P(w) into current shortest candidate path, s.t. p P(w), ɛ(p.δ, w) τ w, to minimize the impact to d(c) s p 1 : ee p 2 : ch t insert sp 1 p 3 t to L ψ(c) = {ef } p 3 : yb ψ ψ(c) = {ab, cd} p 4 : zb

Exact solution overview Intuition: PER Path Expansion and Refinement. Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), add a point p from P(w) into current shortest candidate path, s.t. p P(w), ɛ(p.δ, w) τ w, to minimize the impact to d(c) p 2 : ch insert sp 1 p 2 t to L p 1 : ee s t ψ(c) = {ef } p 3 : yb ψ ψ(c) = {ab, cd} p 4 : zb

Exact solution overview Improvement. use Landmarks to estimate distances when finding points; modify and then combine with FilterTree to find p P(w) incrementally; refine d(c) when c becomes a qualified path. two methods to refine d(c): PER-full and PER-partial

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Approximate solutions for MAKR query Problem with the exact solution: Theorem 1: The MAKR problem is NP-hard.

Approximate solutions for MAKR query Problem with the exact solution: Theorem 1: The MAKR problem is NP-hard. Approximate solutions: The local minimum path algorithms: A LMP1 and A LMP2. The global minimum path algorithm: A GMP.

The local minimum distance algorithms: A LMP1 and A LMP2 Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} s t

The local minimum distance algorithms: A LMP1 and A LMP2 Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each segment (p i, p j ), find a point p, p.δ similar to keywords in ψ ψ(c), to minimize sum of d(p i, p) and d(p, p j ). s p 1 : ee t

The local minimum distance algorithms: A LMP1 and A LMP2 Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} s t

The local minimum distance algorithms: A LMP1 and A LMP2 Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), we iterate through the segments in c and add the point p P(w), which minimizes d(c), to one segment (p i, p j ) of c. p 1 : yb s t

The minimum distance algorithm: A GMP Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} s t

The minimum distance algorithm: A GMP Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), find a point p P(w) to minimize sum of d(s, p) and d(p, t). p 1 : yb s t

The minimum distance algorithm: A GMP Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} For each keyword w ψ ψ(c), find a point p P(w) to minimize sum of d(s, p) and d(p, t). p 1 : yb s p 3 : ch p 2 : ee t

The minimum distance algorithm: A GMP Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} p 1 : yb s p 3 : ch p 2 : ee t

The minimum distance algorithm: A GMP Q : s, t, ψ = {(ab, 1), (cd, 1), (ef, 1)} p 1 : yb s p 3 : ch p 2 : ee t Theorem 2: The A GMP algorithm gives a κ-approximate path. This bound is tight.

Challenges in approximate solutions Challenges in all approximate methods: how to find p P(w) incrementally for each type of objective function (instead of finding P(w) all at once and iterate through points in P(w) one by one)? how to avoid exact distance computation as much as possible?

Improvement on approximate solutions by network partitioning Voronoi-diagram-like partition (by Erwig and Hagen s algorithm). : Vetex

Improvement on approximate solutions by network partitioning Voronoi-diagram-like partition (by Erwig and Hagen s algorithm). : Vetex G 3 G 1 G 2

Improvement on approximate solutions by network partitioning Voronoi-diagram-like partition (by Erwig and Hagen s algorithm). : Vetex

Improvement on approximate solutions by network partitioning Voronoi-diagram-like partition (by Erwig and Hagen s algorithm). : Vetex G 3 G 1 G 2

Improvement on approximate solutions by network partitioning Voronoi-diagram-like partition (by Erwig and Hagen s algorithm). : Vetex G 3 G 1 G 2 d (p, G i ): lower bound distance from p to the boundary of G i, computed using the landmarks. d (s, G i ) + d (G i, t) d (s, p) + d (p, t), p G i.

Extensions Top-k MAKR query: Exact methods. Approximate methods.

Extensions Top-k MAKR query: Exact methods. Approximate methods. Multiple strings.

Extensions Top-k MAKR query: Exact methods. Approximate methods. Multiple strings. Updates.

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Experiment setup All experiments were executed on a Linux machine with an Intel Xeon CPU at 2.13GHz and 6GB of memory.

Experiment setup All experiments were executed on a Linux machine with an Intel Xeon CPU at 2.13GHz and 6GB of memory. Data sets: road networks from the Digital Chart of the World Server: City of Oldenburg (OL,6105 vertices, 7029 edges) California(CA,21048 vertices, 21693 edges) North America (NA,175813 vertices, 179179 edges) building locations in OL, CA and NA from the OpenStreetMap project. The default experimental parameters: Symbol Definition Default Value P number of points for exact solution 10, 000 P number of points for approximate solution 1, 000, 000 κ number of query strings 6 τ edit distance threshold 2 road network CA

Query time: Running Time (secs) 10 3 10 2 10 1 10 0 10 1 PER partial PER full A GMP A LMP1 A LMP2 P = 10, 000 10 2 2 4 6 8 10 κ

Query time: Running Time (secs) 10 3 10 2 10 1 10 0 PER partial PER full A GMP A LMP1 A LMP2 P = 10, 000 10 1 10 3 10 4 10 5 10 6 P

Query time: Running Time (secs) 10 3 10 2 10 1 10 0 10 1 PER partial PER full A GMP A LMP1 A LMP2 P = 10, 000 10 2 OL CA NA

Query time: 10 3 PER partial PER full Running Time (secs) 10 2 10 1 10 0 A GMP A LMP1 A LMP2 P = 10, 000 10 1 1 2 3 τ

Scalability of approximate solutions: 15 A GMP A LMP1 A LMP2 Running Time (secs) 12 9 6 3 0 P = 1, 000, 000 2 4 6 8 10 κ

Scalability of approximate solutions: Running Time (secs) 8 A GMP A LMP1 A LMP2 6 4 2 0 P = 1, 000, 000 0.5 1 1.5 2 P : X10 6

Scalability of approximate solutions: 10 2 A GMP A LMP1 A LMP2 Running Time (secs) 10 1 10 0 10 1 P = 1, 000, 000 OL CA NA

Scalability of approximate solutions: 15 A GMP A LMP1 A LMP2 Running Time (secs) 12 9 6 3 0 P = 1, 000, 000 1 2 3 τ

Approximation quality: r: Approximation Ratio 2.5 A GMP A LMP1 A LMP2 2 1.5 1 2 4 6 8 10 κ

Approximation quality: r: Approximation Ratio 2.5 A GMP A LMP1 A LMP2 2 1.5 1 10 3 10 4 10 5 10 6 P

Approximation quality: r: Approximation Ratio 2.2 A GMP A LMP1 A LMP2 1.9 1.6 1.3 1 OL CA NA

Approximation quality: r: Approximation Ratio 2.5 A GMP A LMP1 A LMP2 2 1.5 1 1 2 3 τ

Outline 1 Introduction and Motivation 2 Preliminary 3 Exact solutions 4 Approximate solutions 5 Experiments 6 Related Work and Concluding Remarks

Related work The optimal sequenced route (OSR) query [sks07]. : different keywords. [sks07]: The Optimal Sequenced Route Query. In VLDBJ, 2007.

Related work The optimal sequenced route (OSR) query [sks07]. Q = s, t, ( ) s t : different keywords. [sks07]: The Optimal Sequenced Route Query. In VLDBJ, 2007.

Related work The optimal sequenced route (OSR) query [sks07]. Q = s, t, ( ) s c t : different keywords. [sks07]: The Optimal Sequenced Route Query. In VLDBJ, 2007.

Related work The optimal sequenced route (OSR) query [sks07]. Exact keyword query and only handles the query keywords sequentially. Q = s, t, ( ) s c t : different keywords. [sks07]: The Optimal Sequenced Route Query. In VLDBJ, 2007.

Related work The optimal sequenced route (OSR) query [sks07]. Exact keyword query and only handles the query keywords sequentially. In MAKR queries, categories are dynamically decided only at the query time. Q = s, t, ( ) s c t : different keywords. [sks07]: The Optimal Sequenced Route Query. In VLDBJ, 2007.

The end Thank You Q and A