Distance oracles in edge-labeled graphs

Francesco Bonchi, Aristides Gionis, Francesco Gullo, Antti Ukkonen
Yahoo Labs, Barcelona, Spain, {bonchi,gullo}@yahoo-inc.com
Helsinki Institute for Information Technology HIIT, Aalto University, {aristides.gionis,antti.ukkonen}@aalto.fi

ABSTRACT

A fundamental operation over edge-labeled graphs is the computation of shortest-path distances subject to a constraint on the set of permissible edge labels. Applying exact algorithms for such an operation is not a viable option, especially for massive graphs, or in scenarios where the distance computation is used as a primitive for more complex computations. In this paper we study the problem of efficient approximation of shortest-path queries with edge-label constraints, for which we devise two indexes based on the idea of landmarks: distances from all vertices of the graph to a selected subset of landmark vertices are pre-computed and then used at query time to efficiently approximate distance queries. The major challenge to face is that, in principle, an exponential number of constraint label sets needs to be stored for each vertex-landmark pair, which makes the index pre-computation and storage far from trivial. We tackle this challenge from two different perspectives, which lead to indexes with different characteristics: one index is faster and more accurate, but it requires more space than the other. We extensively evaluate our techniques on real and synthetic datasets, showing that our indexes can efficiently and accurately estimate label-constrained distance queries.

1. INTRODUCTION

Computing shortest-path distances between any two vertices of a graph is one of the most fundamental graph primitives, used in a large variety of applications and methods. For today's graph sizes it is often not feasible to compute exact shortest-path distances by relying on some of the well-known basic methods: running Dijkstra's algorithm, or some of its efficient variants [2], on graphs of billions of vertices may require unaffordable time for a number of real-world applications.
This remains true even for moderately-sized graphs if shortest-path queries are used as a primitive in the context of more complex real-time applications (e.g., in recommender systems). For these reasons, designing fast shortest-path-computation algorithms has become an active research area. Numerous algorithms have been defined so far for speeding up shortest-path-distance computation (see [25] for an up-to-date survey). Such algorithms can be broadly classified into exact and approximate. Although desirable in some scenarios, exact methods usually rely on specific characteristics of the input graph, which may limit their applicability in more general contexts. Instead, approximate methods typically allow for a fine tuning between accuracy and efficiency that makes them able to satisfy the various requirements/constraints of every specific application context.

(c) 2014, Copyright is with the authors. Published in Proc. 17th International Conference on Extending Database Technology (EDBT), March 24-28, 2014, Athens, Greece: ISBN 978-3-89318065-3, on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Figure 1: Example edge-labeled graph on which the label-constrained shortest-path distance query between $s$ and $t$ returns 4 for a constraint set of one label, 3 for a constraint set of two labels, and 2 for a constraint set of three labels.

Many methods for approximating shortest-path distances are based on the idea of landmarks [27, 24, 2 3, 8, 22, 2, 28]. For a standard graph $G = (V, E)$, the landmark approach works by selecting a subset of vertices $X \subseteq V$, $|X| = k$, and computing the (exact) distances $d(u, x)$ between every pair of vertices $u \in V$ and $x \in X$. At query time, given any two vertices $s, t \in V$, the triangle inequality ensures that

$$\max_{x \in X} |d(x, s) - d(x, t)| \;\le\; d(s, t) \;\le\; \min_{x \in X} \big(d(x, s) + d(x, t)\big).$$

Thus, the shortest-path distance between $s$ and $t$ can be estimated by either the upper bound or the lower bound above, or by some in-between value such as the median [2].
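
The landmark bounds above can be illustrated with a minimal Python sketch. It assumes an unweighted, connected graph stored as an adjacency dictionary; the helper names `bfs_distances` and `landmark_bounds` and the toy path graph are hypothetical and only serve the illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted single-source distances via BFS (unreachable vertices are absent)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def landmark_bounds(index, s, t):
    """Return (lower, upper) bounds on d(s, t) derived from the triangle inequality."""
    lower, upper = 0, float("inf")
    for dist in index.values():
        # Assumption: landmarks that do not reach both endpoints are simply skipped.
        if s in dist and t in dist:
            lower = max(lower, abs(dist[s] - dist[t]))
            upper = min(upper, dist[s] + dist[t])
    return lower, upper


# Hypothetical toy graph: the path a - b - c - d - e, with landmarks a and c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
index = {x: bfs_distances(adj, x) for x in ("a", "c")}
lower, upper = landmark_bounds(index, "b", "d")
```

On this toy graph the two bounds coincide for the pair (b, d), because landmark a gives the lower bound and landmark c, which lies on the shortest path, gives the upper bound; in general the bounds only bracket the true distance.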
Denoting by $n$ and $m$ the number of vertices and edges in the graph, respectively, the landmark-based index takes $O(km)$ offline time, as it requires one scan of the graph per landmark, while the storage requirement is $O(kn)$. At query time, an approximate shortest-path distance $d(s, t)$ is found in $O(k)$ time by accessing the precomputed landmark-to-vertex distances for $s$ and $t$.

Shortest paths on edge-labeled graphs. In a variety of application domains it is increasingly common to come across graphs whose edges are associated with a label denoting the type of the relationship between the two incident vertices [6, 23, 26, 29, 8, 5]. For instance, users of a social network have the possibility of categorizing their own connections in different social circles (e.g., circles in Google+, lists in Facebook or Twitter) [20], so that the edges of the social network characterize different relationship types, such as friends, relatives, colleagues, and so on. RDF resources such as the Google Knowledge Graph or YAGO are naturally represented as graphs whose links are labeled with the type of the property (predicate) that characterizes the relationship between the two connected entities. In a co-authorship network like DBLP, collaborations (links) between any two authors are characterized by the topic(s) of the papers co-authored by those authors [6]. In a protein-interaction network edge labels represent different types of interactions between proteins, e.g., physical association, direct interaction, co-localization, and so on [3]. Other examples are multi-dimensional networks (i.e., networks derived from the integration of multiple networks) [26, 5], metabolic networks [9], or recommendation networks [9].

When dealing with edge-labeled graphs, it is often required to consider label-constrained shortest-path distance queries, i.e., shortest-path distance queries with a constraint on the set of permitted edge labels. More precisely, given two vertices $s$ and $t$ and a set of labels $C$, the query $\langle s, t, C \rangle$ asks to compute the length of a shortest path from $s$ to $t$, using only edges whose label belongs to $C$ (see Figure 1 for an illustration).

Applications. Label-constrained shortest-path distance queries find natural application in a variety of real-world scenarios. In particular, since such queries are usually a primitive involved in more complex tasks, answering/approximating them efficiently is mandatory. As a concrete example, consider some of today's novel systems for advanced information retrieval and knowledge exploration, such as the Google Knowledge Graph, Facebook Graph Search, or similar RDF resources. As said above, the data format underlying these resources can naturally be represented as a set of entities linked by different types (edge labels) of association. When searching these knowledge resources, one has to answer queries of the form: "How related are entities A and B, contextualized to additional user information C?". Here, C represents the context via which one is interested in associating A and B: it corresponds to the (semantically-annotated) query being issued and/or the interest profile of the current user, and it can naturally be modeled as a set of edge labels. In a real application, the relatedness of A and B is assessed by complex machine-learned ranking functions that exploit various features.
Shortest-path distance is a central feature typically considered (it is, e.g., highly correlated to the prediction of the link between two entities [4]), and, given the presence of context C, the shortest-path distance needs to be label-constrained. Such knowledge-exploration systems need to provide final answers in real time, thus requiring that all the components of the ranking function, including label-constrained shortest-path distances, be computed/approximated very quickly.

A similar scenario arises in social networks, where the shortest-path distance is one of the most effective features used for machine-learning-based link-prediction systems [4, 0]. When edge labels are available, the link-prediction systems can be empowered by adding the functionality of predicting the type of the link, and not only whether the link exists or not []. To still exploit shortest-path distances as features in the machine-learning core task, both the (offline) model-learning phase and the online prediction phase need to rely on a set of examples of shortest-path distances constrained to the use of permissible labels, for different instances of such constraint label sets. This means that a central task in this context corresponds to answering several label-constrained shortest-path distance queries at a time.

Network alignment, that is, the identification of a matching among different (sub-)networks, is a fundamental operation in protein-interaction networks [30]. A specific type of network-alignment query implemented in most existing commercial systems such as PathBLAST (www.pathblast.org) [7] is: given an input pathway (i.e., a sequence of proteins), find all the pathways of a given target network that match the query pathway. As answering these queries exactly requires solving a number of subgraph isomorphisms, existing systems rely on approximated methods. Such methods can be significantly sped up by taking into account the labels naturally present on the edges of any protein-interaction network and exploiting label-constrained shortest-path distance queries.
As an example, once having discovered one pathway $P$ that matches the query, the idea is to take the set of labels $C$ lying on the edges of $P$. Then, when asking if a pathway starting from another protein in a different zone of the network can match the query, one can first run an (approximated) label-constrained shortest-path query specifying $C$ as a constraint label set and use the answer to this query as a pruning rule: if the distance returned is sufficiently larger than the length of $P$, one can safely conclude that no matching paths can exist starting from the protein being considered. An analogous reasoning clearly holds if simple shortest-path queries are involved, but using label-constrained shortest-path distance queries makes the query more restrictive, thus resulting in more effective pruning.

Challenges. As with non-labeled graphs, answering shortest-path distance queries in edge-labeled graphs efficiently is a very crucial task. Unfortunately, the edge-label constraint creates a non-trivial obstacle to the adaptation of any existing technique for simple shortest-path distance estimation. In fact, there is no way for existing indexes to deal with the exponential number of possible query constraint label sets. For instance, a natural yet naïve way of extending the landmark approach to an edge-labeled graph $G$ with a label set $L$ is to create a different instance of $G$ for each one of the $2^{|L|}$ possible label combinations, and index each graph instance separately. Then, any query including a given constraint label set $C$ is answered from the index of the graph instance corresponding to $C$. The problem with this naïve approach is of course that the number of graph indexes increases exponentially with the size of the label set $L$. Even a moderate set of ten labels increases the index size by three orders of magnitude and makes the approach prohibitively expensive.

Outline. In this paper we study the problem of efficiently approximating point-to-point shortest-path distance queries with edge-label constraints. To the best of our knowledge, this problem has never been considered so far.
Indeed, most research on edge-labeled graphs has focused on the subset-constrained reachability problem [6, 29, 8], which is only a special case of the problem we tackle in this paper: those works are only able to say if any two vertices are connected by a path containing only the permissible labels, while we are interested in assessing the distance between the two vertices. To our knowledge, the only work dealing with shortest paths in edge-labeled graphs, in particular in road networks, is the one by Rice and Tsotras [23], which is an adaptation of the contraction hierarchies [] method to the context of edge-labeled graphs. However, that work differs from ours in two main aspects. First, it is suited for graphs that have the characteristics of road networks: as empirically shown in our experiments, it seems less appropriate for handling more general graphs (e.g., graphs with a power-law degree distribution, such as social networks). Second, and more importantly, Rice and Tsotras focus on exact solutions, while, for all the motivations discussed above, we devise here approximate techniques.

Our goal is to devise indexing strategies to efficiently approximate point-to-point shortest-path distance queries on real-world edge-labeled graphs. On such graphs, the number of labels is typically moderate, i.e., in the order of few tens;^2 nevertheless, as already pointed out above, devising proper solutions even for this number of labels is very challenging, due to the intrinsic exponential blowup that needs to be overcome.

^1 An equivalent approach to the naïve scheme is to index a single graph and store, for each landmark-vertex pair, a distance for each one of the possible $2^{|L|}$ label combinations.
^2 Even on edge-labeled graphs modeling RDF resources, where the number of low-level labels can in principle be much higher, what really matters are the few upper-level labels of the hierarchies that are typically exploited to semantically organize the whole set of low-level labels.

We propose two indexes that adapt the idea of landmarks to edge-labeled graphs in two different ways. The first approach, dubbed Powerset Cover (PowCov for short), is motivated by the observation that certain constraint label sets can be subsumed by others. Therefore, it is possible to build the index for a number of minimal, non-redundant label sets that is substantially smaller than $2^{|L|}$. Even though this approach does not provide an asymptotic improvement over the naïve method, it still gives tremendous savings in practice and makes it possible to index shortest-path distances in many real-world networks.

The PowCov index is a valuable solution for most real-world edge-labeled graphs when the number of labels remains within a range of few tens. However, building PowCov indexes becomes unaffordable as the number of edge labels increases beyond this. Thus, we introduce a second approach that keeps a simpler and lighter index. This approach is based on assigning landmarks to a single label, so that they can be used in approximating queries in which their label is part of the constraint label set. We dub this approach Chromatic Landmarks (ChromLand for short), as each landmark is assigned one color (label). While PowCov builds a larger index, ChromLand deals with the inherent complexity of the problem during query processing.

Overall, our indexes offer a tradeoff of indexing time/space vs. accuracy/efficiency: the first index is faster and more accurate, but it has larger space and preprocessing requirements. Also, a key advantage of both our indexes is that the accuracy vs. efficiency trade-off can be fine-tuned by selecting an appropriate number of landmarks: the fewer the landmarks, the faster the query evaluation, and the lower the accuracy.

Contributions. In summary, our contributions are as follows:
• We design two landmark-based index structures for efficiently approximating label-constrained point-to-point shortest-path queries in edge-labeled graphs.
• We propose efficient algorithms for building our indexes. Particularly, for PowCov, we present an efficient way to traverse the label powerset by pruning unpromising label sets.
• We devise novel strategies for finding a good set of landmarks for both the proposed indexes.
• We evaluate our indexes on both real-world and synthetic edge-labeled graphs. Our indexes outperform by a large margin both the exact method and the naïve indexing scheme: they achieve a speed-up factor up to three orders of magnitude compared to exact shortest-path distance computation, while exhibiting small error.

Roadmap. The rest of the paper is organized as follows. In Section 2 we formally state our problem. The two proposed indexes are described in Sections 3 and 4, respectively. Section 5 presents experiments, while Section 6 concludes the paper.

2. PROBLEM DEFINITION

The input to our problem is an edge-labeled graph $G = (V, E, L, l)$, where $V$ is a set of $n$ vertices, $E \subseteq V \times V$ is a set of $m$ edges, $L$ is a set of labels, and $l : E \to L$ is a labeling function that assigns a label in $L$ to each edge in $E$. For the sake of presentation, we focus on undirected unweighted graphs, even though all our concepts and ideas can easily be extended to handle directed and weighted graphs as well. We also find it intuitive to think of the edge label as the color of the edge, so we use the terms label and color interchangeably in the rest of the paper.

Figure 2: The landmark $x$ and the vertex $u$ are connected by three paths. Of the label sets on these paths, a single-label set and a two-label set are SP-minimal with respect to $x$ and $u$, while another two-label set is not: its constrained shortest-path distance need not be computed or stored, as it can implicitly be derived from the single-label set that subsumes it.

Given a set of colors $C \subseteq L$ and two vertices $u, v \in V$, we define the $C$-constrained path $p_C(u, v)$ to be a path between $u$ and $v$ containing only edges $e$ such that $l(e) \in C$. The $C$-constrained shortest-path distance $d_C(u, v)$ is the length of a shortest path over all $C$-constrained paths between $u$ and $v$. If no such paths exist, we define $d_C(u, v) = \infty$.
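
The definition of $d_C(u, v)$ can be made concrete with a short Python sketch: a breadth-first search that ignores every edge whose label is outside $C$. The edge-list representation, the function name `constrained_distance`, and the toy graph with labels "red" and "green" are assumptions made for illustration, not part of the original text.

```python
from collections import deque
import math


def constrained_distance(edges, allowed, source, target):
    """d_C(source, target) on an undirected unweighted graph.

    `edges` is a list of (u, v, label) triples; edges whose label is not in
    `allowed` are ignored. Returns math.inf when no C-constrained path exists.
    """
    adj = {}
    for u, v, label in edges:
        if label in allowed:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return math.inf


# Hypothetical toy graph: an all-red path s-a-b-t and a shortcut s-c-t
# whose first edge is green and whose second edge is red.
edges = [
    ("s", "a", "red"), ("a", "b", "red"), ("b", "t", "red"),
    ("s", "c", "green"), ("c", "t", "red"),
]
d_red = constrained_distance(edges, {"red"}, "s", "t")
d_red_green = constrained_distance(edges, {"red", "green"}, "s", "t")
d_green = constrained_distance(edges, {"green"}, "s", "t")
```

Enlarging the constraint set can only shorten the distance, which is what the toy graph shows: the green edge opens a shortcut, while green alone does not connect s and t.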
In this paper we study label-constrained point-to-point shortest-path distance queries (LC-PPSPD), i.e., triples $\langle s, t, C \rangle$, where $s, t \in V$ and $C \subseteq L$, which ask to find the $C$-constrained shortest-path distance $d_C(s, t)$. Note that the LC-PPSPD problem can be solved exactly in polynomial time by simply removing from $G$ all edges whose color is not in $C$ and computing the shortest-path distance on the resulting graph. Our goal is to devise indexing techniques to perform fast and accurate online approximations.

Note that all our methods are general enough to easily apply to graphs having multiple labels on the edges. The only modification needed concerns the definition of $C$-constrained path, which, in the multiple-label case, is defined as a path with only edges $e$ such that all (or at least one of) the labels of $e$ belong to $C$. Then, our methods need only trivial adaptations in order to take into account this generalized definition of $C$-constrained path.

3. POWERSET COVER INDEX

The index we propose here is based on the simple observation that, in real-world graphs, it is likely that different constraint label sets yield the same distances between graph vertices, as shown in the example of Figure 2. Rather than computing and storing distances for all possible label combinations, we thus only consider those that are really required.

3.1 Index overview and query processing

We first introduce the notions of subsumption and SP-minimality. We define these notions for landmark-vertex pairs $(x, u)$, even though they hold for any two vertices $u, v \in V$.

DEFINITION 1. Given a landmark $x \in X$, a vertex $u \in V$, and two label sets $S, T \subseteq L$, we say that $S$ subsumes $T$ with respect to $x$ and $u$ if and only if $S \subset T$ and $d_S(x, u) = d_T(x, u)$.

DEFINITION 2. A label set $S \subseteq L$ is said to be shortest path-minimal (SP-minimal) with respect to a landmark $x \in X$ and vertex $u \in V$ if and only if it is not subsumed by any other label set with respect to $x$ and $u$.

The above definitions imply that any non-SP-minimal label set $C$ can be interpreted as redundant, as the corresponding $C$-constrained shortest-path distance $d_C(x, u)$ can be derived by using a subset of $C$. The idea is better illustrated in the example of Figure 2. This means that storing all SP-minimal label sets of a vertex-landmark pair $(x, u)$ is sufficient for retrieving the exact distance between $x$ and $u$, for each subset $C \subseteq L$. The next theorem shows that the SP-minimal label sets are sufficient to retrieve $C$-constrained shortest-path distances for any landmark-vertex pair $(x, u)$ and any label-set constraint $C$, as such a distance simply corresponds to the minimum distance taken over all SP-minimal label sets that are subsets of $C$.

THEOREM 1. Given a landmark-vertex pair $(x, u)$, let $SP_{xu}$ be the set of $\langle S, d_S \rangle$ pairs containing all SP-minimal label sets $S$ with respect to $x$ and $u$ along with the corresponding $S$-constrained shortest-path distance $d_S$. Then, for any label set $C \subseteq L$, the $C$-constrained distance $d_C(x, u)$ can be retrieved from $SP_{xu}$ as

$$d_C(x, u) = \begin{cases} \infty, & \text{if there is no } \langle S, d_S \rangle \in SP_{xu} \text{ s.t. } S \subseteq C, \\ \min\{d_S \mid \langle S, d_S \rangle \in SP_{xu},\ S \subseteq C\}, & \text{otherwise.} \end{cases}$$

PROOF. By definition of subsumption and SP-minimality, the label sets in $SP_{xu}$ that account for retrieving the exact distance $d_C(x, u)$ for any label set $C$ are the subsets of $C$. Hence, if no subset of $C$ is included in $SP_{xu}$, we can immediately conclude that there are no paths between $u$ and $x$ containing only labels in $C$, and, therefore, $d_C(x, u) = \infty$. On the other hand, if $SP_{xu}$ contains at least one subset of $C$, the distance $d_C(x, u)$ can be retrieved based on the straightforward observation that the $S$-constrained distance $d_S$ between $x$ and $u$ computed for any subset $S \subseteq C$ represents an upper bound to the distance $d_C(x, u)$. Hence, it holds that $d_C(x, u) \le d_S$ for all $\langle S, d_S \rangle \in SP_{xu}$ with $S \subseteq C$, which clearly implies that $d_C(x, u) \le \min\{d_S \mid \langle S, d_S \rangle \in SP_{xu},\ S \subseteq C\}$. However, we recall that $SP_{xu}$ contains all SP-minimal label sets with respect to $x$ and $u$. Thus, by definition of SP-minimality, the above inequality must be an equality for some SP-minimal set that subsumes $C$; this means that $d_C(x, u) = \min\{d_S \mid \langle S, d_S \rangle \in SP_{xu},\ S \subseteq C\}$.

Overall, the structure of the PowCov index consists of all sets $SP_{xu}$, for each landmark-vertex pair $(x, u)$.
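
The retrieval rule of Theorem 1 can be sketched in a few lines of Python. The representation of $SP_{xu}$ as a list of `(frozenset, distance)` pairs, the function name `lookup`, and the example entries are assumptions made for illustration only.

```python
import math


def lookup(sp_xu, allowed):
    """d_C(x, u): minimum stored distance over SP-minimal sets S that are subsets of C.

    Returns math.inf when no stored label set is a subset of `allowed`.
    """
    allowed = frozenset(allowed)
    return min((d for labels, d in sp_xu if labels <= allowed), default=math.inf)


# Hypothetical SP_xu for one landmark-vertex pair: {red} gives distance 3,
# {red, green} gives distance 2.
sp_xu = [(frozenset({"red"}), 3), (frozenset({"red", "green"}), 2)]

d_red_blue = lookup(sp_xu, {"red", "blue"})          # only {red} fits
d_all = lookup(sp_xu, {"red", "green", "blue"})      # both fit, the minimum wins
d_green = lookup(sp_xu, {"green"})                   # nothing fits
```

This linear scan is only meant to mirror the formula; it does not attempt the distance-sorted grouping or prefix-tree organization that an actual index would use.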
We partition the label sets $S$ within each $SP_{xu}$ according to their associated distances $d_S$ and we organize any group of label sets sharing the same distance into a small-redundancy data structure, e.g., a prefix tree.

Query processing. Given a query $\langle s, t, C \rangle$, the $C$-constrained shortest-path distance between $s$ and $t$ is approximated similarly as in the landmark approach for non-labeled graphs. Indeed, we simply need to retrieve from the index the distances $d_C(x, s)$, $d_C(x, t)$, for all $x \in X$, and approximate $d_C(s, t)$ by resorting to the triangle inequality, as described in Section 1. We compute the distances $d_C(x, s)$ (and $d_C(x, t)$) by visiting the groups of label sets in $SP_{xs}$, starting from the group that has minimum distance (Theorem 1). We stop when a group contains a set $S$ that is a subset of $C$, and we return the corresponding distance, which equals $d_C(x, s)$. If no subset of $C$ is encountered during the visit, we return $\infty$.

Complexity. The space complexity of the PowCov index and the query-processing time depend on the size of the various sets $SP_{xu}$. In particular, if $H$ is the maximum size over all sets $SP_{xu}$, the total space of the index is $O(kHn)$, while the query-processing time is $O(kH|L|)$. In the worst case, $H$ could be $O(2^{|L|})$. Nevertheless, we remark that $H$ is actually bounded by a function of the maximum finite distance $d_{max}$ in the graph, as shown in Proposition 1.

Algorithm 1 TraversePowerset-BruteForce
Input: an edge-labeled graph $G = (V, E, L, l)$, a set of landmarks $X$
Output: for each pair $(x, u)$, where $x \in X$ and $u \in V$, a set $SP_{xu}$ of $\langle C, d \rangle$ pairs storing all SP-minimal label sets $C$ with respect to $x$ and $u$ along with the corresponding $C$-constrained shortest-path distance $d$
 1: $SP_{xu} \leftarrow \emptyset$, $\forall x \in X, u \in V$
 2: for all $x \in X$ do
 3:   $D \leftarrow \emptyset$
 4:   for all $C \subseteq L$ do
 5:     $D[C] \leftarrow$ ConstrainedSSSP($G$, $x$, $C$)
 6:   end for
 7:   for all $C \subseteq L$, $u \in V$ s.t. $D[C, u] < \infty$ do
 8:     if $C$ is SP-minimal w.r.t. $x$ and $u$ then
 9:       $SP_{xu} \leftarrow SP_{xu} \cup \{\langle C, D[C, u] \rangle\}$
10:     end if
11:   end for
12: end for

PROPOSITION 1. It holds that $H \le \sum_{d=1}^{d_{max}} \binom{|L|}{d}$, where $d_{max} = \max_{u, v \in V,\ C \subseteq L} \{d_C(u, v) \mid d_C(u, v) < \infty\}$.

PROOF. The claim follows directly from the fact that any SP-minimal label set $C$ cannot have size larger than $d_{max}$.
Indeed, given an SP-minimal label set $C$ with respect to $x$ and $u$, all labels within $C$ must belong to each $C$-constrained shortest path between $x$ and $u$; otherwise, filtering out from $C$ the labels that do not appear in some of these shortest paths would lead to a subset $S$ that subsumes $C$, thus making $C$ non-SP-minimal. This implies that $|C| \le d_C(x, u) \le d_{max}$ for every label set $C$ stored in any $SP_{xu}$. The maximum size $H$ of any $SP_{xu}$ is therefore not larger than the number of possible ways of choosing from the input label set $L$ a set of $d \le d_{max}$ distinct labels, i.e.,

$$H = \max\{|SP_{xu}| \mid x \in X,\ u \in V \setminus \{x\}\} \le \sum_{d=1}^{d_{max}} \binom{|L|}{d}.$$

Due to the small-world phenomenon, the distance $d_{max}$ remains in practice very small. Indeed, as we experimentally show in Section 5, the average number of distances to be stored per landmark-vertex pair is at most quadratic, in most cases even linear, in the number of labels in the input graph.

3.2 Building the index

We now describe how to build a PowCov index. We start with an overview of a basic brute-force approach. Then, we discuss a number of pruning rules to improve its efficiency.

3.2.1 A brute-force algorithm

A brute-force algorithm to build a PowCov index is outlined as Algorithm 1. For each landmark $x$, it performs the following two main steps. First, a $C$-constrained single-source shortest path (SSSP) with source $x$ is computed for each label set $C \subseteq L$ (i.e., an SSSP where edges with label not in $C$ are ignored); the result of these SSSPs, i.e., the $C$-constrained distances $d_C(x, u)$ for all vertices $u \in V$, is stored into the vector $D$ (Lines 4-6). Note that $D[C, u] = d_C(x, u)$. Then (Lines 7-11), for each label set $C \subseteq L$ and each vertex $u \in V$, the SP-minimality of $C$ with respect to $x$ and $u$ is checked; if $C$ is recognized as SP-minimal, then it is added (along with the corresponding distance $D[C, u]$) to the output set $SP_{xu}$.

The first phase of computing the SSSPs for all landmarks and all label sets takes $O(2^{|L|} m k)$ time, while checking SP-minimality for all landmarks, all label sets, and all vertices takes $O(2^{|L|} n k |L|)$ time. The latter is a result based on the following theorem:

THEOREM 2. Given a landmark $x \in X$ and a vertex $u \in V$, any label set $C \subseteq L$ is SP-minimal with respect to $x$ and $u$ if and only if $d_C(x, u) < d_{C'}(x, u)$, for all $C' \subset C$ such that $|C'| = |C| - 1$.
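
As a rough illustration of the brute-force construction combined with the test stated in Theorem 2, the following Python sketch builds the sets $SP_{xu}$ for a single landmark. It is a simplified reading of Algorithm 1 under stated assumptions, not the authors' implementation: the graph is an undirected unweighted edge list, BFS plays the role of ConstrainedSSSP, and the names `constrained_sssp`, `build_powcov_bruteforce`, and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations
import math


def constrained_sssp(edges, allowed, source):
    """C-constrained BFS distances from `source` (unreachable vertices are absent)."""
    adj = {}
    for u, v, label in edges:
        if label in allowed:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def build_powcov_bruteforce(edges, labels, landmark):
    """Return {u: [(S, d_S), ...]} with the SP-minimal label sets for (landmark, u)."""
    labels = sorted(labels)
    # Phase 1: one constrained SSSP per label set C (2^|L| searches).
    D = {}
    for size in range(len(labels) + 1):
        for combo in combinations(labels, size):
            C = frozenset(combo)
            D[C] = constrained_sssp(edges, C, landmark)
    # Phase 2: keep C for u when dropping any single label strictly
    # increases the distance (the condition of Theorem 2).
    index = {}
    for C, dist in D.items():
        for u, d in dist.items():
            if all(D[C - {lab}].get(u, math.inf) > d for lab in C):
                index.setdefault(u, []).append((C, d))
    return index


# Hypothetical toy graph: an all-red path s-a-b-t and a shortcut s-c-t
# whose first edge is green and whose second edge is red.
edges = [
    ("s", "a", "red"), ("a", "b", "red"), ("b", "t", "red"),
    ("s", "c", "green"), ("c", "t", "red"),
]
index = build_powcov_bruteforce(edges, {"red", "green"}, "s")
```

On the toy graph, vertex t ends up with two entries ({red} at distance 3 and {red, green} at distance 2), whereas vertex a keeps only {red}, since adding green does not shorten its distance from s. The sketch enumerates the full powerset and is therefore only practical for a handful of labels.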

PROOF. The necessary condition (i.e., $C$ is SP-minimal only if $d_C(x, u) < d_{C'}(x, u)$, for all $C' \subset C$ such that $|C'| = |C| - 1$) easily follows from the definition of SP-minimality: $C$ cannot be SP-minimal with respect to $x$ and $u$ if there exists a subset $C'$ of $C$ such that $d_{C'}(x, u) = d_C(x, u)$. The sufficient condition (i.e., $C$ is SP-minimal if $d_C(x, u) < d_{C'}(x, u)$, for all $C' \subset C$ s.t. $|C'| = |C| - 1$) holds based on the following observation. If $d_C(x, u) < d_{C'}(x, u)$ for all subsets $C'$ of $C$ of size $|C'| = |C| - 1$, then this must hold for any proper subset of $C$ as well, regardless of the size, because every proper subset of $C$ is contained in some $C'$ of size $|C| - 1$ and removing labels can never shorten a constrained distance. This implies that $C$ is SP-minimal.

That is, the SP-minimality of a label set $C$ is checked in $O(|C|) = O(|L|)$ time by considering only the set of (previously computed) distances $\{D[C', u] \mid C' \subset C,\ |C'| = |C| - 1\}$. We will also exploit this result later. In conclusion, the overall running time of the TraversePowerset-BruteForce algorithm is $O(2^{|L|} k (m + n|L|))$.

3.2.2 Pruning the search space

We now define a number of pruning rules to improve the efficiency of the TraversePowerset-BruteForce algorithm. We focus our discussion on a single landmark $x$. Our pruning rules are classified into three categories:
• Skipping unnecessary label sets, i.e., recognize early the label sets $C$ for which there exists no vertex $u$ such that $C$ is SP-minimal with respect to $x$ and $u$.
• Skipping unnecessary SP-minimality tests, i.e., once a $C$-constrained SSSP with source $x$ has been computed, identify a set of vertices for which $C$ cannot be SP-minimal and skip the corresponding SP-minimality test.
• Speeding-up SP-minimality tests, i.e., for some vertices $u$, recognize if a label set $C$ is SP-minimal or not with respect to $x$ and $u$ more efficiently than $O(|C|)$ time.
We discuss next the three categories in more detail.

Skipping unnecessary label sets. For any given landmark $x$, the label sets that can safely be discarded are those label sets $C$ for which the set of vertices reachable from $x$ is empty, i.e., those label sets $C$ such that $d_C(x, u) = \infty$, for all $u \in V$.
Early detection of such label sets can be carried out based on the observation that a label set $C$ yields an empty set of vertices reachable from $x$ if and only if $C$ contains no labels present on the edges incident to $x$. In that case (and only in that case), there is no way for the landmark $x$ to remain connected to the rest of the graph. This observation is formalized next.

OBSERVATION 1. Given an edge-labeled graph $G = (V, E, L, l)$ and a landmark $x \in X$, let $L_x$ be the set of all labels placed on edges incident to $x$, i.e., $L_x = \{l(x,u) \mid (x,u) \in E\}$. For any label set $C \subseteq L$ it holds that: $d_C(x,u) = \infty$ for all $u \in V$ if and only if $C \subseteq L \setminus L_x$.

To exploit Observation 1, we modify the algorithm TraversePowerset-BruteForce as follows. Instead of visiting all $C \subseteq L$ (Lines 4-6 in Algorithm 1), we avoid generating unnecessary label sets by employing a strategy that resembles candidate generation of the well-known Apriori algorithm for frequent-itemset mining [2]. While Apriori is a level-wise bottom-up strategy, in our setting we need to generate candidates (i.e., label sets) in a top-down fashion. However, with a simple trick, we can still rely on a bottom-up strategy and keep all its efficiency advantages. The idea is to generate candidates in a standard bottom-up fashion, while testing the pruning condition in Observation 1 on the complement of each candidate $C$. More precisely, given a candidate $C$, we decide whether it should be filtered out by checking if $L \setminus C$ is a subset of $L \setminus L_x$ (where $L_x = \{l(x,u) \mid (x,u) \in E\}$), or, equivalently, whether $C \supseteq L_x$. The details of the procedure just described are reported as Function 1.

Function 1 GenerateCandidates($G$, $x$)
Input: an edge-labeled graph $G = (V, E, L, l)$, a landmark $x$
Output: a set of label sets $\mathcal{C}$
1: $\mathcal{C} \leftarrow \emptyset$
2: $L_x \leftarrow \{l(x,u) \mid (x,u) \in E\}$
3: $Cand \leftarrow \{\{l\} \mid l \in L\}$
4: while $Cand \neq \emptyset$ do
5:   $Cand \leftarrow Cand \setminus \{C \mid C \in Cand, C \supseteq L_x\}$
6:   $\mathcal{C} \leftarrow \mathcal{C} \cup \{(L \setminus C) \mid C \in Cand\}$
7:   $Cand \leftarrow$ AprioriNextLevel($Cand$)
8: end while
9: return $\mathcal{C}$

Skipping unnecessary SP-minimality tests.
Once a $C$-constrained SSSP with source $x$ has been computed for any label set $C$, the SP-minimality of $C$ should in principle be checked with respect to all vertices $u$ having distance $d_C(x,u) < \infty$ (Lines 7-11 in Algorithm 1). Here we show how to recognize early the vertices $u$ for which $C$ cannot be SP-minimal, so that such vertices need not be considered at all. In particular, we observe the following: for any label set $C$ to be SP-minimal with respect to a vertex $u$ (and a landmark $x$), each label in $C$ must be present on at least one edge of every $C$-constrained shortest path between $x$ and $u$; otherwise, $C$ cannot be SP-minimal with respect to $u$, because there would exist a shortest path between $x$ and $u$ that uses only a proper subset of $C$. An immediate consequence is that $C$ cannot be SP-minimal with respect to any vertex $u$ having distance $d_C(x,u) < |C|$. We formalize this observation next.

OBSERVATION 2. Given a landmark $x \in X$ and a vertex $u \in V$, any label set $C \subseteq L$ is SP-minimal with respect to $x$ and $u$ only if $d_C(x,u) \geq |C|$.

To profitably exploit the above observation, the output of any $C$-constrained SSSP with source $x$ can be organized into an appropriate data structure where, given a distance $t$, all vertices $u$ having $d_C(x,u) = t$ can be accessed in constant time. This way, we can process only vertices at a distance $t \geq |C|$ and skip all the other ones.

Speeding-up SP-minimality tests. Finally, we discuss two observations that allow the SP-minimality of certain label sets $C$ and vertices $u$ to be checked in $O(1)$ time rather than $O(|C|)$. The first observation is the following. Consider a vertex $u$ and all unconstrained shortest paths connecting $u$ to the landmark $x$. Assume that at least one of those shortest paths is monochromatic. Denote by $l_u$ the unique label of that path. Then, it is easy to see that any label set $C$ that properly contains $\{l_u\}$ cannot be SP-minimal with respect to $x$ and $u$. We formalize this observation next.

OBSERVATION 3. Given a landmark $x \in X$, let $u$ be a vertex in $V$ such that there exists an unconstrained shortest path between $x$ and $u$ that is monochromatic, and let $l_u$ denote the unique label on the edges of such a monochromatic path.
It holds that all label sets $C \supset \{l_u\}$ are non-SP-minimal with respect to $x$ and $u$.

To exploit Observation 3 in practice, we compute and store the set $V_x$ of all vertices $u$ that have a monochromatic shortest path connecting them to $x$, along with the unique label $l_u$ of the corresponding path for every $u$. This can be achieved in $O(m + n)$ time by computing an unconstrained SSSP with source $x$ and then visiting the output of such a SSSP level-by-level, i.e., starting from vertices at distance 1 from $x$ and proceeding by increasing the distance one-by-one. This information can profitably be exploited every time an SP-minimality check is required for any label set $C$ and vertex $u \in V_x$: $C$ can be recognized as non-SP-minimal with respect to $x$ and $u$ in constant time by simply checking whether the label $l_u$ belongs to $C$ (with $C \neq \{l_u\}$).

Algorithm 2 TraversePowerset
Input: an edge-labeled graph $G = (V, E, L, l)$, a set of landmarks $X$
Output: for each pair $(x, u)$, where $x \in X$ and $u \in V$, a set $SP_{xu}$ of $\langle C, d \rangle$ pairs storing all SP-minimal label sets $C$ with respect to $x$ and $u$ along with the corresponding $C$-constrained shortest path distance $d$
1: $SP_{xu} \leftarrow \emptyset$, $\forall x \in X, u \in V$
2: for all $x \in X$ do
3:   $D \leftarrow \emptyset$, $V \leftarrow \emptyset$
4:   $\mathcal{C} \leftarrow$ GenerateCandidates($G$, $x$) {Observ. 1}
5:   for all $C \in \mathcal{C}$ do
6:     $D[C], V[C] \leftarrow$ ConstrainedSSSP($G$, $x$, $C$)
7:   end for
8:   $\mathbf{L} \leftarrow$ SingleLabelSP($G$, $x$, $D[L]$, $V[L]$) {Observ. 3}
9:   for all $C \in \mathcal{C}$ do
10:     for all $t \geq |C|$ {Observ. 2} do
11:       $V'_t \leftarrow$ nonSPminimal($C$, $V[C, t-1]$, $V[C, t]$) {Observ. 4}
12:       for all $u \in V'_t$, $\{\mathbf{L}[u]\} \not\subset C$ {Observ. 3} do
13:         if $C$ is SP-minimal w.r.t. $x$ and $u$ then
14:           $V'_t \leftarrow V'_t \setminus \{u\}$
15:         end if
16:       end for
17:       $SP_{xu} \leftarrow SP_{xu} \cup \{\langle C, D[C, u] \rangle\}$, $\forall u \in V[C, t] \setminus V'_t$
18:     end for
19:   end for
20: end for

Let us now discuss our second observation. We recall that, to exploit Observation 2, the output of any $C$-constrained SSSP is organized as a collection of vertex sets accessible in constant time based on the distance from the landmark $x$. Let $V_t$ denote the vertex set at distance $t$ from $x$. The SP-minimality of $C$ is checked distance-by-distance, starting from the set $V_{|C|}$. Then, for any vertex within $V_t$, we can safely assume that the SP-minimality of all vertices in $V_{t-1}$ has already been checked. We can exploit this as follows. Given a vertex $u \in V_t$, let $V_{t-1,u}$ denote the set of all vertices in $V_{t-1}$ connected to $u$ by an edge whose label belongs to the constraint label set $C$, i.e., $V_{t-1,u} = \{v \in V_{t-1} \mid (u,v) \in E, l(u,v) \in C\}$.
Our observation is: a label set $C$ is SP-minimal with respect to a vertex $u \in V_t$ and a landmark $x$ if $C$ has been recognized as SP-minimal with respect to every vertex in $V_{t-1,u}$. Indeed, it is easy to see that, if such a condition holds, then there cannot exist any shortest path connecting $u$ to $x$ whose labels are a proper subset of $C$. Hence, $C$ is SP-minimal with respect to $u$ as well.

OBSERVATION 4. Given a landmark $x \in X$ and a label set $C \subseteq L$, let $V_t$ denote the set of all vertices $u$ having distance $d_C(x,u) = t$ and $V_{t-1,u} = \{v \in V_{t-1} \mid (u,v) \in E, l(u,v) \in C\}$. For any vertex $u \in V_t$, if $C$ is SP-minimal with respect to all vertices $v \in V_{t-1,u}$ (and $x$), then $C$ is SP-minimal with respect to $u$ (and $x$).

According to the above observation, the SP-minimality check for $C$ can be limited only to vertices $v \in V_t$ having at least one non-SP-minimal neighbor in $V_{t-1}$. The data structures to find such vertices are easily produced by the $C$-constrained SSSP algorithm. For all the other vertices in $V_t$ we can conclude that $C$ is SP-minimal without further computations.

3.2.3 The TraversePowerset algorithm

The algorithm that exploits the pruning rules described in Section 3.2.2 is dubbed TraversePowerset (Algorithm 2). First, Observation 1 (and Function 1) is exploited to avoid generating unnecessary label sets (Line 4) and, hence, to compute label-constrained SSSPs only for the necessary label sets (Lines 5-7). Note that a $C$-constrained SSSP returns the distance of each vertex $u \in V$ from $x$ (vector $D[C]$), as well as a vector of vertex sets indexed by the distance from $x$ (vector $V[C]$). We also assume that all edges with label in $C$ connecting vertices between two consecutive vertex sets $V[C, t-1]$, $V[C, t]$ are available. To exploit Observation 3, a vector $\mathbf{L}$ is computed by function SingleLabelSP (Line 8): $\mathbf{L}$ contains, for all vertices $u$, the label of the monochromatic shortest path between $u$ and $x$ (if any). In the main cycle of the algorithm (Lines 9-19), all label sets within $\mathcal{C}$ are processed. Based on Observation 2, the vertices taken into account for each $C \in \mathcal{C}$ are only those at distance $\geq |C|$ from $x$ (Line 10).
Among such vertices, the subset of those for which $C$ cannot immediately be recognized as SP-minimal based on Observation 4 is first identified (Line 11). Finally, only the vertices in the latter subset for which Observation 3 does not apply (Line 12) enter a standard SP-minimality test, i.e., an SP-minimality test performed based on Theorem 2 (Lines 13-15).

3.3 Selecting landmarks

Exact landmark selection. We now turn our attention to the problem of selecting good landmarks for the PowCov index. We first formalize the POWCOV-LANDMARK-SELECTION problem, which asks to find a minimum-sized landmark set that allows the PowCov index to answer all queries exactly.

DEFINITION 3. Given a set of landmarks $X$ and a query $Q = \langle s, t, C \rangle$, let $d_{PC}(Q, X)$ denote the approximate answer to $Q$ provided by the PowCov index using the landmarks $X$. A set of landmarks $X$ is called PowCov-exact if and only if $d_{PC}(Q, X) = d_C(s,t)$ for all queries $Q = \langle s, t, C \rangle$.

PROBLEM 1 (POWCOV-LANDMARK-SELECTION). Given an edge-labeled graph $G = (V, E, L, l)$, find a minimum-sized set of landmarks $X \subseteq V$ such that $X$ is PowCov-exact.

A first step for tackling Problem 1 is to determine the conditions under which any landmark set $X$ is PowCov-exact. This is stated in the following lemma.

LEMMA 1. Given an edge-labeled graph $G = (V, E, L, l)$, a set of landmarks $X \subseteq V$ is PowCov-exact if and only if, for all pairs of vertices $u, v$ and for all SP-minimal label sets $C$ with respect to $u$ and $v$, there exists a landmark in $X$ lying on at least one $C$-constrained shortest path $p_C(u,v)$.

PROOF. Let us prove the first direction of the logical implication. First, by the notion of SP-minimality, the condition that there exists a $C$-constrained shortest path that intersects $X$ for all vertices $u, v$ and for all SP-minimal label sets $C$ with respect to $u$ and $v$ actually holds for all label sets, not only the SP-minimal ones. This means that, for each query $\langle s, t, C \rangle$, there exists a landmark lying on at least one $C$-constrained shortest path between $s$ and $t$.
This is sufficient for the upper bound given by the triangle inequality to be exact, provided that the intermediate distances $d_C(s,x)$ and $d_C(x,t)$ stored in the index are exact for each input query. The latter condition is ensured by PowCov, therefore the claim follows. The other side of the implication states that if there exists a pair $u, v$ and an SP-minimal label set $C$ (with respect to $u$ and $v$) such that all $C$-constrained shortest paths $p_C(u,v)$ do not pass through any landmark in $X$, then $X$ is not PowCov-exact. If this arises, then a query $\langle u, v, C \rangle$ is answered by PowCov resorting to the upper bound computed according to a path $p_S(u,v)$ that involves some subset $S \subset C$. The SP-minimality of $C$ guarantees that $d_S(u,v) > d_C(u,v)$, hence the query $\langle u, v, C \rangle$ is not answered exactly.

Based on Lemma 1, we can now derive a key result that relates the POWCOV-LANDMARK-SELECTION problem with the vertex-cover problem. Recall that a vertex cover of a graph $G = (V, E)$ is a subset of vertices $V' \subseteq V$ such that for all edges $(u,v) \in E$ it is either $u \in V'$ or $v \in V'$. The relation is stated in the next theorem.

THEOREM 3. Given an edge-labeled graph $G = (V, E, L, l)$, a set of landmarks $X \subseteq V$ is PowCov-exact if and only if $X$ is a vertex cover of $G$.

PROOF. We prove the theorem by showing that any landmark set $X$ satisfies Lemma 1 if and only if it is a vertex cover of $G$. First, by definition of vertex cover, it is guaranteed that, for each pair of vertices $u, v$ in $G$ and for each path $p$ between $u$ and $v$, there exists at least one landmark $x \in X$ belonging to $p$. This implies that, for each query label set $C$, at least one landmark lies on a $C$-constrained shortest path between $u$ and $v$. Thus, $X$ satisfies Lemma 1. On the other hand, note that if $X$ is not a vertex cover of $G$, then there exists an edge $(u,v)$ in $G$ such that $u \notin X$ and $v \notin X$. This means that the query $\langle u, v, C \rangle$, where $C = \{l(u,v)\}$, cannot be answered exactly. Indeed, $d_C(u,v)$ is clearly equal to 1, but, as neither $u$ nor $v$ belongs to the landmark set $X$, the bound provided for this query is necessarily $\geq 2$ because it is computed on a path passing through at least one vertex other than $u$ and $v$. It is easy to see that the label set $C = \{l(u,v)\}$ is SP-minimal with respect to $u$ and $v$, and the path composed of the single edge $(u,v)$ is the only $C$-constrained shortest path between $u$ and $v$. Thus, we have found a pair of vertices $u, v$ and an SP-minimal label set with respect to $u$ and $v$ such that no $C$-constrained shortest path between $u$ and $v$ contains a landmark in $X$. This makes $X$ violate the condition required by Lemma 1. The theorem follows.
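The second half of the proof is constructive: every edge with no endpoint in $X$ yields a single-label query that cannot be answered exactly. A minimal sketch of that check follows; the edge-list representation and the function name `uncovered_edges` are assumptions made for illustration.

```python
def uncovered_edges(edges, landmarks):
    """Return the labeled edges (u, v, label) that have no endpoint in `landmarks`.

    An empty result means `landmarks` is a vertex cover. Otherwise each returned
    edge is a witness query <u, v, {label}> whose true distance is 1 while any
    landmark-based bound is at least 2.
    """
    X = set(landmarks)
    return [(u, v, lab) for (u, v, lab) in edges if u not in X and v not in X]

# Toy path a -g- b -o- c -g- d.
edges = [("a", "b", "g"), ("b", "c", "o"), ("c", "d", "g")]
print(uncovered_edges(edges, {"b"}))       # the edge (c, d) is left uncovered
print(uncovered_edges(edges, {"b", "c"}))  # a vertex cover leaves nothing uncovered
```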
An immediate corollary of Theorem 3 is that a minimum vertex cover is a solution for Problem 1.

COROLLARY 1. A set of landmarks $X$ is a solution for the POWCOV-LANDMARK-SELECTION problem if and only if $X$ is a minimum vertex cover of the input graph.

Approximate landmark selection. Corollary 1 suggests selecting landmarks by finding a minimum vertex cover of the graph $G$. However, even though the size of the minimum vertex cover may be efficiently approximated within a factor 2 [5], this size, in many cases in practice, may be $\Omega(n)$. This is too large for our index, as the basic assumption of any landmark-based index is to have a number of landmarks $\ll n$. To overcome this drawback, we depart from the requirement of finding a set of landmarks that allows answering all queries exactly. Instead, we aim at maximizing the number of queries that can be answered exactly with $k$ landmarks. However, considering a problem formulation where the optimal $k$ landmarks explicitly maximize the number of queries correctly answered may lead to inefficient optimization strategies. The intuition behind this is that, while in the non-labeled case assessing the number of queries exactly answered by a single landmark needs one SSSP and time $O(m)$, the labeled case requires $2^{|L|}$ SSSPs (one for each query label set) and time $O(m 2^{|L|})$, which makes even simple heuristics like local search not scalable.

For this reason, and further motivated by the above results about vertex covering, we formulate the problem as a k-max-vertex-cover problem [7]: given an integer $k$, find $k$ vertices in the input graph so that the number of covered edges is maximized. Note that such a formulation is still close to the formulation that explicitly considers the number of queries exactly answered, as the more edges are covered by a landmark set, the more queries are answered correctly. At the same time, this formulation has the advantage that it can be (approximately) solved efficiently, as shown next. The k-max-vertex-cover problem is NP-hard [7], but it admits a simple and efficient approximation algorithm which we call GreedyMVC.
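The greedy rule behind such an algorithm (repeatedly add the vertex that covers the most still-uncovered edges) can be sketched as below. This is an illustrative sketch under assumptions, not the authors' implementation: the edge-list input, the name `greedy_mvc`, and the tie-breaking by smallest vertex identifier are all choices made here, and gains are recomputed from scratch in every round for brevity.

```python
def greedy_mvc(edges, k):
    """Greedy sketch for k-max-vertex-cover on an undirected graph.

    edges: iterable of (u, v) pairs; returns (chosen vertices, edges left uncovered).
    """
    uncovered = {frozenset(e) for e in edges}  # undirected, duplicates merged
    chosen = []
    for _ in range(k):
        # Marginal gain of a vertex = number of uncovered edges incident to it.
        gain = {}
        for e in uncovered:
            for w in e:
                gain[w] = gain.get(w, 0) + 1
        if not gain:
            break  # every edge is already covered
        # Ties are broken by the smallest vertex identifier (an arbitrary choice).
        best = max(sorted(gain), key=gain.get)
        chosen.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return chosen, uncovered

# Star centered at "c" plus one extra edge (a, b).
edges = [("c", "a"), ("c", "b"), ("c", "d"), ("a", "b")]
print(greedy_mvc(edges, 1))  # picks the hub "c"; edge {a, b} stays uncovered
print(greedy_mvc(edges, 2))  # a second pick covers the remaining edge
```

Recomputing gains costs $O(m)$ per round, so the sketch runs in $O(km)$; a priority queue over degrees would be the usual refinement.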
Given the input graph and a partial solution containing fewer than $k$ landmarks (initially empty), the algorithm always selects the vertex that covers the largest number of still uncovered edges. That is, the vertex that maximizes the marginal gain of the cover is added to the set of landmarks in each iteration, until $k$ landmarks have been collected. Using standard arguments of submodular-function maximization [5], the GreedyMVC algorithm can be shown to achieve an approximation factor of $(1 - \frac{1}{e}) \approx 0.632$. This means that the solution we obtain for $k$ landmarks covers at least 63% of the edges covered by the optimal solution with $k$ landmarks. By exploiting this and adapting another result stated in [7], we obtain the following theorem:

THEOREM 4. The GreedyMVC algorithm used for selecting landmarks for the PowCov index provides an approximation guarantee of $\max\{1 - \frac{1}{e}, \frac{k}{n}\}$.

PROOF. Given a graph $G = (V, E)$, let $VC = [u_1, \ldots, u_k] \subseteq V$ be the output of GreedyMVC, ordered based on the specific iteration where vertices are added to $VC$. Let also $\delta_i$ denote the number of edges covered by $u_i$ in the graph resulting after the execution of iterations 1 through $i-1$, $i \in [1..k]$; for $i > k$, $\delta_i$ is defined in the same way by letting the greedy selection continue until all $n$ vertices have been picked. Finally, let $VC^*$ denote the optimal solution, and $E[VC]$ and $E[VC^*]$ the total number of edges covered by $VC$ and $VC^*$, respectively. First, we note that $\sum_{i=1}^{k} \delta_i = E[VC]$, while $\sum_{i=1}^{n} \delta_i = |E| \geq E[VC^*]$. Also, as in each iteration GreedyMVC picks the maximum-degree vertex, where the degree is computed based on the reduced graph resulting from all previous iterations, it holds that $\delta_1 \geq \delta_2 \geq \cdots \geq \delta_n$. This implies that $\sum_{i=1}^{k} \delta_i \geq \frac{k}{n} \sum_{i=1}^{n} \delta_i$. Combining all these findings, it results that

$E[VC] = \sum_{i=1}^{k} \delta_i \geq \frac{k}{n} \sum_{i=1}^{n} \delta_i \geq \frac{k}{n} E[VC^*]$.

Thus, GreedyMVC is a $\frac{k}{n}$-approximation algorithm for k-max-vertex-cover. Combining this result with the result derived from submodular-function maximization we obtain an overall approximation factor of $\max\{1 - \frac{1}{e}, \frac{k}{n}\}$.

4. CHROMATIC LANDMARKS INDEX

In the worst case, the construction time of the PowCov index remains exponential in the number of labels. This is because PowCov is fundamentally based on the same strategy as the brute-force approach; it only adds a number of efficient pruning heuristics.
In this section, we propose a second index, called Chromatic Landmarks (ChromLand), which is kept as light as possible and similar to the standard landmark approach for non-labeled graphs. As a trade-off, ChromLand incurs increased query time and offers less accurate answers.

4.1 Structure of the index

The main idea of the ChromLand index is that each of the $k$ landmarks is assigned to a single label (color) in $L$. The landmarks of this index are called chromatic landmarks. Any distance involving one or more of such landmarks is referred to as chromatic distance. In particular, the chromatic distance between a landmark $x \in X$ and a vertex $u \in V$ is defined as the distance $d_{\{c(x)\}}(x,u)$ computed using only edges having the color $c(x)$ assigned to $x$, while the chromatic distance between any two landmarks $x, y \in X$ corresponds to the distance $d_{\{c(x),c(y)\}}(x,y)$ using only edges of the colors of both $x$ and $y$. We denote by $cd(\cdot,\cdot)$ the chromatic distance, and thus we have

$cd(x,u) = d_{\{c(x)\}}(x,u)$ (vertex-to-landmark);
$cd(x,y) = d_{\{c(x),c(y)\}}(x,y)$ (landmark-to-landmark).

The structure of the ChromLand index is thus simple: for each vertex $u \in V \setminus X$ we store the (mono-)chromatic distances to all landmarks, and for each landmark $x \in X$ we store the (bi-)chromatic distances to all the other landmarks. Building and storing the index can be accomplished with $k$ BFS traversals, requiring $O(km)$ time and $O(kn)$ space.

4.2 Query processing

A simple way to process LC-PPSPD queries using the ChromLand index is to use the upper bound based on the triangle inequality, as discussed in the Introduction.

PROPOSITION 2. Given a query $\langle s, t, C \rangle$ and a set of chromatic landmarks $X$, it holds that $d_C(s,t) \leq \min\{cd(x,s) + cd(x,t) \mid x \in X \text{ and } c(x) \in C\}$.

PROOF. First, we observe that $d_C(s,t) \leq d_S(s,t)$, $\forall S \subseteq C$. Combining this with the triangle inequality leads to:

$d_C(s,t) \leq d_C(x,s) + d_C(x,t)$, $\forall x \in X$
$\leq d_{\{c(x)\}}(x,s) + d_{\{c(x)\}}(x,t)$, $\forall x \in X$ s.t. $c(x) \in C$
$= cd(x,s) + cd(x,t)$, $\forall x \in X$ s.t. $c(x) \in C$,

which clearly implies that $d_C(s,t) \leq \min\{cd(x,s) + cd(x,t) \mid x \in X \wedge c(x) \in C\}$.

This query-processing strategy requires $O(k)$ time, like the landmark approach for standard PPSPD queries. Nevertheless, the above approach may result in poor accuracy, as distances in the precomputed index consider only monochromatic paths, and the shortest-path distance for large constraint label sets may not be accurately approximated by a monochromatic path. We handle the above issue by considering paths that pass through more landmarks.
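The single-landmark bound of Proposition 2 amounts to one pass over the landmarks whose color is allowed by the query. A minimal sketch follows; the dictionaries `color` and `cd` and the function name are hypothetical stand-ins for the index, and a missing entry is read as an infinite chromatic distance.

```python
from math import inf

def single_landmark_bound(s, t, C, color, cd):
    """Upper bound on d_C(s, t) using one chromatic landmark (Proposition 2).

    color: dict landmark -> assigned color c(x)
    cd:    dict (landmark, vertex) -> mono-chromatic distance; missing = infinity
    """
    best = inf
    for x, c_x in color.items():
        if c_x in C:  # only landmarks whose color is in the query label set
            best = min(best, cd.get((x, s), inf) + cd.get((x, t), inf))
    return best

color = {"x": "g", "y": "o"}
cd = {("x", "s"): 2, ("x", "t"): 5, ("y", "t"): 1}
print(single_landmark_bound("s", "t", {"g", "o"}, color, cd))  # via x: 2 + 5
print(single_landmark_bound("s", "t", {"o"}, color, cd))       # y cannot reach s
```

The loop visits each of the $k$ landmarks once, in line with the $O(k)$ query time stated above.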
We remark that, for PPSPD queries on non-labeled graphs, using multiple landmarks does not yield any improvement, because the length of a path using landmark $x$ is upper bounded by the length of a path using landmarks $x$ and $y$. Instead, in the multichromatic context, the estimation can be improved by using more landmarks. Indeed, if two chromatic landmarks $x$ and $y$ are assigned to different colors, the following upper bound for $d_C(s,t)$ could be tighter than the simple one in Proposition 2 (see Figure 3 for an example):

$d_C(s,t) \leq \min\{cd(s,x) + cd(x,y) + cd(y,t) \mid x, y \in X \text{ and } c(x), c(y) \in C \text{ and } c(x) \neq c(y)\}$.

The above observation can be generalized to a sequence of landmarks $x_1, \ldots, x_z \in X$, where any two consecutive landmarks have different colors.

Figure 3: ChromLand query-processing strategy for the query $\langle s, t, \{green, orange\} \rangle$ (the figure shows the path $s$, $x$, $y$, $t$ with edge weights $d_{\{g\}}(s,x)$, $d_{\{g,o\}}(x,y)$, $d_{\{o\}}(y,t)$). The distance $d_{\{g,o\}}(s,t)$ is approximated by a path passing through two landmarks, $x$ and $y$. The shortest path from $s$ to $t$ might not pass through $x$ or $y$, but its length is upper bounded by $d_{\{g\}}(s,x) + d_{\{g,o\}}(x,y) + d_{\{o\}}(y,t)$. This improves upon using only $x$ if $d_{\{g\}}(t,x) > d_{\{g,o\}}(x,y) + d_{\{o\}}(y,t)$.

Next we show how to use multiple landmarks in order to get even tighter bounds for a given LC-PPSPD query. Recall that the ChromLand index is essentially a table storing mono-chromatic distances between landmark-vertex pairs and bi-chromatic distances between landmark-landmark pairs. Intuitively, one can think of this index as an auxiliary graph $G_X = (V, X, E_X, c, w)$, where $V$ is the set of vertices of the original graph $G$, $X \subseteq V$ is the set of landmarks, and $c : X \to L$ is a function that assigns landmarks to colors. The edge set $E_X$ of this auxiliary graph is defined as follows: there exists an edge between any landmark-vertex pair $(x,u)$ if and only if $cd(u,x) < \infty$; there exists an edge between any landmark-landmark pair $(x,y)$ if and only if $c(x) \neq c(y)$ and $cd(x,y) < \infty$. Each edge in $E_X$ is labeled with the color(s) of the incident landmark(s) and is weighted by a function $w : E_X \to \mathbb{N}$ defined as $w(u,v) = cd(u,v)$. We obtain the desired bound by exploiting the following result.

THEOREM 5.
Let $G = (V, E, L, l)$ be an edge-labeled graph and $X \subseteq V$ a set of landmarks. Let $G_X = (V, X, E_X, c, w)$ be the auxiliary graph defined over $G$ and $X$. Given a label set $C$ and two vertices $u, v \in V$, let $G_X[u, v, C]$ denote the subgraph of $G_X$ induced by the set of vertices $\{u, v\} \cup \{x \in X \mid c(x) \in C\}$. For any query $\langle s, t, C \rangle$, the shortest-path distance $\delta_C(s,t)$ between $s$ and $t$ computed on $G_X[s, t, C]$ is the tightest upper bound to $d_C(s,t)$ that can be computed from the information stored by the ChromLand index.

PROOF. Given any $Y \subseteq X$, let $sp_Y(s,t)$ be the shortest-path distance computed over the subgraph of $G_X$ induced by the vertices in $\{s, t\} \cup Y$. It holds that if $c(y) \in C$ for all $y \in Y$, then $d_C(s,t) \leq sp_Y(s,t)$, as $d_C(u,v) \leq cd(u,v)$, $\forall u, v \in \{s, t\} \cup Y$. Conversely, if there exists $y \in Y$ such that $c(y) \notin C$, the bound $d_C(s,t) \leq sp_Y(s,t)$ is no longer guaranteed, because there might exist a pair of vertices $u, v \in \{s, t\} \cup Y$ such that $d_C(u,v) > cd(u,v)$, and this could violate the overall upper bound $d_C(s,t) \leq sp_Y(s,t)$. Therefore, only subsets $Y \subseteq X$ whose landmarks are all coupled with colors within the query label set $C$ guarantee sound upper bounds to the distance $d_C(s,t)$. The tightest among these sound bounds is obtained by taking the largest of these sets $Y$, i.e., $Y = \{x \mid x \in X, c(x) \in C\}$. This means that the tightest sound upper bound for $d_C(s,t)$ given the information in ChromLand is equal to the shortest-path distance computed on the subgraph of $G_X$ induced by the vertices $\{s, t\} \cup \{x \mid x \in X, c(x) \in C\}$, i.e., $G_X[s, t, C]$.

The theorem suggests that to approximate $d_C(s,t)$ for a query $\langle s, t, C \rangle$ we need to consider the subgraph of $G_X$ induced by the vertices $s$, $t$, and all landmarks in $X$ whose color belongs to $C$, and then compute the (exact) shortest-path distance between $s$ and $t$ on that subgraph. This strategy requires running a shortest-path algorithm (e.g., Dijkstra) on a weighted graph with $O(k)$ vertices and $O(k^2)$ edges; thus, the query processing of the enhanced ChromLand index requires $O(k^2)$ time. Note that since $k^2 \ll n$, this strategy is faster than computing exact distances without any index. An illustration of the entire process is reported in Figure 4.
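As a minimal sketch of this query strategy, the function below runs Dijkstra on the auxiliary graph restricted to $s$, $t$, and the landmarks whose color lies in $C$. The tables `cd_vertex` and `cd_landmark`, the function name, and the assumption that $s$ and $t$ are not themselves landmarks are illustrative choices, not part of the index as described. Missing entries are read as "no edge".

```python
import heapq
from math import inf

def chromland_query(s, t, C, color, cd_vertex, cd_landmark):
    """Shortest s-t distance on the auxiliary graph G_X[s, t, C] (sketch).

    color:       dict landmark -> color
    cd_vertex:   dict (landmark, vertex) -> mono-chromatic distance
    cd_landmark: dict (landmark, landmark) -> bi-chromatic distance (either order)
    Assumes s and t are not landmarks.
    """
    if s == t:
        return 0
    nodes = [s, t] + [x for x, c in color.items() if c in C]

    def weight(a, b):
        # Edge weight in the auxiliary graph; inf encodes a missing edge.
        if a in color and b in color:
            if color[a] == color[b]:
                return inf  # landmark-landmark edges need different colors
            return cd_landmark.get((a, b), cd_landmark.get((b, a), inf))
        if a in color:
            return cd_vertex.get((a, b), inf)
        if b in color:
            return cd_vertex.get((b, a), inf)
        return inf  # no vertex-vertex distances are stored

    dist = [inf] * len(nodes)
    dist[0] = 0
    heap = [(0, 0)]  # (tentative distance, index into nodes)
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == 1:
            return d  # reached t
        for j in range(len(nodes)):
            if j != i:
                nd = d + weight(nodes[i], nodes[j])
                if nd < dist[j]:
                    dist[j] = nd
                    heapq.heappush(heap, (nd, j))
    return inf

# Toy index: green landmark x near s, orange landmark y near t.
color = {"x": "g", "y": "o"}
cd_vertex = {("x", "s"): 2, ("y", "t"): 1}
cd_landmark = {("x", "y"): 3}
print(chromland_query("s", "t", {"g", "o"}, color, cd_vertex, cd_landmark))  # 2 + 3 + 1
print(chromland_query("s", "t", {"g"}, color, cd_vertex, cd_landmark))       # x alone cannot reach t
```

In this toy index no single landmark connects $s$ and $t$, so the single-landmark bound is infinite while the two-landmark path gives a finite estimate.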
Figure 4: Illustration of a ChromLand index. Each landmark (square) is assigned to a color. Each edge is labeled with the color(s) of the landmark(s) and weighted by the corresponding chromatic distance $cd(\cdot,\cdot)$. Given the query $\langle s, t, \{green, orange, red\} \rangle$, we take the subgraph induced by $s$ and $t$ and all landmarks whose color is in $\{g, o, r\}$, and we approximate $d_{\{g,o,r\}}(s,t)$ by computing the shortest-path distance between $s$ and $t$ on that subgraph. In this example the shortest path passes only through the green and orange landmarks, and has length 6. Note that all bounds defined by a single landmark give infinite distances.

4.3 Selecting landmarks

A key result in selecting landmarks for the PowCov index, stated in Lemma 1, is that the index can provide the exact answer to a query $\langle u, v, C \rangle$ only if there exists at least one landmark on a shortest path $p_C(u,v)$. This statement does not hold for the ChromLand index. Indeed, the following theorem shows that a single landmark on $p_C(u,v)$ does not suffice; instead, the number of landmarks should be at least as large as the number of distinct colors on $p_C(u,v)$.

Figure 5: Simple edge-labeled graph $G$ where the vertex cover $\{x\}$ is not a valid landmark set for ChromLand.

Algorithm 2 ChromLandLocalSearch
Input: an edge-labeled graph $G = (V, E, L, l)$, an integer $k$
Output: a set $X$ of $k$ landmarks, a landmark labeling function $c$
1: $X, c \leftarrow$ randomSelect($V$, $L$, $k$)
2: $J \leftarrow J(G, X, c)$
3: repeat
4:   randomly pick: a vertex $u \in V \setminus X$, a landmark $x \in X$, a color $l \in L$
5:   $X', c' \leftarrow$ swap($X$, $c$, $u$, $x$, $l$)
6:   if $J(G, X', c') > J$ then
7:     $X, c \leftarrow X', c'$, $J \leftarrow J(G, X', c')$
8:   end if
9: until stop

THEOREM 6. Given an edge-labeled graph $G = (V, E, L, l)$, a set of landmarks $X \subseteq V$ allows the ChromLand index to provide exact answers only if for all pairs $u, v \in V$ and all label sets $C \subseteq L$, there exists a shortest path $p_C(u,v)$ such that $|X \cap \{i \mid (i,j) \in p_C(u,v)\}| \geq |colors(p_C(u,v))|$.

PROOF. Given any two vertices $u, v \in V$ and a label set $C \subseteq L$, let $P_C(u,v)$ be the set of all $C$-constrained shortest paths between $u$ and $v$. To prove the claim, it is sufficient to show that, if $X$ does not contain at least $h = |colors(p_C(u,v))|$ landmarks lying on some $p_C(u,v) \in P_C(u,v)$, then there exists at least one query that ChromLand cannot answer exactly when employing $X$.
To this end, note that, as each landmark is assigned only one color, if the landmarks lying on $p_C(u,v)$ are fewer than $h$, for all $p_C(u,v) \in P_C(u,v)$, then there would be at least one color not covered by any landmark on each $p_C(u,v)$. This means that the ChromLand index actually considers at least one edge in each $p_C(u,v)$ as missing. Thus, whatever colors are assigned to the landmarks, the answer provided by ChromLand to a query $\langle u, v, C \rangle$ would always be $> |p_C(u,v)|$, as it would be computed by considering a path not belonging to $P_C(u,v)$ and, therefore, not shortest.

The theorem suggests that landmark selection in ChromLand is more complex than in PowCov. For instance, a vertex cover is no longer a valid solution. As an example, consider the simple graph $G$ shown in Figure 5. Although the set $X = \{x\}$ is a vertex cover of $G$, it is not a valid landmark set for ChromLand: whatever color is assigned to $x$, there is no way to provide exact answers to queries involving label sets with size larger than 1.

Problem formulation. Consider again the example shown in Figure 3: the shortest-path distance $d_{\{g,o\}}(s,t)$ is approximated by the sum of three chromatic distances, i.e., $d_{\{g\}}(s,x) + d_{\{g,o\}}(x,y) + d_{\{o\}}(y,t)$. Note that the smaller these chromatic distances are, the tighter the approximation is going to be. Motivated by this example, we focus on selecting a set of landmarks so that any vertex of the graph is close to at least one landmark of any given color. We translate this intuition to an optimization problem. First, given a landmark $x \in X$, a vertex $u \in V$, and a landmark-labeling function $c : X \to L$, we define the similarity function $sim_c$ as follows:

$sim_c(x,u) = \begin{cases} 0 & \text{if } d_{\{c(x)\}}(x,u) = \infty, \\ \frac{1}{d_{\{c(x)\}}(x,u)} & \text{otherwise.} \end{cases}$

We then define our landmark-selection problem as follows.

PROBLEM 2 (CHROMLAND-LANDMARK-SELECTION). Given an edge-labeled graph $G = (V, E, L, l)$ and an integer $k$, find a set of $k$ landmarks $X \subseteq V$ and a landmark-labeling function $c : X \to L$ so as to maximize the objective function

$J(G, X, c) = \sum_{u \in V} \max_{x \in X} sim_c(x,u)$.

A solution based on k-median. The CHROMLAND-LANDMARK-SELECTION problem can be interpreted as a variant of the k-median problem [3].
Specifically, we map CHROMLAND-LANDMARK-SELECTION to a variant of k-median as follows: let $(M, D, H)$ be a bipartite graph, where $M$ and $D$ are disjoint sets of vertices, and $H$ is the set of edges. The set $D$, representing demand points, is a copy of the vertex set $V$ in the original graph. The set $M$, representing median points, is a copy of all vertex-color pairs $V \times L$. The weight of an edge $(\langle x, c_x \rangle, u) \in M \times D$ is $sim_c(x,u)$. The goal of the standard k-median problem is to select a set of median points $M' \subseteq M$ so that all demand points in $D$ are served by their closest median point in $M'$ and the total service cost is minimized. Here, by defining the edge weights using the similarity function $sim_c$, we cast the problem as a maximization problem. Note that setting the set of medians $M$ equal to $V \times L$ allows us to also determine the landmark-labeling function: if a median point $\langle x, c_x \rangle$ is selected in the solution set $M'$, we then set the color of $x$ equal to $c_x$. To make this work, we need to impose the additional constraint that the set $M'$ contains only distinct landmarks, i.e., for all distinct pairs $\langle x, c_x \rangle, \langle y, c_y \rangle \in M'$ we require that $x \neq y$. A key challenge is to design an algorithm that does not compute/materialize all pairwise similarities, as this would require unaffordable $\Omega(n^2)$ time/space. The solution we propose is an adap-