Sharing Hash Codes for Multiple Purposes

Wiktor Pronobis*, Danny Panknin*, Johannes Kirschnick, Vignesh Srinivasan, Wojciech Samek, Volker Markl, Manohar Kaul, Klaus-Robert Müller†, and Shinichi Nakajima

arXiv v3 [stat.ML], Jun 2017

Abstract—Locality sensitive hashing (LSH) is a powerful tool for sublinear-time approximate nearest neighbor search, and a variety of hashing schemes have been proposed for different dissimilarity measures. However, hash codes significantly depend on the dissimilarity, which prohibits users from adjusting the dissimilarity at query time. In this paper, we propose multiple purpose LSH (mp-LSH), which shares the hash codes for different dissimilarities. mp-LSH supports L2, cosine, and inner product dissimilarities, and their corresponding weighted sums, where the weights can be adjusted at query time. It also allows us to modify the importance of pre-defined groups of features. Thus, mp-LSH enables us, for example, to retrieve similar items to a query with the user preference taken into account, to find a similar material to a query with some properties (stability, utility, etc.) optimized, and to turn on or off a part of multi-modal information (brightness, color, audio, text, etc.) in image/video retrieval. We theoretically and empirically analyze the performance of three variants of mp-LSH, and demonstrate their usefulness on real-world data sets.

Index Terms—Locality Sensitive Hashing, Approximate Near Neighbor Search, Information Retrieval, Collaborative Filtering

I. INTRODUCTION

Large amounts of data are being collected every day in the sciences and industry. Analyzing such truly big data sets even by linear methods can become infeasible; thus sublinear methods such as locality sensitive hashing (LSH) have become an important analysis tool. For some data collections, the purpose can be clearly expressed from the start, for example, text/image/video/speech analysis or recommender systems. In other cases, such as drug discovery or the materials genome project, the ultimate query structure to such data may still not be fully fixed. In other words, measurements, simulations or observations may be recorded without being able to spell out the full specific purpose, although the general goal (better drugs, more potent materials) is clear. Motivated by the latter case, we consider how one can use LSH schemes without defining any specific dissimilarity at the data acquisition and pre-processing phase.

(* contributed equally; † corresponding author: klaus-robert.mueller@tu-berlin.de. Wiktor Pronobis, Danny Panknin, Klaus-Robert Müller, and Shinichi Nakajima are with Technische Universität Berlin, Machine Learning Group, Marchstr. 23, Berlin, Germany. Johannes Kirschnick and Volker Markl are with DFKI, Language Technology Lab, Alt-Moabit 91c, Berlin, Germany. Vignesh Srinivasan and Wojciech Samek are with Fraunhofer HHI. Volker Markl is also with Technische Universität Berlin, Database Systems and Information Management Group, Einsteinufer 17, Berlin, Germany. Manohar Kaul is with IIT Hyderabad. Klaus-Robert Müller is also with Korea University and the Max Planck Society. Wojciech Samek, Volker Markl, Klaus-Robert Müller, and Shinichi Nakajima are with the Berlin Big Data Center, Berlin, Germany.)

LSH, one of the key technologies for big data analysis, enables approximate nearest neighbor search (ANNS) in sublinear time [1], [2]. With LSH functions for a required dissimilarity measure in hand, each data sample is assigned to a hash bucket in the pre-processing stage. At runtime, ANNS with theoretical guarantees can be performed by restricting the search to the samples that lie within the hash bucket to which the query point is assigned, along with the samples lying in the neighbouring buckets.

A challenge in developing LSH without a pre-defined purpose is that the existing LSH schemes, designed for different dissimilarity measures, provide significantly different hash codes. A naive realization therefore requires as many hash tables as there are possible target dissimilarities, which is not realistic if we need to adjust the importance of multiple criteria. In this paper, we propose three variants of multiple purpose LSH (mp-LSH), which support L2, cosine, and inner product (IP) dissimilarities, and their weighted sums, where the weights can be adjusted at query time.

The first proposed method, called mp-LSH with vector augmentation (mp-LSH-VA), maps the data space into an augmented vector space, so that the squared-L2-distance in the augmented space matches the required dissimilarity measure up to a constant. This scheme can be seen as an extension of recent developments of LSH for maximum IP search (MIPS) [3], [4], [5], [6]. The significant difference from the previous methods is that our method is designed to modify the dissimilarity by changing the augmented query vector. We show that mp-LSH-VA is locality sensitive for the L2 and IP dissimilarities and their weighted sums. However, its performance for the L2 dissimilarity is significantly inferior to the standard L2-LSH [7]. In addition, mp-LSH-VA does not support the cosine-distance.

Our second proposed method, called mp-LSH with code concatenation (mp-LSH-CC), concatenates the hash codes for the L2, cosine, and IP dissimilarities, and constructs a special structure, called a cover tree [8], which enables efficient NNS with the weights for the dissimilarity measures controlled by adjusting the metric in the code space. Although mp-LSH-CC is conceptually simple and its performance is guaranteed by the original LSH scheme for each dissimilarity, it is not memory efficient, which also results in increased query time.

Considering the drawbacks of the aforementioned two variants led us to our final and recommended proposal, called mp-LSH with code augmentation and transformation (mp-LSH-CAT). It supports the L2, cosine, and IP dissimilarities by augmenting the hash codes, instead of the original vectors. mp-LSH-CAT is memory efficient, since it shares most information over the hash codes for the different dissimilarities, so that the augmentation is minimized.

We theoretically and empirically analyze the performance of the mp-LSH methods, and demonstrate their usefulness on real-world data sets. Our mp-LSH methods also allow us to modify the importance of pre-defined groups of features. Adjustability of the dissimilarity measure at query time is not only useful in the absence of future analysis plans, but is also applicable to multi-criteria searches. The following lists some sample applications of multi-criteria queries in diverse areas:

1) In recommender systems, suggesting items which are similar to a user-provided query and also match the user's preference.
2) In material science, finding materials which are similar to a query material and also possess desired properties such as stability, conductivity, and medical utility.
3) In video retrieval, adjusting the importance of multi-modal information such as brightness, color, audio, and text at query time.

II. BACKGROUND

In this section, we briefly overview previous locality sensitive hashing (LSH) techniques. Assume that we have a sample pool $\mathcal{X} = \{x_n \in \mathbb{R}^L\}_{n=1}^N$ in $L$-dimensional space. Given a query $q \in \mathbb{R}^L$, nearest neighbor search (NNS) solves the following problem:

$x^* = \operatorname{argmin}_{x \in \mathcal{X}} \mathcal{L}(q, x)$,   (1)

where $\mathcal{L}(\cdot, \cdot)$ is a dissimilarity measure. A naive approach computes the dissimilarity from the query to all samples, and then chooses the most similar samples, which takes $O(N)$ time. On the other hand, approximate NNS can be performed in sublinear time. We define the following three terms:

Definition 1 ($S_0$-near neighbor): For $S_0 > 0$, $x$ is called an $S_0$-near neighbor of $q$ if $\mathcal{L}(q, x) \le S_0$.

Definition 2 ($c$-approximate nearest neighbor search): Given $S_0 > 0$, $\delta > 0$, and $c > 1$, $c$-approximate nearest neighbor search ($c$-ANNS) reports some $cS_0$-near neighbor of $q$ with probability $1 - \delta$, if there exists an $S_0$-near neighbor of $q$ in $\mathcal{X}$.

Definition 3 (Locality sensitive hashing): A family $\mathcal{H} = \{h : \mathbb{R}^L \to \mathcal{K}\}$ of functions is called $(S_0, cS_0, p_1, p_2)$-sensitive for a dissimilarity measure $\mathcal{L} : \mathbb{R}^L \times \mathbb{R}^L \to \mathbb{R}$ if the following two conditions hold for any $q, x \in \mathbb{R}^L$:

- if $\mathcal{L}(q, x) \le S_0$, then $P(h(q) = h(x)) \ge p_1$;
- if $\mathcal{L}(q, x) \ge cS_0$, then $P(h(q) = h(x)) \le p_2$,

where $P$ denotes the probability of the event with respect to the random draw of hash functions. Note that $p_1 > p_2$ is required for LSH to be useful. The image $\mathcal{K}$ of the hash functions is typically binary or integer.

The following proposition guarantees that LSH functions enable $c$-ANNS in sublinear time.

Proposition 1 [1]: Given a family of $(S_0, cS_0, p_1, p_2)$-sensitive hash functions, there exists an algorithm for $c$-ANNS with $O(N^\rho \log N)$ query time and $O(N^{1+\rho})$ space, where $\rho = \frac{\log p_1}{\log p_2} < 1$.

Below, we introduce three LSH families. Let $\mathcal{N}_L(\mu, \Sigma)$ be the $L$-dimensional Gaussian distribution, $\mathcal{U}_L(\alpha, \beta)$ be the $L$-dimensional uniform distribution with support $[\alpha, \beta]$ in all dimensions, and $I_L$ be the $L$-dimensional identity matrix. The sign function, $\operatorname{sign}(z) : \mathbb{R}^H \to \{-1, 1\}^H$, applies element-wise, giving $1$ for $z_h \ge 0$ and $-1$ for $z_h < 0$. Likewise, the floor operator $\lfloor \cdot \rfloor$ applies element-wise to a vector. We denote by $\angle(\cdot, \cdot)$ the angle between two vectors.

Proposition 2 (L2-LSH [7]): For the L2-distance $\mathcal{L}_{L2}(q, x) = \|q - x\|_2$, the hash function

$h^{L2}_{a,b}(x) = \lfloor R^{-1} (a^\top x + b) \rfloor$,   (2)

where $R > 0$ is a fixed real number, $a \sim \mathcal{N}_L(0, I_L)$, and $b \sim \mathcal{U}_1(0, R)$, satisfies $P(h^{L2}_{a,b}(q) = h^{L2}_{a,b}(x)) = F^{L2}_R(\mathcal{L}_{L2}(q, x))$, where

$F^{L2}_R(d) = 1 - 2\Phi(-R/d) - \frac{2}{\sqrt{2\pi}\,(R/d)}\left(1 - e^{-(R/d)^2/2}\right)$.

Here, $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$ is the standard cumulative Gaussian.

Proposition 3 (sign-LSH [9], [10]): For the cosine-distance $\mathcal{L}_{\cos}(q, x) = 1 - \cos\angle(q, x) = 1 - \frac{q^\top x}{\|q\|\|x\|}$, the hash function

$h^{\operatorname{sign}}_a(x) = \operatorname{sign}(a^\top x)$,   (3)

where $a \sim \mathcal{N}_L(0, I_L)$, satisfies $P(h^{\operatorname{sign}}_a(q) = h^{\operatorname{sign}}_a(x)) = F^{\operatorname{sign}}(\mathcal{L}_{\cos}(q, x)/2)$, where

$F^{\operatorname{sign}}(d) = 1 - \frac{1}{\pi} \cos^{-1}(1 - 2d)$.   (4)

Proposition 4 (simple-LSH [6]): Assume that the samples and the query are rescaled so that $\max_{x \in \mathcal{X}} \|x\| \le 1$ and $\|q\| \le 1$. For the inner product dissimilarity $\mathcal{L}_{\mathrm{ip}}(q, x) = 1 - q^\top x$ (with which the NNS problem (1) is called maximum IP search (MIPS)), the asymmetric hash functions

$h^{\mathrm{smp\text{-}q}}_a(q) = h^{\operatorname{sign}}_a(\widetilde{q}) = \operatorname{sign}(a^\top \widetilde{q})$, where $\widetilde{q} = (q; 0)$,   (5)
$h^{\mathrm{smp\text{-}x}}_a(x) = h^{\operatorname{sign}}_a(\widetilde{x}) = \operatorname{sign}(a^\top \widetilde{x})$, where $\widetilde{x} = (x; \sqrt{1 - \|x\|^2})$,   (6)

satisfy $P(h^{\mathrm{smp\text{-}q}}_a(q) = h^{\mathrm{smp\text{-}x}}_a(x)) = F^{\operatorname{sign}}(\mathcal{L}_{\mathrm{ip}}(q, x)/2)$.

These three LSH methods are standard and state-of-the-art among the data-independent LSH schemes for their respective dissimilarity measures. Although all of them involve the same random projection $a^\top x$, the resulting hash codes are significantly different from each other.
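To make the three base families concrete, the following Python sketch implements Propositions 2-4 directly. This is our own illustrative code, not the authors' implementation; all function names and parameter values (e.g., R = 2) are assumptions for the example. The fraction of colliding hashes approximates the collision probabilities $F^{L2}_R$ and $F^{\operatorname{sign}}$.

import numpy as np

rng = np.random.default_rng(0)

def l2_lsh(x, A, b, R=2.0):
    """L2-LSH (Prop. 2): integer codes floor((a^T x + b) / R)."""
    return np.floor((A @ x + b) / R).astype(int)

def sign_lsh(x, A):
    """sign-LSH (Prop. 3): binary codes sign(a^T x)."""
    return np.where(A @ x >= 0, 1, -1)

def simple_lsh_query(q, A):
    """simple-LSH (Prop. 4), query side: augment with a zero entry."""
    return sign_lsh(np.append(q, 0.0), A)

def simple_lsh_sample(x, A):
    """simple-LSH (Prop. 4), sample side: augment so that ||x~|| = 1.
    Requires ||x|| <= 1 (rescale the pool beforehand)."""
    return sign_lsh(np.append(x, np.sqrt(max(0.0, 1.0 - x @ x))), A)

L, T = 128, 64                        # input dimension, number of hashes
A  = rng.standard_normal((T, L))      # shared Gaussian projections
A1 = rng.standard_normal((T, L + 1))  # projections for the augmented space
b  = rng.uniform(0.0, 2.0, size=T)    # offsets for L2-LSH (R = 2)

x = rng.standard_normal(L); x /= 2 * np.linalg.norm(x)  # ||x|| <= 1
q = rng.standard_normal(L); q /= np.linalg.norm(q)      # ||q|| = 1

# Collision rates approximate the probabilities in Propositions 2-4.
print((l2_lsh(q, A, b) == l2_lsh(x, A, b)).mean())
print((sign_lsh(q, A) == sign_lsh(x, A)).mean())
print((simple_lsh_query(q, A1) == simple_lsh_sample(x, A1)).mean())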

III. PROPOSED METHODS AND THEORY

In this section, we first define the problem setting. Then, we propose three LSH methods for multiple dissimilarity measures, and conduct a theoretical analysis.

A. Problem Setting

Similarly to simple-LSH (Proposition 4), we rescale the samples so that $\max_{x \in \mathcal{X}} \|x\| \le 1$. We also assume $\|q\| \le 1$.¹ Let us assume multi-modal data, where we can separate the feature vectors into $G$ groups, i.e., $q = (q^{(1)}; \ldots; q^{(G)})$, $x = (x^{(1)}; \ldots; x^{(G)})$.² For example, each group corresponds to monochrome, color, audio, and text features in video retrieval. We also accept multiple queries $\{q_w\}_{w=1}^W$ for a single retrieval task. Our goal is to perform ANNS for the following dissimilarity measure, which we call the multiple purpose (MP) dissimilarity:

$\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) = \sum_{w=1}^{W} \sum_{g=1}^{G} \left\{ \gamma_w^{(g)} \left\| q_w^{(g)} - x^{(g)} \right\|^2 + \eta_w^{(g)} \left\| \frac{q_w^{(g)}}{\|q_w^{(g)}\|} - \frac{x^{(g)}}{\|x^{(g)}\|} \right\|^2 + 2\lambda_w^{(g)} \left( 1 - q_w^{(g)\top} x^{(g)} \right) \right\}$,   (7)

where $\gamma_w, \eta_w, \lambda_w \in \mathbb{R}_+^G$ are the feature weights such that $\sum_{w=1}^W \sum_{g=1}^G (\gamma_w^{(g)} + \eta_w^{(g)} + \lambda_w^{(g)}) = 1$. In the single query case, where $W = 1$, setting $\gamma = (1/2, 0, 1/2, 0, \ldots, 0)$, $\eta = \lambda = (0, \ldots, 0)$ corresponds to L2-NNS based on the first and the third feature groups, while setting $\gamma = \eta = (0, \ldots, 0)$, $\lambda = (1/2, 0, 1/2, 0, \ldots, 0)$ corresponds to MIPS on the same feature groups. When we would like to down-weight the importance of signal amplitude (e.g., brightness of an image) of the $g$-th feature group, we should increase the weight $\eta_w^{(g)}$ for the cosine-distance and decrease the weight $\gamma_w^{(g)}$ for the squared-L2-distance. Multiple queries are useful when we mix NNS and MIPS, for which the queries lie in different spaces with the same dimensionality. For example, by setting $\gamma_1 = \lambda_2 = (1/4, 0, 1/4, 0, \ldots, 0)$, $\gamma_2 = \eta_1 = \eta_2 = \lambda_1 = (0, \ldots, 0)$, we can retrieve items which are close to the item query $q_1$ and match the user preference query $q_2$. An important requirement for our proposal is that the weights $\{\gamma_w, \eta_w, \lambda_w\}$ can be adjusted at query time.

¹ This assumption is reasonable for L2-NNS if the size of the sample pool is sufficiently large and the query follows the same distribution as the samples. For MIPS, the norm of the query can be arbitrarily modified, and we set it to $\|q\| = 1$.
² A semicolon denotes the row-wise concatenation of vectors, as in MATLAB.
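The MP dissimilarity (7) is also what one would evaluate in a brute-force linear scan to obtain ground truth. The sketch below (our own; group boundaries, names, and weight values are illustrative assumptions) computes Eq. (7) directly, here with the mixed setting $\gamma_1 = \lambda_2 = (1/4, 0)$ from the example above:

import numpy as np

def mp_dissimilarity(queries, x, groups, gamma, eta, lam):
    """queries: list of W query vectors; x: sample vector;
    groups: list of index arrays, one per feature group;
    gamma, eta, lam: (W, G) arrays of query-time weights (Eq. (7))."""
    total = 0.0
    for w, q in enumerate(queries):
        for g, idx in enumerate(groups):
            qg, xg = q[idx], x[idx]
            total += gamma[w, g] * np.sum((qg - xg) ** 2)
            total += eta[w, g] * np.sum((qg / np.linalg.norm(qg)
                                         - xg / np.linalg.norm(xg)) ** 2)
            total += 2 * lam[w, g] * (1.0 - qg @ xg)
    return total

# Example: W = 2 queries, G = 2 groups; q1 drives L2-NNS on group 1,
# q2 drives MIPS on group 1.
rng = np.random.default_rng(1)
groups = [np.arange(0, 64), np.arange(64, 128)]
q1, q2, x = (rng.standard_normal(128) for _ in range(3))
for v in (q1, q2, x):
    v /= np.linalg.norm(v)
gamma = np.array([[0.25, 0.0], [0.0, 0.0]])
lam   = np.array([[0.0, 0.0], [0.25, 0.0]])
eta   = np.zeros((2, 2))
print(mp_dissimilarity([q1, q2], x, groups, gamma, eta, lam))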
B. Multiple Purpose LSH with Vector Augmentation (mp-LSH-VA)

Our first method, called multiple purpose LSH with vector augmentation (mp-LSH-VA), is inspired by the research on asymmetric LSHs for MIPS [3], [4], [5], [6], where the query and the samples are augmented with additional entries, so that the squared-L2-distance in the augmented space coincides with the target dissimilarity up to a constant. A significant difference of our proposal from the previous methods is that we design the augmentation so that we can adjust the dissimilarity measure (i.e., the feature weights $\{\gamma_w, \lambda_w\}$ in Eq. (7)) by modifying the augmented query vector. Since mp-LSH-VA, unfortunately, does not support the cosine-distance, we set $\eta_w = (0, \ldots, 0)$ in this subsection. We define the weighted sum query by $\overline{q} = (\overline{q}^{(1)}; \ldots; \overline{q}^{(G)}) = \sum_{w=1}^W \left( \phi_w^{(1)} q_w^{(1)}; \ldots; \phi_w^{(G)} q_w^{(G)} \right)$, where $\phi_w^{(g)} = \gamma_w^{(g)} + \lambda_w^{(g)}$.

We augment the queries and the samples as follows:

$\widetilde{q} = (\overline{q}; r)$, $\widetilde{x} = (x; y)$,

where $r \in \mathbb{R}^M$ is a vector-valued function of $\{q_w\}$, and $y \in \mathbb{R}^M$ is a function of $x$. We constrain the augmentation $y$ for the sample vector so that it satisfies, for a constant $c_1 \ge 1$,

$\|\widetilde{x}\| = c_1$, i.e., $\|y\| = \sqrt{c_1^2 - \|x\|^2}$.   (8)

Under this constraint, the norm of any augmented sample is equal to $c_1$, which allows us to use sign-LSH (Proposition 3) to perform L2-NNS. The squared-L2-distance between the query and a sample in the augmented space can be expressed as

$\|\widetilde{q} - \widetilde{x}\|^2 = \|\overline{q} - x\|^2 + \|r - y\|^2 + \mathrm{const}$.   (9)

For $M = 1$, the only choice satisfying Eq. (8) is simple-LSH (with $r = 0$), given in Proposition 4. We consider the case $M \ge 2$, and design $r$ and $y$ so that Eq. (9) matches the MP dissimilarity (7). The augmentation that matches the MP dissimilarity is not unique. Here, we introduce the following simple construction with $M = G + 3$:

$\widetilde{q} = \left( \widehat{q};\ \sqrt{c_2^2 - \|\widehat{q}\|^2} \right)$, $\widetilde{x} = \left( \widehat{x};\ 0 \right)$, where   (10)

$\widehat{q} = \left( \overline{q};\ \textstyle\sum_{w} \gamma_w^{(1)};\ \ldots;\ \sum_{w} \gamma_w^{(G)};\ 0;\ \mu \right)$,
$\widehat{x} = \left( x;\ -\tfrac{1}{2}\|x^{(1)}\|^2;\ \ldots;\ -\tfrac{1}{2}\|x^{(G)}\|^2;\ \nu;\ 1 \right)$.

Here, we defined

$\mu = -\tfrac{1}{2} \sum_{w=1}^W \sum_{g=1}^G \gamma_w^{(g)} \|q_w^{(g)}\|^2$, $\nu = \sqrt{c_1^2 - \|x\|^2 - \tfrac{1}{4}\sum_{g=1}^G \|x^{(g)}\|^4 - 1}$,
$c_1 = \max_{x \in \mathcal{X}} \sqrt{\|x\|^2 + \tfrac{1}{4}\sum_{g=1}^G \|x^{(g)}\|^4 + 1}$, $c_2 = \max_{\{q_w\}} \|\widehat{q}\|$.

With the vector augmentation (10), Eq. (9) matches Eq. (7) up to a constant (see Appendix A):

$\|\widetilde{q} - \widetilde{x}\|^2 = c_1^2 + c_2^2 - 2\widetilde{q}^\top \widetilde{x} = \mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const}$.

The collision probability, i.e., the probability that the query and the sample are given the same code, can be analytically computed:

Theorem 1: Assume that the samples are rescaled so that $\max_{x \in \mathcal{X}} \|x\| \le 1$ and $\|q_w\| \le 1, \forall w$. For the MP dissimilarity $\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x)$, given by Eq. (7) with $\eta_w = 0, \forall w$, the asymmetric hash functions

$h^{\mathrm{VA\text{-}q}}_a(\{q_w\}) = h^{\operatorname{sign}}_a(\widetilde{q}) = \operatorname{sign}(a^\top \widetilde{q})$,
$h^{\mathrm{VA\text{-}x}}_a(x) = h^{\operatorname{sign}}_a(\widetilde{x}) = \operatorname{sign}(a^\top \widetilde{x})$,

where $\widetilde{q}$ and $\widetilde{x}$ are given by Eq. (10), satisfy

$P\left( h^{\mathrm{VA\text{-}q}}_a(\{q_w\}) = h^{\mathrm{VA\text{-}x}}_a(x) \right) = F^{\operatorname{sign}}\left( \frac{1}{2} + \frac{\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x)/2 - \|\lambda\|_1}{2 c_1 c_2} \right)$.

Proof: By construction, it holds that $\|\widetilde{x}\| = c_1$ and $\|\widetilde{q}\| = c_2$, and simple calculations (see Appendix A) give $\widetilde{q}^\top \widetilde{x} = \|\lambda\|_1 - \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x)$. Then, applying Proposition 3 immediately proves the theorem. ∎
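The following sketch illustrates the vector augmentation idea for the simplest case $G = 1$, $W = 1$; it is our own simplification of Eq. (10), with the constants chosen for this toy setup, not the authors' exact construction. The key property it demonstrates is that $\widetilde{q}^\top \widetilde{x} = \|\lambda\|_1 - \mathcal{L}_{\mathrm{mp}}/2$, and that the weights $(\gamma, \lambda)$ enter only the query side:

import numpy as np

def augment_sample(x, c1):
    """x -> (x; -||x||^2/2; nu; 1; 0), so that ||x~|| = c1 (Eq. (8))."""
    sq = x @ x
    nu = np.sqrt(c1**2 - sq - sq**2 / 4 - 1.0)
    return np.concatenate([x, [-sq / 2.0, nu, 1.0, 0.0]])

def augment_query(q, gamma, lam, c2):
    """The weights enter only the query augmentation, so the dissimilarity
    can be changed at query time without re-hashing the samples."""
    qbar = (gamma + lam) * q
    mu = -0.5 * gamma * (q @ q)
    qhat = np.concatenate([qbar, [gamma, 0.0, mu]])
    return np.concatenate([qhat, [np.sqrt(c2**2 - qhat @ qhat)]])

rng = np.random.default_rng(2)
x = rng.standard_normal(16); x /= 2 * np.linalg.norm(x)  # ||x|| <= 1
q = rng.standard_normal(16); q /= np.linalg.norm(q)      # ||q|| = 1
c1, c2 = np.sqrt(1 + 1/4 + 1), 2.0   # rough upper bounds for this toy case
gamma, lam = 0.5, 0.5                # query-time mix of L2 and IP

qt, xt = augment_query(q, gamma, lam, c2), augment_sample(x, c1)
lmp = gamma * np.sum((q - x)**2) + 2 * lam * (1 - q @ x)
print(qt @ xt, lam - lmp / 2)        # the two numbers agree

Sign-hashing $\widetilde{q}$ and $\widetilde{x}$ as in Theorem 1 then yields codes whose collision probability depends monotonically on the MP dissimilarity.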

[Fig. 1: Theoretical values of $\rho = \frac{\log p_1}{\log p_2}$ (lower is better), which indicates the LSH performance (see Proposition 1), for (a) L2-NNS ($\gamma = 1, \lambda = 0$), (b) MIPS ($\gamma = 0, \lambda = 1$), and (c) the mixed case ($\gamma = 0.5, \lambda = 0.5$). The horizontal axis indicates $c$ for $c$-ANNS.]

Figure 1 depicts the theoretical value of $\rho = \frac{\log p_1}{\log p_2}$ of mp-LSH-VA, computed by using Theorem 1, for different weight settings with $G = 1$. Note that $\rho$ determines the quality of LSH (smaller is better) for $c$-ANNS performance (see Proposition 1). For the L2-NNS and MIPS cases, the $\rho$ values of the standard LSH methods, i.e., L2-LSH (Proposition 2) and simple-LSH (Proposition 4), are also shown for comparison. Although mp-LSH-VA offers attractive flexibility with adjustable dissimilarity, Figure 1 implies its inferior performance compared to the standard methods, especially in the L2-NNS case. The reason might be a too strong asymmetry between the query and the samples: a query and a sample are far apart in the augmented space, even if they are close to each other in the original space. We can see this from the first $G$ augmented entries of $r$ and $y$ in Eq. (10): those entries for the query are non-negative, i.e., $r_m \ge 0$ for $m = 1, \ldots, G$, while the corresponding entries for the sample are non-positive, i.e., $y_m \le 0$ for $m = 1, \ldots, G$. We believe that there is room to improve the performance of mp-LSH-VA, e.g., by adding constants and changing the scales of some augmented entries, which we leave as future work. In the next subsections, we propose alternative approaches, where the codes are as symmetric as possible, and down-weighting is done by changing the metric in the code space. This effectively keeps points that are close in the original space close in the code space.

C. Multiple Purpose LSH with Code Concatenation (mp-LSH-CC)

Let $\overline{\gamma}^{(g)} = \sum_{w=1}^W \gamma_w^{(g)}$, $\overline{\eta}^{(g)} = \sum_{w=1}^W \eta_w^{(g)}$, and $\overline{\lambda}^{(g)} = \sum_{w=1}^W \lambda_w^{(g)}$, and define the metric-wise weighted average queries by $\overline{q}_{L2}^{(g)} = \sum_{w=1}^W \frac{\gamma_w^{(g)}}{\overline{\gamma}^{(g)}} q_w^{(g)}$, $\overline{q}_{\cos}^{(g)} = \sum_{w=1}^W \frac{\eta_w^{(g)}}{\overline{\eta}^{(g)}} \frac{q_w^{(g)}}{\|q_w^{(g)}\|}$, and $\overline{q}_{\mathrm{ip}}^{(g)} = \sum_{w=1}^W \frac{\lambda_w^{(g)}}{\overline{\lambda}^{(g)}} q_w^{(g)}$. Our second proposal, called multiple purpose LSH with code concatenation (mp-LSH-CC), simply concatenates multiple LSH codes, and performs NNS under the following distance metric at query time:

$D_{\mathrm{CC}}(\{q_w\}, x) = \sum_{g=1}^{G} \sum_{t=1}^{T} \Big\{ \overline{\gamma}^{(g)} R \left| h^{L2}_t(\overline{q}_{L2}^{(g)}) - h^{L2}_t(x^{(g)}) \right| + \frac{\pi \overline{\eta}^{(g)}}{2} \mathbb{1}\left( h^{\operatorname{sign}}_t(\overline{q}_{\cos}^{(g)}) \ne h^{\operatorname{sign}}_t(x^{(g)}) \right) + \frac{\pi \overline{\lambda}^{(g)}}{2} \mathbb{1}\left( h^{\mathrm{smp\text{-}q}}_t(\overline{q}_{\mathrm{ip}}^{(g)}) \ne h^{\mathrm{smp\text{-}x}}_t(x^{(g)}) \right) \Big\}$,   (11)

where $h_t$ denotes the $t$-th independent draw of the corresponding LSH code for $t = 1, \ldots, T$.

The distance (11) is a multi-metric, i.e., a linear combination of metrics [8], in the code space. For a multi-metric, we can use the cover tree [11] for efficient exact NNS. Assuming that all adjustable linear weights are upper-bounded by 1, the cover tree expresses the neighboring relations between samples, taking all possible weight settings into account. NNS is conducted by bounding the code metric for a given weight setting. Thus, mp-LSH-CC allows selective exploration of hash buckets, so that we only need to accurately measure the distance to the samples assigned to the hash buckets within a small code distance. The query time complexity of the cover tree is $O(\kappa^{12} \log N)$, where $\kappa$ is a data-dependent expansion constant [12]. Another good aspect of the cover tree is that it allows dynamic insertion and deletion of samples, and therefore lends itself naturally to the streaming setting. Appendix F describes further details.
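A minimal sketch of the concatenated code distance follows; it is ours, with the weighting of Eq. (11) simplified (the $\pi/2$ scale factors are folded into plain disagreement counts) and restricted to $G = 1$ with a single query per metric:

import numpy as np

def l2_code(v, A, b, R=2.0):
    return np.floor((A @ v + b) / R)

def sign_code(v, A):
    return np.where(A @ v >= 0, 1.0, -1.0)

def cc_distance(q, x_codes, A, A1, b, gamma, eta, lam, R=2.0):
    """x_codes: precomputed (L2, sign, simple) codes of a sample.
    Only the query-side codes and the weights change at query time."""
    l2_x, sign_x, smp_x = x_codes
    d  = gamma * R * np.abs(l2_code(q, A, b, R) - l2_x).sum()
    d += eta * (sign_code(q, A) != sign_x).sum()
    q_aug = np.append(q, 0.0)            # simple-LSH query augmentation
    d += lam * (sign_code(q_aug, A1) != smp_x).sum()
    return d

rng = np.random.default_rng(3)
L, T = 32, 64
A  = rng.standard_normal((T, L))
A1 = rng.standard_normal((T, L + 1))
b  = rng.uniform(0.0, 2.0, T)
x = rng.standard_normal(L); x /= 2 * np.linalg.norm(x)
x_codes = (l2_code(x, A, b),
           sign_code(x, A),
           sign_code(np.append(x, np.sqrt(1 - x @ x)), A1))
q = rng.standard_normal(L); q /= np.linalg.norm(q)
print(cc_distance(q, x_codes, A, A1, b, gamma=0.5, eta=0.0, lam=0.5))

Note that all three code blocks must be stored per sample, which is exactly the redundancy criticized in the next paragraph.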
In the pure case of the L2, cosine, or IP dissimilarity, the hash code of mp-LSH-CC is equivalent to the corresponding base LSH code, and therefore the performance is guaranteed by Propositions 2-4, respectively. However, mp-LSH-CC is not optimal in terms of memory consumption and NNS efficiency. This inefficiency comes from the fact that it redundantly stores the same angular (or cosine-distance) information in each of the L2-, sign-, and simple-LSH codes. Note that the information of a vector is dominated by its angular components unless the dimensionality $L$ is very small.

D. Multiple Purpose LSH with Code Augmentation and Transformation (mp-LSH-CAT)

Our third proposal, called multiple purpose LSH with code augmentation and transformation (mp-LSH-CAT), offers significantly lower memory requirements and faster NNS than mp-LSH-CC by sharing the angular information among all considered dissimilarity measures. Let $\overline{q}_{L2+\mathrm{ip}}^{(g)} = \sum_{w=1}^W (\gamma_w^{(g)} + \lambda_w^{(g)}) q_w^{(g)}$ and, in this subsection, $\overline{q}_{\cos}^{(g)} = \sum_{w=1}^W \eta_w^{(g)} q_w^{(g)} / \|q_w^{(g)}\|$.

We essentially use sign-hash functions augmented with norm information of the data, giving the following augmented codes:

$H_q^{\mathrm{CAT}}(\{q_w\}) = \left( H(\overline{q}_{L2+\mathrm{ip}});\ H(\overline{q}_{\cos});\ 0_G \right)$ and   (12)
$H_x^{\mathrm{CAT}}(x) = \left( \overline{H}(x);\ H(x);\ j(x) \right)$, where   (13)

$H(v) = \left( \operatorname{sign}(A_1 v^{(1)}), \ldots, \operatorname{sign}(A_G v^{(G)}) \right)$,   (14)
$\overline{H}(v) = \left( \|v^{(1)}\| \operatorname{sign}(A_1 v^{(1)}), \ldots, \|v^{(G)}\| \operatorname{sign}(A_G v^{(G)}) \right)$,
$j(v) = \left( \|v^{(1)}\|^2;\ \ldots;\ \|v^{(G)}\|^2 \right)$,

for a partitioned vector $v = (v^{(1)}; \ldots; v^{(G)}) \in \mathbb{R}^L$ and $0_G = (0; \ldots; 0) \in \mathbb{R}^G$. Here, each entry of $A = (A_1, \ldots, A_G) \in \mathbb{R}^{T \times L}$ follows $A_{t,l} \sim \mathcal{N}(0, 1)$. For two matrices $H, H' \in \mathbb{R}^{(2T+1) \times G}$ in the transformed hash code space, we measure the distance with the following multi-metric:

$D_{\mathrm{CAT}}(H, H') = \sum_{g=1}^{G} \left( \alpha^{(g)} \sum_{t=1}^{T} \left| H_{t,g} - H'_{t,g} \right| + \beta^{(g)} \sum_{t=T+1}^{2T} \left| H_{t,g} - H'_{t,g} \right| + \frac{\overline{\gamma}^{(g)}}{2} T \left| H_{2T+1,g} - H'_{2T+1,g} \right| \right)$,   (15)

where $\alpha^{(g)} = \|\overline{q}_{L2+\mathrm{ip}}^{(g)}\|$ and $\beta^{(g)} = \|\overline{q}_{\cos}^{(g)}\|$. Although the hash codes consist of $(2T+1)G$ entries, we do not need to store all of them, and computation can be made simpler and faster by first computing the total number of collisions in the sign-LSH part (14) for $g = 1, \ldots, G$:

$C^{(g)}(v, v') = \sum_{t=1}^{T} \mathbb{1}\left\{ H(v)_{t,g} = H(v')_{t,g} \right\}$.   (16)

Note that this computation, which dominates the cost of evaluating code distances, can be performed efficiently with bit operations. With the total number of collisions (16), the metric (15) between a query set $\{q_w\}$ and a sample $x$ can be expressed as

$D_{\mathrm{CAT}}\left( H_q^{\mathrm{CAT}}(\{q_w\}), H_x^{\mathrm{CAT}}(x) \right) = \sum_{g=1}^{G} \Big\{ \alpha^{(g)} \left[ T + \|x^{(g)}\| \left( T - 2 C^{(g)}(\overline{q}_{L2+\mathrm{ip}}, x) \right) \right] + 2\beta^{(g)} \left( T - C^{(g)}(\overline{q}_{\cos}, x) \right) + \frac{\overline{\gamma}^{(g)}}{2} T \|x^{(g)}\|^2 \Big\}$.   (17)

Given a query set, this can be computed from $H(x) \in \mathbb{R}^{T \times G}$ and $\|x^{(g)}\|$ for $g = 1, \ldots, G$. Therefore, we only need to store the $TG$ sign bits, which are required by sign-LSH alone, and $G$ additional float numbers. Similarly to mp-LSH-CC, we use the cover tree for efficient NNS based on the code distance (15). In the cover tree construction, we set the metric weights to their upper bounds, i.e., $\alpha = \beta = \overline{\gamma} = 1$, and measure the distance between samples by

$D_{\mathrm{CAT}}\left( H_x^{\mathrm{CAT}}(x), H_x^{\mathrm{CAT}}(x') \right) = \sum_{g=1}^{G} \Big\{ C^{(g)}(x, x') \left| \|x^{(g)}\| - \|x'^{(g)}\| \right| + \left( T - C^{(g)}(x, x') \right) \left( \|x^{(g)}\| + \|x'^{(g)}\| \right) + 2\left( T - C^{(g)}(x, x') \right) + \frac{T}{2} \left| \|x^{(g)}\|^2 - \|x'^{(g)}\|^2 \right| \Big\}$.   (18)

Since the collision probability can be zero, we cannot directly apply the standard LSH theory with the $\rho$ value guaranteeing the ANNS performance. Instead, we show that the metric (15) of mp-LSH-CAT approximates the MP dissimilarity (7), so that the quality of ANNS is guaranteed.

Theorem 2: For $\eta_w = 0, \forall w$, it is $\lim_{T \to \infty} \frac{D_{\mathrm{CAT}}}{T} = \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const} + \mathrm{error}$, with $|\mathrm{error}| \le 0.2105 \left( \|\lambda\|_1 + \|\gamma\|_1 \right)$. (The proof is given in Appendix C.)

Theorem 3: For $\gamma_w = \lambda_w = 0, \forall w$, it is $\lim_{T \to \infty} \frac{D_{\mathrm{CAT}}}{T} = \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const} + \mathrm{error}$, with $|\mathrm{error}| \le 0.2105 \|\eta\|_1$. (The proof is given in Appendix D.)

Corollary 1: $\lim_{T \to \infty} \frac{D_{\mathrm{CAT}}}{T} = \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const} + \mathrm{error}$, with $|\mathrm{error}| \le 0.421$.

The error, with a maximum of 0.421, ranges one order of magnitude below the MP dissimilarity, which itself has a range of 4. Note that Corollary 1 implies good approximation for the boundary cases, the squared-L2-, IP-, and cosine-distances, through mp-LSH-CAT, since they are special cases of the weights: for example, mp-LSH-CAT approximates the squared-L2-distance when setting $\lambda_w = \eta_w = 0, \forall w$. The following theorem guarantees that ANNS with mp-LSH-CAT succeeds for the pure MIPS case with a specified probability (the proof is given in Appendix E):

Theorem 4: Let $S_0 \in (0, 2)$, $cS_0 \in (S_0 + 2E_1, 2)$, and set $T \ge \frac{48}{(t_2 - t_1)^2} \log\frac{n}{\varepsilon}$, where $t_2 > t_1$ depend on $S_0$ and $c$ (see Appendix E for details). With probability larger than $1 - \varepsilon - (\varepsilon/n)^{3/2}$, mp-LSH-CAT guarantees $c$-ANNS with respect to $\mathcal{L}_{\mathrm{ip}}$ (MIPS).

Note that it is straightforward to show Theorem 4 analogously for the squared-L2- and cosine-distances. In Section IV, we empirically show the good performance of mp-LSH-CAT in general cases.
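The sketch below (our own, for $G = 1$) illustrates the storage layout and the bit-operation trick: per sample, only the packed sign bits of $H(x)$ and the norm $\|x\|$ are kept, and the code distance (17) is evaluated from popcount-style collision counts:

import numpy as np

def pack_signs(v, A):
    """T sign bits of sign(A v), packed into bytes."""
    return np.packbits(A @ v >= 0)

def collisions(bits_a, bits_b, T):
    """C(v, v') of Eq. (16): number of agreeing sign bits."""
    return T - np.unpackbits(bits_a ^ bits_b, count=T).sum()

def cat_distance(q_l2ip, q_cos, x_bits, x_norm, gamma_bar, A, T):
    """Code distance (17) between a query pair and a stored sample."""
    alpha, beta = np.linalg.norm(q_l2ip), np.linalg.norm(q_cos)
    c1 = collisions(pack_signs(q_l2ip, A), x_bits, T) if alpha > 0 else 0
    c2 = collisions(pack_signs(q_cos, A), x_bits, T) if beta > 0 else 0
    return (alpha * (T + x_norm * (T - 2 * c1))
            + beta * 2 * (T - c2)
            + 0.5 * gamma_bar * T * x_norm ** 2)

rng = np.random.default_rng(5)
L, T = 64, 256
A = rng.standard_normal((T, L))
x = rng.standard_normal(L); x /= 2 * np.linalg.norm(x)
x_bits, x_norm = pack_signs(x, A), np.linalg.norm(x)  # stored per sample
q = rng.standard_normal(L); q /= np.linalg.norm(q)
# Pure MIPS weights (gamma = eta = 0, lambda = 1): q_l2ip = q, q_cos = 0.
print(cat_distance(q, np.zeros(L), x_bits, x_norm, 0.0, A, T))

In a real index the XOR-and-popcount in collisions() runs over machine words, which is what makes evaluating (17) much cheaper than the three separate code comparisons of mp-LSH-CC.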
E. Memory Requirement

For all LSH schemes, one can trade off memory consumption and accuracy by changing the hash bit length $T$. However, the memory consumption of a specific hashing scheme heavily differs from the others, so that comparing performance for a globally shared $T$ is inadequate. In this subsection, we derive individual numbers of hashes for each scheme, given a fixed memory budget. We count the theoretically minimal number of bits required to store the hash code of one data point. The two fundamental components we are confronted with are sign-hashes and discretized reals. Sign-hashes can be represented by exactly one bit. For the reals, we choose a resolution such that their discretizations take values in a set of fixed size. The L2-hash function $h^{L2}_{a,b}(x) = \lfloor R^{-1}(a^\top x + b) \rfloor$ is a random variable with potentially infinitely many discrete values. Nevertheless, we can give a realistic upper bound on the number of values the L2-hash essentially takes. Note that $R^{-1} a^\top x$ follows a $\mathcal{N}(\mu = 0, \sigma = R^{-1}\|x\|)$ distribution with $\|x\| \le 1$. Then $P(|R^{-1} a^\top x| > 4\sigma) < 10^{-4}$. Therefore, the L2-hash essentially takes one of $8/R$ discrete values, stored by $3 + \log_2 R^{-1}$ bits. Namely, for $R = 2^{-10}$, an L2-hash requires 13 bits. We also store the norm part of mp-LSH-CAT using 13 bits. Denote by $\mathrm{stor}_{\mathrm{CAT}}(T)$ the required storage of mp-LSH-CAT. Then $\mathrm{stor}_{\mathrm{CAT}}(T) = T_{\mathrm{CAT}} + 13$, which we set as our fixed memory budget for a given $T_{\mathrm{CAT}}$. The baselines sign- and simple-LSH, as well as mp-LSH-VA, are pure sign-hashes, thus giving them a budget of $T^{\operatorname{sign}} = T^{\mathrm{smp}} = T^{\mathrm{VA}} = \mathrm{stor}_{\mathrm{CAT}}(T)$ hashes. As discussed above, L2-LSH may take $T^{L2} = \mathrm{stor}_{\mathrm{CAT}}(T)/13$ hashes. For mp-LSH-CC, we allocate a third of the budget to each of the three components, giving $T_{\mathrm{CC}} = (T_{\mathrm{CC}}^{L2}, T_{\mathrm{CC}}^{\operatorname{sign}}, T_{\mathrm{CC}}^{\mathrm{smp}}) = \mathrm{stor}_{\mathrm{CAT}}(T) \cdot \left( \frac{1}{39}, \frac{1}{3}, \frac{1}{3} \right)$. This consideration is used when we compare mp-LSH-CC and mp-LSH-CAT in Section IV-B.
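The allocation above reduces to simple integer arithmetic; a small helper (ours, for $G = 1$, rounding down) makes the bookkeeping explicit:

def budgets(T_cat):
    """Bit-budget allocation of Section III-E: sign hashes cost 1 bit,
    L2 hashes 13 bits; stor_CAT(T) = T sign bits + one 13-bit norm."""
    budget = T_cat + 13
    return {
        "sign/simple/mp-LSH-VA": budget,           # pure sign hashes
        "L2-LSH": budget // 13,                    # 13 bits per L2 hash
        "mp-LSH-CC (L2, sign, simple)": (budget // 39,
                                         budget // 3,
                                         budget // 3),
    }

print(budgets(1024))
# {'sign/simple/mp-LSH-VA': 1037, 'L2-LSH': 79,
#  'mp-LSH-CC (L2, sign, simple)': (26, 345, 345)}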

[Fig. 2: Precision-recall curves (higher is better) on MovieLens10M data for K = 5 and T = 256, for (a) L2-NNS (γ = 1, λ = 0), (b) MIPS (γ = 0, λ = 1), and (c) the mixed case (γ = 0.5, λ = 0.5).]

[Fig. 3: Precision-recall curves on Netflix data for K = 10 and T = 512, for the same three weight settings.]

IV. EXPERIMENT

Here, we conduct an empirical evaluation on several real-world data sets.

A. Collaborative Filtering

We first evaluate our methods on collaborative filtering data, the MovieLens10M and the Netflix datasets [13]. Following the experiment in [3], [5], we applied PureSVD [14] to get $L$-dimensional user and item vectors, where $L = 150$ for MovieLens and $L = 300$ for Netflix. We centered the samples so that $\sum_{x \in \mathcal{X}} x = 0$, which affects neither the L2-NNS nor the MIPS solution. Regarding the $L$-dimensional vector as a single feature group ($G = 1$), we evaluated the performance in L2-NNS ($W = 1$, $\gamma = 1$, $\eta = \lambda = 0$), MIPS ($W = 1$, $\gamma = \eta = 0$, $\lambda = 1$), and their weighted sum ($W = 2$, $\gamma_1 = 0.5$, $\lambda_2 = 0.5$, $\gamma_2 = \lambda_1 = \eta_1 = \eta_2 = 0$). The queries for L2-NNS were chosen randomly from the items, while the queries for MIPS were chosen from the users. For each query, we found its $K = 1, 5, 10$ nearest neighbors in terms of the MP dissimilarity (7) by linear search, and used them as the ground truth. We set the hash bit length to $T = 128, 256, 512$, and rank the samples (items) based on the Hamming distance for the baseline methods and mp-LSH-VA. For mp-LSH-CC and mp-LSH-CAT, we rank the samples based on their code distances (11) and (15), respectively. After that, we drew the precision-recall curves, defined as

Precision = (relevant seen)/k and Recall = (relevant seen)/K

for different $k$, where "relevant seen" is the number of the true $K$ nearest neighbors that are ranked within the top $k$ positions by the LSH methods. Figures 2 and 3 show the results on MovieLens10M for $K = 5$ and $T = 256$ and on Netflix for $K = 10$ and $T = 512$, respectively, where each curve was averaged over 2000 randomly chosen queries. We observe that mp-LSH-VA performs very poorly in L2-NNS (as badly as simple-LSH, which is not designed for the L2-distance), although it performs reasonably in MIPS. On the other hand, mp-LSH-CC and mp-LSH-CAT perform well in all cases. A similar tendency was observed for other values of $K$ and $T$. Since the poor performance of mp-LSH-VA was shown in theory (Figure 1) and experiment (Figures 2 and 3), we focus on mp-LSH-CC and mp-LSH-CAT in the subsequent subsections.
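For reference, this evaluation protocol is a few lines of code; the sketch below (ours, with toy stand-in data) computes the precision-recall points exactly as defined above:

import numpy as np

def pr_curve(ranked_ids, true_knn, ks):
    """ranked_ids: sample indices sorted by LSH code distance;
    true_knn: the K ground-truth neighbor indices (by Eq. (7))."""
    true = set(true_knn)
    pts = []
    for k in ks:
        seen = len(true.intersection(ranked_ids[:k]))
        pts.append((seen / k, seen / len(true)))  # (precision, recall)
    return pts

rng = np.random.default_rng(4)
ranked = rng.permutation(1000)           # stand-in for an LSH ranking
truth = ranked[rng.integers(0, 50, 5)]   # toy ground truth near the top
print(pr_curve(ranked, truth, ks=[10, 50, 100]))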

[Table I: ANNS results for mp-LSH-CC with $T_{\mathrm{CC}} = (T_{\mathrm{CC}}^{L2}, T_{\mathrm{CC}}^{\operatorname{sign}}, T_{\mathrm{CC}}^{\mathrm{smp}}) = (1024, 1024, 1024)$: recall@k, query time (msec), cover tree construction time (sec), and storage per sample (bytes), for the L2, MIPS, and L2+MIPS settings.]

[Table II: ANNS results for mp-LSH-CAT with $T_{\mathrm{CAT}} = 1024$: the same quantities as in Table I.]

[Table III: ANNS results for mp-LSH-CC with $T_{\mathrm{CC}} = (T_{\mathrm{CC}}^{L2}, T_{\mathrm{CC}}^{\operatorname{sign}}, T_{\mathrm{CC}}^{\mathrm{smp}}) = (27, 346, 346)$: the same quantities as in Table I.]

B. Computation Time in Query Search

Next, we evaluate the query search time and memory consumption of mp-LSH-CC and mp-LSH-CAT on the texmex dataset [15], which was generated from millions of images by applying the standard SIFT descriptor [16] with $L = 128$. Similarly to Section IV-A, we conducted experiments on L2-NNS, MIPS, and their weighted sum, with the same settings for the weights $\gamma$, $\eta$, $\lambda$. We constructed the cover tree with $N = 10^7$ samples, randomly chosen from the ANN_SIFT1B dataset. The queries were chosen from the defined query set, and each query for MIPS was normalized so that $\|q\| = 1$. We ran the performance experiment on a machine with 48 cores (4 AMD Opteron™ 6238 processors) and 512 GB main memory on Ubuntu 12.04.5 LTS. Tables I-III summarize recall@k, query time, cover tree construction time, and required memory storage. Here, recall@k is the recall for $K = 1$ and given $k$. All reported values, except the cover tree construction time, are averaged over 100 queries.

We see that mp-LSH-CC (Table I) and mp-LSH-CAT (Table II) for $T = 1024$ perform comparably well in terms of accuracy (see the columns for recall@k), but mp-LSH-CAT is much faster (see query time) and requires significantly less memory (see storage per sample). Table III shows the performance of mp-LSH-CC with memory requirement equal to that of mp-LSH-CAT for $T = 1024$. More specifically, we use a different bit length for each dissimilarity measure, and set them to $T_{\mathrm{CC}} = (T_{\mathrm{CC}}^{L2}, T_{\mathrm{CC}}^{\operatorname{sign}}, T_{\mathrm{CC}}^{\mathrm{smp}}) = (27, 346, 346)$, with which the memory budget is shared equally between the dissimilarity measures, according to Section III-E. Comparing Table II and Table III, we see that mp-LSH-CC for $T_{\mathrm{CC}} = (27, 346, 346)$, which uses similar memory storage per sample, gives significantly worse recall@k than mp-LSH-CAT for $T = 1024$. Thus, we conclude that both mp-LSH-CC and mp-LSH-CAT perform well, but we recommend the latter when the memory budget is limited, or in applications where the query search time is crucial.

C. Demonstration of Image Retrieval with Mixed Queries

Finally, we demonstrate the usefulness of our flexible mp-LSH in an image retrieval task on the ILSVRC2012 data set [17]. We computed a feature vector for each image by concatenating the 4096-dimensional fc7 activations of the trained VGG16 model [18] with 120-dimensional color features.⁵ Since user preference vectors are not available, we use classifier vectors, i.e., the weights associated with the respective ImageNet classes, as MIPS queries (the entries corresponding to the color features are set to zero). This simulates users who like a particular class of images. We performed ANNS based on the MP dissimilarity by using our mp-LSH-CAT with $T = 512$ in the sample pool consisting of all $N \approx 1.2$M images. In Figure 4(a), each of the three rows consists of the query at the left end and the corresponding top-ranked images. In the first row, the shown black dog image was used as the L2 query $q_1$, and similar black dog images were retrieved according to the L2 dissimilarity ($\gamma_1 = 1.0$ and $\lambda_2 = 0.0$).
In the second row, the VGG16 classifier vector for trench coats was used as the MIPS query $q_2$, and images containing trench coats were retrieved according to the MIPS dissimilarity ($\gamma_1 = 0.0$ and $\lambda_2 = 1.0$). In the third row, images containing black trench coats were retrieved according to the mixed dissimilarity with $\gamma_1 = 0.6$ and $\lambda_2 = 0.4$. Figure 4(b) shows another example with a strawberry L2 query and the ice creams MIPS query.

⁵ We computed histograms on the central crop of an image, covering 50% of the area, for each RGB color channel with 8 and 32 bins. We normalized the histograms and concatenated them.

[Fig. 4: Image retrieval results with mixed queries for (a) trench coats and (b) ice creams. In both (a) and (b), the top row shows the L2 query (left end) and the images retrieved by ANNS with mp-LSH-CAT for T = 512 according to the L2 dissimilarity (γ₁ = 1.0, λ₂ = 0.0); the second row shows the MIPS query and the images retrieved according to the IP dissimilarity (γ₁ = 0.0, λ₂ = 1.0); the third row shows the images retrieved according to the mixed dissimilarity with γ₁ = 0.6 and λ₂ = 0.4.]

We see that, in both examples, mp-LSH-CAT handles the combined query well: it brings images that are close to the L2 query and relevant to the MIPS query. Other examples can be found through our online demo.

V. CONCLUSION

When querying huge amounts of data, it becomes mandatory to increase efficiency, i.e., even linear methods may be too computationally involved. Hashing, in particular locality sensitive hashing (LSH), has become a highly efficient workhorse that can yield answers to queries in sublinear time, for tasks such as L2-/cosine-distance nearest neighbor search (NNS) or maximum inner product search (MIPS). While for typical applications the type of query has to be fixed beforehand, it is not uncommon to query with respect to several aspects of the data, perhaps even reweighting them dynamically at query time. Our paper contributes exactly here, namely by proposing three multiple purpose locality sensitive hashing (mp-LSH) methods which enable L2-/cosine-distance NNS, MIPS, and their weighted sums.⁷ A user can now indeed, and efficiently, change the importance of the weights at query time without recomputing the hash functions. Our paper has placed its focus on proving the feasibility and efficiency of the mp-LSH methods, and on introducing the very interesting cover tree concept, which is less commonly applied in the machine learning world, for fast querying over the defined multi-metric space. Finally, we provide a demonstration of the usefulness of our novel technique. Future studies will extend the possibilities of mp-LSH, e.g., by further including other types of dissimilarity measure, such as the distance from a hyperplane [22], and further applications with combined queries, e.g., retrieval with one complex multiple purpose query, say, a Pareto front for subsequent decision making. In addition, we would like to analyze the interpretability of the nonlinear query mechanism in terms of the salient features that have led to the query result.

⁷ Although a lot of hashing schemes for multi-modal data have been proposed [19], [20], [21], most of them are data-dependent, and do not offer adjustability of the importance weights at query time.

APPENDIX A
DERIVATION OF THE INNER PRODUCT IN THE PROOF OF THEOREM 1

The inner product between the augmented vectors $\widetilde{q}$ and $\widetilde{x}$, defined in Eq. (10), is given by

$\widetilde{q}^\top \widetilde{x} = \sum_{w=1}^W \sum_{g=1}^G (\gamma_w^{(g)} + \lambda_w^{(g)}) q_w^{(g)\top} x^{(g)} - \frac{1}{2} \sum_{w=1}^W \sum_{g=1}^G \gamma_w^{(g)} \left( \|q_w^{(g)}\|^2 + \|x^{(g)}\|^2 \right)$
$= \sum_{w,g} \lambda_w^{(g)} q_w^{(g)\top} x^{(g)} - \frac{1}{2} \sum_{w,g} \gamma_w^{(g)} \left\| q_w^{(g)} - x^{(g)} \right\|^2$
$= \|\lambda\|_1 - \frac{1}{2} \sum_{w,g} \left\{ \gamma_w^{(g)} \left\| q_w^{(g)} - x^{(g)} \right\|^2 + 2\lambda_w^{(g)} \left( 1 - q_w^{(g)\top} x^{(g)} \right) \right\}$
$= \|\lambda\|_1 - \frac{1}{2} \mathcal{L}_{\mathrm{mp}}(\{q_w\}, x)$.

APPENDIX B
LEMMA 1: INNER PRODUCT APPROXIMATION

For $q, x \in \mathbb{R}^L$ (with $G = 1$), let

$d_T(q, x) = \frac{1}{T} \sum_{t=1}^{T} \left| H(q)_{t,1} - \overline{H}(x)_{t,1} \right|$

with expectation

$d(q, x) = \mathbb{E}[d_T(q, x)] = \mathbb{E}\left| H(q)_{1,1} - \overline{H}(x)_{1,1} \right|$,

and define

$\mathcal{L}(q, x) = 1 - \frac{q^\top x}{\|q\|}$.

Lemma 1: The following statements hold:
(a) $d(q, x) = 1 - \|x\| \left( 1 - \frac{2}{\pi} \angle(q, x) \right)$.
(b) For $E_x = 0.2105 \|x\|$, it is $|\mathcal{L}(q, x) - d(q, x)| \le E_x$.   (19)
(c) Let $b(q, x) = 1 + \frac{2}{\pi} \left( \mathcal{L}(q, x) - 1 \right)$. Then for $\mathcal{L}(q, x) \le 1$ it is $\mathcal{L}(q, x) \le d(q, x) \le b(q, x)$, and for $\mathcal{L}(q, x) \ge 1$ it is $b(q, x) \le d(q, x) \le \mathcal{L}(q, x)$.
(d) It holds that $|\mathcal{L}(q, x) - d(q, x)| \le \min\left\{ \left(1 - \frac{2}{\pi}\right) |\mathcal{L}(q, x) - 1|,\ E_x \right\}$, and for $s_x = 0.58 \|x\|$, if $|\mathcal{L}(q, x) - 1| \le s_x$, the first bound is tighter than $E_x$.

Proof (a): Defining $p_{\mathrm{col}} = 1 - \frac{1}{\pi} \angle(q, x)$, we have

$\mathbb{E}\left| H(q)_{1,1} - \overline{H}(x)_{1,1} \right| = (1 - \|x\|) p_{\mathrm{col}} + (1 + \|x\|)(1 - p_{\mathrm{col}}) = 1 + \|x\|(1 - 2p_{\mathrm{col}}) = 1 - \|x\|\left(1 - \frac{2}{\pi}\angle(q, x)\right)$.

Proof (b): With $\theta = \angle(q, x)$,

$\mathcal{L}(q, x) - d(q, x) = \|x\| \left( 1 - \cos\theta - \frac{2}{\pi}\theta \right)$.

Maximizing the absolute value over $\theta \in [0, \pi]$ (the extrema lie at $\sin\theta = 2/\pi$), we obtain the maximum $E_x = 0.2105\|x\|$.

Proof (c): It is $d(q, x) - b(q, x) = \frac{2\|x\|}{\pi}\left( \theta + \cos\theta - \frac{\pi}{2} \right)$, which is non-positive for $\theta \le \pi/2$ and non-negative for $\theta \ge \pi/2$, while $\mathcal{L}(q, x) - d(q, x) = \|x\|(1 - \cos\theta - \frac{2}{\pi}\theta)$ is non-positive for $\theta \le \pi/2$ and non-negative for $\theta \ge \pi/2$. Noting that $\mathcal{L}(q, x) \le 1$ if and only if $\theta \le \pi/2$ proves the claim.

Proof (d): The inequality follows from (b) and (c). Letting $s_x = \frac{E_x}{1 - 2/\pi} \approx 0.58\|x\|$, the first bound is tighter than $E_x$ if $|\mathcal{L}(q, x) - 1| \le s_x$.

Note that $d_T(q, x) \to d(q, x)$ as $T \to \infty$. Therefore, all statements are also valid replacing $d(q, x)$ by $d_T(q, x)$ with $T$ large enough.

APPENDIX C
PROOF OF THEOREM 2

For $\eta_w = 0, \forall w$, we have

$\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) = \sum_{w=1}^W \sum_{g=1}^G \left\{ \gamma_w^{(g)} \|q_w^{(g)} - x^{(g)}\|^2 + 2\lambda_w^{(g)} (1 - q_w^{(g)\top} x^{(g)}) \right\}$.

Recall that $\overline{q}_{L2+\mathrm{ip}}^{(g)} = \sum_{w=1}^W (\gamma_w^{(g)} + \lambda_w^{(g)}) q_w^{(g)}$. By Eq. (17),

$\frac{1}{T} D_{\mathrm{CAT}} = \sum_{g=1}^G \left\{ \|\overline{q}_{L2+\mathrm{ip}}^{(g)}\| \left[ 1 + \|x^{(g)}\| \left( 1 - \frac{2 C^{(g)}(\overline{q}_{L2+\mathrm{ip}}, x)}{T} \right) \right] + \frac{\overline{\gamma}^{(g)}}{2} \|x^{(g)}\|^2 \right\}$,

where the bracketed term equals $d_T(\overline{q}_{L2+\mathrm{ip}}^{(g)}, x^{(g)})$. Using (19), i.e., $|\mathcal{L}(\overline{q}^{(g)}, x^{(g)}) - d(\overline{q}^{(g)}, x^{(g)})| \le E_1 = 0.2105$, we obtain

$\lim_{T \to \infty} \frac{1}{T} D_{\mathrm{CAT}} = \sum_{g=1}^G \left\{ \|\overline{q}_{L2+\mathrm{ip}}^{(g)}\| - \overline{q}_{L2+\mathrm{ip}}^{(g)\top} x^{(g)} + \frac{\overline{\gamma}^{(g)}}{2} \|x^{(g)}\|^2 \right\} + \mathrm{error} = \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const} + \mathrm{error}$,

where $\mathrm{const} = \sum_g \|\overline{q}_{L2+\mathrm{ip}}^{(g)}\| - \|\lambda\|_1 - \frac{1}{2}\sum_{w,g} \gamma_w^{(g)} \|q_w^{(g)}\|^2$ depends only on the queries, and

$|\mathrm{error}| \le E_1 \sum_{g=1}^G \|\overline{q}_{L2+\mathrm{ip}}^{(g)}\| \le E_1 \sum_{w,g} (\gamma_w^{(g)} + \lambda_w^{(g)}) \|q_w^{(g)}\| \le 0.2105 \left( \|\lambda\|_1 + \|\gamma\|_1 \right)$. ∎

APPENDIX D
PROOF OF THEOREM 3

For $\gamma_w = \lambda_w = 0, \forall w$, we have

$\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) = \sum_{w,g} \eta_w^{(g)} \left\| \frac{q_w^{(g)}}{\|q_w^{(g)}\|} - \frac{x^{(g)}}{\|x^{(g)}\|} \right\|^2 = 2 \sum_{w,g} \eta_w^{(g)} \left( 1 - \frac{q_w^{(g)\top} x^{(g)}}{\|q_w^{(g)}\| \|x^{(g)}\|} \right)$.

Recall that $\overline{q}_{\cos}^{(g)} = \sum_w \eta_w^{(g)} q_w^{(g)} / \|q_w^{(g)}\|$. By Eq. (17),

$\frac{1}{T} D_{\mathrm{CAT}} = \sum_g \|\overline{q}_{\cos}^{(g)}\| \cdot \frac{2 \left( T - C^{(g)}(\overline{q}_{\cos}, x) \right)}{T} \longrightarrow \sum_g \|\overline{q}_{\cos}^{(g)}\| \cdot \frac{2}{\pi} \angle(\overline{q}_{\cos}^{(g)}, x^{(g)})$.

Applying (19) to the normalized vectors (i.e., with norm 1), $|(1 - \cos\theta) - \frac{2}{\pi}\theta| \le 0.2105$, so

$\lim_{T \to \infty} \frac{1}{T} D_{\mathrm{CAT}} = \sum_g \|\overline{q}_{\cos}^{(g)}\| \left( 1 - \frac{\overline{q}_{\cos}^{(g)\top} x^{(g)}}{\|\overline{q}_{\cos}^{(g)}\| \|x^{(g)}\|} \right) + \mathrm{error} = \frac{1}{2}\mathcal{L}_{\mathrm{mp}}(\{q_w\}, x) + \mathrm{const} + \mathrm{error}$,

where $\mathrm{const} = \sum_g \left( \|\overline{q}_{\cos}^{(g)}\| - \overline{\eta}^{(g)} \right)$ and $|\mathrm{error}| \le 0.2105 \sum_g \|\overline{q}_{\cos}^{(g)}\| \le 0.2105 \|\eta\|_1$. ∎
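The constant $E_x$ of Lemma 1(b), which drives the error bounds of Theorems 2 and 3 above, can be checked numerically; the following snippet (ours) maximizes $|1 - \cos\theta - \frac{2}{\pi}\theta|$ on $[0, \pi]$:

import numpy as np

theta = np.linspace(0.0, np.pi, 1_000_001)
g = 1.0 - np.cos(theta) - (2.0 / np.pi) * theta
print(np.abs(g).max())            # ~0.21051
print(theta[np.abs(g).argmax()])  # ~2.4515 = pi - arcsin(2/pi)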

APPENDIX E
PROOF OF THEOREM 4

Without loss of generality, we prove the theorem for the plain MIPS case with $G = 1$, $W = 1$, and $\lambda = 1$. Then $\alpha = 1$ and the measure simplifies to $D_{\mathrm{CAT}}(H_q^{\mathrm{CAT}}(q_{\mathrm{ip}}), H_x^{\mathrm{CAT}}(x)) = T\, d_T(q_{\mathrm{ip}}, x)$. Below, $E_1 = 0.2105$ and $s_1 = 0.58$ denote the constants $E_x$ and $s_x$ of Lemma 1 with $\|x\| = 1$.

For $C_1(q_{\mathrm{ip}}, x)$ with $\mu = \mathbb{E}[C_1(q_{\mathrm{ip}}, x)] = T \left( 1 - \frac{1}{\pi} \angle(x, q_{\mathrm{ip}}) \right)$ and $0 < \delta_1 < 1$, $\delta_2 > 0$, we use the following Chernoff bounds:

$P\left( C_1(q_{\mathrm{ip}}, x) \le (1 - \delta_1)\mu \right) \le \exp\left\{ -\frac{\mu}{2} \delta_1^2 \right\}$,   (20)
$P\left( C_1(q_{\mathrm{ip}}, x) \ge (1 + \delta_2)\mu \right) \le \exp\left\{ -\frac{\mu}{3} \min\{\delta_2, \delta_2^2\} \right\}$.   (21)

The approximate nearest-neighbor problem with $r > 0$ and $c > 1$ is defined as follows: if there exists an $x^*$ with $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x^*) \le r$, then we return an $x$ with $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) < cr$. For $cr > r + 2E_1$, we can set $T$ logarithmically dependent on the dataset size to solve the approximate nearest-neighbor problem for $\mathcal{L}_{\mathrm{ip}}$, using $d_T$, with constant success probability. For this we require a viable threshold $t$ that fulfills

$\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) > cr \Rightarrow d(q_{\mathrm{ip}}, x) > t$ and $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) \le r \Rightarrow d(q_{\mathrm{ip}}, x) \le t$.

Namely, we set $t = \frac{t_1 + t_2}{2}$, where

$t_1 = \begin{cases} r + E_1 & \text{if } r \le 1 - s_1, \\ 1 - \frac{2}{\pi}(1 - r) & \text{if } r \in (1 - s_1, 1], \\ r & \text{if } r > 1, \end{cases}$ and $t_2 = \begin{cases} cr & \text{if } cr \le 1, \\ 1 + \frac{2}{\pi}(cr - 1) & \text{if } cr \in (1, 1 + s_1], \\ cr - E_1 & \text{if } cr > 1 + s_1. \end{cases}$

In any case it is $t_2 > t_1$. First note that $t_1$ and $t_2$ are strictly monotone increasing in $r$ and $cr$, respectively. It therefore suffices to show $t_2 \ge t_1$ for the lower bound of $t_2$ obtained at $cr = r + 2E_1$:

Case $r \le 1 - s_1$: it is $t_1 = r + E_1$ and $t_2 = cr$, where $t_1 = r + E_1 < r + 2E_1 = cr = t_2$.

Case $r \in (1 - s_1, 1 - 2E_1]$: it is $t_1 = 1 - \frac{2}{\pi}(1 - r)$ and $t_2 = cr$, such that $t_1 \le r + 2E_1 = t_2$, since $1 - r < s_1 = E_1 / (1 - \frac{2}{\pi})$.

Case $r \in (1 - 2E_1, 1]$: it is $t_1 = 1 - \frac{2}{\pi}(1 - r)$ and $t_2 = 1 + \frac{2}{\pi}(cr - 1)$ with $cr > 1$, such that $t_2 - t_1 = \frac{2}{\pi}(cr - 1) + \frac{2}{\pi}(1 - r) = \frac{4}{\pi} E_1 > 0$.

Case $r \in (1, 1 + s_1 - 2E_1]$: it is $t_1 = r$ and $t_2 = 1 + \frac{2}{\pi}(cr - 1)$, such that $t_2 - t_1 = \frac{4}{\pi} E_1 - \left(1 - \frac{2}{\pi}\right)(r - 1) \ge \frac{4}{\pi} E_1 - \left(1 - \frac{2}{\pi}\right)(s_1 - 2E_1) = E_1 > 0$.

Case $r > 1 + s_1 - 2E_1$: it is $t_1 = r$ and $t_2 = cr - E_1 = r + E_1 > r = t_1$.

Now, using $d_T(q_{\mathrm{ip}}, x) = 1 + \|x\| - \frac{2\|x\|}{T} C_1(q_{\mathrm{ip}}, x)$ and $\mu = T \frac{1 + \|x\| - d(q_{\mathrm{ip}}, x)}{2\|x\|}$, define

$\delta = \frac{t - d(q_{\mathrm{ip}}, x)}{1 + \|x\| - d(q_{\mathrm{ip}}, x)} = \frac{T \left( t - d(q_{\mathrm{ip}}, x) \right)}{2 \|x\| \mu}$.

For $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) \le r$, we can lower-bound the probability of $d_T(q_{\mathrm{ip}}, x)$ not exceeding the specified threshold:

$P(d_T(q_{\mathrm{ip}}, x) \le t) = P\left( C_1(q_{\mathrm{ip}}, x) \ge (1 - \delta)\mu \right) = 1 - P\left( C_1(q_{\mathrm{ip}}, x) < (1 - \delta)\mu \right) \overset{(20)}{\ge} 1 - \exp\left\{ -\frac{\mu}{2}\delta^2 \right\}$.

We can show $d(q_{\mathrm{ip}}, x) \le t_1$ using Lemma 1(c) and (d):

Case $r \le 1 - s_1$: $d(q_{\mathrm{ip}}, x) \le \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) + E_1 \le r + E_1 = t_1$.

Case $r \in (1 - s_1, 1]$: $d(q_{\mathrm{ip}}, x) \le b(q_{\mathrm{ip}}, x) = 1 + \frac{2}{\pi}(\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) - 1) \le 1 - \frac{2}{\pi}(1 - r) = t_1$.

Case $r > 1$: for $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) \le 1$ it is $d(q_{\mathrm{ip}}, x) \le 1$; else $d(q_{\mathrm{ip}}, x) \le \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x)$; hence $d(q_{\mathrm{ip}}, x) \le \max\{1, \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x)\} \le r = t_1$.

Thus we can bound

$\delta \ge \frac{T(t - t_1)}{2\|x\|\mu} = \frac{T(t_2 - t_1)}{4\|x\|\mu}$ and $\frac{\mu}{2}\delta^2 \ge \frac{T^2 (t_2 - t_1)^2}{32 \|x\|^2 \mu} \ge \frac{T (t_2 - t_1)^2}{32}$,

using $\mu \le T$ and $\|x\| \le 1$, such that

$P(d_T(q_{\mathrm{ip}}, x) \le t) \ge 1 - \exp\left\{ -\frac{(t_2 - t_1)^2}{32} T \right\}$.

For $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) > cr$, we can upper-bound the probability of $d_T(q_{\mathrm{ip}}, x)$ dropping below the specified threshold:

$P(d_T(q_{\mathrm{ip}}, x) \le t) = P\left( C_1(q_{\mathrm{ip}}, x) \ge (1 + \delta')\mu \right) \overset{(21)}{\le} \exp\left\{ -\frac{\mu}{3} \min\{\delta', \delta'^2\} \right\}$,

where $\delta' = \frac{T(d(q_{\mathrm{ip}}, x) - t)}{2\|x\|\mu}$. We can show $d(q_{\mathrm{ip}}, x) \ge t_2$ using Lemma 1(c) and (d):

Case $cr \le 1$: for $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) \ge 1$ it is $d(q_{\mathrm{ip}}, x) \ge 1$; else $d(q_{\mathrm{ip}}, x) \ge \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x)$; hence $d(q_{\mathrm{ip}}, x) \ge \min\{1, \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x)\} \ge cr = t_2$.

Case $cr \in (1, 1 + s_1]$: $d(q_{\mathrm{ip}}, x) \ge b(q_{\mathrm{ip}}, x) = 1 + \frac{2}{\pi}(\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) - 1) \ge 1 + \frac{2}{\pi}(cr - 1) = t_2$.

Case $cr > 1 + s_1$: $d(q_{\mathrm{ip}}, x) \ge \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) - E_1 \ge cr - E_1 = t_2$.

Thus, with $\delta' \ge \frac{T(t_2 - t)}{2\|x\|\mu} = \frac{T(t_2 - t_1)}{4\|x\|\mu}$,

$P(d_T(q_{\mathrm{ip}}, x) \le t) \le \exp\left\{ -\frac{T}{3} \min\left\{ \frac{t_2 - t_1}{4}, \frac{(t_2 - t_1)^2}{16} \right\} \right\} \le \exp\left\{ -\frac{(t_2 - t_1)^2}{48} T \right\}$.

Now, define the events

$E_1(q_{\mathrm{ip}}, x)$: either $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) > r$ or $d_T(q_{\mathrm{ip}}, x) \le t$;   (22)
$E_2(q_{\mathrm{ip}})$: $\forall x \in \mathcal{X}$: $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) > cr \Rightarrow d_T(q_{\mathrm{ip}}, x) > t$.   (23)

Assume that there exists an $x^*$ with $\mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x^*) \le r$. Then the algorithm is successful if both $E_1(q_{\mathrm{ip}}, x^*)$ and $E_2(q_{\mathrm{ip}})$ hold simultaneously. Let $T \ge \frac{48}{(t_2 - t_1)^2} \log\frac{n}{\varepsilon}$. It is

$P(E_2(q_{\mathrm{ip}})) = 1 - P\left( \exists x \in \mathcal{X} : \mathcal{L}_{\mathrm{ip}}(q_{\mathrm{ip}}, x) > cr,\ d_T(q_{\mathrm{ip}}, x) \le t \right) \ge 1 - n \exp\left\{ -\frac{(t_2 - t_1)^2}{48} T \right\} \ge 1 - \varepsilon$.

Also it holds that

$P(E_1(q_{\mathrm{ip}}, x^*)) \ge 1 - \exp\left\{ -\frac{(t_2 - t_1)^2}{32} T \right\} \ge 1 - (\varepsilon/n)^{3/2}$.

Therefore, the probability of the algorithm performing approximate nearest neighbor search correctly is larger than

$P(E_2(q_{\mathrm{ip}}), E_1(q_{\mathrm{ip}}, x^*)) \ge 1 - P(\neg E_2(q_{\mathrm{ip}})) - P(\neg E_1(q_{\mathrm{ip}}, x^*)) \ge 1 - \varepsilon - (\varepsilon/n)^{3/2}$. ∎

APPENDIX F
DETAILS OF THE COVER TREE

Here, we detail how to selectively explore the hash buckets with the code dissimilarity measure in non-decreasing order. The difficulty is that the dissimilarity $D$ is a linear combination of metrics, where the weights are selected at query time. Such a metric is referred to as a dynamic metric function or a multi-metric [8]. We use a tree data structure, called the cover tree [11], to index the metric space. We begin the description of the cover tree by introducing the expansion constant and the base of the expansion constant.

Expansion constant κ [12]: defined as the smallest value $\kappa \ge \psi$ such that every ball in the dataset $\mathcal{X}$ can be covered by $\kappa$ balls in $\mathcal{X}$ of radius smaller by the factor $1/\psi$. Here, $\psi$ is the base of the expansion constant.

Data structure: Given a set of data points $\mathcal{X}$, the cover tree $\mathcal{T}$ is a leveled tree where each level is associated with an integer label $i$, which decreases as the tree is descended. For ease of explanation, let $B_{\psi^i}(x)$ denote a closed ball centered at point $x$ with radius $\psi^i$, i.e., $B_{\psi^i}(x) = \{p \in \mathcal{X} : D(p, x) \le \psi^i\}$. At every level $i$ of $\mathcal{T}$ except the root, we create a union of possibly overlapping closed balls with radius $\psi^i$ that cover (contain) all the data points $\mathcal{X}$. The centers of this covering set of balls are stored in nodes at level $i$ of $\mathcal{T}$. Let $C_i$ denote the set of nodes at level $i$. The cover tree $\mathcal{T}$ obeys the following three invariants at all levels:

1) Nesting: $C_i \subset C_{i-1}$. Once a point $x \in \mathcal{X}$ is in a node in $C_i$, it also appears in all its successor nodes.
2) Covering: For every $x' \in C_{i-1}$, there exists an $x \in C_i$ such that $x'$ lies inside $B_{\psi^i}(x)$, and exactly one such $x$ is a parent of $x'$.
3) Separation: For all distinct $x_1, x_2 \in C_i$, $x_1$ lies outside $B_{\psi^i}(x_2)$ and $x_2$ lies outside $B_{\psi^i}(x_1)$.

This structure has a space bound of $O(N)$, where $N$ is the number of samples.

Construction: We use the batch construction method [11], where the cover tree $\mathcal{T}$ is built in a top-down fashion. Initially, we pick a data point $x_0$ and an integer $s$ such that the closed ball $B_{\psi^s}(x_0)$ is the tightest fit that covers the entire dataset $\mathcal{X}$. This point $x_0$ is placed in a single node, called the root of the tree $\mathcal{T}$. We denote the root node as $C_i$ with $i = s$. In order to generate the set $C_{i-1}$ of child nodes of $C_i$, we greedily pick a set of points (including $x_0$) from $C_i$ to satisfy the nesting invariant, and generate closed balls of radius $\psi^{i-1}$ centered on them, in such a way that: (a) all center points lie inside $B_{\psi^i}(x_0)$ (covering invariant); (b) no center point lies inside another ball of radius $\psi^{i-1}$ at level $i - 1$ (separation invariant); and (c) the union of these closed balls covers the entire dataset $\mathcal{X}$. The chosen center points form the set of nodes $C_{i-1}$. Child nodes are recursively generated from each node in $C_{i-1}$, until each data point in $\mathcal{X}$ is the center of a closed ball and resides in a leaf node of $\mathcal{T}$. Note that, while we construct our cover tree, we use our distance function $D$ with all weights set to 1.0, which upper-bounds all subsequent distance metrics that depend on the queries. The construction time complexity is $O(\kappa^{12} N \ln N)$.
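The essence of the query procedure described next is a single pruning rule. The following highly simplified sketch (ours; it assumes a tree already built and ignores the batch construction of [11]) shows that rule in isolation: a subtree rooted at level $i$ can be skipped when even its closest possible descendant, at distance $d - \psi^i$, cannot beat the current best candidate:

class Node:
    def __init__(self, point_id, level):
        self.point_id = point_id   # sample ID; codes are stored externally
        self.level = level
        self.children = []

def search(node, dist, best, psi=1.2):
    """dist(point_id) evaluates the query-time multi-metric D;
    best is (distance, id) of the current nearest candidate."""
    d = dist(node.point_id)
    if d < best[0]:
        best = (d, node.point_id)
    # Prune: no descendant of node can be closer than d - psi**level.
    if d - psi ** node.level <= best[0]:
        for child in sorted(node.children, key=lambda c: dist(c.point_id)):
            best = search(child, dist, best, psi)
    return best

# Toy usage: a hand-built two-level tree over scalar "codes".
pts = {0: 5.0, 1: 1.0, 2: 9.0, 3: 1.4}
root = Node(0, level=3)
root.children = [Node(i, level=2) for i in (1, 2, 3)]
dist = lambda i: abs(pts[i] - 1.3)   # stand-in for D(q, .)
print(search(root, dist, best=(float("inf"), None)))  # -> (0.1, 3)

Because the query-time weights only shrink $D$ relative to the construction-time metric, this pruning remains valid for every weight setting, which is the point made in the next paragraphs.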

To achieve a more compact cover tree, we store only element identification numbers (IDs) in the cover tree, and not the original vectors. Furthermore, we store the hash bits using compressed representations (bit-sets), which reduces the storage size compared to a naive implementation down to $T$ bits. For mp-LSH-CAT with $G = 1$, each element in the cover tree contains $T$ bits and two integers.⁸ For example, indexing a 128-dimensional vector naively requires 1032 bytes, but indexing the fully augmented one requires only 24 bytes, yielding a 97.7% memory saving.

⁸ We assume 4 bytes per integer and 8 bytes per double here.

Querying: The nearest neighbor search with a cover tree is performed as follows. The search begins at the root of the cover tree and descends level-wise. On each descent, we build a candidate set $C$, which holds all the child nodes (center points of our closed balls). We then prune away centers (nodes) in $C$ that cannot possibly lead to a nearest neighbor of the query point $q$ if we descended into them. The pruning mechanism is predicated on a proven result in [11], which states that for any point $x \in C_{i-1}$, the distance between $x$ and any descendant $x'$ is upper-bounded by $\psi^i$. Therefore, any center point whose distance from $q$ exceeds $\min_{x \in C} D(q, x) + \psi^i$ cannot possibly have a descendant that can replace the current closest center point to $q$, and hence can safely be pruned. We add an additional check to speed up the search by not always descending to the leaf nodes. The time complexity of querying the cover tree is $O(\kappa^{12} \ln N)$.

Effect of the multi-metric distance while querying: It is important to note that minimizing the overlap between the closed balls at higher levels (i.e., closer to the root) of the cover tree allows us to effectively prune a very large portion of the search space and compute the nearest neighbor faster. Recall that the cover tree is constructed by setting our distance function $D$ with all weights equal to 1.0. During querying, we allow $D$ to be a linear combination of metrics, where the weights lie in the range $[0, 1]$, which means that the distance metric $D$ used during querying always under-estimates the construction-time distances, i.e., reports lower distances. During querying, the cover tree's structure is still intact and all the invariant properties are satisfied. The main difference occurs in the computation of $\min_{x \in C} D(q, x)$, the shortest distance from a center point to the query $q$ using the new distance metric. Interestingly, this new distance gets even smaller, thus reducing our search radius $\min_{x \in C} D(q, x) + \psi^i$ centered at $q$, which in turn implies that at every level we manage to prune more center points, as the overlap between the closed balls is also reduced.

Streaming: The cover tree lends itself naturally to the setting where nearest neighbor computations have to be performed on a stream of data points, because the cover tree allows dynamic insertion and deletion of points. The time complexity for both of these operations is $O(\kappa^6 \ln N)$, which is faster than querying.

Parameter choice: In our implementation for the experiments, we set the base of the expansion constant to $\psi = 1.2$, which we empirically found to work best on the texmex dataset.

Acknowledgments: This work was supported by the German Research Foundation (GRK 1589/1) and by the Federal Ministry of Education and Research (BMBF) under the project Berlin Big Data Center (FKZ 01IS14013A).
REFERENCES

[1] P. Indyk and R. Motwani, "Approximate nearest neighbors: Towards removing the curse of dimensionality," in Proc. of STOC, 1998, pp. 604-613.
[2] J. Wang, H. T. Shen, J. Song, and J. Ji, "Hashing for similarity search: A survey," arXiv:1408.2927v1 [cs.DS], 2014.
[3] A. Shrivastava and P. Li, "Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS)," in NIPS, vol. 27, 2014.
[4] Y. Bachrach, Y. Finkelstein, R. Gilad-Bachrach, L. Katzir, N. Koenigstein, N. Nice, and U. Paquet, "Speeding up the Xbox recommender system using a Euclidean transformation for inner-product spaces," in Proc. of RecSys, 2014.
[5] A. Shrivastava and P. Li, "Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS)," in Proc. of UAI, 2015.
[6] B. Neyshabur and N. Srebro, "On symmetric and asymmetric LSHs for inner product search," in ICML, vol. 37, 2015.
[7] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proc. of SCG, 2004, pp. 253-262.
[8] B. Bustos, S. Kreft, and T. Skopal, "Adapting metric indexes for searching in multi-metric spaces," Multimedia Tools and Applications, vol. 58, no. 3, 2012.
[9] M. X. Goemans and D. P. Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming," Journal of the ACM, vol. 42, no. 6, pp. 1115-1145, 1995.
[10] M. S. Charikar, "Similarity estimation techniques from rounding algorithms," in Proc. of STOC, 2002, pp. 380-388.
[11] A. Beygelzimer, S. Kakade, and J. Langford, "Cover trees for nearest neighbor," in Proc. of ICML, 2006, pp. 97-104.
[12] J. Heinonen, Lectures on Analysis on Metric Spaces, ser. Universitext, 2001.
[13] S. Funk, "Try this at home," http://sifter.org/~simon/journal/20061211.html, 2006.
[14] P. Cremonesi, Y. Koren, and R. Turrin, "Performance of recommender algorithms on top-n recommendation tasks," in Proc. of RecSys, 2010, pp. 39-46.
[15] H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg, "Searching in one billion vectors: re-rank with source coding," in Proc. of ICASSP, 2011, pp. 861-864.
[16] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, 2004.
[17] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.
[18] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[19] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, "Effective multiple feature hashing for large-scale near-duplicate video retrieval," IEEE Trans. on Multimedia, vol. 15, no. 8, 2013.
[20] S. Moran and V. Lavrenko, "Regularized cross-modal hashing," in Proc. of SIGIR, 2015.
[21] S. Xu, S. Wang, and Y. Zhang, "Summarizing complex events: a cross-modal solution of storylines extraction and reconstruction," in Proc. of EMNLP, 2013.
[22] P. Jain, S. Vijayanarasimhan, and K. Grauman, "Hashing hyperplane queries to near points with applications to large-scale active learning," in Advances in NIPS, 2010.

On Symmetric and Asymmetric LSHs for Inner Product Search

On Symmetric and Asymmetric LSHs for Inner Product Search Behnam Neyshabur Nathan Srebro Toyota Technological Institute at Chicago, Chicago, IL 6637, USA BNEYSHABUR@TTIC.EDU NATI@TTIC.EDU Abstract We consider the problem of designing locality sensitive hashes

More information

PHY 133 Lab 1 - The Pendulum

PHY 133 Lab 1 - The Pendulum 3/20/2017 PHY 133 Lab 1 The Pendulum [Stony Brook Physics Laboratory Manuals] Stony Brook Physics Laboratory Manuals PHY 133 Lab 1 - The Pendulum The purpose of this lab is to measure the period of a simple

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data 1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

More information

LP Rounding and Combinatorial Algorithms for Minimizing Active and Busy Time

LP Rounding and Combinatorial Algorithms for Minimizing Active and Busy Time LP Roundin and Combinatorial Alorithms for Minimizin Active and Busy Time Jessica Chan, Samir Khuller, and Koyel Mukherjee University of Maryland, Collee Park {jschan,samir,koyelm}@cs.umd.edu Abstract.

More information

Strong Interference and Spectrum Warfare

Strong Interference and Spectrum Warfare Stron Interference and Spectrum Warfare Otilia opescu and Christopher Rose WILAB Ruters University 73 Brett Rd., iscataway, J 8854-86 Email: {otilia,crose}@winlab.ruters.edu Dimitrie C. opescu Department

More information

Generalized Distance Metric as a Robust Similarity Measure for Mobile Object Trajectories

Generalized Distance Metric as a Robust Similarity Measure for Mobile Object Trajectories Generalized Distance Metric as a Robust Similarity Measure for Mobile Object rajectories Garima Pathak, Sanjay Madria Department of Computer Science University Of Missouri-Rolla Missouri-6541, USA {madrias}@umr.edu

More information

Matrix multiplication: a group-theoretic approach

Matrix multiplication: a group-theoretic approach CSG399: Gems of Theoretical Computer Science. Lec. 21-23. Mar. 27-Apr. 3, 2009. Instructor: Emanuele Viola Scribe: Ravi Sundaram Matrix multiplication: a roup-theoretic approach Given two n n matrices

More information

Lecture 14 October 16, 2014

Lecture 14 October 16, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 14 October 16, 2014 Scribe: Jao-ke Chin-Lee 1 Overview In the last lecture we covered learning topic models with guest lecturer Rong Ge.

More information

A Constant Complexity Fair Scheduler with O(log N) Delay Guarantee

A Constant Complexity Fair Scheduler with O(log N) Delay Guarantee A Constant Complexity Fair Scheduler with O(lo N) Delay Guarantee Den Pan and Yuanyuan Yan 2 Deptment of Computer Science State University of New York at Stony Brook, Stony Brook, NY 79 denpan@cs.sunysb.edu

More information

Locality Sensitive Hashing

Locality Sensitive Hashing Locality Sensitive Hashing February 1, 016 1 LSH in Hamming space The following discussion focuses on the notion of Locality Sensitive Hashing which was first introduced in [5]. We focus in the case of

More information

V DD. M 1 M 2 V i2. V o2 R 1 R 2 C C

V DD. M 1 M 2 V i2. V o2 R 1 R 2 C C UNVERSTY OF CALFORNA Collee of Enineerin Department of Electrical Enineerin and Computer Sciences E. Alon Homework #3 Solutions EECS 40 P. Nuzzo Use the EECS40 90nm CMOS process in all home works and projects

More information

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Optimal Data-Dependent Hashing for Approximate Near Neighbors Optimal Data-Dependent Hashing for Approximate Near Neighbors Alexandr Andoni 1 Ilya Razenshteyn 2 1 Simons Institute 2 MIT, CSAIL April 20, 2015 1 / 30 Nearest Neighbor Search (NNS) Let P be an n-point

More information

Round-off Error Free Fixed-Point Design of Polynomial FIR Predictors

Round-off Error Free Fixed-Point Design of Polynomial FIR Predictors Round-off Error Free Fixed-Point Desin of Polynomial FIR Predictors Jarno. A. Tanskanen and Vassil S. Dimitrov Institute of Intellient Power Electronics Department of Electrical and Communications Enineerin

More information

On K-Means Cluster Preservation using Quantization Schemes

On K-Means Cluster Preservation using Quantization Schemes On K-Means Cluster Preservation usin Quantization Schemes Deepak S. Turaa Michail Vlachos Olivier Verscheure IBM T.J. Watson Research Center, Hawthorne, Y, USA IBM Zürich Research Laboratory, Switzerland

More information

1 CHAPTER 7 PROJECTILES. 7.1 No Air Resistance

1 CHAPTER 7 PROJECTILES. 7.1 No Air Resistance CHAPTER 7 PROJECTILES 7 No Air Resistance We suppose that a particle is projected from a point O at the oriin of a coordinate system, the y-axis bein vertical and the x-axis directed alon the round The

More information

Tailored Bregman Ball Trees for Effective Nearest Neighbors

Tailored Bregman Ball Trees for Effective Nearest Neighbors Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia

More information

Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment

Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment Anshumali Shrivastava and Ping Li Cornell University and Rutgers University WWW 25 Florence, Italy May 2st 25 Will Join

More information

Lecture 17 03/21, 2017

Lecture 17 03/21, 2017 CS 224: Advanced Algorithms Spring 2017 Prof. Piotr Indyk Lecture 17 03/21, 2017 Scribe: Artidoro Pagnoni, Jao-ke Chin-Lee 1 Overview In the last lecture we saw semidefinite programming, Goemans-Williamson

More information

Composite Quantization for Approximate Nearest Neighbor Search

Composite Quantization for Approximate Nearest Neighbor Search Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 104, joint work with my interns Ting Zhang from

More information

Linearized optimal power flow

Linearized optimal power flow Linearized optimal power flow. Some introductory comments The advantae of the economic dispatch formulation to obtain minimum cost allocation of demand to the eneration units is that it is computationally

More information

4 Locality-sensitive hashing using stable distributions

4.1 The LSH scheme based on s-stable distributions. In this chapter, we introduce and analyze a novel locality-sensitive hashing family. The family …
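
A small sketch of the hash family this chapter analyzes, h(v) = floor((a·v + b)/w) with a drawn from an s-stable law (Gaussian for s = 2, giving an LSH family for the L2 distance); the parameter values below are ours, purely for illustration.

```python
import numpy as np

def pstable_hash(dim, w, seed=0):
    """One hash function from the 2-stable (Gaussian) LSH family
    for L2 distance: h(v) = floor((a . v + b) / w)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(dim)   # entries drawn from a 2-stable law
    b = rng.uniform(0.0, w)        # random offset, uniform in [0, w)
    return lambda v: int(np.floor((a @ v + b) / w))

h = pstable_hash(dim=4, w=4.0, seed=1)
print(h(np.array([1.0, 0.0, 2.0, -1.0])))
print(h(np.array([1.1, 0.1, 2.0, -0.9])))  # a close point, likely same bucket
```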

Efficient method for obtaining parameters of stable pulse in grating compensated dispersion-managed communication systems

2003 Conference on Information Sciences and Systems, The Johns Hopkins University, March 12-14, 2003. …

Nonlinear Model Reduction of Differential Algebraic Equation (DAE) Systems

Chuili Sun and Juergen Hahn. Department of Chemical Engineering, Texas A&M University, College Station, TX 77843-3, hahn@tamu.edu. Prepared for …

High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction

Chapter 11. High-dimensional vectors are ubiquitous in applications (gene expression data, set of movies watched by Netflix customer, …
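
As a concrete illustration of the dimension-reduction theme of this chapter, a hedged sketch of a Gaussian random projection in the Johnson-Lindenstrauss style; the dimensions and scaling below are chosen by us for the example, not taken from the chapter.

```python
import numpy as np

def random_projection(X, target_dim, seed=0):
    """Project rows of X from d to target_dim dimensions with a Gaussian
    matrix; pairwise distances are preserved up to small distortion with
    high probability (Johnson-Lindenstrauss)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return X @ R

X = np.random.default_rng(1).standard_normal((5, 1000))
Y = random_projection(X, target_dim=50)
print(Y.shape)  # (5, 50): far fewer coordinates, similar geometry
```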

Graph Entropy, Network Coding and Guessing games

Søren Riis. November 25, 2007. Abstract: We introduce the (private) entropy of a directed graph (in a new network coding sense) as well as a number of related concepts. …

Lecture 5 Processing microarray data

(1) Transform the data into a scale suitable for analysis. (2) Remove the effects of systematic and obfuscating sources of variation. (3) Identify discrepant observations. Preprocessing …

A Multigrid-like Technique for Power Grid Analysis

Joseph N. Kozhaya, Sani R. Nassif, and Farid N. Najm. Abstract: Modern sub-micron VLSI designs include huge power grids that are required to distribute large …

Linear Spectral Hashing

Zalán Bodó and Lehel Csató. Babeş-Bolyai University, Faculty of Mathematics and Computer Science, Kogălniceanu 1., 484 Cluj-Napoca, Romania. Abstract: … assigns binary hash keys to …

Randomized Algorithms

Sanjiv Kumar, Google Research, NY. EECS-6898, Columbia University, Fall 2010 (9/13/2010). Large Scale Machine Learning: Curse of Dimensionality, Gaussian Mixture Models …

Conical Pendulum Linearization Analyses

European J of Physics Education, Volume 7, Issue 3, 309-70, Dean et al. Kevin Dean, Jyothi Mathew. Physics Department, The Petroleum Institute, Abu Dhabi, PO Box 533, United …

Simple Optimization (SOPT) for Nonlinear Constrained Optimization Problem

Journal of Science & Engineering Education (ISSN 4-6), Vol., Page 3-39, Year 7. Vivek Kumar Chouhan*, Joji Thomas**, S. S. Mahapatra …

Problem Set 2 Solutions

UNIVERSITY OF ALABAMA, Department of Physics and Astronomy. PH 125 / LeClair, Spring 2009. The following three problems are due 20 January 2009 at the beginning of class. 1. (H,R,&W 4.39) …

2.2 Differentiation and Integration of Vector-Valued Functions

Simply put, we differentiate and integrate vector functions by differentiating …

Algorithms for Data Science: Lecture on Finding Similar Items

Barna Saha. 1 Finding Similar Items: Finding similar items is a fundamental data mining task. We may want to find whether two documents are similar …
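
A toy sketch of one standard tool for the finding-similar-items task this lecture covers: a MinHash signature whose coordinate-wise agreement rate estimates Jaccard similarity. The hash construction and shingle sets below are our own illustration, not code from the lecture.

```python
import random

def minhash_signature(shingles, num_hashes=64, seed=0):
    """MinHash signature of a set: for each of num_hashes random hash
    functions, keep the minimum hash value over the set's elements."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(64) for _ in range(num_hashes)]
    return [min(hash(s) ^ m for s in shingles) for m in masks]

a = {"the cat", "cat sat", "sat on"}
b = {"the cat", "cat sat", "sat in"}
sa, sb = minhash_signature(a), minhash_signature(b)
match = sum(x == y for x, y in zip(sa, sb)) / len(sa)
print(match)  # fraction of agreeing coordinates estimates Jaccard(a, b)
```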

Translations on graphs with neighborhood preservation

Bastien Pasdeloup, Vincent Gripon, Nicolas Grelier, Jean-Charles Vialatte, Dominique Pastor. arXiv:1793859v1 [cs.DM], 12 Sep 2017. Abstract: In the field …

Modeling for control of a three degrees-of-freedom Magnetic Levitation System

Rafael Becerril-Arreola, Dept. of Electrical and Computer Eng., University of Toronto; Manfredi Maggiore, Dept. of Electrical and Computer …

CHAPTER 2 PROCESS DESCRIPTION

2.1 INTRODUCTION: A process having only one input variable used for controlling one output variable is known as a SISO process. The problem of designing a controller for SISO systems …

Altitude measurement for model rocketry

David A. Caughey. Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, New York 14853. I. INTRODUCTION: In his book, Rocket Boys, Homer Hickam …

Metric Embedding of Task-Specific Similarity

Greg Shakhnarovich, Brown University; joint work with Trevor Darrell (MIT). August 9, 2006. Task-specific similarity: a toy example …

A Probabilistic Analysis of Propositional STRIPS Planning

Tom Bylander. Division of Mathematics, Computer Science, and Statistics, The University of Texas at San Antonio, San Antonio, Texas 78249 USA. bylander@ringer.cs.utsa.edu …

Another possibility is a rotation or reflection, represented by a matrix M.

Chapter 25: Planar defects. Planar defects: orientation and types. Crystalline films often contain internal, 2-D interfaces separating two regions transformed with respect to one another, but with, otherwise, …

THE SIGNAL ESTIMATOR LIMIT SETTING METHOD

Shan Jin, Peter McNamara. Department of Physics, University of Wisconsin Madison, Madison, WI 53706. Abstract: A new method of background subtraction is presented …

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

Available on the CMS information server, CMS CR-28/5. 14 February 2008. Observing Heavy Top Quarks with …

What Causes Image Intensity Changes?

Edge Detection. Why detect edges? Information reduction: replace the image by a cartoon in which objects and surface markings are outlined (a line-drawing description); these are the most informative parts …

Then we select m_j^u as a cross product between n_j^u and û_j to create an orthonormal basis: m_j^u = n_j^u × û_j (4)

A. Position error covariance for surface feature points: For each surface feature point j, we first compute the normal û_j by using 9 of the neighboring points to fit a plane. In order to create a 3D error ellipsoid …

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION

Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research, Princeton, NJ. {scott.rickard, radu.balan, justinian.rosca}@scr.siemens.com. ABSTRACT …

CS246 Final Exam, Winter 2011

1. Your name and student ID. Name: … Student ID: … 2. I agree to comply with the Stanford Honor Code. Signature: … 3. There should be 17 numbered pages in this exam (including …

LP Rounding and Combinatorial Algorithms for Minimizing Active and Busy Time

Jessica Chang, University of Maryland, College Park, MD, USA, jschan@umiacs.umd.edu; Samir Khuller, University of Maryland, College Park, …

CS 124 Section #8: Hashing, Skip Lists. 3/20/17

1 Probability Review. Expectation (weighted average): the expectation of a random quantity X is E[X] = Σ_x x · P(X = x). For each value x that X can take on, we look at …
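
A worked instance of that definition, using a fair six-sided die as our own example:

```latex
E[X] = \sum_{x=1}^{6} x \cdot P(X = x) = \tfrac{1}{6}(1+2+3+4+5+6) = 3.5
```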

Jeffrey D. Ullman, Stanford University

We are given a set of training examples, consisting of input-output pairs (x, y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some …

Worst case analysis for a general class of on-line lot-sizing heuristics

Wilco van den Heuvel and Albert P.M. Wagelmans. Econometric Institute and Erasmus Research Institute of Management, Erasmus University …

1. According to the U.S. DOE, each year humans are producing 8 billion metric tons (1 metric ton = 2200 lbs) of CO2 through the combustion of fossil fuels (oil, coal and natural gas). 2. Since the late 1950's …

Optimization of Mechanical Design Problems Using Improved Differential Evolution Algorithm

International Journal of Recent Trends in Engineering, Vol. No. 5, May 2009. Millie Pant, Radha Thangaraj and V. P. Singh …

Disclaimer: This lab write-up is not to be copied, in whole or in part, unless a proper reference is made as to the source. (It is strongly recommended that you use this document only to generate ideas, …

arXiv: v2 [stat.ML] 13 Nov 2014

Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS). Anshumali Shrivastava, Department of Computer Science, Computing and Information …
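
A minimal sketch of the asymmetric idea behind this line of work: with items rescaled so that ||x|| ≤ 1, appending sqrt(1 − ||x||²) to each item and 0 to the normalized query turns maximum inner product search into cosine similarity search, for which standard LSH applies. Function names are ours and the rescaling step is assumed rather than shown.

```python
import numpy as np

def preprocess_item(x):
    """Append sqrt(1 - ||x||^2) so every item lands on the unit sphere
    (assumes the collection was rescaled so that ||x|| <= 1)."""
    return np.append(x, np.sqrt(max(0.0, 1.0 - x @ x)))

def preprocess_query(q):
    """Append 0 to the normalized query; then
    <Q(q), P(x)> = <q/||q||, x>, so MIPS reduces to cosine search."""
    q = q / np.linalg.norm(q)
    return np.append(q, 0.0)

x = np.array([0.3, 0.4, 0.1])
q = np.array([1.0, 2.0, 0.5])
# Both values below agree: the padding does not change the inner product.
print(preprocess_query(q) @ preprocess_item(x))
print((q / np.linalg.norm(q)) @ x)
```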

Convergence of DFT eigenvalues with cell volume and vacuum level

Sohrab Ismail-Beigi. October 4, 2013. Computing work functions or absolute DFT eigenvalues (e.g. ionization potentials) requires some care. Obviously, …

Similarity searching, or how to find your neighbors efficiently

Robert Krauthgamer, Weizmann Institute of Science. CS Research Day for Prospective Students, May 1, 2009. Background: Geometric spaces and techniques …

Wire antenna model of the vertical grounding electrode

Boundary Elements and Other Mesh Reduction Methods XXXV, 13. D. Poljak & S. Sesnic, University of Split, FESB, Split, Croatia. Abstract: A straight wire antenna …

The University of Texas at Austin, Department of Electrical and Computer Engineering. EE381V: Large Scale Learning, Spring 2013

Assignment 1, Caramanis/Sanghavi. Due: Thursday, Feb. 7, 2013. (Problems 1 and …

Three Spreadsheet Models Of A Simple Pendulum

Spreadsheets in Education (eJSiE), Volume 3, Issue 1, Article 5, 10-31-2008. Jan Benacka, Constantine the Philosopher University, Nitra, Slovakia, jbenacka@ukf.sk …

On Enabling Attribute-Based Encryption to Be Traceable against Traitors

Zhen Liu (Shanghai Jiao Tong University, China, liuzhen@sjtu.edu.cn) and Duncan S. Wong (CryptoBLK, duncanwong@cryptoblk.io). Abstract: …

Computer Vision Lecture 3

Demo: Haribo Classification. Computer Vision Lecture 3: Linear Filters. Bastian Leibe, RWTH Aachen, http://www.vision.rwth-aachen.de, leibe@vision.rwth-aachen.de. Code available on the class website …

(a) 1 m s⁻² (b) 2 m s⁻² (c) zero (d) −1 m s⁻²

11th Physics - Unit 2 Kinematics: Solutions for the Textbook Problems. One mark: 1. Which one of the following Cartesian coordinate systems is not followed in physics? 5. If a particle has negative velocity …

Decoupled Collaborative Ranking

Jun Hu, Ping Li. WWW2017, April 24, 2017. Recommender Systems: a recommendation system is an information filtering technique, which provides …

MULTIDIMENSIONAL COUPLED PHOTON-ELECTRON TRANSPORT SIMULATIONS USING NEUTRAL PARTICLE S_N CODES

Computational Medical Physics Working Group Workshop II, Sep 30 - Oct 3, 2007, University of Florida (UF), Gainesville, Florida, USA, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2007). …

MODELING A METAL HYDRIDE HYDROGEN STORAGE SYSTEM

Apurba Sakti. EGEE 520, Mathematical Modeling of EGEE systems, Spring 2007. Table of Contents: Abstract, Introduction, Governing Equations, Energy balance, Momentum …

Analysis of visibility level in road lighting using image processing techniques

Scientific Research and Essays, Vol. 5 (18), pp. 2779-2785, 18 September, 2010. Available online at http://www.academicjournals.org/sre. ISSN 1992-2248. 2010 Academic Journals. Full Length Research Paper. …

IT6502 DIGITAL SIGNAL PROCESSING. Unit I - SIGNALS AND SYSTEMS: Basic elements of DSP; concepts of frequency in analog and digital signals; sampling theorem; discrete-time signals and systems; analysis of discrete-time …

arXiv:cs/ v1 [cs.IT] 4 Feb 2007

High-rate, Multi-Symbol-Decodable STBCs from Clifford Algebras. Sanjay Karmakar, Beceem Communications Pvt Ltd, Bangalore, skarmakar@beceem.com; B. Sundar Rajan, Department of ECE, Indian Institute of Science, Bangalore …

Solution sheet 9

Advanced Finite Elements MA5337 - WS17/18, Solution sheet 9. On this final exercise sheet about variational inequalities we deal with nonlinear complementarity functions as a way to rewrite a variational …

6.854 Advanced Algorithms

Homework Solutions: Hashing Bashing. Solution: 1. O(log U) for the first level and for each of the O(n) second-level functions, giving a total of O(n log U). 2. Suppose we are using …

5. Network Analysis. 5.1 Introduction

With the continued growth of this country as it enters the next century comes the inevitable increase in the number of vehicles trying to use the already overtaxed transportation …

Emotional Optimized Design of Electro-hydraulic Actuators

Sensors & Transducers, Vol. 77, Issue 8, August, pp. 9-9. Sensors & Transducers by IFSA Publishing, S. L., http://www.sensorsportal.com. Shi Boqian, …

Adjustment of Sampling Locations in Rail-Geometry Datasets: Using Dynamic Programming and Nonlinear Filtering

Systems and Computers in Japan, Vol. 37, No. 1, 2006. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J87-D-II, No. 6, June 2004, pp. 1199-1207. …

11. Excitons in Nanowires and Nanotubes

We have seen that confinement in quantum wells leads to enhanced excitonic effects in the optical response of semiconductors. The binding energy of the strongest bound excitons …

Interpreting Deep Classifiers

Ruprecht-Karls-University Heidelberg, Faculty of Mathematics and Computer Science. Seminar: Explainable Machine Learning. Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge. Author: Daniela …

Parameterization and Vector Fields

17.1 Parameterized Curves. Curves in 2- and 3-space can be represented by parametric equations. Parametric equations have the form x = x(t), y = y(t) in the plane and x = x(t), …

Phase Diagrams: construction and comparative statics

November 13, 2015. Alecos Papadopoulos, PhD Candidate, Department of Economics, Athens University of Economics and Business. papadopalex@aueb.gr, https://alecospapadopoulos.wordpress.com …

Teams to exploit spatial locality among agents

James Parker and Maria Gini. Department of Computer Science and Engineering, University of Minnesota. Email: [jparker,gini]@cs.umn.edu. Abstract: In many situations, …

Renormalization Group Theory

Chapter 16. In the previous chapter a procedure was developed where higher-order 2^n cycles were related to lower-order cycles through a functional composition and rescaling procedure. …

Hash-based Indexing: Application, Impact, and Realization Alternatives

Benno Stein and Martin Potthast. Bauhaus University Weimar, Web-Technology and Information Systems. Text-based Information Retrieval (TIR). Motivation: Consider …

What types of isometric transformations are we talking about here? One common case is a translation by a displacement vector R.

1. Planar Defects. Planar defects: orientation and types. Crystalline films often contain internal, 2-D interfaces separating two regions transformed with respect to one another, but with, otherwise, essentially, …

Journal of Inequalities in Pure and Applied Mathematics

L'HOSPITAL TYPE RULES FOR OSCILLATION, WITH APPLICATIONS. Iosif Pinelis, Department of Mathematical Sciences, Michigan Technological University, Houghton, …

Ground Rules. PC1221 Fundamentals of Physics I. Position and Displacement. Average Velocity. Lectures 7 and 8: Motion in Two Dimensions

PC1221 Fundamentals of Physics I, Lectures 7 and 8: Motion in Two Dimensions. Dr Tay Seng Chuan. Ground Rules: switch off your handphone and pager; switch off your laptop computer and keep it; no talking while lecture …

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai, 12/4/18

Overfitting: overfitting is fitting the training data more than is warranted, i.e. fitting noise rather than signal. Estimating …

Locality-Sensitive Hashing for Chi2 Distance

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XXXX, NO. XXX, JUNE 2010. David Gorisse, Matthieu Cord, and Frederic Precioso. Abstract: In …

Advanced Methods Development for Equilibrium Cycle Calculations of the RBWR

Andrew Hall, 11/7/2013. Outline: RBWR Motivation and Design; Why use Serpent Cross Sections?; Modeling the RBWR; Axial Discontinuity …

PROJECTILE MOTION: Velocity

We seek to explore the velocity of the projectile, including its final value as it hits the ground, or a target above the ground. The angle made by the velocity vector with the local …

Lecture 8: Pseudorandom Generators (II). 1 Pseudorandom Generators for Bounded Computation

Expander Graphs in Computer Science, WS 2010/2011. Lecture 8: Pseudorandom Generators (II). Lecturer: He Sun. Definition 8.1: Let M be a randomized TM that …

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4, Fall 2017

Momentum for Principal Component Analysis, CS6787 Lecture 3.1, Fall 2017. Principal Component Analysis. Setting: find the …

5 Shallow water Q-G theory

So far we have discussed the fact that large-scale motions in the extra-tropical atmosphere are close to geostrophic balance, i.e. the Rossby number is small. We have examined …

11 Free vibrations: one degree of freedom

11.1 A uniform rigid disk of radius r and mass m rolls without slipping inside a circular track of radius R, as shown in the figure. The centroidal moment of inertia …

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS)

This part is to a large extent based on slides obtained from http://www.mmds.org. Distance Measures: For finding similar documents, we consider the Jaccard …
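
A one-function sketch of the Jaccard measure the slide set introduces; the document shingle sets below are our own toy example.

```python
def jaccard(A, B):
    """Jaccard similarity of two sets: |A intersect B| / |A union B|."""
    return len(A & B) / len(A | B)

doc1 = {"the", "cat", "sat"}
doc2 = {"the", "cat", "ran"}
print(jaccard(doc1, doc2))  # 2 shared / 4 total = 0.5
```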

HEAT TRANSFER IN EXHAUST SYSTEM OF A COLD START ENGINE AT LOW ENVIRONMENTAL TEMPERATURE

THERMAL SCIENCE: Vol. 14 (2010), Suppl., pp. S219-S232. By Snežana D. Petković, Radivoje B. Pešić, and Jovanka …

From Loop Groups To 2-Groups

John C. Baez. Joint work with: Aaron Lauda, Alissa Crans, Danny Stevenson & Urs Schreiber. More details at: http://math.ucr.edu/home/baez/2group/. Higher Gauge Theory …

FLUID flow in a slightly inclined rectangular open …

Numerical Solution of de St. Venant Equations with Controlled Global Boundaries Between Unsteady Subcritical States. Aldrin P. Mendoza, Adrian Roy L. Valdez, and Carlene P. Arceo. Abstract: This paper aims …

Levenberg-Marquardt-based OBS Algorithm using Adaptive Pruning Interval for System Identification with Dynamic Neural Networks

Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics, San Antonio, TX, USA, October 2009. …

Deterministic Algorithm Computing All Generators: Application in Cryptographic Systems Design

Int. J. Communications, Network and System Sciences, 0, 5, 75-79. http://dx.doi.org/0.436/ijcns.0.5074. Published Online November 0 (http://www.scirp.org/journal/ijcns). …

Stochastic simulations of genetic switch systems

Adiel Loinger, Azi Lipshtat, Nathalie Q. Balaban, and Ofer Biham. Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel; Department …