A Randomized Approximate Nearest Neighbors Algorithm - a short version


We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x_j} in R^d, the algorithm attempts to find k nearest neighbors for each of x_j, where k is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k²·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A byproduct of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x_j} for an arbitrary point x ∈ R^d. The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x_j}, and illustrate its performance via several numerical examples.

A Randomized Approximate Nearest Neighbors Algorithm - a short version

Peter W. Jones, Andrei Osipov, Vladimir Rokhlin

Research Report YALEU/DCS/TR-1439, Yale University, January 13, 2011

This author's research was supported in part by the DMS grant #… and the ONR grants #N…, #N…; this author's research was supported in part by the AFOSR grant #FA…; this author's research was supported in part by the ONR grant #N… and the AFOSR grant #FA….
Approved for public release: distribution is unlimited.
Keywords: Approximate nearest neighbors, randomized algorithms, fast random rotations

1 Introduction

In this paper, we describe an algorithm for finding approximate nearest neighbors (ANN) in d-dimensional Euclidean space for each of N user-specified points {x_j}. For each point x_j, the scheme produces a list of k "suspects", that have high probability of being the k closest points (nearest neighbors) in the Euclidean metric. Those of the suspects that are not among the true nearest neighbors are close to being so. We present several measures of performance (in terms of statistics of the k chosen suspected nearest neighbors), for different types of randomly generated data sets consisting of N points in R^d.

Unlike other ANN algorithms that have been recently proposed (see e.g. [1]), the method of this paper does not use locality-sensitive hashing. Instead we use a simple randomized divide-and-conquer approach. The basic algorithm is iterated several times, and then followed by a local graph search.

The performance of any fast ANN algorithm must deteriorate as the dimension d increases. While the running time of our algorithm only grows as d·log d, the statistics of the selected approximate nearest neighbors deteriorate as the dimension increases. We provide bounds for this deterioration (both analytically and empirically), which occurs reasonably slowly as d increases. While the actual estimates are fairly complicated, it is reasonable to say that in 20 dimensions the scheme performs extremely well, and the performance does not seriously deteriorate until d is approximately 60. At d = 100, the degradation of the statistics displayed by the algorithm is quite noticeable.

An outline of our algorithm is as follows (a minimal code sketch of Steps 1-5 follows below):
1. Choose a random rotation, acting on R^d, and rotate the N given points.
2. Take the first coordinate, and divide the data set into two boxes, where the boxes are divided by finding the median in the first coordinate.
3. On each box from Step 2, we repeat the subdivision on the second coordinate, obtaining four boxes in total.
4. We repeat this on coordinates 3, 4, etc., until each of the boxes has approximately k points.
5. We do a local search on the tree of boxes to obtain approximately k suspects, for each point x_j.
6. The above procedure is iterated T times, and for each point x_j, we select from the T·k suspects the k closest discovered points for x_j.
7. Perform a local graph search on the collections of suspects, obtained in Step 6 (we call this local graph search "supercharging"). Among the k² candidates obtained from the local graph search, we select the best k points and declare these the suspected approximate nearest neighbors, or "suspects".

The data structure generated by this algorithm allows one to find, for a new data point y, the k suspected approximate nearest neighbors in the original dataset. This search is quite rapid, as we need only follow the already generated tree structure of the boxes, obtained in the steps listed above.
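To make the outline concrete, the following is a minimal Python/NumPy sketch of Steps 1-5 (a single iteration). It is an illustration under simplifying assumptions rather than the implementation of this paper: the fast pseudorandom rotation of Section 2.2 is replaced by a dense QR-based rotation, each point is scanned only against its own box rather than the full neighborhood V_i of Section 3.2, and the function name one_iteration is hypothetical.

    import numpy as np

    def one_iteration(points, k, rng):
        n, d = points.shape
        x = points - points.mean(axis=0)               # center the points
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        x = x @ q                                      # Step 1: random rotation (dense stand-in)
        boxes, coord = [np.arange(n)], 0
        while len(boxes[0]) > 2 * k:                   # Steps 2-4: median subdivision
            split = []
            for idx in boxes:
                order = idx[np.argsort(x[idx, coord])]
                split += [order[:len(order) // 2], order[len(order) // 2:]]
            boxes, coord = split, (coord + 1) % d
        suspects = {}
        for idx in boxes:                              # Step 5: direct scan inside each box
            d2 = ((x[idx][:, None, :] - x[idx][None, :, :]) ** 2).sum(-1)
            for row, i in enumerate(idx):
                suspects[i] = idx[np.argsort(d2[row])[1:k + 1]]
        return suspects

For instance, one_iteration(x, 30, np.random.default_rng(0)) returns, for each index i, the indices of the closest points discovered inside the box of x_i; Steps 6-7 then merge T such lists and supercharge them.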

One can easily see that the depth of the binary tree, generated by Steps 1 through 4, is log_2(N/k). This means that we can use the T trees generated, and then pass to Step 7.

Almost all known techniques for solving ANN problems use tree structures (see e.g. [1], [2], [3]). Two apparently novel features of our method are the use of fast random rotations (Step 1), and the local graph search (Step 7), which dramatically increases the accuracy of the scheme. We use the Fast Fourier Transform to generate our random rotations, and this accounts for the factor of log d that appears in the running time. Our use of random rotations replaces the usual projection argument used in other ANN algorithms, where one projects the data on a random subspace. As far as we know, the use of fast rotations for applications of this type appears first in [3] (see [4] and the references therein for a brief history). The use of random rotations (as in our paper) or random projections (as used elsewhere in ANN algorithms) takes advantage of the same underlying phenomenon, namely the Johnson-Lindenstrauss Lemma. (The JL Lemma roughly states that projection of N points on a random subspace of dimension C(ε)·(log N) has expected distortion 1 + ε, see e.g. [5].) We have chosen to use random rotations in place of the usual random projections generated by selecting random Gaussian vectors. The fast random rotations require O(d·(log d)) operations, which is an improvement over methods using random projections (see [6], [7]).

The N × k lookup table arising in Step 7 is the adjacency matrix of a graph whose vertices are the points {x_j}. In Step 7 we perform a depth one search on this graph, and obtain k + k² candidates (of whom we select the "suspects"). This accounts for the factor of k² in the running time. Due to degradation of the running time, we have chosen not to perform searches of depth greater than one.

The algorithm has been tested on a number of artificially generated point distributions. Results of some of those tests are presented below.

The paper is organized as follows. In the first section, we summarize the mathematical and numerical facts to be used in subsequent sections. In the second section, we describe the Randomized Approximate Nearest Neighbors algorithm (RANN) and analyze its cost and performance. In the third section, we illustrate the performance of the algorithm with several numerical examples.

2 Mathematical Preliminaries

In this section, we introduce notation and summarize several facts to be used in the rest of the paper.

2.1 Degree of contact

Suppose that L > 0 is a positive integer. Suppose further that

σ = σ_1 … σ_L,  µ = µ_1 … µ_L,  σ_i, µ_i ∈ {+, −} (1)

are two words of symbols +, − of length L. We define the degree of contact Con(σ, µ) between σ and µ to be the number of positions at which the corresponding symbols are different. In other words,

Con(σ, µ) = |{ i : 1 ≤ i ≤ L, σ_i ≠ µ_i }|. (2)
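Since the degree of contact (2) is simply the Hamming distance between the two words, it admits a two-line implementation; below is a small Python sketch, with words represented as strings over '+' and '-' (the name con is chosen here for illustration).

    def con(sigma, mu):
        # number of positions at which the corresponding symbols differ, as in (2)
        return sum(1 for s, m in zip(sigma, mu) if s != m)

    assert con("++-", "+-+") == 2    # the two words differ at the last two positions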

In a mild abuse of notation, we say that two disjoint sets A_σ and A_µ (or their elements) have degree of contact m if Con(σ, µ) = m. For example, x and y have degree of contact 1 if x ∈ A_σ, y ∈ A_µ and σ, µ differ at precisely one symbol.

2.2 Pseudorandom orthogonal transformations

In this section, we describe a fast method (presented in [6], [7]) for the generation of random orthogonal transformations and their application to arbitrary vectors. Suppose that d, M_1, M_2 > 0 are positive integers. We define a pseudorandom d-dimensional orthogonal transformation Θ as a composition of M_1 + M_2 + 1 linear operators

Θ = ( ∏_{j=1}^{M_1} P^{(j)} Q^{(j)} ) · F^{(d)} · ( ∏_{j=M_1+1}^{M_1+M_2} Q^{(j)} P^{(j)} ). (3)

The linear operators P^{(j)} : R^d → R^d with j = 1,…,M_1+M_2 are defined in the following manner. We generate permutations π_1,…,π_{M_1+M_2} of the numbers {1,…,d}, uniformly at random and independent of each other. Then for all x ∈ R^d, we define P^{(j)} x by the formula

(P^{(j)} x)(i) = x(π_j(i)),  i = 1,…,d. (4)

In other words, P^{(j)} permutes the coordinates of the vector x according to π_j. P^{(j)} can be represented by a d × d matrix P_j, defined by the formula

P_j(k, l) = 1 if l = π_j(k), and P_j(k, l) = 0 if l ≠ π_j(k), (5)

for k, l = 1,…,d. Obviously, the operators P^{(j)} are orthogonal.

The linear operators Q^{(j)} : R^d → R^d with j = 1,…,M_1+M_2 are defined as follows. We construct (d−1)·(M_1+M_2) independent pseudorandom numbers θ_j(1),…,θ_j(d−1) with j = 1,…,M_1+M_2, uniformly distributed in (0, 2π). Then we define the auxiliary linear operator Q_{j,k} : R^d → R^d for k = 1,…,d−1 to be the planar rotation of the coordinates k and k+1 by the angle θ_j(k). In other words, for all x ∈ R^d,

( (Q_{j,k}(x))(k), (Q_{j,k}(x))(k+1) )^T = [ cos(θ_j(k)), sin(θ_j(k)); −sin(θ_j(k)), cos(θ_j(k)) ] · ( x(k), x(k+1) )^T, (6)

and the rest of the coordinates of Q_{j,k}(x) coincide with those of x. We define Q^{(j)} by the formula

Q^{(j)} = Q_{j,d−1} ∘ Q_{j,d−2} ∘ ⋯ ∘ Q_{j,1}. (7)

Obviously, the operators Q^{(j)} are orthogonal.

The linear operator F^{(d)} : R^d → R^d is defined as follows. First suppose that d is even and that d_2 = d/2. We define the d_2 × d_2 discrete Fourier transform matrix T by the formula

T(k, l) = (1/√d_2) · exp[ −2πi·(k−1)(l−1)/d_2 ], (8)

where k, l = 1,…,d_2 and i = √−1. The matrix T represents a unitary operator C^{d_2} → C^{d_2}. We then define the one-to-one linear operator Z : R^d → C^{d_2} by the formula

Zx = ( x(1) + i·x(2), x(3) + i·x(4), …, x(2d_2 − 1) + i·x(2d_2) )^T (9)

for all x ∈ R^d. Eventually, we define F^{(d)} by the formula

F^{(d)} = Z^{−1} ∘ T ∘ Z (10)

for even d. If d is odd, we define F^{(d)} x for all x ∈ R^d by applying F^{(d−1)} to the first d−1 coordinates of x and leaving its last coordinate unchanged. Obviously, the operators T, Z, defined by (8), (9), respectively, preserve the norm of any vector x ∈ R^d. Therefore, F^{(d)} is a real orthogonal transformation R^d → R^d.

The cost of the generation of a random permutation (see e.g. [8]) is O(d) operations. The cost of the application of each P^{(j)} to a vector x ∈ R^d is obviously d operations due to (4). The cost of generation of d−1 uniform random variables is O(d) operations. Also, the cost of application of each Q^{(j)} to a vector x ∈ R^d is O(d) operations due to (7). Finally, the cost of the fast discrete Fourier transform is O(d·log d) operations, and the cost of the application of F^{(d)} to a vector x ∈ R^d is O(d·log d) operations due to (8), (9) and (10). Thus the cost of the generation of Θ defined via (3) is

Cost(Θ) = O( d·(M_1 + M_2 + log d) ). (11)

Moreover, the cost of application of Θ to a vector x ∈ R^d is also given by the formula (11).

Remark 1. It was observed that if M_1 and M_2 in (3) are chosen such that M_1 + M_2 is of the order log d, then the distribution of Θ is close to the uniform distribution on the group O(d, R) of orthogonal transformations from R^d to R^d.

Remark 2. The use of the Hadamard matrix (without 2 × 2 rotations) appears in a related problem studied by Ailon and Liberty [9].
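The construction of this section can be summarized by the following Python/NumPy sketch. It is a schematic rendering under stated assumptions, not a tuned implementation: d is assumed even (the odd case would fix the last coordinate, as described above), the defaults M1 = M2 = 1 are for brevity only (Remark 1 suggests M1 + M2 of the order log d), and a fresh transformation is drawn on every call instead of being stored for reuse.

    import numpy as np

    def apply_theta(x, rng, m1=1, m2=1):
        # applies a pseudorandom orthogonal transformation of the form (3) to x
        d = x.shape[0]
        assert d % 2 == 0
        def permute(v):                        # P^(j): coordinate permutation, (4)
            return v[rng.permutation(d)]
        def givens_sweep(v):                   # Q^(j): d-1 planar rotations, (6)-(7)
            v = v.copy()
            theta = rng.uniform(0.0, 2.0 * np.pi, d - 1)
            for k in range(d - 1):
                c, s = np.cos(theta[k]), np.sin(theta[k])
                v[k], v[k + 1] = c * v[k] + s * v[k + 1], -s * v[k] + c * v[k + 1]
            return v
        def fourier(v):                        # F^(d) = Z^(-1) T Z, (8)-(10)
            z = v[0::2] + 1j * v[1::2]         # Z: pack into C^(d/2), (9)
            z = np.fft.fft(z) / np.sqrt(d // 2)    # T: normalized DFT, (8)
            out = np.empty(d)
            out[0::2], out[1::2] = z.real, z.imag
            return out
        for _ in range(m1):
            x = givens_sweep(permute(x))
        x = fourier(x)
        for _ in range(m2):
            x = permute(givens_sweep(x))
        return x

Since every factor is orthogonal, np.linalg.norm(apply_theta(v, rng)) agrees with np.linalg.norm(v) up to rounding, which is a convenient sanity check.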

3 Randomized Approximate Nearest Neighbors algorithm

In this section, we describe the Nearest Neighbor Problem and present a fast randomized algorithm for its solution.

3.1 The Nearest Neighbor Problem

Suppose that d and k < N are positive integers and suppose that

B = {x_1, x_2,…,x_N} ⊆ R^d (12)

is a collection of N points in R^d. We are interested in finding the k nearest neighbors of each point x_i. The distance between the points is the standard Euclidean distance. For each x_i, one can compute in a straightforward manner the distances to the rest of the points and thus find the nearest neighbors. However, the total cost of the evaluation of the distances alone is O(d·N²), which makes this naive approach prohibitively expensive when N is large. We propose a faster approximate algorithm for the solution of this problem.

3.2 Informal description of the algorithm

3.2.1 Initial selection

The key idea of our algorithm is the following simple (and well known) observation. Suppose that for each x_i we have found a small subset V_i of B such that a point inside V_i is more likely to be among the k nearest neighbors of x_i than a point outside V_i. Then it is reasonable to look for the nearest neighbors of each x_i only inside V_i and not among all the points. The nearest neighbors of x_i in V_i, which can be found by direct scanning, are referred to as its suspected approximate nearest neighbors, or "suspects", as opposed to the true nearest neighbors {x_{t(i,j)}}. Of course, many of the k true nearest neighbors of x_i might not be among its suspects. However, one can re-select V_i to obtain another list of k suspects of x_i. The initial guess is improved by taking the best k points out of the two lists. This scheme is iterated to successively improve the list of suspects of each x_i.

The performance of the resulting iterative randomized algorithm admits the following crude analysis. Suppose that the size of V_i is α·N, with α ≪ 1. Suppose also that the number of the true nearest neighbors of x_i inside V_i is roughly β·k, with α < β < 1. If the choice of V_i is fairly random, then of order O(1/β) iterations of the algorithm are required to find most of the true nearest neighbors of each x_i. Temporarily neglecting the cost of the construction of V_i, this results in O((α/β)·N²) operations instead of the O(N²) operations of the naive algorithm. If α ≪ β, the improvement can be substantial.

Our construction of the V_i's is based on geometric considerations. First, we shift all of the points to place their center of mass at the origin and apply a random orthogonal linear transformation to the resulting collection. Then, we choose a real number y(1) such that the first coordinate of half of the points in B is less than y(1). In other words,

|{x ∈ B : x(1) < y(1)}| = N/2. (13)

We divide all the points into two disjoint sets

B_− = {x ∈ B : x(1) < y(1)},  B_+ = {x ∈ B : x(1) ≥ y(1)}. (14)

Obviously, the sizes of B_− and B_+ are the same if N is even and differ by one if N is odd. Next, we set y_+(2) to be a real number such that the second coordinate of half of the points in B_+ is less than y_+(2), i.e.

|{x ∈ B_+ : x(2) < y_+(2)}| = N/4. (15)

We split B_+ into two disjoint sets B_{+−} and B_{++} by the same principle, e.g.

B_{+−} = {x ∈ B_+ : x(2) < y_+(2)},  B_{++} = {x ∈ B_+ : x(2) ≥ y_+(2)}. (16)

We construct B_{−−} and B_{−+} in a similar fashion by using a real number y_−(2) such that the second coordinate of half of the points in B_− is less than y_−(2), i.e.

|{x ∈ B_− : x(2) < y_−(2)}| = N/4. (17)

Each of the four boxes B_{−−}, B_{−+}, B_{+−}, B_{++} contains either N/4 or N/4 + 1 points. Then we repeat the subdivision by splitting each of the four boxes into two by using the third coordinate, and so on. We proceed until we end up with a collection of 2^L boxes {B_σ} with k or k + 1 points in each box. Here the box index σ is a word of symbols +, − of length L as in (1), and L is a positive integer defined via the inequality

k·2^L ≤ N < k·2^{L+1}. (18)

Obviously, the sets {B_µ} constitute a complete binary tree, whose nodes are indexed by words µ of symbols +, − of length up to L. The set B is at the root of this tree, the sets B_− and B_+ are at the second level, and so on. At the last level of the tree, there are 2^L boxes. The depth of the tree equals L + 1.

The notion of degree of contact (2) extends to the collection {B_σ} of boxes. Suppose that x_i is in B_σ. Obviously, the higher the degree of contact of two boxes B_σ and B_µ is, the less likely a point of B_µ will be among the k nearest neighbors of x_i. Motivated by this observation, we define the set V_i as

V_i = { x ∈ B_µ : Con(σ, µ) ≤ 1 }. (19)

In other words, V_i is the union of the box B_σ containing x_i and the L boxes whose degree of contact with B_σ is one. Thus for each i = 1,…,N, the set V_i contains about k·(L + 1) points.
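A minimal Python/NumPy sketch of the subdivision and of the neighborhood V_i of (19) follows. Assumptions: d ≥ L, the coordinate split at level l is coordinate l itself, box labels are strings over '+' and '-' as in Section 2.1, and the helper names build_boxes and neighborhood are illustrative.

    import numpy as np

    def build_boxes(x, L):
        # median subdivision (13)-(17): returns a dictionary mapping each
        # length-L word sigma to the indices of the points of B_sigma
        boxes = {"": np.arange(x.shape[0])}
        for level in range(L):
            new = {}
            for word, idx in boxes.items():
                order = idx[np.argsort(x[idx, level])]
                half = len(order) // 2
                new[word + "-"] = order[:half]
                new[word + "+"] = order[half:]
            boxes = new
        return boxes

    def neighborhood(boxes, sigma):
        # V_i of (19): the box B_sigma together with the L boxes at degree of contact one
        flips = [sigma[:j] + ("-" if sigma[j] == "+" else "+") + sigma[j + 1:]
                 for j in range(len(sigma))]
        return np.concatenate([boxes[sigma]] + [boxes[w] for w in flips])

After boxes = build_boxes(x, L), the call neighborhood(boxes, sigma) returns the indices of the roughly k·(L + 1) points of V_i for any point lying in B_sigma.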

3.2.2 Supercharging

In the previous subsections, we have described an iterative scheme for the selection of suspects (suspected approximate nearest neighbors) for each of the points x_i in B. Suppose now that after T iterations of this scheme, the list x_{s(i,1)},…,x_{s(i,k)} of k suspects of each point x_i has been generated. This list can be improved by a procedure we call "supercharging". The idea of supercharging is based on the following observation. A true nearest neighbor of x_i, missed by the scheme described above, might be among the suspects of one of x_{s(i,1)},…,x_{s(i,k)}. This leads to the following obvious procedure.

For each x_i, we denote by A_i the list of suspects of all of x_{s(i,1)},…,x_{s(i,k)}. A_i contains k² points, with possible repetitions. We compute the square of the distances from x_i to each point in A_i and find the k nearest neighbors x_{t(i,1,A_i)},…,x_{t(i,k,A_i)} of x_i in A_i. Then we declare the (updated) suspects of x_i to be the best k points out of the two lists {x_{s(i,j)}}_{j=1}^{k} and {x_{t(i,j,A_i)}}_{j=1}^{k}. In other words, supercharging is a depth one search on the directed graph whose vertices are the points {x_i} and whose N × k adjacency matrix is the array of suspect indices {s(i, j)}, with i = 1,…,N and j = 1,…,k (a code sketch follows the overview below).

3.2.3 Overview

We conclude this section with a list of the principal steps of the algorithm. Given the collection {x_i}_{i=1}^{N} of points in R^d, we perform the following operations:
1. Subtract from each x_i the center of mass of the collection.
2. Choose a random orthogonal linear transformation Θ and set x_i = Θ(x_i) for all i = 1,…,N.
3. Construct the 2^L boxes {B_σ} as described above.
4. For each x_i, define the set V_i via (19).
5. Update the suspects x_{s(i,1)},…,x_{s(i,k)} of x_i by using its true nearest neighbors in V_i.
6. Steps 2-5 are repeated T times.
7. For each x_i, perform supercharging.
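In code, supercharging is a depth one sweep over the suspect graph. Below is a hedged Python/NumPy sketch in which suspects is an (N, k) integer array of the current suspect indices; duplicate candidates are merged with np.unique rather than with any particular bookkeeping, and a point is assumed never to appear among its own suspects.

    import numpy as np

    def supercharge(points, suspects):
        n, k = suspects.shape
        out = np.empty_like(suspects)
        for i in range(n):
            # candidate list A_i: the current suspects plus the suspects of the suspects
            cand = np.unique(np.concatenate([suspects[i], suspects[suspects[i]].ravel()]))
            cand = cand[cand != i]
            d2 = ((points[cand] - points[i]) ** 2).sum(axis=1)
            out[i] = cand[np.argsort(d2)[:k]]      # best k of the (at most) k + k^2 candidates
        return out

The candidate list has at most k + k² entries per point, which is the source of the factor k² in the running time discussed in Section 3.3.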

3.2.4 Query for a new point

Suppose that we are given a new point y ∈ R^d, and we need to find its k nearest neighbors in B = {x_1,…,x_N}. In this subsection, we describe a rapid procedure to find k approximate nearest neighbors of y. This procedure uses the following information, available on the jth iteration of the algorithm, for j = 1,…,T:
1. The orthogonal linear transformation Θ^{(j)}, generated on the jth iteration of the algorithm.
2. The collection of boxes {B_σ^{(j)}}, generated on the jth iteration of the algorithm.

To find k approximate nearest neighbors of the new point y among the points of B, we perform the following operations. First, we apply Θ^{(1)} to y, where Θ^{(1)} is the orthogonal linear transformation of the first iteration of the algorithm. The resulting vector is denoted by y^{(1)}; in other words,

y^{(1)} = Θ^{(1)}(y). (20)

Next, in the collection of boxes {B_σ^{(1)}}, generated on the first iteration of the algorithm, we find the box B_{σ(1)}^{(1)} that has degree of contact zero with y^{(1)}. Note that if y had belonged to B in the first place, then on the first iteration of the algorithm y^{(1)} would have belonged to B_{σ(1)}^{(1)}. The box B_{σ(1)}^{(1)} has k or k + 1 points. Suppose that V_{σ(1)}^{(1)} is the union of the boxes having degree of contact zero or one with B_{σ(1)}^{(1)} (see (19)). Recall that V_{σ(1)}^{(1)} has about (L + 1)·k points, where L is defined via (18).

We define the set V_{σ(2)}^{(2)} in a similar manner, by using the data of the second iteration of the algorithm. We apply the orthogonal transformation Θ^{(2)} of the second iteration to y^{(1)} to obtain y^{(2)}, i.e.

y^{(2)} = Θ^{(2)}(y^{(1)}) = Θ^{(2)}(Θ^{(1)}(y)), (21)

due to (20). In the boxes {B_σ^{(2)}} of the second iteration, we find the box B_{σ(2)}^{(2)} having degree of contact zero with y^{(2)}. Similar to V_{σ(1)}^{(1)}, the set V_{σ(2)}^{(2)} has about (L + 1)·k points. We repeat this procedure to construct the sets V_{σ(j)}^{(j)} for j = 3, 4,…,T, where T is the number of the iterations of the algorithm. Each V_{σ(j)}^{(j)} contains roughly (L + 1)·k points. Finally, we define the set A to be the union of all the sets V_{σ(j)}^{(j)}; in other words,

A = ⋃_{j=1}^{T} V_{σ(j)}^{(j)}. (22)

The set A contains about T·k·(L + 1) points. The k nearest neighbors of y inside A are declared to be the approximate nearest neighbors of y inside B. We note that to construct A we need to store the corresponding data on each iteration of the algorithm. Once the k suspects of y in B have been found, they can be improved by performing supercharging on y only.
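The query procedure admits the following Python/NumPy sketch. It assumes that, for every iteration j, the rotation matrix, the dictionary of leaf boxes, and the median threshold used at each internal node have been stored; here thresholds[w] denotes the median that split box B_w on coordinate len(w), the centering shift is omitted, the helper neighborhood is the one sketched in Section 3.2.1 above, and all names are illustrative.

    import numpy as np

    def locate(y, thresholds, L):
        # descend the stored tree of boxes: returns the word sigma of the
        # leaf box that has degree of contact zero with y
        sigma = ""
        for level in range(L):
            sigma += "-" if y[level] < thresholds[sigma] else "+"
        return sigma

    def query(y, points, rotations, trees, thresholds, k, L):
        y_rot, cand = y.copy(), []
        for theta, boxes, thr in zip(rotations, trees, thresholds):
            y_rot = theta @ y_rot                      # y^(j), as in (20) and (21)
            cand.append(neighborhood(boxes, locate(y_rot, thr, L)))
        cand = np.unique(np.concatenate(cand))         # the set A of (22)
        d2 = ((points[cand] - y) ** 2).sum(axis=1)     # scan A in the original coordinates
        return cand[np.argsort(d2)[:k]]

Note that the distances in the final scan may be computed in the original coordinates, since each Θ^{(j)} preserves the Euclidean distance.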

3.3 Cost analysis

In this subsection, we analyze the cost of the algorithm in terms of the number of operations. Also, we analyze the memory requirements of the algorithm. We recall that x_1,…,x_N is a collection of N points in R^d and N ≈ k·2^L.
1. Centralizing the points costs O(N·d).
2. Generating a pseudorandom orthogonal transformation and applying it to N points costs O(N·d·(log d)) (see (11)).
3. Constructing the binary tree of boxes of depth L = log_2(N/k) costs O(N·L).
4. The suspects of each point x_i are selected by computing the distances from x_i to the k·(L + 1) points in V_i (see (19)) and finding the k minimal distances. Thus the cost of this step is O(N·L·k·(d + log k)).
5. In supercharging, k² points are scanned to update the suspects of each point. Thus the cost of supercharging is O(N·k²·(d + log k)).

We conclude that the total cost of the algorithm is

C_RANN = O( T·N·(d·(log d) + k·(d + log k)·(log N)) ) + O( N·k²·(d + log k) ), (23)

where T is the number of iterations. We observe that for fixed dimension d and number of required nearest neighbors k, the cost is O(T·N·log N), as opposed to the O(N²) of the naive approach. Also, the cost of supercharging is quadratic in the number of nearest neighbors for fixed dimension d and number of points N, which makes supercharging expensive relative to a single iteration of the principal part of the algorithm even for moderate k.

A query for a new point y consists of performing all the steps of the algorithm on one point only. Consequently, due to (23) the total cost of a query for a new point is

C_query = O( T·(d·(log d) + k·(d + log k)·(log N)) ) + O( k²·(d + log k) ). (24)

To determine the memory requirements of the algorithm, we observe that to store N points in R^d we need O(N·d) memory, to store the indices of the k nearest neighbors of each of the N points we need O(N·k) memory, and to store a tree of boxes (with the corresponding orthogonal transformation) we need O(N) memory.

We must distinguish between two cases. In the first case, given N points {x_i} in R^d, we are interested in finding k nearest neighbors for each x_i only. In other words, no query for a new point will ever be requested. Then the tree of boxes can be destroyed after each iteration. Hence, in this case the algorithm uses O(N·(d + k)) memory. In other words, in this case the memory requirements are minimal, in the sense that most of the memory is spent on the storage of the input and output of the algorithm only. In the second case, we know in advance that queries for new points will be requested. Therefore we need to store T trees of boxes and the corresponding orthogonal transformations. Thus, when queries for new points are allowed, the total memory requirements are of the order

M_query = O( N·(d + k + T) ). (25)

Remark 3. The factor log k in (23), (24) can be omitted if the suspects of every x_i are not required to be sorted according to their distance to x_i.
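Treating the constants implied by the O-terms in (23) as unity gives a quick back-of-the-envelope comparison with the naive O(d·N²) scan. The following toy Python function is only a growth-rate illustration, not a timing model.

    import math

    def rann_cost(T, N, d, k):
        # operation-count estimate (23), with all implied constants set to one
        per_iteration = N * (d * math.log2(d) + k * (d + math.log2(k)) * math.log2(N))
        supercharging = N * k * k * (d + math.log2(k))
        return T * per_iteration + supercharging

    # for the setting of Section 4 (N = 122880, d = 60, k = 30, T = 10), the
    # estimate is more than an order of magnitude below d * N**2:
    print(rann_cost(10, 122880, 60, 30) / (60 * 122880 ** 2))

Doubling N roughly doubles the estimate (up to the log N factor), whereas the naive cost quadruples.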

3.4 Performance analysis

In this subsection, we discuss the performance analysis of the Randomized Approximate Nearest Neighbors algorithm in the case of the standard Gaussian distribution of the points. To be more specific, we consider the collection of N independent standard Gaussian d-dimensional random vectors x_1,…,x_N, where the number of points is given by the formula

N = k·2^L (26)

for some positive integer L > 0. We recall that for each point x_i, the algorithm approximates the k true nearest neighbors {x_{t(i,j)}}_{j=1}^{k} of x_i by the k suspects {x_{s(i,j)}}_{j=1}^{k}. In order to analyze the quality of this approximation, we introduce a number of statistical quantities.

First, we define the average square of the distance from x_i to its k true nearest neighbors by the formula

D_i^true = (1/k) Σ_{j=1}^{k} ‖x_i − x_{t(i,j)}‖². (27)

Next, we define the average square of the distance from x_i to its k suspects by the formula

D_i^susp = (1/k) Σ_{j=1}^{k} ‖x_i − x_{s(i,j)}‖². (28)

Finally, we define the proportion of the true nearest neighbors of x_i among its suspects by the formula

prop_i = (1/k) · |{x_{t(i,j)}}_{j=1}^{k} ∩ {x_{s(i,j)}}_{j=1}^{k}|. (29)

We analyze a single iteration of the algorithm (selection of suspects without supercharging) by evaluating the expected values of (27), (28), (29) numerically (see Theorems 1, 2, 3 below). On the other hand, the performance of the algorithm can be studied empirically, by running the algorithm on artificially generated sets of points. In the rest of this subsection, we outline the results of the analysis (see [10] for details).
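The three statistics admit a direct vectorized evaluation; a Python/NumPy sketch follows, where true_nn and susp are (N, k) integer arrays of true-neighbor and suspect indices (the names are chosen here for illustration).

    import numpy as np

    def quality(points, true_nn, susp):
        d_true = ((points[true_nn] - points[:, None, :]) ** 2).sum(-1).mean(1)   # (27)
        d_susp = ((points[susp] - points[:, None, :]) ** 2).sum(-1).mean(1)      # (28)
        prop = np.array([len(np.intersect1d(t, s)) for t, s in zip(true_nn, susp)])
        return d_true, d_susp, prop / susp.shape[1]                              # (29)

By construction D_i^susp ≥ D_i^true, and prop_i = 1 exactly when every true nearest neighbor of x_i has been found.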

We start by introducing some notation. Suppose that d > 0 is a positive integer, and λ, α, β > 0 are positive real numbers. The distributions χ²_d, χ²(d, λ) and B(α, β) are defined by their probability density functions, given respectively via the formulas

f_{χ²_d}(t) = t^{d/2−1}·e^{−t/2} / (2^{d/2}·Γ(d/2)),  t > 0, (30)

f_{χ²(d,λ)}(t) = e^{−λ/2} Σ_{j=0}^{∞} ( (λ/2)^j / j! )·f_{χ²_{d+2j}}(t),  t > 0, (31)

f_{B(α,β)}(t) = ( Γ(α + β) / (Γ(α)·Γ(β)) )·t^{α−1}·(1 − t)^{β−1},  0 < t < 1. (32)

Suppose that a > 0 is a positive real number. Suppose also that d ≥ L > 0 are positive integers, and a ∈ R^d is a vector all of whose coordinates are positive. We define the functions h_a^−, h_a^+ : R → C and h_a : R → C via the formulas

h_a^−(x) = exp[ i·a²·x / (1 − 2ix) ] · erfc( i·a·x·√2 / √(1 − 2ix) ) / ( 2·√(1 − 2ix) ), (33)

h_a^+(x) = exp[ i·a²·x / (1 − 2ix) ] / √(1 − 2ix) − h_a^−(x), (34)

h_a(x) = exp[ ( ix / (1 − 2ix) )·Σ_{j=L+1}^{d} a(j)² ] · (1 − 2ix)^{−(d−L)/2} · ( ∏_{j=1}^{L} h_{a(j)}^+(x) + Σ_{j=1}^{L} h_{a(j)}^−(x)·∏_{i=1, i≠j}^{L} h_{a(i)}^+(x) ). (35)

Also, we define the function G_a : (0, ∞) → R via the formula

G_a(y) = ( 2^L / (2π·(L + 1)) ) ∫_0^y ∫_{−∞}^{∞} e^{−ixt}·h_a(x) dx dt. (36)

The following theorem provides an analytical formula for the expectation E[D_N^true] (see (27)).

Theorem 1. Suppose that d, k, N > 0 are positive integers. Suppose further that D_N^true is defined by (27). Then its expectation is given by the formula

E[D_N^true] = (1/k) Σ_{i=1}^{k} ∫_0^∞ ∫_0^1 F^{−1}_{χ²(d,λ)}(t)·f_{B(i,N−i)}(t) dt f_{χ²_d}(λ) dλ, (37)

where the functions f_{χ²_d}, f_{B(i,N−i)} are defined respectively by (30), (32), and F^{−1}_{χ²(d,λ)} is the inverse of the cdf of χ²(d, λ) (see (31)).
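Theorem 1 lends itself to a simple sanity check by direct simulation, since D_N^true is an average of order statistics of noncentral chi-square variables. The following short Monte Carlo sketch (Python/NumPy) is an illustration of the quantity being computed, not the numerical evaluation of (37) used in Section 4; the function name is hypothetical.

    import numpy as np

    def d_true_monte_carlo(d, N, k, trials, rng):
        # draws standard Gaussian point sets and averages (27) over the first point
        acc = 0.0
        for _ in range(trials):
            x = rng.standard_normal((N, d))
            d2 = np.sort(((x[1:] - x[0]) ** 2).sum(axis=1))
            acc += d2[:k].mean()
        return acc / trials

    print(d_true_monte_carlo(20, 1024, 30, 100, np.random.default_rng(0)))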

The following theorem provides an approximation to the expectation E[D_N^susp] (see (28)). The error of this approximation is verified via numerical experiments (see [10] for more details).

Theorem 2. Suppose that d, k, L > 0 are positive integers. We define the real number D_appr^susp by the formula

D_appr^susp = ∫_0^∞ Avg_{a ∈ S_d^+(λ)} [ (1/k) Σ_{i=1}^{k} G_a^{−1}( i / (k·L + k + 1) ) ] f_{χ²_d}(λ) dλ, (38)

where the function f_{χ²_d} is defined via (30), the function G_a^{−1} is the inverse of G_a defined via (36), and the set S_d^+(λ) is the positive part of the d-dimensional hypersphere of radius √λ, defined by the formula

S_d^+(λ) = { x ∈ R^d : ‖x‖² = λ, x(j) > 0 for j = 1,…,d }, (39)

and the average (Avg) is taken with respect to the (d−1)-dimensional area. Then,

E[D_N^susp] = D_appr^susp + O(k^{−1/2}),  k → ∞. (40)

In other words, (40) holds if we fix d, L and let k → ∞.

The following theorem provides an approximation to the expectation E[prop_N] (see (29)). The error of this approximation is verified via numerical experiments (see [10] for more details).

Theorem 3. Suppose that d, k, L > 0 are positive integers. We define the real number P_appr by the formula

P_appr = (L + 1) ∫_0^∞ Avg_{a ∈ S_d^+(λ)} [ G_a( F^{−1}_{χ²(d,λ)}(2^{−L}) ) ] f_{χ²_d}(λ) dλ, (41)

where the function f_{χ²_d} is defined via (30), the function F^{−1}_{χ²(d,λ)} is the inverse of the cdf of χ²(d, λ), defined via (31), the function G_a is defined via (36), the set S_d^+(λ) is defined via (39), and the average (Avg) is taken with respect to the (d−1)-dimensional area. Then,

E[prop_N] = P_appr + O(k^{−1/2}), (42)

where the real random variable prop_N is defined via (29). In other words, (42) holds if we fix d, L and let k → ∞.

4 Numerical Results

This section has two principal purposes. First, we numerically evaluate the expectations of (27), (28), (29) by using Theorems 1, 2, 3 above. Second, we demonstrate the performance of the algorithm empirically, by running it on sets of points generated according to the Gaussian distribution. The choice of uniform or Hamming distributions instead of the Gaussian results in very similar performance (see [10] for results and details). The algorithm has been implemented in FORTRAN (the Lahey 95 Linux version). The numerical experiments have been carried out on a Lenovo laptop computer, with a DualCore 2.53 GHz CPU and 2.9 GB RAM.

Experiment 1. In this experiment, we choose k = 30, and for L = 10, 15, 20, 25 set N via (26). Then, for different values of d between L and 110, we approximate the expectations E[D_N^true], E[D_N^susp], E[prop_N] via the numerical evaluation of (37), (38), (41), respectively. The approximations are denoted by D_num^true, D_num^susp, P_num, respectively, and are accurate to roughly two decimal digits. Also, we compute the ratio

Ratio_true^susp = D_num^susp / D_num^true. (43)

The results of Experiment 1 are shown in Figure 1. In Figure 1 (left), we plot Ratio_true^susp as a function of the dimensionality d, for each value of L. In Figure 1 (right), we plot P_num as a function of the dimensionality d, for each value of L (the number of points N is determined via (26)). This quantity estimates the average proportion of the suspects among the true nearest neighbors (found by one iteration of RANN without supercharging).

Several observations can be made from Experiment 1 (see Figure 1) and from more detailed experiments by the authors.
1. The ratio Ratio_true^susp, defined via (43), slowly increases with the number of points N, for fixed values of the number k of nearest neighbors and the dimensionality d (see Figure 1, left). In other words, the performance of RANN deteriorates with the number of points, in terms of Ratio_true^susp. However, as the number of points is multiplied by 2, this ratio grows by a factor of less than 1.1 for most values of d in Figure 1 (left).
2. The ratio Ratio_true^susp actually decreases with the dimensionality, for fixed k and N (see Figure 1, left). For example, for N ≈ 10⁶, this ratio decays from a value of about 1.57 at d = 20 to roughly 1.3 at d = 110. This observation is not surprising, since the number of points within the same fixed relative distance (e.g. twice the distance to the true nearest neighbors) grows with the dimensionality.
3. The proportion of true nearest neighbors among the suspects decays with d for fixed k, N, as expected (see Figure 1, right). However, even for d = 40 and N ≈ 10⁶, RANN correctly finds about 2.7% of the true nearest neighbors on merely one iteration without supercharging. In other words, after 50 iterations without supercharging RANN will correctly detect about

1 − (1 − 0.027)^{50} ≈ 75% (44)

of the true nearest neighbors, for d = 40 and N ≈ 10⁶ (see also Experiment 2 below, and the one-line check following this list).
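The estimate (44) is the complement of a geometric decay, under the assumption that successive iterations find true nearest neighbors independently with probability p ≈ 0.027; a one-line check in Python:

    p, T = 0.027, 50
    print(1 - (1 - p) ** T)    # about 0.75, matching (44)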

Experiment 2. In this experiment, we run RANN with various parameters on different sets of points and report the resulting statistics. We choose the dimensionality d and the number of points N. Then we choose the number of nearest neighbors k, the number of iterations of the algorithm T > 0, and whether to perform the supercharging or not. Next, we generate N i.i.d. standard Gaussian vectors x_1,…,x_N in R^d. We run RANN on x_1,…,x_N. For each x_i, RANN finds its k suspects x_{s(i,1)},…,x_{s(i,k)}. Also, for all i = 1,…,2000, we find the list x_{t(i,1)},…,x_{t(i,k)} of the k true nearest neighbors of x_i, by direct scanning. Then, we compute the quantities D_i^true, D_i^susp, prop_i, defined via (27), (28), (29), respectively, for i = 1,…,2000. We define the average D_algo^true by the formula

D_algo^true = (1/2000) Σ_{i=1}^{2000} D_i^true. (45)

The averages D_algo^susp, prop_algo are defined and computed by the same token.

Generation of the points and invocation of RANN are repeated 20 times, to obtain the values D_algo^true(1),…,D_algo^true(20), all defined via (45). Then, we define the sample mean of D_algo^true via the formula

E_smpl[D^true] = (1/20) Σ_{i=1}^{20} D_algo^true(i). (46)

The sample means E_smpl[D^susp] and E_smpl[prop_algo] are defined in a similar way. Finally, we compute the ratio

ratio_smpl = E_smpl[D^susp] / E_smpl[D^true]. (47)

The parameters for Experiment 2 were chosen in the following way. The number of points was N = 122,880. The number of requested nearest neighbors was k = 15, 30 or 60. The dimensionality d was chosen to be a multiple of 5 between 15 and 200. The number of iterations of the algorithm was either T = 1 or T = 10. The supercharging step was either skipped or performed once after the T iterations. Most of the approximations have been computed with relative error up to 2%.

The results are shown in Figures 2, 3 and in Table 1. In Figures 2, 3 (left), we plot ratio_smpl (see (47)) as a function of the dimensionality d for k = 15 and k = 60, respectively. In Figures 2, 3 (right), we plot E_smpl[prop_algo] as a function of the dimensionality d for k = 15 and k = 60, respectively. Each figure contains four curves, corresponding to one iteration without supercharging, one iteration with supercharging, ten iterations without supercharging, and ten iterations with supercharging.

Table 1 has the following structure. The first column contains the dimensionality d. The next three columns contain the CPU time of 10 iterations of RANN without supercharging, with the requested number of nearest neighbors being k = 15, 30, 60, respectively. The last three columns contain the CPU time of the supercharging only (after 10 iterations of RANN), with the requested number of nearest neighbors being k = 15, 30, 60, respectively. The CPU time is shown in seconds.

Several observations can be made from Table 1 and Figures 2, 3 and from more detailed experiments by the authors. In these observations, we refer to ratio_smpl as the "ratio", and to E_smpl[prop_algo] as the "proportion". We recall that, roughly speaking, the proportion measures how many of the true nearest neighbors have been found by RANN. On the other hand, the ratio measures how much the average distances to the suspects differ from the average distances to the true nearest neighbors.
1. As expected, for a fixed d the performance of RANN improves as the number T of iterations increases, both in terms of ratio and proportion.
2. As expected, for a fixed d the performance of RANN improves if supercharging is performed, both in terms of ratio and proportion.
3. For a fixed d, the performance of RANN improves as the number of requested nearest neighbors k increases, both in terms of ratio and proportion (at the expense of running time).

4. For a fixed d, the effects of supercharging (especially on proportion) increase as k grows. For example, in Figure 2, right (k = 15), we observe that, for T = 10 and d = 60, supercharging increases the proportion from 22% to 32%. On the other hand, in Figure 3, right (k = 60), we observe that, for T = 10 and d = 60, supercharging increases the proportion from 43% to 74%.
5. As expected, the performance of RANN slowly deteriorates in terms of proportion as d increases. On the other hand, there is no significant deterioration of performance in terms of ratio as d increases.
6. For T = 10 the ratio is always below 1.1, with or without supercharging.
7. Even for as high a dimension as d = 60, as few as ten iterations with supercharging correctly determine at least 30% of the true nearest neighbors. Moreover, the error of this detection decays exponentially with the number of iterations T.
8. The running time of RANN, with or without supercharging, grows roughly linearly with the dimensionality, as expected from (23) (see Table 1).
9. For fixed dimensionality d, the running time of RANN without supercharging grows roughly linearly with the requested number of nearest neighbors k, as expected from (23) (see Table 1). For example, 10 iterations of RANN in the case of d = 200 take about 80, 124 and 208 seconds for k = 15, 30, 60, respectively.
10. For fixed dimensionality d, the running time of the supercharging step grows roughly quadratically with the requested number of nearest neighbors k, as expected from (23) (see Table 1). For example, in the case of d = 200 supercharging takes about 15, 57 and 235 seconds for k = 15, 30, 60, respectively.
11. The algorithm has been tested on sets of points having non-Gaussian distributions, e.g. the uniform distribution in the d-dimensional cube [0, 1]^d or the Hamming distribution (i.e. the uniform distribution on the discrete set of the vertices of [0, 1]^d). For both the uniform and the Hamming distributions, the performance of the algorithm was very similar to that in the Gaussian case (see [10] for details).
12. The algorithm has been tested on sets of points having a Gaussian distribution whose covariance matrix is not the identity matrix. These tests seem to indicate that the performance of RANN improves as the condition number of the covariance matrix grows. As an extreme example, if the points belong to a q-dimensional linear subspace of R^d, the performance of the algorithm does not depend on d (though the running time obviously does).

[Table 1: CPU time of RANN, in seconds, for dimensionalities d = 15 through 200. Columns: d; 10 iterations without supercharging (k = 15, 30, 60); supercharging after 10 iterations (k = 15, 30, 60). Number of points: 122,880.]

References

[1] A. Andoni, P. Indyk (2008) Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, Communications of the ACM, 51(1).
[2] S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu (1998) An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions, Journal of the ACM, 45(6).
[3] N. Ailon, B. Chazelle (2009) The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors, SIAM J. Comput., 39(1).
[4] N. Ailon, B. Chazelle (2010) Faster Dimension Reduction, Commun. ACM, 53(2).
[5] W. Johnson, J. Lindenstrauss (1984) Extensions of Lipschitz mappings into a Hilbert space, Contemporary Mathematics, 26.
[6] V. Rokhlin, M. Tygert (2008) A fast randomized algorithm for overdetermined linear least-squares regression, Proc Natl Acad Sci USA, 105(36).
[7] E. Liberty, F. Woolfe, P.G. Martinsson, V. Rokhlin, M. Tygert (2007) Randomized algorithms for the low-rank approximation of matrices, Proc Natl Acad Sci USA, 104.
[8] D. Knuth (1969) Seminumerical Algorithms, vol. 2 of The Art of Computer Programming, Reading, Mass.: Addison-Wesley.
[9] N. Ailon, E. Liberty (2010) Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform, eprint arXiv.
[10] P.W. Jones, A. Osipov, V. Rokhlin (2010) A Randomized Approximate Nearest Neighbors Algorithm, Yale technical report.

[Figure 1: Left: the ratio D^susp / D^true as a function of d, for k = 30. Right: the proportion as a function of d. Number of points: N = … (blue dots), … (red squares), … (black circles), … (green pluses).]

[Figure 2: k = 15, N = 122,880. Left: D^susp / D^true; right: proportion. Without supercharging: one iteration (blue dots), ten iterations (black circles). With supercharging: one iteration (red squares), ten iterations (green pluses).]

[Figure 3: k = 60, N = 122,880. Left: D^susp / D^true; right: proportion. Without supercharging: one iteration (blue dots), ten iterations (black circles). With supercharging: one iteration (red squares), ten iterations (green pluses).]


More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

d-dimensional Arrangement Revisited

d-dimensional Arrangement Revisited -Dimensional Arrangement Revisite Daniel Rotter Jens Vygen Research Institute for Discrete Mathematics University of Bonn Revise version: April 5, 013 Abstract We revisit the -imensional arrangement problem

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

inflow outflow Part I. Regular tasks for MAE598/494 Task 1

inflow outflow Part I. Regular tasks for MAE598/494 Task 1 MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1 Assignment 1 Golstein 1.4 The equations of motion for the rolling isk are special cases of general linear ifferential equations of constraint of the form g i (x 1,..., x n x i = 0. i=1 A constraint conition

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

and from it produce the action integral whose variation we set to zero:

and from it produce the action integral whose variation we set to zero: Lagrange Multipliers Monay, 6 September 01 Sometimes it is convenient to use reunant coorinates, an to effect the variation of the action consistent with the constraints via the metho of Lagrange unetermine

More information

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,

More information

arxiv: v1 [hep-lat] 19 Nov 2013

arxiv: v1 [hep-lat] 19 Nov 2013 HU-EP-13/69 SFB/CPP-13-98 DESY 13-225 Applicability of Quasi-Monte Carlo for lattice systems arxiv:1311.4726v1 [hep-lat] 19 ov 2013, a,b Tobias Hartung, c Karl Jansen, b Hernan Leovey, Anreas Griewank

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

4.2 First Differentiation Rules; Leibniz Notation

4.2 First Differentiation Rules; Leibniz Notation .. FIRST DIFFERENTIATION RULES; LEIBNIZ NOTATION 307. First Differentiation Rules; Leibniz Notation In this section we erive rules which let us quickly compute the erivative function f (x) for any polynomial

More information

Multidimensional Fast Gauss Transforms by Chebyshev Expansions

Multidimensional Fast Gauss Transforms by Chebyshev Expansions Multiimensional Fast Gauss Transforms by Chebyshev Expansions Johannes Tausch an Alexaner Weckiewicz May 9, 009 Abstract A new version of the fast Gauss transform FGT) is introuce which is base on a truncate

More information

Beating CountSketch for Heavy Hitters in Insertion Streams

Beating CountSketch for Heavy Hitters in Insertion Streams Beating CountSketch for eavy itters in Insertion Streams ABSTRACT Vlaimir Braverman Johns opkins University Baltimore, MD, USA vova@cs.jhu.eu Nikita Ivkin Johns opkins University Baltimore, MD, USA nivkin1@jhu.eu

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

Level Construction of Decision Trees in a Partition-based Framework for Classification

Level Construction of Decision Trees in a Partition-based Framework for Classification Level Construction of Decision Trees in a Partition-base Framework for Classification Y.Y. Yao, Y. Zhao an J.T. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canaa S4S

More information

Monte Carlo Methods with Reduced Error

Monte Carlo Methods with Reduced Error Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

Approximate Constraint Satisfaction Requires Large LP Relaxations

Approximate Constraint Satisfaction Requires Large LP Relaxations Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi

More information

arxiv: v1 [math.mg] 10 Apr 2018

arxiv: v1 [math.mg] 10 Apr 2018 ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic

More information

A Unified Theorem on SDP Rank Reduction

A Unified Theorem on SDP Rank Reduction A Unifie heorem on SDP Ran Reuction Anthony Man Cho So, Yinyu Ye, Jiawei Zhang November 9, 006 Abstract We consier the problem of fining a low ran approximate solution to a system of linear equations in

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners Lower Bouns for Local Monotonicity Reconstruction from Transitive-Closure Spanners Arnab Bhattacharyya Elena Grigorescu Mahav Jha Kyomin Jung Sofya Raskhonikova Davi P. Wooruff Abstract Given a irecte

More information

Similarity Measures for Categorical Data A Comparative Study. Technical Report

Similarity Measures for Categorical Data A Comparative Study. Technical Report Similarity Measures for Categorical Data A Comparative Stuy Technical Report Department of Computer Science an Engineering University of Minnesota 4-92 EECS Builing 200 Union Street SE Minneapolis, MN

More information

Optimal Control of Spatially Distributed Systems

Optimal Control of Spatially Distributed Systems Optimal Control of Spatially Distribute Systems Naer Motee an Ali Jababaie Abstract In this paper, we stuy the structural properties of optimal control of spatially istribute systems. Such systems consist

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Well-posedness of hyperbolic Initial Boundary Value Problems

Well-posedness of hyperbolic Initial Boundary Value Problems Well-poseness of hyperbolic Initial Bounary Value Problems Jean-François Coulombel CNRS & Université Lille 1 Laboratoire e mathématiques Paul Painlevé Cité scientifique 59655 VILLENEUVE D ASCQ CEDEX, France

More information

G j dq i + G j. q i. = a jt. and

G j dq i + G j. q i. = a jt. and Lagrange Multipliers Wenesay, 8 September 011 Sometimes it is convenient to use reunant coorinates, an to effect the variation of the action consistent with the constraints via the metho of Lagrange unetermine

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Introduction to variational calculus: Lecture notes 1

Introduction to variational calculus: Lecture notes 1 October 10, 2006 Introuction to variational calculus: Lecture notes 1 Ewin Langmann Mathematical Physics, KTH Physics, AlbaNova, SE-106 91 Stockholm, Sween Abstract I give an informal summary of variational

More information

Quantum Algorithms: Problem Set 1

Quantum Algorithms: Problem Set 1 Quantum Algorithms: Problem Set 1 1. The Bell basis is + = 1 p ( 00i + 11i) = 1 p ( 00i 11i) + = 1 p ( 01i + 10i) = 1 p ( 01i 10i). This is an orthonormal basis for the state space of two qubits. It is

More information