Lecture 8 January 30, 2014

Size: px

Start display at page:

Download "Lecture 8 January 30, 2014"

Angel Poole
5 years ago
Views:

1 MTH 995-3: Intro to CS and Big Data Spring 14 Inst. Mark Ien Lecture 8 January 3, 14 Scribe: Kishavan Bhola 1 Overvie In this lecture, e begin a probablistic method for approximating the Nearest Neighbor problem by ay of a locality sensitive hash function. We compute the to probabilities for the relevant hash function. Problem Given r R +, c > 1, and X := { x 1,..., x P } R D, compute f : [P ] [P ] { 1} such that 1. d( x j, x f(j) ) c r for all j [P ] such that j i [P ] ith d( x j, x i ) r; and. f(j) = 1 if there does not exist j i [P ] ith d( x j, x i ) c r. Remark 1 The above can easily be generalized to arbitrary metric spaces, here d is the metric, but in the folloing, e ill focus on R D ith d being the Euclidean -norm. Remark This is knon as the (c, r) - Nearest Neighbor Problem. Note that (1) and () do not uniquely determine a function, and moreover, it is possible that x f(j) is not the nearest neighbor to x j (see Figure 1). Hence, such an f is an approximation to the standard Nearest Neighbor problem. 3 Naive Solution 1. Compute every pairise distance x j x i, i j. This takes O(P D)-time.. Output the index of the closest point to each x j as f(j). Clearly such an f gives an exact solution to the Nearest-Neighbor problem, and thus also satisfies (1) and () in the Problem statement. We ish to approximate this Naive solution ith better runtime than O(P D). 1

2 Figure 1: Ambiguity of f 4 Idea Project x 1,..., x p onto a random vector or one dimensional subspace and then see ho far their projections are from one another (see Figure ): Runtime of Idea 1. Projecting all times is O(P D)-time (just inner products).. Finding close projected points is equivalent to sorting a list and thus has time-complexity O(P log(p )) (using, for example, merge-sort). Therefore, the total time-complexity is O(P (D + log(p ))). This is an approximation even to our approximated problem, and so e ould like error guarantees. 5 Solution Definition Call a random function h : R D Z a locality sensitive hash function if there is p 1, p (, 1) ith p 1 > p and such that the folloing holds for arbitrary x, y R D, (i) x y < r implies h( x) = h( y) ith probability at least p 1. (ii) if x y > c r, then h( x) = h( y) ith probability at most p.

Figure : Projection onto 1-Dimensional Subspace Remark A locality sensitive hash function h sends points that are close to the same integer and sends far points to different integers (this follos

3 Figure : Projection onto 1-Dimensional Subspace Remark A locality sensitive hash function h sends points that are close to the same integer and sends far points to different integers (this follos from (i) and (ii) and the fact that p 1 > p ). No consider the folloing random function: Pick R +. Then let G N(, I D D ) and U U([, ]). Finally, define h : R D Z as g, x + u h(x) = (*) here U = u and G = g are instances of the random variable and vector defined above. Remark The above notation means that g is a random vector ith independent, identically distributed, mean, and variance 1, Gaussian entries. Similarly, u is a random uniform variable from the closed interval [, ]. Theorem 1. The function h defined by (*) is a locality-sensitive hash function. Proof: Let x, y R D be arbitrary. Define the folloing to events A and B, A : h( x) = h( y) B : g, x y < Note, by the definition of h (*), if A occurs, then B occurs; that is P [B A] = 1. Therefore, Bayes La simplifies: P[A] P[B A] = P[B] P[A B] 3

4 P[A] = P[B] P[A B] (1) Then, by using the variable z := g, x y, and considering all possible values of z for hich event B is true, e may transform the right-hand side of (I) to an integral. Writing this all out, e get, P [h( x) = h( y)] = P [h( x) = h( y) z = g, x y ] P [z = g, x y ] dz () We no ish to simplify P [h( x) = h( y) z = g, x y ]. One can sho that for z, e have, P [h( x) = h( y) z = g, x y ] = z. This follos from considering the different values of u in (*) that ill either (i) shift the integer parts of g, x / and g, y / to be the same hen they are different, or (ii) shift them so that they stay the same hen they are already the same. So () becomes z P [ g, x y = z] dz = ( = x y π P [ g, x y = z] dz ( ) z exp x y dz z z exp P [ g, x y = z] dz ( z x y ) ) dz No riting n := x y, using the change of variables integral, e get, z n z, and integrating the second P [h( x) = h( y)] = π n exp( z )dz + [ ( ( ) ) ] n exp n 1 π (3) Define p (n) = P [h( x) = h( y)]. (3) shos that p (n) = so that p (n) < for all n (since n ). No, let s consider the to cases: (i) n r: this implies here e have defined p 1 above. ( ) e n 1 π ( ) ( ( ) ) r p (n) erf r + e r 1 =: p 1 4

5 (ii) n cr: this and p (n) < imply ( ) ( ( ) ) rc p (n) erf + e r 1 =: p rc here e have defined p above. Finally, e note p 1 and p satisfy the required properties in the definition of a locality sensitive hash function. In particular, p 1 > p. Therefore, h is a locality sensitive hash function for Euclidean distance. References [1] M. Datar, P. Indyk, N. Immorlica, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-stable Distributions. SCG 4 Proceedings of the tentieth annual symposium on Computational Geometry, pages 53-6, 4. [] P. Indyk and R. Motani. Approximate Nearest Neighbors: Toards Removing the Curse of Dimensionality. STOC 98 Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages ,

Lecture 14 October 16, 2014

CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 14 October 16, 2014 Scribe: Jao-ke Chin-Lee 1 Overview In the last lecture we covered learning topic models with guest lecturer Rong Ge.