Optimal Data-Dependent Hashing for Approximate Near Neighbors

Similar documents
Geometry of Similarity Search

Locality Sensitive Hashing

Lecture 14 October 16, 2014

Lecture 17 03/21, 2017

Approximate Nearest Neighbor (ANN) Search in High Dimensions

Proximity problems in high dimensions

Similarity searching, or how to find your neighbors efficiently

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Spectral Hashing: Learning to Leverage 80 Million Images

Similarity Search in High Dimensions II. Piotr Indyk MIT

Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing.

Algorithms for Nearest Neighbors

Distribution-specific analysis of nearest neighbor search and classification

Beyond Locality-Sensitive Hashing

4 Locality-sensitive hashing using stable distributions

Optimal Lower Bounds for Locality Sensitive Hashing (except when q is tiny)

On the Optimality of the Dimensionality Reduction Method

LOCALITY PRESERVING HASHING. Electrical Engineering and Computer Science University of California, Merced Merced, CA 95344, USA

Randomized Algorithms

arxiv: v2 [cs.ds] 21 May 2017

LSH Forest: Practical Algorithms Made Theoretical

Algorithms for Data Science: Lecture on Finding Similar Items

Locality-sensitive Hashing without False Negatives

arxiv: v2 [cs.ds] 17 Apr 2018

Metric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT)

Approximate Voronoi Diagrams

Linear Spectral Hashing

Nearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University

High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction

Term Filtering with Bounded Error

Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas

Cell-Probe Lower Bounds for the Partial Match Problem

Tailored Bregman Ball Trees for Effective Nearest Neighbors

An efficient approximation for point-set diameter in higher dimensions

Making Nearest Neighbors Easier. Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4. Outline. Chapter XI

A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match

Locality-Sensitive Hashing for Chi2 Distance

Hash-based Indexing: Application, Impact, and Realization Alternatives

A Fast and Simple Algorithm for Computing Approximate Euclidean Minimum Spanning Trees

Optimal Las Vegas Locality Sensitive Data Structures

On Symmetric and Asymmetric LSHs for Inner Product Search

Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search

Nearest-Neighbor Searching Under Uncertainty

Practical and Optimal LSH for Angular Distance

Parameter-free Locality Sensitive Hashing for Spherical Range Reporting

Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points

Data-Dependent Hashing via Nonlinear Spectral Gaps

Nearest Neighbor Preserving Embeddings

B490 Mining the Big Data

Lower bounds on Locality Sensitive Hashing

On Approximate Nearest Neighbors in Non-Euclidean Spaces. Piotr Indyk, Department of Computer Science, Stanford University.

Fast Algorithms for Constant Approximation k-means Clustering

A Comparison of Extended Fingerprint Hashing and Locality Sensitive Hashing for Binary Audio Fingerprints

arxiv: v2 [cs.ds] 10 Sep 2016

7 Distances. 7.1 Metrics. 7.2 Distances L p Distances

arxiv: v1 [cs.db] 2 Sep 2014

Succinct Data Structures for Approximating Convex Functions with Applications

Geometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat

Algorithms for Calculating Statistical Properties on Moving Points

A New Algorithm for Finding Closest Pair of Vectors

6 Distances. 6.1 Metrics. 6.2 Distances L p Distances

Some Useful Background for Talk on the Fast Johnson-Lindenstrauss Transform

Navigating nets: Simple algorithms for proximity search

Fast Dimension Reduction

Overcoming the l 1 Non-Embeddability Barrier: Algorithms for Product Metrics

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Space-Time Tradeoffs for Approximate Nearest Neighbor Searching

Improved Consistent Sampling, Weighted Minhash and L1 Sketching

Metric Learning. 16 th Feb 2017 Rahul Dey Anurag Chowdhury

Approximate Pattern Matching and the Query Complexity of Edit Distance

Set Similarity Search Beyond MinHash

Probabilistic clustering of high dimensional norms. Assaf Naor Princeton University SODA 17

43 NEAREST NEIGHBORS IN HIGH-DIMENSIONAL SPACES

Geometric Optimization Problems over Sliding Windows

Lattice-based Locality Sensitive Hashing is Optimal

arxiv: v1 [cs.cv] 3 Jun 2015

Locality Sensitive Hashing Revisited: Filling the Gap Between Theory and Algorithm Analysis

Distance-Sensitive Bloom Filters

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools

Geometric Problems in Moderate Dimensions

Optimal compression of approximate Euclidean distances

Spectral Hashing. Antonio Torralba, CSAIL, MIT, 32 Vassar St., Cambridge, MA. Abstract

Hardness of Embedding Metric Spaces of Equal Size

Lecture 12: Lower Bounds for Element-Distinctness and Collision

Data Mining und Maschinelles Lernen

arxiv: v1 [cs.ds] 21 Dec 2018

Sparse Johnson-Lindenstrauss Transforms

Sieving for shortest vectors in lattices using angular locality-sensitive hashing

Minimal Loss Hashing for Compact Binary Codes

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

Dimension Reduction in Kernel Spaces from Locality-Sensitive Hashing

Super-Bit Locality-Sensitive Hashing

MIRA, SVM, k-nn. Lirong Xia

Reporting Neighbors in High-Dimensional Euclidean Space

LOCALITY SENSITIVE HASHING FOR BIG DATA

16 Embeddings of the Euclidean metric

Lecture 8 January 30, 2014

Isotropic Hashing. Abstract

Coresets for k-means and k-median Clustering and their Applications

CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang /

Transcription:

Optimal Data-Dependent Hashing for Approximate Near Neighbors. Alexandr Andoni (Simons Institute) and Ilya Razenshteyn (MIT CSAIL). April 20, 2015. 1 / 30

Nearest Neighbor Search (NNS). Let P be an n-point subset of $\mathbb{R}^d$ equipped with some metric. Given a query, report the closest point from P. Resources: space and query time; the answer should be correct for a fixed query with probability 0.9. Interesting metrics: Euclidean ($\ell_2$): $\|x - y\|_2 = \left( \sum_{i=1}^d |x_i - y_i|^2 \right)^{1/2}$; Manhattan, Hamming ($\ell_1$): $\|x - y\|_1 = \sum_{i=1}^d |x_i - y_i|$; also $\ell_\infty$, edit distance, Earth Mover's distance, etc. 2 / 30
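
For concreteness, here is a minimal brute-force baseline for the problem just defined (a hedged sketch in Python/NumPy, not part of the talk): it spends O(nd) time per query, which is what the data structures discussed below aim to beat.

    # Brute-force nearest neighbor under the Euclidean (l2) metric: O(nd) per query.
    import numpy as np

    def nearest_neighbor(P, q):
        """Return the point of P closest to q in l2 distance."""
        dists = np.linalg.norm(P - q, axis=1)   # ||p - q||_2 for every p in P
        return P[np.argmin(dists)]

    # Usage: P is an n x d array of data points, q a length-d query vector.
    P = np.random.default_rng(0).normal(size=(1000, 64))
    q = np.random.default_rng(1).normal(size=64)
    print(nearest_neighbor(P, q))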

The case d = 2 with Euclidean distance. Build the Voronoi diagram of P; given q, perform point location. Performance: space O(n), query time O(log n). 3 / 30

Applications. Pattern recognition, statistical classification, computer vision, computational geometry, databases, recommendation systems, DNA sequencing, spell checking, plagiarism detection, clustering, etc. Approximate string matching: a text T and a query S; find the substring of T closest to S; dimension: |S|. Machine learning, nearest neighbor rule: find the closest labeled example; dimension: the number of features. Near-duplicate document retrieval: bag of words; dimension: the number of English words! 4 / 30

How to deal with many dimensions? The Voronoi diagram takes $n^{\Omega(d)}$ space, and all known data structures have space exponential in d: (Dobkin, Lipton 1976), (Yao, Yao 1985), (Clarkson 1988), (Agarwal, Matoušek 1992), (Matoušek 1992), (Meiser 1993). Alternatives: linear scan, or k-d trees if d is relatively low. Simplify the problem! 5 / 30

Outline: Approximate NNS; Locality-Sensitive Hashing (LSH) and Beyond; Known LSH Families; New Data Structure. 6 / 30

The Approximate Nearest Neighbor Problem. Let P be an n-point subset of $\mathbb{R}^d$ equipped with some metric, and let c > 1. Given a query $q \in \mathbb{R}^d$, report a c-approximate nearest neighbor from P, i.e., a point $p \in P$ with $\|p - q\| \le c \cdot \min_{p' \in P} \|p' - q\|$. 7 / 30

Literature. Exponential dependence on the dimension: (Arya, Mount 1993), (Clarkson 1994), (Arya, Mount, Netanyahu, Silverman, Wu 1998), (Kleinberg 1997), (Har-Peled 2002). Polynomial dependence on the dimension: (Indyk, Motwani 1998), (Kushilevitz, Ostrovsky, Rabani 1998), (Indyk 1998), (Indyk 2001), (Gionis, Indyk, Motwani 1999), (Charikar 2002), (Datar, Immorlica, Indyk, Mirrokni 2004), (Chakrabarti, Regev 2004), (Panigrahy 2006), (Ailon, Chazelle 2006), (Andoni, Indyk 2006), (Andoni, Indyk, Nguyễn, Razenshteyn 2014), (Bartal, Gottlieb 2014), (Andoni, Razenshteyn 2015). 8 / 30

The Approximate Near Neighbor Problem (ANN). Let P be an n-point subset of $\mathbb{R}^d$, r > 0, c > 1. For a query $q \in \mathbb{R}^d$, find any point of P within distance cr of q, provided P is promised to contain a point within distance r of q. There is a (non-trivial!) reduction from Nearest to Near. 9 / 30

Outline: Approximate NNS; Locality-Sensitive Hashing (LSH) and Beyond; Known LSH Families; New Data Structure. 10 / 30

Locality-Sensitive Hashing (LSH). The goal: ANN for $\mathbb{R}^d$ with space and query time polynomial in d, space subquadratic in n, and query time sublinear in n. The only known technique (until recently): Locality-Sensitive Hashing (Indyk, Motwani 1998). Hash functions ↔ space partitions. A distribution over partitions $\mathcal{P}$ of $\mathbb{R}^d$ is $(r, cr, p_1, p_2)$-sensitive if for every $p, q \in \mathbb{R}^d$: if $\|p - q\| \le r$, then $\Pr_{\mathcal{P}}[\mathcal{P}(p) = \mathcal{P}(q)] \ge p_1$; if $\|p - q\| \ge cr$, then $\Pr_{\mathcal{P}}[\mathcal{P}(p) = \mathcal{P}(q)] \le p_2$. Want: a large gap between $p_1$ and $p_2$. 11 / 30

From LSH to ANN. Let $\mathcal{P}$ be a reasonable $(r, cr, p_1, p_2)$-sensitive (random) partition. Define the quality of $\mathcal{P}$ as $\rho = \frac{\ln(1/p_1)}{\ln(1/p_2)}$. Then one can solve ANN with roughly $O(n^{1+\rho} + dn)$ space and $O(d n^{\rho})$ query time (Indyk, Motwani 1998). 12 / 30

Proof idea. Let $\mathcal{H}$ be an $(r, cr, p_1, p_2)$-sensitive family: for every $p, q \in X$, if $\|p - q\| \le r$ then $\Pr_{h \sim \mathcal{H}}[h(p) = h(q)] \ge p_1$, and if $\|p - q\| \ge cr$ then $\Pr_{h \sim \mathcal{H}}[h(p) = h(q)] \le p_2$. Hash the dataset P using a concatenation of k functions from $\mathcal{H}$: $x \mapsto (h_1(x), h_2(x), \ldots, h_k(x))$. Locate the query q and enumerate all points from its bucket. The optimal choice of k leads to the need for $n^{\rho}$ hash tables. Overall: $n^{1+\rho}$ space, $n^{\rho}$ query time. 13 / 30
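
The construction above translates directly into code. The following is a hedged sketch of the generic index (not from the talk): `sample_hash` stands for any routine drawing one function from an $(r, cr, p_1, p_2)$-sensitive family, and the usage example plugs in bit sampling on binary vectors as a toy choice.

    # Generic LSH index: L tables, each keyed by a concatenation of k hash values.
    import random
    from collections import defaultdict

    class LSHIndex:
        def __init__(self, points, sample_hash, k, L):
            self.tables = []
            for _ in range(L):
                hs = [sample_hash() for _ in range(k)]           # k functions from the family
                table = defaultdict(list)
                for p in points:
                    table[tuple(h(p) for h in hs)].append(p)     # bucket by (h1(p), ..., hk(p))
                self.tables.append((hs, table))

        def query(self, q):
            """Enumerate every point sharing a bucket with q in at least one table."""
            for hs, table in self.tables:
                yield from table[tuple(h(q) for h in hs)]

    # Usage with bit sampling on {0,1}^32 (hypothetical toy parameters):
    def sample_bit(d=32):
        i = random.randrange(d)
        return lambda x: x[i]

    data = [tuple(random.getrandbits(1) for _ in range(32)) for _ in range(200)]
    index = LSHIndex(data, sample_bit, k=8, L=10)
    print(sum(1 for _ in index.query(data[0])))   # candidates colliding with a data point

Choosing $k \approx \log_{1/p_2} n$ and $L \approx n^{\rho}$ recovers the bounds on the slide: a near point collides with the query in at least one table with constant probability, while only about one far point per table is expected in the query's bucket.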

Prior work on LSH: (Indyk, Motwani 1998), (Andoni, Indyk 2006), (Motwani, Naor, Panigrahy 2007), (O'Donnell, Wu, Zhou 2011). Bounds on $\rho = \ln(1/p_1)/\ln(1/p_2)$ for various spaces:

Distance  | Upper bound             | Lower bound             | c = 2
Hamming   | $\rho \le 1/c$          | $\rho \ge 1/c - o(1)$   | 1/2
Euclidean | $\rho \le 1/c^2 + o(1)$ | $\rho \ge 1/c^2 - o(1)$ | 1/4

Can we do better for ANN? (Andoni, Indyk, Nguyễn, Razenshteyn 2014), (Andoni, Razenshteyn 2015): yes! ANN for $\ell_2$ with space $O(n^{1+\tau} + dn)$ and query time $O(d n^{\tau})$, where $\tau \le 1/(2c^2 - 1) + o(1)$ (1/7 for c = 2). These are the first data structures that overcome the lower bound for LSH. 14 / 30
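
A quick numeric check of the exponents quoted above for c = 2, ignoring the o(1) terms (an illustrative sketch, not from the talk):

    c = 2
    rho_hamming = 1 / c                        # optimal LSH exponent for Hamming: 0.5
    rho_euclidean = 1 / c ** 2                 # optimal LSH exponent for l2: 0.25
    tau_data_dependent = 1 / (2 * c ** 2 - 1)  # data-dependent bound: 1/7, about 0.143
    print(rho_hamming, rho_euclidean, tau_data_dependent)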

How to do better than LSH? The main idea: data-dependent space partitioning. Recall: $\mathcal{P}$ is $(r, cr, p_1, p_2)$-sensitive if for every $p, q \in \mathbb{R}^d$, $\|p - q\| \le r$ implies $\Pr_{\mathcal{P}}[\mathcal{P}(p) = \mathcal{P}(q)] \ge p_1$, and $\|p - q\| \ge cr$ implies $\Pr_{\mathcal{P}}[\mathcal{P}(p) = \mathcal{P}(q)] \le p_2$. This is too strong! It is enough to satisfy these conditions for $p \in P$ and $q \in \mathbb{R}^d$, so one can exploit the geometry of P to construct the partition. Our bounds are optimal for data-dependent hashing. Parallels with practice: PCA trees (Sproull 1991), (McNames 2001), (Verma, Kpotufe, Dasgupta 2009); Spectral Hashing (Weiss, Torralba, Fergus 2008); Semantic Hashing (Salakhutdinov, Hinton 2009); WTA Hashing (Yagnik, Strelow, Ross, Lin 2011). 15 / 30

Outline: Approximate NNS; Locality-Sensitive Hashing (LSH) and Beyond; Known LSH Families; New Data Structure. 16 / 30

Hamming distance. The hypercube $\{0,1\}^d$ with Hamming distance $\|x - y\|_1 = \sum_{i=1}^d |x_i - y_i|$. Choose a random coordinate $i \in \{1, 2, \ldots, d\}$ and partition $\{0,1\}^d = \{x : x_i = 0\} \cup \{x : x_i = 1\}$. This partition $\mathcal{P}$ is $(r, cr, p_1, p_2)$-sensitive with $p_1 = 1 - r/d$ and $p_2 = 1 - cr/d$. As a result, $\rho = \frac{\ln(1/p_1)}{\ln(1/p_2)} \approx \frac{r/d}{cr/d} = \frac{1}{c}$. Overall, this gives ANN with space $O(n^{1+1/c} + dn)$ and query time $O(d n^{1/c})$, and it can be generalized to the whole of $\ell_1$. 17 / 30
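
A quick Monte Carlo sanity check of the collision probabilities claimed above (an illustrative sketch with arbitrary parameters): a random coordinate agrees on a pair at Hamming distance `dist` with probability exactly 1 - dist/d, so the estimated $\rho$ should come out slightly below 1/c.

    import math, random

    def collision_prob(dist, d, trials=100_000):
        """Probability that a random coordinate agrees for a pair at Hamming distance dist."""
        hits = 0
        for _ in range(trials):
            flipped = set(random.sample(range(d), dist))   # coordinates where the pair differs
            hits += random.randrange(d) not in flipped     # sampled coordinate agrees on the pair
        return hits / trials

    d, r, c = 128, 8, 2
    p1, p2 = collision_prob(r, d), collision_prob(c * r, d)
    print(p1, p2, math.log(1 / p1) / math.log(1 / p2))     # roughly 0.94, 0.88, rho < 1/c = 0.5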

Full algorithm for the Hamming distance. Sample L subsets of coordinates $S_1, S_2, \ldots, S_L$, each of size k. On a query q, retrieve all data points p such that $q|_{S_i} = p|_{S_i}$ for some $S_i$. Here $L \approx n^{1/c}$ and $k \approx \log n$. The new data structure is the first improvement upon bit sampling: in particular, for c = 2 the query time improves from $O(n^{1/2})$ to $O(n^{1/3})$. 18 / 30
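
The parameter settings on this slide follow from the collision probabilities of bit sampling; here is a hedged sketch of one standard way to instantiate them (the concrete values of n, d, r below are assumptions for illustration only):

    # With p1 = 1 - r/d and p2 = 1 - c*r/d, take k = log n / log(1/p2) so that a far
    # point collides with probability about 1/n, and L = (1/p1)^k = n^rho tables,
    # where rho is slightly below 1/c.
    import math

    n, d, r, c = 100_000, 128, 8, 2
    p1, p2 = 1 - r / d, 1 - c * r / d
    k = math.ceil(math.log(n) / math.log(1 / p2))
    L = math.ceil((1 / p1) ** k)
    print(k, L, round(n ** (1 / c)))   # k ~ log n; L comes out a bit below n^(1/c)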

Euclidean distance. An LSH family for $\ell_2$ (with suboptimal $\rho$: 1/c instead of $1/c^2$) from (Datar, Immorlica, Indyk, Mirrokni 2004): project $\mathbb{R}^d$ onto a random line, partition the line into segments of length w starting from a random offset, and let the hash value be the segment that the projection falls into. Optimizing w = w(r, c), we get $\rho < 1/c$. Simple and practical: E2LSH is based on this hashing scheme (Andoni, Indyk 2005). 19 / 30
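
A hedged sketch of this hash family in Python/NumPy (the value of w below is arbitrary; in practice it would be tuned as w(r, c)):

    import numpy as np

    def make_l2_hash(d, w, seed=None):
        rng = np.random.default_rng(seed)
        a = rng.normal(size=d)            # random direction: the random line
        b = rng.uniform(0, w)             # random offset of the segment boundaries
        return lambda x: int(np.floor((np.dot(a, x) + b) / w))   # index of the segment

    h = make_l2_hash(d=64, w=4.0, seed=0)
    x = np.random.default_rng(1).normal(size=64)
    print(h(x), h(x + 0.01))              # nearby points usually fall in the same segment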

Optimal LSH for Euclidean distance. For $\ell_2$ there is an LSH family with $\rho \le 1/c^2 + o(1)$ (Andoni, Indyk 2006). Unfortunately, it is fairly complicated. Later, we will see a simple family for a special case. 20 / 30

Outline: Approximate NNS; Locality-Sensitive Hashing (LSH) and Beyond; Known LSH Families; New Data Structure. 21 / 30

The plan. A data structure based on decision trees. First, show how to solve nice instances, in which the data points lie on a sphere of radius $cr/\sqrt{2}$; intuitively, this corresponds to a random instance. Then show how to reduce the general case to the nice case by iterative, data-dependent clustering. 22 / 30

The low-diameter case. All points and queries lie on a sphere of radius $cr/\sqrt{2}$. Apply ball carving (Karger, Motwani, Sudan 1998), (Andoni, Indyk, Nguyễn, Razenshteyn 2014). This achieves $\rho = \frac{1}{2c^2 - 1}$. 23 / 30

A glimpse of the analysis. Sample $g_1, g_2, \ldots \sim N(0,1)^d$; the hash value is $h(p) = \min\{ i : \langle p, g_i \rangle \ge \eta \sqrt{d} \}$. Setting $\eta = d^{1/4}$ gives $\exp(\Theta(\sqrt{d}))$ caps in total. One has $\Pr[p \text{ and } q \text{ collide}] = \frac{\Pr_g[\langle p, g \rangle \ge \eta\sqrt{d} \text{ and } \langle q, g \rangle \ge \eta\sqrt{d}]}{\Pr_g[\langle p, g \rangle \ge \eta\sqrt{d} \text{ or } \langle q, g \rangle \ge \eta\sqrt{d}]}$. [Figure: the two cap events in the plane spanned by p and q, bounded by the lines $X = d^{1/4}$ and $X \cos\alpha + Y \sin\alpha = d^{1/4}$, where $\alpha$ is the angle between p and q.] 24 / 30
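
A toy implementation of the ball-carving hash just described (a hedged sketch: points are normalized to the unit sphere, and the threshold t and cap count T are small illustrative values rather than the $\eta\sqrt{d}$ and $\exp(\Theta(\sqrt{d}))$ of the analysis):

    import numpy as np

    def spherical_carving_hash(T, d, t, seed=0):
        G = np.random.default_rng(seed).normal(size=(T, d))   # g_1, ..., g_T ~ N(0,1)^d
        def h(p):
            p = p / np.linalg.norm(p)                          # work on the unit sphere
            hits = np.nonzero(G @ p >= t)[0]                   # caps that contain p
            return int(hits[0]) if hits.size else T            # first cap; T = leftover part
        return h

    h = spherical_carving_hash(T=5000, d=32, t=1.5)
    rng = np.random.default_rng(1)
    p = rng.normal(size=32)
    q = p + 0.2 * rng.normal(size=32)                          # a nearby point
    print(h(p), h(q))                                          # frequently equal for nearby points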

The general case. Everything lies on a sphere of radius $R > cr/\sqrt{2}$. Find dense clusters of radius $(\sqrt{2} - \varepsilon)R$ with centers on the sphere and treat them separately (see the next slide). For the remaining points, apply one iteration of ball carving, then recurse on the parts (clusters may show up again). To answer a query q, query all the dense clusters, navigate q through the partition, and query the corresponding part (only one!). 25 / 30

Treating dense clusters. A cluster of radius $(\sqrt{2} - \varepsilon)R$ with center on the sphere can be covered by a ball of radius $(1 - \Omega(\varepsilon^2))R$ (in the spirit of the Ellipsoid Method). This improves the geometry by reducing the radius. 26 / 30
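
One way to verify the covering claim (a sketch, under the slide's setup in which the cluster center $x_0$ and all cluster points lie on the sphere of radius $R$ around the origin): for any cluster point $p$ we have $\|p\| = \|x_0\| = R$ and $\|p - x_0\| \le (\sqrt{2} - \varepsilon)R$, hence
\[
  \langle p, x_0 \rangle = \frac{\|p\|^2 + \|x_0\|^2 - \|p - x_0\|^2}{2}
  \ge \frac{2R^2 - (\sqrt{2} - \varepsilon)^2 R^2}{2}
  = \Big(\sqrt{2}\,\varepsilon - \tfrac{\varepsilon^2}{2}\Big) R^2 =: a R^2 .
\]
Re-centering at $z = a\,x_0$ gives
\[
  \|p - z\|^2 = \|p\|^2 - 2a\,\langle p, x_0 \rangle + a^2 \|x_0\|^2
  \le R^2 - 2a^2 R^2 + a^2 R^2 = (1 - a^2) R^2 ,
\]
so the whole cluster fits in a ball of radius $\sqrt{1 - a^2}\,R \le (1 - \Omega(\varepsilon^2))R$.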

How do we make progress? For the clusters, we decrease the radius non-trivially (by a factor of $1 - \Omega(\varepsilon^2)$). For the remaining points, ball carving works great: almost all pairs of points are at distance at least $(\sqrt{2} - \varepsilon)R$. 27 / 30

Ball carving for triples. The analysis crucially relies on upper bounding $\Pr_{\mathcal{P}}[\mathcal{P}(u) = \mathcal{P}(w) \mid \mathcal{P}(u) = \mathcal{P}(v)]$. The naive bound $\Pr_{\mathcal{P}}[\mathcal{P}(u) = \mathcal{P}(w)] \,/\, \Pr_{\mathcal{P}}[\mathcal{P}(u) = \mathcal{P}(v)]$ merely gives the exponent $1/(2c^2 - 2) + o(1)$, and unfortunately it is tight in some cases. Fortunately, for fixed u, v and almost all w, the bound can be improved to essentially $\Pr_{\mathcal{P}}[\mathcal{P}(u) = \mathcal{P}(w)]$. Since there are no dense clusters, we can control the number of bad w's. 28 / 30

Fast preprocessing. Time $n^{2+o(1)}$ per decision tree: it suffices to look for clusters centered at data points (non-trivial analysis). To achieve $n^{1+o(1)}$: observe that we only care about clusters with $n^{1-o(1)}$ points, so we can subsample. 29 / 30

Open problems. Make the data structure dynamic. For $\ell_\infty$, the best known data structure (Indyk 1998) is also based on decision trees; can the two approaches be unified? Questions? 30 / 30