Locality Sensitive Hashing
February 1

LSH in Hamming space

The following discussion focuses on the notion of Locality Sensitive Hashing, which was first introduced in [5]. We focus on the case of the Hamming metric, but LSH can be seen as a general framework which applies to several metrics, e.g. the $\ell_2$ metric.

Definition 1 (Hamming distance). Given two strings $x, y \in \{0,1\}^d$, the Hamming distance $d_H(x, y)$ is the number of positions at which $x$ and $y$ differ. For example, for $x = 10101$ and $y = 10011$ we have $d_H(x, y) = 2$.

We focus on the problem of Approximate Nearest Neighbor search in subsets of $(\{0,1\}^d, d_H)$ when the dimension is high (assume $d \ge \log n$). Instead of solving the Approximate Nearest Neighbor problem directly, we solve the Approximate Near Neighbor problem, which is defined as follows.

Definition 2 (Approximate Near Neighbor problem). Let $P \subseteq \{0,1\}^d$. Given $r > 0$ and $\epsilon > 0$, build a data structure such that for any query $q \in \{0,1\}^d$ it does the following:
- if $\exists p \in P$ s.t. $d_H(p, q) \le r$, then it reports a point $p' \in P$ s.t. $d_H(p', q) \le (1+\epsilon) r$;
- if $\forall p \in P$, $d_H(p, q) > (1+\epsilon) r$, then it reports "no".
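As a quick illustration of Definitions 1 and 2, here is a minimal Python sketch; the helper names are ours, not from the notes. It computes the Hamming distance and checks whether an answer is consistent with the two-sided guarantee (distances in $(r, (1+\epsilon)r]$ may be answered either way).

    def hamming(x, y):
        """d_H(x, y): the number of positions at which two bit strings differ."""
        assert len(x) == len(y)
        return sum(a != b for a, b in zip(x, y))

    def valid_answer(P, q, r, eps, answer):
        """Is `answer` (a point of P, or None for "no") consistent with Definition 2?"""
        if answer is not None:
            return answer in P and hamming(answer, q) <= (1 + eps) * r
        # Reporting "no" is acceptable exactly when no point of P lies within r of q.
        return all(hamming(p, q) > r for p in P)

    print(hamming("10101", "10011"))  # 2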
The data structure that we present here is randomized, and there is a probability of failure. More precisely, the following will be proven.

Theorem 3. Let $P \subseteq \{0,1\}^d$. Given $r > 0$ and $\epsilon > 0$, the LSH data structure satisfies the following for any fixed query $q \in \{0,1\}^d$:
- if $\exists p \in P$ s.t. $d_H(p, q) \le r$, then, provided the preprocessing succeeds for $q$, the data structure reports a point $p' \in P$ s.t. $d_H(p', q) \le (1+\epsilon) r$;
- if $\forall p \in P$, $d_H(p, q) > (1+\epsilon) r$, then it reports "no".

The preprocessing succeeds for $q$ with constant probability. The space required is $O(dn + n^{1+\frac{1}{1+\epsilon}} \log n)$, the preprocessing time is $O(d\, n^{1+\frac{1}{1+\epsilon}} \log n)$, and the query time is $O(d\, n^{\frac{1}{1+\epsilon}} \log n)$.

The method is based on the idea of using hash functions which have the nice property that, with good probability, they map similar strings (or, more generally, points) to the same buckets.

Definition 4. Let $r_1 < r_2$ and $p_1 > p_2$. We call a family $H$ of hash functions $(r_1, r_2, p_1, p_2)$-sensitive if for any $x, y \in \{0,1\}^d$:
- $d_H(x, y) \le r_1 \implies \Pr[h(x) = h(y)] \ge p_1$;
- $d_H(x, y) \ge r_2 \implies \Pr[h(x) = h(y)] \le p_2$.

In the Hamming metric case we define the following family of functions.

Definition 5 (Family of hash functions). Let $H = \{h_i(x) = x_i \mid x = (x_1, \dots, x_d),\ i \in \{1, \dots, d\}\}$. Obviously, $|H| = d$. Pick $h \in H$ uniformly at random. Then $\Pr[h(x) = h(y)] = 1 - \frac{d_H(x, y)}{d}$.

Corollary 6. The family $H$ is $(r, cr, 1 - \frac{r}{d}, 1 - \frac{cr}{d})$-sensitive, where $r > 0$ and $c > 1$.

However, the probabilities $1 - \frac{r}{d}$ and $1 - \frac{cr}{d}$ can be close to each other.

Definition 7. Given a parameter $k$, define the new family
$$G(H) = \{\, g : \{0,1\}^d \to \{0,1\}^k \mid g(x) = (h_1(x), \dots, h_k(x)) \,\}.$$
In other words, a function $g$ chosen uniformly at random from $G(H)$ projects a point $p \in \{0,1\}^d$ onto $k$ randomly and independently chosen coordinates. Obviously, $|G(H)| = d^k$. Now we choose uniformly at random $L$ functions $g_1, \dots, g_L \in G(H)$.

Preprocessing algorithm.

    for i from 1 to L do
        Pick uniformly at random $g_i \in G(H)$.
        For each $p \in P$, assign $p$ to bucket $g_i(p)$ (in hash table $T_i$).

The preprocessing time is $O(L \cdot n \cdot d \cdot k)$. The space usage is $L$ hash tables with $n$ pointers to strings per table, i.e. $O(L \cdot n)$; in order to store the $n$ points themselves we need $O(d \cdot n)$ space.

Query algorithm.

    for i from 1 to L do
        for each string $p$ in bucket $g_i(q)$ do
            if the number of retrieved strings exceeds $3L$ then
                return "no"
            end if
            if $d_H(q, p) \le cr$ then
                return $p$
            end if
    return "no"

The query time is $O(L(k + d))$.
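The following is a minimal Python sketch of the data structure above, using the parameter choices $k = \log_{1/p_2} n$ and $L = n^{\rho}$ derived in the analysis below; the class and variable names are ours, and points are kept as plain 0/1 strings for readability.

    import math
    import random
    from collections import defaultdict

    class HammingLSH:
        """Bit-sampling LSH for the (r, cr)-near neighbor problem; assumes cr < d."""

        def __init__(self, points, r, c, seed=0):
            rnd = random.Random(seed)
            self.d, n = len(points[0]), len(points)
            self.cr = c * r
            p2 = 1 - self.cr / self.d                      # collision prob. at distance cr
            self.k = max(1, math.ceil(math.log(n) / math.log(1 / p2)))  # k = log_{1/p2} n
            rho = math.log(1 - r / self.d) / math.log(p2)  # rho = ln(1/p1) / ln(1/p2)
            self.L = max(1, math.ceil(n ** rho))           # L = n^rho
            # Each g_i samples k coordinates independently (with replacement, as in Def. 7).
            self.gs = [[rnd.randrange(self.d) for _ in range(self.k)] for _ in range(self.L)]
            self.tables = [defaultdict(list) for _ in range(self.L)]
            for p in points:
                for g, table in zip(self.gs, self.tables):
                    table[tuple(p[i] for i in g)].append(p)

        def query(self, q):
            retrieved = 0
            for g, table in zip(self.gs, self.tables):
                for p in table[tuple(q[i] for i in g)]:
                    retrieved += 1
                    if retrieved > 3 * self.L:
                        return None                        # report "no"
                    if sum(a != b for a, b in zip(p, q)) <= self.cr:
                        return p
            return None                                    # report "no"

    # Toy usage: returns some point within distance cr = 2 of the query, or None.
    pts = ["01101", "00111", "11000", "10110"]
    print(HammingLSH(pts, r=1, c=2).query("01100"))

Sampling the coordinates with replacement matches Definition 7, where $|G(H)| = d^k$.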
3 Hence, P r[a] (1 n ln 1/p 1 ln 1/p ) L. Setting L = n ln 1/p 1 ln 1/p P r[a] (1 1 L )L 1 e. we obtain: Let p P s.t. d H (p, q) c r. Given j, P r[g j (p ) = g j (q)] p k = 1 n. The expected number of strings p P s.t. d H (p, q) cr and also lie in the same bucket with q, is L. Hence 1, P r[b] 1 3. After setting the parameters we conclude: Query time: O(dn ln 1/p 1 ln 1/p log n), Space: O(dn + n 1+ ln 1/p 1 ln 1/p log n), Preprocessing time O(dn 1+ ln 1/p 1 ln 1/p log n). We finally notice that ln 1/p 1 c=1+ɛ = ln 1/p and we omit the technical details. ln(1 r/d) ln(1 (1 + ɛ)r/d) ɛ High probability. The probability can be amplified by repetition. We can achieve 1 n c for any constant c > 0 by building O(log n) data structures as in Theorem 3. Solving the Approximate Nearest Neighbor problem. The idea is to do binary search over the range of distances 1,..., d. Better complexity bounds can be obtained by binary search over the distances 1, (1 + ɛ), (1 + ɛ),..., d. However, in other metrics it is not obvious that someone can solve the Approximate Nearest Neighbor problem with Approximate Near Neighbor data structures. A solution to this problem is obtained in [4] and can be stated as follows. Theorem 8. Let P be a given set of n points in a metric space, and let c = 1 + ɛ > 1, f (0, 1), and γ (1/n, 1) be prescribed parameters. Assume that we are given a data structure for the (c, r)-approximate near neighbor that uses space S(n, c, f), has query time Q(n, c, f), and has failure probability f. Then there exists a data structure for answering c(1 + O(γ))-NN queries in time O(log n)q(n, c, f) with failure probability O(f log n). The resulting data structure uses O(S(n, c, f)/γ log n) space. LSH in l In the previous section we have seen an LSH family for the Hamming metric. It is known that the data structure obtained there can be used in order to solve the problem in l. This is obtained by a non-trivial reduction which translates the ANN problem in l to the ANN problem in the Hamming space [4]. The first LSH function directly applicable to the l metric can be described as follows. Definition 9 (LSH family for l ). Let p R d and v N(0, 1) d. Let also w a parameter (to be defined later) and t [0, w] chosen uniformly at random. Then, h(p) = p, v + t. w 1 Recall Markov s inquality: P r(x α) E[X], where α > 0. α Meaning the d-dimensional standard normal distribution. 3
Some intuition behind the LSH family. We will now try to give a more intuitive description of the LSH family defined above. First we randomly project the points (as in the Johnson-Lindenstrauss lemma: target dimension $\log n / \epsilon^2$ approximately preserves distances), and then we apply a randomly shifted grid with cell side-width $w$; the random shift implies a positive probability of including two close points in the same cell. The functions $g : \mathbb{R}^d \to \mathbb{N}^k$ implied by the above discussion (recall that in the LSH scheme we concatenate $k$ functions of the first family $H$) simply return the id of the corresponding cell in the randomly shifted grid.
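To make this picture concrete, here is a minimal sketch of the induced $g$, assuming numpy; the function names are illustrative. It projects onto $k$ Gaussian directions and returns the cell id in a randomly shifted grid of side-width $w$.

    import numpy as np

    def make_grid_hash(d, k, w, seed=0):
        """g(p): the id of p's cell in a randomly shifted grid, after projecting p
        onto k independent Gaussian directions (k concatenated copies of h)."""
        rng = np.random.default_rng(seed)
        V = rng.standard_normal((k, d))   # random projection R^d -> R^k
        t = rng.uniform(0.0, w, size=k)   # one independent random shift per axis
        def g(p):
            return tuple(np.floor((V @ p + t) / w).astype(int))
        return g

    g = make_grid_hash(d=128, k=10, w=4.0)
    p = np.random.default_rng(1).standard_normal(128)
    print(g(p))           # the cell id: a tuple of k integers
    print(g(p + 1e-3))    # a tiny perturbation usually keeps the same cell id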
Better results. In [1], a better exponent is achieved (roughly $1/(1+\epsilon)^2$), which is known to be nearly optimal. In [2], an even better exponent is achieved by designing an algorithmic scheme which depends on the dataset and is no longer oblivious to the points, namely data-dependent LSH.

References

[1] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117-122, 2008.

[2] Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. of the 47th Annual ACM Symposium on Theory of Computing, STOC '15, pages 793-801, New York, NY, USA, 2015. ACM.

[3] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. of the Twentieth Annual Symposium on Computational Geometry, SCG '04, pages 253-262, New York, NY, USA, 2004. ACM.

[4] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(14):321-350, 2012.

[5] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. of the 30th Annual ACM Symposium on Theory of Computing, STOC '98, pages 604-613, 1998.