Nearest-Neighbor Searching Under Uncertainty
- Allyson Garrett
- 5 years ago
1 Nearest-Neighbor Searching Under Uncertainty
Wuzhou Zhang
Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman.
To appear in PODS 2012.
2 Nearest-Neighbor Searching
$S$: a set of $n$ points in $\mathbb{R}^2$
$q$: any query point in $\mathbb{R}^2$
Find the closest point $p \in S$ to $q$.
Applications: pattern recognition, data compression, statistical classification, clustering, databases, information retrieval, computer vision, etc.
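For reference, the exact problem can always be answered by a linear scan over $S$ in $O(n)$ time per query; the Voronoi-diagram approach on slide 7 reduces this to $O(\log n)$. A minimal Python sketch of the scan (the function name `nearest_neighbor` is our own, not from the talk):

```python
import math

def nearest_neighbor(S, q):
    """Return the point of S closest to q in Euclidean distance by scanning all n points."""
    return min(S, key=lambda p: math.dist(p, q))

S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbor(S, (2.0, 2.0)))  # (1.0, 1.0)
```

Ties are broken by the order of `S`, since `min` returns the first minimizer it meets.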
3 Data Uncertainty
The location of data is imprecise: sensor databases, face recognition, mobile data, etc.
What is the nearest neighbor of $q$ now?
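4 Our Model and Problem Statement
Uncertain point $P$: represented as a probability density function (pdf).
[Figure: an uncertain point $P$ with possible locations $p_1, \dots, p_4$, each location $p$ carrying probability $w_p$.]
Expected distance: $\mathrm{Ed}(P, Q) = \sum_{p \in P} \sum_{q \in Q} w_p w_q \, d(p, q)$, where $w_p, w_q$ are probabilities (weights) and $d(\cdot, \cdot)$ is the distance function.
Let $\mathcal{P} = \{P_1, \dots, P_n\}$. Find the expected nearest neighbor (ENN) of $Q$: $P^* = \arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, Q)$.
Or an $\varepsilon$-ENN $P' \in \mathcal{P}$: $\mathrm{Ed}(P', Q) \le (1 + \varepsilon)\, \mathrm{Ed}(P^*, Q)$.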
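To make the definitions concrete, here is a brute-force Python sketch for discrete uncertain points. Assumptions (ours, not the talk's): an uncertain point is a list of `(location, probability)` pairs, `d` defaults to the Euclidean distance, and all helper names are hypothetical. It evaluates $\mathrm{Ed}(P_i, Q)$ for every $P_i$, so each query costs time proportional to $n$ times $|P_i| \cdot |Q|$; the data structures in the talk exist to beat this.

```python
import math

# Assumption: an uncertain point is a discrete pdf given as [(location, probability), ...].

def expected_distance(P, Q, d=math.dist):
    """Ed(P, Q) = sum_{p in P} sum_{q in Q} w_p * w_q * d(p, q)."""
    return sum(wp * wq * d(p, q) for p, wp in P for q, wq in Q)

def expected_nearest_neighbor(points, Q, d=math.dist):
    """Index i of P* = argmin_i Ed(P_i, Q), found by scanning all n uncertain points."""
    return min(range(len(points)), key=lambda i: expected_distance(points[i], Q, d))

def is_eps_enn(points, Q, i, eps, d=math.dist):
    """True if P_i is an eps-ENN of Q, i.e. Ed(P_i, Q) <= (1 + eps) * Ed(P*, Q)."""
    best = min(expected_distance(P, Q, d) for P in points)
    return expected_distance(points[i], Q, d) <= (1 + eps) * best

# Hypothetical toy data: P1 may sit exactly on q, but is far away half the time.
P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
P2 = [((1.0, 0.0), 1.0)]
Q = [((0.0, 0.0), 1.0)]  # a certain query point q = (0, 0)

print(expected_nearest_neighbor([P1, P2], Q))  # 1, i.e. P2
```

In the toy data $P_1$ has a possible location exactly at $q$, yet $P_2$ is the ENN because $\mathrm{Ed}(P_1, q) = 2 > 1 = \mathrm{Ed}(P_2, q)$; $P_1$ is still a $1$-ENN but not a $0.5$-ENN.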
5 Previous Work and Our Contribution
Previous work:
- Expected k-NN under the $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
- Aggregate nearest neighbor (ANN) under the SUM function [Li2011, Sharifzadeh2010, Lian2008, etc.]
- All based on heuristics
Our contribution:
- The first nontrivial methods for answering exact or $\varepsilon$-approximate ENN queries with provable performance guarantees
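6 Summary of Results
Distance function | Setting | Preprocessing time | Space | Query time
Squared Euclidean | Uncertain data | $O(n \log n + nk)$ | $O(n)$ | $O(\log n)$
Squared Euclidean | Uncertain query | $O(n \log n)$ | $O(n)$ | $O(\log n + k)$
Rectilinear | Uncertain data | $O(k^2 n \log^3 n)$ | $O(k^2 n \log^2 n)$ | $O(\log^3 (kn))$
Rectilinear | Uncertain query | $O(n \log^2 n)$ | $O(n \log^2 n)$ | $O(k^2 \log^3 n)$
Euclidean ($\varepsilon$-ENN) | Uncertain data | $O((n/\varepsilon^2) \log^2 n \log(n/\varepsilon) \log(1/\varepsilon))$ | $O((n/\varepsilon^2) \log^2(1/\varepsilon))$ | $O(\log(n/\varepsilon))$
Euclidean ($\varepsilon$-ENN) | Uncertain query | $O(n \log n)$ | $O(n)$ | $O((k/\varepsilon^2) \log(1/\varepsilon) \log n)$
Here $k$ is the number of possible locations of each uncertain point. Results are stated in $\mathbb{R}^2$ and extend to higher dimensions.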
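7 Voronoi Diagram
$P = \{p_1, \dots, p_n\}$
Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^d \mid d(p_i, x) \le d(p_j, x)\ \forall j\}$
Voronoi diagram $\mathrm{VD}(P)$: the decomposition of $\mathbb{R}^d$ induced by the cells $\mathrm{Vor}(p_i)$
In $\mathbb{R}^2$: preprocessing time $O(n \log n)$, space $O(n)$, query time $O(\log n)$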
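8 Expected Voronoi Diagram
$\mathcal{P} = \{P_1, \dots, P_n\}$
Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^d \mid \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x)\ \forall j\}$
Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: the decomposition induced by the cells $\mathrm{EVor}(P_i)$
[Figure: an example in the $L_1$ metric]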
9 Minimization Diagram
$F = \{f_1, \dots, f_n\}$
The lower envelope of $F$: $L_F(x) = \min_{1 \le i \le n} f_i(x)$
$M(F)$: the projection of the graph of $L_F$
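The expected Voronoi diagram of slide 8 is exactly the minimization diagram of $F = \{\mathrm{Ed}(P_1, \cdot), \dots, \mathrm{Ed}(P_n, \cdot)\}$. The Python sketch below evaluates $L_F$ and the label of $M(F)$ pointwise and samples a few points, giving a coarse raster of an EVD under the $L_1$ metric. It is an illustration only: the toy uncertain points and helper names are our own, and sampling does not compute the diagram's actual cell boundaries.

```python
def lower_envelope(F, x):
    """L_F(x) = min over i of f_i(x)."""
    return min(f(x) for f in F)

def minimization_label(F, x):
    """Index i of a function attaining L_F(x), i.e. the cell of M(F) containing x.

    Ties (points on a cell boundary) go to the smallest index.
    """
    values = [f(x) for f in F]
    return values.index(min(values))

def expected_l1(P):
    """Return the function x -> Ed(P, x) under the L1 metric for a discrete uncertain point P."""
    return lambda x: sum(w * (abs(p[0] - x[0]) + abs(p[1] - x[1])) for p, w in P)

# Hypothetical toy data: each uncertain point is a list of (location, probability) pairs.
P1 = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
P2 = [((6.0, 0.0), 1.0)]
F = [expected_l1(P1), expected_l1(P2)]

# Sample the x-axis at x = 0, 1, ..., 6 and label each sample by its EVD cell.
print([minimization_label(F, (float(x), 0.0)) for x in range(7)])  # [0, 0, 0, 0, 1, 1, 1]
```

Along the axis, $\mathrm{Ed}(P_1, \cdot)$ is flat on $[0, 2]$ and then grows with slope 1, while $\mathrm{Ed}(P_2, \cdot)$ decreases toward $x = 6$, so the two cells meet at $x = 3.5$.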
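10 Squared Euclidean Distance: Uncertain Data
$\mathcal{P} = \{P_1, \dots, P_n\}$
Let $\bar{p} = \int_{\mathbb{R}^d} x\, f_P(x)\, dx$ be the centroid of $P$ and $\sigma^2 = \int_{\mathbb{R}^d} \|x - \bar{p}\|^2 f_P(x)\, dx$.
Then $\mathrm{Ed}(P, q) = \|q - \bar{p}\|^2 + \sigma^2$.
Replace each $P_i \in \mathcal{P}$ by $\bar{p}_i$ with weight $\sigma_i^2$: $\mathrm{EVD}(\mathcal{P})$ is the same as the power diagram $\mathrm{WPD}(\bar{P})$ of $\bar{P} = \{\bar{p}_1, \dots, \bar{p}_n\}$, in which the distance from $q$ to $\bar{p}_i$ is $\|q - \bar{p}_i\|^2 + \sigma_i^2$.
Preprocessing time $O(n \log n + nk)$, space $O(n)$, query time $O(\log n)$
Remark: works for any distribution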
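The reduction can be checked numerically. The Python sketch below (assuming discrete uncertain points given as `(location, probability)` pairs; all helper names are hypothetical) computes each centroid $\bar{p}_i$ and variance $\sigma_i^2$, confirms that $\mathrm{Ed}(P, q) = \|q - \bar{p}\|^2 + \sigma^2$, and answers an ENN query by the smallest power distance. A linear scan over the weighted sites stands in for the power-diagram point location used in the talk.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((s - t) ** 2 for s, t in zip(a, b))

def centroid_and_variance(P):
    """Return (p_bar, sigma^2) of a discrete uncertain point P = [(location, probability), ...]."""
    dim = len(P[0][0])
    p_bar = tuple(sum(w * p[t] for p, w in P) for t in range(dim))
    sigma2 = sum(w * sq_dist(p, p_bar) for p, w in P)
    return p_bar, sigma2

def expected_sq_dist(P, q):
    """Ed(P, q) under squared Euclidean distance, computed directly from the pdf."""
    return sum(w * sq_dist(p, q) for p, w in P)

def power_distance(site, q):
    """Power distance to the weighted site (p_bar, sigma^2): ||q - p_bar||^2 + sigma^2."""
    p_bar, sigma2 = site
    return sq_dist(q, p_bar) + sigma2

def enn_squared_euclidean(points, q):
    """Index of the ENN of q: reduce each P_i to a weighted site, then scan the sites."""
    sites = [centroid_and_variance(P) for P in points]
    return min(range(len(sites)), key=lambda i: power_distance(sites[i], q))

P1 = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]  # centroid (1, 0), sigma^2 = 1
P2 = [((1.5, 0.0), 1.0)]                     # centroid (1.5, 0), sigma^2 = 0
print(enn_squared_euclidean([P1, P2], (1.0, 0.0)))  # 1
```

Here $q$ coincides with the centroid of $P_1$, yet $P_2$ wins: the variance term $\sigma_1^2 = 1$ outweighs the squared distance $0.25$ to $\bar{p}_2$.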
11 Rectilinear Metric: Uncertain Data
Assume the $L_1$ metric: $d(p, q) = |x_p - x_q| + |y_p - y_q|$
Size of $\mathrm{EVD}(\mathcal{P})$: $O(k^2 n^2 \alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function
Lower bound: an $\Omega(n^2)$ construction
Remark: extends to the $L_\infty$ metric
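12 Rectilinear Metric: Uncertain Data (cont.)
A near-linear-size index exists despite the $\Omega(n^2)$ size of $\mathrm{EVD}(\mathcal{P})$.
$\mathrm{Ed}(P_i, q) = \sum_{j=1}^{k} w_{ij}\, d(p_{ij}, q) = \sum_{j=1}^{k} w_{ij} \left( |x_{p_{ij}} - x_q| + |y_{p_{ij}} - y_q| \right)$
[Figure: in each of the four quadrants around $p_{ij}$, the term $d(p_{ij}, q) = \pm(x_{p_{ij}} - x_q) \pm (y_{p_{ij}} - y_q)$ is linear in $q$.]
The vertical and horizontal lines through $p_{i1}, \dots, p_{ik}$ cut the plane into $O(k^2)$ cells, so $\mathrm{Ed}(P_i, q)$ consists of $O(k^2)$ linear pieces.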
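The piecewise-linear structure can be made explicit. In the Python sketch below (a discrete uncertain point given as `(location, probability)` pairs; helper names hypothetical), `linear_piece` returns coefficients $(a, b, c)$ with $\mathrm{Ed}(P_i, x) = a + b\,x_1 + c\,x_2$ on the grid cell containing `q`, where the cells are cut out by the vertical and horizontal lines through the locations $p_{ij}$. With at most $k$ distinct coordinates per axis there are at most $(k+1)^2 = O(k^2)$ cells. This only illustrates the counting argument; it is not the index the talk builds.

```python
def l1_expected(P, q):
    """Ed(P, q) = sum_j w_j * (|x_pj - x_q| + |y_pj - y_q|)."""
    return sum(w * (abs(p[0] - q[0]) + abs(p[1] - q[1])) for p, w in P)

def linear_piece(P, q):
    """Return (a, b, c) such that Ed(P, x) = a + b*x[0] + c*x[1] on the grid cell containing q.

    Inside a cell the signs of (x_pj - x) and (y_pj - y) are fixed, so every absolute
    value is linear. If q lies on a grid line, the piece valid on the side x <= x_pj
    (resp. y <= y_pj) is returned; it still agrees with Ed(P, q) at q itself.
    """
    a = b = c = 0.0
    for (px, py), w in P:
        sx = 1.0 if px >= q[0] else -1.0  # |px - x| = sx * (px - x) on this cell
        sy = 1.0 if py >= q[1] else -1.0  # |py - y| = sy * (py - y) on this cell
        a += w * (sx * px + sy * py)
        b -= w * sx
        c -= w * sy
    return a, b, c

def count_cells(P):
    """Number of grid cells cut out by the lines through the locations: at most (k + 1)^2."""
    xs = {p[0] for p, _ in P}
    ys = {p[1] for p, _ in P}
    return (len(xs) + 1) * (len(ys) + 1)

P = [((0.0, 0.0), 0.25), ((2.0, 1.0), 0.25), ((1.0, 3.0), 0.5)]
a, b, c = linear_piece(P, (0.5, 0.5))
print(count_cells(P))  # 16
```

Any other point of the same cell, such as $(0.9, 0.2)$, is evaluated by the same coefficients $(a, b, c)$.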
13 Rectilinear Metric: Uncertain Data (cont.)
Preprocessing time $O(k^2 n \log^3 n)$, space $O(k^2 n \log^2 n)$, query time $O(\log^3 (kn))$
Remark: extends to higher dimensions
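14 Euclidean Metric ($\varepsilon$-ENN): Uncertain Data
Approximate $\mathrm{Ed}(P, x)$ by a function $g(x)$:
- Outside the grid: $g(x) = \|x - \bar{p}\| + \mathrm{Ed}(P, \bar{p})$
- Inside the grid: $g(x) = \mathrm{Ed}(P, a) + \text{cell size}$, where $a$ is a point of the grid cell containing $x$
Total number of cells: $O((1/\varepsilon^2) \log(1/\varepsilon))$
[Figure: $B$, the collection of squares; $C$, the outermost square]
Remark: extends to any $L_p$ metric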
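The outer formula is an upper bound on $\mathrm{Ed}(P, x)$ by the triangle inequality, and the same inequality gives $\mathrm{Ed}(P, x) \ge \|x - \bar{p}\| - \mathrm{Ed}(P, \bar{p})$. Writing $R = \|x - \bar{p}\|$ and $D = \mathrm{Ed}(P, \bar{p})$, the overestimate is therefore at most $(R + D)/(R - D)$, which is at most $1 + \varepsilon$ once $R \ge (2 + \varepsilon) D / \varepsilon$. That threshold is our own back-of-the-envelope derivation, not the grid extent used in the talk. The Python sketch below (discrete $P$, hypothetical helper names, $\bar{p}$ taken to be any fixed point) checks it numerically.

```python
import math

def expected_euclidean(P, x):
    """Ed(P, x) = sum_j w_j * ||p_j - x|| for a discrete uncertain point P."""
    return sum(w * math.dist(p, x) for p, w in P)

def far_field_g(P, p_bar, x):
    """g(x) = ||x - p_bar|| + Ed(P, p_bar), an upper bound on Ed(P, x) by the triangle inequality."""
    return math.dist(x, p_bar) + expected_euclidean(P, p_bar)

def far_field_radius(P, p_bar, eps):
    """Radius R0 with g(x) <= (1 + eps) * Ed(P, x) whenever ||x - p_bar|| >= R0 (our derivation)."""
    D = expected_euclidean(P, p_bar)
    return (2 + eps) * D / eps

# Hypothetical toy data: two equally likely locations, p_bar chosen as their midpoint.
P = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
p_bar, eps = (1.0, 0.0), 0.1
x = (1.0, 25.0)  # farther from p_bar than far_field_radius(P, p_bar, eps), which is about 21
print(far_field_g(P, p_bar, x) / expected_euclidean(P, x))  # a ratio no larger than 1 + eps
```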
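15 Euclidean Metric ($\varepsilon$-ENN): Uncertain Data (cont.)
Quadtree: a 4-way tree
$\mathrm{in}[q] = \{P_i \mid q \in C_i\}$ and $\mathrm{out}[q] = \{P_i \mid q \notin C_i\}$
$B_{\mathrm{in}} = \bigcup_i B_i$, and $B_{\mathrm{out}}$: generated by Arya's data structure on $\{p_1, \dots, p_n\}$
A linear-size approximate $\mathrm{EVD}(\mathcal{P})$!
Preprocessing time $O((n/\varepsilon^2) \log^2 n \log(n/\varepsilon) \log(1/\varepsilon))$, space $O((n/\varepsilon^2) \log^2(1/\varepsilon))$, query time $O(\log(n/\varepsilon))$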
16 Further Work
Is there a linear-size index that answers the following queries in sublinear time in the worst case?
- the nearest neighbor with the highest probability
- the nearest neighbors with probability higher than $\tau$
17 Squared Euclidean Distance: Uncertain Query
$P = \{p_1, \dots, p_n\}$; $\bar{q}$: the centroid of $Q$
$\arg\min_{p \in P} \mathrm{Ed}(p, Q) = \arg\min_{p \in P} \|\bar{q} - p\|^2$
Preprocessing: compute the Voronoi diagram $\mathrm{VD}(P)$
Query: given $Q$, compute $\bar{q}$ in $O(k)$ time, then query $\mathrm{VD}(P)$ with $\bar{q}$
Preprocessing time $O(n \log n)$, space $O(n)$, query time $O(\log n + k)$
Remarks: extends to higher dimensions and works for any distribution
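The reduction rests on the identity $\mathrm{Ed}(p, Q) = \|\bar{q} - p\|^2 + \sigma_Q^2$ (slide 10 with the roles of data and query swapped), where $\sigma_Q^2$ does not depend on $p$. A minimal Python sketch, assuming a discrete query $Q$ given as `(location, probability)` pairs and hypothetical helper names; a linear scan over $P$ replaces the $O(\log n)$ Voronoi-diagram query:

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((s - t) ** 2 for s, t in zip(a, b))

def centroid(Q):
    """q_bar = sum_j w_j * q_j for a discrete uncertain query Q; O(k) time."""
    dim = len(Q[0][0])
    return tuple(sum(w * q[t] for q, w in Q) for t in range(dim))

def enn_uncertain_query(P, Q):
    """Index of argmin_{p in P} Ed(p, Q): the ordinary nearest neighbor of q_bar."""
    q_bar = centroid(Q)
    return min(range(len(P)), key=lambda i: sq_dist(P[i], q_bar))

P = [(0.0, 0.0), (5.0, 0.0), (2.0, 1.0)]
Q = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]  # centroid (2, 0)
print(enn_uncertain_query(P, Q))  # 2
```

Note that $p_1 = (0, 0)$ coincides with a possible location of $Q$ but is not the ENN: $\mathrm{Ed}(p_1, Q) = 8$, while $\mathrm{Ed}(p_3, Q) = 5$.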
18 Rectilinear Metric: Uncertain Query
Similarly, $\mathrm{Ed}(Q, p)$ consists of $O(k^2)$ linear pieces.
Preprocessing time $O(n \log^2 n)$, space $O(n \log^2 n)$, query time $O(k^2 \log^3 n)$
19 Euclidean Metric ($\varepsilon$-ENN): Uncertain Query
$P = \{p_1, \dots, p_n\}$
Preprocessing time $O(n \log n)$, space $O(n)$, query time $O((k/\varepsilon^2) \log(1/\varepsilon) \log n)$
Remark: extends to higher dimensions