Tailored Bregman Ball Trees for Effective Nearest Neighbors

Tailored Bregman Ball Trees for Effective Nearest Neighbors
Frank Nielsen (1), Paolo Piro (2), Michel Barlaud (2)
(1) Ecole Polytechnique, LIX, Palaiseau, France
(2) CNRS / University of Nice-Sophia Antipolis, Sophia Antipolis, France
25th European Workshop on Computational Geometry, March 16, 2009, ULB, Brussels, Belgium

Outline
- Introduction
- Bregman Nearest Neighbor search
- Bregman Ball Trees (BB-trees)
- Improved Bregman Ball Trees
  - Speeded-up construction
  - Adaptive node degree
  - Symmetrized Bregman divergences
- Experiments

Nearest Neighbor (NN) search

Applications: computer vision, machine learning, data mining, etc.

Given:
- a set S = {p_1, ..., p_n} of n d-dimensional points
- a query point q
- a dissimilarity measure D

the nearest neighbor of q is NN(q) = arg min_i D(q, p_i). (1)

For an asymmetric D (such as a Bregman divergence D_F), three queries arise:
- left-sided: NN^l_F(q) = arg min_i D_F(q || p_i)
- right-sided: NN^r_F(q) = arg min_i D_F(p_i || q)
- symmetrized: NN_F(q) = arg min_i (D_F(p_i || q) + D_F(q || p_i)) / 2
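For concreteness, here is a minimal brute-force sketch of the three sided queries. It is not code from the talk: the helper names (`kl`, `nn_left`, `nn_right`, `nn_sym`) are illustrative, and the extended KL divergence on positive vectors stands in for the dissimilarity D.

```python
import numpy as np

def kl(p, q):
    """Extended KL divergence between positive vectors: sum p*log(p/q) - p + q."""
    return np.sum(p * np.log(p / q) - p + q)

def nn_left(q, points, D=kl):
    """Left-sided NN: argmin_i D(q, p_i)."""
    return min(range(len(points)), key=lambda i: D(q, points[i]))

def nn_right(q, points, D=kl):
    """Right-sided NN: argmin_i D(p_i, q)."""
    return min(range(len(points)), key=lambda i: D(points[i], q))

def nn_sym(q, points, D=kl):
    """Symmetrized NN: argmin_i (D(p_i, q) + D(q, p_i)) / 2."""
    return min(range(len(points)),
               key=lambda i: 0.5 * (D(points[i], q) + D(q, points[i])))

# Toy usage on random positive vectors.
rng = np.random.default_rng(0)
S = rng.random((100, 8)) + 1e-6
q = rng.random(8) + 1e-6
print(nn_left(q, S), nn_right(q, S), nn_sym(q, S))
```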

Bregman divergences D_F

F(x): X ⊆ R^d → R is a strictly convex and differentiable generator.

D_F(p || q) = F(p) - F(q) - (p - q)^T ∇F(q) (2)

The sided Bregman NN queries are related through the Legendre conjugate F*:
D_{F*}(∇F(q) || ∇F(p)) = D_F(p || q) (dual divergence)

Bregman divergences are widely used as distortion measures between image features:
- Mahalanobis squared distances (symmetric): F(x) = x^T Σ^{-1} x, with Σ ≻ 0 the covariance matrix
- Kullback-Leibler (KL) divergence (asymmetric): F(x) = Σ_{j=1}^d x_j log x_j
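A minimal sketch of the generic definition (2), assuming NumPy. The two generators below are our own small instantiations: the quadratic generator recovers the Mahalanobis squared distance (shown with Σ = I for brevity, i.e. the squared Euclidean distance), and the negative-entropy generator recovers the KL divergence between normalized histograms.

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Generic Bregman divergence D_F(p || q) = F(p) - F(q) - <p - q, grad F(q)>."""
    return F(p) - F(q) - np.dot(p - q, gradF(q))

# Generator F(x) = x^T x: recovers the squared Euclidean distance
# (the Mahalanobis case with covariance Sigma = I).
F_sq, dF_sq = lambda x: np.dot(x, x), lambda x: 2.0 * x

# Generator F(x) = sum_j x_j log x_j: recovers the KL divergence
# between normalized histograms.
F_ent, dF_ent = lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(bregman(F_sq, dF_sq, p, q))    # ||p - q||^2 = 0.02
print(bregman(F_ent, dF_ent, p, q))  # KL(p || q)
```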

Naïve search methods

Brute-force linear search:
- exhaustive brute force: O(dn)
- randomized sampling: O(α d n), α ∈ (0, 1)

Randomized sampling:
- keep each point with probability α
- mean size of the sample: α n
- speed-up: 1/α
- mean rank of the approximated NN: 1/α
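A tiny simulation of the randomized-sampling baseline, illustrating the roughly 1/α speed-up and roughly 1/α expected rank. All names are illustrative, and the squared Euclidean distance stands in for a Bregman divergence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 10_000, 16, 0.1

points = rng.random((n, d))
query  = rng.random(d)

# Keep each point independently with probability alpha, then scan only the sample;
# the expected sample size is alpha * n, hence a ~1/alpha speed-up.
sample = np.flatnonzero(rng.random(n) < alpha)
dists  = ((points - query) ** 2).sum(axis=1)       # squared Euclidean stands in for D
best   = sample[np.argmin(dists[sample])]

# Rank of the returned point among all n points (1 = exact NN); its mean is ~1/alpha.
rank = 1 + np.sum(dists < dists[best])
print(len(sample), rank)
```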

Data structures for improved NN search

Two main sets of methods:
- mapping techniques (e.g. locality-sensitive hashing, random projections)
- tree-like space partitions with branch-and-bound queries (e.g. kd-trees, metric ball trees and vantage point trees)
  - faster than brute force (pruning of sub-trees)
  - approximate NN search

Extensions beyond the Euclidean distance:
- arbitrary metrics: vp-trees [Yianilos, SODA 1993]
- Bregman divergences: k-means [Banerjee et al., JMLR 2005]

We focus on Bregman Ball trees [Cayton, ICML 2008].

Outline of BB-trees (I): construction

Recursive partitioning scheme:
1. 2-means clustering (keep the two centroids c_l, c_r)
2. build the Bregman balls B(c_l, R_l) and B(c_r, R_r) (possibly overlapping)
3. continue recursively until a stop criterion is met

Termination criteria:
- maximum number of points l_0 stored at a leaf
- maximum leaf radius r_0

(Figure: a tree of Bregman balls B_1, ..., B_7, with B_2, ..., B_7 as leaf balls.)
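A minimal sketch of the recursive construction, under simplifying assumptions: plain Lloyd 2-means with the squared Euclidean distance stands in for the talk's Bregman 2-means, the "radius" is the maximum squared distance to the mean, and all names are hypothetical.

```python
import numpy as np

def two_means(points, iters=10, rng=None):
    """Plain Lloyd 2-means (squared Euclidean stands in for Bregman 2-means)."""
    rng = rng or np.random.default_rng(0)
    centers = points[rng.choice(len(points), 2, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return centers, labels

def build_bb_tree(points, max_leaf_size=50, max_radius=1.0):
    """Recursive partitioning: split with 2-means, wrap each part in a ball, recurse."""
    center = points.mean(axis=0)
    radius = ((points - center) ** 2).sum(axis=1).max()
    node = {"center": center, "radius": radius, "points": None, "children": []}
    # Termination criteria from the slide: at most l0 points or radius at most r0.
    if len(points) <= max_leaf_size or radius <= max_radius:
        node["points"] = points
        return node
    _, labels = two_means(points)
    if labels.min() == labels.max():        # degenerate split: stop here
        node["points"] = points
        return node
    for k in (0, 1):
        node["children"].append(build_bb_tree(points[labels == k],
                                              max_leaf_size, max_radius))
    return node

# Example: build a tree over 1,000 random 8-dimensional points.
tree = build_bb_tree(np.random.default_rng(1).random((1000, 8)), max_leaf_size=50)
```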

Outline of BB-trees (II): branch-and-bound search

1. Descend the tree from the root to the leaves:
   - at internal nodes, choose the child whose ball is closer to q (the sibling is temporarily ignored)
   - at leaves, search for the NN candidate p by brute force
2. Traverse back up the tree and check the ignored nodes:
   - project q onto the ball B(c, R) by bisection search: q_B = arg min_{x ∈ B} D_F(x || q)
   - if D_F(q_B || q) > D_F(p || q), the node can be pruned out
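A sketch of the bisection projection for the KL generator. The curve parametrization and the helper names are our illustration of this step, not code from the talk: the projection lies on the curve whose gradient image interpolates between ∇F(q) and ∇F(c), along which the divergence to c is monotone, so a bisection on the interpolation parameter finds the boundary point.

```python
import numpy as np

# KL generator: F(x) = sum x log x, with gradient and inverse gradient (Legendre dual).
gradF     = lambda x: np.log(x) + 1.0
gradF_inv = lambda y: np.exp(y - 1.0)
kl        = lambda p, q: np.sum(p * np.log(p / q) - p + q)

def project_on_ball(q, c, R, iters=50):
    """Bisection search for q_B = argmin_{x in B(c, R)} D_F(x || q).

    The minimizer lies on the curve grad F(x_theta) = (1 - theta) grad F(q) + theta grad F(c),
    along which D_F(x_theta || c) decreases monotonically, so we bisect on theta
    until the point sits (approximately) on the ball boundary.
    """
    if kl(q, c) <= R:
        return q                    # q is already inside the ball
    lo, hi = 0.0, 1.0
    x = q
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        x = gradF_inv((1 - theta) * gradF(q) + theta * gradF(c))
        if kl(x, c) > R:
            lo = theta              # still outside the ball: move towards c
        else:
            hi = theta              # inside the ball: move back towards q
    return x

def can_prune(q, c, R, best_div):
    """Prune a node if even the closest point of its ball is farther than the candidate."""
    return kl(project_on_ball(q, c, R), q) > best_div
```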

Outline of BB-trees (III): Bregman annuli

Lower and upper bounds speed up the geodesic bisection search. Each node stores an annulus
B(c, R, R') = {x : R ≤ D_F(x || c) ≤ R'}
containing its points. With p the current NN candidate and x, y two points bounding the projection of q from inside and outside:
- if D_F(p || q) > D_F(x || q): explore the node
- if D_F(p || q) < D_F(y || q): prune the node out

(Figure: annuli B(c, R, R') with the query q and the bounding points x, y.)
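Building on the previous sketch, the bracketing points of the bisection already provide such lower and upper bounds, so the explore/prune decision can often be made before the projection is computed exactly. This is a simplified single-ball variant under the same KL assumptions; the names and the early-stopping loop are ours.

```python
import numpy as np

gradF     = lambda x: np.log(x) + 1.0
gradF_inv = lambda y: np.exp(y - 1.0)
kl        = lambda p, q: np.sum(p * np.log(p / q) - p + q)

def explore_or_prune(q, c, R, best_div, iters=30):
    """Early explore/prune decision for a ball B(c, R), stopping the bisection early.

    During the bisection, a point y still outside the ball lower-bounds D_F(q_B || q),
    while a point x already inside the ball upper-bounds it; the decision is taken as
    soon as the candidate divergence best_div leaves the [lower, upper] bracket.
    """
    if kl(q, c) <= R:
        return "explore"                       # q lies inside the ball
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        z = gradF_inv((1 - theta) * gradF(q) + theta * gradF(c))
        if kl(z, c) > R:
            lo = theta                         # z plays the role of y (outside)
            if best_div < kl(z, q):
                return "prune"                 # the candidate beats the whole ball
        else:
            hi = theta                         # z plays the role of x (inside)
            if best_div > kl(z, q):
                return "explore"               # the ball may contain a closer point
    return "explore"                           # undecided after iters steps: be safe
```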

Our main contributions

From BB-tree to BB-tree++:
- speed up construction time (Bregman 2-means++)
- learn the tree branching factor (G-means)
- explore nearest nodes first (priority queue)
- handle symmetrized/mixed Bregman divergences

We mainly focus on approximate NN queries (the search is stopped once a few leaves have been explored).

Speed up construction time

We replace Bregman 2-means by a careful but light initialization of the two cluster centers [Arthur et al., SODA 2007].

Bregman 2-means++:
1. pick the first seed c_l uniformly at random
2. for each p_i ∈ S, compute D_F(p_i || c_l)
3. pick the second seed c_r according to the distribution
   π_i = D_F(p_i || c_l) / Σ_{p_j ∈ S} D_F(p_j || c_l) (3)

Good approximation guarantees [Nock et al., ECML 2008]. Fast tree construction, nice splitting.
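A direct sketch of the seeding rule (3), instantiated with the extended KL divergence; the function and variable names are illustrative, not from the talk.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q) - p + q)

def bregman_2means_pp_seeds(points, D=kl, rng=None):
    """Seeding of the slide: first seed uniform, second seed drawn with probability
    proportional to its divergence from the first seed (distribution (3))."""
    rng = rng or np.random.default_rng(0)
    i_left = rng.integers(len(points))
    c_left = points[i_left]
    divs = np.array([D(p, c_left) for p in points])   # D_F(p_i || c_l)
    pi = divs / divs.sum()                            # normalization of eq. (3)
    i_right = rng.choice(len(points), p=pi)
    return points[i_left], points[i_right]

# Toy usage on random positive vectors.
S = np.random.default_rng(1).random((200, 16)) + 1e-6
c_l, c_r = bregman_2means_pp_seeds(S)
```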

Learning the tree branching factor (I)

Goal: obtain as many non-overlapping Bregman balls as possible.

Example: three well-separated Gaussian samples (figure: the dataset clustered with 2 means, centers c1, c2, and with 3 means, centers c1, c2, c3).

Method: adapt the branching factor bf_i of each internal node.

Learning the tree branching factor (II)

G-means:
- assume a Gaussian distribution for each group of points
- use the Bregman 2-means++ initialization to split a set
- apply the Anderson-Darling normality test to the two resulting clusters
- if the test returns true, keep the center; otherwise split it into two
- repeat for each new cluster

Ongoing work: generalization to goodness-of-fit tests for exponential family distributions (e.g. Stephens' test).
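A sketch of the normality check, assuming SciPy is available. Following the usual G-means recipe, the cluster is projected onto the axis joining the two tentative child centers before applying the Anderson-Darling test; the 5% significance level and all names are our choices, not from the talk.

```python
import numpy as np
from scipy.stats import anderson

def looks_gaussian(points, c_left, c_right, level_index=2):
    """Anderson-Darling normality test applied to the cluster points projected onto
    the axis joining the two tentative child centers."""
    v = c_right - c_left
    proj = points @ v / np.dot(v, v)
    proj = (proj - proj.mean()) / proj.std()
    result = anderson(proj, dist='norm')
    # Accept the one-Gaussian hypothesis when the statistic stays below the critical
    # value; level_index=2 corresponds to the 5% significance level in SciPy's table.
    return result.statistic < result.critical_values[level_index]

# Toy usage: points from one Gaussian usually pass; a mixture of two well-separated
# Gaussians usually fails, which would trigger a split.
rng = np.random.default_rng(2)
one = rng.normal(0.0, 1.0, size=(500, 4))
two = np.vstack([rng.normal(-4.0, 1.0, size=(250, 4)),
                 rng.normal(4.0, 1.0, size=(250, 4))])
axis = np.ones(4)
print(looks_gaussian(one, np.zeros(4), axis), looks_gaussian(two, np.zeros(4), axis))
```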

Handling symmetrized Bregman divergences

Why? They are required by content-based image retrieval (CBIR) systems, yet technically they are not Bregman divergences.

Example: the symmetrized KL (SKL) and the Jensen-Shannon divergence
JS(p; q) = (1/2) KL(p || (p+q)/2) + (1/2) KL(q || (p+q)/2)

Proposed solutions:
- symmetrized Bregman centroid of B(c, R): geodesic-walk algorithm of [Nielsen et al., SODA 2007]
- mixed BB-trees: store two centers for each ball B(l, r, R) and use the mixed Bregman divergence [Nock et al., ECML 2008]
  D_{F,α}(l || x || r) = (1 - α) D_F(l || x) + α D_F(x || r), α ∈ [0, 1] (4)
  (for α = 1/2 and l = r we recover the symmetrized Bregman divergence)
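A small sketch of the divergences above, with illustrative names and the extended KL on positive vectors; the final line checks that equation (4) with α = 1/2 and l = r reduces to the symmetrized KL.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q) - p + q)

def skl(p, q):
    """Symmetrized KL: (KL(p || q) + KL(q || p)) / 2."""
    return 0.5 * (kl(p, q) + kl(q, p))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mixed_bregman(l, x, r, alpha=0.5, D=kl):
    """Mixed Bregman divergence of eq. (4): (1 - alpha) D_F(l || x) + alpha D_F(x || r)."""
    return (1.0 - alpha) * D(l, x) + alpha * D(x, r)

# Sanity check: with alpha = 1/2 and l = r = p, the mixed divergence equals SKL(p, x).
p = np.array([0.2, 0.3, 0.5]); q = np.array([0.4, 0.4, 0.2])
print(np.isclose(mixed_bregman(p, q, p), skl(p, q)))
```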

Nearest neighbors for image retrieval

Task: find images similar to a query.
- S: dataset of feature vectors (descriptors)
- q: descriptor of a query image
- retrieve the most similar descriptor (image) NN(q)

Example: SIFT descriptors [Lowe, IJCV 2005].

Dataset: 10,000 images from the PASCAL Visual Object Classes Challenge 2007
- 10,000 database points (for building the tree)
- 2,360 query points (for on-line search)
- dimension d = 1111

Performance evaluation

Approximate search: find a good NN, i.e. a point close enough to the true NN, by exploring a given budget of leaves, from near-exact search down to visiting one single leaf.

Measures:
- speed-up: number of divergence computations (ratio of brute force over BB-tree++)
- R_avg: average approximated NN rank
- NC: number of points closer to the approximated NN (NC = R_avg - 1)

BB-tree construction performance

- iter: number of k-means iterations
- bs: maximum number of points in a leaf
- depth: maximum tree depth
- depth_avg: average tree depth
- nleaves: number of leaf nodes

BB-tree construction (bs = 50):

method      iter   depth   depth_avg   nleaves   speed-up
2-means     10     53      28.57       594       1
2-means++   10     58.33   31.18       647       1.03
2-means++   0      20      10.76       362       19.71

Asymmetric NN queries: BB-tree vs BB-tree++

(Plot: SIFT dataset with the KL divergence; speed-up versus number closer (NC), comparing the BB-tree++ (BF = 4, bs = 100) with Cayton's BB-tree.)

Symmetrized NN queries: BB-tree++ vs randomized sampling

(Plot: SIFT dataset with the symmetrized KL (SKL); speed-up with respect to brute force versus average approximate NN rank, on log-log scale, comparing the BB-tree++ (BF = 4, bs = 100) with randomized sampling.)

Conclusion

BB-tree++:
- adapted to the inner geometric characteristics of the data
- sped-up construction (careful k-means initialization)
- sped-up search (priority queue)
- handles symmetrized Bregman divergences
- promising results for image retrieval (SIFT histograms)

Ongoing work:
- design the most appropriate divergence for a class of data
- extensive application to feature sets arising from image retrieval/classification