Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
|
|
- Benjamin Ross
- 6 years ago
- Views:
Transcription
1 Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
2 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees 13.2 M-Trees Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2
3 13.0 Indexes for Multimedia Data Multimedia databases Images Audio data Video data Description of multimedia objects Usually (multidimensional) real-valued feature vectors But also: skeletons, chain codes,... The sequential search for similar objects in databases is very inefficient How can we speed up the search? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3
4 13.0 Indexes for Multimedia Data Speed up search through indexing Efficient management of multidimensional information Pre-structuring of data for the subsequent search functionality Efficient data structures, combined with search and comparison algorithms Transition from set semantics to list semantics To which degree does the object from the database satisfy the query? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4
5 13.0 Indexes for Multimedia Data Requirements for a multidimensional index structure Correctness and completeness of the corresponding indexing algorithms Scalability with dimension growth Support objects which are not real-valued vectors Search efficiency (sublinear) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5
6 13.0 Indexes for Multimedia Data Different types of queries: Exact search Area search Nearest neighbors search... Efficient update operations Support for various distance functions Low memory requirements Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6
7 13.0 Indexes for Multimedia Data Fundamental problem: The more dimensions, the more comparisons are needed There is currently no truly scalable indexing Cause: Curse of Dimensionality (Richard Bellman) The volume of space grows exponentially with the number of its dimensions Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 7
8 13.0 Query Types Exact search Point search Area search k-nearest-neighbor search (k - NN-search) Find the k objects that have the least distance to the object given as reference in the request k-nn search is usually only calculated on approximation basis (with a specified error) due to the high cost Reverse-nearest-neighbor search Find all the objects whose nearest neighbor is provided in the query Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 8
9 13.0 Tree Structures Search in database systems B-tree structures allow exact search with logarithmic costs Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 9
10 13.0 Tree Structures Search in multimedia databases The data is multidimensional, B-trees however, support only one-dimensional search Are there any possibilities to extend tree functionality for multidimensional data? Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 10
11 13.0 Tree Structures The basic idea of multidimensional trees Describe the sets of points through geometric regions, which comprise the points (clusters) The clusters are considered for the actual search and not the individual points Clusters can contain each other, resulting in a hierarchical structure Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 11
12 13.0 Tree Structures Differentiating criterias for tree structures: Cluster construction: Completely fragmenting the space or Grouping data locally Cluster overlap: Overlapping or Disjoint Balance: Balanced or Unbalanced Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 12
13 13.0 Tree Structures Object storage: Objects in leaves and nodes, or Objects only in the leaves Geometry: Hyper-spheres, Hyper-cube,... Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 13
14 13.1 R-Trees The R-tree (Guttman, 1984) is the prototype of a multi-dimensional extension of the classical B-trees Frequently used for low-dimensional applications (used to about 10 dimensions), such as geographic information systems More scalable versions: R + -Trees, R*-Trees and X- Trees (each up to 20 dimensions for uniform distributed data) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 14
15 13.1 R-Tree Structure Dynamic Index Structure (insert, update and delete are possible) Data structure Data pages are leaf nodes and store clustered point data and data objects Directory pages are the internal nodes and store directory entries Multidimensional data are structured with the help of Minimum Bounding Rectangles (MBRs) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 15
16 13.1 R-Tree Example R1 R4 R5 R6 R3 R10 R11 root R1 R2 R3 R2 R7 X Q R4 R5 R6 R7 R8 R9 R10 R11 R8 X O X p R9 root Q P O Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 16
17 13.1 R-Tree Characteristics Local grouping for clustering Overlapping clusters (the more the clusters overlap the more inefficient is the index) Height balanced tree structure (therefore all the children of a node in the tree have about the same number of successors) Objects are stored, only in the leaves Internal nodes are used for navigation MBRs are used as a geometry Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 17
18 13.1 R-Tree Properties The root has at least two children Each internal node has between m and M children M and m M / 2 are pre-defined parameters For each entry (I, child-pointer) in an internal node, I is the smallest rectangle that contains the rectangles of the child nodes Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 18
19 13.1 R-Tree Properties For each index entry (I, tuple-id) in a leaf, I is the smallest bounding rectangle that contains the data object (with the ID tuple-id) All the leaves in the tree are on the same level All leaves have between m and M index records Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 19
20 13.1 Operations of R-Trees The essential operations for the use and management of an R-tree are Search Insert Updates Delete Splitting Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 20
21 13.1 Searching in R-Trees The tree is searched recursively from the root to the leaves One path is selected If the requested record has not been found in that sub-tree, the next path is traversed The path selection is arbitrary Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 21
22 13.1 Searching in R-Trees No guarantee for good performance In the worst case, all paths must traversed (due to overlaps of the MBRs) Search algorithms try to exclude as many irrelevant regions as possible ( pruning ) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 22
23 13.1 Search Algorithm All the index entries which intersect with the search rectangle S are traversed The search in internal nodes Check each object for intersection with S For all intersecting entries continue the search in their children The search in leaf nodes Check all the entries to determine whether they intersect S Take all the correct objects in the result set Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 23
24 13.1 Example R1 R5 R3 R11 root R4 R6 R2 R7 R10 X R1 R2 R3 X R7 R8 R9 R8 S R9 root Check all the objects in node R8 Check only 7 nodes instead of 12 Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 24
25 13.1 Insert Procedure The best leaf page is chosen (ChooseLeaf) considering the spatial criteria Beast leaf: the leaf that needs the smallest volume growth to include the new object The object will be inserted there if there is enough room (number of objects in the node < M) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 25
26 13.1 Insert If there is no more place left in the node, it is considered a case for overflow and the node is divided (SplitNode) Goal of the split is to result in minimal overlap and as small dead space as possible Interval of the parent node must be adapted to the new object (AdjustTree) If the root is reached by division, then create a new root whose children are the two split nodes of the old root Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 26
27 13.1 R-Tree Insert Example R1 R5 R2 R3 R11 R2 R7 R2 R7 R4 R6 R10 R8 x P R9 R8 x P R9 R2 R8 R7 x P R9 root Inserting P either in R7 or R9 In R7, it needs more space, but does not overlap Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 27
28 13.1 Heuristics An object is always inserted in the nodes, to which it produces the smallest increase in volume If it falls in the interior of a MBR no enlargement is need If there are several possible nodes, then select the one with the smallest volume Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 28
29 13.1 Insert with Overflow R1 R5 R3 R11 R2 R7 R7b R4 R6 R10 R8 R9 X P R2 R7 X P root R8 R9 root R1 R2 R3 R4 R5 R6 R7 R7b R8 R9 R10 R11 Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 29
30 13.1 SplitNode If an object is inserted in a full node, then the M+1 objects will be divided among two new nodes The goal in splitting is that it should rarely be needed to traverse both resulting nodes on subsequent searches Therefore use small MBRs. This leads to minimal overlapping with other MBRs Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 30
31 13.1 Split Example Calculate the minimum total area of two rectangles, and minimize the dead space Bad split Better Split Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 31
32 13.1 Overflow Problem Deciding on how exactly to perform the splits is not trivial All objects of the old MBR can be divided in different ways on two new MBRs The volume of both resulting MBRs should remain as small as possible The naive approach of checking checks all splits and calculate the resulting volumes is not possible Two approaches With quadratic cost With linear cost Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 32
33 13.1 Overflow Problem Procedure with quadratic cost Compute for each 2 objects the necessary MBR and choose the pair with the largest MBR Since these two objects should not occur in an MBR, they will be used as starting points for two new MBRs Compute for all other objects, the difference of the necessary volume increase with respect to both MBRs Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 33
34 13.1 Overflow Problem Insert the object with the smallest difference in the corresponding MBR and compute the MBR again Repeat this procedure for all unallocated objects Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 34
35 13.1 Overflow Problem Procedure with linear cost In each dimension: Find the rectangle with the highest minimum coordinates, and the rectangle with the smallest maximum coordinates Determine the distance between these two coordinates, and normalize it on the size of all the rectangles in this dimension Determine the two starting points of the new MBRs as the two objects with the highest normalized distance Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 35
36 13.1 Example 5 A D B E 8 13 C 14 x-direction: select A and E, as d x = diff x /max x = 5 / 14 y-direction: select C and D, as d y = diff y /max y = 8 / 13 Since d x < d y, C and D are chosen for the split Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 36
37 13.1 Overflow Problem Classify all remaining objects the MBR with the smallest volume growth The linear process is a simplification of the quadratic method It is usually sufficient providing similar quality of the split (minimal overlap of the resulting MBRs) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 37
38 13.1 Delete Procedure Search the leaf node with the object to delete (FindLeaf) Delete the object (deleterecord) The tree is condensed (CondenseTree) if the resulting node has < m objects When condensing, a node is completely erased and the objects of the node which should have remained are reinserted If the root remains with just one child, the child will become the new root Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 38
39 13.1 Example An object from R9 is deleted (1 object remains in R9, but m = 2) Due to few objects R9 is deleted, and R2 is reduced (condensetree) R2 R7 R2 R7 root R8 R9 R8 R1 R2 R3 R4 R5 R6 R7 R8 R10 R11 Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 39
40 13.1 Update If a record is updated, its surrounding rectangle can change The index entry must then be deleted updated and then re-inserted Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 40
41 13.1 Block Access Cost The most efficient search in R-trees is performed when the overlap and the dead space are minimal root A D C N F M E E L G S H K J I B A B C D E F G H I J K L M N Avoiding overlapping is only possible if data points are known in advance Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 41
42 13.1 Improved Versions of R-Trees Where are R-trees inefficient? They allow overlapping between neighboring MBRs R + -Trees (Sellis ua, 1987) Overlapping of neighboring MBRs are prohibited This may lead to identical leafs occurring more than once in the tree Improve search efficiency, but similar scalability as R-trees Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 42
43 13.1 R + -Trees A D F E P G S H K J I B root A B C P C M N L D E F G I J K L M N G H Overlaps are not permitted (A and P) Data rectangles are divided and may be present (e.g., G) in several leafs Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 43
44 13.1 Operations in R + -Trees Differences to the R-tree Insert Data object can be inserted into several leafs Splitting continues downwards, since no overlaps are allowed Delete There is no more minimum number of children Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 44
45 13.1 Performance The main advantage of R + -trees is to improve the search performance Especially for point queries, this saves 50% of access time Drawback is the low occupancy of nodes resulting through many splits R + -trees often degenerate with the increasing number of changes Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 45
46 13.1 More Versions R*- trees and X-trees improve the performance of the R + -trees (Kriegel and others, 1990/1996) Improved split algorithm in R*-trees Extended nodes in X-trees allow sequential search of larger objects Scalable up to 20 dimensions Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 46
47 13.2 M-Trees M-tree (Ciaccia et al, 1997) allows the use of arbitrary metrics for comparison of objects ( metric trees ) R-trees only work with Euclidean metrics, but what about for example, the editing distance? Use the triangle inequality to check sub-trees Geometry is determined by the distance function Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 47
48 13.2 Metric Space A metric space is a pair of M = (U, d) U is the universe of all possible values d is a metric For all x, y, z U: d(x, y) 0, d(x, y) = 0 iff. x = y d(x, y) = d(y, x) d(x, y) d(x, z) + d(z, y) (triangle inequality) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 48
49 13.2 Triangle Inequality Precomputed: Distances for all pairs of points Task: Find the object with the smallest distance to Q Distance between Q and a is 2 Distance between Q and b is 7.81 Can C be the best object? d (Q, b) d (Q, c) + d ( b, c) 5,51 d (Q, c) No. Therefore a is better Q a b a b c a b 2.30 c c Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 49
50 13.2 Partitioning The M-tree partitions the objects in ε-environments with certain radius A balanced partitioning is obtained by choosing P1 = { p d(p, v) r v } and P2 = { p d(p, v) > r v } so that P1 P2 P / 2 For a query q with d(q, x) < r only P2 must be considered Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 50
51 13.2 M-Trees M-trees are similar to R-trees, but use the distance information Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 51
52 13.2 M-Trees Each node N has a region Reg(N) Reg Reg(N) = {p p U, d(p, v N ) r N } With v N as so called routing object and r N as the radius of the area ( covering radius ) All the indexed points p have guaranteed distance of at most r N from v N Queries qwith d(q, v N ) > r N + r don t need to consider node N Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 52
53 13.2 M-Trees Internal nodes have A routing object The radius of their region and A distance to the parent node Leaf nodes have The values of the indexed objects and Their distance from the parent node Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 53
54 13.2 M-Trees Precomputed distances to the respective parent nodes allow fast searching ( fast pruning ) d(v P, v N ) is precomputed. We don t need d(q, v N ) if d(q, v P ) d(v P, v N ) > r N + r P P N N Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 54
55 13.2 M-Trees Insert is performed as by R-trees with the smallest expansion of the region radius At overflow, a split is performed No volumes are however calculated (as in MBRs in the R-tree) Delete the node and choose two new routing objects Heuristic: Minimize the maximum of the two resulting region radiuses Attribute then the routing objects to the new regions alternating between their nearest neighbors (Balanced Split) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 55
56 13.2 M-Trees M-Trees overview Allow a variety of distance functions Use triangle inequality for pruning The dimensionality is also very limited Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 56
57 Next lecture Indexes for Multimedia Data Curse of Dimensionality Dimension Reduction GEMINI Indexing Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 57
Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data
1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
More information5 Spatial Access Methods
5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and
More information5 Spatial Access Methods
5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia
More informationCarnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2
More informationAN EFFICIENT INDEXING STRUCTURE FOR TRAJECTORIES IN GEOGRAPHICAL INFORMATION SYSTEMS SOWJANYA SUNKARA
AN EFFICIENT INDEXING STRUCTURE FOR TRAJECTORIES IN GEOGRAPHICAL INFORMATION SYSTEMS By SOWJANYA SUNKARA Bachelor of Science (Engineering) in Computer Science University of Madras Chennai, TamilNadu, India
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationNearest Neighbor Search with Keywords
Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,
More informationEfficient Spatial Data Structure for Multiversion Management of Engineering Drawings
Efficient Structure for Multiversion Management of Engineering Drawings Yasuaki Nakamura Department of Computer and Media Technologies, Hiroshima City University Hiroshima, 731-3194, Japan and Hiroyuki
More informationSPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM
SPATIAL DATA MINING Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM INTRODUCTION The main difference between data mining in relational DBS and in spatial DBS is that attributes of the neighbors
More informationCS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C.
CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring 2006 Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C. Latombe Scribe: Neda Nategh How do you update the energy function during the
More informationAdvanced Implementations of Tables: Balanced Search Trees and Hashing
Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More informationLAYERED R-TREE: AN INDEX STRUCTURE FOR THREE DIMENSIONAL POINTS AND LINES
LAYERED R-TREE: AN INDEX STRUCTURE FOR THREE DIMENSIONAL POINTS AND LINES Yong Zhang *, Lizhu Zhou *, Jun Chen ** * Department of Computer Science and Technology, Tsinghua University Beijing, China, 100084
More informationMultimedia Databases. Previous Lecture. 4.1 Multiresolution Analysis. 4 Shape-based Features. 4.1 Multiresolution Analysis
Previous Lecture Multimedia Databases Texture-Based Image Retrieval Low Level Features Tamura Measure, Random Field Model High-Level Features Fourier-Transform, Wavelets Wolf-Tilo Balke Silviu Homoceanu
More informationMaking Nearest Neighbors Easier. Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4. Outline. Chapter XI
Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology Making Nearest
More informationTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia
More informationNearest Neighbor Search with Keywords in Spatial Databases
776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,
More informationData Warehousing & Data Mining
13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.
More informationLecture 2: A Las Vegas Algorithm for finding the closest pair of points in the plane
Randomized Algorithms Lecture 2: A Las Vegas Algorithm for finding the closest pair of points in the plane Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Randomized
More informationSEARCHING THROUGH SPATIAL RELATIONSHIPS USING THE 2DR-TREE
SEARCHING THROUGH SPATIAL RELATIONSHIPS USING THE DR-TREE Wendy Osborn Department of Mathematics and Computer Science University of Lethbridge 0 University Drive W Lethbridge, Alberta, Canada email: wendy.osborn@uleth.ca
More informationMultimedia Databases. 4 Shape-based Features. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis
4 Shape-based Features Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 4 Multiresolution Analysis
More informationTransforming Hierarchical Trees on Metric Spaces
CCCG 016, Vancouver, British Columbia, August 3 5, 016 Transforming Hierarchical Trees on Metric Spaces Mahmoodreza Jahanseir Donald R. Sheehy Abstract We show how a simple hierarchical tree called a cover
More informationApproximate Voronoi Diagrams
CS468, Mon. Oct. 30 th, 2006 Approximate Voronoi Diagrams Presentation by Maks Ovsjanikov S. Har-Peled s notes, Chapters 6 and 7 1-1 Outline Preliminaries Problem Statement ANN using PLEB } Bounds and
More informationMultimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 4 Previous Lecture Texture-Based Image Retrieval Low
More informationData Structures and Algorithms " Search Trees!!
Data Structures and Algorithms " Search Trees!! Outline" Binary Search Trees! AVL Trees! (2,4) Trees! 2 Binary Search Trees! "! < 6 2 > 1 4 = 8 9 Ordered Dictionaries" Keys are assumed to come from a total
More informationLocation-Based R-Tree and Grid-Index for Nearest Neighbor Query Processing
ISBN 978-93-84468-20-0 Proceedings of 2015 International Conference on Future Computational Technologies (ICFCT'2015) Singapore, March 29-30, 2015, pp. 136-142 Location-Based R-Tree and Grid-Index for
More informationCS Data Structures and Algorithm Analysis
CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter 6, part 2 R. Paul Wiegand George Mason University, Department of Computer Science March 22, 2006 Outline 1 Balanced Trees 2 Heaps &
More informationBinary Search Trees. Motivation
Binary Search Trees Motivation Searching for a particular record in an unordered list takes O(n), too slow for large lists (databases) If the list is ordered, can use an array implementation and use binary
More informationSPATIAL INDEXING. Vaibhav Bajpai
SPATIAL INDEXING Vaibhav Bajpai Contents Overview Problem with B+ Trees in Spatial Domain Requirements from a Spatial Indexing Structure Approaches SQL/MM Standard Current Issues Overview What is a Spatial
More informationMultimedia Databases. Previous Lecture Video Abstraction Video Abstraction Example 6/20/2013
Previous Lecture Multimedia Databases Hidden Markov Models (continued from last lecture) Introduction into Video Retrieval Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More information12 Review and Outlook
12 Review and Outlook 12.1 Review 12.2 Outlook http://www-kdd.isti.cnr.it/nwa Spatial Databases and GIS Karl Neumann, Sarah Tauscher Ifis TU Braunschweig 926 What are the basic functions of a geographic
More informationHW #4. (mostly by) Salim Sarımurat. 1) Insert 6 2) Insert 8 3) Insert 30. 4) Insert S a.
HW #4 (mostly by) Salim Sarımurat 04.12.2009 S. 1. 1. a. 1) Insert 6 2) Insert 8 3) Insert 30 4) Insert 40 2 5) Insert 50 6) Insert 61 7) Insert 70 1. b. 1) Insert 12 2) Insert 29 3) Insert 30 4) Insert
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationSpatial Database. Ahmad Alhilal, Dimitris Tsaras
Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information
More informationLecture 10 September 27, 2016
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 10 September 27, 2016 Scribes: Quinten McNamara & William Hoza 1 Overview In this lecture, we focus on constructing coresets, which are
More informationContent-based Publish/Subscribe using Distributed R-trees
Content-based Publish/Subscribe using Distributed R-trees Silvia Bianchi, Pascal Felber and Maria Gradinariu 2 University of Neuchâtel, Switzerland 2 LIP6, INRIA-Université Paris 6, France Abstract. Publish/subscribe
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Decision Trees Instructor: Yang Liu 1 Supervised Classifier X 1 X 2. X M Ref class label 2 1 Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short}
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 10 Video Retrieval Shot Detection 10 Video Retrieval
More informationOn multi-scale display of geometric objects
Data & Knowledge Engineering 40 2002) 91±119 www.elsevier.com/locate/datak On multi-scale display of geometric objects Edward P.F. Chan *, Kevin K.W. Chow Department of Computer Science, University of
More informationSearch and Lookahead. Bernhard Nebel, Julien Hué, and Stefan Wölfl. June 4/6, 2012
Search and Lookahead Bernhard Nebel, Julien Hué, and Stefan Wölfl Albert-Ludwigs-Universität Freiburg June 4/6, 2012 Search and Lookahead Enforcing consistency is one way of solving constraint networks:
More informationCompressed Index for Dynamic Text
Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More informationEach internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =.
7.5 (a, b)-trees 7.5 (a, b)-trees Definition For b a an (a, b)-tree is a search tree with the following properties. all leaves have the same distance to the root. every internal non-root vertex v has at
More informationLecture 4: Divide and Conquer: van Emde Boas Trees
Lecture 4: Divide and Conquer: van Emde Boas Trees Series of Improved Data Structures Insert, Successor Delete Space This lecture is based on personal communication with Michael Bender, 001. Goal We want
More informationSpace-Time Tradeoffs for Approximate Spherical Range Counting
Space-Time Tradeoffs for Approximate Spherical Range Counting Sunil Arya Theocharis Malamatos David M. Mount University of Maryland Technical Report CS TR 4842 and UMIACS TR 2006 57 November 2006 Abstract
More informationCollision. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST
Collision Kuan-Yu Chen ( 陳冠宇 ) 2018/12/17 @ TR-212, NTUST Review Hash table is a data structure in which keys are mapped to array positions by a hash function When two or more keys map to the same memory
More informationMid Term-1 : Practice problems
Mid Term-1 : Practice problems These problems are meant only to provide practice; they do not necessarily reflect the difficulty level of the problems in the exam. The actual exam problems are likely to
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationMultiple-Site Distributed Spatial Query Optimization using Spatial Semijoins
11 Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins Wendy OSBORN a, 1 and Saad ZAAMOUT a a Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge,
More informationAI Programming CS S-06 Heuristic Search
AI Programming CS662-2013S-06 Heuristic Search David Galles Department of Computer Science University of San Francisco 06-0: Overview Heuristic Search - exploiting knowledge about the problem Heuristic
More informationBinary Search Trees. Lecture 29 Section Robb T. Koether. Hampden-Sydney College. Fri, Apr 8, 2016
Binary Search Trees Lecture 29 Section 19.2 Robb T. Koether Hampden-Sydney College Fri, Apr 8, 2016 Robb T. Koether (Hampden-Sydney College) Binary Search Trees Fri, Apr 8, 2016 1 / 40 1 Binary Search
More informationData Warehousing & Data Mining
9. Business Intelligence Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 9. Business Intelligence
More informationMultimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Hidden Markov Models (continued from
More informationAVL Trees. Manolis Koubarakis. Data Structures and Programming Techniques
AVL Trees Manolis Koubarakis 1 AVL Trees We will now introduce AVL trees that have the property that they are kept almost balanced but not completely balanced. In this way we have O(log n) search time
More informationAn efficient 3D R-tree spatial index method for virtual geographic environments
ISPRS Journal of Photogrammetry & Remote Sensing 62 (2007) 217 224 www.elsevier.com/locate/isprsjprs An efficient 3D R-tree spatial index method for virtual geographic environments Qing Zhu a,, Jun Gong
More informationProbabilistic Model Checking Michaelmas Term Dr. Dave Parker. Department of Computer Science University of Oxford
Probabilistic Model Checking Michaelmas Term 2011 Dr. Dave Parker Department of Computer Science University of Oxford Probabilistic model checking System Probabilistic model e.g. Markov chain Result 0.5
More informationData Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition
Data Mining Classification: Basic Concepts and Techniques Lecture Notes for Chapter 3 by Tan, Steinbach, Karpatne, Kumar 1 Classification: Definition Given a collection of records (training set ) Each
More informationData Warehousing & Data Mining
Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Association Rule Mining Apriori
More informationSuccinct Data Structures for Approximating Convex Functions with Applications
Succinct Data Structures for Approximating Convex Functions with Applications Prosenjit Bose, 1 Luc Devroye and Pat Morin 1 1 School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6, {jit,morin}@cs.carleton.ca
More informationPavel Zezula, Giuseppe Amato,
SIMILARITY SEARCH The Metric Space Approach Pavel Zezula Giuseppe Amato Vlastislav Dohnal Michal Batko Table of Content Part I: Metric searching in a nutshell Foundations of metric space searching Survey
More informationMultimedia Databases Video Abstraction Video Abstraction Example Example 1/8/ Video Retrieval Shot Detection
10 Video Retrieval Shot Detection Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 10 Video Retrieval
More informationFinite Metric Spaces & Their Embeddings: Introduction and Basic Tools
Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools Manor Mendel, CMI, Caltech 1 Finite Metric Spaces Definition of (semi) metric. (M, ρ): M a (finite) set of points. ρ a distance function
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors Background and Two Challenges Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura McGill University, July 2007 1 / 29 Outline
More informationDictionary: an abstract data type
2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees
More informationSearch Trees. Chapter 10. CSE 2011 Prof. J. Elder Last Updated: :52 AM
Search Trees Chapter 1 < 6 2 > 1 4 = 8 9-1 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 2 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 3 - Binary Search Trees Ø A binary search
More informationP Q1 Q2 Q3 Q4 Q5 Tot (60) (20) (20) (20) (60) (20) (200) You are allotted a maximum of 4 hours to complete this exam.
Exam INFO-H-417 Database System Architecture 13 January 2014 Name: ULB Student ID: P Q1 Q2 Q3 Q4 Q5 Tot (60 (20 (20 (20 (60 (20 (200 Exam modalities You are allotted a maximum of 4 hours to complete this
More informationDatabase Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler
Database Theory Database Theory VU 181.140, SS 2018 5. Complexity of Query Evaluation Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 17 April, 2018 Pichler
More informationExact and Approximate Flexible Aggregate Similarity Search
Noname manuscript No. (will be inserted by the editor) Exact and Approximate Flexible Aggregate Similarity Search Feifei Li, Ke Yi 2, Yufei Tao 3, Bin Yao 4, Yang Li 4, Dong Xie 4, Min Wang 5 University
More informationData Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary How to build a DW The DW Project:
More informationCost models for distance joins queries using R-trees
Data & Knowledge Engineering xxx (2005) xxx xxx www.elsevier.com/locate/datak Cost models for distance joins queries using R-trees Antonio Corral a, Yannis Manolopoulos b, *, Yannis Theodoridis c, Michael
More informationComposite Quantization for Approximate Nearest Neighbor Search
Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 104, joint work with my interns Ting Zhang from
More informationExecution time analysis of a top-down R-tree construction algorithm
Information Processing Letters 101 (2007) 6 12 www.elsevier.com/locate/ipl Execution time analysis of a top-down R-tree construction algorithm Houman Alborzi, Hanan Samet Department of Computer Science
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationarxiv: v2 [cs.ds] 3 Oct 2017
Orthogonal Vectors Indexing Isaac Goldstein 1, Moshe Lewenstein 1, and Ely Porat 1 1 Bar-Ilan University, Ramat Gan, Israel {goldshi,moshe,porately}@cs.biu.ac.il arxiv:1710.00586v2 [cs.ds] 3 Oct 2017 Abstract
More informationErik G. Hoel. Hanan Samet. Center for Automation Research. University of Maryland. November 7, Abstract
Proc. of the 21st Intl. Conf. onvery Large Data Bases, Zurich, Sept. 1995, pp. 606{618 Benchmarking Spatial Join Operations with Spatial Output Erik G. Hoel Hanan Samet Computer Science Department Center
More informationAn Algorithmist s Toolkit Nov. 10, Lecture 17
8.409 An Algorithmist s Toolkit Nov. 0, 009 Lecturer: Jonathan Kelner Lecture 7 Johnson-Lindenstrauss Theorem. Recap We first recap a theorem (isoperimetric inequality) and a lemma (concentration) from
More informationSearch Trees. EECS 2011 Prof. J. Elder Last Updated: 24 March 2015
Search Trees < 6 2 > 1 4 = 8 9-1 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 2 - Learning Outcomes Ø From this lecture, you should be able to: q Define the properties of a binary search
More informationDistribution-specific analysis of nearest neighbor search and classification
Distribution-specific analysis of nearest neighbor search and classification Sanjoy Dasgupta University of California, San Diego Nearest neighbor The primeval approach to information retrieval and classification.
More informationDecision Procedures An Algorithmic Point of View
An Algorithmic Point of View ILP References: Integer Programming / Laurence Wolsey Deciding ILPs with Branch & Bound Intro. To mathematical programming / Hillier, Lieberman Daniel Kroening and Ofer Strichman
More informationRESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE
RESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE Yuan-chun Zhao a, b, Cheng-ming Li b a. Shandong University of Science and Technology, Qingdao 266510 b. Chinese Academy of
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider
More informationDatabases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)
Databases DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR) References Hashing Techniques: Elmasri, 7th Ed. Chapter 16, section 8. Cormen, 3rd Ed. Chapter 11. Inverted indexing: Elmasri,
More informationMore Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?
More informationarxiv: v1 [cs.cg] 5 Sep 2018
Randomized Incremental Construction of Net-Trees Mahmoodreza Jahanseir Donald R. Sheehy arxiv:1809.01308v1 [cs.cg] 5 Sep 2018 Abstract Net-trees are a general purpose data structure for metric data that
More informationThe Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan
More informationEvolutionary Tree Analysis. Overview
CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based
More informationEvaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments
Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Reynold Cheng, Dmitri V. Kalashnikov Sunil Prabhakar The Hong Kong Polytechnic University, Hung Hom, Kowloon,
More informationNearest Neighbor Search with Keywords: Compression
Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points
More informationIndexing Objects Moving on Fixed Networks
Indexing Objects Moving on Fixed Networks Elias Frentzos Department of Rural and Surveying Engineering, National Technical University of Athens, Zographou, GR-15773, Athens, Hellas efrentzo@central.ntua.gr
More informationA Single-Exponential Fixed-Parameter Algorithm for Distance-Hereditary Vertex Deletion
A Single-Exponential Fixed-Parameter Algorithm for Distance-Hereditary Vertex Deletion Eduard Eiben a, Robert Ganian a, O-joung Kwon b a Algorithms and Complexity Group, TU Wien, Vienna, Austria b Logic
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit
More informationData Warehousing & Data Mining
Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Association Rule Mining Apriori algorithm,
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 4: Query Optimization Query Optimization Cost estimation Strategies for exploring plans Q min CS 347 Notes 4 2 Cost Estimation Based on
More informationIE418 Integer Programming
IE418: Integer Programming Department of Industrial and Systems Engineering Lehigh University 2nd February 2005 Boring Stuff Extra Linux Class: 8AM 11AM, Wednesday February 9. Room??? Accounts and Passwords
More informationSimilarity searching, or how to find your neighbors efficiently
Similarity searching, or how to find your neighbors efficiently Robert Krauthgamer Weizmann Institute of Science CS Research Day for Prospective Students May 1, 2009 Background Geometric spaces and techniques
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More information