Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann
|
|
- Rosamond Wilkins
- 5 years ago
- Views:
Transcription
1 Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich
2 Moving Objects Intersection Finding Position at a future time instance t [x = r cos(ωt) y = r sin(wt)] [x = p + ut y = q + vt] Moving Object Database r, ω p, q, u, v 1/ 15
3 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v 1 = r 2 + p 2 + q 2 + 2rp + 2rq A = 1 2 = 2[u(r-p) + v(r-q)] B = t 3 = -2rp C = 1+ sin(ωt) 4 = -2rq D = 1+ cos(ωt) 5 = -2ru E = t[1+ sin(ωt)] 6 = -2rv F = t[1+ cos(ωt)] 7 = u 2 +v 2 G = t 2 2/ 15
4 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v 1 = r 2 + p 2 + q 2 + 2rp + 2rq A = 1 2 = 2[u(r-p) + v(r-q)] B = t 3 = -2rp C = 1+ sin(ωt) 4 = -2rq D = 1+ cos(ωt) 5 = -2ru E = t[1+ sin(ωt)] 6 = -2rv F = t[1+ cos(ωt)] 7 = u 2 +v 2 G = t 2 Function (known) Query Parameters (unknown) 2/ 15
5 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 2/ 15
6 Moving Objects Intersection Finding Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 2/ 15
7 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction ARIMA Time Series Prediction Model: Heart-Rate at time t = S t + B 3/ 15
8 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction ARIMA Time Series Prediction Model: Heart-Rate at time t = S t + B Find all patients for whom the predicted heart rate at time t is more than an input threshold H. CREATE FUNCTION Critical_Patient ( INPUT double Threshold, double Time RETURN PatientID FROM Patient WHERE S Time + B H ) Unknown 3/ 15
9 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction S Time + B H Unknown Scalar Product Query (Time, 1). (S, B) H Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 4/ 15
10 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction S Time + B H Unknown 4/ 15
11 Problem Statement Inequality Query Find all data points x that satisfy: (a, F(x)) b Applications: moving-object-intersection finding half-space range search complex SQL functions Top-k Nearest Neighbor Query Find the top-k data points x satisfying (a, F(x)) b, that also minimize: (a, F(x)) b / a Applications: top-k nearest points to hyper plane active learning 5/ 15
12 Related Work THEOR MACHINE LEARNING Half-space Range Searching - Agarwal et. al. [PODS 98], Matousek et. al. [Computational Geometry 92] Linear Constraint Queries - Goldstein et. al. [PODS 97] Nearest Neighbor Queries - Liu et. al. [ICML 12], Jain et. al. [NIPS 10] DATABASES Top-k Queries with Ranking Function - Chang et. al. [SIGMOD 00], in et. al. [SIGMOD 07], Li et. al. [SIGMOD 05], Ilyas et. al. [ACM Comp. Survey 08], Hristidis et. al. [SIGMOD 01], Ram et. al. [KDD 12] Index for Moving Objects - Nascimento et. al. [R-tree, SAC 98], Sistla et. al. [ICDE 97], Kollios et. al. [PODS 99], Saltenis et. al. [TPR-Tree, SIGMOD 00], Jensen et. al. [B x -Tree, VLDB 04], Tao et. al. [SIGMOD 02], Zhang et. al. [MBR Tree, VLDB J 12] 6/ 15
13 Related Work THEOR MACHINE LEARNING Half-space Range Searching - Agarwal et. al. [PODS 98], Matousek et. al. [Computational Geometry 92] Linear Constraint Queries - Goldstein et. al. [PODS 97] Nearest Neighbor Queries - Liu et. al. [ICML 12], Jain et. al. [NIPS 10] DATABASES Top-k Queries with Ranking Function - Chang et. al. [SIGMOD 00], in et. al. [SIGMOD 07], Li et. al. [SIGMOD 05], Ilyas et. al. [ACM Comp. Survey 08], Hristidis et. al. [SIGMOD 01], Ram et. al. [KDD 12] Index for Moving Objects - Nascimento et. al. [R-tree, SAC 98], Sistla et. al. [ICDE 97], Kollios et. al. [PODS 99], Saltenis et. al. [TPR-Tree, SIGMOD 00], Jensen et. al. [B x -Tree, VLDB 04], Tao et. al. [SIGMOD 02], Zhang et. al. [MBR Tree, VLDB J 12] 6/ 15
14 Planar Index: Geometrical Indexing x 1 x 4 x 3 x 2 x 6 x 5 Query Processing using Planar Index 7/ 15
15 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 7/ 15
16 Planar Index: Geometrical Indexing x 1 x 4 x 3 x 2 x 6 x 5 Query Processing using Planar Index 7/ 15
17 Planar Index: Geometrical Indexing r 1 One Planar Index = Collection of Parallel Hyper-planes Passing through the Data Points r 2 r 3 x 1 r 4 r 5 x 4 x 3 x 2 r 6 x 6 x 5 p 6 p 5 p 4 p 3 p 2 p 1 Query Processing using Planar Index 7/ 15
18 Planar Index: Geometrical Indexing Query: (a, x) b q 2 x 1 x 4 x 3 x 2 x 6 x 5 q 1 Query Processing using Planar Index 7/ 15
19 Planar Index: Geometrical Indexing r 1 Query: (a, x) b q 2 x 1 Accept x 2 q 1 p 1 Query Processing using Planar Index 7/ 15
20 Planar Index: Geometrical Indexing Query: (a, x) b q 2 Reject r 5 x 6 x 5 p 5 q 1 Query Processing using Planar Index 7/ 15
21 Planar Index: Geometrical Indexing Query: (a, x) b q 2 Verify r 3 x 4 x 3 q 1 p 3 Query Processing using Planar Index 7/ 15
22 Planar Index: Geometrical Indexing Accept Query: (a, x) b Verify x 1 x 4 x 2 Reject x 6 x 5 x 3 Query Processing using Planar Index 7/ 15
23 Planar Index: Time and Space Complexity Accept Verify Query: (a, x) b Reject Query Processing using Planar Index Index Time: O(n log n) Index Space: O(n) Query Processing Time: O(d log n + t) ~ O(d n) 8/ 15
24 Multiple Planar Indices Query: (a, x) b Multiple Planar Indices 9/ 15
25 Best Index Selection at Query Time Query: (a, x) b Accept Reject Accept Reject Verify Planar Index at Right is Better for the Given Query 9/ 15
26 Top-k Nearest Neighbor Query Query: (a, x) b; Top-k Closest Points to Query Hyper plane x 3 x 4 x 2 x 1 x 6 x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15
27 Top-k Nearest Neighbor Query Query: (a, x) b; Top-2 Closest Points to Query Hyper plane x 3 x 4 x 2 x 1 x 6 Reject x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15
28 Top-k Nearest Neighbor Query Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Verify x 3 x 4 x 6 x 2 x 1 Process 6, 5 5 Top-2 Buffer Reject x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15
29 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process 4 x x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15
30 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process 3 x x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15
31 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 4 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15
32 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 4 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Prune Process x 6 Top-2 Buffer Reject x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15
33 List of Experiments Datasets: - Real-World: CMoment, Ctexture, Electricity Consumption - Synthetic: Independent, Correlated, Anti-Correlated List of Experiments: - Efficiency vs. No of Index - Efficiency vs. No of Dimension - Efficiency vs. Randomness of Query - Efficiency vs. Query Selectivity - Pruning Capacity vs. No of Index - Pruning Capacity vs. No of Dimension - Pruning Capacity vs. Randomness of Query - Pruning Capacity vs. Query Selectivity - Scalability of Index Building, Query Processing - Dynamic Index Updating - Memory Usage of Planar Index Experimentally Evaluated Planar Index in: Moving-Object Intersection Top-k Nearest Neighbor Query 11/ 15
34 Dataset and Query Datasets: # Data Points # Dimension # Attribute Range CMoment (Real-World) 68,040 9 ( 4.15, 4.59) Independent (Synthetic) 1,000, (1, 100) Query: Q Q Q d d 75 (Q 1 + Q Q d ) Query Selectivity Randomness of Query (QR): Q i (1,n) 12/ 15
35 Efficiency (Real-World Dataset) # Dimension = 9 # Index = ~ 4.5 times better than Baseline 13/ 15
36 Efficiency (Synthetic Dataset) # Index = 100 Dimension = 2 Dimension = 6 12 ~ 170 times better than Baseline 1.6 ~ 400 times better than Baseline 14/ 15
37 Application: Moving Object Intersection Intersection Finding among 5K 5K Moving Objects Objects moving with uniform velocity 12 ~ 50 times better than Baseline Objects moving with acceleration 27 ~ 55 times better than Baseline 15/ 15
38 Conclusion Scalar product query widely applicable Planar index one generalized index for many problems Application in moving object intersection finding Future Work: Dynamic updates in planar indices based on past query workload Software and Dataset: (Publicly Available)
Nearest Neighbor Search with Keywords in Spatial Databases
776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,
More informationIndexing Objects Moving on Fixed Networks
Indexing Objects Moving on Fixed Networks Elias Frentzos Department of Rural and Surveying Engineering, National Technical University of Athens, Zographou, GR-15773, Athens, Hellas efrentzo@central.ntua.gr
More informationTASM: Top-k Approximate Subtree Matching
TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,
More informationSemantics of Ranking Queries for Probabilistic Data and Expected Ranks
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Graham Cormode AT&T Labs Feifei Li FSU Ke Yi HKUST 1-1 Uncertain, uncertain, uncertain... (Probabilistic, probabilistic, probabilistic...)
More informationNearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University
Nearest Neighbor Searching Under Uncertainty Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University Nearest Neighbor Searching (NNS) S: a set of n points in R. q: any
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia
More informationMultimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data
1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
More informationCarnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2
More informationPROBABILISTIC SKYLINE QUERIES OVER UNCERTAIN MOVING OBJECTS. Xiaofeng Ding, Hai Jin, Hui Xu. Wei Song
Computing and Informatics, Vol. 32, 2013, 987 1012 PROBABILISTIC SKYLINE QUERIES OVER UNCERTAIN MOVING OBJECTS Xiaofeng Ding, Hai Jin, Hui Xu Services Computing Technology and System Lab Cluster and Grid
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)
More informationLarge-scale Collaborative Ranking in Near-Linear Time
Large-scale Collaborative Ranking in Near-Linear Time Liwei Wu Depts of Statistics and Computer Science UC Davis KDD 17, Halifax, Canada August 13-17, 2017 Joint work with Cho-Jui Hsieh and James Sharpnack
More informationk-selection Query over Uncertain Data
k-selection Query over Uncertain Data Xingjie Liu 1 Mao Ye 2 Jianliang Xu 3 Yuan Tian 4 Wang-Chien Lee 5 Department of Computer Science and Engineering, The Pennsylvania State University, University Park,
More informationComposite Quantization for Approximate Nearest Neighbor Search
Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 104, joint work with my interns Ting Zhang from
More informationPivot Selection Techniques
Pivot Selection Techniques Proximity Searching in Metric Spaces by Benjamin Bustos, Gonzalo Navarro and Edgar Chávez Catarina Moreira Outline Introduction Pivots and Metric Spaces Pivots in Nearest Neighbor
More informationEstimating the Selectivity of tf-idf based Cosine Similarity Predicates
Estimating the Selectivity of tf-idf based Cosine Similarity Predicates Sandeep Tata Jignesh M. Patel Department of Electrical Engineering and Computer Science University of Michigan 22 Hayward Street,
More informationPrediction of Citations for Academic Papers
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationK-Nearest Neighbor Temporal Aggregate Queries
Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research,
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More informationEnabling Accurate Analysis of Private Network Data
Enabling Accurate Analysis of Private Network Data Michael Hay Joint work with Gerome Miklau, David Jensen, Chao Li, Don Towsley University of Massachusetts, Amherst Vibhor Rastogi, Dan Suciu University
More informationSpatial Database. Ahmad Alhilal, Dimitris Tsaras
Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information
More informationConfiguring Spatial Grids for Efficient Main Memory Joins
Configuring Spatial Grids for Efficient Main Memory Joins Farhan Tauheed, Thomas Heinis, and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne (EPFL), Imperial College London Abstract. The performance
More informationApproximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points
Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points Timothy M. Chan Zahed Rahmati Abstract Given a set of n moving points in R d, where each point moves
More informationGeometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat
Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat What we know so far Decision Trees What is a decision tree, and how to induce it from data Fundamental
More informationBuilding a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI
Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Charles Lo and Paul Chow {locharl1, pc}@eecg.toronto.edu Department of Electrical and Computer Engineering
More informationReverse Area Skyline in a Map
Vol., No., 7 Reverse Area Skyline in a Map Annisa Graduate School of Engineering, Hiroshima University, Japan Asif Zaman Graduate School of Engineering, Hiroshima University, Japan Yasuhiko Morimoto Graduate
More informationSPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM
SPATIAL DATA MINING Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM INTRODUCTION The main difference between data mining in relational DBS and in spatial DBS is that attributes of the neighbors
More informationSingle-tree GMM training
Single-tree GMM training Ryan R. Curtin May 27, 2015 1 Introduction In this short document, we derive a tree-independent single-tree algorithm for Gaussian mixture model training, based on a technique
More informationProbabilistic Nearest-Neighbor Query on Uncertain Objects
Probabilistic Nearest-Neighbor Query on Uncertain Objects Hans-Peter Kriegel, Peter Kunath, Matthias Renz University of Munich, Germany {kriegel, kunath, renz}@dbs.ifi.lmu.de Abstract. Nearest-neighbor
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationIdentifying Top k Dominating Objects over Uncertain Data
Identifying Top k Dominating Objects over Uncertain Data Liming Zhan, Ying Zhang, Wenjie Zhang, and Xuemin Lin University of New South Wales, Sydney NSW, Australia {zhanl,yingz,zhangw,lxue}@cse.unsw.edu.au
More informationA Survey on Spatial-Keyword Search
A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms
More informationFantope Regularization in Metric Learning
Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction
More informationTime Series Data Cleaning
Time Series Data Cleaning Shaoxu Song http://ise.thss.tsinghua.edu.cn/sxsong/ Dirty Time Series Data Unreliable Readings Sensor monitoring GPS trajectory J. Freire, A. Bessa, F. Chirigati, H. T. Vo, K.
More informationThe τ-skyline for Uncertain Data
CCCG 2014, Halifax, Nova Scotia, August 11 13, 2014 The τ-skyline for Uncertain Data Haitao Wang Wuzhou Zhang Abstract In this paper, we introduce the notion of τ-skyline as an alternative representation
More information12.5 Equations of Lines and Planes
12.5 Equations of Lines and Planes Equation of Lines Vector Equation of Lines Parametric Equation of Lines Symmetric Equation of Lines Relation Between Two Lines Equations of Planes Vector Equation of
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationThe Power of Asymmetry in Binary Hashing
The Power of Asymmetry in Binary Hashing Behnam Neyshabur Yury Makarychev Toyota Technological Institute at Chicago Russ Salakhutdinov University of Toronto Nati Srebro Technion/TTIC Search by Image Image
More informationFinding Frequent Items in Probabilistic Data
Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008
More informationMachine Learning on temporal data
Machine Learning on temporal data Learning Dissimilarities on Time Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan Time series structure and
More informationTracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules
Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules Elaine P. M. de Sousa, Marcela X. Ribeiro, Agma J. M. Traina, and Caetano Traina Jr. Department of Computer Science
More informationarxiv: v2 [cs.db] 5 May 2011
Inverse Queries For Multidimensional Spaces Thomas Bernecker 1, Tobias Emrich 1, Hans-Peter Kriegel 1, Nikos Mamoulis 2, Matthias Renz 1, Shiming Zhang 2, and Andreas Züfle 1 arxiv:113.172v2 [cs.db] May
More informationSPATIAL INDEXING. Vaibhav Bajpai
SPATIAL INDEXING Vaibhav Bajpai Contents Overview Problem with B+ Trees in Spatial Domain Requirements from a Spatial Indexing Structure Approaches SQL/MM Standard Current Issues Overview What is a Spatial
More informationAssignment 2 (Sol.) Introduction to Machine Learning Prof. B. Ravindran
Assignment 2 (Sol.) Introduction to Machine Learning Prof. B. Ravindran 1. Let A m n be a matrix of real numbers. The matrix AA T has an eigenvector x with eigenvalue b. Then the eigenvector y of A T A
More informationCost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty
Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty Zhenyu Liu University of California Los Angeles, CA 995 vicliu@cs.ucla.edu Ka Cheung Sia University of California
More informationPredictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data
Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com
More informationWindow-aware Load Shedding for Aggregation Queries over Data Streams
Window-aware Load Shedding for Aggregation Queries over Data Streams Nesime Tatbul Stan Zdonik Talk Outline Background Load shedding in Aurora Windowed aggregation queries Window-aware load shedding Experimental
More informationA Streaming Algorithm for 2-Center with Outliers in High Dimensions
CCCG 2015, Kingston, Ontario, August 10 12, 2015 A Streaming Algorithm for 2-Center with Outliers in High Dimensions Behnam Hatami Hamid Zarrabi-Zadeh Abstract We study the 2-center problem with outliers
More informationThe Truncated Tornado in TMBB: A Spatiotemporal Uncertainty Model for Moving Objects
The : A Spatiotemporal Uncertainty Model for Moving Objects Shayma Alkobaisi 1, Petr Vojtěchovský 2, Wan D. Bae 3, Seon Ho Kim 4, Scott T. Leutenegger 4 1 College of Information Technology, UAE University,
More informationDistance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More informationStreaming multiscale anomaly detection
Streaming multiscale anomaly detection DATA-ENS Paris and ThalesAlenia Space B Ravi Kiran, Université Lille 3, CRISTaL Joint work with Mathieu Andreux beedotkiran@gmail.com June 20, 2017 (CRISTaL) Streaming
More informationMachine Learning. for Image Retrieval. Edward Chang Associate Professor, Electrical Engineering, UC Santa Barbara CTO, VIMA Technologies
Machine Learning for Image Retrieval Edward Chang Associate Professor, Electrical Engineering, UC Santa Barbara CTO, VIMA Technologies 4/5/2004 JHU-APL 1 Are They Similar? 4/5/2004 JHU-APL 2 Are They Similar?
More informationcient and E ective Similarity Search based on Earth Mover s Distance
36th International Conference on Very Large Data Bases E cient and E ective Similarity Search over Probabilistic Data based on Earth Mover s Distance Jia Xu 1, Zhenjie Zhang 2, Anthony K.H. Tung 2, Ge
More informationProbabilistic Similarity Search for Uncertain Time Series
Probabilistic Similarity Search for Uncertain Time Series Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 Munich, Germany
More informationEvaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments
Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Reynold Cheng, Dmitri V. Kalashnikov Sunil Prabhakar The Hong Kong Polytechnic University, Hung Hom, Kowloon,
More informationCMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang /
CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors Furong Huang / furongh@cs.umd.edu What we know so far Decision Trees What is a decision tree, and how to induce it from
More informationMaintaining Frequent Itemsets over High-Speed Data Streams
Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,
More informationClassification Using Decision Trees. Jackknife Estimator: Example 1. Data Mining. Jackknife Estimator: Example 2(cont. Jackknife Estimator: Example 2
Data Miig CS 341, Sprig 2007 Lecture 8: Decisio tree algorithms Jackkife Estimator: Example 1 Estimate of mea for X={x 1, x 2, x 3,}, =3, g=3, m=1, θ = µ = (x( 1 + x 2 + x 3 )/3 θ 1 = (x( 2 + x 3 )/2,
More information1 Handling of Continuous Attributes in C4.5. Algorithm
.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous
More informationSBASS: Segment based approach for subsequence searches in sequence databases
Comput Syst Sci & Eng (2007) 1-2: 37-46 2007 CRL Publishing Ltd International Journal of Computer Systems Science & Engineering SBASS: Segment based approach for subsequence searches in sequence databases
More informationData Canopy. Accelerating Exploratory Statistical Analysis. Abdul Wasay Xinding Wei Niv Dayan Stratos Idreos
Accelerating Exploratory Statistical Analysis Abdul Wasay inding Wei Niv Dayan Stratos Idreos Statistics are everywhere! Algorithms Systems Analytic Pipelines 80 Temperature 60 40 20 0 May 2017 80 Temperature
More informationChapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining
Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of
More informationSpatial Keyword Search by using Point Regional QuadTree Algorithm
Spatial Keyword Search by using Point Regional QuadTree Algorithm Ms Harshitha A.R 1, Smt. Christy Persya A 2 PG Scholar, Department of ISE (SSE), BNMIT, Bangalore 560085, Karnataka, India 1 Associate
More informationNearest Neighbor Search for Relevance Feedback
Nearest Neighbor earch for Relevance Feedbac Jelena Tešić and B.. Manjunath Electrical and Computer Engineering Department University of California, anta Barbara, CA 93-9 {jelena, manj}@ece.ucsb.edu Abstract
More informationA Hybrid Prediction Model for Moving Objects
A Hybrid Prediction Model for Moving Objects Hoyoung Jeung, Qing Liu, Heng Tao Shen, Xiaofang Zhou The University of Queensland, National ICT Australia (NICTA), Brisbane {hoyoung, shenht, zxf}@itee.uq.edu.au
More information2005 Journal of Software (, ) A Spatial Index to Improve the Response Speed of WebGIS Servers
1000-9825/2005/16(05)0819 2005 Journal of Software Vol.16, No.5 WebGIS +,, (, 410073) A Spatial Index to Improve the Response Speed of WebGIS Servers YE Chang-Chun +, LUO Jin-Ping, ZHOU Xing-Ming (Institute
More informationComputing Probability Threshold Set Similarity on Probabilistic Sets
Computing Probability Threshold Set Similarity on Probabilistic Sets Lei Wang, Ming Gao, Rong Zhang, Cheqing Jin, Aoying Zhou SeiLWang@163.com,{mgao, rzhang, cqjin, ayzhou}@sei.ecnu.edu.cn Institute for
More informationDot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics
Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is
More informationNearest Neighbor Search with Keywords: Compression
Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points
More informationLearning Time-Series Shapelets
Learning Time-Series Shapelets Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany SIGKDD 14,
More information1 Handling of Continuous Attributes in C4.5. Algorithm
.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous
More informationP leiades: Subspace Clustering and Evaluation
P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de
More informationOn Symmetric and Asymmetric LSHs for Inner Product Search
Behnam Neyshabur Nathan Srebro Toyota Technological Institute at Chicago, Chicago, IL 6637, USA BNEYSHABUR@TTIC.EDU NATI@TTIC.EDU Abstract We consider the problem of designing locality sensitive hashes
More informationPointwise Exact Bootstrap Distributions of Cost Curves
Pointwise Exact Bootstrap Distributions of Cost Curves Charles Dugas and David Gadoury University of Montréal 25th ICML Helsinki July 2008 Dugas, Gadoury (U Montréal) Cost curves July 8, 2008 1 / 24 Outline
More information1 Approximate Quantiles and Summaries
CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity
More informationUniversity of California, Berkeley. ABSTRACT
Jagged Bite Problem NP-Complete Construction Kris Hildrum and Megan Thomas Computer Science Division University of California, Berkeley fhildrum,mctg@cs.berkeley.edu ABSTRACT Ahyper-rectangle is commonly
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationImpression Store: Compressive Sensing-based Storage for. Big Data Analytics
Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in
More informationInitial Sampling for Automatic Interactive Data Exploration
Initial Sampling for Automatic Interactive Data Exploration Wenzhao Liu 1, Yanlei Diao 1, and Anna Liu 2 1 College of Information and Computer Sciences, University of Massachusetts, Amherst 2 Department
More informationa Short Introduction
Collaborative Filtering in Recommender Systems: a Short Introduction Norm Matloff Dept. of Computer Science University of California, Davis matloff@cs.ucdavis.edu December 3, 2016 Abstract There is a strong
More informationMaking Nearest Neighbors Easier. Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4. Outline. Chapter XI
Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology Making Nearest
More informationAnomaly Detection via Over-sampling Principal Component Analysis
Anomaly Detection via Over-sampling Principal Component Analysis Yi-Ren Yeh, Zheng-Yi Lee, and Yuh-Jye Lee Abstract Outlier detection is an important issue in data mining and has been studied in different
More informationLinear Regression 1 / 25. Karl Stratos. June 18, 2018
Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationRemoving trivial associations in association rule discovery
Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationGeodatabase Programming with Python John Yaist
Geodatabase Programming with Python John Yaist DevSummit DC February 26, 2016 Washington, DC Target Audience: Assumptions Basic knowledge of Python Basic knowledge of Enterprise Geodatabase and workflows
More informationQuery Optimization: Exercise
Query Optimization: Exercise Session 6 Bernhard Radke November 27, 2017 Maximum Value Precedence (MVP) [1] Weighted Directed Join Graph (WDJG) Weighted Directed Join Graph (WDJG) 1000 0.05 R 1 0.005 R
More informationMathematics 2203, Test 1 - Solutions
Mathematics 220, Test 1 - Solutions F, 2010 Philippe B. Laval Name 1. Determine if each statement below is True or False. If it is true, explain why (cite theorem, rule, property). If it is false, explain
More informationNormalization Techniques in Training of Deep Neural Networks
Normalization Techniques in Training of Deep Neural Networks Lei Huang ( 黄雷 ) State Key Laboratory of Software Development Environment, Beihang University Mail:huanglei@nlsde.buaa.edu.cn August 17 th,
More informationTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia
More informationConquering the Complexity of Time: Machine Learning for Big Time Series Data
Conquering the Complexity of Time: Machine Learning for Big Time Series Data Yan Liu Computer Science Department University of Southern California Mini-Workshop on Theoretical Foundations of Cyber-Physical
More informationUniversal Trajectory Queries for Moving Object Databases
Universal Trajectory Queries for Moving Object Databases Hoda M. O. Mokhtar Jianwen Su Department of Computer Science University of California at Santa Barbara {hmokhtar, su}@cs.ucsb.edu Abstract In this
More informationIBM Research Report. A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces
RC24859 (W0909-03) September 0, 2009 Computer Science IBM Research Report A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces George Saon, Peder Olsen
More informationLocation-Based R-Tree and Grid-Index for Nearest Neighbor Query Processing
ISBN 978-93-84468-20-0 Proceedings of 2015 International Conference on Future Computational Technologies (ICFCT'2015) Singapore, March 29-30, 2015, pp. 136-142 Location-Based R-Tree and Grid-Index for
More informationIssues Concerning Dimensionality and Similarity Search
Issues Concerning Dimensionality and Similarity Search Jelena Tešić, Sitaram Bhagavathy, and B. S. Manjunath Electrical and Computer Engineering Department University of California, Santa Barbara, CA 936-956
More informationLinear Spectral Hashing
Linear Spectral Hashing Zalán Bodó and Lehel Csató Babeş Bolyai University - Faculty of Mathematics and Computer Science Kogălniceanu 1., 484 Cluj-Napoca - Romania Abstract. assigns binary hash keys to
More informationGreedy Maximization Framework for Graph-based Influence Functions
Greedy Maximization Framework for Graph-based Influence Functions Edith Cohen Google Research Tel Aviv University HotWeb '16 1 Large Graphs Model relations/interactions (edges) between entities (nodes)
More informationOptimal Color Range Reporting in One Dimension
Optimal Color Range Reporting in One Dimension Yakov Nekrich 1 and Jeffrey Scott Vitter 1 The University of Kansas. yakov.nekrich@googlemail.com, jsv@ku.edu Abstract. Color (or categorical) range reporting
More informationFlexible Aggregate Similarity Search
Flexible Aggregate Similarity Search Yang Li Feifei Li 2 Ke Yi 3 Bin Yao 2 Min Wang 4 rainfallen@sjtu.edu.cn, {lifeifei, yao}@cs.fsu.edu 2, yike@cse.ust.hk 3, min.wang6@hp.com 4 Shanghai JiaoTong University,
More information