Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann

Size: px
Start display at page:

Download "Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann"

Transcription

1 Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich

2 Moving Objects Intersection Finding Position at a future time instance t [x = r cos(ωt) y = r sin(wt)] [x = p + ut y = q + vt] Moving Object Database r, ω p, q, u, v 1/ 15

3 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v 1 = r 2 + p 2 + q 2 + 2rp + 2rq A = 1 2 = 2[u(r-p) + v(r-q)] B = t 3 = -2rp C = 1+ sin(ωt) 4 = -2rq D = 1+ cos(ωt) 5 = -2ru E = t[1+ sin(ωt)] 6 = -2rv F = t[1+ cos(ωt)] 7 = u 2 +v 2 G = t 2 2/ 15

4 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v 1 = r 2 + p 2 + q 2 + 2rp + 2rq A = 1 2 = 2[u(r-p) + v(r-q)] B = t 3 = -2rp C = 1+ sin(ωt) 4 = -2rq D = 1+ cos(ωt) 5 = -2ru E = t[1+ sin(ωt)] 6 = -2rv F = t[1+ cos(ωt)] 7 = u 2 +v 2 G = t 2 Function (known) Query Parameters (unknown) 2/ 15

5 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 2/ 15

6 Moving Objects Intersection Finding Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 2/ 15

7 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction ARIMA Time Series Prediction Model: Heart-Rate at time t = S t + B 3/ 15

8 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction ARIMA Time Series Prediction Model: Heart-Rate at time t = S t + B Find all patients for whom the predicted heart rate at time t is more than an input threshold H. CREATE FUNCTION Critical_Patient ( INPUT double Threshold, double Time RETURN PatientID FROM Patient WHERE S Time + B H ) Unknown 3/ 15

9 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction S Time + B H Unknown Scalar Product Query (Time, 1). (S, B) H Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 4/ 15

10 More Applications: Complex SQL Function Patient ID S B P P P P Patient Dataset for Heart-Rate Prediction S Time + B H Unknown 4/ 15

11 Problem Statement Inequality Query Find all data points x that satisfy: (a, F(x)) b Applications: moving-object-intersection finding half-space range search complex SQL functions Top-k Nearest Neighbor Query Find the top-k data points x satisfying (a, F(x)) b, that also minimize: (a, F(x)) b / a Applications: top-k nearest points to hyper plane active learning 5/ 15

12 Related Work THEOR MACHINE LEARNING Half-space Range Searching - Agarwal et. al. [PODS 98], Matousek et. al. [Computational Geometry 92] Linear Constraint Queries - Goldstein et. al. [PODS 97] Nearest Neighbor Queries - Liu et. al. [ICML 12], Jain et. al. [NIPS 10] DATABASES Top-k Queries with Ranking Function - Chang et. al. [SIGMOD 00], in et. al. [SIGMOD 07], Li et. al. [SIGMOD 05], Ilyas et. al. [ACM Comp. Survey 08], Hristidis et. al. [SIGMOD 01], Ram et. al. [KDD 12] Index for Moving Objects - Nascimento et. al. [R-tree, SAC 98], Sistla et. al. [ICDE 97], Kollios et. al. [PODS 99], Saltenis et. al. [TPR-Tree, SIGMOD 00], Jensen et. al. [B x -Tree, VLDB 04], Tao et. al. [SIGMOD 02], Zhang et. al. [MBR Tree, VLDB J 12] 6/ 15

13 Related Work THEOR MACHINE LEARNING Half-space Range Searching - Agarwal et. al. [PODS 98], Matousek et. al. [Computational Geometry 92] Linear Constraint Queries - Goldstein et. al. [PODS 97] Nearest Neighbor Queries - Liu et. al. [ICML 12], Jain et. al. [NIPS 10] DATABASES Top-k Queries with Ranking Function - Chang et. al. [SIGMOD 00], in et. al. [SIGMOD 07], Li et. al. [SIGMOD 05], Ilyas et. al. [ACM Comp. Survey 08], Hristidis et. al. [SIGMOD 01], Ram et. al. [KDD 12] Index for Moving Objects - Nascimento et. al. [R-tree, SAC 98], Sistla et. al. [ICDE 97], Kollios et. al. [PODS 99], Saltenis et. al. [TPR-Tree, SIGMOD 00], Jensen et. al. [B x -Tree, VLDB 04], Tao et. al. [SIGMOD 02], Zhang et. al. [MBR Tree, VLDB J 12] 6/ 15

14 Planar Index: Geometrical Indexing x 1 x 4 x 3 x 2 x 6 x 5 Query Processing using Planar Index 7/ 15

15 Moving Objects Intersection Finding Find all object pairs that will be within distance S at time instance t A 1 + B 2 + C 3 + D 4 + E 5 + F 6 + G 7 S² Moving Object Database r, ω p, q, u, v Scalar Product Query: (A B C D E F G). ( ) S² Query Parameters (unknown) Function (known) Inequality Parameter (unknown) 7/ 15

16 Planar Index: Geometrical Indexing x 1 x 4 x 3 x 2 x 6 x 5 Query Processing using Planar Index 7/ 15

17 Planar Index: Geometrical Indexing r 1 One Planar Index = Collection of Parallel Hyper-planes Passing through the Data Points r 2 r 3 x 1 r 4 r 5 x 4 x 3 x 2 r 6 x 6 x 5 p 6 p 5 p 4 p 3 p 2 p 1 Query Processing using Planar Index 7/ 15

18 Planar Index: Geometrical Indexing Query: (a, x) b q 2 x 1 x 4 x 3 x 2 x 6 x 5 q 1 Query Processing using Planar Index 7/ 15

19 Planar Index: Geometrical Indexing r 1 Query: (a, x) b q 2 x 1 Accept x 2 q 1 p 1 Query Processing using Planar Index 7/ 15

20 Planar Index: Geometrical Indexing Query: (a, x) b q 2 Reject r 5 x 6 x 5 p 5 q 1 Query Processing using Planar Index 7/ 15

21 Planar Index: Geometrical Indexing Query: (a, x) b q 2 Verify r 3 x 4 x 3 q 1 p 3 Query Processing using Planar Index 7/ 15

22 Planar Index: Geometrical Indexing Accept Query: (a, x) b Verify x 1 x 4 x 2 Reject x 6 x 5 x 3 Query Processing using Planar Index 7/ 15

23 Planar Index: Time and Space Complexity Accept Verify Query: (a, x) b Reject Query Processing using Planar Index Index Time: O(n log n) Index Space: O(n) Query Processing Time: O(d log n + t) ~ O(d n) 8/ 15

24 Multiple Planar Indices Query: (a, x) b Multiple Planar Indices 9/ 15

25 Best Index Selection at Query Time Query: (a, x) b Accept Reject Accept Reject Verify Planar Index at Right is Better for the Given Query 9/ 15

26 Top-k Nearest Neighbor Query Query: (a, x) b; Top-k Closest Points to Query Hyper plane x 3 x 4 x 2 x 1 x 6 x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15

27 Top-k Nearest Neighbor Query Query: (a, x) b; Top-2 Closest Points to Query Hyper plane x 3 x 4 x 2 x 1 x 6 Reject x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15

28 Top-k Nearest Neighbor Query Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Verify x 3 x 4 x 6 x 2 x 1 Process 6, 5 5 Top-2 Buffer Reject x 8 x 7 x 5 Top-k Nearest Neighbor Query 10/ 15

29 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process 4 x x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15

30 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process 3 x x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15

31 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 4 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Process x 6 Top-2 Buffer x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15

32 Top-k Nearest Neighbor Query Lower Bound Distance x 3 x 4 x 2 x 1 Query: (a, x) b; Top-2 Closest Points to Query Hyper plane Prune Process x 6 Top-2 Buffer Reject x 7 x 5 x 8 Top-k Nearest Neighbor Query 10/ 15

33 List of Experiments Datasets: - Real-World: CMoment, Ctexture, Electricity Consumption - Synthetic: Independent, Correlated, Anti-Correlated List of Experiments: - Efficiency vs. No of Index - Efficiency vs. No of Dimension - Efficiency vs. Randomness of Query - Efficiency vs. Query Selectivity - Pruning Capacity vs. No of Index - Pruning Capacity vs. No of Dimension - Pruning Capacity vs. Randomness of Query - Pruning Capacity vs. Query Selectivity - Scalability of Index Building, Query Processing - Dynamic Index Updating - Memory Usage of Planar Index Experimentally Evaluated Planar Index in: Moving-Object Intersection Top-k Nearest Neighbor Query 11/ 15

34 Dataset and Query Datasets: # Data Points # Dimension # Attribute Range CMoment (Real-World) 68,040 9 ( 4.15, 4.59) Independent (Synthetic) 1,000, (1, 100) Query: Q Q Q d d 75 (Q 1 + Q Q d ) Query Selectivity Randomness of Query (QR): Q i (1,n) 12/ 15

35 Efficiency (Real-World Dataset) # Dimension = 9 # Index = ~ 4.5 times better than Baseline 13/ 15

36 Efficiency (Synthetic Dataset) # Index = 100 Dimension = 2 Dimension = 6 12 ~ 170 times better than Baseline 1.6 ~ 400 times better than Baseline 14/ 15

37 Application: Moving Object Intersection Intersection Finding among 5K 5K Moving Objects Objects moving with uniform velocity 12 ~ 50 times better than Baseline Objects moving with acceleration 27 ~ 55 times better than Baseline 15/ 15

38 Conclusion Scalar product query widely applicable Planar index one generalized index for many problems Application in moving object intersection finding Future Work: Dynamic updates in planar indices based on past query workload Software and Dataset: (Publicly Available)

Nearest Neighbor Search with Keywords in Spatial Databases

Nearest Neighbor Search with Keywords in Spatial Databases 776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,

More information

Indexing Objects Moving on Fixed Networks

Indexing Objects Moving on Fixed Networks Indexing Objects Moving on Fixed Networks Elias Frentzos Department of Rural and Surveying Engineering, National Technical University of Athens, Zographou, GR-15773, Athens, Hellas efrentzo@central.ntua.gr

More information

TASM: Top-k Approximate Subtree Matching

TASM: Top-k Approximate Subtree Matching TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,

More information

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Graham Cormode AT&T Labs Feifei Li FSU Ke Yi HKUST 1-1 Uncertain, uncertain, uncertain... (Probabilistic, probabilistic, probabilistic...)

More information

Nearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University

Nearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University Nearest Neighbor Searching Under Uncertainty Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University Nearest Neighbor Searching (NNS) S: a set of n points in R. q: any

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data 1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

More information

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2

More information

PROBABILISTIC SKYLINE QUERIES OVER UNCERTAIN MOVING OBJECTS. Xiaofeng Ding, Hai Jin, Hui Xu. Wei Song

PROBABILISTIC SKYLINE QUERIES OVER UNCERTAIN MOVING OBJECTS. Xiaofeng Ding, Hai Jin, Hui Xu. Wei Song Computing and Informatics, Vol. 32, 2013, 987 1012 PROBABILISTIC SKYLINE QUERIES OVER UNCERTAIN MOVING OBJECTS Xiaofeng Ding, Hai Jin, Hui Xu Services Computing Technology and System Lab Cluster and Grid

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)

More information

Large-scale Collaborative Ranking in Near-Linear Time

Large-scale Collaborative Ranking in Near-Linear Time Large-scale Collaborative Ranking in Near-Linear Time Liwei Wu Depts of Statistics and Computer Science UC Davis KDD 17, Halifax, Canada August 13-17, 2017 Joint work with Cho-Jui Hsieh and James Sharpnack

More information

k-selection Query over Uncertain Data

k-selection Query over Uncertain Data k-selection Query over Uncertain Data Xingjie Liu 1 Mao Ye 2 Jianliang Xu 3 Yuan Tian 4 Wang-Chien Lee 5 Department of Computer Science and Engineering, The Pennsylvania State University, University Park,

More information

Composite Quantization for Approximate Nearest Neighbor Search

Composite Quantization for Approximate Nearest Neighbor Search Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 104, joint work with my interns Ting Zhang from

More information

Pivot Selection Techniques

Pivot Selection Techniques Pivot Selection Techniques Proximity Searching in Metric Spaces by Benjamin Bustos, Gonzalo Navarro and Edgar Chávez Catarina Moreira Outline Introduction Pivots and Metric Spaces Pivots in Nearest Neighbor

More information

Estimating the Selectivity of tf-idf based Cosine Similarity Predicates

Estimating the Selectivity of tf-idf based Cosine Similarity Predicates Estimating the Selectivity of tf-idf based Cosine Similarity Predicates Sandeep Tata Jignesh M. Patel Department of Electrical Engineering and Computer Science University of Michigan 22 Hayward Street,

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

K-Nearest Neighbor Temporal Aggregate Queries

K-Nearest Neighbor Temporal Aggregate Queries Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research,

More information

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal

More information

Enabling Accurate Analysis of Private Network Data

Enabling Accurate Analysis of Private Network Data Enabling Accurate Analysis of Private Network Data Michael Hay Joint work with Gerome Miklau, David Jensen, Chao Li, Don Towsley University of Massachusetts, Amherst Vibhor Rastogi, Dan Suciu University

More information

Spatial Database. Ahmad Alhilal, Dimitris Tsaras

Spatial Database. Ahmad Alhilal, Dimitris Tsaras Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information

More information

Configuring Spatial Grids for Efficient Main Memory Joins

Configuring Spatial Grids for Efficient Main Memory Joins Configuring Spatial Grids for Efficient Main Memory Joins Farhan Tauheed, Thomas Heinis, and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne (EPFL), Imperial College London Abstract. The performance

More information

Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points

Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points Timothy M. Chan Zahed Rahmati Abstract Given a set of n moving points in R d, where each point moves

More information

Geometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat

Geometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat What we know so far Decision Trees What is a decision tree, and how to induce it from data Fundamental

More information

Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI

Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Charles Lo and Paul Chow {locharl1, pc}@eecg.toronto.edu Department of Electrical and Computer Engineering

More information

Reverse Area Skyline in a Map

Reverse Area Skyline in a Map Vol., No., 7 Reverse Area Skyline in a Map Annisa Graduate School of Engineering, Hiroshima University, Japan Asif Zaman Graduate School of Engineering, Hiroshima University, Japan Yasuhiko Morimoto Graduate

More information

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM SPATIAL DATA MINING Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM INTRODUCTION The main difference between data mining in relational DBS and in spatial DBS is that attributes of the neighbors

More information

Single-tree GMM training

Single-tree GMM training Single-tree GMM training Ryan R. Curtin May 27, 2015 1 Introduction In this short document, we derive a tree-independent single-tree algorithm for Gaussian mixture model training, based on a technique

More information

Probabilistic Nearest-Neighbor Query on Uncertain Objects

Probabilistic Nearest-Neighbor Query on Uncertain Objects Probabilistic Nearest-Neighbor Query on Uncertain Objects Hans-Peter Kriegel, Peter Kunath, Matthias Renz University of Munich, Germany {kriegel, kunath, renz}@dbs.ifi.lmu.de Abstract. Nearest-neighbor

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

Identifying Top k Dominating Objects over Uncertain Data

Identifying Top k Dominating Objects over Uncertain Data Identifying Top k Dominating Objects over Uncertain Data Liming Zhan, Ying Zhang, Wenjie Zhang, and Xuemin Lin University of New South Wales, Sydney NSW, Australia {zhanl,yingz,zhangw,lxue}@cse.unsw.edu.au

More information

A Survey on Spatial-Keyword Search

A Survey on Spatial-Keyword Search A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Time Series Data Cleaning

Time Series Data Cleaning Time Series Data Cleaning Shaoxu Song http://ise.thss.tsinghua.edu.cn/sxsong/ Dirty Time Series Data Unreliable Readings Sensor monitoring GPS trajectory J. Freire, A. Bessa, F. Chirigati, H. T. Vo, K.

More information

The τ-skyline for Uncertain Data

The τ-skyline for Uncertain Data CCCG 2014, Halifax, Nova Scotia, August 11 13, 2014 The τ-skyline for Uncertain Data Haitao Wang Wuzhou Zhang Abstract In this paper, we introduce the notion of τ-skyline as an alternative representation

More information

12.5 Equations of Lines and Planes

12.5 Equations of Lines and Planes 12.5 Equations of Lines and Planes Equation of Lines Vector Equation of Lines Parametric Equation of Lines Symmetric Equation of Lines Relation Between Two Lines Equations of Planes Vector Equation of

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)

More information

The Power of Asymmetry in Binary Hashing

The Power of Asymmetry in Binary Hashing The Power of Asymmetry in Binary Hashing Behnam Neyshabur Yury Makarychev Toyota Technological Institute at Chicago Russ Salakhutdinov University of Toronto Nati Srebro Technion/TTIC Search by Image Image

More information

Finding Frequent Items in Probabilistic Data

Finding Frequent Items in Probabilistic Data Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008

More information

Machine Learning on temporal data

Machine Learning on temporal data Machine Learning on temporal data Learning Dissimilarities on Time Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan Time series structure and

More information

Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules

Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules Elaine P. M. de Sousa, Marcela X. Ribeiro, Agma J. M. Traina, and Caetano Traina Jr. Department of Computer Science

More information

arxiv: v2 [cs.db] 5 May 2011

arxiv: v2 [cs.db] 5 May 2011 Inverse Queries For Multidimensional Spaces Thomas Bernecker 1, Tobias Emrich 1, Hans-Peter Kriegel 1, Nikos Mamoulis 2, Matthias Renz 1, Shiming Zhang 2, and Andreas Züfle 1 arxiv:113.172v2 [cs.db] May

More information

SPATIAL INDEXING. Vaibhav Bajpai

SPATIAL INDEXING. Vaibhav Bajpai SPATIAL INDEXING Vaibhav Bajpai Contents Overview Problem with B+ Trees in Spatial Domain Requirements from a Spatial Indexing Structure Approaches SQL/MM Standard Current Issues Overview What is a Spatial

More information

Assignment 2 (Sol.) Introduction to Machine Learning Prof. B. Ravindran

Assignment 2 (Sol.) Introduction to Machine Learning Prof. B. Ravindran Assignment 2 (Sol.) Introduction to Machine Learning Prof. B. Ravindran 1. Let A m n be a matrix of real numbers. The matrix AA T has an eigenvector x with eigenvalue b. Then the eigenvector y of A T A

More information

Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty

Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty Zhenyu Liu University of California Los Angeles, CA 995 vicliu@cs.ucla.edu Ka Cheung Sia University of California

More information

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com

More information

Window-aware Load Shedding for Aggregation Queries over Data Streams

Window-aware Load Shedding for Aggregation Queries over Data Streams Window-aware Load Shedding for Aggregation Queries over Data Streams Nesime Tatbul Stan Zdonik Talk Outline Background Load shedding in Aurora Windowed aggregation queries Window-aware load shedding Experimental

More information

A Streaming Algorithm for 2-Center with Outliers in High Dimensions

A Streaming Algorithm for 2-Center with Outliers in High Dimensions CCCG 2015, Kingston, Ontario, August 10 12, 2015 A Streaming Algorithm for 2-Center with Outliers in High Dimensions Behnam Hatami Hamid Zarrabi-Zadeh Abstract We study the 2-center problem with outliers

More information

The Truncated Tornado in TMBB: A Spatiotemporal Uncertainty Model for Moving Objects

The Truncated Tornado in TMBB: A Spatiotemporal Uncertainty Model for Moving Objects The : A Spatiotemporal Uncertainty Model for Moving Objects Shayma Alkobaisi 1, Petr Vojtěchovský 2, Wan D. Bae 3, Seon Ho Kim 4, Scott T. Leutenegger 4 1 College of Information Technology, UAE University,

More information

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

More information

Streaming multiscale anomaly detection

Streaming multiscale anomaly detection Streaming multiscale anomaly detection DATA-ENS Paris and ThalesAlenia Space B Ravi Kiran, Université Lille 3, CRISTaL Joint work with Mathieu Andreux beedotkiran@gmail.com June 20, 2017 (CRISTaL) Streaming

More information

Machine Learning. for Image Retrieval. Edward Chang Associate Professor, Electrical Engineering, UC Santa Barbara CTO, VIMA Technologies

Machine Learning. for Image Retrieval. Edward Chang Associate Professor, Electrical Engineering, UC Santa Barbara CTO, VIMA Technologies Machine Learning for Image Retrieval Edward Chang Associate Professor, Electrical Engineering, UC Santa Barbara CTO, VIMA Technologies 4/5/2004 JHU-APL 1 Are They Similar? 4/5/2004 JHU-APL 2 Are They Similar?

More information

cient and E ective Similarity Search based on Earth Mover s Distance

cient and E ective Similarity Search based on Earth Mover s Distance 36th International Conference on Very Large Data Bases E cient and E ective Similarity Search over Probabilistic Data based on Earth Mover s Distance Jia Xu 1, Zhenjie Zhang 2, Anthony K.H. Tung 2, Ge

More information

Probabilistic Similarity Search for Uncertain Time Series

Probabilistic Similarity Search for Uncertain Time Series Probabilistic Similarity Search for Uncertain Time Series Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 Munich, Germany

More information

Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments

Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments Reynold Cheng, Dmitri V. Kalashnikov Sunil Prabhakar The Hong Kong Polytechnic University, Hung Hom, Kowloon,

More information

CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang /

CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang / CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors Furong Huang / furongh@cs.umd.edu What we know so far Decision Trees What is a decision tree, and how to induce it from

More information

Maintaining Frequent Itemsets over High-Speed Data Streams

Maintaining Frequent Itemsets over High-Speed Data Streams Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,

More information

Classification Using Decision Trees. Jackknife Estimator: Example 1. Data Mining. Jackknife Estimator: Example 2(cont. Jackknife Estimator: Example 2

Classification Using Decision Trees. Jackknife Estimator: Example 1. Data Mining. Jackknife Estimator: Example 2(cont. Jackknife Estimator: Example 2 Data Miig CS 341, Sprig 2007 Lecture 8: Decisio tree algorithms Jackkife Estimator: Example 1 Estimate of mea for X={x 1, x 2, x 3,}, =3, g=3, m=1, θ = µ = (x( 1 + x 2 + x 3 )/3 θ 1 = (x( 2 + x 3 )/2,

More information

1 Handling of Continuous Attributes in C4.5. Algorithm

1 Handling of Continuous Attributes in C4.5. Algorithm .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous

More information

SBASS: Segment based approach for subsequence searches in sequence databases

SBASS: Segment based approach for subsequence searches in sequence databases Comput Syst Sci & Eng (2007) 1-2: 37-46 2007 CRL Publishing Ltd International Journal of Computer Systems Science & Engineering SBASS: Segment based approach for subsequence searches in sequence databases

More information

Data Canopy. Accelerating Exploratory Statistical Analysis. Abdul Wasay Xinding Wei Niv Dayan Stratos Idreos

Data Canopy. Accelerating Exploratory Statistical Analysis. Abdul Wasay Xinding Wei Niv Dayan Stratos Idreos Accelerating Exploratory Statistical Analysis Abdul Wasay inding Wei Niv Dayan Stratos Idreos Statistics are everywhere! Algorithms Systems Analytic Pipelines 80 Temperature 60 40 20 0 May 2017 80 Temperature

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

Spatial Keyword Search by using Point Regional QuadTree Algorithm

Spatial Keyword Search by using Point Regional QuadTree Algorithm Spatial Keyword Search by using Point Regional QuadTree Algorithm Ms Harshitha A.R 1, Smt. Christy Persya A 2 PG Scholar, Department of ISE (SSE), BNMIT, Bangalore 560085, Karnataka, India 1 Associate

More information

Nearest Neighbor Search for Relevance Feedback

Nearest Neighbor Search for Relevance Feedback Nearest Neighbor earch for Relevance Feedbac Jelena Tešić and B.. Manjunath Electrical and Computer Engineering Department University of California, anta Barbara, CA 93-9 {jelena, manj}@ece.ucsb.edu Abstract

More information

A Hybrid Prediction Model for Moving Objects

A Hybrid Prediction Model for Moving Objects A Hybrid Prediction Model for Moving Objects Hoyoung Jeung, Qing Liu, Heng Tao Shen, Xiaofang Zhou The University of Queensland, National ICT Australia (NICTA), Brisbane {hoyoung, shenht, zxf}@itee.uq.edu.au

More information

2005 Journal of Software (, ) A Spatial Index to Improve the Response Speed of WebGIS Servers

2005 Journal of Software (, ) A Spatial Index to Improve the Response Speed of WebGIS Servers 1000-9825/2005/16(05)0819 2005 Journal of Software Vol.16, No.5 WebGIS +,, (, 410073) A Spatial Index to Improve the Response Speed of WebGIS Servers YE Chang-Chun +, LUO Jin-Ping, ZHOU Xing-Ming (Institute

More information

Computing Probability Threshold Set Similarity on Probabilistic Sets

Computing Probability Threshold Set Similarity on Probabilistic Sets Computing Probability Threshold Set Similarity on Probabilistic Sets Lei Wang, Ming Gao, Rong Zhang, Cheqing Jin, Aoying Zhou SeiLWang@163.com,{mgao, rzhang, cqjin, ayzhou}@sei.ecnu.edu.cn Institute for

More information

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is

More information

Nearest Neighbor Search with Keywords: Compression

Nearest Neighbor Search with Keywords: Compression Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points

More information

Learning Time-Series Shapelets

Learning Time-Series Shapelets Learning Time-Series Shapelets Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany SIGKDD 14,

More information

1 Handling of Continuous Attributes in C4.5. Algorithm

1 Handling of Continuous Attributes in C4.5. Algorithm .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous

More information

P leiades: Subspace Clustering and Evaluation

P leiades: Subspace Clustering and Evaluation P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de

More information

On Symmetric and Asymmetric LSHs for Inner Product Search

On Symmetric and Asymmetric LSHs for Inner Product Search Behnam Neyshabur Nathan Srebro Toyota Technological Institute at Chicago, Chicago, IL 6637, USA BNEYSHABUR@TTIC.EDU NATI@TTIC.EDU Abstract We consider the problem of designing locality sensitive hashes

More information

Pointwise Exact Bootstrap Distributions of Cost Curves

Pointwise Exact Bootstrap Distributions of Cost Curves Pointwise Exact Bootstrap Distributions of Cost Curves Charles Dugas and David Gadoury University of Montréal 25th ICML Helsinki July 2008 Dugas, Gadoury (U Montréal) Cost curves July 8, 2008 1 / 24 Outline

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

University of California, Berkeley. ABSTRACT

University of California, Berkeley. ABSTRACT Jagged Bite Problem NP-Complete Construction Kris Hildrum and Megan Thomas Computer Science Division University of California, Berkeley fhildrum,mctg@cs.berkeley.edu ABSTRACT Ahyper-rectangle is commonly

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in

More information

Initial Sampling for Automatic Interactive Data Exploration

Initial Sampling for Automatic Interactive Data Exploration Initial Sampling for Automatic Interactive Data Exploration Wenzhao Liu 1, Yanlei Diao 1, and Anna Liu 2 1 College of Information and Computer Sciences, University of Massachusetts, Amherst 2 Department

More information

a Short Introduction

a Short Introduction Collaborative Filtering in Recommender Systems: a Short Introduction Norm Matloff Dept. of Computer Science University of California, Davis matloff@cs.ucdavis.edu December 3, 2016 Abstract There is a strong

More information

Making Nearest Neighbors Easier. Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4. Outline. Chapter XI

Making Nearest Neighbors Easier. Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4. Outline. Chapter XI Restrictions on Input Algorithms for Nearest Neighbor Search: Lecture 4 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology Making Nearest

More information

Anomaly Detection via Over-sampling Principal Component Analysis

Anomaly Detection via Over-sampling Principal Component Analysis Anomaly Detection via Over-sampling Principal Component Analysis Yi-Ren Yeh, Zheng-Yi Lee, and Yuh-Jye Lee Abstract Outlier detection is an important issue in data mining and has been studied in different

More information

Linear Regression 1 / 25. Karl Stratos. June 18, 2018

Linear Regression 1 / 25. Karl Stratos. June 18, 2018 Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my

More information

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7. Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also

More information

Removing trivial associations in association rule discovery

Removing trivial associations in association rule discovery Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association

More information

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

More information

Geodatabase Programming with Python John Yaist

Geodatabase Programming with Python John Yaist Geodatabase Programming with Python John Yaist DevSummit DC February 26, 2016 Washington, DC Target Audience: Assumptions Basic knowledge of Python Basic knowledge of Enterprise Geodatabase and workflows

More information

Query Optimization: Exercise

Query Optimization: Exercise Query Optimization: Exercise Session 6 Bernhard Radke November 27, 2017 Maximum Value Precedence (MVP) [1] Weighted Directed Join Graph (WDJG) Weighted Directed Join Graph (WDJG) 1000 0.05 R 1 0.005 R

More information

Mathematics 2203, Test 1 - Solutions

Mathematics 2203, Test 1 - Solutions Mathematics 220, Test 1 - Solutions F, 2010 Philippe B. Laval Name 1. Determine if each statement below is True or False. If it is true, explain why (cite theorem, rule, property). If it is false, explain

More information

Normalization Techniques in Training of Deep Neural Networks

Normalization Techniques in Training of Deep Neural Networks Normalization Techniques in Training of Deep Neural Networks Lei Huang ( 黄雷 ) State Key Laboratory of Software Development Environment, Beihang University Mail:huanglei@nlsde.buaa.edu.cn August 17 th,

More information

Tailored Bregman Ball Trees for Effective Nearest Neighbors

Tailored Bregman Ball Trees for Effective Nearest Neighbors Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia

More information

Conquering the Complexity of Time: Machine Learning for Big Time Series Data

Conquering the Complexity of Time: Machine Learning for Big Time Series Data Conquering the Complexity of Time: Machine Learning for Big Time Series Data Yan Liu Computer Science Department University of Southern California Mini-Workshop on Theoretical Foundations of Cyber-Physical

More information

Universal Trajectory Queries for Moving Object Databases

Universal Trajectory Queries for Moving Object Databases Universal Trajectory Queries for Moving Object Databases Hoda M. O. Mokhtar Jianwen Su Department of Computer Science University of California at Santa Barbara {hmokhtar, su}@cs.ucsb.edu Abstract In this

More information

IBM Research Report. A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces

IBM Research Report. A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces RC24859 (W0909-03) September 0, 2009 Computer Science IBM Research Report A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces George Saon, Peder Olsen

More information

Location-Based R-Tree and Grid-Index for Nearest Neighbor Query Processing

Location-Based R-Tree and Grid-Index for Nearest Neighbor Query Processing ISBN 978-93-84468-20-0 Proceedings of 2015 International Conference on Future Computational Technologies (ICFCT'2015) Singapore, March 29-30, 2015, pp. 136-142 Location-Based R-Tree and Grid-Index for

More information

Issues Concerning Dimensionality and Similarity Search

Issues Concerning Dimensionality and Similarity Search Issues Concerning Dimensionality and Similarity Search Jelena Tešić, Sitaram Bhagavathy, and B. S. Manjunath Electrical and Computer Engineering Department University of California, Santa Barbara, CA 936-956

More information

Linear Spectral Hashing

Linear Spectral Hashing Linear Spectral Hashing Zalán Bodó and Lehel Csató Babeş Bolyai University - Faculty of Mathematics and Computer Science Kogălniceanu 1., 484 Cluj-Napoca - Romania Abstract. assigns binary hash keys to

More information

Greedy Maximization Framework for Graph-based Influence Functions

Greedy Maximization Framework for Graph-based Influence Functions Greedy Maximization Framework for Graph-based Influence Functions Edith Cohen Google Research Tel Aviv University HotWeb '16 1 Large Graphs Model relations/interactions (edges) between entities (nodes)

More information

Optimal Color Range Reporting in One Dimension

Optimal Color Range Reporting in One Dimension Optimal Color Range Reporting in One Dimension Yakov Nekrich 1 and Jeffrey Scott Vitter 1 The University of Kansas. yakov.nekrich@googlemail.com, jsv@ku.edu Abstract. Color (or categorical) range reporting

More information

Flexible Aggregate Similarity Search

Flexible Aggregate Similarity Search Flexible Aggregate Similarity Search Yang Li Feifei Li 2 Ke Yi 3 Bin Yao 2 Min Wang 4 rainfallen@sjtu.edu.cn, {lifeifei, yao}@cs.fsu.edu 2, yike@cse.ust.hk 3, min.wang6@hp.com 4 Shanghai JiaoTong University,

More information