Anticipatory DTW for Efficient Similarity Search in Time Series Databases


1 Anticipatory DTW for Efficient Similarity Search in Time Series Databases
Ira Assent, Marc Wichterich, Ralph Krieger, Hardy Kremer, Thomas Seidl
RWTH Aachen University, Germany; Aalborg University, Denmark
VLDB 2009, Lyon, France

2 Overview
1 Introduction
2 Dynamic Time Warping
3 Anticipatory pruning
4 Experiments
5 Conclusion
Assent et al. Anticipatory DTW for Efficient Similarity Search in Time Series Databases 2 / 21

3 Time series similarity search
Time series: a sequence of time-related values, e.g. stock data, sensor data, EEG measurements, climate data.
Figure: time series from the EEG data set (x-axis: point in time (i); y-axis: value (t_i) [mV])
Similarity search: find time series with similar patterns over time.

4 Dynamic Time Warping (DTW)
The most widely used distance functions for time series are the Euclidean distance and Dynamic Time Warping.
Dynamic Time Warping allows scaling and stretching for better alignment, but is computationally costly.
Figure: Euclidean distance (left) and Dynamic Time Warping (right)

5 DTW definition
k-band DTW:
$$DTW([s_1,\dots,s_n],[t_1,\dots,t_m]) = dist_{band}(s_n,t_m) + \min\left\{\begin{array}{l} DTW([s_1,\dots,s_{n-1}],[t_1,\dots,t_{m-1}]) \\ DTW([s_1,\dots,s_n],[t_1,\dots,t_{m-1}]) \\ DTW([s_1,\dots,s_{n-1}],[t_1,\dots,t_m]) \end{array}\right.$$
with
$$dist_{band}(s_i,t_j) = \begin{cases} dist(s_i,t_j) & \text{if } \left|i - \left\lceil j \cdot \frac{n}{m} \right\rceil\right| \le k \\ \infty & \text{else} \end{cases}$$
$$DTW(\emptyset,\emptyset) = 0, \quad DTW(x,\emptyset) = \infty, \quad DTW(\emptyset,y) = \infty$$

6 DTW Computation
Each entry of the cumulative warping matrix $C = [c_{i,j}]$ is computed as
$$c_{i,j} = dist(s_i,t_j) + \min\{c_{i-1,j-1},\, c_{i,j-1},\, c_{i-1,j}\}$$
Figure: Cumulative warping matrix (axes: $s_i$ and $t_j$)
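The matrix fill above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `dtw_band`, the use of the absolute difference as `dist`, and one-dimensional inputs are all assumptions made here for the example.

```python
import math

def dtw_band(s, t, k, dist=lambda a, b: abs(a - b)):
    """k-band DTW of two non-empty sequences via the cumulative matrix.

    `dist` defaults to the absolute difference (an assumption; the slides
    leave the point distance open).
    """
    n, m = len(s), len(t)
    # c[i][j] is the DTW distance of the prefixes s[:i] and t[:j].
    # Row 0 and column 0 encode the empty-sequence base cases.
    c = [[math.inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for j in range(1, m + 1):
        center = -(-j * n // m)  # integer ceiling of j * n / m
        # Cells outside the band keep the value infinity.
        for i in range(max(1, center - k), min(n, center + k) + 1):
            best = min(c[i - 1][j - 1], c[i][j - 1], c[i - 1][j])
            c[i][j] = dist(s[i - 1], t[j - 1]) + best
    return c[n][m]

print(dtw_band([1, 2, 3], [1, 2, 2, 3], k=1))
```

Filling the matrix column by column (outer loop over `j`) matches the column-wise view used later for early stopping.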

7 Existing DTW algorithms
DTW is computationally expensive.
Many approaches use a multistep filter-and-refine architecture.
If the filter lower bounds DTW, the search is lossless.
Different filters have been proposed and achieve substantial speed-ups.
Figure: Multistep filter-and-refine architecture (query, index (filter), candidates, refinement on the database (exact), result)
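A filter-and-refine range query can be sketched as follows. The names `range_query` and `first_last_bound` are hypothetical, and the first/last-point bound is a deliberately simple stand-in chosen for this example, not one of the filters discussed in the talk. It is a valid lower bound for unconstrained DTW with absolute differences whenever at least one series has two or more points, because every warping path contains both the first and the last cell.

```python
import math

def dtw(s, t):
    """Unconstrained DTW with the absolute difference as point distance."""
    n, m = len(s), len(t)
    c = [[math.inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(c[i - 1][j - 1], c[i][j - 1], c[i - 1][j])
            c[i][j] = abs(s[i - 1] - t[j - 1]) + best
    return c[n][m]

def first_last_bound(s, t):
    """Cheap lower bound: cost of the first cell plus cost of the last cell."""
    return abs(s[0] - t[0]) + abs(s[-1] - t[-1])

def range_query(query, database, epsilon, lower_bound=first_last_bound):
    """Return all series within DTW distance epsilon of the query."""
    result = []
    for candidate in database:
        # Filter step: exact >= lower bound > epsilon, so pruning is safe.
        if lower_bound(query, candidate) > epsilon:
            continue
        # Refinement step: only surviving candidates pay for the exact DTW.
        if dtw(query, candidate) <= epsilon:
            result.append(candidate)
    return result

database = [[1, 2, 3, 4], [10, 11, 12, 13], [1, 3, 3, 5], [0, 0, 0, 0]]
print(range_query([1, 2, 3, 4], database, epsilon=2))
```

Because the filter never overestimates the exact distance, the result is identical to a sequential scan with exact DTW; only the number of exact computations changes.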

8 DTW properties for speed-up
DTW is incremental. For any cumulative DTW matrix $C = [c_{i,j}]$, the column minima are monotonically non-decreasing:
$$\min_{i=1,\dots,n}\{c_{i,x}\} \le \min_{i=1,\dots,n}\{c_{i,y}\} \quad \text{for } x < y.$$
Existing approach: early stopping (early abandoning). While computing the cumulative DTW matrix, check the pruning threshold after each filled column (band).
This is not as tight as it could be, which motivates the new anticipatory pruning.
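Early stopping can be sketched as below, assuming one-dimensional series, the absolute difference as point distance, and a hypothetical function name `dtw_early_abandon`. Since the column minima never decrease, a column minimum above the threshold already proves that the final distance exceeds it.

```python
import math

def dtw_early_abandon(s, t, k, epsilon):
    """k-band DTW that stops once a column minimum exceeds epsilon.

    Returns math.inf when the computation is abandoned, otherwise the
    DTW distance (which the caller still compares against epsilon).
    """
    n, m = len(s), len(t)
    prev = [math.inf] * (n + 1)  # column j - 1 of the cumulative matrix
    prev[0] = 0.0                # DTW of two empty prefixes
    for j in range(1, m + 1):
        cur = [math.inf] * (n + 1)
        center = -(-j * n // m)  # integer ceiling of j * n / m
        for i in range(max(1, center - k), min(n, center + k) + 1):
            best = min(prev[i - 1], cur[i - 1], prev[i])
            cur[i] = abs(s[i - 1] - t[j - 1]) + best
        # Column minima are non-decreasing, so later columns cannot recover.
        if min(cur[1:]) > epsilon:
            return math.inf
        prev = cur
    return prev[n]

print(dtw_early_abandon([0, 0, 0], [5, 5, 5], k=1, epsilon=3))
```

Only two columns are kept in memory, which is enough because each entry depends on the current and the previous column.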

9 Anticipatory pruning
Multistep with anticipatory pruning:
Benefit from work already done in the filter step.
Low overhead, substantial speed-up.
Figure: Anticipatory pruning in multistep filter-and-refine (components: query q, data, filter, candidates, DTW with anticipatory pruning (AP), result)

10 General idea
Anticipatory pruning:
As in early stopping, compute DTW incrementally.
Additionally, re-use filter information to anticipate the full DTW distance (no additional computation necessary).
Requires: a filter for the remainder of the time series.
We characterize a class of filters from which such an anticipation can be constructed (filter property: piecewise; DTW property: reversible).

11 Piecewise filter
Piecewise DTW lower bound. A piecewise lower bounding filter for the DTW distance is a set $f = \{f_0,\dots,f_m\}$ with the following property:
$$j = 0: \quad f_j(s,t) = 0$$
$$j > 0: \quad f_j(s,t) \le \min_{(i,j) \in band_j} DTW([s_1,\dots,s_i],[t_1,\dots,t_j])$$
Piecewise is the property of the filter that complements the incremental nature of DTW.
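As an illustration of the piecewise property, the sketch below builds a filter from per-column envelope distances in the spirit of LB_Keogh and checks it against the column minima of the cumulative matrix. Several things are assumptions of this example rather than statements from the slides: equal-length one-dimensional series, the absolute difference as point distance, a band of the form |i - j| <= k, and the function names.

```python
import math

def column_minima(s, t, k):
    """Minimum of each column of the k-band cumulative matrix."""
    n = len(s)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    minima = []
    for j in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for i in range(max(1, j - k), min(n, j + k) + 1):
            best = min(prev[i - 1], cur[i - 1], prev[i])
            cur[i] = abs(s[i - 1] - t[j - 1]) + best
        minima.append(min(cur[1:]))
        prev = cur
    return minima

def piecewise_envelope_filter(s, t, k):
    """Return [f_0, ..., f_m] as prefix sums of envelope distances.

    Column j can only touch s values inside the band, so the distance of
    t_j to the min/max envelope of those values bounds every cell cost in
    that column from below.
    """
    n = len(s)
    f = [0.0]  # f_0 = 0 by definition
    for j, value in enumerate(t):
        window = s[max(0, j - k): min(n, j + k + 1)]
        term = max(0.0, value - max(window), min(window) - value)
        f.append(f[-1] + term)
    return f

a = [5, 3, 4, 5, 3, 4, 8, 3, 9, 10]
b = [3, 8, 2, 0, 2, 6, 6, 4, 0, 2]
f = piecewise_envelope_filter(a, b, k=2)
minima = column_minima(a, b, k=2)
print(all(f[j] <= minima[j - 1] for j in range(1, len(b) + 1)))
```

Every partial warping path that ends in column j passes through at least one cell in each of the columns 1 to j, which is why the prefix sum f_j stays below every entry of column j.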

12 DTW is reversible
DTW is reversible. For any two time series $[s_1,\dots,s_n]$ and $[t_1,\dots,t_m]$, their DTW distance is the same as for the reversed time series:
$$DTW([s_1,\dots,s_n],[t_1,\dots,t_m]) = DTW([s_n,\dots,s_1],[t_m,\dots,t_1])$$
Reversibility is the DTW property that allows the direction of the time series to differ between the filter and the DTW computation.
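The reversibility property is easy to check numerically. The sketch below uses unconstrained DTW with the absolute difference as point distance (both assumptions of this example) on the two example series that appear later in the slides.

```python
import math

def dtw(s, t):
    """Unconstrained DTW with the absolute difference as point distance."""
    n, m = len(s), len(t)
    c = [[math.inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(c[i - 1][j - 1], c[i][j - 1], c[i - 1][j])
            c[i][j] = abs(s[i - 1] - t[j - 1]) + best
    return c[n][m]

a = [5, 3, 4, 5, 3, 4, 8, 3, 9, 10]
b = [3, 8, 2, 0, 2, 6, 6, 4, 0, 2]

forward = dtw(a, b)
backward = dtw(a[::-1], b[::-1])  # both series reversed
print(forward == backward)
```

Reversing both series maps every warping path to a path with the same cells in opposite order, so the minimum cost is unchanged.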

13 Anticipatory pruning
Anticipatory Pruning Distance. Given two time series $s$ and $t$ of length $n$ and $m$, a cumulative distance matrix $C = [c_{i,j}]$, and a piecewise lower bounding filter $f$ for the reversed time series, the $j$-th anticipatory pruning distance is
$$AP_j(s,t) := \min_{i=1,\dots,n}\{c_{i,j}\} + f_{m-j}(\overleftarrow{s},\overleftarrow{t}),$$
where $\overleftarrow{s}$ and $\overleftarrow{t}$ denote the reversed time series.
AP = min{} + f
min{}: early stopping part
f: anticipatory part on inverted time series
Figure: values of min{}, f and AP for steps j = 0, ..., 10, compared with the DTW distance and the pruning threshold

14 Example
Figure: Anticipatory pruning example for the two time series [5, 3, 4, 5, 3, 4, 8, 3, 9, 10] and [3, 8, 2, 0, 2, 6, 6, 4, 0, 2] (labelled q and t), showing step j, min{}, f, AP = min{} + f and the DTW distance
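The following is a minimal sketch of anticipatory pruning under stated assumptions, not the authors' implementation: equal-length one-dimensional series, the absolute difference as point distance, a band |i - j| <= k, and per-column envelope distances (in the spirit of LB_Keogh) as the piecewise filter. The names `envelope_terms` and `anticipatory_dtw` are hypothetical. With a symmetric band, the filter value f_{m-j} on the reversed series equals the sum of the envelope terms of the columns j+1, ..., m of the original series, which is what `suffix[j]` stores.

```python
import math

def envelope_terms(s, t, k):
    """Distance of each t_j to the min/max envelope of s inside the band."""
    n = len(s)
    terms = []
    for j, value in enumerate(t):
        window = s[max(0, j - k): min(n, j + k + 1)]
        terms.append(max(0.0, value - max(window), min(window) - value))
    return terms

def anticipatory_dtw(s, t, k, epsilon):
    """Return (distance, columns_filled); distance is math.inf if pruned."""
    n = len(s)
    if n != len(t):
        raise ValueError("this sketch assumes equal-length series")
    terms = envelope_terms(s, t, k)  # would come from the filter step
    # suffix[j]: lower bound for the part of any path in columns j+1..n.
    suffix = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + terms[j]
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for j in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for i in range(max(1, j - k), min(n, j + k) + 1):
            best = min(prev[i - 1], cur[i - 1], prev[i])
            cur[i] = abs(s[i - 1] - t[j - 1]) + best
        # AP_j = early stopping part (column minimum) + anticipatory part.
        if min(cur[1:]) + suffix[j] > epsilon:
            return math.inf, j
        prev = cur
    return (prev[n], n) if prev[n] <= epsilon else (math.inf, n)

a = [5, 3, 4, 5, 3, 4, 8, 3, 9, 10]
b = [3, 8, 2, 0, 2, 6, 6, 4, 0, 2]
exact, _ = anticipatory_dtw(a, b, k=2, epsilon=math.inf)
print(exact)
print(anticipatory_dtw(a, b, k=2, epsilon=exact / 2))
```

Because the anticipatory part is never negative, AP_j is at least as large as the plain column minimum, so this check abandons a candidate no later than early stopping would.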

15 Anticipatory pruning is lossless
Theorem: Anticipatory pruning lower bounds DTW. Anticipatory pruning between two time series $s$, $t$ with respect to a bandwidth $k$ and a lower bounding filter $f$ lower bounds the DTW:
$$AP_j(s,t) \le DTW(s,t) \quad \forall j \in \{1,\dots,n\}$$
Proof sketch:
Anticipatory pruning is a series of partial DTW paths plus a lower bound estimate of the remainder from the previous filter step.
(1) For each possible step $r$, $1 \le r \le n$, the column minima lower bound the true path.
(2) DTW is reversible.
(3) Combining (1) and (2) yields a lower bound of DTW.
Lower bounding means lossless pruning, i.e. speed-up with no loss of accuracy.
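The proof sketch can be spelled out as a chain of inequalities. The notation below is introduced only for this illustration and is not taken from the slides: $P^*$ is an optimal warping path, $p = (i^*, j)$ is its last cell in column $j$, and $P^*_{\le p}$ and $P^*_{> p}$ are the parts of the path up to and after $p$.

```latex
\begin{align*}
DTW(s,t)
  &= \mathrm{cost}(P^*_{\le p}) + \mathrm{cost}(P^*_{> p}) \\
  &\ge c_{i^*,j} + \mathrm{cost}(P^*_{> p})
     && \text{($c_{i^*,j}$ is the cheapest partial path ending in $p$)} \\
  &\ge \min_{i=1,\dots,n}\{c_{i,j}\} + \mathrm{cost}(P^*_{> p})
     && \text{(column minimum, step (1))} \\
  &\ge \min_{i=1,\dots,n}\{c_{i,j}\} + f_{m-j}(\overleftarrow{s},\overleftarrow{t})
     && \text{(reversibility and piecewise property, step (2))} \\
  &= AP_j(s,t)
\end{align*}
```

The third inequality reads the remainder $P^*_{> p}$ backwards: on the reversed series it is a partial warping path that starts in the first cell and ends in column $m-j$, so the piecewise filter value $f_{m-j}$ lies below its cost. For $j = m$ the remainder is empty and $f_0 = 0$.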

16 Existing piecewise lower bounds
Linearization: LB_Keogh, the Euclidean distance to envelopes (upper and lower bounds of segments) [Keogh, VLDB 2002; Zhu/Shasha, SIGMOD 2003]
Corner boundaries: LB_Hybrid, piecewise corner-like shapes in the warping matrix through which every warping path has to pass [Zhou/Wong, ICDE 2007]
Path approximation: FTW (Fast search method for Dynamic Time Warping), which goes from coarser (less costly) DTW to finer DTW as needed [Sakurai/Yoshikawa/Faloutsos, PODS 2005]
All three are piecewise lower bounds as required for anticipatory pruning (details in the paper).

17 Experimental setup
Synthetic and real-world data sets:
SignLanguage: 1,400 multivariate (11 attributes) time series of sign language finger tracking data [1], length 64 to 512
TRECVid: 650 to 2,000, benchmark data [2]
NEWSVid: 2,000 to 8,000, TV news recorded at 30 fps (20 attributes), length 64 to 2048
Random walk RW1/RW2: zero-normalized time series of length 512 and of cardinality 10,000 (1 to 50 attributes)
(a) RW1: $t_{(i+1)j} = t_{ij} + N(0,1)$
(b) RW2: $t_{(i+1)j} = t_{ij} + N(t_{ij} - t_{(i-1)j}, 1)$
Figure: time series generated via RW1 and RW2 (x-axis: point in time (i); y-axis: value (t_i))
[1] waleed/tml/data/
[2] Smeaton/Over/Kraaij, Evaluation campaigns and TRECVid, MIR 2006
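The two random walk generators can be sketched for a single attribute as follows. The function name `random_walk`, the start value 0, the zero initial step for RW2, and the reading of "zero normalized" as subtracting the mean are all assumptions of this example; the slides only give the two update rules.

```python
import random

def random_walk(length, kind="RW1", seed=0):
    """Generate one univariate random walk of the given length.

    RW1: t[i+1] = t[i] + N(0, 1)
    RW2: t[i+1] = t[i] + N(t[i] - t[i-1], 1)
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    series = [0.0]             # assumed start value
    prev_step = 0.0            # assumed t[i] - t[i-1] before the first step
    while len(series) < length:
        mean = prev_step if kind == "RW2" else 0.0
        step = rng.gauss(mean, 1.0)
        series.append(series[-1] + step)
        prev_step = step
    # "Zero normalized" is interpreted here as shifting the mean to zero.
    average = sum(series) / len(series)
    return [value - average for value in series]

rw1 = random_walk(512, kind="RW1", seed=1)
rw2 = random_walk(512, kind="RW2", seed=1)
print(len(rw1), len(rw2))
```

In RW2 the expected next step equals the previous step, so the series tends to keep its current direction and looks smoother than RW1.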

18 Experiments
Figure: Relative improvement (#calc.) for varying number of attributes on RW2 (x-axis: number of attributes; y-axis: relative improvement [percent]; series: AP_FTW, ES_FTW, AP_LBKeogh, ES_LBKeogh, AP_LBHybrid, ES_LBHybrid)

19 Experiments
Figure: Efficiency improvement (#calc.) for varying DTW bandwidths on NEWSVid (x-axis: k; y-axis: relative improvement [percent]; series: AP_LBKeogh, ES_LBKeogh, AP_FTW, ES_FTW, AP_LBHybrid, ES_LBHybrid)

20 Experiments
Figure: Absolute improvement (average query time, log. scale), reduction, RW2 (three panels: LBKeogh / ES_LBKeogh / AP_LBKeogh, FTW / ES_FTW / AP_FTW, LBHybrid / ES_LBHybrid / AP_LBHybrid; x-axis: reduced length; y-axis: average query time [s], log. scale)

21 Conclusion
Anticipatory pruning speeds up DTW (Dynamic Time Warping), a widely used distance function for time series similarity search.
Our novel anticipatory pruning makes best use of a family of multistep filter-and-refine approaches.
It computes an estimated overall DTW distance from already available filter information: a series of lower bounds of the DTW.
Experiments demonstrate substantially reduced runtime.
AP can be flexibly combined with existing and future DTW lower bounds.
AP is orthogonal to speed-up via indexing etc.



More information

A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases

A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases Thomas Bernecker #, Tobias Emrich #, Hans-Peter Kriegel #, Nikos Mamoulis, Matthias Renz #, Andreas Züfle #

More information

The sequential decoding metric for detection in sensor networks

The sequential decoding metric for detection in sensor networks The sequential decoding metric for detection in sensor networks B. Narayanaswamy, Yaron Rachlin, Rohit Negi and Pradeep Khosla Department of ECE Carnegie Mellon University Pittsburgh, PA, 523 Email: {bnarayan,rachlin,negi,pkk}@ece.cmu.edu

More information

A Branch and Bound Algorithm for the Project Duration Problem Subject to Temporal and Cumulative Resource Constraints

A Branch and Bound Algorithm for the Project Duration Problem Subject to Temporal and Cumulative Resource Constraints A Branch and Bound Algorithm for the Project Duration Problem Subject to Temporal and Cumulative Resource Constraints Christoph Schwindt Institut für Wirtschaftstheorie und Operations Research University

More information

Probabilistic Nearest-Neighbor Query on Uncertain Objects

Probabilistic Nearest-Neighbor Query on Uncertain Objects Probabilistic Nearest-Neighbor Query on Uncertain Objects Hans-Peter Kriegel, Peter Kunath, Matthias Renz University of Munich, Germany {kriegel, kunath, renz}@dbs.ifi.lmu.de Abstract. Nearest-neighbor

More information

Estimating the Selectivity of tf-idf based Cosine Similarity Predicates

Estimating the Selectivity of tf-idf based Cosine Similarity Predicates Estimating the Selectivity of tf-idf based Cosine Similarity Predicates Sandeep Tata Jignesh M. Patel Department of Electrical Engineering and Computer Science University of Michigan 22 Hayward Street,

More information

Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces

Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces David Schultz and Brijnesh Jain Technische Universität Berlin, Germany arxiv:1701.06393v1 [cs.cv] 23 Jan 2017 Abstract.

More information

Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques

Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Mahsa Orang Nematollaah Shiri 27th International Conference on Scientific and Statistical Database

More information

Impulse and Change in Momentum

Impulse and Change in Momentum Activity 15 PS-2826 Impulse and Change in Momentum Mechanics: momentum, impulse, change in momentum GLX setup file: impulse Qty Equipment and Materials Part Number 1 PASPORT Xplorer GLX PS-2002 1 PASPORT

More information

Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II

Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II Gatsby Unit University College London 27 Feb 2017 Outline Part I: Theory of ICA Definition and difference

More information

6.842 Randomness and Computation Lecture 5

6.842 Randomness and Computation Lecture 5 6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its

More information

How we wanted to revolutionize X-ray radiography, and how we then "accidentally" discovered single-photon CMOS imaging

How we wanted to revolutionize X-ray radiography, and how we then accidentally discovered single-photon CMOS imaging How we wanted to revolutionize X-ray radiography, and how we then "accidentally" discovered single-photon CMOS imaging Stanford University EE Computer Systems Colloquium February 23 rd, 2011 EE380 Peter

More information

Progressive Algebraic Soft-Decision Decoding of Reed-Solomon Codes

Progressive Algebraic Soft-Decision Decoding of Reed-Solomon Codes Progressive Algebraic Soft-Decision Decoding of Reed-Solomon Codes Li Chen ( 陈立 ), PhD, MIEEE Associate Professor, School of Information Science and Technology Sun Yat-sen University, China Joint work

More information

4 Locality-sensitive hashing using stable distributions

4 Locality-sensitive hashing using stable distributions 4 Locality-sensitive hashing using stable distributions 4. The LSH scheme based on s-stable distributions In this chapter, we introduce and analyze a novel locality-sensitive hashing family. The family

More information

arxiv: v1 [math.st] 1 Dec 2014

arxiv: v1 [math.st] 1 Dec 2014 HOW TO MONITOR AND MITIGATE STAIR-CASING IN L TREND FILTERING Cristian R. Rojas and Bo Wahlberg Department of Automatic Control and ACCESS Linnaeus Centre School of Electrical Engineering, KTH Royal Institute

More information

IBM Research Report. A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces

IBM Research Report. A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces RC24859 (W0909-03) September 0, 2009 Computer Science IBM Research Report A Lower Bound on the Euclidean Distance for Fast Nearest Neighbor Retrieval in High-dimensional Spaces George Saon, Peder Olsen

More information

Nearest Neighbor Search with Keywords in Spatial Databases

Nearest Neighbor Search with Keywords in Spatial Databases 776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,

More information

TASM: Top-k Approximate Subtree Matching

TASM: Top-k Approximate Subtree Matching TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,

More information

The effect of environmental and operational variabilities on damage detection in wind turbine blades

The effect of environmental and operational variabilities on damage detection in wind turbine blades The effect of environmental and operational variabilities on damage detection in wind turbine blades More info about this article: http://www.ndt.net/?id=23273 Thomas Bull 1, Martin D. Ulriksen 1 and Dmitri

More information

Applications of Submodular Functions in Speech and NLP

Applications of Submodular Functions in Speech and NLP Applications of Submodular Functions in Speech and NLP Jeff Bilmes Department of Electrical Engineering University of Washington, Seattle http://ssli.ee.washington.edu/~bilmes June 27, 2011 J. Bilmes Applications

More information

Table Probabilities and Independence

Table Probabilities and Independence Table Probabilities and Independence Dr Tom Ilvento Department of Food and Resource Economics Overview This lecture will focus on working with categorical data and building tables It will walk you through

More information

Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries Data Mining and Knowledge Discovery, 7, 5 22, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency

More information

An Introduction to Wavelets and some Applications

An Introduction to Wavelets and some Applications An Introduction to Wavelets and some Applications Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France An Introduction to Wavelets and some Applications p.1/54

More information

Spatial Database. Ahmad Alhilal, Dimitris Tsaras

Spatial Database. Ahmad Alhilal, Dimitris Tsaras Spatial Database Ahmad Alhilal, Dimitris Tsaras Content What is Spatial DB Modeling Spatial DB Spatial Queries The R-tree Range Query NN Query Aggregation Query RNN Query NN Queries with Validity Information

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

Algorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview

Algorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview Algorithmic Methods of Data Mining, Fall 2005, Course overview 1 Course overview lgorithmic Methods of Data Mining, Fall 2005, Course overview 1 T-61.5060 Algorithmic methods of data mining (3 cp) P T-61.5060

More information

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases Xiang Lian and Lei Chen Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear

More information

Numerical Methods I: Eigenvalues and eigenvectors

Numerical Methods I: Eigenvalues and eigenvectors 1/25 Numerical Methods I: Eigenvalues and eigenvectors Georg Stadler Courant Institute, NYU stadler@cims.nyu.edu November 2, 2017 Overview 2/25 Conditioning Eigenvalues and eigenvectors How hard are they

More information

Efficient Rank Join with Aggregation Constraints

Efficient Rank Join with Aggregation Constraints Efficient Ran Join with Aggregation Constraints Min Xie Dept. of Computer Science, Univ. of British Columbia minxie@cs.ubc.ca Las V.S. Lashmanan Dept. of Computer Science, Univ. of British Columbia las@cs.ubc.ca

More information

Tuple Lattice Sieving

Tuple Lattice Sieving Tuple Lattice Sieving Shi Bai, Thijs Laarhoven and Damien Stehlé IBM Research Zurich and ENS de Lyon August 29th 2016 S. Bai, T. Laarhoven & D. Stehlé Tuple Lattice Sieving 29/08/2016 1/25 Main result

More information

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com

More information

Developing ICON Data Assimilation and Ensemble Data Assimilation

Developing ICON Data Assimilation and Ensemble Data Assimilation Data Assimilation at DWD Developing ICON Data Assimilation and Ensemble Data Assimilation Roland Potthast Deutscher Wetterdienst, Offenbach, Germany Sept 2, 2013 Data Assimilation at DWD Roland.potthast@dwd.de

More information

Time Series Classification

Time Series Classification Distance Measures Classifiers DTW vs. ED Further Work Questions August 31, 2017 Distance Measures Classifiers DTW vs. ED Further Work Questions Outline 1 2 Distance Measures 3 Classifiers 4 DTW vs. ED

More information

A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing

A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing A Framework for Protec/ng Worker Loca/on Privacy in Spa/al Crowdsourcing Nov 12 2014 Hien To, Gabriel Ghinita, Cyrus Shahabi VLDB 2014 1 Mo/va/on Ubiquity of mobile users 6.5 billion mobile subscrip/ons,

More information

Introduction to Linear Programming (LP) Mathematical Programming (MP) Concept (1)

Introduction to Linear Programming (LP) Mathematical Programming (MP) Concept (1) Introduction to Linear Programming (LP) Mathematical Programming Concept LP Concept Standard Form Assumptions Consequences of Assumptions Solution Approach Solution Methods Typical Formulations Massachusetts

More information

Association Analysis. Part 1

Association Analysis. Part 1 Association Analysis Part 1 1 Market-basket analysis DATA: A large set of items: e.g., products sold in a supermarket A large set of baskets: e.g., each basket represents what a customer bought in one

More information

Vehicle Routing with Traffic Congestion and Drivers Driving and Working Rules

Vehicle Routing with Traffic Congestion and Drivers Driving and Working Rules Vehicle Routing with Traffic Congestion and Drivers Driving and Working Rules A.L. Kok, E.W. Hans, J.M.J. Schutten, W.H.M. Zijm Operational Methods for Production and Logistics, University of Twente, P.O.

More information