Anticipatory DTW for Efficient Similarity Search in Time Series Databases

Anticipatory DTW for Efficient Similarity Search in Time Series Databases
Ira Assent, Marc Wichterich, Ralph Krieger, Hardy Kremer, Thomas Seidl
RWTH Aachen University, Germany; Aalborg University, Denmark
VLDB 2009, Lyon, France

Overview
1 Introduction
2 Dynamic Time Warping
3 Anticipatory pruning
4 Experiments
5 Conclusion
Assent et al. Anticipatory DTW for Efficient Similarity Search in Time Series Databases 2 / 21

Time series similarity search
Time series: sequence of time-related values, e.g. stock data, sensor data, EEG measurements, climate data.
Similarity search: find time series with similar patterns over time.
Figure: time series from the EEG data set (x-axis: point in time (i), 0 to 512; y-axis: value (t_i) [mV])

Dynamic Time Warping (DTW)
The most widely used distance functions for time series are Euclidean distance and Dynamic Time Warping.
Dynamic Time Warping allows scaling and stretching for better alignment, but it is computationally costly.
Figure: Euclidean distance (left) and Dynamic Time Warping (right)

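DTW definition
k-band DTW:
$$\mathrm{DTW}([s_1,\dots,s_n],[t_1,\dots,t_m]) = \mathrm{dist}_{band}(s_n,t_m) + \min \left\{ \begin{array}{l} \mathrm{DTW}([s_1,\dots,s_{n-1}],[t_1,\dots,t_{m-1}]) \\ \mathrm{DTW}([s_1,\dots,s_n],[t_1,\dots,t_{m-1}]) \\ \mathrm{DTW}([s_1,\dots,s_{n-1}],[t_1,\dots,t_m]) \end{array} \right.$$
with
$$\mathrm{dist}_{band}(s_i,t_j) = \begin{cases} \mathrm{dist}(s_i,t_j) & \text{if } \left| i - \left\lceil j \cdot \frac{n}{m} \right\rceil \right| \le k \\ \infty & \text{else} \end{cases}$$
and
$$\mathrm{DTW}(\emptyset,\emptyset) = 0, \quad \mathrm{DTW}(x,\emptyset) = \infty, \quad \mathrm{DTW}(\emptyset,y) = \infty$$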
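The recursion above can be sketched in Python as a dynamic program over the cumulative matrix. This is a minimal illustration, not the authors' implementation: the function name `band_dtw`, the default absolute-difference `dist`, and the restriction to one-dimensional sequences are assumptions made for the example.

```python
import math

def band_dtw(s, t, k, dist=lambda a, b: abs(a - b)):
    """Return (k-band DTW distance, cumulative matrix) for sequences s and t.

    dist defaults to the absolute difference, which is an assumption of this
    sketch; any non-negative point distance can be passed in instead.
    """
    n, m = len(s), len(t)
    inf = math.inf
    # c[i][j] = DTW(s[:i], t[:j]); row 0 and column 0 encode the
    # empty-sequence cases: DTW(empty, empty) = 0, everything else infinite.
    c = [[inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for j in range(1, m + 1):
        center = (j * n + m - 1) // m  # integer form of ceil(j * n / m)
        # Cells outside the band keep the value infinity.
        for i in range(max(1, center - k), min(n, center + k) + 1):
            c[i][j] = dist(s[i - 1], t[j - 1]) + min(
                c[i - 1][j - 1], c[i][j - 1], c[i - 1][j]
            )
    return c[n][m], c
```

For example, `band_dtw([1, 2, 3], [1, 1, 2, 3], 1)[0]` evaluates to 0, because the repeated leading value can be absorbed by warping.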

DTW Computation
Each entry of the cumulative warping matrix is computed as
$$c_{i,j} = \mathrm{dist}(s_i,t_j) + \min\{c_{i-1,j-1},\, c_{i,j-1},\, c_{i-1,j}\}$$
Figure: Cumulative warping matrix (rows s_i, columns t_j)

Existing DTW algorithms
DTW is computationally expensive.
Many approaches use a multistep filter-and-refine architecture.
If the filter lower bounds DTW, the search is lossless.
Different filters have been proposed and achieve substantial speed-ups.
Figure: Multistep filter-and-refine architecture (the query is run against the index (filter) to obtain candidates; the candidates are refined against the database (exact) to obtain the result)
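A filter-and-refine range query can be sketched as follows. The function `filter_and_refine` and its parameters are hypothetical names chosen for this example; the lower bound and the exact distance are passed in as callables, so any DTW filter could be plugged in. The only requirement for lossless results is that `lower_bound` never exceeds `exact_distance`.

```python
def filter_and_refine(query, database, eps, lower_bound, exact_distance):
    """Range query: indices of all series within distance eps of the query.

    Returns (result indices, number of exact distance computations).
    """
    results = []
    refined = 0
    for idx, series in enumerate(database):
        # Filter step: a lower bound above eps proves the exact distance
        # is above eps as well, so the series is discarded without refinement.
        if lower_bound(query, series) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        refined += 1
        if exact_distance(query, series) <= eps:
            results.append(idx)
    return results, refined


# Toy stand-ins (not DTW filters): Manhattan distance as the exact distance,
# and the difference of the first elements as a valid lower bound of it.
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def first_element_bound(a, b):
    return abs(a[0] - b[0])
```

With the toy stand-ins, `filter_and_refine([0, 0], [[0, 0], [5, 5], [1, 3]], 2, first_element_bound, manhattan)` discards the second series in the filter step and computes only two exact distances.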

DTW properties for speed-up
DTW is incremental. For any cumulative DTW matrix $C = [c_{i,j}]$, the column minima are monotonically non-decreasing:
$$\min_{i=1,\dots,n}\{c_{i,x}\} \le \min_{i=1,\dots,n}\{c_{i,y}\} \quad \text{for } x < y$$
Existing approach: early stopping (early abandoning). Compute the cumulative DTW matrix and, after each filled column (within the band), check the column minimum against the pruning threshold.
This is not as tight as it could be, which motivates the new anticipatory pruning.
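Early stopping can be sketched by filling the cumulative matrix column by column and abandoning as soon as a column minimum exceeds the threshold. This is an illustrative sketch only: the name `dtw_early_stopping`, the absolute-difference point distance, and the return convention are assumptions.

```python
import math

def dtw_early_stopping(s, t, k, eps):
    """k-band DTW with early stopping against the pruning threshold eps.

    Returns (distance, last column processed). The distance is math.inf if
    the computation was abandoned because a column minimum exceeded eps.
    """
    n, m = len(s), len(t)
    inf = math.inf
    prev = [0.0] + [inf] * n  # column 0 of the cumulative matrix
    for j in range(1, m + 1):
        cur = [inf] * (n + 1)
        center = (j * n + m - 1) // m  # integer form of ceil(j * n / m)
        for i in range(max(1, center - k), min(n, center + k) + 1):
            cur[i] = abs(s[i - 1] - t[j - 1]) + min(prev[i - 1], prev[i], cur[i - 1])
        # Column minima never decrease, so once one exceeds eps the
        # final DTW distance must exceed eps too.
        if min(cur) > eps:
            return inf, j
        prev = cur
    return prev[n], m
```

For `s = [0, 0, 0, 0]`, `t = [5, 5, 5, 5]`, `k = 1`, the column minima are 5, 10, 15, 20, so a threshold of 7 abandons the computation after the second column.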

Anticipatory pruning
Multistep filter-and-refine with anticipatory pruning:
Benefits from work already done in the filter step
Low overhead, substantial speed-up
Figure: Anticipatory pruning in multistep filter-and-refine (components: query q, data, DTW filter, candidates, anticipatory pruning steps AP 1, AP 2, ..., AP n, result)

General idea
Anticipatory pruning: as in early stopping, compute DTW incrementally.
Additionally, re-use filter information to anticipate the full DTW distance (no additional computation necessary).
This requires a filter for the remainder of the time series.
We characterize a class of filters from which such an anticipation can be constructed. It rests on two properties: the filter is piecewise, and DTW is reversible.
Figure: in each step, the DTW over a prefix is combined with an estimate for the remainder (DTW on 1..5 with estimate on 6..15, then 1..6 with 7..15, then 1..7 with 8..15)

Piecewise filter
Piecewise DTW lower bound. A piecewise lower bounding filter for the DTW distance is a set $f = \{f_0, \dots, f_m\}$ with the following property:
$$j = 0:\ f_j(s,t) = 0 \qquad\qquad j > 0:\ f_j(s,t) \le \min_{(i,j) \in \mathrm{band}_j} \mathrm{DTW}([s_1,\dots,s_i],[t_1,\dots,t_j])$$
Piecewise is the property of the filter that complements the incremental nature of DTW.

DTW is reversible
DTW is reversible. For any two time series $[s_1,\dots,s_n]$ and $[t_1,\dots,t_m]$, their DTW distance is the same as for the reversed time series:
$$\mathrm{DTW}([s_1,\dots,s_n],[t_1,\dots,t_m]) = \mathrm{DTW}([s_n,\dots,s_1],[t_m,\dots,t_1])$$
Reversibility is the DTW property that allows the direction of the time series to differ between the filter and the DTW computation.

Anticipatory pruning
Anticipatory Pruning Distance. Given two time series s and t of length n and m, a cumulative distance matrix $C = [c_{i,j}]$, and a piecewise lower bounding filter f for reversed time series, the j-th step of anticipatory pruning is
$$AP_j(s,t) := \min_{i=1,\dots,n}\{c_{i,j}\} + f_{m-j}(\overleftarrow{s}, \overleftarrow{t})$$
where $\overleftarrow{s}$ and $\overleftarrow{t}$ denote the reversed time series.
Figure: distance per step j (0 to 10), showing min{} (the early stopping part), f (the anticipatory part on the inverted time series), AP = min{} + f, the pruning threshold, and the DTW distance

Example
Two time series of length 10, labeled q and t in the figure: [5, 3, 4, 5, 3, 4, 8, 3, 9, 10] and [3, 8, 2, 0, 2, 6, 6, 4, 0, 2]

step j:   0   1   2   3   4   5   6   7   8   9  10
min{}:    0   2   5   6  10  13  15  17  18  21  28
f:       18  18  15  14  11  10  10  10  10   7   0
AP:      18  20  20  20  21  23  25  27  28  28  28

AP = min{} + f
Figure: Anticipatory pruning example (cumulative matrix within the band, with the per-step values listed above)
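The AP row of the example can be reproduced from the other two rows. The sketch below takes the min{} and f rows exactly as printed on the slide (the f row is already aligned with step j, i.e. it lists the bound for the remainder at each step). The function names and the pruning threshold of 19 are hypothetical, chosen only to show how much earlier AP can prune than early stopping.

```python
def anticipatory_pruning(column_minima, remainder_bounds):
    """AP_j = column minimum at step j + filter bound for the remainder."""
    return [c + f for c, f in zip(column_minima, remainder_bounds)]

def first_pruning_step(bounds, eps):
    """First step j whose bound exceeds eps, or None if the series survives."""
    return next((j for j, b in enumerate(bounds) if b > eps), None)

# Rows from the example slide, indexed by step j = 0..10.
col_min = [0, 2, 5, 6, 10, 13, 15, 17, 18, 21, 28]
f_rem = [18, 18, 15, 14, 11, 10, 10, 10, 10, 7, 0]

ap = anticipatory_pruning(col_min, f_rem)

eps = 19  # hypothetical pruning threshold, not taken from the slide
ap_step = first_pruning_step(ap, eps)       # anticipatory pruning
es_step = first_pruning_step(col_min, eps)  # early stopping alone
```

With this threshold, anticipatory pruning discards the candidate at step 1, while early stopping has to compute up to step 9. With a threshold equal to the final DTW distance of 28, neither bound prunes, as expected for a lower bound.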

Anticipatory pruning is lossless
Theorem: Anticipatory pruning lower bounds DTW. Anticipatory pruning between two time series s, t with respect to a bandwidth k and a lower bounding filter f lower bounds the DTW:
$$AP_j(s,t) \le \mathrm{DTW}(s,t) \quad \forall j \in \{1,\dots,n\}$$
Proof sketch
Anticipatory pruning is a series of partial DTW paths plus a lower bound estimate of the remainder from the previous filter step.
(1) For each possible step r, 1 ≤ r ≤ n, the column minima lower bound the true path.
(2) DTW is reversible.
(3) Combining (1) and (2) yields a lower bound of DTW.
Lower bounding means lossless pruning, i.e. speed-up with no loss of accuracy.
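The proof steps can be written as one chain of inequalities. This is a sketch under assumptions not spelled out on the slide: $P^*$ denotes an optimal warping path, $(i^*, j)$ its last cell in column $j$, $\overleftarrow{s}$ and $\overleftarrow{t}$ the reversed series, $\overleftarrow{c}$ the cumulative matrix of the reversed series, and the band is assumed to be unchanged by reversal. The part of $P^*$ after $(i^*, j)$ starts in column $j+1$ and ends in $(n, m)$, so on the reversed series it is a partial path ending in column $m - j$.

$$
\begin{aligned}
\mathrm{DTW}(s,t) &= \mathrm{cost}\big(P^*_{\le (i^*,j)}\big) + \mathrm{cost}\big(P^*_{> (i^*,j)}\big) \\
&\ge c_{i^*,j} + \mathrm{cost}\big(P^*_{> (i^*,j)}\big) \\
&\ge \min_{i=1,\dots,n}\{c_{i,j}\} + \min_{i=1,\dots,n}\{\overleftarrow{c}_{i,\,m-j}\} \\
&\ge \min_{i=1,\dots,n}\{c_{i,j}\} + f_{m-j}(\overleftarrow{s}, \overleftarrow{t}) = AP_j(s,t)
\end{aligned}
$$

The second line uses that $c_{i^*,j}$ is the cheapest way to reach cell $(i^*, j)$, the third uses reversibility, and the last uses the piecewise lower bounding property of the filter. For $j = m$ the remainder is empty and $f_0 = 0$.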

Existing piecewise lower bounds
Linearization: LB_Keogh, the Euclidean distance to envelopes (upper and lower bounds of segments) [Keogh, VLDB 2002; Zhu/Shasha, SIGMOD 2003]
Corner boundaries: LB_Hybrid, piecewise corner-like shapes in the warping matrix through which every warping path has to pass [Zhou/Wong, ICDE 2007]
Path approximation: FTW (Fast search method for Dynamic Time Warping), which goes from coarser (less costly) DTW to finer DTW as needed [Sakurai/Yoshikawa/Faloutsos, PODS 2005]
All of these are piecewise lower bounds as required for anticipatory pruning (details in the paper).
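To make the piecewise idea concrete, here is a simplified envelope-based bound in the spirit of LB_Keogh, accumulated as prefix sums so that entry j bounds the cumulative cost up to column j. It is a sketch under several assumptions that differ from the cited work: equal-length one-dimensional series, a band of the form |i - j| <= k, and absolute differences instead of Euclidean distance. The name `piecewise_lb_keogh` is hypothetical.

```python
def piecewise_lb_keogh(s, t, k):
    """Prefix sums of envelope distances: bounds[j] covers columns 1..j.

    Every warping path that reaches column j has visited each earlier
    column inside the band, and the cheapest match for t[j] inside the
    band is at least its distance to the envelope of s around position j.
    """
    n = len(s)
    if len(t) != n:
        raise ValueError("this sketch assumes series of equal length")
    bounds = [0.0]  # f_0 = 0 by definition
    for j in range(n):
        window = s[max(0, j - k): min(n, j + k + 1)]
        lower, upper = min(window), max(window)
        if t[j] > upper:
            term = t[j] - upper
        elif t[j] < lower:
            term = lower - t[j]
        else:
            term = 0.0
        bounds.append(bounds[-1] + term)
    return bounds
```

For anticipatory pruning the bound would be computed on the reversed series, e.g. `piecewise_lb_keogh(s[::-1], t[::-1], k)`, so that entry m - j estimates the part of the warping path that has not been computed yet.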

Experimental setup
Synthetic and real-world data sets:
SignLanguage: 1,400 multivariate time series (11 attributes) of sign language finger tracking data [1], length 64 to 512
TRECVid: 650 to 2,000 benchmark data [2]
NEWSVid: 2,000 to 8,000 TV news recorded at 30 fps (20 attributes), length 64 to 2048
Random walk RW1/RW2: zero-normalized time series of length 512 and of cardinality 10,000 (1 to 50 attributes)
(a) RW1: $t_{(i+1)j} = t_{ij} + N(0, 1)$
(b) RW2: $t_{(i+1)j} = t_{ij} + N(t_{ij} - t_{(i-1)j},\, 1)$
Figure: time series generated via RW1 and via RW2 (x-axis: point in time (i), 0 to 512; y-axis: value (t_i))
[1] www.cse.unsw.edu.au/~waleed/tml/data/
[2] Smeaton/Over/Kraaij, Evaluation campaigns and TRECVid, MIR 2006
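The two random walk models can be sketched for a single attribute as follows. The start value of 0, the initial increment of 0 for RW2, and the reading of "zero normalized" as subtracting the mean are assumptions of this example, as are the function names.

```python
import random

def random_walk(length, second_order=False, seed=None):
    """Generate one attribute of an RW1 (default) or RW2 time series.

    RW1: each step is drawn from N(0, 1).
    RW2: each step is drawn from N(previous step, 1), i.e. the mean of the
    increment is the last difference t_i - t_(i-1).
    """
    rng = random.Random(seed)
    values = [0.0]   # assumed start value
    prev_step = 0.0  # assumed initial increment for RW2
    while len(values) < length:
        mean = prev_step if second_order else 0.0
        step = rng.gauss(mean, 1.0)
        values.append(values[-1] + step)
        prev_step = step
    return values

def zero_normalize(values):
    """Shift a series so that its mean is zero (assumed normalization)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]
```

A call such as `zero_normalize(random_walk(512, second_order=True, seed=1))` yields one RW2-style series of the length used in the experiments.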

Experiments
Figure: Relative improvement (#calc.) for varying number of attributes on RW2 (x-axis: number of attributes, 1 to 50; y-axis: relative improvement [percent], 0 to 100; series: AP_FTW, ES_FTW, AP_LBKeogh, ES_LBKeogh, AP_LBHybrid, ES_LBHybrid)

Experiments 2
Figure: Efficiency improvement (#calc.) for varying DTW bandwidths on NEWSVid (x-axis: k, 10 to 150; y-axis: relative improvement [percent]; series: AP_LBKeogh, ES_LBKeogh, AP_FTW, ES_FTW, AP_LBHybrid, ES_LBHybrid)

Experiments 3
Figure: (log. scale) Absolute improvement (average query time), reduction, RW2 (three panels: LBKeogh, ES_LBKeogh, AP_LBKeogh; FTW, ES_FTW, AP_FTW; LBHybrid, ES_LBHybrid, AP_LBHybrid; x-axis: reduced length, 256 down to 8; y-axis: average query time [s], log. scale)

Conclusion
Anticipatory pruning speeds up DTW (Dynamic Time Warping), a widely used distance function for time series similarity search.
Our novel anticipatory pruning makes best use of a family of multistep filter-and-refine approaches.
It computes an estimated overall DTW distance from already available filter information: a series of lower bounds of the DTW.
Experiments demonstrate substantially reduced runtime.
AP can be flexibly combined with existing and future DTW lower bounds.
AP is orthogonal to speed-up via indexing etc.
Thank you for your attention. Questions?