Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques


1 Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques
Mahsa Orang, Nematollaah Shiri
27th International Conference on Scientific and Statistical Database Management

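2 Introduction
Standard time series: $u = \langle u_1, \ldots, u_n \rangle$
Uncertain time series: $U = \langle U_1, \ldots, U_n \rangle$
[Figures: value plotted against timestamp for each series; the values of the uncertain series are drawn as unknown ("?").]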

3 Introduction
Uncertain time series: $U = \langle U_1, \ldots, U_n \rangle$, with $U_i = t_i + E_i$, where $U_i$ is the observed value, $t_i$ the exact value, and $E_i$ the error at timestamp $i$.
Sources of uncertainty: privacy concerns, multiple readings, forecasting techniques, data collection error.
Application areas: wireless sensor networks, medical data analysis, location-based services.

4 Introduction: Similarity Search Approaches
Traditional similarity measures: use the values.
Uncertain similarity measures: use the values plus statistical information.

5 Introduction: Similarity Search Approaches
Traditional similarity measures OUTPERFORM uncertain similarity measures [DNMP,12].
Why? Using a preprocessing step.

6 Introduction: Preprocessing
[Figure: an uncertain time series compared with the standard time series.]
Filtering: smoother, but farther from the exact values.
Filtering + normalization: closer to the exact values.

7 Motivation
Can preprocessing improve the performance of uncertain similarity measures?
Will uncertain similarity measures outperform traditional ones when preprocessing techniques are used?

8 Outline
Uncertain Similarity Measures
Preprocessing Techniques
Experimental Evaluation

9 Uncertain Similarity Measures (Orang & Shiri [OS,14])
Deterministic: $D(X, Y) = c$, a constant.
  DUST (Sarangi et al. [SM,10])
Probabilistic: $D(X, Y) = Z$, a random variable, with $P(Z \le c) = p$.
  PROUD (Yeh et al. [YWY,09])
  Uncertain Correlation (Orang & Shiri [OS,12])

10 Uncertain Similarity Measures (Orang & Shiri [OS,14])
Deterministic: $D(X, Y) = c$, a constant.
  DUST (Sarangi et al. [SM,10])
Probabilistic: $D(X, Y) = Z$, a random variable.
  PROUD (Yeh et al. [YWY,09])
  Uncertain Correlation (Orang & Shiri [OS,12])
Probabilistic queries: $P(\mathrm{Eucl}(X, Q) \le d) \ge p$ and $P(\mathrm{Corr}(X, Q) \ge c) \ge p$.
Given a dataset $D$ of uncertain time series, an uncertain time series $Q$, a similarity threshold $s$, and a probability threshold $p$, a probabilistic query searches for the uncertain time series $X$ in $D$ whose similarity to $Q$ is higher than $s$ with a confidence of at least $p$.
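To make the query definition concrete, here is a minimal sketch of a probabilistic range query answered by naive Monte Carlo sampling. This is not PROUD or any method from the slides. It assumes, purely for illustration, that each uncertain value is an independent normal variable centered on the observation, and that Eucl is the squared Euclidean distance. The names `prob_within` and `probabilistic_query` are hypothetical.

```python
import random

def prob_within(x_obs, q_obs, sd_x, sd_q, d, trials=2000, seed=0):
    """Estimate P(Eucl(X, Q) <= d) by sampling (assumes independent normal errors)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dist = 0.0
        for xo, qo, sx, sq in zip(x_obs, q_obs, sd_x, sd_q):
            # Draw one possible value for each series at this timestamp.
            dist += (rng.gauss(xo, sx) - rng.gauss(qo, sq)) ** 2
        hits += dist <= d
    return hits / trials

def probabilistic_query(dataset, q_obs, sd_q, d, p, **kwargs):
    """Return the names of the series X with estimated P(Eucl(X, Q) <= d) >= p.

    dataset maps a name to a pair (observed values, per-point standard deviations).
    """
    return [
        name
        for name, (x_obs, sd_x) in dataset.items()
        if prob_within(x_obs, q_obs, sd_x, sd_q, d, **kwargs) >= p
    ]
```

A correlation query would follow the same pattern, with the distance test replaced by a test of the sampled correlation against the threshold $c$.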

11 Preprocessing Techniques
Moving Average Filters
Normalization

12 Moving Average Filters [DNMP,12]
Input: $x = \langle x_1, \ldots, x_n \rangle$
Simple Moving Average (MA): $x^{MA} = \langle x_1^{MA}, \ldots, x_m^{MA} \rangle$, with
  $x_i^{MA} = \frac{1}{2w+1} \sum_{k=i-w}^{i+w} x_k$
  Each value is substituted by the average of the adjacent values.
Uncertain Moving Average (UMA): $x^{UMA} = \langle x_1^{UMA}, \ldots, x_m^{UMA} \rangle$, with
  $x_i^{UMA} = \frac{1}{2w+1} \sum_{k=i-w}^{i+w} \frac{x_k}{\sigma_k}$
  Each value is substituted by a weighted average of the adjacent values.
Uncertain Exponential Moving Average (UEMA): $x^{UEMA} = \langle x_1^{UEMA}, \ldots, x_m^{UEMA} \rangle$, with
  $x_i^{UEMA} = \frac{\sum_{k=i-w}^{i+w} e^{-\lambda |k-i|} \, \frac{x_k}{\sigma_k}}{\sum_{k=i-w}^{i+w} e^{-\lambda |k-i|}}$
  Each value is substituted by a weighted average of the adjacent values; the weights decrease exponentially.
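The three filters can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. It reads the UMA as $\frac{1}{2w+1}\sum_k x_k/\sigma_k$ and the UEMA weights as $e^{-\lambda|k-i|}$. It also emits only positions with a full window, so the output has $m = n - 2w$ values, which is one possible reading of the slide's index $m$. The function names are hypothetical.

```python
import math

def moving_average(x, w):
    """Simple MA: mean of the 2w + 1 values centered on each position."""
    return [sum(x[i - w:i + w + 1]) / (2 * w + 1) for i in range(w, len(x) - w)]

def uncertain_moving_average(x, sigma, w):
    """UMA: like MA, but each value is first divided by its standard deviation."""
    return [
        sum(x[k] / sigma[k] for k in range(i - w, i + w + 1)) / (2 * w + 1)
        for i in range(w, len(x) - w)
    ]

def uncertain_exp_moving_average(x, sigma, w, lam):
    """UEMA: UMA with weights exp(-lam * |k - i|) that decay away from the center."""
    out = []
    for i in range(w, len(x) - w):
        ks = range(i - w, i + w + 1)
        weights = [math.exp(-lam * abs(k - i)) for k in ks]
        num = sum(wt * x[k] / sigma[k] for wt, k in zip(weights, ks))
        out.append(num / sum(weights))
    return out
```

With every $\sigma_k = 1$ the UMA reduces to the simple moving average, which is a convenient sanity check.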

13 Normalization
$x = \langle x_1, \ldots, x_n \rangle$ is mapped to $\hat{x} = \langle \hat{x}_1, \ldots, \hat{x}_n \rangle$, with
  $\hat{x}_i = \frac{x_i - \bar{x}}{s_x}$
Filtering changes the scale and baseline.
Normalization makes similarity measures invariant to scaling and shifting, and hence helps them capture the similarity better.
Filtering + normalization: closer to the exact values.
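A short sketch of the normalization step and of the invariance it buys. It assumes that $s_x$ is the sample standard deviation (divisor $n - 1$); the slide does not state which divisor is used. The function name `normalize` is hypothetical.

```python
import math

def normalize(x):
    """Z-normalize a series: subtract the mean, divide by the sample std (n - 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / s for v in x]

# Scaling and shifting a series leaves its normal form unchanged.
series = [1.0, 3.0, 2.0, 5.0, 4.0]
rescaled = [2.0 * v + 5.0 for v in series]
same = all(abs(a - b) < 1e-12 for a, b in zip(normalize(series), normalize(rescaled)))
```

Here `same` evaluates to true, which is the scale-and-shift invariance the slide refers to.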

14 Experiment Setup
CPU: 2.66 GHz; RAM: 4 GB
16 UCR datasets [KZHHXWR]
Setup similar to [YWY,09] and [SM,10].
Data parameters:
  Standard deviation of the uncertain time series (UTS): $r \times \sigma$
    $r$: standard deviation ratio (SDR), varied from 0.01 to 4
    $\sigma$: standard deviation of the given standard time series
  Error distribution function: exponential, normal, uniform
Query parameters:
  Probability threshold, varied from 0.1 to 0.9
  Similarity threshold:
    Correlation threshold $c$, varied from 0.1 to 0.9
    Euclidean threshold $d$, $d = 2(n-1)(1-c)$
Filtering parameters [DNMP,12]: $w = 2$ and $\lambda = 1$
Performance measures: classification error, F1 score
Results are averaged over 10 runs.
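The data parameters above suggest how an uncertain series could be generated from an exact one. The sketch below is one possible reading, not the authors' generator. It assumes zero-mean errors, takes $\sigma$ as the sample standard deviation of the exact series, and shifts the exponential distribution to center it at zero. The name `perturb` is hypothetical.

```python
import math
import random
import statistics

def perturb(series, r, dist="normal", rng=None):
    """Return (observed, error_sd): the series plus zero-mean errors with std r * sigma."""
    rng = rng or random.Random(0)
    sd = r * statistics.stdev(series)  # sigma: sample std of the exact series (assumption)
    if dist == "normal":
        draw = lambda: rng.gauss(0.0, sd)
    elif dist == "uniform":
        half = sd * math.sqrt(3.0)  # uniform on [-half, half] has standard deviation sd
        draw = lambda: rng.uniform(-half, half)
    elif dist == "exponential":
        # An exponential with mean sd has std sd; subtracting sd centers it at zero.
        draw = lambda: rng.expovariate(1.0 / sd) - sd
    else:
        raise ValueError(f"unknown error distribution: {dist}")
    return [t + draw() for t in series], sd
```

For example, `perturb(exact, 0.5, "uniform")` returns the observed series together with the error standard deviation that an uncertainty-aware measure or filter would then use.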

15 1) Deterministic Similarity Measures
Comparison approach: 1NN classification with K-fold cross-validation [DTSWK,08].
[Figure panels: No Preprocessing; Simple Moving Average; Uncertain Moving Average; Uncertain Exponential Moving Average.]
Without the weighted filters: DUST error < Euclidean error.
With the weighted filters: DUST error ≈ Euclidean error.
The weighted filters help the Euclidean distance achieve performance similar to that of a weighted similarity measure such as DUST.
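The comparison approach can be sketched as follows: a 1NN classifier under the squared Euclidean distance, scored by its error rate over K folds. The fold assignment (every k-th item) and the function names are illustrative assumptions; the slides do not describe these details.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def one_nn_predict(train, query):
    """Label of the training series closest to the query; train holds (series, label) pairs."""
    return min(train, key=lambda item: sq_euclidean(item[0], query))[1]

def k_fold_error(data, k):
    """Classification error of 1NN over k folds (fold f holds items f, f + k, ...)."""
    errors = 0
    for fold in range(k):
        test = data[fold::k]
        train = [item for j, item in enumerate(data) if j % k != fold]
        errors += sum(one_nn_predict(train, s) != label for s, label in test)
    return errors / len(data)
```

Swapping `sq_euclidean` for another distance, or filtering and normalizing the series first, gives the other configurations compared on this slide.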

16 2) Probabilistic Similarity Measures: Uncertain Correlation
[Figure panels: No Preprocessing; Simple Moving Average; Uncertain Moving Average; Uncertain Exponential Moving Average. Annotations: "more results", "No result".]

17 2) Probabilistic Similarity Measures: Uncertain Correlation
[Figure panels: No Preprocessing; Simple Moving Average; Uncertain Moving Average; Uncertain Exponential Moving Average.]
For low probability thresholds, the F1 score is higher than with the simple moving average.

18 [Table: one row per dataset, one column per filter. The None values and the MA values that precede each percentage are missing here.]
Dataset      None   MA       UMA            UEMA
50words             (+39%)   0.5 (+32%)     0.48 (+26%)
Adiac               (+41%)   0.74 (+45%)    0.71 (+39%)
Beef                (+41%)   0.69 (+41%)    0.62 (+27%)
CBF                 (+26%)   0.45 (+18%)    0.45 (+18%)
Coffee              (+43%)   0.69 (+35%)    0.68 (+33%)
ECG200              (+35%)   0.62 (+35%)    0.61 (+33%)
FISH                (+42%)   0.74 (+48%)    0.69 (+38%)
FaceFour            (+36%)   0.53 (+36%)    0.5 (+28%)
Gun-Point           (+42%)   0.69 (+44%)    0.66 (+38%)
Lighting2           (+35%)   0.43 (+39%)    0.41 (+32%)
Lighting7           (+34%)   0.47 (+34%)    0.45 (+29%)
OSULeaf             (+38%)   0.46 (+35%)    0.43 (+26%)
OliveOil            (+43%)   0.73 (+43%)    0.66 (+29%)
Swedish             (+35%)   0.63 (+37%)    0.60 (+30%)
Synthetic           (0%)     0.32 (-3%)     0.33 (0%)
Trace               (+40%)   0.66 (+40%)    0.62 (+32%)

19 2) Probabilistic Similarity Measures: Uncertain Correlation
[Figure panels: No Preprocessing; Simple Moving Average; Uncertain Moving Average; Uncertain Exponential Moving Average. Compared against: Pearson Correlation.]

20 2) Probabilistic Similarity Measures: PROUD
[Figure panels: No Preprocessing; Simple Moving Average; Uncertain Moving Average; Uncertain Exponential Moving Average.]

21 2) Probabilistic Similarity Measures: PROUD
[Figure example: $\mathrm{Eucl}(x, y) \le 100$; $P(\mathrm{Eucl}(X, Y) \le 100) = 0$.]

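22 2) Probabilistic Similarity Measures: PROUD
$\mathrm{Eucl}(X, Y) = \sum_{i=1}^{n} (X_i - Y_i)^2$
$E(\mathrm{Eucl}(X, Y)) = \mathrm{Eucl}(E(X), E(Y)) + \sum_{i=1}^{n} \left( \mathrm{Var}(X_i) + \mathrm{Var}(Y_i) \right)$
$E(X) = \langle E(X_1), \ldots, E(X_n) \rangle$
[Figure example, annotated with the value 300: $P(\mathrm{Eucl}(X, Y) \le 100) = 0$.]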
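The expected-distance identity on this slide can be checked with a short derivation. It assumes that $X_i$ and $Y_i$ are independent, or at least uncorrelated, for every $i$; the slide does not spell this assumption out.

```latex
\begin{align*}
E\big[(X_i - Y_i)^2\big]
  &= \mathrm{Var}(X_i - Y_i) + \big(E[X_i - Y_i]\big)^2 \\
  &= \mathrm{Var}(X_i) + \mathrm{Var}(Y_i) + \big(E(X_i) - E(Y_i)\big)^2
     && \text{(uncorrelated } X_i, Y_i\text{)} \\
E\big[\mathrm{Eucl}(X, Y)\big]
  &= \sum_{i=1}^{n} E\big[(X_i - Y_i)^2\big] \\
  &= \underbrace{\sum_{i=1}^{n} \big(E(X_i) - E(Y_i)\big)^2}_{\mathrm{Eucl}(E(X),\, E(Y))}
     + \sum_{i=1}^{n} \big(\mathrm{Var}(X_i) + \mathrm{Var}(Y_i)\big)
\end{align*}
```

The variance sum is never negative, so the expected distance is at least the distance between the expected series, and it grows with the amount of uncertainty.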

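23 PROUDS: An Enhanced Version of PROUD
$\mathrm{Eucl}(X, Y) = \sum_{i=1}^{n} \left( E(X_i)^2 + E(Y_i)^2 - 2 X_i Y_i \right)$
$E(\mathrm{Eucl}(X, Y)) = \mathrm{Eucl}(E(X), E(Y))$
[Figure example: $P(\mathrm{Eucl}(X, Y) \le 100) = 0.8$.]
Lemma 1. Given uncertain time series $X$ and $Y$ with normal forms $\hat{X}$ and $\hat{Y}$, the following holds:
$\mathrm{Eucl}(\hat{X}, \hat{Y}) = 2(n-1)(1 - \mathrm{Corr}(X, Y))$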
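The relationship in Lemma 1 can be checked numerically on ordinary (certain) series, which is a simpler setting than the uncertain one the lemma addresses. The sketch assumes the lemma reads $\mathrm{Eucl}(\hat{X}, \hat{Y}) = 2(n-1)(1 - \mathrm{Corr}(X, Y))$, with Eucl the squared Euclidean distance and normalization by the sample standard deviation. The helper `euclidean_threshold` applies the same reading to convert a correlation threshold $c$ into a Euclidean threshold $d$. All names are hypothetical.

```python
import math

def normalize(x):
    """Z-normalize with the sample standard deviation (divisor n - 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / s for v in x]

def sq_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )

def euclidean_threshold(c, n):
    """Euclidean threshold matching a correlation threshold c for length-n series."""
    return 2 * (n - 1) * (1 - c)

x = [1.0, 3.0, 2.0, 5.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
lhs = sq_euclidean(normalize(x), normalize(y))
rhs = euclidean_threshold(pearson(x, y), len(x))
```

The two sides agree up to floating-point error because each normalized series has squared norm $n - 1$ and their inner product equals $(n-1)\,\mathrm{Corr}(x, y)$.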

24 2) Probabilistic Similarity Measures: Uncertain Correlation

25 Conclusion
Uncertain similarity measures can outperform the traditional similarity measures, with and without data preprocessing. This indicates the effectiveness of uncertain similarity measures in practice.
Uncertain similarity measures use all the available information to better quantify the similarity.
Probabilistic similarity measures give users more information about the reliability of the result.
Preprocessing is necessary for similarity search in uncertain time series.
The simple and uncertain moving average filters improve the performance of the probabilistic measures more than the uncertain exponential moving average filter does.
We propose an enhancement of the PROUD similarity measure that improves its performance with and without data preprocessing.
We found a linear relationship between the probabilistic similarity measures.

26 Future Work
Research on uncertain time series is new and has mostly focused on modeling and similarity search.
More work is required in this field, e.g., on pattern discovery, indexing, and prediction.

27 References
[DNMP,12] M. Dallachiesa, B. Nushi, K. Mirylenka, and T. Palpanas, Uncertain Time Series Similarity: Return To The Basics, In Proc. of the VLDB Endowment, 5(11).
[DTSWK,08] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures, In Proc. of the VLDB Endowment, 1(2).
[KZHHXWR] E. Keogh, Q. Zhu, B. Hu, Y. Hao, X. Xi, L. Wei, and C. A. Ratanamahatana, The UCR Time Series Classification/Clustering Homepage.
[OS,12] M. Orang and N. Shiri, A Probabilistic Approach To Correlation Queries In Uncertain Time Series Data, In Proc. of CIKM.
[OS,14] M. Orang and N. Shiri, An Experimental Evaluation of Similarity Measures for Uncertain Time Series, In Proc. of IDEAS.
[SM,10] S. R. Sarangi and K. Murthy, DUST: A Generalized Notion of Similarity between Uncertain Time Series, In Proc. of the 16th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining.
[YWY,09] M. Y. Yeh, K. L. Wu, P. S. Yu, and M. S. Chen, PROUD: A Probabilistic Approach to Processing Similarity Queries over Uncertain Data Streams, In Proc. of the 12th Int. Conf. on Extending Database Technology.

28 Sorry we could not attend the conference. For any questions or comments, please contact us at:

Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques

Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Mahsa Orang Nematollaah Shiri 27th International Conference on Scientific and Statistical Database

More information

Uncertain Time-Series Similarity: Return to the Basics

Uncertain Time-Series Similarity: Return to the Basics Uncertain Time-Series Similarity: Return to the Basics Dallachiesa et al., VLDB 2012 Li Xiong, CS730 Problem Problem: uncertain time-series similarity Applications: location tracking of moving objects;

More information

MANAGING UNCERTAINTY IN SPATIO-TEMPORAL SERIES

MANAGING UNCERTAINTY IN SPATIO-TEMPORAL SERIES MANAGING UNCERTAINTY IN SPATIO-TEMPORAL SERIES Yania Molina Souto, Ana Maria de C. Moura, Fabio Porto Laboratório de Computação Científica LNCC DEXL Lab Petrópolis RJ Brasil yaniams@lncc.br, anamoura@lncc.br,

More information

Fusion of Similarity Measures for Time Series Classification

Fusion of Similarity Measures for Time Series Classification Fusion of Similarity Measures for Time Series Classification Krisztian Buza, Alexandros Nanopoulos, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim,

More information

Temporal and Frequential Metric Learning for Time Series knn Classication

Temporal and Frequential Metric Learning for Time Series knn Classication Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Temporal and Frequential Metric Learning for Time Series knn Classication Cao-Tri Do 123, Ahlame Douzal-Chouakria

More information

DIT - University of Trento Modeling and Querying Data Series and Data Streams with Uncertainty

DIT - University of Trento Modeling and Querying Data Series and Data Streams with Uncertainty PhD Dissertation International Doctorate School in Information and Communication Technologies DIT - University of Trento Modeling and Querying Data Series and Data Streams with Uncertainty Michele Dallachiesa

More information

Fundamentals of Similarity Search

Fundamentals of Similarity Search Chapter 2 Fundamentals of Similarity Search We will now look at the fundamentals of similarity search systems, providing the background for a detailed discussion on similarity search operators in the subsequent

More information

Time Series Classification Using Time Warping Invariant Echo State Networks

Time Series Classification Using Time Warping Invariant Echo State Networks 2016 15th IEEE International Conference on Machine Learning and Applications Time Series Classification Using Time Warping Invariant Echo State Networks Pattreeya Tanisaro and Gunther Heidemann Institute

More information

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data SDM 15 Vancouver, CAN Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data Houping Xiao 1, Yaliang Li 1, Jing Gao 1, Fei Wang 2, Liang Ge 3, Wei Fan 4, Long

More information

arxiv: v2 [cs.lg] 13 Mar 2018

arxiv: v2 [cs.lg] 13 Mar 2018 arxiv:1802.03628v2 [cs.lg] 13 Mar 2018 ABSTRACT Han Qiu Massachusetts Institute of Technology Cambridge, MA, USA hanqiu@mit.edu Francesco Fusco IBM Research Dublin, Ireland francfus@ie.ibm.com We propose

More information

Time-Series Analysis Prediction Similarity between Time-series Symbolic Approximation SAX References. Time-Series Streams

Time-Series Analysis Prediction Similarity between Time-series Symbolic Approximation SAX References. Time-Series Streams Time-Series Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Time-Series Analysis 2 Prediction Filters Neural Nets 3 Similarity between Time-series Euclidean Distance

More information

Time Series Data Cleaning

Time Series Data Cleaning Time Series Data Cleaning Shaoxu Song http://ise.thss.tsinghua.edu.cn/sxsong/ Dirty Time Series Data Unreliable Readings Sensor monitoring GPS trajectory J. Freire, A. Bessa, F. Chirigati, H. T. Vo, K.

More information

Machine Learning on temporal data

Machine Learning on temporal data Machine Learning on temporal data Learning Dissimilarities on Time Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan Time series structure and

More information

Learning Time-Series Shapelets

Learning Time-Series Shapelets Learning Time-Series Shapelets Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany SIGKDD 14,

More information

Group Pattern Mining Algorithm of Moving Objects Uncertain Trajectories

Group Pattern Mining Algorithm of Moving Objects Uncertain Trajectories INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL ISSN 1841-9836, 10(3):428-440, June, 2015. Group Pattern Mining Algorithm of Moving Objects Uncertain Trajectories S. Wang, L. Wu, F. Zhou, C.

More information

An Alternate Measure for Comparing Time Series Subsequence Clusters

An Alternate Measure for Comparing Time Series Subsequence Clusters An Alternate Measure for Comparing Time Series Subsequence Clusters Ricardo Mardales mardales@ engr.uconn.edu Dina Goldin dqg@engr.uconn.edu BECAT/CSE Technical Report University of Connecticut 1. Introduction

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan

More information

Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach

Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach Yinghui (Catherine) Yang Graduate School of Management, University of California, Davis AOB IV, One Shields

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

DISI - University of Trento Mining and Learning in Sequential Data Streams: Interesting Correlations and Classification in Noisy Settings

DISI - University of Trento Mining and Learning in Sequential Data Streams: Interesting Correlations and Classification in Noisy Settings PhD Dissertation International Doctorate School in Information and Communication Technologies DISI - University of Trento Mining and Learning in Sequential Data Streams: Interesting Correlations and Classification

More information

A Novel Method for Mining Relationships of entities on Web

A Novel Method for Mining Relationships of entities on Web , pp.480-484 http://dx.doi.org/10.14257/astl.2016.121.87 A Novel Method for Mining Relationships of entities on Web Xinyan Huang 1,3, Xinjun Wang* 1,2, Hui Li 1, Yongqing Zheng 1 1 Shandong University

More information

Learning in Probabilistic Graphs exploiting Language-Constrained Patterns

Learning in Probabilistic Graphs exploiting Language-Constrained Patterns Learning in Probabilistic Graphs exploiting Language-Constrained Patterns Claudio Taranto, Nicola Di Mauro, and Floriana Esposito Department of Computer Science, University of Bari "Aldo Moro" via E. Orabona,

More information

Rare Event Discovery And Event Change Point In Biological Data Stream

Rare Event Discovery And Event Change Point In Biological Data Stream Rare Event Discovery And Event Change Point In Biological Data Stream T. Jagadeeswari 1 M.Tech(CSE) MISTE, B. Mahalakshmi 2 M.Tech(CSE)MISTE, N. Anusha 3 M.Tech(CSE) Department of Computer Science and

More information

Attraction and Avoidance Detection from Movements Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret Crofoot.

Attraction and Avoidance Detection from Movements Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret Crofoot. Attraction and Avoidance Detection from Movements Zhenhui Li, Bolin Ding, Fei Wu, Tobias Kin Hou Lei, Roland Kays, and Margaret Crofoot. VLDB 14 Movement data contain valuable information Movement patterns

More information

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,

More information

Frequency-hiding Dependency-preserving Encryption for Outsourced Databases

Frequency-hiding Dependency-preserving Encryption for Outsourced Databases Frequency-hiding Dependency-preserving Encryption for Outsourced Databases ICDE 17 Boxiang Dong 1 Wendy Wang 2 1 Montclair State University Montclair, NJ 2 Stevens Institute of Technology Hoboken, NJ April

More information

JOINT PROBABILISTIC INFERENCE OF CAUSAL STRUCTURE

JOINT PROBABILISTIC INFERENCE OF CAUSAL STRUCTURE JOINT PROBABILISTIC INFERENCE OF CAUSAL STRUCTURE Dhanya Sridhar Lise Getoor U.C. Santa Cruz KDD Workshop on Causal Discovery August 14 th, 2016 1 Outline Motivation Problem Formulation Our Approach Preliminary

More information

Machine Learning: Pattern Mining

Machine Learning: Pattern Mining Machine Learning: Pattern Mining Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Wintersemester 2007 / 2008 Pattern Mining Overview Itemsets Task Naive Algorithm Apriori Algorithm

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

Probabilistic Similarity Search for Uncertain Time Series

Probabilistic Similarity Search for Uncertain Time Series Probabilistic Similarity Search for Uncertain Time Series Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 Munich, Germany

More information

Recent advances in Time Series Classification

Recent advances in Time Series Classification Distance Shapelet BoW Kernels CCL Recent advances in Time Series Classification Simon Malinowski, LinkMedia Research Team Classification day #3 S. Malinowski Time Series Classification 21/06/17 1 / 55

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

A Representation of Time Series for Temporal Web Mining

A Representation of Time Series for Temporal Web Mining A Representation of Time Series for Temporal Web Mining Mireille Samia Institute of Computer Science Databases and Information Systems Heinrich-Heine-University Düsseldorf D-40225 Düsseldorf, Germany samia@cs.uni-duesseldorf.de

More information

A Novel Click Model and Its Applications to Online Advertising

A Novel Click Model and Its Applications to Online Advertising A Novel Click Model and Its Applications to Online Advertising Zeyuan Zhu Weizhu Chen Tom Minka Chenguang Zhu Zheng Chen February 5, 2010 1 Introduction Click Model - To model the user behavior Application

More information

Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces

Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces David Schultz and Brijnesh Jain Technische Universität Berlin, Germany arxiv:1701.06393v1 [cs.cv] 23 Jan 2017 Abstract.

More information

Finding Pareto Optimal Groups: Group based Skyline

Finding Pareto Optimal Groups: Group based Skyline Finding Pareto Optimal Groups: Group based Skyline Jinfei Liu Emory University jinfei.liu@emory.edu Jun Luo Lenovo; CAS jun.luo@siat.ac.cn Li Xiong Emory University lxiong@emory.edu Haoyu Zhang Emory University

More information

The τ-skyline for Uncertain Data

The τ-skyline for Uncertain Data CCCG 2014, Halifax, Nova Scotia, August 11 13, 2014 The τ-skyline for Uncertain Data Haitao Wang Wuzhou Zhang Abstract In this paper, we introduce the notion of τ-skyline as an alternative representation

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data

Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com

More information

Detection of Highly Correlated Live Data Streams

Detection of Highly Correlated Live Data Streams BIRTE 17 Detection of Highly Correlated Live Data Streams R. Alseghayer, Daniel Petrov, P.K. Chrysanthis, M. Sharaf, A. Labrinidis University of Pittsburgh The University of Queensland Motivation U, m,

More information

Change Detection in Multivariate Data

Change Detection in Multivariate Data Change Detection in Multivariate Data Likelihood and Detectability Loss Giacomo Boracchi July, 8 th, 2016 giacomo.boracchi@polimi.it TJ Watson, IBM NY Examples of CD Problems: Anomaly Detection Examples

More information

Mobility Analytics through Social and Personal Data. Pierre Senellart

Mobility Analytics through Social and Personal Data. Pierre Senellart Mobility Analytics through Social and Personal Data Pierre Senellart Session: Big Data & Transport Business Convention on Big Data Université Paris-Saclay, 25 novembre 2015 Analyzing Transportation and

More information

Efficient Parallel Partition based Algorithms for Similarity Search and Join with Edit Distance Constraints

Efficient Parallel Partition based Algorithms for Similarity Search and Join with Edit Distance Constraints Efficient Partition based Algorithms for Similarity Search and Join with Edit Distance Constraints Yu Jiang,, Jiannan Wang, Guoliang Li, and Jianhua Feng Tsinghua University Similarity Search&Join Competition

More information

WEATHER DEPENENT ELECTRICITY MARKET FORECASTING WITH NEURAL NETWORKS, WAVELET AND DATA MINING TECHNIQUES. Z.Y. Dong X. Li Z. Xu K. L.

WEATHER DEPENENT ELECTRICITY MARKET FORECASTING WITH NEURAL NETWORKS, WAVELET AND DATA MINING TECHNIQUES. Z.Y. Dong X. Li Z. Xu K. L. WEATHER DEPENENT ELECTRICITY MARKET FORECASTING WITH NEURAL NETWORKS, WAVELET AND DATA MINING TECHNIQUES Abstract Z.Y. Dong X. Li Z. Xu K. L. Teo School of Information Technology and Electrical Engineering

More information

The Truncated Tornado in TMBB: A Spatiotemporal Uncertainty Model for Moving Objects

The Truncated Tornado in TMBB: A Spatiotemporal Uncertainty Model for Moving Objects The : A Spatiotemporal Uncertainty Model for Moving Objects Shayma Alkobaisi 1, Petr Vojtěchovský 2, Wan D. Bae 3, Seon Ho Kim 4, Scott T. Leutenegger 4 1 College of Information Technology, UAE University,

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Structural Health Monitoring Using Statistical Pattern Recognition Introduction to Statistical Inference Presented by Charles R. Farrar, Ph.D., P.E. Outline Introduce statistical decision making for Structural

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Graham Cormode AT&T Labs Feifei Li FSU Ke Yi HKUST 1-1 Uncertain, uncertain, uncertain... (Probabilistic, probabilistic, probabilistic...)

More information

Finding persisting states for knowledge discovery in time series

Finding persisting states for knowledge discovery in time series Finding persisting states for knowledge discovery in time series Fabian Mörchen and Alfred Ultsch Data Bionics Research Group, Philipps-University Marburg, 3532 Marburg, Germany Abstract. Knowledge Discovery

More information

Latent Geographic Feature Extraction from Social Media

Latent Geographic Feature Extraction from Social Media Latent Geographic Feature Extraction from Social Media Christian Sengstock* Michael Gertz Database Systems Research Group Heidelberg University, Germany November 8, 2012 Social Media is a huge and increasing

More information

Available online at ScienceDirect. Procedia Engineering 119 (2015 ) 13 18

Available online at   ScienceDirect. Procedia Engineering 119 (2015 ) 13 18 Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 119 (2015 ) 13 18 13th Computer Control for Water Industry Conference, CCWI 2015 Real-time burst detection in water distribution

More information

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH M. De Cock C. Cornelis E. E. Kerre Dept. of Applied Mathematics and Computer Science Ghent University, Krijgslaan 281 (S9), B-9000 Gent, Belgium phone: +32

More information

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time Experiment presentation for CS3710:Visual Recognition Presenter: Zitao Liu University of Pittsburgh ztliu@cs.pitt.edu

More information

A Data-driven Approach for Remaining Useful Life Prediction of Critical Components

A Data-driven Approach for Remaining Useful Life Prediction of Critical Components GT S3 : Sûreté, Surveillance, Supervision Meeting GdR Modélisation, Analyse et Conduite des Systèmes Dynamiques (MACS) January 28 th, 2014 A Data-driven Approach for Remaining Useful Life Prediction of

More information

Application and verification of ECMWF products 2010

Application and verification of ECMWF products 2010 Application and verification of ECMWF products Hydrological and meteorological service of Croatia (DHMZ) Lovro Kalin. Summary of major highlights At DHMZ, ECMWF products are regarded as the major source

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management PREDICTIVE COMPLEX EVENT PROCESSING USING EVOLVING BAYESIAN NETWORK HUI GAO, YONGHENG WANG* * College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China DOI: 10.5281/zenodo.1204185

More information

Searching Dimension Incomplete Databases

Searching Dimension Incomplete Databases IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 6, NO., JANUARY 3 Searching Dimension Incomplete Databases Wei Cheng, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang Abstract

More information

Differentially Private Real-time Data Release over Infinite Trajectory Streams
