Time Series Classification


August 31, 2017

Outline
1. Time Series Classification
2. Distance Measures
3. Classifiers
4. DTW vs. ED
5. Further Work
6. Questions

Classification
Sorting objects into pre-defined groups.

A supervised learning problem
We have a training set: data which is already labelled.
Given a new test time series, which group does it belong to?

Applications
- Speech recognition
- Image processing
- ECG readings

Objectives of the Project
- Investigate commonly used methods
- Research the limitations and capabilities of these methods
- Understand when these methods are best used
- Identify areas of further research

Two-Step Process
1. Measure the distance between the test time series and each time series in the training set
   - Euclidean Distance
   - Dynamic Time Warping
2. Classify the time series based on its distance from the training time series
   - Usually some form of nearest-neighbour algorithm

Distance Measures: Euclidean Distance
Pointwise difference between the time series.
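As an illustrative sketch (mine, not the presenters'), the Euclidean distance between two equal-length series:

```python
import numpy as np

def euclidean_distance(t, s):
    """Pointwise Euclidean distance between two equal-length time series."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    assert t.shape == s.shape, "ED requires series of the same length"
    return np.sqrt(np.sum((t - s) ** 2))
```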

Distance Measures: Dynamic Time Warping
- Uses dynamic programming to minimise the difference between the time series
- Accounts for different time scales

Distance Measures: Dynamic Time Warping - Algorithm
Take two time series: T = {1, 3, 1, 0} and S = {0, 1, 3, 2}.

Construct a cost matrix C with C_{i,j} = (T_i - S_j)^2:

    1 0 4 1
    9 4 0 1
    1 0 4 1
    0 1 9 4

Construct the matrix D sequentially with D_{i,j} = C_{i,j} + min(D_{i-1,j-1}, D_{i,j-1}, D_{i-1,j}):

     1  1  5  6
    10  5  1  2
    11  5  5  2
    11  6 14  6

The DTW distance is the final entry, D_{4,4} = 6.
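The recurrence translates directly into code. A sketch (my own, not the presenters') that reproduces the worked example:

```python
import numpy as np

def dtw_distance(t, s):
    """DTW via the recurrence on the slides:
    C[i,j] = (t[i] - s[j])**2,
    D[i,j] = C[i,j] + min(D[i-1,j-1], D[i,j-1], D[i-1,j])."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    n, m = len(t), len(s)
    C = (t[:, None] - s[None, :]) ** 2    # cost matrix, shape (n, m)
    D = np.full((n + 1, m + 1), np.inf)   # pad borders with inf
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j - 1], D[i, j - 1], D[i - 1, j])
    return D[n, m]

print(dtw_distance([1, 3, 1, 0], [0, 1, 3, 2]))  # 6.0, matching D_{4,4} above
```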

DTW Example: Optimal Path

     1  1  5  6
    10  5  1  2
    11  5  5  2
    11  6 14  6

    Step   Cell    Alignment
    1      (1,1)   T_1 <-> S_1
    2      (1,2)   T_1 <-> S_2
    3      (2,3)   T_2 <-> S_3
    4      (3,4)   T_3 <-> S_4
    5      (4,4)   T_4 <-> S_4
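The path can be recovered by backtracking through D from the bottom-right cell to (1,1), always moving to the cheapest of the three predecessor cells. A sketch (mine) reproducing the table above:

```python
import numpy as np

def dtw_path(t, s):
    """Recover the optimal DTW alignment by backtracking through D."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    n, m = len(t), len(s)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = (t[i - 1] - s[j - 1]) ** 2 + min(
                D[i - 1, j - 1], D[i, j - 1], D[i - 1, j])
    # Walk back from (n, m), stepping to the cheapest predecessor each time
    i, j, path = n, m, [(n, m)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i, j - 1), (i - 1, j)], key=lambda c: D[c])
        path.append((i, j))
    return path[::-1]

print(dtw_path([1, 3, 1, 0], [0, 1, 3, 2]))
# [(1, 1), (1, 2), (2, 3), (3, 4), (4, 4)], matching the alignment table
```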

Classification Step - K-Nearest Neighbour
- Assigns a label based on the most common label among the K nearest neighbours
- K is pre-specified
- The simplest example is 1-NN
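A minimal sketch of this step (the function name and the (series, label) pair format are my own choices); it works with either distance measure defined above:

```python
from collections import Counter

def knn_classify(test_series, train_set, dist, k=1):
    """Label test_series by majority vote among its k nearest training
    series, where train_set is a list of (series, label) pairs."""
    neighbours = sorted(train_set, key=lambda pair: dist(test_series, pair[0]))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# e.g. knn_classify(c, [(a, "Class 1"), (b, "Class 2")], dtw_distance)
```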

DTW vs. ED - Theoretical Results
Take a training set made up of two time series, a (Class 1) and b (Class 2), and take a time series c we wish to label.

Assumptions:
- All time series are the same length, n
- c is from Class 1, i.e. c = a plus some white noise ~ N(0, σ²)
- We use ED and 1-NN

DTW vs. ED - Theoretical Results
c is labelled correctly exactly when the noise leaves it closer to a than to b; the noise component along a - b is Gaussian, which gives

P(c labelled correctly) = \Phi\left( \frac{1}{2\sigma} \sqrt{ \sum_{i=1}^{n} (a_i - b_i)^2 } \right)

What does this mean? For a high probability of correct classification we want:
- a small σ (low noise variance)
- a longer time series
- well-defined differences between the training time series
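A quick Monte Carlo sanity check of this formula (my own sketch; the prototypes a and b below are arbitrary stand-ins):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 50, 1.0, 20000
a = np.sin(np.linspace(0, 4 * np.pi, n))   # hypothetical Class 1 prototype
b = np.zeros(n)                            # hypothetical Class 2 prototype

# Empirical rate at which c = a + noise lands nearer to a than to b under ED
c = a + rng.normal(0.0, sigma, size=(trials, n))
empirical = np.mean(np.sum((c - a) ** 2, axis=1) < np.sum((c - b) ** 2, axis=1))

# Predicted rate: Phi( sqrt(sum (a_i - b_i)^2) / (2 * sigma) )
phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
predicted = phi(math.sqrt(np.sum((a - b) ** 2)) / (2.0 * sigma))

print(empirical, predicted)  # the two should agree closely
```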

Standard Performance Comparison
Data set: coffee bean spectrograph readings.

Our performance measure is the proportion of correct classifications.

Shifted Time Series
[Sequence of figures, not included in the transcription]

Efficiency
- Efficiency is very important, particularly when using TSC in real time
- DTW performs very poorly in comparison with ED

Time taken for distance measures (milliseconds):

    Measure   Min. Time   Mean Time   Max. Time
    ED            11.21       11.69       55.43
    DTW         2533.88     2581.30     3919.35
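The gap reflects the asymptotic costs: ED is linear in the series length, while the naive DTW above fills an n × m matrix. A rough timing sketch (my own; absolute numbers depend on the machine), reusing the functions defined earlier:

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
t, s = rng.normal(size=300), rng.normal(size=300)

# ED is O(n); the naive DTW above is O(n^2), hence the large gap
print("ED: ", timeit.timeit(lambda: euclidean_distance(t, s), number=100))
print("DTW:", timeit.timeit(lambda: dtw_distance(t, s), number=100))
```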

Pros and Cons: DTW vs. ED

ED:
- Only works with time series of the same length
- More resistant to noisy or spiky test data than DTW
- Fails when data is shifted or transformed with respect to time
- Quicker and simpler

DTW:
- Can be used on time series of any length
- Generally weaker when data is spiky or noisy
- Robust when data is shifted or transformed with respect to time
- Much slower

Changing DTW
Adding a window, i.e. only filling cells of D near the diagonal:

     1  1  .  .
    10  5  1  .
     .  5  5  2
     .  . 14  6

- More efficient
- Not much accuracy lost
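A sketch of this windowed variant under a band |i - j| <= w (a Sakoe-Chiba-style constraint; the band reading is my interpretation of the slide's matrix). With w = 1 it reproduces the banded matrix above:

```python
import numpy as np

def dtw_distance_windowed(t, s, w):
    """DTW restricted to the band |i - j| <= w. Cells outside the band
    are never filled, cutting the work from O(n*m) to roughly O(n*w)."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    n, m = len(t), len(s)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            D[i, j] = (t[i - 1] - s[j - 1]) ** 2 + min(
                D[i - 1, j - 1], D[i, j - 1], D[i - 1, j])
    return D[n, m]

print(dtw_distance_windowed([1, 3, 1, 0], [0, 1, 3, 2], w=1))  # 6.0
```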

Further Work

Probabilistic K-NN
- Classification generally outputs a single hard label
- It may be advantageous to also give a measure of how sure we are

Early TSC
- Classifying a time series before all of its data has been received
- Trade-off between accuracy and speed

Questions
Thank you for listening! Any questions?