Learning Time-Series Shapelets

Size: px

Start display at page:

Download "Learning Time-Series Shapelets"

Margaret Carson
5 years ago
Views:

1 Learning Time-Series Shapelets Josif Grabocka, Nicolas Schilling, Martin Wistuba and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany SIGKDD 14, 6/8/14 SIGKDD 14, 6/8/14 1 / 1

2 What are Time-Series Shapelets? (I) Definition: Patterns whose minimum distances to time-series yield discriminative predictors [Ye and Keogh(9), Lines et al.(1)lines, Davis, Hills, and Bagnall] Problem: 1. Learn K discriminative shapelets of length L (denoted as S R K L ).. From a dataset that has I time-series instances of length M (denoted as T R I M ), where each series is divided into J = M L sliding window segments. SIGKDD 14, 6/8/14 1 / 1

3 What are Time-Series Shapelets? (II) The minimum distances M R I K between shapelets and time-series: 1 M i,k = min j=1,...,j L... yield discriminative predictors: L (T i,j+l 1 S k,l ) (1) l=1 1 Shapelet transformed data S1 S T1 T 1 1 T3 T4 1 1 T S.9 T 4 T T T T S 1 Figure 1: Left: Two shapelets, Middle: Closest Segments, Right: Shapelet-Transform ed data SIGKDD 14, 6/8/14 / 1

4 Related Shapelets Work All the possible segments (exhaustively) of all time-series candidates are potential shapelet candidates: Compute the prediction accuracy of minimum distances of each candidate and then rank the top-k by prediction accuracy (feature ranking) Build a decision tree from the top-k features [Ye and Keogh(9), Lines et al.(1)lines, Davis, Hills, and Bagnall] Alternatively, use the new K-dimensional feature representation and use standard classifiers [Hills et al.(13)hills, Lines, Baranauskas, Mapp, and Bagnall]. Exhaustive search is expensive, speed-ups were proposed [Mueen et al.(11)mueen, Keogh, and Young, Rakthanmanon and Keogh(13)]. SIGKDD 14, 6/8/14 3 / 1

5 Proposed Method (I) A linear model of predictors M R I K and weights W R K, W R: can be used to estimate the target Ŷ R I : Ŷ i = W + K M i,k W k i {1,..., I } () k=1 The risk of estimating the true target Y { 1, +1} I from approximated target Ŷ R I is the logistic loss L(Y, Ŷ ) R I : ( ) L(Y i, Ŷi) = Y i ln σ(ŷi) (1 Y i ) ln 1 σ(ŷi), i {1,..., I } (3) SIGKDD 14, 6/8/14 4 / 1

6 Proposed Method (II) The objective function F R is a regularized loss function: argmin S,W F(S, W ) = argmin S,W I L(Y i, Ŷ i ) + λ W W (4) i=1 The objective function F can be decomposed into per-instance objectives F i : F i = L(Y i, Ŷi) + λ W I K W k, i {1,..., I } (5) k=1 Mission of This Paper: Learn S, W that minimize F. SIGKDD 14, 6/8/14 5 / 1

7 Differentiable Minimum Function (I) Approximate the true minimum M with the soft-minimum version ˆM: M i,k ˆM i,k = J j=1 D i,k,j e α D i,k,j J j =1 eα D i,k,j, α (, ] i {1,..., I }, k {1,..., K} (6) D i,k,j := 1 L L (T i,j+l 1 S k,l ), l=1 i {1,..., I }, k {1,..., K}, j {1,..., J} (7) SIGKDD 14, 6/8/14 6 / 1

8 Differentiable Minimum Function (II) The smooth approximation of the minimum function, allows only the minimum segment to contribute for α. 5.5 Series Shapelet Euclidean x Soft min, alpha= x 1 3 Soft min, alpha= Figure : Illustration of the soft minimum between a shapelet (green) and all the segments of a series (black) from the FaceFour dataset SIGKDD 14, 6/8/14 7 / 1

9 Learning Algorithm (I) The partial derivative of the per-instance objective function F i with respect to the l-th point of the k-th shapelet S k,l is computed using the chain rule of derivation: F i = L(Y i, Ŷi) Ŷi S k,l Ŷi ˆM i,k J j=1 ˆM i,k D i,k,j D i,k,j S k,l (8) F i = L(Y i, Ŷ i ) Ŷ i + Reg(W ) F i, = L(Y i, Ŷ i ) (9) W k Ŷ i Ŵ k W k W Ŷ i All the components of the partial derivative are computable as follows: L(Y i, Ŷ i ) Ŷ i ( )) = Y i σ (Ŷi, ( ˆM e α D i,k,j 1 + α i,k = D i,k,j Ŷ i ˆM i,k = W k, (D i,k,j ˆM i,k )) J j =1 eα D i,k,j Ŷ i Ŵ k = M i,k,, D i,k,j S k,l = L Reg(W ) W k = λ W I ( Sk,l T i,j+l 1 ) W k (1) SIGKDD 14, 6/8/14 8 / 1 (11)

10 Learning Algorithm (II) Require: T R I Q, Number of Shapelets K, Length of a shapelet L, Regularization λ W, Learning Rate η, Number of iterations: maxiter Ensure: Shapelets S R K L, Classification weights W R K, Bias W R 1: for iteration=1,..., maxiter do : for i = 1,..., I do 3: W W η F i W 4: for k = 1,..., K do 5: W k W k η F i W k 6: for L = 1,..., L do 7: S k,l S k,l η F i S k,l 8: return S, W, W SIGKDD 14, 6/8/14 9 / 1

11 Illustration of Learning a) Iteration b) Iteration 4 c) Iteration 8.6 Y= Y=1 W.7 Y= Y=1 W.8 Y= Y=1 W M M M M 1 S 1 (It. ) S (It. ) M 1 S 1 (It. 4) S (It. 4) M 1 S 1 (It. 8) S (It. 8) Time Time Time Time Time Time Figure 3: Learning Two Shapelets on the Gun-Point Dataset, (L = 4, η =.1, λ W =.1, α = 1 ) SIGKDD 14, 6/8/14 1 / 1

12 Advantage over Feature Ranking 1. Discover hidden/latent shapelets. Interactions of Variables M 1 M Pos. Class Neg. Class M M 1 Figure 4: Interactions among Shapelets Enable Individually Unsuccessful Shapelets (left plots) to Excel in Cooperation (right plot) SIGKDD 14, 6/8/14 11 / 1

13 Experimental Setup Our method, denoted LS, is compared against 13 baselines using 8 time-series dataset: Baselines: Quality Criteria: Information gain (IG), Kruskall-Wallis (KW), F-Stats (FT), Mood s Median Criterion (MM); Using shapelet-transformed data: Nearest Neighbors (1NN), Naive Bayes (NB), C4.5 tree (C4), Bayesian Networks (BN), Random Forest (RA), Rotation Forest (RO), Support Vector Machines (SV); Other Related: Fast Shapelets (FS), Dynamic Time Warping (DT). Datasets: 8 time-series datasets from the UCR and UEA collections from diverse domains; Provided train/test splits Hyper-parameters: are found using grid-search, by testing over a validation split from the training data SIGKDD 14, 6/8/14 1 / 1

14 Results IG KW FT MM DT C4 NN NB BN RF RO SV FS LS Total Wins LS Wins LS Draws LS Losses Rank Mean Rank C.I Rank S.D W.t. p-val Table 1: Comparative Figures of Classification Accuracies over 8 Time-series Datasets SIGKDD 14, 6/8/14 13 / 1

15 Conclusions 1. Learning shapelets to directly optimize the classification objective improves classification accuracy. Supervised Shapelet Learning more accurate than Exhaustive Shapelet Discovery Shapelets are not restricted to series segments Considers interactions among minimum distances features 3. Results against 13 baselines using 8 datasets validate the claim SIGKDD 14, 6/8/14 14 / 1

16 References J. Hills, J. Lines, E. Baranauskas, J. Mapp, and A. Bagnall. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery, 13. J. Lines, L. Davis, J. Hills, and A. Bagnall. A shapelet transform for time series classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1. A. Mueen, E. Keogh, and N. Young. Logical-shapelets: an expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 11. T. Rakthanmanon and E. Keogh. Fast shapelets: A scalable algorithm for discovering time series shapelets. Proceedings of the 13th SIAM International Conference on Data Mining, 13. L. Ye and E. Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 9. SIGKDD 14, 6/8/14 15 / 1

17 Back-up Slide: Dependence on Initialization 1 a) Train Loss 4 1 b) Train Error S 16: S 16: S 1: S 1:15. Figure 5: Sensitivity of Shapelet Initialization, Gun-Point dataset, Parameters: L = 3, η =.1, λ W =.1, Iterations= 3, α = 1 We used K-Means centroids as initial shapelets. SIGKDD 14, 6/8/14 16 / 1

Recent advances in Time Series Classification

Distance Shapelet BoW Kernels CCL Recent advances in Time Series Classification Simon Malinowski, LinkMedia Research Team Classification day #3 S. Malinowski Time Series Classification 21/06/17 1 / 55