
The Caterpillar-SSA approach: automatic trend extraction and other applications

Theodore Alexandrov, theo@pdmi.ras.ru
St. Petersburg State University, Russia
Bremen University, 28 Feb 2006
AutoSSA: http://www.pdmi.ras.ru/~theo/autossa/

History and present

Origins of the Caterpillar-SSA approach:
- Singular System Analysis (Broomhead): dynamical systems, the method of delays for the analysis of attractors [mid-80s]
- Singular Spectrum Analysis (Vautard, Ghil, Fraedrich): geophysics/meteorology, signal/noise enhancement, signal detection in red noise (Monte Carlo SSA) [90s]
- Caterpillar (Danilov, Zhigljavsky, Solntsev, Nekrutkin, Golyandina): Principal Component Analysis for time series [end of the 90s]

Present:
- Automation: papers published, see http://www.pdmi.ras.ru/~theo/autossa/ (Alexandrov, Golyandina)
- Change-point detection (Golyandina, Nekrutkin, Zhigljavsky)
- Missing observations: a paper published, software at www.gistatgroup.com (Golyandina, Osipov)
- 2-channel SSA: a paper published, see www.gistatgroup.com (Golyandina, Stepanov)
- Some generalizations

Future: 2D SSA, online Caterpillar-SSA, ...

Tasks and advantages

Caterpillar-SSA kernel: a theoretical framework whose most important concepts are:
- time series of finite rank (= order of the Linear Recurrent Formula, LRF),
- separability (the possibility to separate/extract additive components).

General tasks:
- Extraction of additive components (for example trend, harmonics, exponentially modulated harmonics)
- Smoothing (a self-adaptive linear filter with small L)
- Automatic calculation of the LRF for a time series of finite rank => continuation of an extracted additive component => forecast of an extracted additive component
- Change-point detection

Advantages:
- Model-free
- Works with non-stationary time series (the constraints will be described)
- Suitable for short time series, robust to the noise model, etc.

Caterpillar-SSA basic algorithm

Decomposes a time series into a sum of additive components, F = F^(1) + ... + F^(m), and provides information about each component.

Algorithm:
1. Trajectory matrix construction: F = (f_0, ..., f_{N-1}) is mapped to X ∈ R^{L×K}, where L is the window length (a parameter) and K = N − L + 1:

   X = ( f_0      f_1   ...  f_{K-1}
         f_1      f_2   ...  f_K
         ...      ...   ...  ...
         f_{L-1}  f_L   ...  f_{N-1} )

2. Singular Value Decomposition (SVD): X = Σ_j X_j, X_j = √λ_j · U_j V_j^T, where λ_j is an eigenvalue of XX^T, U_j the corresponding eigenvector of XX^T, V_j an eigenvector of X^T X, and V_j = X^T U_j / √λ_j.

3. Grouping of SVD components: {1, ..., d} = ∪_k I_k, X^(k) = Σ_{j ∈ I_k} X_j.

4. Reconstruction by diagonal averaging: each X^(k) is mapped back to a series F̃^(k), so that F = F̃^(1) + ... + F̃^(m).

Does there exist an SVD grouping that yields the desired additive component, and how do we group the SVD components?
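To make the four steps concrete, here is a minimal NumPy sketch of the basic algorithm (an illustration, not the AutoSSA implementation). The grouping indices in the usage lines are illustrative assumptions; in practice they are chosen by the identification methods discussed on the following slides.

```python
import numpy as np

def ssa_decompose(f, L, groups):
    """Basic Caterpillar-SSA: embed, SVD, group, diagonally average.

    f      -- 1-D array, the series (f_0, ..., f_{N-1})
    L      -- window length, 2 <= L <= N - 1
    groups -- list of index lists I_k (0-based SVD component indices)
    Returns one reconstructed additive component per group.
    """
    f = np.asarray(f, dtype=float)
    N = len(f)
    K = N - L + 1
    # Step 1: trajectory (Hankel) matrix, X[i, j] = f[i + j], shape L x K
    X = np.column_stack([f[j:j + L] for j in range(K)])
    # Step 2: SVD, X = sum_j sqrt(lambda_j) * U_j V_j^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for I in groups:
        # Step 3: grouping -- sum the elementary matrices X_j of the group
        Xk = sum(s[j] * np.outer(U[:, j], Vt[j]) for j in I)
        # Step 4: diagonal averaging (Hankelization) back to a series
        g = np.zeros(N)
        cnt = np.zeros(N)
        for i in range(L):
            for j in range(K):
                g[i + j] += Xk[i, j]
                cnt[i + j] += 1
        components.append(g / cnt)
    return components

# Toy usage: exponential trend (rank 1) plus a harmonic (rank 2)
n = np.arange(240)
series = np.exp(0.01 * n) + np.cos(2 * np.pi * n / 12)
trend, seasonal = ssa_decompose(series, L=120, groups=[[0], [1, 2]])
```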

Example: trend and seasonality extraction

Traffic fatalities, Ontario, monthly, 1960-1974 (Abraham, Ledolter. Statistical Methods for Forecasting, 1983). N = 180, L = 60.

[Slides show the series with the extracted trend and the extracted seasonality.]

Trend SVD components: 1, 4, 5.
Seasonality SVD components with estimated periods: 2-3 (T = 12), 6-7 (T = 6), 8 (T = 2), 11-12 (T = 4), 13-14 (T = 2.4).

Finite rank time series

We said "model-free", but the area of applicability is constrained to the span of products of exponentials, cosines and polynomials (exp · cos · P_n).

Important concepts:
- L^(L) = span(X_1, ..., X_K), the trajectory space of F, where X_i = (f_{i-1}, ..., f_{i+L-2})^T.
- F is a time series of (finite) rank d (rank(F) = d) if dim L^(L) = d for all sufficiently large L.
- Rank = number of SVD components with λ_j ≠ 0 = order of the minimal LRF: rank_L(F) = rank X.
- For an infinite series F = (..., f_{-1}, f_0, f_1, ...): f_{i+d} = Σ_{k=1}^d a_k f_{i+d-k} with a_d ≠ 0 if and only if rank(F) = d.

Examples of finite rank time series:
Exponentially modulated (e-m) harmonic F: f_n = A e^{αn} cos(2πωn + φ):
- e-m harmonic (0 < ω < 1/2): rank = 2
- e-m saw (ω = 1/2): rank = 1
- exponential series (ω = 0): rank = 1
- pure harmonic (α = 0): rank = 2
Polynomial F: f_n = Σ_{k=0}^m a_k n^k, a_m ≠ 0: rank = m + 1
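These rank statements can be checked numerically: the rank of a finite-rank series equals the number of non-negligible singular values of its trajectory matrix. A small sketch (the tolerance tol is a numerical knob, not part of the theory):

```python
import numpy as np

def trajectory_rank(f, L, tol=1e-8):
    """Numerical rank of the L-trajectory matrix of series f."""
    f = np.asarray(f, dtype=float)
    K = len(f) - L + 1
    X = np.column_stack([f[j:j + L] for j in range(K)])
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

n = np.arange(100)
print(trajectory_rank(np.exp(0.05 * n), L=50))                          # 1: exponential
print(trajectory_rank(np.cos(2 * np.pi * 0.1 * n), L=50))               # 2: harmonic
print(trajectory_rank(np.exp(0.02 * n) * np.cos(2 * np.pi * 0.1 * n), L=50))  # 2: e-m harmonic
print(trajectory_rank(1.0 + 2.0 * n + 3.0 * n ** 2, L=50))              # 3: quadratic
```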

Separability

F = F^(1) + F^(2), window length L, trajectory matrices X = X^(1) + X^(2), trajectory spaces L^(L,1), L^(L,2).

F^(1) and F^(2) are L-separable if L^(L,1) ⊥ L^(L,2) and L^(K,1) ⊥ L^(K,2).

If F^(1) and F^(2) are separable, then the SVD components of X can be grouped so that the first group corresponds to X^(1) and the second to X^(2). That is, separability (separation of the trajectory spaces) enables separation of the additive components.

In reality exact separability is rare, so one considers:
- approximate separability (approximate orthogonality of the trajectory spaces),
- asymptotic separability (as L, N → ∞).

Examples: separability under some conditions (each cell: strict / asymptotic):

           const   cos     exp     exp·cos  Pn
  const    -/-     +/+     -/+     -/+      -/-
  cos      +/+     +/+     -/+     -/+      -/+
  exp      -/+     -/+     -/+     +/+      -/+
  exp·cos  -/+     -/+     +/+     +/+      -/+
  Pn       -/-     -/+     -/+     -/+      -/-

A signal is asymptotically separable from noise; a periodicity is asymptotically separable from a trend.

Separability conditions (and the rate of convergence) yield rules for choosing L (this problem had no solution before).
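The slide does not name a numerical diagnostic, but a standard SSA tool for checking approximate separability is the weighted (w-)correlation between components; here is a sketch, assuming the usual weights that count how often each entry appears in the trajectory matrix:

```python
import numpy as np

def w_correlation(f1, f2, L):
    """Weighted (w-)correlation of two series of common length N.
    The weight w_i counts how many times f_i occurs in the
    L-trajectory matrix; values near 0 suggest approximate separability."""
    N = len(f1)
    K = N - L + 1
    i = np.arange(N)
    w = np.minimum.reduce([i + 1, np.full(N, L), np.full(N, K), N - i])
    inner = lambda a, b: float(np.sum(w * a * b))
    return inner(f1, f2) / np.sqrt(inner(f1, f1) * inner(f2, f2))

n = np.arange(200)
print(w_correlation(np.exp(0.01 * n), np.cos(2 * np.pi * n / 12), L=96))
# small magnitude: the exponential trend and the harmonic are approximately separable
```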

General trend extraction

Trend: a slowly varying deterministic additive component. Examples of parametric trends: exponential, polynomial, a harmonic with large period T (T > N/2).

How to identify trend SVD components:

Eigenvalues: λ_j reflects the contribution of F̃^(j) to the form of F (F̃^(j) is reconstructed from √λ_j U_j V_j^T). If the trend is large, its SVD components come first.

Eigenvectors U_j = (u_1^(j), ..., u_L^(j))^T: for slowly varying series, the eigenvectors inherit the form of the series:

  f_n                      u_k^(j)
  e^{αn}                   e^{αk}
  Σ_m a_m n^m              Σ_m b_m k^m
  e^{αn} cos(2πωn + φ)     e^{αk} cos(2πωk + ψ)

Hence trend SVD components have slowly varying eigenvectors.
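A quick numerical illustration of the eigenvector-form table: for an exponential series the leading eigenvector is itself exponential with the same rate (a sketch with an assumed rate of 0.02):

```python
import numpy as np

# For f_n = e^{0.02 n}, the single leading eigenvector of the
# trajectory matrix is proportional to e^{0.02 k}.
n = np.arange(150)
f = np.exp(0.02 * n)
L = 60
K = len(f) - L + 1
X = np.column_stack([f[j:j + L] for j in range(K)])
U = np.linalg.svd(X, full_matrices=False)[0]
u = U[:, 0] * np.sign(U[0, 0])            # fix the sign convention
ratio = u / np.exp(0.02 * np.arange(L))   # constant ratio => same form
print(ratio.std() / ratio.mean())         # ~0: u_k is proportional to e^{0.02 k}
```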

Continuation and forecast

Continuation. F = (f_0, ..., f_{N-1}), rank(F) = d < L; then, typically, F is governed by an LRF of order d. The main variant of continuation is recurrent continuation using the LRF. There is a unique minimal LRF (of order d) and many LRFs of order > d.

The Caterpillar-SSA way: from an orthogonal basis of L^(L) (e.g. the eigenvectors) one obtains an LRF of order L − 1 automatically; deflation to the minimal LRF (by considering the characteristic polynomial of the LRF) can also be automated.

Continuation of an additive component. F = F^(1) + F^(2), rank(F^(1)) = d_1 < L. If F^(1) and F^(2) are separable, we can continue them separately.

Forecast (approximate). F = F^(1) + F^(2) with approximate separability (for example signal + noise, or a slightly distorted time series): approximate F^(1) by a d-dimensional trajectory space, then approximate by an LRF.
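Here is a sketch of recurrent continuation, assuming the first r eigentriples span the trajectory space of the component to be continued and that the verticality coefficient ν² < 1:

```python
import numpy as np

def ssa_recurrent_forecast(f, L, r, steps):
    """Recurrent SSA forecasting: derive LRF coefficients of order L-1
    from the leading r eigenvectors and iterate the LRF forward."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    K = N - L + 1
    X = np.column_stack([f[j:j + L] for j in range(K)])
    U = np.linalg.svd(X, full_matrices=False)[0][:, :r]
    pi = U[-1, :]                       # last coordinates of the eigenvectors
    nu2 = float(pi @ pi)                # verticality coefficient, must be < 1
    R = (U[:-1, :] @ pi) / (1.0 - nu2)  # next value = R . (last L-1 values)
    g = list(f)
    for _ in range(steps):
        g.append(float(R @ np.asarray(g[-(L - 1):])))
    return np.asarray(g[N:])

# Usage: continue a rank-3 series (exponential trend + harmonic) 12 steps ahead
n = np.arange(200)
series = np.exp(0.01 * n) + np.cos(2 * np.pi * n / 12)
print(ssa_recurrent_forecast(series, L=96, r=3, steps=12))
```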

Change-point detection in brief

Problem statement. F is homogeneous if it is governed by some LRF of order d ≪ N. Assume that at some time it stops following the original LRF and, after a certain transition period, is again governed by an LRF. We want to detect such heterogeneities (change-points) a posteriori and somehow compare the homogeneous parts before and after the change.

Solution. If F is governed by an LRF then, for sufficiently large N and L, the trajectory space L^(L) = span(X_1, ..., X_K) does not depend on N, and the minimal LRF corresponds to L^(L). Heterogeneities can therefore be described in terms of the lagged vectors: perturbations force the lagged vectors to leave the space L^(L). We can measure this distance and detect a change-point.
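A sketch of the idea: estimate the trajectory subspace on an assumed homogeneous base interval, then track how far each lagged vector departs from it (the normalized heterogeneity index of the Caterpillar-SSA literature differs in details):

```python
import numpy as np

def heterogeneity(f, L, r, base_len):
    """Squared relative distance of each lagged vector from the
    r-dimensional trajectory subspace estimated on f[:base_len];
    spikes in the output indicate a change-point."""
    f = np.asarray(f, dtype=float)
    base = f[:base_len]
    Kb = base_len - L + 1
    Xb = np.column_stack([base[j:j + L] for j in range(Kb)])
    U = np.linalg.svd(Xb, full_matrices=False)[0][:, :r]
    K = len(f) - L + 1
    d = np.empty(K)
    for j in range(K):
        x = f[j:j + L]
        resid = x - U @ (U.T @ x)   # part of x outside the base subspace
        d[j] = (resid @ resid) / (x @ x)
    return d

# A harmonic whose frequency changes at n = 150
n = np.arange(300)
f = np.where(n < 150, np.sin(2 * np.pi * n / 20), np.sin(2 * np.pi * n / 11))
d = heterogeneity(f, L=40, r=2, base_len=120)
print(round(float(d[:100].max()), 4), round(float(d[150:].max()), 4))  # ~0, then large
```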

Automation of time series processing

Problem statement: automate the manual processing of a large set of similar time series (a family). Manual processing sets the benchmark quality; to approach it automatically we can use the theory stated above.

Why a family? Processing several randomly taken time series from the family can be used for:
- testing whether the auto-procedure works on such series at all (a necessary condition),
- finding proper parameters of the auto-procedure for the whole family (performance optimization).

Bootstrap test of the procedure (we must know the noise/residual model, or an approximation of it):
1. Extract the trend F̃ from F manually and estimate the parameters of the noise.
2. Simulate noise with these parameters and generate surrogate data G = F̃ + noise.
3. Extract the trend G̃ from the surrogate data automatically.
4. MSE(F̃, G̃) is a measure of the procedure's quality.
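A sketch of the bootstrap test, assuming i.i.d. Gaussian noise as the simple first approximation; extract_trend is a placeholder for whichever extraction procedure is being tested:

```python
import numpy as np

def bootstrap_quality(f, extract_trend, noise_sigma, n_rep=100, seed=0):
    """Bootstrap test of a trend-extraction procedure: rebuild surrogate
    series as (extracted trend + simulated noise) and measure how well
    the procedure recovers that trend."""
    rng = np.random.default_rng(seed)
    f = np.asarray(f, dtype=float)
    trend = extract_trend(f)            # reference trend (manual-quality)
    mses = []
    for _ in range(n_rep):
        surrogate = trend + rng.normal(0.0, noise_sigma, len(f))
        mses.append(np.mean((extract_trend(surrogate) - trend) ** 2))
    return float(np.mean(mses))
```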

Auto-method of trend extraction

Eigenvectors of trend SVD components have a slowly varying form. Search all eigenvectors; assume we are processing U = (u_1, ..., u_L)^T with Fourier decomposition

  u_n = c_0 + Σ_{1 ≤ k ≤ (L-1)/2} ( c_k cos(2πnk/L) + s_k sin(2πnk/L) ) + (−1)^n c_{L/2},

where the last term is present only for even L. The periodogram Π_U^L(ω), ω ∈ {k/L}, reflects the contribution of the harmonic with frequency ω to the Fourier decomposition of U:

  Π_U^L(k/L) = (L/2) · { 2c_0²          for k = 0,
                         c_k² + s_k²    for 1 ≤ k ≤ (L-1)/2,
                         2c_{L/2}²      for even L and k = L/2. }

Low Frequencies method. Parameter ω_0: the upper boundary of the low-frequency interval. Define the contribution of the low frequencies as

  C(U) = Σ_{0 ≤ ω ≤ ω_0} Π_U(ω) / Σ_{0 ≤ ω ≤ 0.5} Π_U(ω),  ω ∈ {k/L, k ∈ Z}.

If C(U) ≥ C_0, the eigenvector U corresponds to a trend, where C_0 ∈ (0, 1) is the threshold.
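The Low Frequencies method translates directly into code; this sketch computes C(U) from the FFT-based periodogram (halving the power at the endpoint frequencies makes |rfft|² proportional to Π_U):

```python
import numpy as np

def low_freq_contribution(U, omega0):
    """C(U): share of the periodogram of U at frequencies <= omega0.
    np.fft.rfft yields frequencies k/L, k = 0..L//2."""
    L = len(U)
    power = np.abs(np.fft.rfft(U)) ** 2
    power[0] *= 0.5            # endpoint corrections: k = 0 ...
    if L % 2 == 0:
        power[-1] *= 0.5       # ... and, for even L, k = L/2
    freqs = np.arange(len(power)) / L
    return float(power[freqs <= omega0].sum() / power.sum())

def trend_eigenvectors(U_matrix, omega0, C0):
    """Indices j with C(U_j) >= C0: eigenvectors flagged as trend."""
    return [j for j in range(U_matrix.shape[1])
            if low_freq_contribution(U_matrix[:, j], omega0) >= C0]
```

For a trend eigenvector nearly all of the power sits below ω_0, so C(U) is close to 1 and the threshold test is easy to pass; for a harmonic eigenvector with frequency above ω_0, C(U) is close to 0.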

Choice of ω_0

Examine the periodogram of the original time series: if a periodicity with period T exists, then ω_0 < 1/T (for monthly seasonality, 1/T = 1/12 ≈ 0.083).

Examples:
- Exp+cos, f_n = e^{0.01n} + cos(2πn/12), and its periodogram: 0.03 < ω_0 < 0.08.
- Pn+cos, f_n = (n−10)(n−40)(n−60)(n−95) + cos(2πn/12), and its periodogram: ω_0 ≈ 0.07 < 0.08.
- Traffat (the traffic fatalities series), its periodogram, and the periodogram of the normalized series: 0.03 < ω_0 < 0.08.

Choice of C_0: a measure of the quality of trend extraction

If we have a measure R of the quality of trend extraction, then C_opt = argmin_{C_0 ∈ [0,1]} R(C_0).

Measure. The natural measure of quality is MSE(F_trend, F̃(C_0)), where F_trend is the real trend and F̃(C_0) is the trend extracted with threshold C_0; but it requires the unknown F_trend.

We propose R(C_0), which compares the low-frequency contribution C of the residual F − F̃(C_0) with that of F itself. R(C_0) is consistent with MSE(F_trend, F̃(C_0)) in the sense that it behaves like the MSE, so by means of R(C_0) we can estimate C_opt = argmin_{C_0} MSE(F_trend, F̃(C_0)).
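A sketch of the threshold search, assuming R(C_0) is taken as the low-frequency share left in the residual F − F̃(C_0); extract_trend_at is a hypothetical hook into the auto-procedure:

```python
import numpy as np

def residual_lf_share(f, trend, omega0):
    """Low-frequency share of the residual f - trend (cf. C(U) above)."""
    r = np.asarray(f, float) - np.asarray(trend, float)
    power = np.abs(np.fft.rfft(r)) ** 2
    freqs = np.arange(len(power)) / len(r)
    return float(power[freqs <= omega0].sum() / power.sum())

def choose_C0(f, extract_trend_at, omega0, grid=np.linspace(0.5, 0.99, 50)):
    """Scan thresholds and pick the one minimizing the score;
    extract_trend_at(f, C0) is a placeholder for the auto-procedure."""
    scores = [residual_lf_share(f, extract_trend_at(f, C0), omega0)
              for C0 in grid]
    return float(grid[int(np.argmin(scores))])
```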

Examples of C_opt estimation

Model example, Pn+noise: f_n = (n−10)(n−70)(n−160)²(n−290)²/10^11 + Gaussian noise N(0, 25); N = 329, L = 160 (≈ N/2), ω_0 = 0.07. As C_0 decreases from 1 to 0.9, the graphics show stepwise identification of the trend SVD components and considerable changes of MSE(F_trend, F̃(C_0)); C_opt is just below 0.9 (≈ 0.9).

Real-life example, Massachusetts unemployment (thousands, monthly, from economagic.com): N = 331, L = 156 (≈ N/2), ω_0 = 0.05 < 1/12 = 0.08(3). As C_0 decreases from 1 to 0.75, the graphics show stepwise identification of the trend SVD components and considerable changes of MSE(F_trend, F̃(C_0)); C_opt is just below 0.75 (≈ 0.75).

Final slide: to sum up

I. We have a family of similar time series ℱ = {F}.

II. Take, randomly or otherwise, a test subset T of several characteristic time series and, on these series:
1. Extract the trends manually.
2. Examine the periodograms of the time series and choose ω_0.
3. Check whether the proposed auto-procedure works on such time series at all: a bootstrap comparison of how close trends automatically extracted from surrogate data are to the manually extracted trends. If they are sufficiently close, the auto-procedure is accepted. This requires knowledge of the noise model, but a simple one can serve as a first approximation.
4. Estimate C_opt^(F) for each time series and take the minimum over the subset, C_opt^(ℱ) = min_{F ∈ T} C_opt^(F), as the optimal C_0 for the family ℱ. (This optimization step can be skipped; then, while processing ℱ, C_opt has to be estimated for each time series.)

III. Process all time series from the family ℱ.