Time Series Outlier Detection

Time Series Outlier Detection
Tingyi Zhu
July 28, 2016

Outline
- Time Series Basics
- Outliers Detection in Single Time Series
- Outlier Series Detection from Multiple Time Series
- Demos

Time Series Basics

First-order Autoregression
A model denoted AR(1), in which the value of $X$ at time $t$ is a linear function of the value of $X$ at time $t-1$:
$$X_t = \phi X_{t-1} + \varepsilon_t \qquad (1)$$
Assumptions:
- $\varepsilon_t$ is the stochastic term, i.i.d. $N(0, \sigma^2)$.
- $\varepsilon_t$ is independent of the past values $X_{t-1}, X_{t-2}, \ldots$
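The AR(1) recursion can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the burn-in length and the parameter values are assumptions made for the example, and `sigma` is the standard deviation of the noise.

```python
import random

def ar1_from_noise(phi, eps, x0=0.0):
    """Apply X_t = phi * X_{t-1} + eps_t to a given noise sequence."""
    path, x = [], x0
    for e in eps:
        x = phi * x + e
        path.append(x)
    return path

def simulate_ar1(phi, sigma, n, seed=0, burn_in=200):
    """Simulate n values of an AR(1) process with Gaussian noise.

    The first `burn_in` values are discarded so that the arbitrary
    starting value 0 has little influence on the returned path.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + burn_in)]
    return ar1_from_noise(phi, eps)[burn_in:]

if __name__ == "__main__":
    xs = simulate_ar1(phi=0.5, sigma=1.0, n=5000)
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    # For |phi| < 1 the stationary variance is sigma^2 / (1 - phi^2).
    print("sample variance:", round(var, 3), "theory:", round(1 / (1 - 0.25), 3))
```

Feeding a unit impulse as noise shows the geometric decay of a shock: each step multiplies the previous value by `phi`.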

General Autoregressive Model
AR(p):
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t = \sum_{i=1}^{p} \phi_i B^i X_t + \varepsilon_t,$$
where we use the backshift operator $B$ ($B X_t = X_{t-1}$, $B^k X_t = X_{t-k}$).
Alternative notation: $\phi(B) X_t = \varepsilon_t$, where $\phi(B)$ is a polynomial in $B$:
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p = 1 - \sum_{i=1}^{p} \phi_i B^i$$

Moving Average
- Another approach for modeling univariate time series
- $X_t$ depends linearly on its own current and previous stochastic terms
MA(1): $X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$
MA(q): $X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$

- $\theta_1, \ldots, \theta_q$: parameters of the MA model
- $\varepsilon_t, \ldots, \varepsilon_{t-q}$: stochastic terms
Using the backshift operator $B$, the model simplifies to
$$X_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\,\varepsilon_t = \Big(1 + \sum_{i=1}^{q} \theta_i B^i\Big)\varepsilon_t = \theta(B)\,\varepsilon_t$$

ARMA Model
A model consisting of both an autoregressive (AR) part and a moving average (MA) part:
$$X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} \qquad (2)$$
referred to as the ARMA(p,q) model.
- p: the order of the autoregressive part
- q: the order of the moving average part
More concisely, using the backshift operator $B$, (2) becomes:
$$\phi(B) X_t = \theta(B)\,\varepsilon_t$$
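Equation (2) translates directly into a recursion. The sketch below is an illustration only: the function name is made up for this example, and all pre-sample values of $X$ and $\varepsilon$ are assumed to be zero.

```python
def arma_from_noise(phi, theta, eps):
    """Apply the ARMA(p, q) recursion to a given noise sequence.

    phi   : [phi_1, ..., phi_p]   AR coefficients (may be empty)
    theta : [theta_1, ..., theta_q] MA coefficients (may be empty)
    eps   : noise values eps_0, eps_1, ...
    Values before the start of the sample are treated as 0.
    """
    x = []
    for t in range(len(eps)):
        ar = sum(phi[i] * x[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar + eps[t] + ma)
    return x

if __name__ == "__main__":
    # Response of an ARMA(1,1) with phi_1 = 0.5, theta_1 = 0.4 to a unit shock.
    print(arma_from_noise([0.5], [0.4], [1.0, 0.0, 0.0, 0.0]))
```

Setting `theta=[]` gives a pure AR(p) model and `phi=[]` a pure MA(q) model, which mirrors how $\phi(B)X_t = \theta(B)\varepsilon_t$ contains both as special cases.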

Stationarity of Time Series
In short, a time series is stationary if its statistical properties are all constant over time. To mention some properties:
- Mean: $E[X_t] = E[X_s]$ for any $t, s \in \mathbb{Z}$,
- Variance: $\mathrm{Var}[X_t] = \mathrm{Var}[X_s]$ for any $t, s \in \mathbb{Z}$,
- Joint distribution: $\mathrm{Cov}(X_t, X_{t+1}) = \mathrm{Cov}(X_s, X_{s+1})$ for any $t, s \in \mathbb{Z}$.

Requirements for a Stationary Time Series
- AR(1), $X_t = \phi X_{t-1} + \varepsilon_t$: $|\phi| < 1$.
- AR(p), $\phi(B) X_t = \varepsilon_t$: all the roots of $\phi(z) = 0$ are outside the unit circle.
- MA models are always stationary.
- ARMA(p,q), $\phi(B) X_t = \theta(B)\varepsilon_t$: all the roots of $\phi(z) = 0$ are outside the unit circle.
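The root condition can be checked numerically. The sketch below is restricted to AR(1) and AR(2), where the roots of $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ have a closed form; the function names are hypothetical, and a general AR(p) would need a polynomial root finder instead.

```python
import cmath

def ar_roots(phi):
    """Roots of phi(z) = 1 - phi_1 z - phi_2 z^2 for p = 1 or p = 2.

    Assumes the highest-order coefficient is non-zero.
    """
    if len(phi) == 1:
        # 1 - phi_1 z = 0  =>  z = 1 / phi_1
        return [complex(1.0 / phi[0])]
    if len(phi) == 2:
        p1, p2 = phi
        # 1 - p1 z - p2 z^2 = 0  <=>  p2 z^2 + p1 z - 1 = 0
        disc = cmath.sqrt(p1 * p1 + 4.0 * p2)
        return [(-p1 + disc) / (2.0 * p2), (-p1 - disc) / (2.0 * p2)]
    raise ValueError("this sketch only handles AR(1) and AR(2)")

def is_stationary(phi):
    """True if every root of phi(z) lies strictly outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar_roots(phi))

if __name__ == "__main__":
    for coeffs in ([0.5], [1.0], [0.5, 0.3], [0.5, 0.6], [1.0, -0.5]):
        print(coeffs, is_stationary(coeffs))
```

For AR(1) the single root is $1/\phi$, so the root condition reduces to $|\phi| < 1$, the first requirement in the list.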

Non-stationary time series
- Trend effect
- Seasonal effect
Figure: Monthly totals of international airline passengers, 1949 to 1960 (AirPassengers plotted against time).

Time Series Decomposition
Think of a more general time series formulation including both trend and seasonal effects:
$$X_t = T_t + S_t + E_t \qquad (3)$$
- $X_t$ is the data point at time $t$
- $T_t$ is the trend component at time $t$
- $S_t$ is the seasonal component at time $t$
- $E_t$ is the remainder component at time $t$ (containing AR and MA terms)

Series with Trend, examples:
Assuming no seasonal effect, i.e. $S_t = 0$:
- Linear trend: $X_t = 2t + 0.5 X_{t-1} + \varepsilon_t$
- Quadratic trend: $X_t = 2t + t^2 + 0.5 X_{t-1} + \varepsilon_t$
Goal: remove the trend, to transform the series to be stationary.
Solution: lag-1 differencing.

Differencing and Trend
Define the lag-1 difference operator $\nabla$,
$$\nabla X_t = X_t - X_{t-1} = (1 - B) X_t,$$
where $B$ is the backshift operator.
If $X_t = \beta_0 + \beta_1 t + E_t$, then
$$\nabla X_t = \beta_1 + \nabla E_t.$$
If $X_t = \sum_{i=0}^{k} \beta_i t^i + E_t$, then
$$\nabla^k X_t = (1 - B)^k X_t = k!\,\beta_k + \nabla^k E_t.$$
We call $\nabla^k$ the $k$th-order lag-1 difference operator.
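The two identities above are easy to verify on noise-free polynomial trends. The following is a small sketch with an illustrative helper name; the coefficients are arbitrary example values.

```python
def difference(x, order=1):
    """Apply the lag-1 difference operator (1 - B) `order` times."""
    for _ in range(order):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

if __name__ == "__main__":
    # Linear trend beta_0 + beta_1 t with beta_1 = 2: one difference leaves beta_1.
    linear = [3 + 2 * t for t in range(6)]
    print(difference(linear))               # constant series equal to beta_1

    # Quadratic trend with beta_2 = 3: two differences leave 2! * beta_2 = 6.
    quadratic = [1 + 2 * t + 3 * t * t for t in range(6)]
    print(difference(quadratic, order=2))   # constant series equal to 2! * beta_2
```

Each application of the operator shortens the series by one observation, which is why `d` differences cost `d` data points.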

Lag-1 Differencing
[Figure: two panels covering Jan 04 2016 to Jul 01 2016, titled "S&P 500 Quote Year To Date" and "S&P 500 YTD Lag 1 Differencing".]

Series with Seasonal Effect, example:
For quarterly data, with possible seasonal (quarterly) effects, we can define indicator functions $S_j$. For $j = 1, 2, 3, 4$,
$$S_j = \begin{cases} 1 & \text{if the observation is in quarter } j \text{ of a year,} \\ 0 & \text{otherwise.} \end{cases}$$
A model with seasonal effects could be written as
$$X_t = \alpha_1 S_1 + \alpha_2 S_2 + \alpha_3 S_3 + \alpha_4 S_4 + \varepsilon_t$$
Goal: remove the seasonal effects.
Solution: lag-s differencing, where $s$ is the number of seasons.

Differencing and Seasonal Effects
Define the lag-s difference operator $\nabla_s$,
$$\nabla_s X_t = X_t - X_{t-s} = (1 - B^s) X_t,$$
where $B$ is the backshift operator.
If $X_t = T_t + S_t + E_t$, and $S_t$ has period $s$ (i.e. $S_t = S_{t-s}$ for all $t$), then
$$\nabla_s X_t = (1 - B^s) X_t = T_t - T_{t-s} + \nabla_s E_t.$$
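A noise-free quarterly example makes the cancellation of $S_t$ visible. The seasonal pattern and the trend slope below are made-up values chosen for illustration.

```python
def seasonal_difference(x, s):
    """Lag-s difference: X_t - X_{t-s} for t = s, ..., len(x) - 1."""
    return [x[t] - x[t - s] for t in range(s, len(x))]

def quarterly_example(n_years=3):
    """Linear trend T_t = 2t plus a period-4 seasonal component (no noise)."""
    pattern = [5, 1, 3, 7]  # hypothetical quarterly effects
    return [2 * t + pattern[t % 4] for t in range(4 * n_years)]

if __name__ == "__main__":
    x = quarterly_example()
    # The seasonal part cancels; what remains is T_t - T_{t-4} = 2 * 4.
    print(seasonal_difference(x, s=4))
```

With a linear trend the remaining term $T_t - T_{t-s}$ is a constant, so a single lag-s difference removes the seasonal effect and flattens the trend at the same time.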

Non-seasonal ARIMA ($S_t = 0$)
ARIMA stands for Auto-Regressive Integrated Moving Average: ARMA integrated with differencing.
A nonseasonal ARIMA model is classified as ARIMA(p,d,q), where
- p is the order of AR terms,
- d is the number of nonseasonal differences needed for stationarity,
- q is the order of MA terms.

Non-seasonal ARIMA, Cont.
Recall ARMA(p,q):
$$\phi(B) X_t = \theta(B)\varepsilon_t,$$
where $\phi(B)$ and $\theta(B)$ are polynomials in $B$ of order p and q.
Stationarity requirement: all roots of $\phi(z) = 0$ lie outside the unit circle.
ARIMA(p,d,q):
$$\phi(B)(1 - B)^d X_t = \theta(B)\varepsilon_t.$$
$X_t$ is not stationary. Why? For $d \ge 1$ the combined autoregressive polynomial $\phi(z)(1 - z)^d$ has a root at $z = 1$, which is on the unit circle.
$Z_t = (1 - B)^d X_t$ is ARMA(p,q) and is stationary.

Seasonal ARIMA
A seasonal ARIMA model is classified as ARIMA(p, d, q) × (P, D, Q)$_m$, where
- p is the order of AR terms,
- d is the number of nonseasonal differences,
- q is the order of MA terms,
- P is the order of seasonal AR terms,
- D is the number of seasonal differences,
- Q is the order of seasonal MA terms,
- m is the number of seasons.

Example: ARIMA(1, 1, 1) × (1, 1, 1)$_4$
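Written out with the backshift operator, and assuming the usual multiplicative seasonal form together with the MA sign convention $\theta(B) = 1 + \theta_1 B + \cdots$ used earlier, this example would read as follows. The symbols $\Phi_1$ and $\Theta_1$ for the seasonal AR and MA coefficients are notation introduced here for illustration, not taken from the slides.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{4})\,(1 - B)\,(1 - B^{4})\,X_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{4})\,\varepsilon_t
```

Reading the factors from left to right: nonseasonal AR(1), seasonal AR(1), one nonseasonal difference, one seasonal (lag-4) difference, and on the right nonseasonal MA(1) and seasonal MA(1).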

General ARIMA
The ARIMA model can be generalized as follows:
$$\phi(B)\,\alpha(B)\,X_t = \theta(B)\,\varepsilon_t,$$
- $\phi(B)$: autoregressive polynomial, all roots outside the unit circle
- $\alpha(B)$: differencing filter that renders the data stationary, all roots on the unit circle
- $\theta(B)$: moving average polynomial, all roots outside the unit circle (to ensure that $\theta(B)$ is invertible)
Alternatively,
$$X_t = \frac{\theta(B)}{\phi(B)\,\alpha(B)}\,\varepsilon_t.$$

Outliers Detection in Single Time Series

Automatic Detection Procedure
- Described in Chung Chen, Lon-Mu Liu. Joint Estimation of Model Parameters and Outlier Effects in Time Series, JASA, 1993
- Based on the framework of ARIMA models
- R package tsoutliers (Javier López-de-Lacalle, 2014)

Types of Outliers
General representation: $L(B)\,I_t(t_j)$
- $L(B)$: a polynomial in the lag operator $B$
- $I_t(t_j) = 1$ if there is an outlier at time $t = t_j$, and 0 otherwise.
Types of outliers:
- Additive Outliers (AO): $L(B) = 1$;
- Level Shift (LS): $L(B) = \dfrac{1}{1 - B}$;
- Temporary Change (TC): $L(B) = \dfrac{1}{1 - \delta B}$;
- Seasonal Level Shift (SLS): $L(B) = \dfrac{1}{1 - B^s}$;
- Innovational Outliers (IO): $L(B) = \dfrac{\theta(B)}{\phi(B)\,\alpha(B)}$.
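To see what each filter does to a unit pulse, the effect $L(B)\,I_t(t_j)$ can be generated directly by expanding $1/(1-B)$, $1/(1-\delta B)$ and $1/(1-B^s)$ as power series in $B$. This is a sketch with a hypothetical function name; the defaults for `delta` and `s` are illustrative values, time is indexed from 0, and the IO case is left out because it depends on the fitted ARIMA coefficients.

```python
def outlier_effect(kind, n, t_j, delta=0.7, s=4):
    """Return L(B) I_t(t_j) for t = 0, ..., n - 1.

    AO : single spike at t_j
    LS : permanent step from t_j onwards        (1 + B + B^2 + ...)
    TC : spike decaying as delta^k              (1 + delta B + delta^2 B^2 + ...)
    SLS: unit effect at t_j, t_j + s, t_j + 2s  (1 + B^s + B^{2s} + ...)
    """
    effect = [0.0] * n
    for t in range(t_j, n):
        k = t - t_j  # time elapsed since the outlier occurred
        if kind == "AO":
            effect[t] = 1.0 if k == 0 else 0.0
        elif kind == "LS":
            effect[t] = 1.0
        elif kind == "TC":
            effect[t] = delta ** k
        elif kind == "SLS":
            effect[t] = 1.0 if k % s == 0 else 0.0
        else:
            raise ValueError("unknown outlier type: " + kind)
    return effect

if __name__ == "__main__":
    for kind in ("AO", "LS", "TC", "SLS"):
        print(kind, outlier_effect(kind, n=8, t_j=2, delta=0.5, s=2))
```

The printout shows the qualitative difference between the types: AO affects one observation, LS never dies out, and TC sits between the two, with `delta` controlling how quickly the effect fades.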

Formulation
ARIMA model:
$$X_t = \frac{\theta(B)}{\phi(B)\,\alpha(B)}\,\varepsilon_t.$$
Model with outliers at times $t_1, t_2, \ldots, t_m$:
$$X_t = \sum_{j=1}^{m} \omega_j L_j(B)\,I_t(t_j) + \frac{\theta(B)}{\phi(B)\,\alpha(B)}\,\varepsilon_t.$$
- $L_j(B)$ depends on the pattern of the $j$th outlier
- $I_t(t_j) = 1$ if there is an outlier at time $t = t_j$, and 0 otherwise
- $\omega_j$ denotes the magnitude of the $j$th outlier effect

Effect of One Outlier
Assume the time series parameters are known. We examine the effect of one outlier:
$$X_t = \omega L(B)\,I_t(t_1) + \frac{\theta(B)}{\phi(B)\,\alpha(B)}\,\varepsilon_t$$
Define the polynomial $\pi(B)$ as:
$$\pi(B) = \frac{\phi(B)\,\alpha(B)}{\theta(B)} = 1 - \pi_1 B - \pi_2 B^2 - \cdots$$
Contaminated by the outlier, the estimated residual $\hat e_t$ becomes
$$\hat e_t = \pi(B) X_t = \omega\,\pi(B) L(B)\,I_t(t_1) + \varepsilon_t.$$
(Without the outlier, $\pi(B) X_t = \varepsilon_t$.)

For the four types of outliers:
- IO: $\hat e_t = \omega\,I_t(t_1) + \varepsilon_t$,
- AO: $\hat e_t = \omega\,\pi(B)\,I_t(t_1) + \varepsilon_t$,
- LS: $\hat e_t = \omega\,\dfrac{\pi(B)}{1 - B}\,I_t(t_1) + \varepsilon_t$,
- TC: $\hat e_t = \omega\,\dfrac{\pi(B)}{1 - \delta B}\,I_t(t_1) + \varepsilon_t$.
Alternatively,
$$\hat e_t = \omega\,x_{i,t} + \varepsilon_t, \qquad t = t_1, t_1 + 1, \ldots, \quad i = 1, 2, 3, 4,$$
where $x_{i,t} = 0$ for all $i$ and $t < t_1$, $x_{i,t_1} = 1$ for all $i$, and for $k \ge 1$:
$$x_{1,t_1+k} = 0, \qquad x_{2,t_1+k} = -\pi_k, \qquad x_{3,t_1+k} = 1 - \sum_{j=1}^{k} \pi_j, \qquad x_{4,t_1+k} = \delta^k - \sum_{j=1}^{k-1} \delta^{k-j}\pi_j - \pi_k.$$
A simple linear regression!
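The regressors $x_{i,t_1+k}$ are the coefficients of $B^k$ in $\pi(B)L(B)$, so they can be computed from the $\pi$ weights alone. The sketch below assumes the sign convention $\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots$; the function name and the dictionary layout are choices made for this example.

```python
def outlier_regressors(pi, delta, horizon):
    """Regressors x_{i, t1 + k}, k = 0..horizon-1, for IO, AO, LS and TC.

    pi    : [pi_1, pi_2, ...]; weights beyond the list are taken as 0
    delta : decay rate of the temporary change
    """
    def pi_w(j):
        return pi[j - 1] if 1 <= j <= len(pi) else 0.0

    x = {"IO": [1.0], "AO": [1.0], "LS": [1.0], "TC": [1.0]}  # k = 0
    for k in range(1, horizon):
        x["IO"].append(0.0)
        x["AO"].append(-pi_w(k))
        x["LS"].append(1.0 - sum(pi_w(j) for j in range(1, k + 1)))
        x["TC"].append(delta ** k
                       - sum(delta ** (k - j) * pi_w(j) for j in range(1, k))
                       - pi_w(k))
    return x

if __name__ == "__main__":
    # AR(1) with phi = 0.5 has pi(B) = 1 - 0.5 B, i.e. pi_1 = 0.5.
    for kind, values in outlier_regressors([0.5], delta=0.7, horizon=4).items():
        print(kind, values)
```

A useful sanity check: if `delta` equals the AR(1) coefficient, then $\pi(B)/(1-\delta B) = 1$ and the TC regressor collapses to a single spike, the same as the IO regressor.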

Estimate of ω
The least squares estimate of the effect of a single outlier at $t = t_1$ can be expressed as
$$\hat\omega_i(t_1) = \frac{\sum_{t=t_1}^{n} \hat e_t\,x_{i,t}}{\sum_{t=t_1}^{n} x_{i,t}^2}, \qquad i = 1, 2, 3, 4.$$

Test Statistics τ
From regression analysis, we have
$$\frac{\hat\omega - \omega}{\hat\sigma_a}\Big(\sum_{t=t_1}^{n} x_{i,t}^2\Big)^{1/2} \sim N(0, 1),$$
where $\hat\sigma_a$ is the estimate of the residual standard deviation.
To test whether $\omega = 0$, the following statistics are approximately $N(0, 1)$:
$$\tau_i(t_1) = \frac{\hat\omega_i(t_1)}{\hat\sigma_a}\Big(\sum_{t=t_1}^{n} x_{i,t}^2\Big)^{1/2}, \qquad i = 1, 2, 3, 4.$$
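Because the model for the residuals is a regression through the origin, both the estimate of ω and its standardized statistic take only a few lines. This sketch assumes the residuals and the regressor are already aligned from $t_1$ to $n$ and that $\hat\sigma_a$ is supplied by the caller; the function names are illustrative.

```python
import math

def omega_hat(resid, x):
    """No-intercept least squares slope: sum(e_t * x_t) / sum(x_t^2)."""
    sxx = sum(v * v for v in x)
    return sum(e * v for e, v in zip(resid, x)) / sxx

def tau_stat(resid, x, sigma_a):
    """Standardized outlier effect: omega_hat * sqrt(sum x_t^2) / sigma_a."""
    sxx = sum(v * v for v in x)
    return omega_hat(resid, x) * math.sqrt(sxx) / sigma_a

if __name__ == "__main__":
    # Residuals of an AR(1) fit (pi_1 = 0.5) hit by an additive outlier of size 2:
    # the AO regressor is [1, -pi_1, 0, ...].
    resid = [2.0, -1.0, 0.0]
    x_ao = [1.0, -0.5, 0.0]
    print(omega_hat(resid, x_ao), tau_stat(resid, x_ao, sigma_a=1.0))
```

A large absolute value of the statistic relative to a standard normal quantile is evidence against $\omega = 0$ at the candidate time point.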

Procedure in the Presence of Multiple Outliers
In the presence of multiple outliers, recall the model
$$X_t = \sum_{j=1}^{m} \omega_j L_j(B)\,I_t(t_j) + \frac{\theta(B)}{\phi(B)\,\alpha(B)}\,\varepsilon_t.$$
The estimated residual becomes
$$\hat e_t = \sum_{j=1}^{m} \omega_j\,\pi(B) L_j(B)\,I_t(t_j) + \varepsilon_t$$

Stage 1: Joint Estimation of Outlier Effect and Model Parameters
- Fit the series with an ARIMA model (forecast package in R) to obtain initial estimates of the model parameters ($\phi(B)$, $\theta(B)$, $\alpha(B)$).
- Detect outliers one by one, sequentially.
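One pass of the sequential detection step might look like the sketch below. It is a simplified illustration, not the full procedure: it takes the residuals and $\pi$ weights of an already fitted model as given, considers only the IO and AO types, estimates the residual scale with the median absolute deviation, and uses a critical value of 3.0 that is an assumed placeholder. All function names are hypothetical.

```python
import math

def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def mad_sigma(resid):
    """Robust scale estimate: 1.483 * median absolute deviation."""
    med = median(resid)
    return 1.483 * median([abs(e - med) for e in resid])

def scan_once(resid, pi, crit=3.0):
    """Return (time, type, omega_hat, tau) for the largest |tau|, or None.

    Assumes the MAD of the residuals is non-zero.
    """
    n = len(resid)
    sigma = mad_sigma(resid)
    best = None
    for t1 in range(n):
        m = n - t1  # number of residuals from t1 to the end
        candidates = {
            "IO": [1.0] + [0.0] * (m - 1),
            "AO": [1.0] + [-(pi[k - 1] if k <= len(pi) else 0.0)
                           for k in range(1, m)],
        }
        for kind, x in candidates.items():
            sxx = sum(v * v for v in x)
            w = sum(e * v for e, v in zip(resid[t1:], x)) / sxx
            tau = w * math.sqrt(sxx) / sigma
            if best is None or abs(tau) > abs(best[3]):
                best = (t1, kind, w, tau)
    if best is not None and abs(best[3]) > crit:
        return best
    return None

if __name__ == "__main__":
    resid = [0.1 if t % 2 == 0 else -0.1 for t in range(20)]
    resid[10] += 5.0    # additive outlier of size 5 under pi_1 = 0.5 ...
    resid[11] += -2.5   # ... leaves -pi_1 * 5 in the next residual
    print(scan_once(resid, pi=[0.5]))
```

In a sequential scheme, the detected effect would be removed from the residuals and the scan repeated until no statistic exceeds the critical value.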

Stage 2: Initial Parameter Estimation and Outlier Detection

Outlier Series Detection from Multiple Time Series

Detect Anomalous Series
- Goal: efficiently find the least similar time series in a large set
- Motivation: Internet companies monitor their servers (CPU, memory) to find unusual behavior

Detect Anomalous Series
- Described in Rob J Hyndman et al. Large-Scale Unusual Time Series Detection, ICDM, 2015
- Approach: extract features from the time series, then apply PCA
- R package anomalous
- Tested on real data from the YAHOO email server: 80% accuracy, compared to 40% from previous methods

Step 1: Extract Features from Time Series
15 features are selected, each capturing global information about the time series.
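As an illustration of what a global feature is, the sketch below reduces a whole series to three numbers: mean, variance and lag-1 autocorrelation. These three were picked for simplicity and are not meant to reproduce the 15 features used in the paper; the function name is hypothetical.

```python
def global_features(x):
    """Summarize a series by its mean, variance and lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var > 0:
        # Sample autocorrelation at lag 1 (autocovariance divided by variance).
        acf1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n)) / (n * var)
    else:
        acf1 = 0.0  # constant series: autocorrelation is undefined, use 0
    return {"mean": mean, "variance": var, "acf1": acf1}

if __name__ == "__main__":
    print(global_features([1, 2, 3, 4, 5]))
```

Applying such a function to every series turns a collection of series of different lengths into a table with one fixed-length row per series, which is what the later steps operate on.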

Step 2: PCA to reduce dimension
- dim = 15 initially, with correlation existing between features
- The first 2 PCs are sufficient, capturing most of the variance
Step 3: Implement a multi-dimensional outlier detection algorithm to find outlier series
- Density based
- α-hull
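The dimension reduction in Step 2 can be sketched without any external library by running power iteration on the covariance matrix of the feature table, with deflation to obtain the second component. This is an illustration under simplifying assumptions (no feature scaling, a fixed number of iterations, a fixed start vector that is assumed not to be orthogonal to the leading eigenvector); the function name is hypothetical, and the outlier detection of Step 3 is not covered.

```python
import math

def top_components(rows, k=2, iters=500):
    """Return ([(eigenvalue, direction), ...], scores) for the first k PCs."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(val * val for val in v))
        v = [val / norm for val in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(val * val for val in w))
            if norm == 0.0:
                break  # no variance left in any direction
            v = [val / norm for val in w]
        # Rayleigh quotient gives the variance explained along v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append((lam, v))
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(c[j] * v[j] for j in range(d)) for _, v in comps]
              for c in centered]
    return comps, scores

if __name__ == "__main__":
    table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    comps, scores = top_components(table)
    print([round(lam, 4) for lam, _ in comps])
```

Each row of `scores` is the position of one series in the plane of the first two PCs, which is the space where the density-based or α-hull methods of Step 3 would then look for isolated points.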

Demo