Data Mining Techniques



1 Data Mining Techniques CS Section 3 - Fall 2016 Lecture 18: Time Series Jan-Willem van de Meent (credit: Aggarwal Chapter 14.3)

2 Time Series Data

3 Time Series Data

4 Time Series Data
Time series forecasting is fundamentally hard. Rare events often play a big role in changing trends. It is impossible to know how events will affect trends (and often when such events will occur).

5 Time Series Data
In some cases there are clear trends (here: seasonal effects + growth).
source:

6 Autoregressive Models

7 Time Series Smoothing
Moving average smoothing with window size $k$:
$$y'_i = \frac{1}{k} \sum_{n=0}^{k-1} y_{i-n}$$
Exponential smoothing: the smoothed value $y'_i$ is a linear combination of the current value $y_i$ and the previously smoothed value $y'_{i-1}$. A smoothing parameter $\alpha \in (0, 1)$ is used for this purpose:
$$y'_i = \alpha \, y_i + (1 - \alpha) \, y'_{i-1}$$
The value of $y'_0$ is typically set to the first point in the series.
Figure 14.1: Various smoothing methods applied to IBM stock prices (period ending September 4, 2014). (a) Moving average smoothing (actual values and moving averages, including a 50-day window); (b) Exponential smoothing (actual values, $\alpha = 0.1$ and $\alpha = 0.05$). Axes: number of trading days vs. IBM stock price.
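The two smoothing rules on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the figure: the function names and the toy `prices` values are assumptions made for the example.

```python
import numpy as np

def moving_average(y, k):
    """y'_i = (1/k) * sum_{n=0}^{k-1} y_{i-n}, returned for i >= k-1 only."""
    y = np.asarray(y, dtype=float)
    # mode="valid" keeps only windows that lie fully inside the series
    return np.convolve(y, np.ones(k) / k, mode="valid")

def exponential_smoothing(y, alpha):
    """y'_i = alpha * y_i + (1 - alpha) * y'_{i-1}, with y'_0 set to y_0."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    out[0] = y[0]  # initialise with the first point in the series
    for i in range(1, len(y)):
        out[i] = alpha * y[i] + (1 - alpha) * out[i - 1]
    return out

prices = [10.0, 12.0, 11.0, 13.0, 15.0]  # made-up values, not IBM data
ma = moving_average(prices, k=2)
es = exponential_smoothing(prices, alpha=0.5)
```

A smaller `alpha` (or a larger `k`) gives a smoother curve that reacts more slowly to changes in the series.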

8 Stationary Time Series
$$y_t = c + \epsilon_t, \qquad E[\epsilon_t] = 0$$
Definition (Strictly Stationary Time Series): A strictly stationary time series is one in which the probabilistic distribution of the values in any time interval $[a, b]$ is identical to that in the shifted interval $[a + h, b + h]$ for any value of the time shift $h$.
Differencing: $y_t - y_{t-1}$
Log differencing: $\log y_t - \log y_{t-1}$
Figure: original and differenced series. (a) Unscaled series (price value vs. time index); (b) Logarithmic scaling (logarithm of price value vs. time index).
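As a rough sketch of the two differencing operations, the snippet below applies them to a synthetic price series. The series is an assumption for illustration (a geometric random walk with made-up drift and volatility), chosen because its log differences are stationary by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic price series: log-price is a random walk with drift (assumed parameters)
log_returns = rng.normal(loc=0.001, scale=0.01, size=500)
price = 100.0 * np.exp(np.cumsum(log_returns))

diff = np.diff(price)              # y_t - y_{t-1}
log_diff = np.diff(np.log(price))  # log y_t - log y_{t-1}
```

For this series, `log_diff` recovers the underlying i.i.d. log returns, whereas the scale of `diff` still grows or shrinks with the price level.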

9 Auto-correlation
Differencing the unscaled series does not make it stationary. In Fig. 14.3b, the logarithm function is applied to the series before the differencing operation. In this case, the series becomes stationary after the differencing.
Some forecasting models assume a stationary time series, whereas others do not.
Univariate time series contain a single variable that is predicted using autocorrelations. Autocorrelations represent the correlations between adjacently located timestamps. Typically, the behavioral attribute values at adjacently located timestamps are positively correlated. Autocorrelations are defined with respect to a particular value of the lag $L$: for a time series $y_1, \ldots, y_n$, the autocorrelation at lag $L$ is the Pearson coefficient of correlation between $y_t$ and $y_{t+L}$:
$$\text{Autocorrelation}(L) = \frac{\text{Covariance}_t(y_t, y_{t+L})}{\text{Variance}_t(y_t)}$$
Figure: autocorrelation vs. lag for the IBM stock price and for a sine wave (lag in degrees).
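The autocorrelation at lag L can be estimated as sketched below. The estimator is one common choice and an assumption here: it uses the mean and variance of the whole series rather than of the two shifted sub-series. The sine-wave input mirrors the slide's second example, with one sample per degree.

```python
import numpy as np

def autocorrelation(y, lag):
    """Estimate Covariance_t(y_t, y_{t+L}) / Variance_t(y_t) for L = lag >= 0."""
    y = np.asarray(y, dtype=float)
    if lag == 0:
        return 1.0
    mean, var = y.mean(), y.var()
    # Average product of deviations for all pairs (y_t, y_{t+lag})
    cov = np.mean((y[:-lag] - mean) * (y[lag:] - mean))
    return cov / var

degrees = np.arange(720)             # two full periods, one sample per degree
sine = np.sin(np.deg2rad(degrees))

ac_half_period = autocorrelation(sine, 180)  # shifted by half a period
ac_full_period = autocorrelation(sine, 360)  # shifted by a full period
```

For a periodic signal the autocorrelation is itself periodic in the lag, which is why the sine-wave curve oscillates instead of decaying.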

10 Autoregressive Models
Autoregressive, AR(p):
$$y_t = \sum_{i=1}^{p} a_i \, y_{t-i} + c + \epsilon_t$$
Moving-average, MA(q):
$$y_t = \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t$$
Autoregressive moving-average, ARMA(p,q):
$$y_t = \sum_{i=1}^{p} a_i \, y_{t-i} + \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t$$
Autoregressive integrated moving-average, ARIMA(p,d,q):
$$y^{(d)}_t = \sum_{i=1}^{p} a_i \, y^{(d)}_{t-i} + \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t$$
Do least-squares regression to estimate $a$, $b$, $c$.
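For the pure AR(p) case the regressors are observed past values, so the least-squares fit reduces to ordinary linear regression. The sketch below covers only that case (MA terms depend on unobserved noise and need an iterative procedure). The helper name `fit_ar` and the simulated AR(2) parameters are assumptions for illustration.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares estimate of (a_1..a_p, c) in y_t = sum_i a_i y_{t-i} + c + eps_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column i holds the lagged values y_{t-i} for t = p, ..., n-1
    lags = [y[p - i : n - i] for i in range(1, p + 1)]
    X = np.column_stack(lags + [np.ones(n - p)])  # last column fits the constant c
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[:p], coef[p]

# Simulate an AR(2) series with made-up parameters, then try to recover them
rng = np.random.default_rng(0)
a_true, c_true = np.array([0.6, -0.3]), 1.0
y = np.zeros(20000)
for t in range(2, len(y)):
    y[t] = a_true[0] * y[t - 1] + a_true[1] * y[t - 2] + c_true + rng.normal(0, 0.1)

a_hat, c_hat = fit_ar(y, p=2)
```

With a long enough series, `a_hat` and `c_hat` should land close to the values used in the simulation.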

11 ARIMA on Airline Data (p,d,q) = (0,1,12) source:

12 Hidden Markov Models

13 Time Series with Distinct States

14 Can we use a Gaussian Mixture Model?
Figure panels: Time Series, Histogram, Posterior on states, Mixture.


16 Hidden Markov Models
Idea: mixture model + Markov chain for states.
This can model correlation between subsequent states (the chain is more likely to stay in the same state than to switch to a different state).
Figure panels: Estimate from GMM, Estimate from HMM.

17 Reminder: Random Surfers in PageRank
(adapted from: Mining of Massive Datasets)
Figure: web graph with three pages y, a, and m; edge labels y/2, a/2, and m.
Model for the random surfer:
At time t = 0, pick a page at random.
At each subsequent time t, follow an outgoing link at random.
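The random-surfer model can be simulated directly. The link structure below is an assumption: the slide's figure did not survive extraction, so a hypothetical three-page graph over y, a, and m is used, in which y links to itself and a, a links to y and m, and m links to a.

```python
import numpy as np

# Assumed link structure (hypothetical): y -> {y, a}, a -> {y, m}, m -> {a}
links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def random_surfer(links, steps, rng):
    """Return the fraction of time steps the surfer spends on each page."""
    pages = list(links)
    page = pages[rng.integers(len(pages))]   # t = 0: pick a page at random
    visits = dict.fromkeys(pages, 0)
    for _ in range(steps):
        visits[page] += 1
        out = links[page]
        page = out[rng.integers(len(out))]   # follow an outgoing link at random
    return {p: v / steps for p, v in visits.items()}

freq = random_surfer(links, steps=100_000, rng=np.random.default_rng(0))
```

Under this assumed graph, the stationary distribution of the chain is y = 0.4, a = 0.4, m = 0.2, and the visit frequencies in `freq` approach those values as the number of steps grows.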


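19 Hidden Markov Models
Gaussian Mixture:
$$z_n \sim \text{Discrete}(\pi), \qquad x_n \mid z_n = k \sim \text{Normal}(\mu_k, \sigma_k)$$
Gaussian HMM:
$$z_1 \sim \text{Discrete}(\pi), \qquad z_{t+1} \mid z_t = k \sim \text{Discrete}(A_k), \qquad x_t \mid z_t = k \sim \text{Normal}(\mu_k, \sigma_k)$$
Transition matrix: $A = M^\top$, where $A_k$ denotes row $k$ of $A$.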
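The generative process of the Gaussian HMM can be sketched by sampling from it. The function name and all parameter values are assumptions for illustration: two states with a "sticky" transition matrix, so that the chain tends to stay in its current state.

```python
import numpy as np

def sample_gaussian_hmm(pi, A, mu, sigma, T, rng):
    """Sample states z_1..z_T and observations x_1..x_T from a Gaussian HMM."""
    K = len(pi)
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)                # z_1 ~ Discrete(pi)
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])   # z_{t+1} | z_t = k ~ Discrete(A_k)
    x = rng.normal(mu[z], sigma[z])           # x_t | z_t = k ~ Normal(mu_k, sigma_k)
    return z, x

# Made-up parameters for a two-state model
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],
              [0.05, 0.95]])
mu = np.array([0.0, 5.0])
sigma = np.array([1.0, 1.0])

z, x = sample_gaussian_hmm(pi, A, mu, sigma, T=2000, rng=np.random.default_rng(0))
stay_fraction = np.mean(z[1:] == z[:-1])  # how often the state is unchanged
```

Replacing every row of `A` with `pi` removes the dependence between consecutive states and recovers the Gaussian mixture as a special case.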

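20 Review: Gaussian Mixtures
Model:
$$z_n \sim \text{Discrete}(\pi), \qquad x_n \mid z_n = k \sim \text{Normal}(\mu_k, \sigma_k)$$
Expectation Maximization
1. Update cluster probabilities:
$$\gamma^i_{tk} = p(z_t = k \mid x_t, \theta^{i-1}) = \frac{p(x_t, z_t = k \mid \theta^{i-1})}{\sum_l p(x_t, z_t = l \mid \theta^{i-1})}$$
2. Update parameters:
$$N^i_k = \sum_{t=1}^{T} \gamma^i_{tk}$$
$$\mu^i_k = \frac{1}{N^i_k} \sum_{t=1}^{T} \gamma^i_{tk} \, x_t$$
$$\sigma^i_k = \left( \frac{1}{N^i_k} \sum_{t=1}^{T} \gamma^i_{tk} \, (x_t - \mu^i_k)^2 \right)^{1/2}$$
$$\pi^i_k = N^i_k / N$$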
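The two EM updates can be written compactly for one-dimensional data. This is a minimal sketch under stated assumptions: means are initialised at evenly spaced quantiles of the data (a choice made here, not taken from the slides), the iteration count is fixed, and the test data are synthetic.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_gmm_1d(x, K, n_iter=100):
    """EM for a 1-D Gaussian mixture; returns (pi, mu, sigma, gamma)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # assumed initialisation
    sigma = np.full(K, x.std())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # 1. Update cluster probabilities gamma[t, k] = p(z_t = k | x_t, theta)
        joint = pi * normal_pdf(x[:, None], mu, sigma)
        gamma = joint / joint.sum(axis=1, keepdims=True)
        # 2. Update parameters from the weighted data
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        pi = Nk / N
    return pi, mu, sigma, gamma

# Synthetic data: two well-separated clusters of equal size
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(10.0, 1.0, 300)])
pi_hat, mu_hat, sigma_hat, gamma = em_gmm_1d(data, K=2)
```

Each row of `gamma` sums to one, and the parameter updates are weighted averages with those rows as weights.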

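21 Forward-backward Algorithm
Model:
$$z_1 \sim \text{Discrete}(\pi), \qquad z_{t+1} \mid z_t = k \sim \text{Discrete}(A_k), \qquad x_t \mid z_t = k \sim \text{Normal}(\mu_k, \sigma_k)$$
Expectation step for HMM:
$$\gamma_{t,k} = p(z_t = k \mid x_{1:T}, \theta) = \frac{p(x_{1:t}, z_t) \, p(x_{t+1:T} \mid z_t)}{p(x_{1:T})} \propto \alpha_{t,k} \, \beta_{t,k}$$
Forward recursion:
$$\alpha_{t,l} := p(x_{1:t}, z_t = l) = \sum_k p(x_t \mid \mu_l, \sigma_l) \, A_{kl} \, \alpha_{t-1,k}$$
Backward recursion:
$$\beta_{t,k} := p(x_{t+1:T} \mid z_t = k) = \sum_l \beta_{t+1,l} \, p(x_{t+1} \mid \mu_l, \sigma_l) \, A_{kl}$$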
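The forward and backward recursions translate almost line by line into NumPy. The sketch below makes two assumptions beyond the slide: the forward pass starts from $\alpha_{1,k} = \pi_k \, p(x_1 \mid \mu_k, \sigma_k)$, and both messages are renormalised at each step to avoid numerical underflow, which leaves the normalised posterior unchanged. Parameter values and the short observation sequence are made up.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def forward_backward(x, pi, A, mu, sigma):
    """Return gamma[t, k] = p(z_t = k | x_{1:T}) for a Gaussian HMM."""
    x = np.asarray(x, dtype=float)
    T, K = len(x), len(pi)
    lik = normal_pdf(x[:, None], mu, sigma)      # lik[t, k] = p(x_t | mu_k, sigma_k)
    alpha = np.empty((T, K))
    beta = np.empty((T, K))
    alpha[0] = pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = lik[t] * (alpha[t - 1] @ A)   # sum_k A[k, l] * alpha[t-1, k]
        alpha[t] /= alpha[t].sum()               # rescale for stability
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (beta[t + 1] * lik[t + 1])  # sum_l A[k, l] * beta[t+1, l] * lik
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Made-up two-state model and a short observation sequence
pi_init = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
mu = np.array([0.0, 5.0])
sigma = np.array([1.0, 1.0])
x = np.array([0.1, -0.2, 0.0, 5.1, 4.9, 5.2])

gamma = forward_backward(x, pi_init, A, mu, sigma)
```

The cost is O(T K^2), compared with O(K^T) for summing over every possible state sequence.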

22 Other Examples for HMMs
RNA splicing: State 1: Exon (relevant); State 2: Splice site; State 3: Intron (ignored).
Handwritten Digits: State 1: Sweeping arc; State 2: Horizontal line.

Classic Time Series Analysis

Concepts and Definitions
Let $Y_t$ be a random variable with PDF $f_{Y_t}$. Define $\mu_t = E[Y_t]$; $\mu_t$ (also written $m(t)$) is known as the trend. Define the autocovariance $\gamma_{t,s} = \mathrm{Cov}[Y_t, Y_s]$ = E[(Y_t − μ_t) …