Time Series Mining and Forecasting

Size: px

Start display at page:

Download "Time Series Mining and Forecasting"

Hannah Ward
5 years ago
Views:

1 CSE 6242 / CX 4242 Mar 25, 2014 Time Series Mining and Forecasting Duen Horng (Polo) Chau Georgia Tech Slides based on Prof. Christos Faloutsos s materials

2 Outline Motivation Similarity search distance functions Linear Forecasting on-linear forecasting Conclusions

3 Problem definition Given: one or more sequences x 1, x 2,, x t, (y 1, y 2,, y t, ) ( ) Find similar sequences; forecasts patterns; clusters; outliers

4 Motivation - Applications Financial, sales, economic series Medical ECGs +; blood pressure etc monitoring reactions to new drugs elderly care

5 Motivation - Applications (cont d) Smart house sensors monitor temperature, humidity, air quality video surveillance

6 Motivation - Applications (cont d) Weather, environment/anti-pollution volcano monitoring air/water pollutant monitoring

7 Motivation - Applications (cont d) Computer systems Active Disks (buffering, prefetching) web servers (ditto) network traffic monitoring...

8 Stream Data: Disk accesses #bytes time

9 Problem #1: Goal: given a signal (e.g.., #packets over time) Find: patterns, periodicities, and/or compress count lynx caught per year (packets per day; temperature per day) year

10 Problem#2: Forecast Given x t, x t-1,, forecast x t+1 umber of packets sent Time Tick??

11 Problem#2 : Similarity search E.g.., Find a 3-tick pattern, similar to the last one umber of packets sent Time Tick??

12 Problem #3: Given: A set of correlated time sequences Forecast Sent(t) 90 umber of packets sent lost repeated Time Tick

13 Important observations Patterns, rules, forecasting and similarity indexing are closely related: To do forecasting, we need to find patterns/rules to find similar settings in the past to find outliers, we need to have forecasts (outlier = too far away from our forecast)

14 Outline Motivation Similarity Search and Indexing Linear Forecasting on-linear forecasting Conclusions

15 Outline Motivation Similarity search and distance functions... Euclidean Time-warping

16 Importance of distance functions Subtle, but absolutely necessary: A must for similarity indexing (-> forecasting) A must for clustering Two major families Euclidean and Lp norms Time warping and variations

17 Euclidean and Lp = = n i x i y i y x D 1 2 ) ( ), ( x(t) y(t)... = = n i p i i p y x y x L 1 ), ( L 1 : city-block = Manhattan L 2 = Euclidean L

18 Observation #1 Time sequence -> n-d vector Day-n... Day-2 Day-1

19 Observation #2 Euclidean distance is closely related to cosine similarity dot product cross-correlation function Day-n... Day-2 Day-1

20 Time Warping allow accelerations - decelerations (with or w/o penalty) THE compute the (Euclidean) distance (+ penalty) related to the string-editing distance

21 stutters : Time Warping

22 Time warping Q: how to compute it? A: dynamic programming D( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y

23 Time warping Thus, with no penalty for stutter, for sequences x 1, x 2,, x i,; y 1, y 2,, y j D( i, j) = x[ i] y[ j] + D( i min# D( i " D( i 1,, j 1, j 1) 1) j) no stutter x-stutter y-stutter

24 Time warping VERY SIMILAR to the string-editing distance D( i, j) = x[ i] y[ j] + D( i min# D( i " D( i 1,, j 1, j 1) 1) j) no stutter x-stutter y-stutter

25 Time warping Complexity: O(M*) - quadratic on the length of the strings Many variations (penalty for stutters; limit on the number/percentage of stutters; ) popular in voice processing [Rabiner + Juang]

26 Other Distance functions piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97] cepstrum (for voice [Rabiner+Juang]) do DFT; take log of amplitude; do DFT again Allow for small gaps [Agrawal+95] See tutorial by [Gunopulos + Das, SIGMOD01]

27 Other Distance functions In [Keogh+, KDD 04]: parameter-free, MDL based

28 Conclusions Prevailing distances: Euclidean and time-warping

29 Outline Motivation Similarity search and distance functions Linear Forecasting on-linear forecasting Conclusions

30 Linear Forecasting

31 Forecasting Prediction is very difficult, especially about the future. - ils Bohr Danish physicist and obel Prize laureate

32 Outline Motivation... Linear Forecasting Auto-regression: Least Squares; RLS Co-evolving time sequences Examples Conclusions

33 Reference [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE (Describes MUSCLES and Recursive Least Squares)

34 Problem#2: Forecast Example: give x t-1, x t-2,, forecast x t umber of packets sent Time Tick??

35 Forecasting: Preprocessing MAUALLY: remove trends spot periodicities 7 days time time

36 Problem#2: Forecast Solution: try to express x t as a linear function of the past: x (up to a window of w) t-1, x t-2,, ?? Time Tick

37 (Problem: Back-cast; interpolate) Solution - interpolate: try to express x t as a linear function of the past AD the future: x t+1, x t+2, x t+wfuture; x t-1, x t-wpast (up to windows of w past, w future ) EXACTLY the same algo s?? Time Tick

38 Linear Regression: idea patient weight height ?? Body height Body weight express what we don t know (= dependent variable ) as a linear function of what we know (= independent variable(s) )

39 Linear Regression: idea patient weight height ?? Body height Body weight express what we don t know (= dependent variable ) as a linear function of what we know (= independent variable(s) )

40 Linear Regression: idea patient weight height ?? Body height Body weight express what we don t know (= dependent variable ) as a linear function of what we know (= independent variable(s) )

41 Linear Regression: idea patient weight height ?? Body height Body weight express what we don t know (= dependent variable ) as a linear function of what we know (= independent variable(s) )

42 Linear Auto Regression: Time Packets Sent (t-1) Packets Sent(t) ??

43 Linear Auto Regression: Time Packets Sent (t-1) Packets Sent(t) lag-plot #packets sent at time t 25?? #packets sent at time t-1 lag w = 1 Dependent variable = # of packets sent (S [t]) Independent variable = # of packets sent (S[t-1])

44 Linear Auto Regression: Time Packets Sent (t-1) Packets Sent(t) lag-plot #packets sent at time t 25?? #packets sent at time t-1 lag w = 1 Dependent variable = # of packets sent (S [t]) Independent variable = # of packets sent (S[t-1])

45 Linear Auto Regression: Time Packets Sent (t-1) Packets Sent(t) lag-plot #packets sent at time t 25?? #packets sent at time t-1 lag w = 1 Dependent variable = # of packets sent (S [t]) Independent variable = # of packets sent (S[t-1])

46 Linear Auto Regression: Time Packets Sent (t-1) Packets Sent(t) lag-plot #packets sent at time t 25?? #packets sent at time t-1 lag w = 1 Dependent variable = # of packets sent (S [t]) Independent variable = # of packets sent (S[t-1])

47 Outline Motivation... Linear Forecasting Auto-regression: Least Squares; RLS Co-evolving time sequences Examples Conclusions

48 More details: Q1: Can it work with window w > 1? A1: YES x t x t-1 x t-2

49 More details: Q1: Can it work with window w > 1? A1: YES (we ll fit a hyper-plane, then) x t x t-1 x t-2

50 More details: Q1: Can it work with window w > 1? A1: YES (we ll fit a hyper-plane, then) x t x t-1 x t-2

51 More details: Q1: Can it work with window w > 1? A1: YES The problem becomes: X [ w] a [w 1] = y [ 1] OVER-COSTRAIED a is the vector of the regression coefficients X has the values of the w indep. variables y has the values of the dependent variable

52 More details: X [ w] a [w 1] = y [ 1] " # % & = " # % & " # % & w w w w y y y a a a X X X X X X X X X " ,,,,,,,,, Ind-var1 Ind-var-w time

53 More details: X [ w] a [w 1] = y [ 1] " # % & = " # % & " # % & w w w w y y y a a a X X X X X X X X X " ,,,,,,,,, Ind-var1 Ind-var-w time

54 More details Q2: How to estimate a 1, a 2, a w = a? A2: with Least Squares fit " a = ( X T X ) -1 (X T y) (Moore-Penrose pseudo-inverse) a is the vector that minimizes the RMSE from y

55 More details Straightforward solution: w a = ( X T X ) -1 (X T y) a : Regression Coeff. Vector X : Sample Matrix X : Observations: Sample matrix X grows over time needs matrix inversion O( w 2 ) computation O( w) storage

56 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of Recursive Least Squares (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion (How is that possible?)

57 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of Recursive Least Squares (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion (How is that possible?) A: our matrix has special form: (X T X)

58 SKIP More details w At the +1 time tick: X +1 X : x +1

59 SKIP More details Let G = ( X T X ) -1 ( gain matrix ) G +1 can be computed recursively from G w G w

60 SKIP EVE more details: 1 T + 1 = G [ c] [ G x + 1 ] x + 1 G G 1 x w row vector c = + [ 1+ x G x T ] Let s elaborate (VERY IMPORTAT, VERY VALUABLE)

61 SKIP EVE more details: a T = 1 T [ X X ] [ X y ]

62 SKIP EVE more details: a T = 1 T [ X X ] [ X y ] [w x 1] [(+1) x w] [(+1) x 1] [w x (+1)] [w x (+1)]

63 SKIP EVE more details: a T = 1 T [ X X ] [ X y ] [(+1) x w] [w x (+1)]

64 EVE more details: T G x x G c G G = ] [ ] [ ] 1 [ 1 1 T x G x c = ] [ ] [ = T T y X X X a ] [ T X X G 1 x w row vector gain matrix SKIP

65 EVE more details: T G x x G c G G = ] [ ] [ ] 1 [ 1 1 T x G x c = SKIP

66 SKIP EVE more details: 1x1 1xw wxw wxw wxw wx1 wxw 1 T + 1 = G [ c] [ G x + 1 ] x + 1 G G SCALAR c = + [ 1+ x G x T ]

67 Altogether: T G x x G c G G = ] [ ] [ ] 1 [ 1 1 T x G x c = ] [ ] [ = T T y X X X a ] [ T X X G SKIP

68 SKIP Altogether: G 0 δ I where I: w x w identity matrix δ: a large positive number

69 Comparison: Straightforward Least Squares eeds huge matrix (growing in size) O( w) Costly matrix operation O( w 2 ) Recursive LS eed much smaller, fixed size matrix O(w w) Fast, incremental computation O(1 w 2 ) no matrix inversion = 10 6, w = 1-100

70 Pictorially: Given: Dependent Variable Independent Variable

71 Pictorially:. Dependent Variable new point Independent Variable

72 Pictorially: RLS: quickly compute new best fit Dependent Variable new point Independent Variable

73 Even more details Q4: can we forget the older samples? A4: Yes - RLS can easily handle that [Yi+00]:

74 Adaptability - forgetting Dependent Variable eg., #bytes sent Independent Variable eg., #packets sent

75 Adaptability - forgetting Trend change Dependent Variable eg., #bytes sent (R)LS with no forgetting Independent Variable eg. #packets sent

76 Adaptability - forgetting Trend change Dependent Variable Independent Variable (R)LS with no forgetting (R)LS with forgetting RLS: can *trivially* handle forgetting

Time Series. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. Mining and Forecasting

Time Series. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. Mining and Forecasting http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Time Series Mining and Forecasting Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech