Forecasting Data Streams: Next Generation Flow Field Forecasting

Forecasting Data Streams: Next Generation Flow Field Forecasting. Kyle Caudle, South Dakota School of Mines & Technology (SDSMT), kyle.caudle@sdsmt.edu. Joint work with Michael Frey (Bucknell University) and Patrick Fleming (SDSMT). Research supported by Naval Postgraduate School Assistance Grant N00244-15-1-0052.

Outline [1] Background [2] Flow Field Forecasting Overview [3] Strengths of Flow Field Forecasting [4] Comparison Study with Traditional Methods [5] Bivariate Forecasting [6] Autonomous History Selection [7] Other Forecasting Outputs [8] Concluding Remarks

Background. Spring 2011: the original concept came from a need to predict network performance characteristics on the Energy Sciences Network (DoE). Requirements:
- Handle a long sequence of observations with observation times
- Predict future observations autonomously, with no human guidance
- Accept non-uniformly spaced observations
- Provide error estimates
- Be fast and computationally efficient
- Be able to exploit parallel data

Background (continued).
- December 2011: Poster session introducing flow field forecasting, 10th Annual International Conference on Machine Learning and Applications (ICMLA), Honolulu, HI.
- June 2012: Introduced a method for continuously updating the forecast, 32nd Annual International Symposium on Forecasting (ISF), Boston, MA.
- August 2012: Contributed session on forecasting, JSM 2012, San Diego, CA.
- May 2013: "Flow Field Forecasting for Univariate Time Series" published in Statistical Analysis and Data Mining (SADM).
- March 2014: R package accepted and placed on the Comprehensive R Archive Network (CRAN); the package is called flowfield.
- January 2015: Awarded a research assistance grant from the Naval Postgraduate School to develop the next generation flow field software.

FF Forecasting in 3 Easy Steps. Methodology: a framework that makes associations between historical process levels and subsequent changes, extracting the flow from one level to the next. Principle of FFF: past associations between history and change are predictive of the changes associated with current histories and future changes. The 3-step framework: 1. Extract data histories (levels and subsequent changes). 2. Interpolate between observed levels in histories. 3. Use the interpolator to predict the process forward, step by step, to the desired forecast horizon.

Step 1: Extract Histories. Use penalized spline regression (PSR) to build a skeleton of historical process levels and changes, and extract the relevant histories based on the application. (Slide figure: data stream (time series) passed through PSR to give the skeleton, with the noise extracted.)
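As a rough illustration of this step (not the flowfield package's actual implementation), the sketch below uses base R's smooth.spline as a stand-in for penalized spline regression; the stream, its noise level, and the evaluation grid are made-up placeholders.

```r
# Step 1 sketch: fit a smooth "skeleton" to a noisy, possibly non-uniformly
# sampled data stream.  smooth.spline stands in for the penalized spline
# regression used by flowfield; the data are synthetic placeholders.
set.seed(1)
t_obs <- sort(runif(300, 0, 100))                    # non-uniform observation times
y_obs <- sin(t_obs / 5) + rnorm(300, sd = 0.4)       # noisy stream

fit       <- smooth.spline(t_obs, y_obs)             # penalized smoothing fit
grid      <- seq(0, 100, by = 1)
skeleton  <- predict(fit, x = grid)                  # skeleton levels on a grid
sigma_hat <- sd(y_obs - predict(fit, x = t_obs)$y)   # crude process-noise estimate
```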

History Extraction. Past histories h_1 and h_2 and their associated changes d_1 and d_2 (two examples pictured on the slide). Principle of FFF: past associations between history and change are predictive of the changes associated with current histories and future changes.
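Continuing the sketch above, a depth-1 history simply pairs each skeleton level with the change that followed it; deeper histories would append lagged levels and changes.

```r
# History extraction sketch (depth-1 histories): the level at each skeleton
# knot paired with the subsequent change.
s <- skeleton$y          # skeleton levels on the evaluation grid
h <- s[-length(s)]       # histories: the level at each knot
d <- diff(s)             # associated changes to the next knot
head(cbind(history = h, change = d))
```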

Step 2: Interpolate the Flow Field. The current history may include values that were never observed in the past. We use Gaussian process regression (GPR) to interpolate from the observed values to these unobserved values.
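A minimal GPR sketch, continuing the example above: a squared-exponential kernel interpolates the change d as a function of the history level h, and the zero-mean prior is what makes the method conservative for histories unlike anything seen before. The characteristic length Delta and the noise variance are placeholder values, not the settings used in flowfield.

```r
# Step 2 sketch: GPR with a squared-exponential kernel interpolating the
# change d as a function of the history level h.  A zero-mean prior means
# histories far from anything seen before get a predicted change near zero.
se_kernel <- function(a, b, Delta) exp(-outer(a, b, "-")^2 / (2 * Delta^2))

gpr_predict <- function(h, d, h_new, Delta = 0.5, noise_var = 1e-4) {
  K  <- se_kernel(h, h, Delta) + diag(noise_var, length(h))
  Ks <- se_kernel(h_new, h, Delta)
  as.vector(Ks %*% solve(K, d))      # posterior mean of the change
}

gpr_predict(h, d, h_new = c(0.2, 0.9))
```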

Step 3: Iteratively Build to the Future. (Slide figure legend: d = slope, s = level, κ = knot, δ = GPR-interpolated value.)
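Putting the pieces of the running sketch together, the forecast is built by repeatedly asking the GPR interpolator for the change implied by the current level and stepping forward. The real method steps knot to knot along the skeleton, so this level-only loop is only a simplification.

```r
# Step 3 sketch: iterate the interpolated flow forward to the forecast horizon.
ff_forecast <- function(h, d, last_level, horizon, Delta = 0.5) {
  path  <- numeric(horizon)
  level <- last_level
  for (k in seq_len(horizon)) {
    delta   <- gpr_predict(h, d, level, Delta = Delta)  # interpolated change
    level   <- level + delta
    path[k] <- level
  }
  path
}

ff_forecast(h, d, last_level = tail(s, 1), horizon = 10)
```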

Strengths of FFF.
- The step I data skeleton achieves data reduction and standardization (and estimates the process noise).
- Runs autonomously: no interactive supervision by a skilled analyst.
- Conservative: in situations where there is no information in the history space that corresponds to the current situation, it conservatively predicts no change.
- Computationally efficient: suited to large data streams with limited computational resources. Penalized spline regression is computationally efficient; to further increase its efficiency, we replace the standard numerical search for the optimal smoothing parameter by an asymptotic approximation [Wand, 1999]. The step II Gaussian process regression and the step III extrapolation mechanism are also computationally efficient.

Comparison Study. We compare FFF with Box-Jenkins ARIMA, exponential smoothing, and artificial neural networks. For ARIMA and exponential smoothing we use the R package forecast [Hyndman and Khandakar]; for artificial neural networks we use the R package tsDyn [A. F. Di Narzo, J. L. Aznarte and M. Stigler].
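For reference, a minimal sketch of how baseline forecasts might be produced with the cited packages. The exact model settings used in the study are not given here: the series below is a placeholder AR(1) stream, and the lag count m and hidden-unit count size for the neural network are illustrative choices.

```r
# Hypothetical baseline forecasts with the cited packages; train_series is a
# placeholder stream, not the simulated data from the study.
library(forecast)   # Hyndman & Khandakar
library(tsDyn)      # Di Narzo, Aznarte and Stigler

train_series <- ts(arima.sim(list(ar = 0.8), n = 1500))

fit_arima <- auto.arima(train_series)            # automatic Box-Jenkins ARIMA
fc_arima  <- forecast(fit_arima, h = 50)         # 50-step-ahead forecast

fit_ets <- ets(train_series)                     # exponential smoothing state space model
fc_ets  <- forecast(fit_ets, h = 50)

fit_nn <- nnetTs(train_series, m = 5, size = 3)  # neural net autoregression (illustrative m, size)
fc_nn  <- predict(fit_nn, n.ahead = 50)
```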

Simulated Time Series. Data were simulated from a baseline model of the form $Y_i = S(t_i) + \varepsilon_i$, where $\varepsilon_i$ is Gaussian noise, with $N = 1550$ uniformly spaced observation times $t_i \in \{1, 2, \ldots, 1550\}$ and $\sigma = 0.4$. For the systematically determined component $S(t)$ we used realizations of a zero-mean, unit-variance stationary Gaussian process with squared-exponential covariance $\mathrm{Cov}(S(t), S(t')) = k(t - t') = \exp\!\left(-\frac{(t - t')^2}{2\Delta^2}\right)$.

Comparison 1. For our first comparison, we generated 1000 time series realizations (three pictured on the slide).
- This model exhibits short-term noise and longer-term, non-Markovian dynamics.
- Models such as this might plausibly be encountered in real data sets.
- Characteristic length Δ = 50.
Each time series had 1550 observations (mean zero, σ = 0.4); 1500 observations were used to build the model and 50 were held out for testing. The mean forecast error was computed for each method. A sketch of this simulation setup is given below.
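The sketch draws one realization of the zero-mean, unit-variance GP with squared-exponential covariance (Δ = 50), observes it at 1550 uniformly spaced times with σ = 0.4 noise, and splits it 1500/50; the jitter added before the Cholesky factorization is purely for numerical stability.

```r
# Baseline simulation sketch: S(t) ~ GP(0, k) with squared-exponential
# covariance of characteristic length Delta = 50, plus N(0, 0.4^2) noise.
set.seed(42)
n     <- 1550
times <- 1:n
Delta <- 50
sigma <- 0.4

K <- exp(-outer(times, times, "-")^2 / (2 * Delta^2))
L <- chol(K + diag(1e-6, n))            # upper-triangular factor, with jitter
S <- as.vector(t(L) %*% rnorm(n))       # one GP realization
Y <- S + rnorm(n, sd = sigma)           # observed data stream

train <- Y[1:1500]                      # model-building portion
test  <- Y[1501:1550]                   # held-out 50 observations
```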

Comparison 1: Results. FF was very competitive with the other traditional methods; the artificial NN was marginally worse and took four times longer.

Comparison 2. For our second comparison, we generated 1000 time series realizations (three pictured on the slide) from a variant data model with a recurring distinctive history: the characteristic length is Δ = 500 in the time interval [500, 600] and again beginning at time 1490; elsewhere, Δ = 50.

Comparison 2: Results. Short-range forecasts are competitive; at long range, FF wins decisively.

Comparison 3: Irregularly Spaced Intervals. Most traditional forecasting methods rely on time series data collected at regular intervals; FF forecasting is not handicapped by this restriction. Demonstration 3 compares FF forecasting to itself.

Demonstration 3. We generate two time series from the baseline model used in Comparison 1. The first uses uniformly spaced observation times; the second uses non-uniformly spaced observation times drawn from a Poisson process, so the time spacings between observations are exponentially distributed (see the sketch below).
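A small sketch of the non-uniform sampling scheme: observation times drawn from a Poisson process, so the gaps are exponential (the unit rate is a placeholder choice).

```r
# Non-uniform sampling sketch: Poisson-process observation times, i.e.
# exponentially distributed spacings between observations.
set.seed(7)
gaps      <- rexp(1550, rate = 1)   # exponential inter-observation spacings
times_irr <- cumsum(gaps)           # irregular observation times
```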

Demonstration 3: Results. This demonstration highlights a unique capability of flow field forecasting: it accepts non-uniformly spaced time series, and it does so with almost no loss of forecast accuracy.

Next Generation Software Goals.
- Move from a univariate data stream to multivariate. For bivariate forecasting we compute two separate PSRs, then forecast both a change in the x-direction and a change in the y-direction.
- Autonomous selection of the history structure.

Closest Point Approach (CPA). Recall the FFF guiding principle: past associations between history and change are predictive of the changes associated with current histories and future changes. For CPA we need to find which prior history most closely matches the current history. Speed bumps:
- Sampling rate vs. data stream change rate(s)
- Number of lags to include in the history structure
- Appropriate distance measure in a high-dimensional space
- Characteristic length for the GPR interpolator (if used)

CPA Algorithm. Suppose there are p candidate predictor values for the history (e.g. $x_t$, $y_t$, $x_{t-1}$, $y_{t-1}$, $\Delta x(t)$, $\Delta y(t)$, ...). The p candidate predictors give $2^p - 1$ nonempty subsets, each a candidate history structure. Create a distance table by computing the distance between the current point and every historical point under each history structure.

CPA Algorithm (continued). Create the following distance table: the rows are the historical points $P_1, P_2, \ldots, P_i, \ldots$, the columns are the history structures $H_1, H_2, \ldots, H_j, \ldots, H_{2^p - 1}$, and entry $(i, j)$ is $\|C - P_i\|_j$, the distance from point $P_i$ to the current point $C$ under history structure $j$.

CPA Algorithm (continued). For each column of the table, determine the minimum distance value, $P_j^{*} = \arg\min_{P_i} \|C - P_i\|_j$, and standardize it by subtracting the column mean and dividing by the column standard deviation: $Q_j = \dfrac{d(C, P_j^{*}) - \overline{\|C - P_i\|_j}}{\mathrm{sd}(\|C - P_i\|_j)}$. Then determine the minimum value of $Q_j$: the minimizing $Q_j$ gives both the closest point and the history structure that produced it. Use the closest point to forecast the next $(x, y)$. A sketch of this selection is given below.
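A compact sketch of the CPA selection under these definitions. Here H is a matrix of historical predictor values (one column per candidate predictor, e.g. x_t, y_t, lags and increments), cur is the current point, Euclidean distance is used as a simple stand-in for whatever distance measure the method ultimately adopts, and every nonempty predictor subset is tried.

```r
# CPA sketch: for each nonempty subset of the p candidate predictors,
# compute distances from all historical points to the current point,
# take the column minimum, standardize it by the column mean and sd,
# and keep the overall minimizer.
cpa_select <- function(H, cur) {
  p <- ncol(H)
  subsets <- lapply(1:(2^p - 1), function(m) which(bitwAnd(m, 2^(0:(p - 1))) > 0))
  best <- list(Q = Inf)
  for (j in seq_along(subsets)) {
    cols  <- subsets[[j]]
    dists <- sqrt(rowSums((H[, cols, drop = FALSE] -
                           matrix(cur[cols], nrow(H), length(cols), byrow = TRUE))^2))
    i_min <- which.min(dists)
    Q_j   <- (dists[i_min] - mean(dists)) / sd(dists)   # standardized column minimum
    if (Q_j < best$Q) best <- list(Q = Q_j, point = i_min, structure = cols)
  }
  best   # closest historical point and the history structure that chose it
}
```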

Additive Penalty. The CPA algorithm is statistically equivalent to adding a penalty to the distance when comparing history structures of different dimensions. Suppose we are comparing a history structure of dimension j to one of dimension k. Let $D_j = \dfrac{d(C, P_j^{*})}{\mathrm{sd}(\|C - P_i\|_j)}$ and $D_k = \dfrac{d(C, P_k^{*})}{\mathrm{sd}(\|C - P_i\|_k)}$, and check whether $D_j + \Pi_{jk} < D_k$, where $\Pi_{jk} = \dfrac{\overline{\|C - P_i\|_k}}{\mathrm{sd}(\|C - P_i\|_k)} - \dfrac{\overline{\|C - P_i\|_j}}{\mathrm{sd}(\|C - P_i\|_j)}$.

CPA Demonstrations. We forecast a periodic data stream generated from the parametric model $x(t) = t + 0.5\cos(3t) + N(0, \sigma^2)$ and $y(t) = t + 3\sin(t) + N(0, \sigma^2)$.
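A sketch of generating this demonstration stream; the time grid and noise level σ are placeholder choices.

```r
# CPA demonstration stream sketch: bivariate path with additive Gaussian noise.
set.seed(3)
t_grid <- seq(0, 20, by = 0.1)
sigma  <- 0.1
x <- t_grid + 0.5 * cos(3 * t_grid) + rnorm(length(t_grid), sd = sigma)
y <- t_grid + 3 * sin(t_grid)       + rnorm(length(t_grid), sd = sigma)
```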

Mean Flow Certainty Approach (MFCA). The mean flow certainty MFC (ω) expresses, through the variance, an estimate of how well the forecast path is reflected in the history space. The MFC is a value between 0 and 1; the closer ω is to 1, the more accurately the history space matches the forecast path. MFC is analogous to R² in linear regression.

MFCA Algorithm.
- Create a large set of all potential predictors, as was done with CPA.
- Hold out the last 5 data stream values as a test set.
- Perform GPR on all possible subsets of these predictors using all but the last 5 data stream values.

MFCA Algorithm (continued).
- Calculate the mean prediction error (MPE) on the held-out values and the average mean flow certainty (MFC).
- Calculate the prediction strength PS = MFC × exp(−MPE).
- Choose the history structure (i.e. the subset of predictors) whose PS is closest to 1. A sketch of this selection loop is given below.
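A sketch of the selection loop under these definitions; forecast_fn and mfc_fn are hypothetical placeholders for the flow-field GPR forecast and its mean flow certainty, and MPE is taken here as a mean absolute error.

```r
# MFCA sketch: score each candidate history structure by PS = MFC * exp(-MPE)
# on a 5-value holdout, and pick the structure with PS closest to 1.
mfca_select <- function(stream, structures, forecast_fn, mfc_fn) {
  n_test <- 5
  train  <- head(stream, -n_test)
  test   <- tail(stream, n_test)
  ps <- sapply(structures, function(str) {
    pred <- forecast_fn(train, str, h = n_test)   # 5-step-ahead forecast
    mpe  <- mean(abs(test - pred))                # mean prediction error (absolute)
    mfc  <- mfc_fn(train, str)                    # average mean flow certainty in [0, 1]
    mfc * exp(-mpe)                               # prediction strength
  })
  structures[[which.min(abs(ps - 1))]]            # PS closest to 1 wins
}
```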

Issues/Concerns.
- CPA works great if the algorithm picks the correct point; occasionally, due to additional factors (e.g. sampling rate, data stream changes), an incorrect point is chosen, and an incorrectly chosen closest point results in a poor forecast.
- MFCA requires the correct choice of a characteristic length Δ; the correct choice of Δ balances the bias-variance tradeoff.
- Both algorithms require selecting an appropriate history depth (i.e. number of lags).

Hybrid Approach. We believe the right algorithm will most likely be a combination of the two methods: pick some small subset of closest points, perhaps 5, using CPA, then perform a localized GPR on only those points and use MFCA to determine the winner.

Future Work.
- Investigate the hybrid approach thoroughly.
- Look into R-trees as a way to organize the history structure searches.
- Look into an innovative way to calculate the characteristic length.
- Given a data stream, can we determine a priori whether our method will provide a reasonable forecast? This may be accomplished by looking for a clustering of histories.
- Investigate the effect of the data sampling rate and the appropriate number of lags in our potential set of history predictors.

Concluding Remarks.
- A novel, computationally efficient method for forecasting a bivariate time series; the results are generalizable to multivariate data streams.
- Created a new proximity measure for comparing spaces of different dimensions.
- The results could be used to improve univariate forecasting methods: instead of predicting the slope, we could predict acceleration or potential energy.

Questions? Those who have knowledge, don't predict. Those who predict, don't have knowledge. --Lao Tzu, 6th Century BC Chinese Poet

Backup Slides

Different Forecasting Methods (Flow FF). Flow field forecasting works by estimating the flow field, or slope field: essentially we use GPR to predict (i.e. interpolate) the forward slope and use it to predict the next location. A conservative feature of GPR is that, when interpolating the slope, if there is no information in the past that is close to the most recent history, it conservatively predicts no change, i.e. zero slope.

Different Forecasting Methods (Force FF). When forecasting a bivariate data stream, predicting zero change in the slope may not accurately reflect the physics of the situation; when forecasting in two dimensions, the conservative prediction might instead be no change in velocity. Force corresponds to acceleration (assuming constant mass), so using GPR to predict no change in acceleration results in constant velocity.

Potential Energy Forecasting.
- Use force field forecasting to create an estimated force field $(\hat F_x, \hat F_y)$.
- A force field $(F_x, F_y)$ that has an associated potential energy $V(x, y)$ is said to be conservative.
- From $(\hat F_x, \hat F_y)$ we create an estimate $\hat V(x, y)$ of the potential energy.
- From the estimated potential energy we calculate consistent estimates $(\tilde F_x, \tilde F_y)$ of the force field components.

Potential Energy Forecasting (continued). $\tilde F_x(x, y) = \dfrac{\Delta}{\Delta x}\hat V(x, y)$ and $\tilde F_y(x, y) = \dfrac{\Delta}{\Delta y}\hat V(x, y)$. We can then check for conservatism by looking at the distances $\|\hat F_x(x, y) - \tilde F_x(x, y)\|$ and $\|\hat F_y(x, y) - \tilde F_y(x, y)\|$. We estimate the next x and y increments on our path by $\Delta x = (\dot x_c + \hat F_x(x_c, y_c)\,\Delta t)\,\Delta t$ and $\Delta y = (\dot y_c + \hat F_y(x_c, y_c)\,\Delta t)\,\Delta t$.
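As a rough illustration of the consistency check, the sketch below differences a placeholder potential estimate on a grid (following the slide's sign convention, with forward difference quotients) and compares the result with placeholder direct force estimates; none of these objects come from the actual method.

```r
# Conservatism-check sketch: recover force components from an estimated
# potential V_hat by finite differences and compare with directly estimated
# forces Fx_hat, Fy_hat.  All three surfaces here are placeholders.
xg <- seq(-1, 1, by = 0.1)
yg <- seq(-1, 1, by = 0.1)
V_hat  <- outer(xg, yg, function(x, y) x^2 + y^2)   # placeholder potential estimate
Fx_hat <- outer(xg, yg, function(x, y) 2 * x)       # placeholder direct force estimates
Fy_hat <- outer(xg, yg, function(x, y) 2 * y)

dx <- xg[2] - xg[1]
dy <- yg[2] - yg[1]
Fx_tilde <- (V_hat[c(2:length(xg), length(xg)), ] - V_hat) / dx   # Delta/Delta-x of V_hat
Fy_tilde <- (V_hat[, c(2:length(yg), length(yg))] - V_hat) / dy   # Delta/Delta-y of V_hat

# crude conservatism check: how far apart are the two force estimates?
mean(abs(Fx_hat - Fx_tilde))
mean(abs(Fy_hat - Fy_tilde))
```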