Forecasting Data Streams: Next Generation Flow Field Forecasting

Forecasting Data Streams: Next Generation Flow Field Forecasting. Kyle Caudle, South Dakota School of Mines & Technology (SDSMT), kyle.caudle@sdsmt.edu. Joint work with Michael Frey (Bucknell University) and Patrick Fleming (SDSMT). Research supported by Naval Postgraduate School Assistance Grant N00244-15-1-0052.

Outline [1] Background [2] Flow Field Forecasting Overview [3] Strengths of Flow Field Forecasting [4] Comparison Study with Traditional Methods [5] Bivariate Forecasting [6] Autonomous History Selection [7] Other Forecasting Outputs [8] Concluding Remarks

Background. Spring 2011: the original concept came from a need to predict network performance characteristics on the Energy Sciences Network (DoE). Requirements:
- Handle a long sequence of observations with observation times
- Predict future observations autonomously, with no human guidance
- Accept non-uniformly spaced observations
- Provide error estimates
- Be fast and computationally efficient
- Be able to exploit parallel data

Background (continued).
- December 2011: Poster session introducing flow field forecasting, 10th Annual International Conference on Machine Learning and Applications (ICMLA), Honolulu, HI.
- June 2012: Introduced a method for continuously updating the forecast, 32nd Annual International Symposium on Forecasting (ISF), Boston, MA.
- August 2012: Contributed session on forecasting, JSM 2012, San Diego, CA.
- May 2013: "Flow Field Forecasting for Univariate Time Series" published in Statistical Analysis and Data Mining (SADM).
- March 2014: R package accepted and placed on the Comprehensive R Archive Network (CRAN); the package is called flowfield.
- January 2015: Awarded a research assistance grant from the Naval Postgraduate School to develop the next generation flow field software.

FF Forecasting in 3 Easy Steps. Methodology: a framework that makes associations between historical process levels and subsequent changes, extracting the flow from one level to the next. Principle of FFF: past associations between history and change are predictive of the changes associated with current histories and future changes. The 3-step framework: 1. Extract data histories (levels and subsequent changes). 2. Interpolate between observed levels in histories. 3. Use the interpolator to predict the process forward, step by step, to the desired forecast horizon.

Step 1: Extract Histories. Use penalized spline regression (PSR) to build a skeleton of historical process levels and changes, and extract the relevant histories based on the application. (Slide figure: data stream (time series) passed through PSR to give the skeleton, with the noise extracted.)
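As a rough illustration of this step (not the flowfield package's actual implementation), the sketch below uses base R's smooth.spline as a stand-in for penalized spline regression; the stream, its noise level, and the evaluation grid are made-up placeholders.

```r
# Step 1 sketch: fit a smooth "skeleton" to a noisy, possibly non-uniformly
# sampled data stream.  smooth.spline stands in for the penalized spline
# regression used by flowfield; the data are synthetic placeholders.
set.seed(1)
t_obs <- sort(runif(300, 0, 100))                    # non-uniform observation times
y_obs <- sin(t_obs / 5) + rnorm(300, sd = 0.4)       # noisy stream

fit       <- smooth.spline(t_obs, y_obs)             # penalized smoothing fit
grid      <- seq(0, 100, by = 1)
skeleton  <- predict(fit, x = grid)                  # skeleton levels on a grid
sigma_hat <- sd(y_obs - predict(fit, x = t_obs)$y)   # crude process-noise estimate
```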

History Extraction. Past histories h_1 and h_2 and their associated changes d_1 and d_2 (two examples pictured on the slide). Principle of FFF: past associations between history and change are predictive of the changes associated with current histories and future changes.
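Continuing the sketch above, a depth-1 history simply pairs each skeleton level with the change that followed it; deeper histories would append lagged levels and changes.

```r
# History extraction sketch (depth-1 histories): the level at each skeleton
# knot paired with the subsequent change.
s <- skeleton$y          # skeleton levels on the evaluation grid
h <- s[-length(s)]       # histories: the level at each knot
d <- diff(s)             # associated changes to the next knot
head(cbind(history = h, change = d))
```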

Step 2: Interpolate the Flow Field. The current history may include values that were never observed in the past. We use Gaussian process regression (GPR) to interpolate from the observed values to these unobserved values.
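A minimal GPR sketch, continuing the example above: a squared-exponential kernel interpolates the change d as a function of the history level h, and the zero-mean prior is what makes the method conservative for histories unlike anything seen before. The characteristic length Delta and the noise variance are placeholder values, not the settings used in flowfield.

```r
# Step 2 sketch: GPR with a squared-exponential kernel interpolating the
# change d as a function of the history level h.  A zero-mean prior means
# histories far from anything seen before get a predicted change near zero.
se_kernel <- function(a, b, Delta) exp(-outer(a, b, "-")^2 / (2 * Delta^2))

gpr_predict <- function(h, d, h_new, Delta = 0.5, noise_var = 1e-4) {
  K  <- se_kernel(h, h, Delta) + diag(noise_var, length(h))
  Ks <- se_kernel(h_new, h, Delta)
  as.vector(Ks %*% solve(K, d))      # posterior mean of the change
}

gpr_predict(h, d, h_new = c(0.2, 0.9))
```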

Step 3: Iteratively Build to the Future. (Slide figure legend: d = slope, s = level, κ = knot, δ = GPR-interpolated value.)
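Putting the pieces of the running sketch together, the forecast is built by repeatedly asking the GPR interpolator for the change implied by the current level and stepping forward. The real method steps knot to knot along the skeleton, so this level-only loop is only a simplification.

```r
# Step 3 sketch: iterate the interpolated flow forward to the forecast horizon.
ff_forecast <- function(h, d, last_level, horizon, Delta = 0.5) {
  path  <- numeric(horizon)
  level <- last_level
  for (k in seq_len(horizon)) {
    delta   <- gpr_predict(h, d, level, Delta = Delta)  # interpolated change
    level   <- level + delta
    path[k] <- level
  }
  path
}

ff_forecast(h, d, last_level = tail(s, 1), horizon = 10)
```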

Strengths of FFF.
- The step I data skeleton achieves data reduction and standardization (and estimates the process noise).
- Runs autonomously: no interactive supervision by a skilled analyst.
- Conservative: in situations where there is no information in the history space that corresponds to the current situation, it conservatively predicts no change.
- Computationally efficient: suited to large data streams with limited computational resources. Penalized spline regression is computationally efficient; to further increase its efficiency, we replace the standard numerical search for the optimal smoothing parameter by an asymptotic approximation [Wand, 1999]. The step II Gaussian process regression and the step III extrapolation mechanism are also computationally efficient.

Comparison Study. We compare FFF with Box-Jenkins ARIMA, exponential smoothing, and artificial neural networks. For ARIMA and exponential smoothing we use the R package forecast [Hyndman and Khandakar]; for artificial neural networks we use the R package tsDyn [A. F. Di Narzo, J. L. Aznarte and M. Stigler].
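For reference, a minimal sketch of how baseline forecasts might be produced with the cited packages. The exact model settings used in the study are not given here: the series below is a placeholder AR(1) stream, and the lag count m and hidden-unit count size for the neural network are illustrative choices.

```r
# Hypothetical baseline forecasts with the cited packages; train_series is a
# placeholder stream, not the simulated data from the study.
library(forecast)   # Hyndman & Khandakar
library(tsDyn)      # Di Narzo, Aznarte and Stigler

train_series <- ts(arima.sim(list(ar = 0.8), n = 1500))

fit_arima <- auto.arima(train_series)            # automatic Box-Jenkins ARIMA
fc_arima  <- forecast(fit_arima, h = 50)         # 50-step-ahead forecast

fit_ets <- ets(train_series)                     # exponential smoothing state space model
fc_ets  <- forecast(fit_ets, h = 50)

fit_nn <- nnetTs(train_series, m = 5, size = 3)  # neural net autoregression (illustrative m, size)
fc_nn  <- predict(fit_nn, n.ahead = 50)
```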

Simulated Time Series. Data were simulated from a baseline model of the form $Y_i = S(t_i) + \varepsilon_i$, where $\varepsilon_i$ is Gaussian noise, with $N = 1550$ uniformly spaced observation times $t_i \in \{1, 2, \ldots, 1550\}$ and $\sigma = 0.4$. For the systematically determined component $S(t)$ we used realizations of a zero-mean, unit-variance stationary Gaussian process with squared-exponential covariance $\mathrm{Cov}(S(t), S(t')) = k(t - t') = \exp\!\left(-\frac{(t - t')^2}{2\Delta^2}\right)$.

Comparison 1. For our first comparison, we generated 1000 time series realizations (three pictured on the slide).
- This model exhibits short-term noise and longer-term, non-Markovian dynamics.
- Models such as this might plausibly be encountered in real data sets.
- Characteristic length Δ = 50.
Each time series had 1550 observations (mean zero, σ = 0.4); 1500 observations were used to build the model and 50 were held out for testing. The mean forecast error was computed for each method. A sketch of this simulation setup is given below.
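The sketch draws one realization of the zero-mean, unit-variance GP with squared-exponential covariance (Δ = 50), observes it at 1550 uniformly spaced times with σ = 0.4 noise, and splits it 1500/50; the jitter added before the Cholesky factorization is purely for numerical stability.

```r
# Baseline simulation sketch: S(t) ~ GP(0, k) with squared-exponential
# covariance of characteristic length Delta = 50, plus N(0, 0.4^2) noise.
set.seed(42)
n     <- 1550
times <- 1:n
Delta <- 50
sigma <- 0.4

K <- exp(-outer(times, times, "-")^2 / (2 * Delta^2))
L <- chol(K + diag(1e-6, n))            # upper-triangular factor, with jitter
S <- as.vector(t(L) %*% rnorm(n))       # one GP realization
Y <- S + rnorm(n, sd = sigma)           # observed data stream

train <- Y[1:1500]                      # model-building portion
test  <- Y[1501:1550]                   # held-out 50 observations
```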

Comparison 1: Results. FF was very competitive with the other traditional methods; the artificial NN was marginally worse and took four times longer.

Comparison 2. For our second comparison, we generated 1000 time series realizations (three pictured on the slide) from a variant data model with a recurring distinctive history: the characteristic length is Δ = 500 in the time interval [500, 600] and again beginning at time 1490; elsewhere, Δ = 50.

Comparison 2: Results. Short-range forecasts are competitive; at long range, FF wins decisively.

Comparison 3: Irregularly Spaced Intervals. Most traditional forecasting methods rely on time series data collected at regular intervals; FF forecasting is not handicapped by this restriction. Demonstration 3 compares FF forecasting to itself.

Demonstration 3. We generate two time series from the baseline model used in Comparison 1. The first uses uniformly spaced observation times; the second uses non-uniformly spaced observation times drawn from a Poisson process, so the time spacings between observations are exponentially distributed (see the sketch below).
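A small sketch of the non-uniform sampling scheme: observation times drawn from a Poisson process, so the gaps are exponential (the unit rate is a placeholder choice).

```r
# Non-uniform sampling sketch: Poisson-process observation times, i.e.
# exponentially distributed spacings between observations.
set.seed(7)
gaps      <- rexp(1550, rate = 1)   # exponential inter-observation spacings
times_irr <- cumsum(gaps)           # irregular observation times
```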

Demonstration 3: Results. This demonstration highlights a unique capability of flow field forecasting: it accepts non-uniformly spaced time series, and it does so with almost no loss of forecast accuracy.

Next Generation Software Goals.
- Move from a univariate data stream to multivariate. For bivariate forecasting we compute two separate PSRs, then forecast both a change in the x-direction and a change in the y-direction.
- Autonomous selection of the history structure.

Closest Point Approach (CPA). Recall the FFF guiding principle: past associations between history and change are predictive of the changes associated with current histories and future changes. For CPA we need to find which prior history most closely matches the current history. Speed bumps:
- Sampling rate vs. data stream change rate(s)
- Number of lags to include in the history structure
- Appropriate distance measure in a high-dimensional space
- Characteristic length for the GPR interpolator (if used)

CPA Algorithm. Suppose there are p candidate predictor values for the history (e.g. $x_t$, $y_t$, $x_{t-1}$, $y_{t-1}$, $\Delta x(t)$, $\Delta y(t)$, ...). The p candidate predictors give $2^p - 1$ nonempty subsets, each a candidate history structure. Create a distance table by computing the distance between the current point and every historical point under each history structure.

CPA Algorithm (continued). Create the following distance table: the rows are the historical points $P_1, P_2, \ldots, P_i, \ldots$, the columns are the history structures $H_1, H_2, \ldots, H_j, \ldots, H_{2^p - 1}$, and entry $(i, j)$ is $\|C - P_i\|_j$, the distance from point $P_i$ to the current point $C$ under history structure $j$.

CPA Algorithm (continued). For each column of the table, determine the minimum distance value, $P_j^{*} = \arg\min_{P_i} \|C - P_i\|_j$, and standardize it by subtracting the column mean and dividing by the column standard deviation: $Q_j = \dfrac{d(C, P_j^{*}) - \overline{\|C - P_i\|_j}}{\mathrm{sd}(\|C - P_i\|_j)}$. Then determine the minimum value of $Q_j$: the minimizing $Q_j$ gives both the closest point and the history structure that produced it. Use the closest point to forecast the next $(x, y)$. A sketch of this selection is given below.
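A compact sketch of the CPA selection under these definitions. Here H is a matrix of historical predictor values (one column per candidate predictor, e.g. x_t, y_t, lags and increments), cur is the current point, Euclidean distance is used as a simple stand-in for whatever distance measure the method ultimately adopts, and every nonempty predictor subset is tried.

```r
# CPA sketch: for each nonempty subset of the p candidate predictors,
# compute distances from all historical points to the current point,
# take the column minimum, standardize it by the column mean and sd,
# and keep the overall minimizer.
cpa_select <- function(H, cur) {
  p <- ncol(H)
  subsets <- lapply(1:(2^p - 1), function(m) which(bitwAnd(m, 2^(0:(p - 1))) > 0))
  best <- list(Q = Inf)
  for (j in seq_along(subsets)) {
    cols  <- subsets[[j]]
    dists <- sqrt(rowSums((H[, cols, drop = FALSE] -
                           matrix(cur[cols], nrow(H), length(cols), byrow = TRUE))^2))
    i_min <- which.min(dists)
    Q_j   <- (dists[i_min] - mean(dists)) / sd(dists)   # standardized column minimum
    if (Q_j < best$Q) best <- list(Q = Q_j, point = i_min, structure = cols)
  }
  best   # closest historical point and the history structure that chose it
}
```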

Additive Penalty. The CPA algorithm is statistically equivalent to adding a penalty to the distance when comparing history structures of different dimensions. Suppose we are comparing a history structure of dimension j to one of dimension k. Let $D_j = \dfrac{d(C, P_j^{*})}{\mathrm{sd}(\|C - P_i\|_j)}$ and $D_k = \dfrac{d(C, P_k^{*})}{\mathrm{sd}(\|C - P_i\|_k)}$, and check whether $D_j + \Pi_{jk} < D_k$, where $\Pi_{jk} = \dfrac{\overline{\|C - P_i\|_k}}{\mathrm{sd}(\|C - P_i\|_k)} - \dfrac{\overline{\|C - P_i\|_j}}{\mathrm{sd}(\|C - P_i\|_j)}$.

CPA Demonstrations. We forecast a periodic data stream generated from the parametric model $x(t) = t + 0.5\cos(3t) + N(0, \sigma^2)$ and $y(t) = t + 3\sin(t) + N(0, \sigma^2)$.
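A sketch of generating this demonstration stream; the time grid and noise level σ are placeholder choices.

```r
# CPA demonstration stream sketch: bivariate path with additive Gaussian noise.
set.seed(3)
t_grid <- seq(0, 20, by = 0.1)
sigma  <- 0.1
x <- t_grid + 0.5 * cos(3 * t_grid) + rnorm(length(t_grid), sd = sigma)
y <- t_grid + 3 * sin(t_grid)       + rnorm(length(t_grid), sd = sigma)
```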

Mean Flow Certainty Approach (MFCA). The mean flow certainty MFC (ω) expresses, through the variance, an estimate of how well the forecast path is reflected in the history space. The MFC is a value between 0 and 1; the closer ω is to 1, the more accurately the history space matches the forecast path. MFC is analogous to R² in linear regression.

MFCA Algorithm.
- Create a large set of all potential predictors, as was done with CPA.
- Hold out the last 5 data stream values as a test set.
- Perform GPR on all possible subsets of these predictors using all but the last 5 data stream values.

MFCA Algorithm (continued).
- Calculate the mean prediction error (MPE) on the held-out values and the average mean flow certainty (MFC).
- Calculate the prediction strength PS = MFC × exp(−MPE).
- Choose the history structure (i.e. the subset of predictors) whose PS is closest to 1. A sketch of this selection loop is given below.
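A sketch of the selection loop under these definitions; forecast_fn and mfc_fn are hypothetical placeholders for the flow-field GPR forecast and its mean flow certainty, and MPE is taken here as a mean absolute error.

```r
# MFCA sketch: score each candidate history structure by PS = MFC * exp(-MPE)
# on a 5-value holdout, and pick the structure with PS closest to 1.
mfca_select <- function(stream, structures, forecast_fn, mfc_fn) {
  n_test <- 5
  train  <- head(stream, -n_test)
  test   <- tail(stream, n_test)
  ps <- sapply(structures, function(str) {
    pred <- forecast_fn(train, str, h = n_test)   # 5-step-ahead forecast
    mpe  <- mean(abs(test - pred))                # mean prediction error (absolute)
    mfc  <- mfc_fn(train, str)                    # average mean flow certainty in [0, 1]
    mfc * exp(-mpe)                               # prediction strength
  })
  structures[[which.min(abs(ps - 1))]]            # PS closest to 1 wins
}
```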

Issues/Concerns.
- CPA works great if the algorithm picks the correct point; occasionally, due to additional factors (e.g. sampling rate, data stream changes), an incorrect point is chosen, and an incorrectly chosen closest point results in a poor forecast.
- MFCA requires the correct choice of a characteristic length Δ; the correct choice of Δ balances the bias-variance tradeoff.
- Both algorithms require selecting an appropriate history depth (i.e. number of lags).

Hybrid Approach. We believe the right algorithm will most likely be a combination of the two methods: pick some small subset of closest points, perhaps 5, using CPA, then perform a localized GPR on only those points and use MFCA to determine the winner.

Future Work.
- Investigate the hybrid approach thoroughly.
- Look into R-trees as a way to organize the history structure searches.
- Look into an innovative way to calculate the characteristic length.
- Given a data stream, can we determine a priori whether our method will provide a reasonable forecast? This may be accomplished by looking for a clustering of histories.
- Investigate the effect of the data sampling rate and the appropriate number of lags in our potential set of history predictors.

Concluding Remarks.
- A novel, computationally efficient method for forecasting a bivariate time series; the results are generalizable to multivariate data streams.
- Created a new proximity measure for comparing spaces of different dimensions.
- The results could be used to improve univariate forecasting methods: instead of predicting the slope, we could predict acceleration or potential energy.

Questions? Those who have knowledge, don't predict. Those who predict, don't have knowledge. --Lao Tzu, 6th Century BC Chinese Poet

Backup Slides

Different Forecasting Methods (Flow FF). Flow field forecasting works by estimating the flow field, or slope field: essentially we use GPR to predict (i.e. interpolate) the forward slope and use it to predict the next location. A conservative feature of GPR is that, when interpolating the slope, if there is no information in the past that is close to the most recent history, it conservatively predicts no change, i.e. zero slope.

Different Forecasting Methods (Force FF). When forecasting a bivariate data stream, predicting zero change in the slope may not accurately reflect the physics of the situation; when forecasting in two dimensions, the conservative prediction might instead be no change in velocity. Force corresponds to acceleration (assuming constant mass), so using GPR to predict no change in acceleration results in constant velocity.

Potential Energy Forecasting.
- Use force field forecasting to create an estimated force field $(\hat F_x, \hat F_y)$.
- A force field $(F_x, F_y)$ that has an associated potential energy $V(x, y)$ is said to be conservative.
- From $(\hat F_x, \hat F_y)$ we create an estimate $\hat V(x, y)$ of the potential energy.
- From the estimated potential energy we calculate consistent estimates $(\tilde F_x, \tilde F_y)$ of the force field components.

Potential Energy Forecasting (continued). $\tilde F_x(x, y) = \dfrac{\Delta}{\Delta x}\hat V(x, y)$ and $\tilde F_y(x, y) = \dfrac{\Delta}{\Delta y}\hat V(x, y)$. We can then check for conservatism by looking at the distances $\|\hat F_x(x, y) - \tilde F_x(x, y)\|$ and $\|\hat F_y(x, y) - \tilde F_y(x, y)\|$. We estimate the next x and y increments on our path by $\Delta x = (\dot x_c + \hat F_x(x_c, y_c)\,\Delta t)\,\Delta t$ and $\Delta y = (\dot y_c + \hat F_y(x_c, y_c)\,\Delta t)\,\Delta t$.
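As a rough illustration of the consistency check, the sketch below differences a placeholder potential estimate on a grid (following the slide's sign convention, with forward difference quotients) and compares the result with placeholder direct force estimates; none of these objects come from the actual method.

```r
# Conservatism-check sketch: recover force components from an estimated
# potential V_hat by finite differences and compare with directly estimated
# forces Fx_hat, Fy_hat.  All three surfaces here are placeholders.
xg <- seq(-1, 1, by = 0.1)
yg <- seq(-1, 1, by = 0.1)
V_hat  <- outer(xg, yg, function(x, y) x^2 + y^2)   # placeholder potential estimate
Fx_hat <- outer(xg, yg, function(x, y) 2 * x)       # placeholder direct force estimates
Fy_hat <- outer(xg, yg, function(x, y) 2 * y)

dx <- xg[2] - xg[1]
dy <- yg[2] - yg[1]
Fx_tilde <- (V_hat[c(2:length(xg), length(xg)), ] - V_hat) / dx   # Delta/Delta-x of V_hat
Fy_tilde <- (V_hat[, c(2:length(yg), length(yg))] - V_hat) / dy   # Delta/Delta-y of V_hat

# crude conservatism check: how far apart are the two force estimates?
mean(abs(Fx_hat - Fx_tilde))
mean(abs(Fy_hat - Fy_tilde))
```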