Problem Statements in Time Series Forecasting Vadim Strijov, Visiting Professor at IAM METU Computing Center of the Russian Academy of Sciences Institute of Applied Mathematics, Middle East Technical University June 4 th, 2012 Vadim Strijov Problem Statements in Time Series Forecasting 1 / 47
The plan and the goal 1 Customer demand sales forecasting as the example of error function statement. 2 Periodical time series: the one-model autoregressive problem. 3 Non-periodical time series: the mixed-model autoregressive problem. 4 Some problems of regression and classification. The goal is to state the problem of the time series event forecasting as the auto-regressive problem. Vadim Strijov Problem Statements in Time Series Forecasting 2 / 47
Sales planning The set of retailers problems: custom inventory, calculation of optimal insurance stocks, consumer demand forecasting. Vadim Strijov Problem Statements in Time Series Forecasting 3 / 47
The initial problem statement There given: time-scale, historical time series, additional time series; the quality of forecasting: minimum loss of money; we must: forecast the time series. Vadim Strijov Problem Statements in Time Series Forecasting 4 / 47
Custom inventory Vadim Strijov Problem Statements in Time Series Forecasting 5 / 47
Excessive forecast Vadim Strijov Problem Statements in Time Series Forecasting 6 / 47
Insufficient forecast Vadim Strijov Problem Statements in Time Series Forecasting 7 / 47
Loss function for the forecast error reflects the sales process and depends on the basis of the features of a particular trading network Symmetric quadratic function Module function Asymmetric function Vadim Strijov Problem Statements in Time Series Forecasting 8 / 47
Asymmetric loss function Vadim Strijov Problem Statements in Time Series Forecasting 9 / 47
Noisy time series forecasting There is a historical time series of the volume off-takes (i.e. foodstuff). Let the time series be homoscedastic (its variance is the time-constant). Using the loss function one must forecast the next sample. Vadim Strijov Problem Statements in Time Series Forecasting 10 / 47
The time series and the histogram Vadim Strijov Problem Statements in Time Series Forecasting 11 / 47
The forecasting algorithm Let there be given: the historgam H = {X i,g i },i = 1,...,m; the loss function L = L(Z,X); for example, L = Z X or L = (Z X) 2. The problem: For given H and L, one must find the optimal forecast value X. Solution: m X = arg min g i L(Z,X i ). Z {X 1,...,X m} i=1 Result: X is the optimal forecast of the time series. Vadim Strijov Problem Statements in Time Series Forecasting 12 / 47
The sales time series is non-stationary There is a trend total increase or decrease in sales volume, periodic component week and year cycles, aperiodic component promotional actions and holidays, life cycle of goods mobile phones. Vadim Strijov Problem Statements in Time Series Forecasting 13 / 47
Year seasonality Vadim Strijov Problem Statements in Time Series Forecasting 14 / 47
Week seasonality Vadim Strijov Problem Statements in Time Series Forecasting 15 / 47
How to create sample set of periodical series Weighting function includes neighborhood points to the sample set. Vadim Strijov Problem Statements in Time Series Forecasting 16 / 47
Holidays and week-ends Vadim Strijov Problem Statements in Time Series Forecasting 17 / 47
Promotional actions Vadim Strijov Problem Statements in Time Series Forecasting 18 / 47
Promotional profile extraction Hypothesis: the shape of the profile (excluding the profile parameters) does not depend of duration of the action. Problem: to forecast the customer demand during the promotional action. Vadim Strijov Problem Statements in Time Series Forecasting 19 / 47
Algorithmic composition On forecasting one must consider: trend of time series, periodical components of time series, aperiodical components, as well as the fact that the time series contain empty values there is no information about the stock, empty values new position on the stock, outliers and errors. Vadim Strijov Problem Statements in Time Series Forecasting 20 / 47
The periodic components of the multivariate time series The time series: Periods: energy price, consumption, daytime, temperature, humidity, wind force, holiday schedule. one year seasons (temperature, daytime), one week, one day (working day, week-end), a holiday, aperiodic events. Vadim Strijov Problem Statements in Time Series Forecasting 21 / 47
Source time series, one week 2.5 x 104 2.4 2.3 2.2 2.1 Value 2 1.9 1.8 1.7 1.6 1.5 20 40 60 80 100 120 140 160 Hours Vadim Strijov Problem Statements in Time Series Forecasting 22 / 47
The autoregressive matrix to forecast periodic time series There given the time series {s 1,...,s τ,...,s T 1 }, the length of a period is κ. One must to forecast the next sample T. The autoregressive matrix: its i-th row is a period of samples, its j-th column is a phase of the period and they map into the time series sample number such that (i 1)κ τ; let mod T κ = 0; X (m+1) (n+1) = s T s T 1... s T κ+1 s (m 1)κ s (m 1)κ 1... s (m 2)κ+1............ s nκ s nκ 1... s n(κ 1)+1............ s κ s κ 1... s 1. Vadim Strijov Problem Statements in Time Series Forecasting 23 / 47
The autoregressive matrix and the linear model s T s T 1... s T κ+1 X = s (m 1)κ s (m 1)κ 1... s (m 2)κ+1 (m+1) (n+1)............ s nκ s nκ 1... s n(κ 1)+1............ s κ s κ 1... s 1. In a nutshell, X = In terms of linear regression: s T 1 1 y m 1 y = Xw, x m+1 1 n X m n. y m+1 = s T = w T x T m+1. Vadim Strijov Problem Statements in Time Series Forecasting 24 / 47
The autoregressive matrix, five week-ends 5 10 15 Days 20 25 30 35 40 2 4 6 8 10 12 14 16 18 20 22 24 Hours Vadim Strijov Problem Statements in Time Series Forecasting 25 / 47
Model generation Introduce a set of the primitive functions G = {g 1,...,g r }, for example g 1 = 1, g 2 = x, g 3 = x, g 4 = x x, etc. The generated set of features X = g 1 s T 1... g r s T 1... g 1 s T κ+1... g r s T κ+1 g 1 s (m 1)κ 1... g r s (m 1)κ 1... g 1 s (m 2)κ+1... g r s (m 2)κ+1..................... g 1 s nκ 1... g r s nκ 1... g 1 s n(κ 1)+1... g r s n(κ 1)+1...................... g 1 s κ 1... g r s κ 1... g 1 s 1... g r s 1 Vadim Strijov Problem Statements in Time Series Forecasting 26 / 47
The one-day forecast, an example 2.3 2.2 2.1 2 x 10 4 History Forecast Value 1.9 1.8 1.7 1.6 1.5 5 10 15 20 Hours Vadim Strijov Problem Statements in Time Series Forecasting 27 / 47
Ill-conditioned matrix, or curse of dimensionality Assume we have hourly data on price/consumption for three years. Then the matrix X is (m+1) (n+1) 156 168, in details: 52w 3y 24h 7d; for 6 time series the matrix X is 156 1008, for 4 primitive functions it is 156 4032, m << n. The autoregressive matrix could be considered as ill-conditioned and multi-correlated. The model selection procedure is required. Vadim Strijov Problem Statements in Time Series Forecasting 28 / 47
How many parameters must be used to forecast? The color shows the value of a parameter for each hour. Parameters 22 20 18 16 14 12 10 8 6 4 2 5 10 15 20 Hours Estimate parameters w(τ) = (X T X) 1 X T y, then calculate the sample s(τ) = w T (τ)x m+1 for each τ of the next (m+1-th) period. Vadim Strijov Problem Statements in Time Series Forecasting 29 / 47
The sample set and its indexes There are given: the design matrix X = [x 1,...,x i,...,x m ] T, the target vector y and the data generation hypothesis y N(f,B), the type of models {f = X A w A A J}. The indexes of objects are {1,...,i,...,m} = I, the split I = B 1 B K ; features are {1,...,j,...,n} = J, the active set A J. Vadim Strijov Problem Statements in Time Series Forecasting 30 / 47
Hour by Hour Energy Forecasting Data: historical consumption and prices, multivariate time series. To forecast: hour-by-hour, the next day consumption and price. Solution: the autoregressive model generation and model selection. Vadim Strijov Problem Statements in Time Series Forecasting 31 / 47
Source time series, one week 2.5 x 104 2.4 2.3 2.2 2.1 Value 2 1.9 1.8 1.7 1.6 1.5 20 40 60 80 100 120 140 160 Hours Vadim Strijov Problem Statements in Time Series Forecasting 32 / 47
Quality of the forecasting model Mean absolute percentage error MAPE = 1 m m f i y i y i i=1 Mean squared error MSE = 1 m. m (f i y i ) 2. i=1 Vadim Strijov Problem Statements in Time Series Forecasting 33 / 47
Structure of energy consumption Vadim Strijov Problem Statements in Time Series Forecasting 34 / 47
Similarity of daily consumption Vadim Strijov Problem Statements in Time Series Forecasting 35 / 47
Sunrise bias: one-year daytime and consumption Vadim Strijov Problem Statements in Time Series Forecasting 36 / 47
Biased and original daytime to fit consumption over years Vadim Strijov Problem Statements in Time Series Forecasting 37 / 47
One-hour line, day-by-day during a year: autoregressive analysis Vadim Strijov Problem Statements in Time Series Forecasting 38 / 47
Daily similarity Vadim Strijov Problem Statements in Time Series Forecasting 39 / 47
Energy consumption one-week forecast Vadim Strijov Problem Statements in Time Series Forecasting 40 / 47
Time series with a bubble, example 1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 50 100 150 200 The World Bank. Global Economic Monitor 2010. http://data.worldbank.org/data-catalog/global-economic-monitor. Vadim Strijov Problem Statements in Time Series Forecasting 41 / 47
Time series with a bubble, example 2 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 50 100 150 200 The World Bank. Global Economic Monitor 2010. http://data.worldbank.org/data-catalog/global-economic-monitor. Vadim Strijov Problem Statements in Time Series Forecasting 42 / 47
Time series with no bubble, example 3 1 0.9 0.8 0.7 0.6 0.5 0.4 0 50 100 150 200 The World Bank. Global Economic Monitor 2010. http://data.worldbank.org/data-catalog/global-economic-monitor. Vadim Strijov Problem Statements in Time Series Forecasting 43 / 47
Time series with no bubble, example 4 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0 50 100 150 200 The World Bank. Global Economic Monitor 2010. http://data.worldbank.org/data-catalog/global-economic-monitor. Vadim Strijov Problem Statements in Time Series Forecasting 44 / 47
Event forecasting; One must forecast a 1,T+1 M. There are N time series of length T (denote the element a n,t M) as the matrix a 1,1 a 1,2... a 1,T 1 a 1,T a 1,T+1 a 2,1 a 2,2... a 2,T 1 a 2,T a 2,T+1 A =........ a N,1 a N,2... a N,T 1 a N,T a N,T+1 Denote the time-lag and for the time series [a 1,t ],t { +1,...,T} form the matrix A t = a 1,t a 1,t +1... a 1,t 2 a 1,t 1 a 2,t a 2,t +1... a 2,t 2 a 2,t 1....... a N,t a N,t +1... a N,t 2 a N,t 1 and vectorize it to obtain the sample x t x t = [a 1,t,a 2,t,...,a N,t,a 1,t +1,...,a N,t 1 ] T. Set y t a 1,t. Vadim Strijov Problem Statements in Time Series Forecasting 45 / 47
Event forecasting is a classification problem Introduce the data set D = (X, y), where X = x T 1 x T 2. x T T and y = y 1 y 2. y T. Treat this classification problem as the logistic regression f(w,x) = or as another classification problem. The error function S(w) = i I 1 1+exp( Xw) y y i lnf(w,x i )+(1 y i )ln(1 f(w,x i ). Vadim Strijov Problem Statements in Time Series Forecasting 46 / 47
See mvr.svn.sourceforge.net/viewvc/mvr/lectures/strijov2012iam.metu.part1.pdf or for short bit.ly/k3i8zj The next 1 parameter estimation and bayesian inference, 2 feature generation, 3 feature selection. Vadim Strijov Problem Statements in Time Series Forecasting 47 / 47