Forecasting with R: A practical workshop

International Symposium on Forecasting 2016, 19th June 2016
Nikolaos Kourentzes, nikolaos@kourentzes.com, http://nikolaos.kourentzes.com
Fotios Petropoulos, fotpetr@gmail.com, http://fpetropoulos.eu

About us

Nikos
- Associate Professor at Lancaster University
- Member of the Lancaster Centre for Forecasting
- Research interests: temporal aggregation and hierarchies, model selection and combination, intermittent demand, promotional modelling and supply chain collaboration
- Forecasting blog: http://nikolaos.kourentzes.com

Fotios
- Assistant Professor at Cardiff University
- Forecasting Support Systems Editor of Foresight
- Director of the International Institute of Forecasters
- Research interests: behavioural aspects of forecasting and improving the forecasting process, applied in the context of business and supply chain

Nikos and Fotios are the founders of the Forecasting Society (www.forsoc.net).

Outline of the workshop
1. Overview of R Studio
2. Introduction to R
3. Time series exploration: time series components, decomposition, ACF/PACF functions
4. Forecasting for fast demand: Naïve, Exponential Smoothing, ARIMA, MAPA, Theta, evaluation
5. Forecasting for intermittent demand: Croston's method, SBA, TSB, temporal aggregation, classification
6. Forecasting with causal methods: simple and multiple regression, residual diagnostics, selecting variables
7. Advanced methods in forecasting: hierarchical forecasting, ABC-XYZ analysis, LASSO

Have fun and enjoy your day!

Section 1: Overview of R Studio

Section 2: Introduction to R

Section 3: Time series exploration
Time series components, decomposition, ACF/PACF functions

Section 4: Forecasting for fast demand
Naïve, Exponential Smoothing, ARIMA, MAPA, Theta, evaluation

Exponential Smoothing (ets)
The state space implementation of exponential smoothing considers the following combinations of error, trend and seasonality:
- Error: Additive or Multiplicative
- Trend: None, Additive or Multiplicative (damped or not)
- Season: None, Additive or Multiplicative
The usual notation is ETS(Error, Trend, Season), so for instance:
- ETS(A,N,N) has additive errors, no trend and no season. This is single exponential smoothing (SES).
- ETS(M,M,M) has all components multiplicative.
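The combinations listed above can be enumerated directly. The snippet below is a small Python sketch (not the R `ets` function); the short codes "Ad" and "Md" for damped trends are an assumed labelling convention.

```python
from itertools import product

# Component options as listed above; "Ad"/"Md" are assumed codes for damped trends.
ERRORS = ["A", "M"]
TRENDS = ["N", "A", "Ad", "M", "Md"]
SEASONS = ["N", "A", "M"]


def ets_models():
    """Return every ETS(Error, Trend, Season) label."""
    return [f"ETS({e},{t},{s})" for e, t, s in product(ERRORS, TRENDS, SEASONS)]


models = ets_models()
print(len(models))  # 2 errors x 5 trends x 3 seasons = 30
print(models[0])    # ETS(A,N,N), i.e. SES
```

Counting the damped variants as separate trend types is what leads to 30 candidate models.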

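We typically optimise ETS using MLE or, equivalently, by minimising the augmented sum of squared errors criterion:
$$S(\theta, x_0) = \left|\prod_{t=1}^{n} r(x_{t-1})\right|^{2/n} \sum_{t=1}^{n} \varepsilon_t^2$$
For additive errors $r(x_{t-1}) = 1$, so minimising this criterion is equivalent to minimising the well-known MSE:
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^2$$
This is used to optimise both the smoothing parameters and the initial values.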
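As a rough illustration of optimising the smoothing parameter and the initial value together, the Python sketch below fits ETS(A,N,N) by minimising the in-sample MSE. The grid search, the function names and the data are assumptions made for the example; the forecast package uses a numerical optimiser instead.

```python
def ses_mse(y, alpha, level0):
    """In-sample one-step-ahead MSE of single exponential smoothing."""
    level = level0
    sse = 0.0
    for obs in y:
        error = obs - level      # one-step-ahead forecast error
        sse += error ** 2
        level += alpha * error   # ETS(A,N,N) state update
    return sse / len(y)


def fit_ses(y, steps=21):
    """Grid-search alpha and the initial level jointly, minimising the MSE."""
    lo, hi = min(y), max(y)
    best = None
    for i in range(steps):
        alpha = i / (steps - 1)
        for j in range(steps):
            level0 = lo + (hi - lo) * j / (steps - 1)
            mse = ses_mse(y, alpha, level0)
            if best is None or mse < best[0]:
                best = (mse, alpha, level0)
    return best


y = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]  # hypothetical data
mse, alpha, level0 = fit_ses(y)
print(round(mse, 3), alpha, round(level0, 2))
```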

Having a likelihood allows us to use information criteria to select the best ETS model out of the 30 possible alternatives. A common choice is Akaike's Information Criterion:
$$\mathrm{AIC} = -2 \log L + 2k$$
where $L$ is the maximised likelihood and $k$ is the number of estimated parameters. Given that time series often have a limited sample size, a better choice is AICc, which is corrected for sample size. This is the default option in the forecast package.
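The effect of the small-sample correction can be seen in a short Python sketch. The log-likelihoods and parameter counts below are invented for illustration, and the standard AICc formula is assumed.

```python
def aic(loglik, k):
    """Akaike's Information Criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """AIC with the usual small-sample correction for n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(loglik, k) + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: model -> (log-likelihood, number of parameters)
candidates = {
    "ETS(A,N,N)": (-120.0, 3),
    "ETS(A,A,N)": (-118.5, 5),
    "ETS(A,Ad,A)": (-100.0, 18),
}
n = 36  # three years of monthly data

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_aicc = min(candidates, key=lambda m: aicc(*candidates[m], n))
print(best_aic, best_aicc)
```

With only 36 observations the heavily parameterised seasonal model wins on AIC, but the correction penalises it enough that AICc prefers the simplest model.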

ARIMA (auto.arima)
The function auto.arima allows automatic specification of SARIMA models. This is done as follows:
1. Test for stationarity in a seasonal context using OCSB (up to 1 seasonal difference).
2. Test for stationarity using KPSS (up to 2 differences).
3. Difference appropriately based on the test results.
4. Start from a reasonable AR and MA order and search neighbouring specifications (max AR and MA order: 5; max SAR and SMA order: 2).
5. Compare alternative models using AICc (default) and pick the best.

TBATS (tbats)
TBATS uses a Box-Cox transformation, exponential smoothing, trigonometric seasonality and ARMA errors. Its components are:
- Box-Cox transform
- Deterministic and stochastic trend
- Trigonometric seasonality
- ARMA errors

Multiple Aggregation Prediction Algorithm (mapa)
Step 1: Aggregation. Step 2: Forecasting. Step 3: Combination.
[Figure: the original series $Y^{[1]}$ is temporally aggregated at levels $k = 2, 3, \ldots, K$ into $Y^{[2]}, Y^{[3]}, \ldots, Y^{[K]}$. An ETS model is selected for each series, giving level ($l$), trend ($b$) and seasonal ($s$) components at every aggregation level. The components are averaged across the $K$ levels and summed to give the forecast $\hat{Y}$.]
- Strengthens and attenuates components
- Estimation of parameters at multiple levels
- Robustness in model selection and parameterisation

- Transform states to additive and to the original sampling frequency
- Combine states (components)
- Produce forecasts
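Step 1 of MAPA (temporal aggregation) can be sketched in a few lines of Python. This covers the aggregation only, not the ETS fitting or the combination of states. Averaging within buckets and dropping the oldest leftover observations are assumptions of this sketch, and the data are hypothetical.

```python
def aggregate(y, k, how="mean"):
    """Non-overlapping temporal aggregation of y into buckets of k periods."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = len(y) % k  # drop the oldest leftover observations
    buckets = [y[i:i + k] for i in range(start, len(y), k)]
    if how == "sum":
        return [sum(b) for b in buckets]
    return [sum(b) / k for b in buckets]


monthly = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23, 22]  # hypothetical
levels = {k: aggregate(monthly, k) for k in (1, 3, 12)}
print(levels[3])   # quarterly averages
print(levels[12])  # annual average
```

Higher aggregation levels smooth out seasonality and noise, which is why components such as trend are easier to identify there.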

Theta method (theta)
First the time series is decomposed using classical multiplicative decomposition.
[Figure panels: deterministic decomposition, stochastic decomposition]
In TStools, to allow the seasonal pattern to evolve, a pure seasonal model is used instead. When γ = 0 this reduces to the deterministic case.

Then the deseasonalised time series is broken down into two lines:
- a linear trend (the long-term trend)
- 2 x (deseasonalised data - linear trend), which inflates the variability
Each line is forecasted separately, using linear regression and single exponential smoothing respectively, and their forecasts are then combined.

Finally, the forecast of the deseasonalised time series is re-seasonalised with the indices calculated previously to give the final forecast.
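The steps above can be sketched in Python under some explicit assumptions. The sketch follows the classical formulation in which the second theta line is the linear trend plus twice the detrended data (2y minus the trend) and the two forecasts are averaged with equal weights; the TStools implementation may differ in detail. The smoothing parameter is fixed rather than optimised, and the seasonal indices are taken as given.

```python
def linear_trend(y):
    """OLS fit of y on t = 1..n; returns (intercept, slope)."""
    n = len(y)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y, start=1))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def ses_level(y, alpha):
    """Final level of single exponential smoothing, initialised at y[0]."""
    level = y[0]
    for obs in y[1:]:
        level += alpha * (obs - level)
    return level


def theta_forecast(y, horizon, alpha=0.3, seasonal_indices=None):
    """Theta forecast of an already deseasonalised series y (length >= 2)."""
    n = len(y)
    a, b = linear_trend(y)
    trend = [a + b * t for t in range(1, n + 1)]
    theta2 = [2 * v - tr for v, tr in zip(y, trend)]  # trend + 2 * (y - trend)
    flat = ses_level(theta2, alpha)                   # SES forecasts are flat
    fc = [0.5 * (a + b * (n + h)) + 0.5 * flat for h in range(1, horizon + 1)]
    if seasonal_indices:
        # Multiplicative re-seasonalising; index 0 is assumed to match y[0].
        m = len(seasonal_indices)
        fc = [f * seasonal_indices[(n + h - 1) % m]
              for h, f in enumerate(fc, start=1)]
    return fc


print(theta_forecast([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], horizon=2))
```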

Section 5: Forecasting for intermittent demand
Croston's method, SBA, TSB, temporal aggregation, classification

Croston's method
From the original series we first construct a non-zero demand series (z).

Then we create an interval series (x) by counting the number of periods between successive demands.

Each of the two series is forecasted with SES.

We divide the estimated demand by the estimated interval to produce the Croston forecast.
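The four steps above translate into a short Python sketch. Using one smoothing parameter for both series and initialising with the first demand and first interval are assumptions of this example, not necessarily what the R implementation does.

```python
def croston(y, alpha=0.1):
    """Croston forecast (expected demand per period) after the last observation."""
    sizes, intervals = [], []
    periods_since = 0
    for obs in y:
        periods_since += 1
        if obs != 0:
            sizes.append(obs)                # non-zero demand series z
            intervals.append(periods_since)  # inter-demand interval series x
            periods_since = 0
    if not sizes:
        return 0.0
    z_hat, x_hat = sizes[0], intervals[0]    # assumed initialisation
    for z, x in zip(sizes[1:], intervals[1:]):
        z_hat += alpha * (z - z_hat)         # SES on demand sizes
        x_hat += alpha * (x - x_hat)         # SES on intervals
    return z_hat / x_hat


demand = [0, 0, 4, 0, 0, 4, 0, 0, 4]  # hypothetical: 4 units every third period
print(croston(demand))
```

Both estimates are updated only in periods with demand, so the forecast stays constant during runs of zeros.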

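SBA
Syntetos and Boylan [2005] proposed an approximation that corrects the inversion bias in Croston's method.
Croston: $\hat{y}_t = \hat{z}_t / \hat{x}_t$, where $\hat{z}_t$ is the smoothed demand size and $\hat{x}_t$ is the smoothed demand interval.
SBA: $\hat{y}_t = \left(1 - \frac{\alpha_x}{2}\right) \hat{z}_t / \hat{x}_t$, where $\alpha_x$ is the smoothing parameter of the intervals.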

TSB Method
The demand probability is equal to 1 when demand occurred and 0 otherwise. This series is as long as the original series.

The forecast is the product of the demand and probability estimates.

The forecast declines during periods without demand because TSB models the obsolescence of the item.
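A minimal Python sketch of TSB shows why the forecast decays. The initial values (first non-zero demand, overall share of non-zero periods) and the parameter names are assumptions of this example.

```python
def tsb(y, alpha=0.1, beta=0.1):
    """Return the TSB one-step-ahead forecast made after each period of y."""
    nonzero = [obs for obs in y if obs != 0]
    if not nonzero:
        return [0.0] * len(y)
    z_hat = nonzero[0]             # assumed initial demand size
    p_hat = len(nonzero) / len(y)  # assumed initial demand probability
    forecasts = []
    for obs in y:
        occurred = 1.0 if obs != 0 else 0.0
        p_hat += beta * (occurred - p_hat)  # probability is updated every period
        if occurred:
            z_hat += alpha * (obs - z_hat)  # size is updated only when demand occurs
        forecasts.append(p_hat * z_hat)
    return forecasts


history = [3, 0, 2, 0, 0, 0, 0, 0]  # hypothetical item going obsolete
print([round(f, 3) for f in tsb(history)])
```

Because the probability estimate is updated in every period, a run of zeros pulls the forecast towards zero, unlike Croston's method, which only updates when demand occurs.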

Classification
For an intermittent demand (ID) time series we can calculate the non-zero demand (z) and the demand interval (x). Using these we can define:
- $p = \bar{x}$: the average demand interval
- $v = (s_z / \bar{z})^2$: the squared coefficient of variation of non-zero demand
Using these we can classify the time series into groups better modelled with Croston's method or with SBA.

[Figure: classification chart of the squared coefficient of variation of non-zero demand against the average demand interval]
Time series with low variability of demand and relatively low intermittency should be forecasted with Croston's method. The rest should be forecasted with SBA.
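The classification can be sketched in Python as follows. The cut-off values p = 1.32 and v = 0.49 are an assumption: they are the ones commonly associated with the Syntetos, Boylan and Croston scheme and are not stated on the slides. Using the sample variance and counting the first interval from the start of the series are further assumptions.

```python
def classify(y, p_cut=1.32, v_cut=0.49):
    """Return (p, v, method) for an intermittent demand series y."""
    sizes, intervals = [], []
    periods_since = 0
    for obs in y:
        periods_since += 1
        if obs != 0:
            sizes.append(obs)
            intervals.append(periods_since)
            periods_since = 0
    if len(sizes) < 2:
        raise ValueError("need at least two non-zero demands")
    p = sum(intervals) / len(intervals)  # average demand interval
    z_mean = sum(sizes) / len(sizes)
    z_var = sum((z - z_mean) ** 2 for z in sizes) / (len(sizes) - 1)
    v = z_var / z_mean ** 2              # squared coefficient of variation
    method = "Croston" if (p < p_cut and v < v_cut) else "SBA"
    return p, v, method


print(classify([5, 5, 0, 5, 5, 5, 5, 0, 5, 5]))   # regular, stable demand
print(classify([0, 0, 1, 0, 0, 0, 10, 0, 0, 2]))  # sparse, erratic demand
```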

Section 6: Forecasting with causal methods
Simple and multiple regression, residual diagnostics, selecting variables

Simple regression

Period  Advertising (x)  Sales (y)
1       15.0             153
2       17.5             198
3       12.0             147
4        8.5             104
5        9.5             131
6       12.5             159
7       14.5             160
8       11.0             124

[Figure: scatter plot of sales vs advertising]
y = a + b x
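The line y = a + b x can be fitted to the advertising data above with ordinary least squares. The Python sketch below uses the closed-form estimates; the function name and the advertising value used for the prediction are illustrative choices.

```python
advertising = [15.0, 17.5, 12.0, 8.5, 9.5, 12.5, 14.5, 11.0]
sales = [153, 198, 147, 104, 131, 159, 160, 124]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


a, b = fit_line(advertising, sales)
print(f"y = {a:.2f} + {b:.2f} x")
print(f"predicted sales at advertising 13.0: {a + b * 13.0:.1f}")
```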

Linear regression on trend

Time (t)  Period   Sales (y)
1         Q2-2007    270
2         Q3-2007   1119
3         Q4-2007   2315
4         Q1-2008   1703
5         Q2-2008    717
6         Q3-2008   6892
7         Q4-2008   4363
8         Q1-2009   3793
9         Q2-2009   5208
10        Q3-2009   7367
11        Q4-2009   8737
12        Q1-2010   8752

[Figure: iPhone sales over time]
y = a + b t

The residuals should:
- Have mean zero
- Not be autocorrelated
- Be unrelated to the predictor variable
- Be normally distributed
- Have constant variance

Residual diagnostics

Multiple regression
$$y = b_0 + \sum_{i=1}^{3} b_i \,\mathrm{Promo}_i + \sum_{j=1}^{3} b_{j+3} \,\mathrm{Promo\_lagged}_j$$

Section 7: Advanced methods in forecasting
Hierarchical forecasting, ABC-XYZ analysis, LASSO

Hierarchical forecasting
[Figure: hierarchy with Company at the top, Group 1 and Group 2 in the middle, and SKU 1 to SKU 5 at the bottom]
Hierarchies may refer to:
- Product types
- Geographical allocation
- Channels
Problem: forecasts are different at each aggregation level!
Main approaches for reconciling hierarchical forecasts:
- Top-down approach: forecast at the highest level and disaggregate using historical proportions
- Bottom-up approach: forecast at the lowest level and aggregate the forecasts up to the required level
- Middle-out approach
- Optimal approach: optimally combines forecasts from each level
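The bottom-up and top-down approaches can be sketched in Python for the small hierarchy above. The assignment of SKUs to groups, the forecasts and the sales history are all hypothetical, and the proportions are taken as each SKU's share of total historical sales, which is only one of several ways to define them.

```python
# Assumed structure: the slide does not state which SKUs belong to which group.
HIERARCHY = {"Group 1": ["SKU 1", "SKU 2"], "Group 2": ["SKU 3", "SKU 4", "SKU 5"]}


def bottom_up(sku_forecasts, hierarchy):
    """Aggregate SKU-level forecasts up to the group and company levels."""
    groups = {g: sum(sku_forecasts[s] for s in skus) for g, skus in hierarchy.items()}
    return {"Company": sum(groups.values()), **groups, **sku_forecasts}


def top_down(company_forecast, sku_history, hierarchy):
    """Split a company-level forecast using historical sales proportions."""
    total = sum(sum(h) for h in sku_history.values())
    skus = {s: company_forecast * sum(h) / total for s, h in sku_history.items()}
    groups = {g: sum(skus[s] for s in members) for g, members in hierarchy.items()}
    return {"Company": company_forecast, **groups, **skus}


sku_fc = {"SKU 1": 10, "SKU 2": 20, "SKU 3": 5, "SKU 4": 5, "SKU 5": 10}
history = {"SKU 1": [1, 1], "SKU 2": [2, 2], "SKU 3": [3, 3],
           "SKU 4": [2, 2], "SKU 5": [2, 2]}

bu = bottom_up(sku_fc, HIERARCHY)
td = top_down(100.0, history, HIERARCHY)
print(bu["Company"], bu["Group 1"], bu["Group 2"])
print(td["Group 1"], td["Group 2"])
```

Either way the resulting forecasts add up across levels, which independently produced forecasts generally do not.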

Shrinkage estimators
Let us consider the two regression models from before.
Two ideas:
- Instead of thinking of $X_3$ as simply in or out of the model, we can perceive it as a continuum, depending on the estimated coefficient $c_3$. If we are unsure about including a variable, we could be more conservative and include it with a smaller coefficient.
- Suppose we keep the normalised coefficients small (close to zero). Then the effect of the variables would be minimal, i.e. our predicted variable would be less sensitive to changes in the explanatory variables.
Putting these together we get the so-called shrinkage estimators.

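Shrinkage estimators: LASSO
Although there are several shrinkage estimators, one of the most popular is the Least Absolute Shrinkage and Selection Operator (LASSO).
The model is a conventional regression; the only difference is in how the coefficients are estimated. Using $p$ independent variables $X$, we model the dependent variable $y$, which has $n$ observations:
$$y_i = b_0 + \sum_{j=1}^{p} b_j X_{ij} + e_i, \quad i = 1, \ldots, n$$
But instead of OLS we use the lasso shrinkage estimator:
$$\hat{b} = \arg\min_{b} \left\{ \frac{1}{n} \sum_{i=1}^{n} \Big( y_i - b_0 - \sum_{j=1}^{p} b_j X_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |b_j| \right\}$$
The first term is the mean squared error and the second term is the shrinkage of $b$.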

As a variable is used more to fit the data better, its coefficient becomes bigger. As the coefficient becomes bigger, the shrinkage penalty becomes bigger, pushing the coefficient towards zero. Therefore lasso regression tries to keep variable coefficients small: it balances overfitting and underfitting.

Shrinkage estimators: the effect of λ
The parameter λ controls the amount of shrinkage:
- If λ = 0, lasso becomes OLS.
- There is a value of λ above which all variables are excluded from the model.
[Figure: coefficient paths against λ]
- Low λ: coefficients are non-zero and large
- Mid λ: only important coefficients are non-zero
- Very high λ: all variable coefficients are zero

How to find λ?
Finding the λ parameter is not a trivial problem. The most common approach is to use cross-validation and pick the λ that gives the lowest cross-validated error.
What is cross-validation?
1. Take all the available in-sample data and split it into K parts (folds).
2. Fit the model on K - 1 parts (for example 9 parts when K = 10) and test on the remaining one.
3. Repeat until all K folds have been used as the test set.
4. Measure the total error across all tests. This is the cross-validated error.
The cross-validated error approximates the true prediction error and is more reliable than the in-sample fitting error.
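The cross-validation steps above can be sketched in Python. To keep the example self-contained it uses a lasso with a single centred predictor and no intercept, for which the estimate has a closed form (soft-thresholding). This is a simplifying assumption, as are the interleaved fold assignment, the λ grid and the data.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, setting small values to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_1d(x, y, lam):
    """Minimise mean((y - b*x)^2) + lam*|b| for one predictor, no intercept."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    return soft_threshold(sxy, lam / 2.0) / sxx


def cv_error(x, y, lam, k=5):
    """K-fold cross-validated mean squared error for a given lambda."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point forms one fold
        x_train = [x[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        b = lasso_1d(x_train, y_train, lam)  # fit on the other k - 1 folds
        total += sum((y[i] - b * x[i]) ** 2 for i in test_idx)
    return total / n


def pick_lambda(x, y, grid, k=5):
    """Return the lambda in grid with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(x, y, lam, k))


# Hypothetical centred data: y is roughly 2 * x plus small noise.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]
x = [i - 4.5 for i in range(10)]
y = [2.0 * xi + e for xi, e in zip(x, noise)]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
print(pick_lambda(x, y, grid))
```

With λ = 0 the estimate equals the OLS slope, and for a large enough λ it is exactly zero, matching the two extremes described for the shrinkage parameter.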

Nikolaos Kourentzes
email: nikolaos@kourentzes.com
blog: http://nikolaos.kourentzes.com

Fotios Petropoulos
email: fotpetr@gmail.com
site: http://fpetropoulos.eu

Forecasting Society: www.forsoc.net
Lancaster Centre for Forecasting: www.forecasting-centre.com/