The value of competitive information in forecasting FMCG retail product sales and category effects

Professor Robert Fildes r.fildes@lancaster.ac.uk
Dr Tao Huang t.huang@lancaster.ac.uk
Dr Didier Soopramanien d.soopramanien@lancaster.ac.uk

Outline
- The research question
- Literature summary
- Our contributions
  - Incorporating competitive information
  - Accounting for the change of the market environment
- Data and experimental design
- Results and insights
- Conclusion

The research question
- We forecast retailer product sales (demand) at the product level (e.g. SKU/UPC).
- Accurate forecasts are important for inventory planning (e.g. to avoid over-stock and out-of-stock conditions).
- We want to improve the accuracy!
- But surely this has been done!

The shape of the data series
[Figure: data series plot]

What has been proposed?
Many retailers use simple statistical methods to generate baseline forecasts and then rely on managers to adjust them for promotional events.
- Cooper et al. (1999): PromoCast, which estimates the adjustments based on historical information
- Fildes et al. (2009): mechanisms to help managers improve their adjustments
Other studies proposed technically sophisticated methods that try to use the price/promotional information of the focal product more effectively.
- Aburto and Weber (2007): ANN
- Ali et al. (2009): regression tree

How do we contribute?
- We incorporate competitive information
  - Competitive prices and competitive promotions
  - Strong influencing factors on product sales
  - The data are available
  - Previous studies all overlooked competitive information in forecasting
- We account for the change of the market environment
  - In reality the effects of price/promotions change over time
  - Ignoring this fact leads to forecast bias
- We validate our proposals

Define competitive information
The high dimensionality problem: too many predictors
- Typically 100-200 items within each product category; it is impossible to reduce or even estimate the model if we take them all
Method 1: we apply a variable selection method
- The most famous: stepwise regression? Heavily criticized for retaining irrelevant variables and ignoring relevant variables (see Harrell, 2001)
- Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996; Turlach 2000)
- Autometrics (Hendry and Krolzig)
- We use a combination of stepwise regression and LASSO (but surely there are alternative algorithms!)
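The LASSO part of Method 1 can be sketched as follows. This is a minimal illustration on simulated data, not the authors' procedure: the coordinate-descent solver, the penalty `lam`, the 100 hypothetical competitor series and the three "true" competitors are all assumptions, and the stepwise stage is omitted.

```python
import random
import statistics

def standardize(col):
    """Scale a series to mean 0 and population variance 1."""
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(v - mu) / sd for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(cols, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes every column in `cols` is standardized and y is centered.
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # residuals when all coefficients are zero
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Correlation of column j with the partial residual.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                shift = beta[j] - new
                resid = [r + c * shift for c, r in zip(col, resid)]
                beta[j] = new
    return beta

random.seed(1)
n_weeks, n_items = 120, 100                  # hypothetical sizes
true_effects = {3: 0.8, 40: -0.6, 77: 0.5}   # hypothetical competitors that matter

cols = [standardize([random.gauss(0, 1) for _ in range(n_weeks)])
        for _ in range(n_items)]
sales = [sum(b * cols[j][t] for j, b in true_effects.items()) + random.gauss(0, 0.3)
         for t in range(n_weeks)]
mean_sales = statistics.fmean(sales)
sales = [s - mean_sales for s in sales]

beta = lasso_cd(cols, sales, lam=0.15)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected competitor indices:", selected)
```

Only the variables with non-zero coefficients would be carried forward into the forecasting model; in practice the penalty would be tuned (for example by cross-validation) rather than fixed by hand as it is here.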

Define competitive information
The high dimensionality problem: too many predictors
- Typically 100-200 items within each product category; impossible to reduce or even estimate the model
Method 2: we apply Principal Component Analysis (PCA)
- To condense a large number of competitive explanatory variables into a handful of diffusion indexes (DI)
- Diffusion indexes perform well in forecasting macroeconomic variables (Stock and Watson, 2002)
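A minimal sketch of Method 2, assuming NumPy is available. The simulated competitor price panel, its size, and the choice of three components are assumptions made for illustration; the slides do not say how many diffusion indexes were retained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_items, n_factors = 120, 150, 3   # hypothetical sizes

# Hypothetical competitor price panel driven by a few common factors.
common = rng.normal(size=(n_weeks, n_factors))
loadings = rng.normal(size=(n_factors, n_items))
prices = common @ loadings + 0.5 * rng.normal(size=(n_weeks, n_items))

# Standardize each series so that no item dominates because of its scale.
z = (prices - prices.mean(axis=0)) / prices.std(axis=0)

# PCA via SVD: the columns of u * s are the principal component scores.
u, s, vt = np.linalg.svd(z, full_matrices=False)
diffusion_indexes = u[:, :n_factors] * s[:n_factors]

explained = s ** 2 / np.sum(s ** 2)
print("variance share of the retained components:", explained[:n_factors].sum())
```

The `diffusion_indexes` columns (and their lags) would then replace the 150 individual competitor series as regressors.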

Incorporate competitive information
- We incorporate the following competitive information into Autoregressive Distributed Lag (ADL) models: explanatory variables selected by LASSO/stepwise, OR diffusion indexes constructed by PCA
- We then simplify the model following the general-to-specific modelling strategy (Hendry 1995)
- The econometric model has good interpretability and has also proved effective in other areas: tourism data in Song and Witt (2003); airline passenger flow data in Fildes, Wei et al. (2009)

An example: The general ADL model
Start with a general model containing:
- Product sales (in logs) as the dependent variable
- Lags of product sales
- Lags of own price/promotions
- Lags of competitive price/promotions (selected by LASSO/stepwise), OR lags of diffusion indexes (constructed by PCA)
For simplicity here we do not show weekly indicators and dummies for calendar events.
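A generic ADL specification consistent with the terms listed above could be written as follows. The symbols are assumptions for illustration, not the authors' notation: $y_t$ is log sales of the focal product, $x_t$ its own price/promotion variables, $z_t$ either the selected competitive variables or the diffusion indexes, and $p$, $q$, $r$ the lag orders. Weekly indicators and calendar dummies are omitted.

```latex
y_t = \alpha
    + \sum_{i=1}^{p} \phi_i \, y_{t-i}
    + \sum_{j=0}^{q} \beta_j^{\top} x_{t-j}
    + \sum_{k=0}^{r} \gamma_k^{\top} z_{t-k}
    + \varepsilon_t
```

Under the general-to-specific strategy, terms of this general form that turn out to be insignificant would be dropped step by step.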

The change of the market environment
The effects of price and promotions (on product sales) change over time owing to:
- Economic conditions (consumers are more price/promotion sensitive during an economic crunch)
- Changes in consumer tastes
- Competitive activities
- New product entry
- Changes in any other driving factors which are related to price and promotions but not included in the model

What happens if we ignore it?
If we settle for a model with constant parameters when in fact the effects of price and promotions are changing over time:
- The model will be subject to structural break
- It will be exposed to forecast failure, i.e. forecasts are biased, the forecast error variance is also slightly inflated, and overall forecasting performance is poor compared to the model's in-sample fit (Clements and Hendry, 1999)

An example of how structural break causes forecast bias
[Figure: simulated weekly sales (Actual); y-axis Sales, x-axis Weeks]
- Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
- Before the break: y = 10 - 2x + u
- After the break: y = 14 - 3x + u. Consumers' demand increases but they also become more price sensitive (in reality, the timing of the change is UNKNOWN).

An example of how structural break causes forecast bias
[Figure: Actual and Predict series; y-axis Sales, x-axis Weeks]
- Now we build a model with constant parameters: y = 12.4 - 2.3x
- The deterministic mean of the model with constant parameters will be a WEIGHTED AVERAGE for the data before and after the structural break.

An example of how structural break causes forecast bias
[Figure: Actual and Predict series with the forecast bias marked; y-axis Sales, x-axis Weeks]
- The model obviously under-forecasts in the forecast period.
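The simulated example can be reproduced roughly as follows. The two regimes and the uniform distributions come from the slides; the break week (60), the forecast origin (100) and the random seed are assumptions, so the fitted coefficients will not match the slide's 12.4 and 2.3 exactly.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

random.seed(0)
n_weeks, break_week, origin = 120, 60, 100   # break_week and origin are assumed

price = [random.uniform(0, 1) for _ in range(n_weeks)]
sales = []
for week, x in enumerate(price, start=1):
    u = random.uniform(0, 1)
    if week <= break_week:
        sales.append(10 - 2 * x + u)    # regime before the break
    else:
        sales.append(14 - 3 * x + u)    # regime after the break

# Constant-parameter model estimated on weeks 1..origin, which spans the break.
a, b = fit_line(price[:origin], sales[:origin])

# Forecast the remaining weeks using the known future prices.
errors = [y - (a + b * x) for x, y in zip(price[origin:], sales[origin:])]
forecast_bias = sum(errors) / len(errors)
print(f"fitted: y = {a:.2f} + ({b:.2f})x, mean forecast error = {forecast_bias:.2f}")
```

A positive mean forecast error (actual minus predicted) corresponds to the under-forecasting shown in the figure.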

Test for structural break
- ADL models with LASSO/stepwise OR diffusion factors are subject to structural break.
[Figure: Percentage of Models Subject to Structural Break (Chow test, α = 0.05), with series Specification and Rolling; the plotted percentages range from 83% to 100%]
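For reference, the Chow statistic for a single candidate break can be computed as in the sketch below. The simulated series, the candidate break at week 60 and the one-regressor model are assumptions for illustration. The statistic would be compared with the critical value of an F distribution with (k, n - 2k) degrees of freedom at the chosen level; that lookup is left out to keep the sketch dependency-free.

```python
import random

def rss_line(xs, ys):
    """Residual sum of squares from an OLS fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def chow_f(xs, ys, split, k=2):
    """Chow F statistic for a break after observation `split` (k parameters)."""
    n = len(xs)
    rss_pooled = rss_line(xs, ys)
    rss_parts = rss_line(xs[:split], ys[:split]) + rss_line(xs[split:], ys[split:])
    return ((rss_pooled - rss_parts) / k) / (rss_parts / (n - 2 * k))

random.seed(0)
n_weeks, split = 120, 60   # hypothetical sample size and candidate break
price = [random.uniform(0, 1) for _ in range(n_weeks)]
noise = [random.uniform(0, 1) for _ in range(n_weeks)]

# One series without a break and one with the break used in the simulated example.
stable = [10 - 2 * x + u for x, u in zip(price, noise)]
broken = [(10 - 2 * x if t < split else 14 - 3 * x) + u
          for t, (x, u) in enumerate(zip(price, noise))]

f_stable = chow_f(price, stable, split)
f_broken = chow_f(price, broken, split)
print(f"Chow F without break: {f_stable:.2f}, with break: {f_broken:.2f}")
```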

Offsetting the forecast bias
- Models subject to structural break are exposed to forecast bias. If we can mitigate this bias, we may improve the forecasting performance.
- One way is to allow the parameters to vary over time, e.g. as AR(1) processes: $y_t = \mathrm{int}_t + \beta_t x_t + u_t$; $\beta_t = \eta \beta_{t-1} + e_t$; $\mathrm{int}_t = r \, \mathrm{int}_{t-1} + \varepsilon_t$
- Performance is poor: the presumed functional form can hardly explain how the effects of price and promotions change over time.

Offsetting the forecast bias
- Models subject to structural break are exposed to forecast bias. If we can mitigate this bias, we may be able to improve the forecasting performance.
- Alternatively, we estimate and then offset the forecast bias: intercept correction.

An Example of Intercept Correction
[Figure: Actual and Predict series with the forecast bias marked; y-axis Sales, x-axis Weeks]
- Estimate the forecast bias based on the data around the forecast origin, e.g. take an average of the errors, assuming they are ALL caused by forecast bias.

An Example of Intercept Correction
[Figure: Actual and Predict series after the correction; y-axis Sales, x-axis Weeks]
- Then we offset the bias in the forecast period using the estimated bias.
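A minimal sketch of intercept correction on the same kind of simulated data. The break week, the forecast origin and the number of recent residuals averaged (`k = 8`) are assumptions; the slides only say that the errors around the forecast origin are averaged.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

random.seed(0)
n_weeks, break_week, origin, k = 120, 60, 100, 8   # all assumed for illustration

price = [random.uniform(0, 1) for _ in range(n_weeks)]
sales = [(10 - 2 * x if week <= break_week else 14 - 3 * x) + random.uniform(0, 1)
         for week, x in enumerate(price, start=1)]

# Constant-parameter model on weeks 1..origin.
a, b = fit_line(price[:origin], sales[:origin])

# Estimate the bias as the average of the last k residuals before the origin.
correction = sum(sales[t] - (a + b * price[t]) for t in range(origin - k, origin)) / k

raw = [a + b * x for x in price[origin:]]
corrected = [f + correction for f in raw]   # shift every forecast by the estimated bias

def mean_error(forecasts):
    """Average of actual minus forecast over the forecast period."""
    actual = sales[origin:]
    return sum(y - f for y, f in zip(actual, forecasts)) / len(actual)

print(f"mean error before correction: {mean_error(raw):.2f}")
print(f"mean error after correction:  {mean_error(corrected):.2f}")
```

The choice of `k` trades off noise in the bias estimate (small `k`) against contamination from pre-break residuals (large `k`).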

A trade-off against forecast bias
- Models subject to structural break are exposed to forecast bias.
- Rather than offsetting the forecast bias, we may trade off forecast bias against (reduced) forecast error variance by combining the forecasts generated by models with various lengths of estimation window: Estimation Window Combining (EWC) (Pesaran and Timmermann, 2007)

An example of combining forecasts
[Figure: Actual and Predict series; y-axis Sales, x-axis Weeks]
- Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1)); y = 10 - 2x + u before the break and y = 14 - 3x + u after it
- Ideally, we would use the data AFTER the structural break, but the break time is UNKNOWN.

An example of combining forecasts
[Figure: Actual and Predict series, Forecast 1; y-axis Sales, x-axis Weeks]
- Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
- We can use only the data close to the forecast origin: the model may not be subject to structural break, but will have inflated forecast error variance (because less information is used).

Estimation with full sample data
[Figure: Actual and Predict series, Forecast 2; y-axis Sales, x-axis Weeks]
- Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
- At the other extreme, we can use ALL the data in the sample; thus we have biased forecasts but the forecast error variance is smaller (compared to the previous scenario).

Combining forecasts
Thus we can trade off (incurring) forecast bias against (reducing) forecast error variance: we estimate the same model, y = int + βx, with various estimation windows.
- Estimate the model using data [80, 100]; generate forecasts as Forecast 1
- Estimate the model using data [1, 100]; generate forecasts as Forecast 2
- Finally we take an average of Forecast 1 and Forecast 2; the final forecasts may be more accurate (explained by the philosophy of forecast combination)
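The two-window combination can be sketched as below. The windows [80, 100] and [1, 100] come from the slide; the break week (60), the seed and the equal weights applied to exactly two windows are assumptions of this illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

random.seed(0)
n_weeks, break_week, origin = 120, 60, 100   # break_week is assumed

price = [random.uniform(0, 1) for _ in range(n_weeks)]
sales = [(10 - 2 * x if week <= break_week else 14 - 3 * x) + random.uniform(0, 1)
         for week, x in enumerate(price, start=1)]

def forecast_from_window(first_week, last_week):
    """Fit on weeks first_week..last_week (1-based, inclusive), forecast the rest."""
    a, b = fit_line(price[first_week - 1:last_week], sales[first_week - 1:last_week])
    return [a + b * x for x in price[origin:]]

forecast_1 = forecast_from_window(80, origin)   # short window near the origin
forecast_2 = forecast_from_window(1, origin)    # full sample
combined = [(f1 + f2) / 2 for f1, f2 in zip(forecast_1, forecast_2)]

def mae(forecasts):
    """Mean absolute error over the forecast period."""
    actual = sales[origin:]
    return sum(abs(y - f) for y, f in zip(actual, forecasts)) / len(actual)

for name, f in [("Forecast 1", forecast_1), ("Forecast 2", forecast_2),
                ("Combined", combined)]:
    print(f"{name}: MAE = {mae(f):.2f}")
```

With a break as large and as early as the one simulated here, the short window alone may well do best; the combination is a hedge for the realistic case in which the break date and size are unknown.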

Data
- Dominick's Finer Foods, a large retail chain in the Chicago area in the U.S. (available from the University of Chicago website)
- Unit sales, price, and promotions at the UPC level; weekly data
- Promotions include simple price discounts (75%), bonus buys (25%), and coupons (less than 1%); we use one variable to represent them
- Aggregated across 83 stores based on All Commodity Volume (the revenue of the store)
- 122 items in 6 product categories: Soft Drinks, Frozen Juices, Canned Soup, Bath Soap, Front-end Candies, and Bathroom Tissue
- Items are selected with relatively large sales volumes

Experiment design
- Fixed-window rolling forecasts: estimation period of 120 weeks; forecasts 1, 1-4, and 1-12 weeks ahead
- 70 rolling events for each item
- Model specification: 200 weeks. Ideally the model would be re-specified every time; this can be simplified by assuming foreknowledge of the data and of the model that would ideally be selected (Fildes et al. 2009)
- Error measures: MAPE, symmetric MAPE, MASE, and AvgRelMAE
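Two of the error measures can be sketched as follows. The slides do not state which symmetric MAPE variant was used, so the definition below (mean of 2|y - f| / (|y| + |f|)) is an assumption, as are the toy numbers; MASE scales the forecast MAE by the in-sample MAE of the one-step naive forecast. AvgRelMAE is not shown.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 2|y - f| / (|y| + |f|)."""
    terms = [2 * abs(y - f) / (abs(y) + abs(f)) for y, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)

def mase(actual, forecast, insample):
    """Forecast MAE divided by the in-sample MAE of the one-step naive forecast."""
    naive_mae = sum(abs(insample[t] - insample[t - 1])
                    for t in range(1, len(insample))) / (len(insample) - 1)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae

# Hypothetical weekly unit sales for one item.
insample = [100, 110, 90, 120, 100]
actual = [110, 90]
forecast = [100, 100]

print(f"sMAPE = {smape(actual, forecast):.2f}%")        # about 10.03%
print(f"MASE  = {mase(actual, forecast, insample):.2f}")  # 0.50
```

A MASE below 1 means the forecasts beat the in-sample naive benchmark on average.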

Candidate models and results
Candidate models vary along two dimensions: 1) competitive information and 2) offsetting forecast bias.
Competitive information       Ignoring the change of       Intercept          Estimation window
                              the market environment       Correction (IC)    combining (EWC)
No competitive information    ADL-OWN; Base-times-lift     ADL-OWN-IC         ADL-OWN-EWC
LASSO/stepwise                ADL                          ADL-IC             ADL-EWC
Diffusion Factor              ADL-DI                       ADL-DI-IC          ADL-DI-EWC
Symmetric MAPE: ADL-OWN 25.6%; Base-times-lift 32.6%.
Here we show the symmetric MAPE results for forecast horizon 1-12 weeks ahead; results based on other error measures are similar.

Candidate models and results
Incorporating competitive information does improve forecasting accuracy. ADL and ADL-DI both outperform ADL-OWN.
Symmetric MAPE (more + signs indicate better performance):
Competitive information       Ignoring the change of       Intercept          Estimation window
                              the market environment       Correction (IC)    combining (EWC)
No competitive information    ADL-OWN 25.6%                ADL-OWN-IC         ADL-OWN-EWC
LASSO/stepwise                ADL 23.8% (+)                ADL-IC             ADL-EWC
Diffusion Factor              ADL-DI 23.0% (++)            ADL-DI-IC          ADL-DI-EWC

Candidate models and results
Accounting for the change of the market environment does improve forecasting accuracy. Models with IC and EWC all outperform their counterparts.
Symmetric MAPE (more + signs indicate better performance):
Competitive information       Ignoring the change of       Intercept                 Estimation window
                              the market environment       Correction (IC)           combining (EWC)
No competitive information    ADL-OWN 25.6%                ADL-OWN-IC 23.9% (++)     ADL-OWN-EWC 24.1% (+)
LASSO/stepwise                ADL 23.8%                    ADL-IC 23.0% (++)         ADL-EWC 23.3% (+)
Diffusion Factor              ADL-DI 23.0%                 ADL-DI-IC 22.5% (+++)     ADL-DI-EWC 22.8% (+)
IC and EWC improve the performance of the models with and without competitive information.

Results and insights
We can improve the forecasting accuracy by incorporating competitive information: PCA and LASSO/stepwise
ADL-DI versus ADL-OWN
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -6.1%             -4.6%            -3.5%
Non-promoted       -14.6%            -12.0%            -8.6%
ADL versus ADL-OWN
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -3.3%             -0.2%             1.2%
Non-promoted       -10.2%             -7.0%            -4.6%
ADL and ADL-DI substantially outperform ADL-OWN when the focal product is not being promoted. A possible reason is that retailers try to avoid promoting competing products at the same time, so if the focal product is being promoted, there tends to be less promotional information on other competitive items.

Results and insights
We can improve the forecasting accuracy by offsetting potential forecast bias using Intercept Correction (IC) and Estimation Window Combining (EWC)
ADL-OWN-IC versus ADL-OWN
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -2.0%             -0.8%             0.7%
Non-promoted        -9.6%            -10.2%            -9.7%
ADL-OWN-EWC versus ADL-OWN
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -0.3%              0.0%            -0.6%
Non-promoted        -8.3%             -8.0%            -7.1%
In the absence of competitive information, by offsetting the potential forecast bias of the ADL-OWN model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.

Results and insights
We can improve the forecasting accuracy by offsetting potential forecast bias using Intercept Correction (IC) and Estimation Window Combining (EWC)
ADL-EWC versus ADL
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted             0.8%              0.6%            -0.9%
Non-promoted        -3.2%             -4.2%            -4.4%
ADL-IC versus ADL
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -1.1%             -0.9%            -0.7%
Non-promoted        -4.6%             -5.1%            -4.6%
WITH competitive information, by offsetting the potential forecast bias of the ADL model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.

Results and insights
We can improve the forecasting accuracy by offsetting potential forecast bias using Intercept Correction (IC) and Estimation Window Combining (EWC)
ADL-DI-EWC versus ADL-DI
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -1.7%             -2.8%            -4.0%
Non-promoted        -6.2%             -7.3%            -6.8%
ADL-DI-IC versus ADL-DI
                1-12 wks ahead    1-4 wks ahead    1 week ahead
Promoted            -3.7%             -5.0%            -4.9%
Non-promoted        -7.8%             -8.3%            -6.9%
WITH competitive information, by offsetting the potential forecast bias of the ADL-DI model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.

Summary
We can improve the forecasting accuracy by
- Incorporating competitive information: PCA and LASSO/stepwise
- Accounting for the change of the market environment: Intercept Correction (IC) and Estimation Window Combining (EWC)
The advantage of the new models comes mainly from the forecast period when the focal product is not on promotion.
The best model is the ADL model with diffusion indexes and intercept correction (i.e. ADL-DI-IC).

Thank you! Questions?