On Decomposition and Combining Methods in Time Series Forecasting

M.Tech Project First Stage Report

Submitted in partial fulfillment of the requirements for the degree of
Master of Technology

by

Unmesh Deshmukh
(Roll No: 05329012)

under the guidance of

Prof. Bernard Menezes

Kanwal Rekhi School of Information Technology
Indian Institute of Technology, Bombay
Mumbai

Abstract

Improving the accuracy of the forecasting process is necessary to improve the quality of management decisions. Earlier work in this area shows that preprocessing (decomposition) and post-processing (combining), when applied to a time series, increase the accuracy of the forecasts, which is the main aim of the forecasting process. Here we survey the decomposition and combining methods that have been proposed in the past. We also provide the results of experiments carried out with a changed definition of trend that incorporates a finite look-ahead, and with an approach in which we forecast using the trend series instead of the demand series. We also provide frequency distribution plots of the values forecasted by all experts and by the Top K (fixed K) experts. We observed that the Top K experts have a frequency distribution closer to the actual value for most points, indicating that the Top K method is indeed beneficial.

1 Introduction

Many management and control decisions are influenced by the current market situation and by how it is expected to change in the near future. An accurate prior estimate of the possible changes in key factors (e.g., an estimate of the probable increase in demand for a product) can go a long way toward improving the quality of the decisions taken. Forecasting deals with the prediction of such future events, normally by applying statistical techniques to the maintained historical data. Forecasting methods used in practice vary from domain to domain; here we restrict ourselves to time series forecasting.
The challenges here are:

- Decreasing the error in the forecast as much as possible
- Finding or developing a relatively inexpensive, easy-to-maintain forecasting system that guarantees the desired accuracy

The problem of time series forecasting can be described as follows: predict the future value x_{t+h}, given the past values x_t, x_{t-1}, x_{t-2}, ..., x_{t-w+1} of the time series, where w is the length of the window and h >= 1 is the forecasting horizon. To obtain these predictions, statistical models such as ARIMA (Auto-Regressive Integrated Moving Average) models [BGG03], exponential smoothing models, etc. are normally used. These models are typically trained on historical data and can then be used to obtain forecasts for future points. The main aim is to improve forecasting accuracy, and techniques such as decomposition and combining are used for this purpose.

2 Literature Survey

A time series is a sequence of observations taken sequentially in time. It can be seen as being composed of three components: trend (the change in the local mean of the time series), seasonality (the repetition of a particular pattern of observations after a certain fixed time interval) and the irregular component (random noise). This calls for decomposition methods that generate the individual component series from the original series; these methods form one important part of the literature on forecasting. Various decomposition methods exist, e.g. the Holt-Winter method, the exponential smoothing method, and exponential smoothing with a damped multiplicative trend [Tay03]; some of them are discussed in the next section.

Another important issue covered in the literature is the problem of combining vs. model selection. Normally many forecasting methods (hereafter referred to as "experts") are available, so one has to decide whether to select a single expert or to combine the forecasts of different experts in some way to get the final forecast. [Cle89] provides an exhaustive review of combining methods applied in different domains. The primary conclusion is that forecast accuracy can be substantially improved by combining multiple individual forecasts. The assumed reason is that no single expert can entirely capture the details of the underlying process, while multiple experts can capture different aspects of it and so should together give a better result. A somewhat surprising finding is that simple combining methods perform as well as sophisticated ones. This puzzling observation, along with other interesting questions such as the conditions under which combined forecasts are most effective, which methods to combine (similar or different), and accuracy improvement vs. combining cost, is raised by [Arm89]. Armstrong also suggests methods like rule-based combining, which demand the development of empirically validated and fully disclosed rules for the selection and combination of methods; these rules can be updated from knowledge of the situation.

[HE05] have looked at the question of whether combining is always beneficial. They carried out a set of experiments in that direction and concluded that the advantage of combinations is not that the best combinations are better than the best individual forecasts, on average, but that selecting among combinations is less risky than selecting among individual forecasts. The recent work of [ZY04] says that combining tends to reduce the prediction error when it is difficult to identify the best model. Although many model selection criteria exist, all of them suffer from the problem of instability: a selection procedure is unstable when a slight change in the data may result in the selection of a different model, producing high variability in the forecasts. [ZY04] identify two measures of stability: sequential stability (a measure of the consistency of selection across different sample sizes) and perturbation stability (a minor perturbation of the data should not change the selected model dramatically). They also propose an algorithm called AFTER (Aggregated Forecast Through Exponential Re-weighting) for combining forecasts. In this method the final forecast is a convex combination of the forecasts made by different experts, with each expert given a weight that is updated at every point. The weight is based on the previous performance of the expert, so experts that performed well on earlier points get higher weightage.

Moving to the methods used for decomposition and combining, a good deal of work has already been done by [MGR+05], [Set05] and [Sin05]. [MGR+05] used decomposition techniques followed by the application of multiple statistical as well as neural experts to each component series. The resulting forecasts are joined using a Cartesian product, which yields a large number of experts. This exponential growth in the number of experts prompted them to develop techniques such as Greedy Elimination, Greedy Accretion and BFS Greedy Accretion that reduce the search effort required to find the best combination. [Set05] and [Sin05] further developed this work by inventing methods for the dynamic combination of experts such as TopK, DTOPK (Dynamic Top K) and ETOPK (Exponential Top K). They also demonstrated that rank-based combining performs as well as weight-based combining [ZY04]. They made several other important contributions, such as a ranking similarity measure to cluster a group of series and the observation of the inconsistent rank phenomenon. They also used different decomposition methods obtained by changing the definitions of the trend, seasonality, etc. parameters. The application of these methods resulted in an average improvement of 10% over the Holt-Winter method, which is a standard forecasting method.

3 Forecasting techniques and concepts

The main objective of time series analysis is to model the process that generates the data, both to provide a compact description and to understand the generating process. Decomposition (preprocessing) and combining (post-processing), when applied to a time series, improve the accuracy of the forecasts [MGR+05]. After decomposing the series, statistical experts such as ARIMA are applied to forecast each component series. The forecasts of the individual components can then be combined into the final forecast using the operators + and ×. So there are 8 possible ways of doing this.
Two of them are shown below:

D_t = T_t × S_t × I_t    (purely multiplicative model)
D_t = T_t + S_t + I_t    (purely additive model)

We will now look at some of the existing methods used for this purpose.

3.1 Smoothing models

As said earlier, a time series is assumed to be composed of trend, seasonality and an irregular component. We now discuss some of the methods for extracting these components from the original series.

3.1.1 Holt's Method

This method is used when a series has no seasonality but exhibits some form of trend. For a series x_t, the k-step-ahead forecast function is given by

x_{t+k} = L_t + k T_t

where L_t is the current level and T_t is the current slope. The updating equations for level and slope are based on simple exponential smoothing:

L_t = α x_t + (1 - α)(L_{t-1} + T_{t-1}),   0 < α < 1
T_t = β(L_t - L_{t-1}) + (1 - β) T_{t-1},   0 < β < 1

Reasonable starting values for the level and slope are L_1 = x_1 and T_1 = x_2 - x_1.

3.1.2 Holt-Winter Method

The Holt-Winter method is used for time series that have both trend and seasonal components. There are two variants of this method, additive and multiplicative. The seasonality is multiplicative if the magnitude of the seasonal variation increases with an increase in the mean level of the time series; it is additive if the seasonal effect does not depend on the current mean level. The basic Holt-Winter forecasting method with multiplicative seasonality, smoothed level S_t, trend T_t and seasonal index I_t is described by

S_t = α(D_t / I_{t-p}) + (1 - α)(S_{t-1} + T_{t-1})
T_t = β(S_t - S_{t-1}) + (1 - β) T_{t-1}
I_t = γ(D_t / S_t) + (1 - γ) I_{t-p}

Here p is the number of observation points in a cycle (p = 4 for quarterly data), and α, β and γ are the smoothing constants. The forecast at time t for time t + i is:

D_{t+i} = (S_t + i T_t) I_{t-p+i}

3.1.3 Exponential Smoothing with a damped multiplicative trend

Like the seasonality component, the trend can also be additive or multiplicative. A multiplicative trend is modelled by smoothing successive ratios of the local level instead of the differences of levels used in the additive formulation. It has been observed that the damped version of Holt's method gives good results, as it offsets the overshooting of the forecast beyond the data. Along similar lines, [Tay03] introduced a damped version (using an extra parameter φ) of the multiplicative trend formulation.
This formulation is described by

S_t = α x_t + (1 - α)(S_{t-1} R_{t-1}^φ)
R_t = γ(S_t / S_{t-1}) + (1 - γ) R_{t-1}^φ

and the forecast is given by

X_{t+m} = S_t · R_t^(Σ_{i=1}^{m} φ^i)

where S_t is the level, R_t is the ratio that models the multiplicative trend and X_{t+m} is the m-step-ahead forecast.
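As an illustration, the level and slope recursions of Holt's method in Section 3.1.1 can be sketched in a few lines of Python (a minimal sketch; the function name and default smoothing constants are our own choices, not part of the report):

```python
def holt_forecast(x, alpha=0.5, beta=0.3, k=1):
    """Holt's linear trend method: run the level/slope recursions over
    the series x and return the k-step-ahead forecast L_t + k*T_t."""
    L, T = x[0], x[1] - x[0]                      # starting values: L_1 = x_1, T_1 = x_2 - x_1
    for t in range(1, len(x)):
        L_prev = L
        L = alpha * x[t] + (1 - alpha) * (L + T)  # L_t = a*x_t + (1-a)(L_{t-1} + T_{t-1})
        T = beta * (L - L_prev) + (1 - beta) * T  # T_t = b*(L_t - L_{t-1}) + (1-b)*T_{t-1}
    return L + k * T                              # forecast x_{t+k} = L_t + k*T_t
```

On a perfectly linear series the recursions lock onto the true level and slope, so the k-step-ahead forecast simply extrapolates the line.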

3.2 Decomposition and Combining

Decomposition (preprocessing) and combining (post-processing) are among the most important concepts in forecasting. A variety of methods have been developed for both, and they have yielded significant improvements.

3.2.1 Decomposition Methods

The mathematical formulation of a practical basic decomposition method is as follows:

T_t = (1/12) Σ_{i=0}^{11} D_{t-i}
S_t = avg( D_t / T_t, D_{t-p} / T_{t-p}, D_{t-2p} / T_{t-2p}, D_{t-3p} / T_{t-3p} )
I_t = D_t / (T_t S_t)

where p is the seasonality period. [Sin05] experimented with some variations of these standard definitions of trend and seasonality. They introduced a weight-based decomposition strategy that gives higher weightage to recent data when averaging to compute the seasonality. The modified definition (called the 4321 decomposition) of the seasonality component, which takes into account only a finite past, is

S_t = 0.4 (D_t / T_t) + 0.3 S_{t-p} + 0.2 S_{t-2p} + 0.1 S_{t-3p}

where p is the seasonality period. Another variation of the seasonality definition is based on the exponential smoothing method and uses a parameter α, so that older data gets less and less weight:

S_t = [ D_t / T_t + α S_{t-p} + α² S_{t-2p} + α³ S_{t-3p} + ... ] / [ 1 + α + α² + α³ + ... ]

Here the denominator serves to normalize the values. [Sin05] also suggested changing the definition of trend from a one-sided trend to a centered trend. But a centered trend implies that we should know an equal number of points from the future and from the past. The suggested approach is therefore to forecast some number of points, say k, using a standard technique such as Holt-Winter, and to use these forecasted values to compute the trend along with the (12 - k) actual values. This is called a k-step lookahead. The experiments section contains the details of how we implemented this method, along with some other variations of it.
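A minimal sketch of the basic decomposition above, assuming a multiplicative model with a one-sided p-point trend and averaging over whichever of the four D/T ratios are available (function and variable names are our own, not the report's implementation):

```python
def decompose(D, p=12):
    """Basic decomposition sketch: one-sided p-point moving-average trend
    T_t, seasonality S_t averaged over up to four cycles of D/T ratios,
    and multiplicative irregular component I_t = D_t / (T_t * S_t)."""
    n = len(D)
    T, S, I = [None] * n, [None] * n, [None] * n
    for t in range(p - 1, n):
        T[t] = sum(D[t - i] for i in range(p)) / p   # T_t = (1/p) * sum_{i=0}^{p-1} D_{t-i}
    for t in range(p - 1, n):
        ratios = [D[t] / T[t]]
        for j in (1, 2, 3):                          # same phase in up to three earlier cycles
            if t - j * p >= p - 1:
                ratios.append(D[t - j * p] / T[t - j * p])
        S[t] = sum(ratios) / len(ratios)             # S_t = avg of available D/T ratios
        I[t] = D[t] / (T[t] * S[t])                  # I_t = D_t / (T_t * S_t)
    return T, S, I
```

For a constant series the trend equals the constant and both S_t and I_t come out as 1, as expected for a multiplicative model.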

3.2.2 Combining Methods

Combining tends to reduce the variability in the forecast and so is widely used in forecasting. The simplest method is to use the mean or median to combine the forecasts of different experts. Other strategies such as Greedy Elimination and Greedy Accretion have also been proposed. In the elimination approach, at each step the expert whose absence decreases the MAPE by the largest amount is eliminated; accretion follows the exact reverse of this approach.

3.2.3 Weight-based combining

In this approach, the forecasts of different experts are given weights prior to combining, typically based on their past performance. AFTER [ZY04] is an example of such a method; it considers the forecasts of all experts for combining. It is also possible to have a weight-based method that considers only a finite past.

3.2.4 Rank-based combining

Other methods select only a few experts and then use simple methods like the mean or median to combine their forecasts. [Set05] developed a class of methods called rank-based methods, which perform comparably to the weight-based combining methods. Some variants of the rank-based combining methods are:

TopK: Experts are ranked on the basis of some measure of performance (such as MAPE) and the top K experts for the previous instant are combined to get the final forecast. [Set05] performed experiments to find an optimum K value but, as expected, no single such value was found.

DTOPK (Dynamic TopK): Here the parameter K is selected dynamically and may vary from point to point instead of being fixed statically.

ETOPK (Exponential TopK): There are two ranking schemes: a global scheme in which the entire past is considered, and a local ranking scheme, e.g. exponential rank, which weighs recent data more than older data. In their experiments, [Set05] observed an inconsistent rank phenomenon, meaning that the local ranking at times moves away from the global ranking. This implies that merely using the top K experts may not be sufficient, since that is based on the global ranking, and so they came up with the ETOPK method based on the EWMAPE (Exponentially Weighted MAPE) error measure. EWMAPE is defined as

EWMAPE(α, T) = Σ_{t=1}^{T} APE(t) α^(T-t) / Σ_{t=1}^{T} α^(T-t),   α ≤ 1

In effect, the ETopK method of combining is a TopK method, but instead of using a uniform window over the past performance measure (APE), it uses an exponentially decaying window, thereby introducing a new parameter α. The TopK method is the special case of ETopK with α = 1.

1C2C method: [Set05] analyzed the K values used by the DTOPK algorithm over time and made one interesting observation: the K value used to drop suddenly at the final time instants. This indicated the inability of the DTOPK method to detect that lower K values could perform well at earlier time instants as well, an inability that stems from the fact that DTOPK considers the whole past performance while evaluating K. The question then is to find the time boundary at which this shift in the optimal K value happens. Effectively, the time series can be seen as being made up of clusters of time instants such that for all instants in a cluster a particular value of K is optimal. The time boundary divides the series into two clusters; otherwise the entire series can be considered one single cluster, hence the name 1C2C (1 cluster / 2 clusters). [Set05] provide an algorithmic formulation to find the time boundary and to decide whether to use the 1C or the 2C model.

3.3 Error Measures

As the main aim in forecasting is to increase the accuracy of the forecasts, error measures are very important from a forecaster's point of view. A variety of error measures are used in practice; some of them are listed below:

1. Mean Squared Error (MSE):
   MSE = Σ_{j=1}^{N} (observation_j - prediction_j)² / N

2. Root Mean Squared Error (RMSE):
   RMSE = sqrt( Σ_{j=1}^{N} (observation_j - prediction_j)² / N )

3. Normalized Mean Squared Error (NMSE):
   NMSE = Σ_{j=1}^{N} (observation_j - prediction_j)² / Σ_{j=1}^{N} (observation_j - mean)²

4. Mean Absolute Percentage Error (MAPE):
   MAPE = (100/N) Σ_{i=1}^{N} |forecasted_i - actual_i| / actual_i

5. Unbiased Absolute Percentage Error (UAPE), a.k.a. Symmetric Mean Absolute Percentage Error (SMAPE):
   UAPE = (100/N) Σ_{i=1}^{N} |forecasted_i - actual_i| / ((actual_i + forecasted_i)/2)

6. Median Absolute Percentage Error (MdAPE): This measure is similar to MAPE, but where MAPE uses the mean for summarization across series, MdAPE uses the median.

For the experiments discussed in the coming sections, MAPE is used as the error measure.

4 Experiments Performed

4.1 Distribution plots of the forecasted values

With the intent of studying the distribution of the values predicted by the various forecasting experts, distribution plots of the forecasts were generated for four series: Abraham, Furniture, Beer and HSales. We plotted the distribution graphs at every 20th point in the time series: point 0, point 20, point 40 and so on. We also plotted graphs showing the distribution of the forecasts generated by the Top K experts (K = 10000, fixed statically) and compared them with those of all experts. These graphs are obtained by dividing the range from the maximum to the minimum forecasted value into a fixed number of intervals (here 100); each forecast is then classified into a particular interval and the frequency count of that interval is incremented. The graphs shown in Figure 3 demonstrate a case where the top K experts work well: if we take the mean of the values predicted by the top K experts and the mean of the values predicted by all experts, the former is closer to the actual value than the latter. The graphs shown in Figure 6 demonstrate a case where the top K experts do not perform well compared to all forecasting experts. We made some interesting observations:

- The spread of the distribution generated by the top K experts is smaller than the spread of the distribution generated by all forecasting experts.
- The top K approach appears to work well in many cases, meaning that if we take the mean of the forecasts given by the top K experts and the mean of those given by all experts, the former is closer to the actual value than the latter. This can easily be observed in Figure 3 by visual inspection.

- For the Furniture series, all distribution plots had the shape of a spike, indicating very high confidence among the experts. In such cases we may not need combining methods and can thus save on the combining costs.

Later, in an attempt to find any locality in the shape of the distribution (a group of points in close vicinity sharing some common shape of distribution) and/or periodicity (some shape of distribution possibly repeating after some fixed time interval), we plotted the distribution plots (generated by all experts) of all points in the series. For the Abraham series we did not find any locality or periodicity in the distribution plots. For the Beer series, however, visual inspection showed that the plots exhibited some periodicity (period = 12) in the shape of the distribution, especially in the later half of the series. We plan to explore the implications of this observation in future.

Figure 1: Distribution of values forecasted by all experts. Figure 2: Distribution of values forecasted by the Top 10000 experts. Figure 3: Abraham series, point 20, actual value = 112539.

4.2 Different trend definitions

The experiments were carried out on 21 time series, all of which exhibit seasonality with a period of 12. Different experts were used for each component of the series: in all, 86 trend experts, 33 seasonality experts and 34 experts for the irregular component, which amounts to 86 × 33 × 34 = 96492 experts in total.

4.2.1 Trend definition with look-ahead

The change in the definition of trend from one-sided to centered was suggested in the work of [Sin05]. This shift to a centered trend implies that we must know points in the future to calculate the trend. We can have variants of this approach: a one-point lookahead, a two-point lookahead and so on. In the cheating case, where we know all the points in the series beforehand, calculating the trend is not a problem; in the non-cheating case we can estimate the future values using some standard technique such as the Holt-Winter method. We conducted experiments for the cheating case, and the results are shown in Table 1. For the details of the naming convention of the forecasting models, refer to Appendix A. Results for the non-cheating case with the same approach can be found in [Sin05].

Figure 4: Distribution of values forecasted by all experts. Figure 5: Distribution of values forecasted by the Top 10000 experts. Figure 6: Abraham series, point 80, actual value = 148932.

4.2.2 Forecasting trend component

In the normal one-step lookahead approach, while forecasting the demand value at instant t, we forecast it using some standard method and use that value to calculate the trend at point t-1. Since we know the actual value at instant t-1, we can then find the seasonality and irregular components. All these components can then be forecasted and combined to get the final forecast at point t. In this experiment we tried another approach: when we want to forecast the demand value at instant t, we know the trend up to instant t-2 and the actual demand values up to instant t-1. Instead of forecasting the demand series, we forecast the trend series using the 86 trend experts and combine their forecasts using the DTOPK algorithm at instant t-1. Using the forecasted trend and the actual value, we calculate the seasonality and irregular components at instant t-1. We then repeat the forecasting cycle to get the values of the components at instant t and combine them to get the final forecast at time t. After we get the final demand forecast at time t, we can calculate a more accurate trend value at instant t-1 using the newly forecasted demand value. The results of this approach are compared with those of the earlier decomposition techniques in Table 2; the last column of that table corresponds to this approach. For the details of the naming convention of the forecasting models, refer to Appendix A.
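The rank-based combining used in these experiments (Section 3.2.4) can be sketched as follows: `ewmape` implements the EWMAPE formula and `topk_forecast` the (E)TopK combination. The function names and the simple mean combiner are our illustrative choices, and DTOPK's dynamic choice of K is omitted:

```python
def ewmape(ape_history, alpha=1.0):
    """EWMAPE(alpha, T) = sum_{t=1}^T APE(t)*alpha^(T-t) / sum_{t=1}^T alpha^(T-t).
    alpha = 1 weighs the whole past uniformly (plain MAPE-style ranking)."""
    T = len(ape_history)
    num = sum(ape * alpha ** (T - t) for t, ape in enumerate(ape_history, start=1))
    den = sum(alpha ** (T - t) for t in range(1, T + 1))
    return num / den

def topk_forecast(forecasts, ape_histories, k, alpha=1.0):
    """Rank experts by EWMAPE over their past APEs and return the mean
    forecast of the k best-ranked experts (TopK; alpha < 1 gives ETopK)."""
    ranked = sorted(range(len(forecasts)), key=lambda e: ewmape(ape_histories[e], alpha))
    return sum(forecasts[e] for e in ranked[:k]) / k
```

For example, with three experts whose past APE histories are [1, 1], [5, 5] and [9, 9], `topk_forecast` with k = 2 averages the forecasts of the first two (better-ranked) experts; with alpha < 1 the ranking instead favors experts whose recent APEs are low.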

Series        HW      T0-1.S1-1  T0-2.S1-1  T0-3.S1-1  T0-4.S1-1  T0-5.5.S1-1
abraham       3.26    2.63       2.49       2.68       2.64       2.64
beer          2.52    2.01       1.94       1.96       1.92       1.96
dry           8.26    8          7.63       7.8        7.76       7.9
equip         0.97    0.7        0.67       0.65       0.61       0.57
fortif        7.9     7.25       6.91       7.2        7.26       7.15
gasoline      2.7     2.32       2.33       2.23       2.21       2.12
hsales        10.17   7.4        7.36       6.91       6.75       6.34
merchandise   0.85    0.71       0.7        0.68       0.65       0.64
motorparts    2.21    1.42       1.35       1.32       1.31       1.29
newcar        4.13    3.45       3.34       3.36       3.33       3.29
paper         5.19    4.62       4.55       4.51       4.46       4.46
red           9.57    8.29       7.97       8.43       8.41       8.6
rose          15.12   12.04      11.91      11.78      11.47      11.74
shoe          3.02    2.78       2.69       2.81       2.83       2.84
software      3.86    3.18       3.1        3.14       3.05       2.91
spaper        8.39    7.37       7.17       6.68       6.56       6.59
spark         12.79   12.27      12.08      12.7       12.58      12.64
stores        0.92    0.86       0.84       0.81       0.79       0.77
sweet         15.8    14.54      14.23      13.85      13.66      14.25
total         0.92    0.6        0.6        0.59       0.57       0.54
wine          7.55    6.59       6.46       6.78       6.63       6.75
Improvement   0       16.01      18.26      18.07      19.4       19.95

Table 1: Decomposition with finite lookahead
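The centered-trend-with-lookahead computation evaluated in Table 1 can be sketched as follows. This is a hypothetical helper under our own naming; it builds a p-point window from the (p - k) most recent actual values and k looked-ahead values (actual values in the cheating case, Holt-Winter forecasts otherwise):

```python
def centered_trend(D, t, k, lookahead, p=12):
    """Trend at time t as the mean of a p-point window: the (p - k) most
    recent actual values D_{t-(p-k)+1}..D_t plus k looked-ahead values
    for t+1..t+k (actuals in the cheating case, forecasts otherwise)."""
    window = list(D[t - (p - k) + 1 : t + 1]) + list(lookahead[:k])
    assert len(window) == p, "need p - k actuals and k looked-ahead values"
    return sum(window) / p
```

For k = 0 this reduces to the one-sided moving-average trend of Section 3.2.1; in the cheating case `lookahead` is simply `D[t+1 : t+1+k]`.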

Series        HW      T1-0.S1-1  T1-1[HW].S1-1  T1-2[HW].S1-1  T1-1[DT].S1-1  T2-1.S1-1
abraham       3.26    2.8        2.95           2.98           2.96           2.94
beer          2.52    2.35       2.26           2.22           2.21           2.23
dry           8.26    8.68       8.58           8.63           8.66           8.61
equip         0.97    0.75       0.76           0.77           0.75           0.75
fortif        7.9     7.91       7.97           7.98           7.91           7.95
gasoline      2.7     2.55       2.55           2.62           2.55           2.55
hsales        10.17   7.84       7.95           7.91           7.86           7.82
merchandise   0.85    0.78       0.77           0.76           0.77           0.77
motorparts    2.21    1.54       1.56           1.56           1.53           1.53
newcar        4.13    3.92       3.88           3.82           3.86           3.85
paper         5.19    5.16       5.16           5.06           5.01           4.94
red           9.57    9.31       9.39           9.41           9.34           9.33
rose          15.12   13.07      13.1           13.2           13.03          12.94
shoe          3.02    3.02       3.1            3.08           3.1            3.16
software      3.86    3.39       3.47           3.5            3.44           3.39
spaper        8.39    8.1        8.14           8.13           8.09           8.16
spark         12.79   13.3       13.08          13.21          13.21          13.2
stores        0.92    0.91       0.92           0.9            0.92           0.91
sweet         15.8    15.8       15.62          15.34          15.43          15.43
total         0.92    0.65       0.65           0.66           0.66           0.66
wine          7.55    7.42       7.34           7.34           7.37           7.32
Improvement   0       8.19       7.91           7.95           8.3            8.46

Table 2: MAPEs using various definitions of trend in decomposition

5 Conclusion and Future Work

The earlier work in this field has underlined the fact that decomposition and combining lead to an increase in forecasting accuracy. The experiments that we carried out show that:

- Comparison between the frequency distribution plots of values forecasted by all experts and those forecasted by the top K experts revealed that for most points the top K approach works better.

- Changing the definition of trend from one-sided to centered improves the accuracy, as indicated in Table 1.

- The approach in which we forecast the trend first rather than the demand series gives some improvement over the earlier approaches, as can be seen from Table 2.

We plan to explore the following directions in future work:

- From the distribution plots we observed that, for some points, the actual value fell outside the range spanned by the maximum and minimum forecast values at that point. Such points can be treated as outliers; we plan to verify whether other outlier-detection procedures indeed classify them as outliers.
- In the experiment described in the previous section, in which we forecast the trend first, we combine the forecasts of the 86 trend experts before finding the seasonality and irregular components. Another approach is to derive the seasonality and irregular components for each trend forecast separately. This results in T × (S + I) forecasts that can then be combined to get the final forecast at instant t. This approach is currently being explored.
- We currently use only MAPE as the error measure in our experiments. We plan to try other error measures, such as MdAPE (Median APE), in the future. MdAPE may work well for series with many outliers, since the median is not affected by them.
- We may also include neural networks in our expert pool, as they are known to capture nontrivial patterns in the data, and study their effect on accuracy.
- Our final aim is a rule-based system with rules that guide the choice of experts to be used, the decision between combining and model selection, and the selection of experts used in combining.
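The robustness of MdAPE mentioned above can be seen in a small sketch (the data and function names are illustrative): a single badly-forecast point inflates the mean APE but barely moves the median.

```python
import statistics

def apes(actual, forecast):
    """Absolute percentage errors, in percent."""
    return [100.0 * abs(a - f) / abs(a) for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    return statistics.mean(apes(actual, forecast))

def mdape(actual, forecast):
    """Median APE: robust to a single badly-forecast (outlier) point."""
    return statistics.median(apes(actual, forecast))

# Illustrative data: four good forecasts and one outlier point.
actual   = [100.0, 100.0, 100.0, 100.0, 100.0]
forecast = [98.0, 101.0, 99.0, 102.0, 150.0]

print(mape(actual, forecast))   # dominated by the outlier
print(mdape(actual, forecast))  # unaffected by it
```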

Appendix A Naming Convention for Forecasting Models

The naming convention used for our forecasting models is described here. The name of a model consists of two parts, a trend part and a seasonality part, separated by a "." (period). The general form of a model name is:

TX-L[Forecast Method].SXs-P

Here:

- T marks the start of the trend-component part.
- X can take the following values:
  - 0: cheating case (the entire series is known beforehand)
  - 1: the data series is forecast
  - 2: the trend series is forecast, followed by combining
  - 3: the trend series is forecast, and combining is NOT used prior to finding the seasonality and irregular components
- L is the look-ahead value, L = 0, 1, 2, ...
- [Forecast Method] is a two-letter abbreviation of the forecasting method, within square brackets, used when lookahead is applied (e.g. HW for the Holt-Winters method).
- S marks the start of the seasonality-component part.
- Xs indicates the type of window (window of past values):
  - 0: linear window (weights change linearly within the window)
  - 1: exponential window (weights change exponentially within the window)
- P stands for the value of the parameter α for the exponential window.
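A model name in this convention can be parsed mechanically. The following sketch uses our own regular expression for the grammar above (it treats the [Forecast Method] and -P parts as optional and may not cover every variant used in the system):

```python
import re

# TX-L[Method].SXs-P, with [Method] and -P optional.
NAME_RE = re.compile(
    r"^T(?P<X>[0-3])-(?P<L>\d+)"        # trend type X and look-ahead L
    r"(?:\[(?P<method>[A-Z]{2})\])?"    # optional forecasting method, e.g. HW
    r"\.S(?P<Xs>[01])"                  # seasonality window type
    r"(?:-(?P<alpha>[0-9.]+))?$"        # optional exponential-window parameter
)

def parse_model_name(name):
    """Split a model name such as T1-2[HW].S1-1 into its components."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("not a valid model name: " + name)
    return m.groupdict()

print(parse_model_name("T1-2[HW].S1-1"))
print(parse_model_name("T0-5.S1-1"))
```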