International Journal of Statistics and Applications 2015, 5(2): 91-97
DOI: 10.5923/j.statistics.20150502.07

Estimation of Parameters of Multiplicative Seasonal Autoregressive Integrated Moving Average Model Using Multiple Regression

Usoro Anthony Effiong
Department of Mathematics and Statistics, Akwa Ibom State University, Mkpat Enin, Nigeria

Abstract This paper considers the estimation of long-range parameters of a seasonal model using a regression approach. A multiple linear regression model was deduced from the SARIMA $(5, 0, 0) \times (0, 1, 0)_4$ model. The data used were quarterly data of Nigerian gross domestic products from 1997 to 2012, taken from the CBN Statistical Bulletin, 2012 ($\times 10^6$). The multiple linear regression model was fitted to the data, and the necessary diagnostic check through the ACF and PACF indicated that the model is reliable. From the model, a forecast function for gross domestic products was obtained. The basic statistics indicate that the estimates and forecasts from the model track the observed values closely.

Keywords SARIMA Model, Multiple Regression, Time Series

1. Introduction

Seasonality in a time series is a regular pattern of changes that repeats over S time periods, where S is the number of time periods until the pattern repeats. For example, there is seasonality in monthly data when high values tend to occur in particular months and low values tend to occur in other particular months. In this case, S = 12 (months per year) is the span of the periodic seasonal behaviour. For quarterly data, S = 4 time periods per year. A seasonal ARIMA model, or SARIMA model, incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is SARIMA $(p, d, q) \times (P, D, Q)_S$, with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of the repeating seasonal pattern.
The above model is,
$$\Phi(B)\,\psi(B^S)\,X_t = \theta(B)\,\Theta(B^S)\,e_t \tag{1}$$
where $B$ is the backshift operator, $e_t$ is the error term, and $X_t$ stands for the series after any non-seasonal and seasonal differencing.
The non-seasonal components are:
AR: $$\Phi(B) = 1 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p \tag{2}$$
MA: $$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q \tag{3}$$
The seasonal components are:
Seasonal AR: $$\psi(B^S) = 1 - \psi_1 B^S - \psi_2 B^{2S} - \dots - \psi_P B^{PS} \tag{4}$$
Seasonal MA: $$\Theta(B^S) = 1 + \Theta_1 B^S + \Theta_2 B^{2S} + \dots + \Theta_Q B^{QS} \tag{5}$$
On the left hand side of equation 1, the seasonal and non-seasonal AR components multiply each other, and on the right hand side, the seasonal and non-seasonal MA components multiply each other [16].

* Corresponding author: toskila2@yahoo.com (Usoro Anthony Effiong)
Published online at http://journal.sapub.org/statistics
Copyright 2015 Scientific & Academic Publishing. All Rights Reserved

In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables), denoted X. The case of one explanatory variable is simple linear regression. With more than one explanatory variable, it becomes multiple linear regression [5]. In linear regression, data are modelled using linear predictor functions, and the unknown model parameters are estimated from the data. Such models are linear models [8]. Linear regression was the first type of regression analysis to be studied rigorously and used extensively in practical applications [20]. Linear regression has many practical uses. Most applications fall into one of two broad categories: prediction or forecasting, and measuring the strength of the relationship between the dependent and independent variables. In statistics and econometrics, a distributed lag model is a model for time series data in which a regression equation is used to predict current values of a dependent variable based on both the current and lagged (past period) values of an explanatory variable [9].
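For concreteness, a finite distributed lag model with maximum lag $p$ is commonly written in the form below. The symbols $\alpha$, $\beta_i$ and $\varepsilon_t$ are notation assumed here for illustration and are not taken from [9].

```latex
y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_p x_{t-p} + \varepsilon_t
```

Each $\beta_i$ measures the effect on $y_t$ of the explanatory variable observed $i$ periods earlier.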
The simplest way to estimate the parameters associated with distributed lags is by ordinary least squares, assuming a fixed maximum lag p and independent and identically distributed errors [10]. In time series, if there are long-range parameters, it becomes easier to estimate the parameters associated with the distributed lags of the dependent variable using the regression method. This is a case of univariate time series,
and the model is an autoregressive model. If the distributed variables include both the dependent and independent variables, the model becomes a multivariate time series model [17]. Many researchers have used the regression approach to estimate the parameters of time series models with both short-range and long-range dependence. Long-range dependence (LRD), also called long memory or long-range persistence, is a phenomenon that may arise in spatial or time series data. It relates to the rate of decay of statistical dependence, with the implication that this decays more slowly than an exponential decay, typically as a power-like decay. LRD has applications in various fields, such as internet traffic modelling, econometrics, hydrology, linguistics and the earth sciences (Wikipedia, the free encyclopedia). Different mathematical definitions of LRD are used for different contexts and purposes [2], [7], [15], [3], [18], [13]. [14] carried out log-periodogram regression of time series with long-range dependence. That paper discussed the estimation of multiple time series models which allow elements of the spectral density matrix to tend to infinity or zero. A form of log-periodogram regression estimate of the differencing and scale parameters was proposed which, according to the paper, can provide modest efficiency improvements over a previously proposed method (for which no satisfactory theoretical justification seemed previously available) and further improvements in a multivariate context when the differencing parameters are a priori equal. [1] proposed statistical methods for data with long-range dependence. [6] obtained efficient location and regression estimation for long-range dependence regression models. [11] obtained M-estimators in linear models with long-range dependence errors.
[19] estimated a regression model with long memory stationary errors. [12] estimated parameters in linear regression with long-range dependence errors. [21] modelled long memory time series. In time series modelling, it is very common for the order of a stationary time series model to be limited to a maximum of 2. This is evident in most research works and publications in the area of time series. This does not negate the fact that parameterization may be necessary in some cases to completely specify a model. Parameterization means the process of deciding and defining the parameters necessary for a complete or relevant specification of a model. Sometimes, a model with short-range dependence may need to be extended to accommodate more parameters.

Graph 1. Plot of the original series
Graph 2. Plot of the differenced series
This paper applies the reduced form of the regression variables to estimate the parameters of a SARIMA model with long-range dependence in the form of a multiple linear regression model, so as to check whether the deduced regression model compares favourably with the direct method of estimating a multiplicative SARIMA model.

2. Method

The initial investigation requires the application of the methodology of [4]. For proper identification and choice of a model, plots of the original series, the differenced series, the ACF and the PACF are necessary. Graphs 1 and 2 are the plots of the original and differenced series. The ACF and PACF of the first-order seasonally differenced series (Graphs 3 and 4) suggest SARIMA $(5, 0, 0) \times (0, 1, 0)_4$. The model is specified with long-range parameters because the PACF values are significant up to lag 5 and insignificant from the sixth lag onwards. The ACF shows a gradual decay in its values from the first lag to subsequent lags. The model requires parameterization (an increase in the number of parameters up to the fifth lag, due to the significant PACF at lag 5).

Graph 3. ACF of the differenced series (DXt)
Graph 4. PACF of the differenced series (DXt)

The general form of the model is,
$$\Phi(B)(1 - B^4)X_t = \epsilon_t$$
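The identification step described above relies on the seasonally differenced series and its sample autocorrelations. The following is a minimal sketch of those two computations, assuming the usual sample autocorrelation about the mean; the function names are hypothetical, the input is only the first twelve quarterly values listed in Table 4 (to keep the example short), and the PACF is omitted for brevity.

```python
# First twelve quarterly values from Table 4 (illustrative subset only).
x = [0.98, 1.05, 1.05, 1.11, 0.91, 0.99, 1.02, 1.06, 1.09, 1.17, 1.18, 1.24]


def seasonal_difference(series, s):
    """Return the seasonally differenced series X_t - X_{t-s}."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag, computed about the sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


dx = seasonal_difference(x, 4)   # quarterly data, so S = 4
acf = sample_acf(dx, 3)
print([round(v, 2) for v in dx])
print([round(r, 3) for r in acf])
```

With the full series, the same functions would be applied to all 64 observations before judging where the autocorrelations become insignificant.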
The above model is expanded as follows,
$$(1 - \Phi_1 B - \Phi_2 B^2 - \Phi_3 B^3 - \Phi_4 B^4 - \Phi_5 B^5)(1 - B^4)X_t = \epsilon_t \tag{6}$$
$$X_t - X_{t-4} - \Phi_1 X_{t-1} + \Phi_1 X_{t-5} - \Phi_2 X_{t-2} + \Phi_2 X_{t-6} - \Phi_3 X_{t-3} + \Phi_3 X_{t-7} - \Phi_4 X_{t-4} + \Phi_4 X_{t-8} - \Phi_5 X_{t-5} + \Phi_5 X_{t-9} = \epsilon_t.$$
$$X_t - X_{t-4} = \Phi_1(X_{t-1} - X_{t-5}) + \Phi_2(X_{t-2} - X_{t-6}) + \Phi_3(X_{t-3} - X_{t-7}) + \Phi_4(X_{t-4} - X_{t-8}) + \Phi_5(X_{t-5} - X_{t-9}) + \epsilon_t. \tag{7}$$
Let $X_t - X_{t-4} = Y_t$, $X_{t-1} - X_{t-5} = Y_{t-1}$, $X_{t-2} - X_{t-6} = Y_{t-2}$, $X_{t-3} - X_{t-7} = Y_{t-3}$, $X_{t-4} - X_{t-8} = Y_{t-4}$ and $X_{t-5} - X_{t-9} = Y_{t-5}$. Equation 7 reduces to
$$Y_t = \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \Phi_3 Y_{t-3} + \Phi_4 Y_{t-4} + \Phi_5 Y_{t-5} + \epsilon_t \tag{8}$$
Equation 8 is a multiple linear regression model with the lags of $Y_t$ as the independent variables and $Y_t$ as the dependent variable. The usual assumption is $\epsilon_t \sim (0, \sigma_e^2)$.

3. Analyses and Results

The regression of $Y_t$ on $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$, $Y_{t-4}$ and $Y_{t-5}$ produces the following parameter estimates for the predictive model of $Y_t$,
$$\hat{Y}_t = 0.980Y_{t-1} + 0.141Y_{t-2} - 0.063Y_{t-3} - 0.806Y_{t-4} + 0.707Y_{t-5} \tag{9}$$

Table 1. Regression Estimates (S = 0.3079)
Predictor   Coefficient   St. Dev   T       P
Y_{t-1}      0.9798       0.1035     9.47   0.000
Y_{t-2}      0.1412       0.1365     1.03   0.306
Y_{t-3}     -0.0631       0.1399    -0.45   0.654
Y_{t-4}     -0.8058       0.1405    -5.74   0.000
Y_{t-5}      0.7073       0.1080     6.55   0.000

Table 2. Analysis of Variance
Source       DF   SS        MSS      F-ratio   P
Regression    5   37.8853   7.5771   79.92     0.000
Error        50    4.7403   0.0948
Total        55   42.6256

Table 3. Basic Statistics
Variable   N    N*   Mean     Median   SD       SE Mean
X_t        64   0    4.225    3.185    3.017    0.377
Est        55   9    4.707    4.004    2.959    0.399
e_t        55   9    0.0411   0.0264   0.2934   0.0396

Equation 9 is the estimated form of equation 8. The parameter estimates are all less than one in absolute value, which is consistent with, though not by itself sufficient for, the stationarity condition that the roots of $\Phi(B) = 0$ lie outside the unit circle. The regression estimates in Table 1 indicate the significant effects of $Y_{t-1}$, $Y_{t-4}$ and $Y_{t-5}$. This further justifies the parameterization of the model to include up to the fifth lagged variable.
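The regression of equation 8 can be sketched with ordinary least squares as follows. This is a minimal illustration and not the software used in the paper. It assumes a regression without intercept (as the degrees of freedom in Table 2 suggest), takes the 64 quarterly values as listed in Table 4, and uses a hypothetical helper `solve` for the normal equations. Because the tabulated values are rounded, the estimates need not match Table 1 exactly.

```python
# Quarterly GDP values X_1..X_64 as listed in Table 4.
X = [0.98, 1.05, 1.05, 1.11, 0.91, 0.99, 1.02, 1.06, 1.09, 1.17, 1.18, 1.24,
     1.65, 1.67, 1.65, 1.73, 1.64, 1.72, 1.73, 1.80, 1.86, 1.94, 1.97, 2.02,
     2.44, 2.48, 2.48, 2.52, 2.63, 2.59, 2.99, 3.20, 3.17, 3.40, 3.92, 4.08,
     3.99, 4.43, 4.99, 5.17, 4.74, 4.85, 5.52, 5.54, 5.54, 5.72, 6.46, 6.58,
     5.46, 5.87, 6.61, 6.85, 7.43, 8.04, 9.06, 9.46, 8.55, 9.44, 9.86, 9.55,
     9.14, 9.84, 10.97, 10.59]

S = 4  # seasonal period (quarterly data)
P = 5  # number of autoregressive lags

# Seasonal difference: Y_t = X_t - X_{t-4}
Y = [X[t] - X[t - S] for t in range(S, len(X))]

# Each row holds the predictors Y_{t-1}, ..., Y_{t-5} for one response Y_t.
rows = [[Y[t - i] for i in range(1, P + 1)] for t in range(P, len(Y))]
resp = [Y[t] for t in range(P, len(Y))]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Normal equations (Z'Z) phi = Z'y for a regression without intercept.
ZtZ = [[sum(r[i] * r[j] for r in rows) for j in range(P)] for i in range(P)]
Zty = [sum(r[i] * y for r, y in zip(rows, resp)) for i in range(P)]
phi = solve(ZtZ, Zty)

fitted = [sum(p * z for p, z in zip(phi, r)) for r in rows]
resid = [y - f for y, f in zip(resp, fitted)]
print("phi:", [round(p, 4) for p in phi])
print("n =", len(resp), "residual mean =", round(sum(resid) / len(resid), 4))
```

Seasonal differencing and the five lags together use up nine observations, which leaves the 55 usable rows reported in Table 3.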
The analysis of variance in Table 2 indicates the overall fit of the model to the data. Table 3 presents the basic statistics of the actual values $X_t$, the estimated values, and the residuals $e_t$. The residual mean (0.0411) is approximately zero and the residual standard deviation (0.2934) is small, so the assumption on the error is not violated. In addition, the model gives good estimates compared to the actual values. Evidence is shown in Table 4 and Graph 5.

Table 4. Original and Estimated Values of X_t
T    Original   Estimated      T    Original   Estimated
1    0.98                      33    3.17       3.5619
2    1.05                      34    3.40       3.2286
3    1.05                      35    3.92       3.4838
4    1.11                      36    4.08       4.0043
5    0.91                      37    3.99       4.1582
6    0.99                      38    4.43       3.9982
7    1.02                      39    4.99       4.8129
8    1.06                      40    5.17       5.1707
9    1.09                      41    4.74       5.1057
10   1.17       1.1600         42    4.85       5.0012
11   1.18       1.2067         43    5.52       5.3049
12   1.24       1.2499         44    5.54       5.5797
13   1.65       1.0972         45    5.54       5.3174
14   1.67       1.7163         46    5.72       5.8447
15   1.65       1.7360         47    6.46       6.3320
16   1.73       1.7039         48    6.58       6.6101
17   1.64       1.8410         49    5.46       6.2538
18   1.72       1.6929         50    5.87       5.5939
19   1.73       1.6416         51    6.61       6.3879
20   1.80       1.7537         52    6.85       6.5800
21   1.86       2.0714         53    7.43       6.5363
22   1.94       1.8930         54    8.04       7.6514
23   1.97       1.9431         55    9.06       8.9824
24   2.02       2.0525         56    9.46       9.3210
25   2.44       1.9678         57    8.55       8.7997
26   2.48       2.5025         58    9.44       8.9960
27   2.48       2.5293         59    9.86       9.9857
28   2.52       2.5518         60    9.55      10.005
29   2.63       2.6561         61    9.14       9.6063
30   2.59       2.6797         62    9.84       9.6444
31   2.99       2.5540         63   10.97      10.6751
32   3.20       2.9811         64   10.59      11.1501
Graph 5. Plot of actual and estimated values
Graph 6. ACF of the Residual
Graph 7. PACF of the Residual
Apart from the basic statistics of the error values displayed in Table 3, the analysis requires a further look at the behaviour of the residuals after estimation of the model parameters. The ACF and PACF of the residuals, shown in Graphs 6 and 7, were used to check the distribution of the error. These plots support the assumption that the errors are uncorrelated, with $e_t \sim N(0, \sigma_e^2)$.

4. Forecasts

Equation 9 is the estimated model of equation 8, which is the reduced form of equation 7. By substitution, equation 7 becomes
$$X_t - X_{t-4} = 0.98(X_{t-1} - X_{t-5}) + 0.1412(X_{t-2} - X_{t-6}) - 0.0631(X_{t-3} - X_{t-7}) - 0.8058(X_{t-4} - X_{t-8}) + 0.7073(X_{t-5} - X_{t-9})$$
so that
$$\hat{X}_t = 0.980X_{t-1} + 0.1412X_{t-2} - 0.0631X_{t-3} + 0.1942X_{t-4} - 0.2727X_{t-5} - 0.1412X_{t-6} + 0.0631X_{t-7} + 0.8058X_{t-8} - 0.7073X_{t-9}.$$
The forecast equation is given by,
$$\hat{X}_t(k) = 0.980X_{t+k-1} + 0.1412X_{t+k-2} - 0.0631X_{t+k-3} + 0.1942X_{t+k-4} - 0.2727X_{t+k-5} - 0.1412X_{t+k-6} + 0.0631X_{t+k-7} + 0.8058X_{t+k-8} - 0.7073X_{t+k-9}$$
where t is the time for each forecast and k is the lead time.

Table 5. Quarterly Forecast of Nigerian Gross Domestic Products
Lead (k)   Quarter   Actual     Forecast
1          X_63      10.9700    10.6752
2          X_64      10.5900    11.1500
3          X_65      -           9.8789
4          X_66      -          10.7359
5          X_67      -          11.2782
6          X_68      -          10.9189
7          X_69      -          10.3284
8          X_70      -          11.0041

5. Summary and Conclusions

The method of modelling in this paper is not at variance with the approach of [4] to time series modelling. A preliminary investigation was carried out with the plots of the ACF and PACF for proper choice of the model. The ACF and PACF of the first-order seasonally differenced series suggested SARIMA $(5, 0, 0) \times (0, 1, 0)_4$. That is, the PACF of the seasonally differenced series was significant up to the fifth lag and became insignificant from the sixth lag onwards. Parameterization was required in the choice of the model. With the parameterization, there was long-range dependence in the parameters. The objective was to reduce the model to a multiple linear regression model whose parameters can be estimated by the ordinary least squares method. The reduced form of the model included the lagged variables $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$, $Y_{t-4}$ and $Y_{t-5}$ of the observed time series variable $Y_t$, with associated parameters $\Phi_1$, $\Phi_2$, $\Phi_3$, $\Phi_4$ and $\Phi_5$ respectively. The parameters of the regression model were estimated, and the estimates obtained from the fitted model compared favourably with the actual values of the gross domestic products (see Graph 5). The diagnostic check through the ACF and PACF of the residuals indicates that this model and approach improve on the previously proposed models for Nigerian Gross Domestic Products. The forecast values obtained from the model are more accurate and reliable for planning purposes.

REFERENCES

[1] Beran, Jan (1992): Statistical methods for data with long range dependence. Statistical Science 7, pp. 404-1047.
[2] Beran, Jan (1994): Statistics for Long Memory Processes. CRC Press.
[3] Beran et al. (2013): Long Memory Processes: Probabilistic Properties and Statistical Methods. Springer.
[4] Box, G. E. P. and Jenkins, G. M. (1976): Time Series Analysis: Forecasting and Control, 1st Edition. Holden-Day, San Francisco.
[5] David A. Freedman (2009): Statistical Models: Theory and Practice. Cambridge University Press, p. 26.
[6] Dahlhaus, R. (1995): Efficient location and regression estimation for long range dependence regression models. Annals of Statistics 23, pp. 1029-1047.
[7] Doukhan et al. (2003): Theory and Applications of Long Range Dependence. Birkhäuser.
[8] Hilary L. Seal (1967): The Historical Development of the Gauss Linear Model. Biometrika 54(1/2): 1-24.
[9] Jeff B. Cromwell, et al. (1994): Multivariate Tests for Time Series Models. SAGE Publications, Inc. ISBN 0-8039-5440-9.
[10] Judge, George, et al. (1980): The Theory and Practice of Econometrics. Wiley Publications.
[11] Koul, H. L. (1992): M-estimators in linear models with long-range dependence errors. Statistics & Probability Letters 14, pp. 153-164.
[12] Liudas Giraitis and Hira Koul (1997): Estimation of the dependence parameters in linear regression with long range dependence errors. Stochastic Processes and their Applications, Vol. 71, Issue 2, pp. 207-224. Elsevier. doi:10.1016/s0304-4149(97)00061-6.
[13] Malamud, Bruce D. and Turcotte, Donald L. (1999): Self-Affine Time Series: I. Generation and Analyses. Advances in Geophysics 40: 1-90. doi:10.1016/s0065-2687(08)60293-9.
[14] Robinson, P. M. (1995): Log-Periodogram Regression of Time Series with Long Range Dependence. The Annals of Statistics, Vol. 23, No. 3, pp. 1048-1072.
[15] Samorodnitsky, Gennady (2007): Long range dependence. Foundations and Trends in Stochastic Systems.
[16] https://onlinecourses.science.psu.edu/stat510/node/67.
[17] Usoro, A. E. and Omekara, C. O. (2008): Bilinear Autoregressive Vector Models and their Application to Revenue Series. Asian Journal of Mathematics and Statistics 1(1): 50-56.
[18] Witt, Annette and Malamud, Bruce D. (2013): Quantification of Long-Range Persistence in Geophysical Time Series: Conventional and Benchmark-Based Improvement Techniques. Surveys in Geophysics (Springer) 34(5): 541-651. doi:10.1007/s10712-012-9217-8.
[19] Yajima, Y. (1988): On estimation of a regression model with long memory stationary errors. Annals of Statistics 16, pp. 791-807.
[20] Yan, Xin (2009): Linear Regression Analysis: Theory and Computing. World Scientific, pp. 1-2.
[21] Yoshihiro Yajima (1985): On the estimation of long memory time series models. Australian Journal of Statistics, Vol. 27, Issue 3, pp. 303-320.
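As a supplementary note to the forecast function of Section 4, the following is a minimal sketch of how the forecast weights and multi-step forecasts can be computed. It assumes the rounded coefficients used in the paper's forecast equation and the last nine observed values from Table 4. The function name `forecast` and the variable names are hypothetical, and forecasts for leads beyond the data are fed back in as lagged values.

```python
# Estimated AR coefficients as they appear in the paper's forecast equation.
phi = [0.98, 0.1412, -0.0631, -0.8058, 0.7073]
S = 4  # seasonal period (quarterly data)

# Coefficients of (1 - phi_1 B - ... - phi_5 B^5)(1 - B^4); index = power of B.
ar = [1.0] + [-p for p in phi]
sdiff = [1.0] + [0.0] * (S - 1) + [-1.0]
prod = [0.0] * (len(ar) + len(sdiff) - 1)
for i, a in enumerate(ar):
    for j, d in enumerate(sdiff):
        prod[i + j] += a * d

# Moving the lagged terms to the right-hand side flips their signs:
# X_t = w[0] X_{t-1} + w[1] X_{t-2} + ... + w[8] X_{t-9}
w = [-c for c in prod[1:]]

# Last nine observed values X_56..X_64 from Table 4.
history = [9.46, 8.55, 9.44, 9.86, 9.55, 9.14, 9.84, 10.97, 10.59]


def forecast(history, weights, steps):
    """Iterate the forecast equation, reusing forecasts as lagged values."""
    series = list(history)
    out = []
    for _ in range(steps):
        nxt = sum(weights[j] * series[-1 - j] for j in range(len(weights)))
        out.append(nxt)
        series.append(nxt)
    return out


fc = forecast(history, w, 6)
print([round(c, 4) for c in w])
print([round(v, 4) for v in fc])
```

Under these assumptions the first two values agree with the Table 5 forecasts for X_65 and X_66 (9.8789 and 10.7359) to about three decimal places; later leads may differ slightly in the last digits because the coefficients are rounded.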