FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL


B. N. MANDAL (PhD scholar, IASRI, New Delhi-, mandal_stat@rediffmail.com)

Abstract: Yearly sugarcane production data of India for the period of - to - were analyzed by time-series methods. Autocorrelation and partial autocorrelation functions were calculated for the data. An appropriate Box-Jenkins autoregressive integrated moving average model was fitted. Validity of the model was tested using standard statistical techniques. The fitted autoregressive integrated moving average model was used to forecast sugarcane production for three leading years.

Key words: ACF = autocorrelation function, ARIMA = autoregressive integrated moving average, ARMA = autoregressive moving average, PACF = partial autocorrelation function, sugarcane.

Introduction: The Autoregressive Integrated Moving Average (ARIMA) model was introduced by Box and Jenkins (hence also known as the Box-Jenkins model) in s for forecasting a variable. An effort is made in this paper to develop an ARIMA model for sugarcane production in India and to apply it in forecasting sugarcane production for the three leading years. The ARIMA method is an extrapolation method for forecasting and, like any other such method, it requires only the historical time series data on the variable being forecast. Among the extrapolation methods, it is one of the most sophisticated: it incorporates the features of all such methods, does not require the investigator to choose a priori the initial values of any variable or the values of various parameters, and is robust enough to handle any data pattern. As one would expect, it is quite a difficult model to develop and apply, as it involves transformation of the variable, identification of the model, estimation through a non-linear method, verification of the model and derivation of forecasts.

In what follows, we first explain the ARIMA model, then develop it for sugarcane production using yearly data for India during - to - and finally apply it to forecast the values of the variable during the future years.

Theoretical Basis of Time-Series Analysis: A time series is a set of values of a continuous variable Y ($Y_1, Y_2, \ldots, Y_n$), ordered according to a discrete index variable t ($1, 2, \ldots, n$). The term time-series comes from econometric studies in which the index variable refers to intervals of time measured in a suitable scale. However, it must be clearly stated that this direct reference to time is not required: any different meaning can be attributed to the index variable, provided

that it is able to order the Y values. In general, in a given time series the following can be recognized and separated ():

1) a regular, long-term component of variability, termed trend, that represents the whole evolution pattern of the series;

2) a regular, short-term component whose shape occurs periodically at intervals of s lags of the index variable, currently known as seasonality, because this term is also derived from applications in economics;

3) an AR(p) autoregressive component of order p, which relates each value $Z_t = Y_t - (\text{trend and seasonality})$ to the p previous Z values, according to the following linear relationship

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \ldots + \phi_p Z_{t-p} + \varepsilon_t \tag{1}$$

where $\phi_i$ ($i = 1, \ldots, p$) are parameters to be estimated and $\varepsilon_t$ is a residual term; and

4) a MA(q) moving average component of order q, which relates each $Z_t$ value to the q residuals of the q previous Z estimates

$$Z_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q} \tag{2}$$

where $\theta_i$ ($i = 1, \ldots, q$) are parameters to be estimated.

The theory of time-series analysis has developed a specific language and a set of linear operators. According to Box and Jenkins (1), a highly useful operator in time-series theory is the lag or backward linear operator (B) defined by

$$B Z_t = Z_{t-1}$$

Consider the result of applying the lag operator twice to a series:

$$B(B Z_t) = B Z_{t-1} = Z_{t-2}$$

Such a double application is indicated by $B^2$, and, in general, for any integer k, it can be written

$$B^k Z_t = Z_{t-k}$$

By using the backward operator, Equation (1) can be rewritten as

$$Z_t - \phi_1 Z_{t-1} - \phi_2 Z_{t-2} - \ldots - \phi_p Z_{t-p} = \phi(B) Z_t = \varepsilon_t \tag{3}$$

where $\phi(B)$ is the autoregressive operator of order p defined by

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$$

Similarly, Equation (2) can be written as

$$Z_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q} = \theta(B) \varepsilon_t \tag{4}$$

where $\theta(B)$ indicates the moving average operator of order q defined by

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q$$

The autoregressive and moving average components can be combined in an autoregressive moving average ARMA (p, q) model

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \ldots + \phi_p Z_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q}$$

or in lag operator form

$$(1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p) Z_t = (1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q) \varepsilon_t$$

Finally,

$$\phi(B) Z_t = \theta(B) \varepsilon_t \tag{5}$$

In a preliminary analysis of a series it is useful to independently evaluate the long- and short-term periodic components, which are essential to define the regular structure of the series. The trend component can be evaluated by fitting a regular function, a polynomial, or a more complicated general function. The seasonal component can be estimated by a seasonal decomposition procedure, which calculates a seasonal index based on the ratio of the observed values to the moving average. In the final stage of series modeling, however, both the trend and the seasonal component will be integrated in the ARMA (p, q) process (). For the trend, such an integration is obtained by using the difference linear operator ($\nabla$), defined by

$$\nabla Y_t = Y_t - Y_{t-1} = Y_t - B Y_t = (1 - B) Y_t$$

A single application of the $\nabla$ operator corrects the data for a linearly increasing trend, whereas its repeated use d times corrects for a trend that can be fitted by a polynomial of order d. The stationary series $Z_t$ obtained as the dth difference ($\nabla^d$) of $Y_t$,

$$Z_t = \nabla^d Y_t = (1 - B)^d Y_t$$

can then be modeled by an ARMA (p, q) process. The combined use of the $\nabla$ operator and the ARMA (p, q) process results in an ARIMA (p, d, q) model. Furthermore, ARIMA can account for the seasonal component of lag period s, by using both correlations between $Z_t$ and $Z_{t-s}$ values and those between the corresponding residuals $\varepsilon_t$ and $\varepsilon_{t-s}$.
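The effect of the difference operator can be checked numerically. The following is a minimal sketch in plain Python, not part of the original study: the helper name `difference` and the quadratic test series are illustrative assumptions. It shows that one difference of a quadratic trend still leaves a linear trend, while two differences reduce it to a constant.

```python
def difference(series, d=1):
    """Apply the difference operator (1 - B) to a list of numbers d times."""
    out = list(series)
    for _ in range(d):
        # Each pass computes Y_t - Y_{t-1} and shortens the series by one value.
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


# Hypothetical data (an assumption, not the sugarcane series):
# an exact quadratic trend Y_t = 3 + 2t + 0.5 t^2.
y = [3 + 2 * t + 0.5 * t * t for t in range(8)]

first = difference(y, d=1)   # still trending: linear in t
second = difference(y, d=2)  # constant: the quadratic trend is removed

print(first)
print(second)
```

Each application of the operator shortens the series by one observation, which is one reason a long series is desirable before differencing.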
In mathematical terms, a seasonal ARIMA model is an ARIMA (p, d, q) model whose residuals $\varepsilon_t$ can be further modeled by an ARIMA $(P, D, Q)_s$ structure, with the linear operators of orders (P, D, Q) being functions of the $B^s$ operator.
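As a worked illustration of how nonseasonal and seasonal operators combine, consider the simplest multiplicative case with one nonseasonal and one seasonal autoregressive term and no differencing. The symbols $\phi_1$ (nonseasonal coefficient) and $\Phi_1$ (seasonal coefficient) are notation assumed here for the example; the case itself is illustrative and is not the model fitted in this paper.

```latex
\begin{align*}
(1 - \phi_1 B)(1 - \Phi_1 B^{s})\, Z_t &= \varepsilon_t \\
(1 - \phi_1 B - \Phi_1 B^{s} + \phi_1 \Phi_1 B^{s+1})\, Z_t &= \varepsilon_t \\
Z_t &= \phi_1 Z_{t-1} + \Phi_1 Z_{t-s} - \phi_1 \Phi_1 Z_{t-s-1} + \varepsilon_t
\end{align*}
```

The cross term $\phi_1 \Phi_1 Z_{t-s-1}$ arises from the product of the two operators and is not a separately estimated parameter.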

The operators of a seasonal ARIMA model, defined as $(p, d, q)(P, D, Q)_s$, can be expressed as follows:

AR (p) nonseasonal operator of order p, $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$;

AR (P) seasonal operator of order P, $\Phi(B^s) = 1 - \Phi_1 B^s - \ldots - \Phi_P B^{sP}$;

MA (q) nonseasonal operator of order q, $\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q$;

MA (Q) seasonal operator of order Q, $\Theta(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \ldots - \Theta_Q B^{Qs}$; and

difference operator of order d, $\nabla^d = (1 - B)^d$.

The Box-Jenkins methodology for analyzing and modeling time series is characterized by three steps: 1) model identification, 2) parameter estimation, and 3) model validation. Model identification defines the (p, d, q) orders of the AR and MA components, both seasonal and nonseasonal. In this step, the fundamental analytical tools are the autocorrelation functions. The autocorrelation function (ACF) and partial ACF (PACF) are very important for the definition of the internal structure of the analyzed series. The ACF $\rho(k)$ at lag k of the $Z_t$ series is the linear correlation coefficient between $Z_t$ and $Z_{t-k}$, calculated for $k = 1, 2, \ldots$

$$\rho(k) = \frac{\mathrm{Cov}(Z_t, Z_{t-k})}{\sqrt{\mathrm{Var}(Z_t)\,\mathrm{Var}(Z_{t-k})}}$$

The PACF is defined as the linear correlation between $Z_t$ and $Z_{t-k}$, controlling for possible effects of linear relationships among values at intermediate lags. Theoretically, both an AR (p) process and an MA (q) process should be associated with well-defined patterns of ACF and PACF, usually decreasing exponential, alternating in sign, or decreasing sinusoidal patterns. A precise correspondence between ARMA (p, q) processes and defined ACF and PACF patterns is more difficult to recognize. When the order of at least one of the two components (AR or MA) is clearly detectable, however, the other can be identified by trial in the following step of parameter estimation. Finally, the existence of a seasonal component of length s is indicated by the presence of a periodic pattern of period s in the ACF.
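The identification patterns described above can be reproduced on simulated data. The sketch below, in plain Python, is an illustration under stated assumptions rather than the procedure used in this study: the function names `acf` and `pacf`, the simulated AR(1) series with coefficient 0.7, and the sample size are all hypothetical. The sample ACF uses the usual biased estimator, and the PACF is obtained from the ACF with the Durbin-Levinson recursion.

```python
import random


def acf(z, max_lag):
    """Sample autocorrelations r[0..max_lag] of the series z."""
    n = len(z)
    mean = sum(z) / n
    c0 = sum((v - mean) ** 2 for v in z) / n
    r = [1.0]
    for k in range(1, max_lag + 1):
        ck = sum((z[t] - mean) * (z[t - k] - mean) for t in range(k, n)) / n
        r.append(ck / c0)
    return r


def pacf(r, max_lag):
    """Partial autocorrelations at lags 1..max_lag from autocorrelations r,
    computed with the Durbin-Levinson recursion."""
    out = []
    phi_prev = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = [phi_kk]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi
    return out


# Hypothetical AR(1) series Z_t = 0.7 Z_{t-1} + e_t (an assumption for the demo).
random.seed(0)
z = [0.0]
for _ in range(2000):
    z.append(0.7 * z[-1] + random.gauss(0.0, 1.0))
z = z[1:]

r = acf(z, 5)
p = pacf(r, 5)
print("ACF :", [round(v, 3) for v in r[1:]])
print("PACF:", [round(v, 3) for v in p])
```

For an AR(1) process the ACF should decay roughly geometrically while the PACF should be close to zero beyond lag 1, which is the kind of pattern used to choose p.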
Once a suitable ARIMA $(p, d, q)(P, D, Q)_s$ structure is identified, the subsequent steps of parameter estimation and model validation must be performed. Parameter estimates are usually obtained by maximum likelihood, which is asymptotically correct for time series. Estimators are usually sufficient, efficient, and consistent for Gaussian distributions and are asymptotically normal and efficient for several non-Gaussian distribution families. Validation of the goodness of fit of an ARIMA model can be carried out according to the following steps: 1) Evaluation of the statistical significance of the parameters by the usual comparison between the parameter value and the standard deviation of its estimate. For a test statistic that is

valid only asymptotically, a parameter whose value exceeds twice its standard error can be considered significant. 2) Analysis of the ACF of residuals. In this step, the residuals ($\varepsilon_t$) are considered as a new time series, and their ACF and PACF are estimated to make sure that values at lag k > 0 are not statistically different from zero.

For prediction purposes, ARIMA models differ from analytical functions of time, $Z_t = f(t)$, because ARIMA forecasting uses previous values of the series and the errors in the previous estimates. This feature makes ARIMA forecasting valid in the short term only, because the parameters of the model cannot account, in the long term, for changes in the dynamics of the series.

Building ARIMA model for sugarcane production data and Forecasting: Fitting an ARIMA model requires a sufficiently large data set. In this study, we used the data for sugarcane production for the period - to -. As stated earlier, development of an ARIMA model for any variable involves three steps: identification, estimation and verification. Each of these three steps is now explained for sugarcane production.

[Table: Year, Sugarcane production (million tonnes)]

Model identification: An ARIMA model is estimated only after transforming the variable being forecast into a stationary series. A stationary series is one whose values vary over time only around a constant mean and with constant variance. There are several ways to ascertain this. The most common method is to check stationarity by examining the graph or time plot of the data. Fig. 1 reveals that the data is nonstationary. Non-stationarity in mean is corrected through appropriate differencing of the data. In this case difference of order was sufficient to achieve stationarity in mean. The newly constructed variable $X_t$ can now be examined for stationarity. The graph of $X_t$ was stationary in mean. The next step is to identify the values of p and q. For this, the autocorrelation and partial autocorrelation coefficients of various orders of $X_t$ are computed (Table 1). The ACF and PACF (Fig. 2 and 3) show that the order of p and q can at most be. We entertained three tentative ARIMA models and chose the model with the minimum AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). The models and corresponding AIC and BIC values are

[Table: ARIMA (p, d, q), AIC, BIC]

So the most suitable model is ARIMA (,, ) as this model has the lowest AIC and BIC values.

Model estimation and verification: Model parameters were estimated using the SPSS package. Results of estimation are reported in Table 2. Model verification is concerned with checking the residuals of the model to see if they contain any systematic pattern which can still be removed to improve on the chosen ARIMA. This is done by examining the autocorrelations and partial autocorrelations of the residuals of various orders. For this purpose, the various correlations up to lags were computed and these, along with their significance as tested by the Box-Ljung test, are provided in Table 3.
As the results indicate, none of these correlations is significantly different from zero at a reasonable level. This supports the selected ARIMA model as an appropriate model. The ACF and PACF of the

residuals (Fig. 4 and 5) also indicate good fit of the model. So the fitted ARIMA model for the sugarcane data is

$$Z_t = (\ldots)\,Z_{t-1} + (\ldots)\,Z_{t-2} + (\ldots)\,Z_{t-3} + \varepsilon_t \tag{6}$$

Figure 1: Time plot of sugarcane production data (sugarcane production against year)

Figure 2: ACF of differenced data (ACF coefficient with upper and lower confidence limits against lag number)

Figure 3: PACF of differenced sugarcane data (partial ACF coefficient with upper and lower confidence limits against lag number)

Table 1: Autocorrelations and partial autocorrelations (columns: Lag, Autocorrelation, Std. Error; Lag, Partial Autocorrelation, Std. Error)

Table 2: Estimates of the fitted ARIMA model (rows: non-seasonal lags AR1, AR2 and Constant; columns: Estimates, Std Error, t, Approx Sig; also reported: Number of Residuals, Number of Parameters, Residual df, Adjusted Residual Sum of Squares, Residual Sum of Squares, Residual Variance, Model Std. Error, Log-Likelihood, Akaike's Information Criterion (AIC), Schwarz's Bayesian Criterion (BIC))

Table 3: Autocorrelation and partial autocorrelations of residuals (columns: Lag, Autocorrelation, Std. Error, Box-Ljung Statistic Value, df, Sig.; Lag, Partial Autocorrelation, Std. Error)

Figure 4: ACF of residuals of fitted ARIMA model (ACF coefficient with upper and lower confidence limits against lag number)

Figure 5: PACF of residuals of fitted ARIMA model (partial ACF coefficient with upper and lower confidence limits against lag number)
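The Box-Ljung statistic used in the residual check can be computed directly from the residual autocorrelations as $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$. The following plain-Python sketch is illustrative only: the function name `ljung_box`, the simulated white-noise residuals and the choice of 10 lags are assumptions, and it does not reproduce the SPSS output reported in the paper.

```python
import random


def ljung_box(residuals, max_lag):
    """Box-Ljung Q statistic for lags 1..max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals) / n
    q = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                 for t in range(k, n)) / n
        rk = ck / c0                      # residual autocorrelation at lag k
        q += rk * rk / (n - k)
    return n * (n + 2) * q


# Hypothetical residuals: simulated white noise (an assumption for the demo).
random.seed(1)
resid = [random.gauss(0.0, 1.0) for _ in range(200)]

q10 = ljung_box(resid, 10)
print("Q(10) =", round(q10, 3))
```

Under the hypothesis of uncorrelated residuals, Q is compared with a chi-square distribution; the degrees of freedom are conventionally taken as the number of lags minus the number of estimated ARMA parameters. A small Q relative to the critical value is consistent with an adequate model.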

Forecasting with ARIMA model: ARIMA models are developed basically to forecast the corresponding variable. There are two kinds of forecasts: sample period forecasts and post-sample period forecasts. The former are used to develop confidence in the model and the latter to generate genuine forecasts for use in planning and other purposes. The ARIMA model can be used to yield both kinds of forecasts.

Sample period forecasts: The sample period forecasts are obtained simply by plugging the actual values of the explanatory variables into the estimated equation (6). The explanatory variables here are the lagged values of $Z_t$ and the estimated lagged errors. The values so obtained for $Z_t$, together with the actual values of $Z_t$, are shown in Table 4.

Table 4: Actual and estimated values of sugarcane production and % confidence limit (CL) (columns: Year, Actual production (million tonnes), Estimated production (million tonnes), Residual, Lower CL, Upper CL)

To judge the forecasting ability of the fitted ARIMA model, important measures of the accuracy of the sample period forecasts were computed. The Mean Absolute Percentage Error (MAPE) for sugarcane production turns out to be.. This measure indicates that the forecasting inaccuracy is low.

Post sample forecasts: The principal objective of developing an ARIMA model for a variable is to generate post-sample period forecasts for that variable. This is done using equation (6). The forecasts for sugarcane production during to are given in the lower part of Table 4.

Conclusions: The ARIMA model offers a good technique for predicting the magnitude of any variable. Its strength lies in the fact that the method is suitable for any time series with any pattern of change and it does not require the forecaster to choose a priori the value of any parameter. Its limitations include its requirement of a long time series. Often it is called a

Black Box model. Like any other method, this technique does not guarantee perfect forecasts. Nevertheless, it can be successfully used for forecasting long time series data. In our study the developed model for sugarcane production was found to be ARIMA (,, ). From the forecasts obtained using the developed model, it can be seen that forecasted production for the year - is lower than - but in later years the production increases. The validity of the forecasted values can be checked when the data for the lead periods become available. The model can be used by researchers for forecasting sugarcane production in India. However, it should be updated from time to time to incorporate current data.

References:

1) Box, G.E.P., and G. M. Jenkins. Time series analysis: forecasting and control. Holden Day, San Francisco, CA.

2) Brockwell, P.J., and R. A. Davis. Introduction to time series and forecasting. Springer.

3) Kendall, M. G., and A. Stuart. The advanced theory of statistics. Vol. 3. Design and Analysis and Time-Series. Charles Griffin & Co. Ltd., London, United Kingdom.