FORECASTING OF COTTON PRODUCTION IN INDIA USING ARIMA MODEL

S. Poyyamozhi 1, Dr. A. Kachi Mohideen 2

1 Assistant Professor and Head, Department of Statistics, Government Arts College (Autonomous), Kumbakonam 612 002, Tamilnadu, India.
2 Assistant Professor, Department of Statistics, Periyar EVR College (Autonomous), Trichy 620 023, Tamilnadu, India.

ABSTRACT

Cotton is an important crop in India. This study focuses on forecasting the cultivated area and production of cotton in India using the Autoregressive Integrated Moving Average (ARIMA) model. Time series data covering the period 1955-2015 were used for this study. The study revealed that ARIMA (0, 1, 0), ARIMA (1, 1, 4) and ARIMA (0, 1, 1) are the best-fitting models for forecasting cotton area, production and yield in India, respectively. The analysis shows that if present growth rates continue, cotton production in the year 2025 will be 42.53 million bales of 170 kg each.

Key words: Forecasting, Autoregressive Integrated Moving Average (ARIMA) model, cotton production.

1. INTRODUCTION

Cotton is one of the most important crops in the world. India ranks first in the world in cotton acreage, with about 8 million hectares under cotton cultivation, and fourth in total seed cotton production. In India, Gujarat is the largest producer of cotton. Cotton requires an average annual temperature of over 16ºC and an annual rainfall of at least 50 cm distributed throughout the growing season. A daily minimum temperature of 16ºC is required for germination and 21-27ºC for proper vegetative growth. It can tolerate temperatures as high as 43ºC but does not do well if the temperature falls below 21ºC.

Crop area estimation and crop yield forecasting are essential in supporting policy decisions regarding land use allocation, food security and environmental issues. Statistical techniques are able to provide crop forecasts with reasonable precision well in advance. Various approaches have been used for forecasting such agricultural systems.
Attention has focused on the univariate time series models known as Autoregressive Integrated Moving Average (ARIMA) models, which are primarily due to the work of Box and Jenkins (1970). Muhammad et al. (1992) conducted an empirical study of modeling and forecasting time series data of rice production in Pakistan following Box and Jenkins (1976). Similar studies have been done by Wankhade et al. (2010) for forecasting pigeon pea production in India using ARIMA modeling, and by Rahman (2010) for forecasting boro rice production in Bangladesh. Iqbal et al. (2005) also used the ARIMA model for forecasting wheat area and production in Pakistan, and Debnath et al. (2013) used it for forecasting the area, production and yield of cotton in India.

The present study aims to reveal the growth pattern and to make the best forecast of cotton area, production and yield in India using the ARIMA model for the years 2016-2025. It is based on cotton area, production and yield data for the period 1955 to 2015, collected from a secondary source (Directorate of Economics and Statistics, Department of Agriculture and Cooperation). The time series data of cotton area, production and yield were modeled by Box-Jenkins type stochastic Autoregressive Integrated Moving Average (ARIMA) processes.

2. ARIMA MODEL

A time series is a set of numbers that measures the status of some activity over time. It is the historical record of some activity, with measurements taken at equally spaced intervals and with consistency in both the activity and the method of measurement.

2.1 Moving Average Process:

Moving average models were first considered by Slutsky (1927) and Wold (1938). The moving average series can be written as

$$Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$

We call such a series a moving average of order q and abbreviate the name to MA (q). Here $Y_t$ is the original series and $e_t$ is the series of errors.

2.2 Auto-Regressive Process:

Yule (1926) carried out the original work on autoregressive processes. Autoregressive processes are, as their name suggests, regressions on themselves. Specifically, a p-th order autoregressive process $\{Y_t\}$ satisfies the equation

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t$$

The current value of the series $Y_t$ is a linear combination of the $p$ most recent past values of itself plus an innovation term $e_t$ that incorporates everything new in the series at time $t$ that is not explained by the past values. Thus, for every $t$, we assume that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$

2.3 Autoregressive Integrated Moving Average (ARIMA) model

The Box and Jenkins (1970) procedure is the milestone of the modern approach to time series analysis. Given an observed time series, the aim of the Box and Jenkins procedure is to build an ARIMA model. In particular, after suitable preliminary transformations of the data, the procedure focuses on stationary processes.

This study fits the Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) model, which generalizes the ARMA model to non-stationary series. The ARMA (p, q) model can be written as

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$

where $Y_t$ is the original series and, for every $t$, we assume that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$

A time series $\{Y_t\}$ is said to follow an integrated autoregressive moving average (ARIMA) model if the d-th difference $W_t = \nabla^d Y_t$ is a stationary ARMA process. If $\{W_t\}$ follows an ARMA (p, q) model, we say that $\{Y_t\}$ is an ARIMA (p, d, q) process.
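As a minimal illustration of the differencing step in the ARIMA definition above, the following Python sketch simulates a random walk with drift, which is an ARIMA (0, 1, 0) process with a constant, and takes its first difference. The data are simulated and are not the cotton series; the helper name `difference` and the drift value are assumptions made only for this example.

```python
import random

def difference(series, d=1):
    """Apply first differencing d times: W_t = Y_t - Y_{t-1}."""
    w = [float(v) for v in series]
    for _ in range(d):
        w = [curr - prev for prev, curr in zip(w, w[1:])]
    return w

# Hypothetical series (not the cotton data): a random walk with drift,
# Y_t = Y_{t-1} + drift + e_t, i.e. ARIMA(0, 1, 0) with a constant.
random.seed(0)
drift = 0.5
shocks = [random.gauss(0.0, 1.0) for _ in range(60)]

y = []
level = 0.0
for e in shocks:
    level += drift + e
    y.append(level)

# The first difference recovers drift + e_t, a stationary series
# with one fewer observation than the original.
w = difference(y, d=1)
```

Each differencing pass shortens the series by one observation, which is one reason d is kept small in practice.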
Fortunately, for practical purposes, we can usually take d = 1 or at most 2. Consider then an ARIMA (p, 1, q) process. With $W_t = Y_t - Y_{t-1}$, we have

$$W_t = \phi_1 W_{t-1} + \phi_2 W_{t-2} + \cdots + \phi_p W_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$

2.4 Box and Jenkins procedures

i. Preliminary analysis: create conditions such that the data at hand can be considered as the realization of a stationary stochastic process.
ii. Identification: specify the orders p, d, q of the ARIMA model so that the number of parameters to estimate is clear. Recognizing the behavior of empirical autocorrelation functions plays an extremely important role.
iii. Estimation: obtain efficient, consistent and sufficient estimates of the parameters of the ARIMA model (maximum likelihood estimator).
iv. Diagnostics: check whether the model is a good one using tests on the parameters and residuals of the model. Even when the model is rejected, this step is still very useful for obtaining information to improve the model.
v. Usage of the model: if the model passes the diagnostics step, it can be used to interpret a phenomenon and to forecast.

2.5 Jarque-Bera Test

We can check the normality assumption using the Jarque-Bera (1978) test, which is a goodness-of-fit measure of departure from normality based on the sample kurtosis ($K$) and skewness ($S$). The Jarque-Bera (JB) test statistic is defined as

$$JB = \frac{n-k}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$$

where n is the number of observations and k is the number of estimated parameters. The statistic JB has an asymptotic chi-square distribution with 2 degrees of freedom and can be used to test the joint hypothesis that the skewness and the excess kurtosis are both zero, since samples from a normal distribution have an expected skewness of zero and an expected excess kurtosis of zero.

2.6 Ljung-Box test

The Ljung-Box test can be used to check for autocorrelation among the residuals. If a model fits well, the residuals should be uncorrelated, so their sample autocorrelations should be small. In this case the null hypothesis

$$H_0: \rho_1(e) = \rho_2(e) = \cdots = \rho_h(e) = 0$$

is tested with the Box-Ljung statistic

$$Q^* = N(N+2)\sum_{i=1}^{h} \frac{\rho_i^2(e)}{N-i}$$

where N is the number of observations used to estimate the model, $\rho_i(e)$ is the residual autocorrelation at lag i and h is the number of lags tested. The statistic Q* approximately follows the chi-square distribution with (h - q) df, where q is the number of parameters estimated in the model. If Q* is large (significantly different from zero), the residual autocorrelations as a set are significantly different from zero and the random shocks of the estimated model are probably autocorrelated. One should then consider reformulating the model.

3. RESULTS AND DISCUSSION

The ARIMA model was applied in four steps, namely model specification, model estimation, diagnostic checking and forecasting. Sixty years of data on cotton area, production and yield were used for modeling, and the last five years of data were used for model validation. The model specification involved the plots of the autocorrelation function (ACF), the partial autocorrelation function (PACF) and the differenced series. The autocorrelation functions of the first-differenced series, presented in Figure 1, indicate stationarity for cotton area, production and yield, since the autocorrelations decline faster than those of the undifferenced series. It is clear that the ACFs of all the first-differenced series decline rapidly.

3.1 Model identification and diagnostic checking

Based on the nature of the ACF and PACF plots of the series and their theoretical properties, the orders of the autoregressive and moving average processes for the cotton area, production and yield series were selected by estimating ARIMA models at different p, d, q values using SPSS 17. Models were selected by comparing the minimum values of root mean squared error (RMSE), Akaike Information Criterion (AIC), Schwarz's Bayesian Criterion (SBC), Normalized BIC, mean absolute error (MAE) and mean absolute proportion percent error (MAPPE), and the maximum values of $R^2$.
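To make the comparison criteria concrete, the sketch below computes RMSE, MAE, a mean absolute percentage error (which appears to correspond to the MAPPE named in the text) and $R^2$ from a set of observed and fitted values. It is an illustration under stated assumptions and not the SPSS 17 computation: the function name `fit_measures` and the four data points are hypothetical, and SPSS may apply different conventions such as degrees-of-freedom adjustments.

```python
import math

def fit_measures(actual, fitted):
    """Return simple goodness-of-fit measures for a fitted series."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    mean_actual = sum(actual) / n
    sse = sum(e * e for e in errors)                     # sum of squared errors
    sst = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "R2": 1.0 - sse / sst,
    }

# Hypothetical observed and fitted values, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
fitted = [11.0, 11.0, 15.0, 15.0]
measures = fit_measures(actual, fitted)
```

Lower RMSE, MAE and MAPE and a higher $R^2$ favor a candidate model, which is the comparison rule stated above.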
The candidate models are ARIMA (0,1,0), ARIMA (0,1,3) and ARIMA (0,1,4) for cotton area; ARIMA (0,1,1), ARIMA (0,1,4) and ARIMA (1,1,4) for cotton production; and ARIMA (0,1,1), ARIMA (0,1,3) and ARIMA (1,1,1) for cotton yield. Table 1 shows that ARIMA (0, 1, 0), ARIMA (1, 1, 4) and ARIMA (0, 1, 1) are the best-fitting models for forecasting cotton area, production and yield in India, respectively.
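The residual checks described in Sections 2.5 and 2.6 can be sketched in plain Python as follows. This is a minimal sketch and not the SPSS 17 procedure used in the study: the function names, the `n_params` argument and the simulated residuals are assumptions made for illustration. The Jarque-Bera p-value uses the fact that the chi-square survival function with 2 degrees of freedom is exp(-x/2); the Ljung-Box function returns only the statistic and its degrees of freedom, to be compared against a chi-square table.

```python
import math
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box(residuals, max_lag, n_params=0):
    """Return (Q*, degrees of freedom) for lags 1..max_lag."""
    n = len(residuals)
    q_star = n * (n + 2) * sum(
        autocorrelation(residuals, i) ** 2 / (n - i)
        for i in range(1, max_lag + 1)
    )
    return q_star, max_lag - n_params

def jarque_bera(residuals, n_params=0):
    """Return (JB, asymptotic p-value) from sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((v - mean) ** 2 for v in residuals) / n
    m3 = sum((v - mean) ** 3 for v in residuals) / n
    m4 = sum((v - mean) ** 4 for v in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = (n - n_params) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-square(2) survival function

# Hypothetical residuals (simulated white noise, not model output).
random.seed(1)
residuals = [random.gauss(0.0, 1.0) for _ in range(60)]
q_star, df = ljung_box(residuals, max_lag=10, n_params=1)
jb, jb_p = jarque_bera(residuals, n_params=1)
```

A large Q* relative to the chi-square critical value for `df` degrees of freedom, or a small `jb_p`, would suggest reformulating the model.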

Table 1 Diagnostic Tools and Model Selection Criteria for Cotton Area, Production and Yield of Best Fitted Models

Series      Model         R2     RMSE     MAPPE   MAE      Normalized BIC
Area        ARIMA(0,1,0)  0.998  0.247    20.690  0.169    -2.124
            ARIMA(0,1,3)  0.993  0.267    17.380  0.181    -2.060
            ARIMA(0,1,4)  0.992  0.264    18.830  0.181    -2.060
Production  ARIMA(0,1,1)  0.947  0.762    22.541  0.451    -0.260
            ARIMA(0,1,4)  0.958  0.650    23.800  0.402    -0.272
            ARIMA(1,1,4)  0.958  0.649    23.393  0.402    -0.272
Yield       ARIMA(0,1,1)  0.203  165.102  14.393  116.305  10.513
            ARIMA(0,1,3)  0.269  163.049  14.552  118.750  10.691
            ARIMA(1,1,1)  0.269  163.049  14.552  118.750  10.691

3.2 Model estimation

At the identification stage, one or more models are tentatively chosen that seem to provide a statistically adequate representation of the available data. Precise estimates of the parameters of the best-fitting models are then obtained by least squares, as advocated by Box and Jenkins, using a standard computer package such as SPSS 17. The estimates are presented in Table 2.

Table 2 Estimation of Parameters for Cotton Area, Production and Yield of Best Fitted Models

Model         Type      Co-efficient  Standard deviation  T value

AREA
ARIMA(0,1,0)  constant  -3.221        7.18                -0.435
ARIMA(0,1,3)  MA1       -0.0310       0.1326              -0.21
              MA2       0.1901        0.1400              1.40
              MA3       -0.2101       0.1345              -1.7
              constant  0.08743       0.06337             1.5
ARIMA(0,1,4)  MA1       0.0171        0.1371              0.15
              MA2       0.1478        0.1470              1.09
              MA3       -0.168        0.1386              -1.30
              MA4       0.0940        0.1401              0.67
              constant  0.0821        0.05642             1.50

PRODUCTION
ARIMA(0,1,1)  MA1       -0.0179       0.1660              -0.12
              constant  0.5106        0.2641              1.98
ARIMA(0,1,4)  MA1       0.039         0.1721              0.30
              MA2       0.3476        0.2651              2.5
              MA3       -0.5300       0.1671              -3.71
              MA4       0.0672        0.1403              0.47
              constant  0.4848        0.1223              1.8
ARIMA(1,1,4)  AR1       0.8943        0.1629              0.5
              MA1       0.9121        0.2632              0.52
              MA2       0.3171        1.7101              1.69
              MA3       -0.7373       1.7712              -1.21
              MA4       0.3681        0.1941              0.41
              constant  0.0604        0.5831              0.51

YIELD
ARIMA(0,1,1)  MA1       0.2131        0.9831              1.41
              constant  6.830         3.333               2.01
ARIMA(0,1,3)  MA1       0.2215        0.1400              1.7
              MA2       0.0984        0.1421              0.71
              MA3       -0.3141       0.1501              -2.16
              constant  6.625         4.141               1.60
ARIMA(1,1,1)  AR1       0.1578        0.7510              0.21
              MA1       0.3691        0.701               0.61
              constant  5.611         2.702               2.09

3.3 Forecast of cotton area, production and yield

Ten-year forecasts of cotton area, production and yield were estimated using the best model for each series and are presented in Table 4.

Table 4 Forecast of Cotton Area, Production and Yield in India

Year  Area (million hectares)  Production (million bales of 170 kg)  Yield (kg/hectare)
2016  10.71                    37.90                                 520
2017  10.89                    37.98                                 522
2018  10.96                    37.99                                 524
2019  10.59                    38.21                                 527
2020  11.08                    38.63                                 528
2021  11.51                    39.19                                 524
2022  11.69                    39.50                                 527
2023  11.71                    41.40                                 530
2024  11.59                    41.79                                 531
2025  11.70                    42.53                                 532

4. CONCLUSION

From this study, the forecast cotton area, production and yield for the year 2016 are 10.71 million hectares, 37.90 million bales of 170 kg each and 520 kg/hectare, respectively. The analysis found that if present growth rates continue, the cotton area, production and yield in the year 2025 will be 11.70 million hectares, 42.53 million bales of 170 kg each and 532 kg/hectare, respectively. The study suggests that the total cropped area can be increased in future if land reclamation and conservation measures are adopted. The projection shows that cotton will play a vital role in the economic growth of India in future.

REFERENCES

1. Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, California, USA.
2. Box, G. E. P., & Jenkins, G. M. (1976). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, California, USA.
3. Iqbal, N., Bakhsh, K., Maqbool, A., & Ahmad, A. S. (2005). Use of the ARIMA Model for Forecasting Wheat Area and Production in Pakistan. Journal of Agriculture & Social Sciences, 1813-2235/01-2: pp. 120-122.
4. Debnath, M. K., Bera, K., & Mishra, P. (2013). Forecasting Area, Production and Yield of Cotton in India using ARIMA Model. Journal of Space Science & Technology, Volume 2, Issue, pp. 17-20.
5. Muhammad, F., Siddique, M., Bashir, M., & Ahmad, S. (1992). Forecasting Rice Production in Pakistan using ARIMA Models. J. of Animal Plant Sci., 2: pp. 27-31.
6. Rahman, N. M. F. (2010). Forecasting of Boro Rice Production in Bangladesh: An ARIMA Approach. Journal of Bangladesh Agricultural University, 8(1): pp. 103-112.
7. Pindyck, R. S., & Rubinfeld, D. L. (1991). Econometric Models and Economic Forecasts, 3rd ed. McGraw Hill International Editions (Economic Survey), New York, USA.
8. Wankhade, R., Mahalle, S., Gajbhiye, S., & Bodade, V. M. (2010). Use of the ARIMA Model for Forecasting Pigeon Pea Production in India. International Review of Business and Finance, 2(1): pp. 97-102.