Forecasting Area, Production and Yield of Cotton in India using ARIMA Model

M. K. Debnath 1, Kartic Bera 2 *, P. Mishra 1
1 Department of Agricultural Statistics, Bidhan Chandra Krishi Vishwavidyalaya, Mohanpur, Nadia, West Bengal, India
2 Remote Sensing & GIS, Vidyasagar University, Midnapur, West Bengal, India

Abstract
Cotton is an important crop in India. India is the second largest exporter of cotton behind the US. This study focuses on forecasting the cultivated area, production and yield of cotton in India using the Autoregressive Integrated Moving Average (ARIMA) model. Time series data covering the period 1950–2010 were used for the study. The study revealed that ARIMA (0, 1, 0), ARIMA (1, 1, 4) and ARIMA (0, 1, 1) are the best fitted models for forecasting cotton area, production and yield in India, respectively. The analysis shows that, if the present growth rates continue, the cotton area, production and yield in the year 2020 will be 10.92 million hectares, 39.19 million bales of 170 kg each and 527 kg/hectare, respectively.

Keywords: ARIMA, forecasting, auto correlation function, Akaike information criterion

*Author for Correspondence E-mail: 1kbrsgis@gmail.com

INTRODUCTION
Cotton is one of the most important crops in the world. India ranks first in the world in respect of acreage, with about 8 million hectares under cotton cultivation, and fourth in total seed cotton production. In India, Gujarat is the largest producer of cotton. Cotton requires an average annual temperature of over 16ºC and an annual rainfall of at least 50 cm distributed throughout the growing season. A daily minimum temperature of 16ºC is required for germination and 21–27ºC for proper vegetative growth. It can tolerate temperatures as high as 43ºC but does not do well if the temperature falls below 21ºC. Crop area estimation and forecasting of crop yield are essential procedures in supporting policy decisions regarding land use allocation, food security and environmental issues.
Statistical techniques are able to provide crop forecasts with reasonable precision well in advance. Various approaches have been used for forecasting such agricultural systems. Attention has been given to the univariate time series models known as autoregressive integrated moving average (ARIMA) models, which are primarily due to the work of Box and Jenkins (1970). Among the stochastic time series models, ARIMA types are very powerful and popular, as they can successfully describe the observed data and can make forecasts with minimum forecast error. These types of models are, however, very difficult to identify and estimate. Muhammad et al. conducted an empirical study of modeling and forecasting time series data of rice production in Pakistan [3]. Similar studies have been done by Rachana et al. (2010) [6] for forecasting pigeon pea production in India using ARIMA modeling and by N. M. F. Rahman [5] for forecasting boro rice production in Bangladesh. Najeeb Iqbal et al. [2] also used the ARIMA model for forecasting wheat area and production in Pakistan. To reveal the growth pattern and to make the best forecast of cotton area, production and yield in India, an appropriate time series model that is able to describe the observed data successfully is necessary.

MATERIALS AND METHODS
The present study has been carried out on the basis of cotton area, production and yield data pertaining to the period 1950–51 to 2010–11, which were collected from a secondary source (Directorate of Economics and Statistics, Department of Agriculture and Cooperation).

JoPC(2013) 16-20 STM Journals 2013. All Rights Reserved Page 16
Forecasting Parameters of Cotton using ARIMA Model Debnath et al.

The time series data of cotton area, production and yield were modeled by a Box-Jenkins type stochastic autoregressive integrated moving average (ARIMA) process. In general, an ARIMA model is characterized by the notation ARIMA (p, d, q), where p, d and q denote the orders of auto-regression, integration (differencing) and moving average, respectively. A first order auto-regressive process is denoted by ARIMA (1,0,0) or simply AR(1) and is given by

$$y_t = \mu + \phi_1 y_{t-1} + \varepsilon_t$$

and a first order moving average process is denoted by ARIMA (0,0,1) or simply MA(1) and is given by

$$y_t = \mu - \theta_1 \varepsilon_{t-1} + \varepsilon_t.$$

Alternatively, the model ultimately derived may be a mixture of these processes and of higher orders as well. Thus a stationary ARMA (p, q) process is defined by the equation

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where the $\varepsilon_t$ are independently and normally distributed with zero mean and constant variance $\sigma^2$ for $t = 1, 2, \dots, n$.

ARIMA models are often written in backshift notation. The Box-Jenkins type ARIMA process [4] can be defined as

$$\phi(B)\,(\Delta^d y_t - \mu) = \theta(B)\,\varepsilon_t$$

Here, $y_t$ denotes cotton area, production and yield in million hectares, million bales and kg/hectare, respectively; $\mu$ is the mean of $\Delta^d y_t$; $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$; $\theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q$; $\theta_i$ denotes the i-th moving average parameter; $\phi_i$ denotes the i-th autoregressive parameter; and $\Delta$ and $B$ denote the difference and back-shift operators, respectively.

The basic stages involved in developing an ARIMA model are the identification, estimation, diagnostic checking and forecasting stages.
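As an illustration of the notation above, the following minimal Python sketch applies the difference operator $\Delta^d$ and simulates an ARIMA (0, 1, 1) series, i.e., $\Delta y_t - \mu = \varepsilon_t - \theta_1 \varepsilon_{t-1}$. The function names, the parameter values and the use of NumPy are assumptions made for this example only; they are not taken from the study.

```python
import numpy as np


def difference(y, d=1):
    """Apply the difference operator d times: (1 - B)^d y_t."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = y[1:] - y[:-1]
    return y


def simulate_arima_011(n, theta1, mu=0.0, sigma=1.0, y0=0.0, seed=0):
    """Simulate y_0..y_n with (1 - B) y_t - mu = eps_t - theta1 * eps_{t-1}.

    All argument names and defaults are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + 1)   # white noise eps_0..eps_n
    w = mu + eps[1:] - theta1 * eps[:-1]       # differenced series: MA(1) around mu
    return y0 + np.concatenate(([0.0], np.cumsum(w)))  # integrate once


y = simulate_arima_011(n=60, theta1=0.4, mu=0.1)
w = difference(y, d=1)   # recovers the stationary MA(1) part of the series
print(len(y), len(w))
```

Differencing the simulated series once returns the stationary MA(1) component, which is the sense in which d = 1 in ARIMA (p, 1, q).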
Appropriate values for p, d and q in ARIMA modeling can be partially resolved by looking at the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the series [5]. A stationary series is one whose values vary over time only around a constant mean with constant variance. There are several ways to ascertain this. The most common method is to check stationarity by examining the time plot of the data. Stationarity can also be judged from the graph of the ACF: if the ACF of the time series values either cuts off fairly quickly or dies down fairly quickly, the series should be considered stationary; if the ACF dies down extremely slowly, the series should be considered non-stationary.

The second step is to estimate the parameters of the model. Here, the method of maximum likelihood is used for this purpose.

The third step is to check whether the chosen model fits the data reasonably well. For this purpose the residuals are examined to find out whether they are white noise, using the ACF of the residuals and the Ljung and Box (1978) statistic. When two or more competing models pass the diagnostic checks, the best fitted model is selected using the following criteria: multiple R², root mean squared error (RMSE), Akaike Information Criterion (AIC), Schwarz Bayesian Criterion (SBC), normalized BIC, mean absolute error (MAE) and mean absolute percentage error (MAPE). Using the selected ARIMA (p, d, q) models, forecasts from 2011 up to 2020 are made.

RESULTS AND DISCUSSION
The ARIMA model was applied in four steps, namely model specification, model estimation, diagnostic checking and forecasting. Fifty-five years of data on cotton area, production and yield were used for model building, and the last five years of data were used for model validation.
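The ACF-based stationarity check and the Ljung-Box residual test described above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, $r_k = \sum_t (x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$ and $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$. The function names are assumptions of this example, and it is not the SPSS procedure used in the study.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[k:], x[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)


def ljung_box_q(resid, nlags):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(resid)
    r = sample_acf(resid, nlags)[1:]
    k = np.arange(1, nlags + 1)
    return n * (n + 2) * np.sum(r ** 2 / (n - k))


# Toy series used only to show the calls; not data from the study.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(toy, 1)[1], ljung_box_q(toy, 1))
```

For the residuals of a fitted ARMA (p, q) model, Q is usually compared with a chi-square distribution with h - p - q degrees of freedom, where h is the number of lags. A small Q (large p-value) is consistent with white-noise residuals.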
The model specification involved the plots of the autocorrelation function (ACF), the partial autocorrelation function (PACF) and the differenced series. The autocorrelation functions of the first-differenced time series presented in Figure 1 indicate stationarity for cotton area, production and yield, as the autocorrelations decline faster than those of the undifferenced series. The ACFs of all the first-differenced series decline rapidly.

Model Identification and Diagnostic Checking
Based on the ACF and PACF plots of the series and their theoretical properties, the orders of the auto-regressive and moving average processes for the cotton area, production and yield series were selected by estimating ARIMA models at different values of p, d and q using SPSS 17. Models were compared by the minimum values of root mean squared error (RMSE), Akaike Information Criterion (AIC), Schwarz's Bayesian Criterion (SBC), normalized BIC, mean absolute error (MAE) and mean absolute percentage error (MAPE), and by the maximum value of R². The candidate models are ARIMA (0,1,0), ARIMA (0,1,3) and ARIMA (0,1,4) for cotton area; ARIMA (0,1,1), ARIMA (0,1,4) and ARIMA (1,1,4) for cotton production; and ARIMA (0,1,1), ARIMA (0,1,3) and ARIMA (1,1,1) for cotton yield. Table 1 reveals that ARIMA (0, 1, 0), ARIMA (1, 1, 4) and ARIMA (0, 1, 1) are the best fitted models for forecasting cotton area, production and yield in India, respectively.

Fig. 1: ACF and PACF of Residuals of Fitted Model for Cotton Area, Production and Yield.

Table 1: Diagnostic Tools and Model Selection Criteria for Cotton Area, Production and Yield of Best Fitted Models.

Series      Model          R2     RMSE     MAPE    MAE      Normalized BIC
Area        ARIMA (0,1,0)  0.994  0.246    20.590  0.170    -2.110
Area        ARIMA (0,1,3)  0.991  0.278    17.380  0.180    -2.060
Area        ARIMA (0,1,4)  0.992  0.265    18.830  0.180    -2.060
Production  ARIMA (0,1,1)  0.937  0.760    22.540  0.450    -0.260
Production  ARIMA (0,1,4)  0.958  0.650    23.800  0.400    -0.270
Production  ARIMA (1,1,4)  0.958  0.650    23.390  0.400    -0.270
Yield       ARIMA (0,1,1)  0.202  165.100  14.390  116.320  10.510
Yield       ARIMA (0,1,3)  0.269  163.060  14.550  118.750  10.690
Yield       ARIMA (1,1,1)  0.269  163.060  14.550  118.750  10.690

Note: A criterion value in bold numerals indicates that the model is better than the other models with respect to that criterion.
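The accuracy criteria used in this comparison (RMSE, MAE and MAPE) can be computed as in the minimal Python sketch below, which uses their standard definitions. The function names and the three observed/predicted pairs are hypothetical and serve only to show the arithmetic; they are not values from Table 1. The exact variants implemented in SPSS may differ in detail.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error (undefined if an observed value is 0)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Hypothetical values, chosen only to make the arithmetic easy to follow.
obs = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 40.0]
print(rmse(obs, pred), mae(obs, pred), mape(obs, pred))
```

Lower values of all three criteria indicate a closer fit. Because MAPE is scale-free, it can be compared across the area, production and yield series, whereas RMSE and MAE are in the units of each series.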

Model Estimation
At the identification stage, one or more models are tentatively chosen that seem to provide a statistically adequate representation of the available data. Precise estimates of the parameters of the fitted models are then obtained by least squares, as advocated by Box and Jenkins, using standard computer packages such as SPSS 17 and MINITAB; the estimates are presented in Table 2. The calculated chi-square values in Table 3 show that there is no significant difference between the observed and predicted values for the area, production and yield of cotton. This implies that the selected models, i.e., ARIMA (0, 1, 0), ARIMA (1, 1, 4) and ARIMA (0, 1, 1), are the best fitted models for forecasting cotton area, production and yield in India, respectively.

Table 2: Estimation of Parameters for Cotton Area, Production and Yield of Best Fitted Models.

Series      Model          Type      Coefficient  Standard Deviation  t value
Area        ARIMA (0,1,0)  Constant  -3.118       7.163               -0.435
Area        ARIMA (0,1,3)  MA1       -0.0258      0.1327              -0.19
                           MA2       0.1805       0.1326              1.36
                           MA3       -0.2018      0.1345              -1.5
                           Constant  0.08847      0.06338             1.4
Area        ARIMA (0,1,4)  MA1       0.0173       0.1371              0.13
                           MA2       0.1487       0.1382              1.08
                           MA3       -0.178       0.1381              -1.29
                           MA4       0.0934       0.1401              0.67
                           Constant  0.08314      0.05632             1.48
Production  ARIMA (0,1,1)  MA1       -0.0177      0.1635              -0.11
                           Constant  0.5103       0.2641              1.93
Production  ARIMA (0,1,4)  MA1       0.041        0.1581              0.26
                           MA2       0.3376       0.1406              2.4
                           MA3       -0.5299      0.1421              -3.73
                           MA4       0.0762       0.1632              0.47
                           Constant  0.4738       0.2628              1.8
Production  ARIMA (1,1,4)  AR1       0.8853       1.7708              0.5
                           MA1       0.9018       1.7714              0.51
                           MA2       0.3076       0.1943              1.58
                           MA3       -0.7473      0.5836              -1.28
                           MA4       0.3683       0.8808              0.42
                           Constant  0.0604       0.1103              0.55
Yield       ARIMA (0,1,1)  MA1       0.2018       0.1421              1.42
                           Constant  6.829        3.332               2.05
Yield       ARIMA (0,1,3)  MA1       0.2175       0.1356              1.6
                           MA2       0.0992       0.1386              0.72
                           MA3       -0.3243      0.1503              -2.16
                           Constant  6.615        4.152               1.59
Yield       ARIMA (1,1,1)  AR1       0.1778       0.7509              0.24
                           MA1       0.3598       0.703               0.51
                           Constant  5.608        2.695               2.08
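The chi-square validation reported in Table 3 can be sketched as follows. The paper does not state how the test was computed, so this example rests on two assumptions: a Pearson statistic $\sum (O - E)^2 / E$ with the predicted values as expected values, and n - 1 = 4 degrees of freedom. The function names are likewise hypothetical. The observed and predicted values are those of Table 3.

```python
import math


def pearson_chi_square(observed, predicted):
    """Pearson statistic with the predicted values taken as expected values."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Observed and predicted values for 2006-2010, copied from Table 3.
table3 = {
    "area": ([9.14, 9.41, 9.41, 10.31, 11.00], [9.67, 9.21, 9.55, 10.31, 10.74]),
    "production": ([22.63, 25.88, 22.28, 23.93, 33.50], [24.82, 26.64, 23.03, 24.41, 34.79]),
    "yield": ([421, 467, 403, 395, 518], [423.73, 467.41, 407.67, 400.55, 516.55]),
}

p_values = {}
for name, (obs, pred) in table3.items():
    stat = pearson_chi_square(obs, pred)
    p_values[name] = chi2_sf_even_df(stat, df=len(obs) - 1)  # df = 4 is an assumption
    print(name, round(stat, 4), round(p_values[name], 3))
```

Under these assumptions the three p-values round to 1.000, 0.990 and 0.997, which agrees with the last row of Table 3. This suggests, though it does not confirm, that the row reports p-values rather than test statistics. Values this close to 1 indicate no significant difference between observed and predicted values.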

Table 3: Validation of Model Using χ² Test.

          Area (million ha)     Production (million bales)  Yield (kg/ha)
Year      Observed  Predicted   Observed  Predicted         Observed  Predicted
2006      9.14      9.67        22.63     24.82             421       423.73
2007      9.41      9.21        25.88     26.64             467       467.41
2008      9.41      9.55        22.28     23.03             403       407.67
2009      10.31     10.31       23.93     24.41             395       400.55
2010      11.00     10.74       33.50     34.79             518       516.55
χ² value  1.000                 0.990                       0.997

Forecasts of Cotton Area, Production and Yield
Ten-year forecasts of cotton area, production and yield were estimated using the best models and are presented in Table 4. Table 4 shows that the forecast cotton area, production and yield for the year 2011 are 10.72 million hectares, 31.90 million bales of 170 kg each and 516 kg/hectare, respectively. The analysis found that, if the present growth rates continue, the cotton area, production and yield in the year 2020 will be 10.92 million hectares, 39.19 million bales of 170 kg each and 527 kg/hectare, respectively.

Table 4: Forecast of Cotton Area, Production and Yield in India for the Period of 2011 to 2020 at 95% Level.

Year  Area (million ha)  Production (million bales)  Yield (kg/ha)
2011  10.72              31.90                       516
2012  10.81              32.56                       515
2013  10.65              35.35                       515
2014  10.52              36.34                       516
2015  10.58              36.50                       517
2016  10.65              37.24                       518
2017  10.71              37.58                       520
2018  10.77              38.20                       522
2019  10.84              38.63                       524
2020  10.90              39.19                       527

This projection is important as it helps inform good policies with respect to relative production, price structure and consumption of cotton in the country. The conclusion from the study is that the total cropped area can be increased in future if land reclamation and conservation measures are adopted. The projection shows that cotton will play a vital role in improving the economic growth of India in future.

REFERENCES
1. Box, G. E. P., G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day: San Francisco, California, USA. 1976.
2. Iqbal, N., K. Bakhsh, A. Maqbool, A. S. Ahmad. Use of the ARIMA Model for Forecasting Wheat Area and Production in Pakistan. Journal of Agriculture & Social Sciences. 2005; 1813–2235/01–2: 120–122p.
3. Muhammad, F., M. Siddique, M. Bashir, S. Ahmad. Forecasting Rice Production in Pakistan using ARIMA Models. J. of Animal Plant Sci. 1992; 2: 27–31p.
4. Pindyck, R. S., D. L. Rubinfeld. Econometric Models and Economic Forecasts, 3rd ed. McGraw Hill International editions (Economic Survey): New York, USA. 1991.
5. Rahman, N. M. F. Forecasting of Boro Rice Production in Bangladesh: An ARIMA Approach. J. Bangladesh Agril. Univ. 2010; 8(1): 103–112p.
6. Wankhade, R., S. Mahalle, S. Gajbhiye, V. M. Bodade. Use of the ARIMA Model for Forecasting Pigeon Pea Production in India. International Review of Business and Finance. 2010; ISSN 0976–5891, 2(1): 97–102p.