A Hybrid ARIMA and Neural Network Model to Forecast Particulate Matter Concentration in Changsha, China

Guangxing He 1, Qihong Deng 2*
1 School of Energy Science and Engineering, Central South University, Changsha, Hunan, China
2* School of Energy Science and Engineering, Central South University, Changsha, Hunan, China
Corresponding email: qhdeng@csu.edu.cn

Abstract: The autoregressive integrated moving average (ARIMA) model has been a popular linear model for time series forecasting in recent years. Recent research on artificial neural networks (ANNs) suggests that ANNs can be a good choice when the relationship between predictor and predictand is not simply linear. Because particulate matter concentrations exhibit both linear and non-linear patterns, neither a purely linear nor a purely non-linear regression method is ideal for forecasting them. This paper therefore develops a hybrid methodology that combines ARIMA and ANN models to improve forecast accuracy. The influence of wind direction and traffic on particulate matter concentration was introduced into the model by defining a wind-weighted traffic road length density. Road length density, obtained using a geographic information system (GIS), was used as a proxy because traffic vehicle data were not available. To demonstrate the utility of the technique, daily average PM10 concentrations monitored at a site in Changsha in 2008 were used. First, an ARIMA model was used to model the linear component; then a neural network model was developed to model the residuals from the ARIMA model using wind-weighted traffic road information around the monitoring station. The results indicate that the hybrid model can be an effective way to improve PM10 forecasting accuracy compared with the single ARIMA model. The approach has the potential to be applied to other areas of the world.
Keywords: particle; air quality; GIS; ARIMA; ANN

1 Introduction

Researchers have adopted various tools to predict air pollutant concentrations. Many multi-parameter meteorological models have been observed to either under-predict or over-predict air pollutant concentrations. Time series analysis is a good choice for air quality forecasting. The ARIMA model is a regression technique that has been successfully applied to model air pollutants[1,2,3]. Although ARIMA models are quite flexible, in that they can represent several different types of time series, their major limitation is the pre-assumed linear form of the model. A linear approximation is not always satisfactory: air pollutant concentrations are influenced by several factors in the atmosphere, and prediction using linear models may not always give acceptable results. As an alternative, nonlinear models have been proposed in the literature. Artificial neural networks (ANNs) are one example of nonlinear models that have been widely applied to analyze and forecast air pollutant concentrations[4,5].

In this study, a hybrid methodology is therefore proposed to tackle the problem of modeling air pollutant time series that contain both linear and nonlinear patterns[6,7]. For this, concepts from the ARIMA model and nonlinear dynamical systems theory are utilized. The proposed technique is applied to the time series of PM10 concentrations in ambient air measured at a site in Changsha during 2008. To assess the forecasting efficiency of the proposed hybrid model, ARIMA and nonlinear models are also developed individually, and their results are compared with those of the hybrid model.

2 Methodology

2.1 Study area and available data

Changsha (111.54°E-114.25°E, 27.85°N-28.68°N), the capital of Hunan Province, has a population of 6.37 million and is located in the south of China. The average temperature is 25-30 °C in the summer months, with peak temperatures above 40 °C, and 0-10 °C in winter, with the lowest temperatures below 0 °C. Changsha is one of the most polluted cities in China, as reported by the Hunan Provincial Environmental Protection Bureau. The daily average concentration of PM10 was monitored using a TEOM 1400a during 2008 at the railway station monitoring station. The meteorological data, including wind speed, temperature, pressure and relative humidity, were obtained from the Changsha weather bureau. The locations of the monitoring station and the traffic roads are illustrated in Fig. 1.

Fig. 1 Location of the air quality monitoring station

2.2 ARIMA model

ARIMA is the most popular linear model for forecasting time series; it has dominated many areas of time series forecasting and has enjoyed great success during the last three decades. Because these models are so commonly applied, they are described here only briefly. The linear function is based upon three parametric linear components: autoregression (AR), integration (I), and moving average (MA). ARIMA models can also include external independent (predictor) variables. In this study, the meteorological variables were not included in the prediction model. The ARIMA model was obtained using the Time Series Forecasting System tool of the SPSS 16.0 software.

2.3 Artificial neural networks (ANN) model

An ANN can be viewed as a computer system made up of several simple and highly interconnected processing elements (McClelland, 1986), which process information by their dynamic state response to inputs, as illustrated in Fig. 2. ANNs provide a powerful tool for problems that are difficult to solve by traditional approaches. The most extensively studied and used ANN models are multilayer feed-forward networks, which allow information to pass only from one layer to the next consecutive layer. Each neuron receives incoming signals from external variables or from every neuron in the previous layer, and a weight is associated with each incoming signal. For this model, the meteorological variables are the additional independent variables used to define the patterns of the air quality time series.

Fig. 2 Structure of the ANN model (input layer, hidden layer, output layer)

2.4 Hybrid ARIMA ANN model

The ARIMA and ANN models were combined so that each model's capability could be used to capture different patterns in the air quality data. The methodology consisted of two steps: in the first step, an ARIMAX model was developed to forecast PM10; in the second step, an ANN model was developed to describe the residuals from the ARIMAX model. The hybrid model was built using the SPSS and MATLAB software.
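The two-step procedure of Section 2.4 can be illustrated with a minimal Python sketch. It is not the SPSS/MATLAB workflow used in the study, and it makes several simplifying assumptions: the linear step is a plain autoregressive model fitted by least squares (standing in for the ARIMA/ARIMAX model), and the nonlinear step is a single tanh hidden layer with fixed random weights whose output layer is solved by least squares (standing in for a trained ANN). The function names `fit_ar`, `fit_residual_net`, `hybrid_fit` and `rmse`, the lag order `p`, and the hidden layer size are hypothetical choices for illustration, and the fit is in-sample only.

```python
import numpy as np


def fit_ar(y, p):
    """Fit y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} by least squares.

    Assumption: a plain AR(p) model stands in for the ARIMA model of the paper.
    Returns the coefficients [c, phi_1..phi_p] and the fitted values for y[p:].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column i holds the series lagged by i steps, aligned with the targets y[p:].
    lags = [y[p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef, X @ coef


def fit_residual_net(features, resid, n_hidden=8, seed=0):
    """Model residuals with one tanh hidden layer.

    Assumption: hidden weights are random and fixed; only the output layer is
    fitted (by least squares). This replaces iterative ANN training.
    """
    F = np.asarray(features, dtype=float).reshape(len(features), -1)
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-12  # standardize the inputs
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(F.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)

    def hidden(Z):
        Z = (np.asarray(Z, dtype=float).reshape(len(Z), -1) - mu) / sd
        # Bias column plus the hidden-layer activations.
        return np.column_stack([np.ones(len(Z)), np.tanh(Z @ W + b)])

    beta, *_ = np.linalg.lstsq(hidden(F), np.asarray(resid, dtype=float), rcond=None)
    return lambda Z: hidden(Z) @ beta


def hybrid_fit(y, exog, p=2):
    """Step 1: linear fit. Step 2: network on the linear residuals.

    Returns (linear fitted values, hybrid fitted values) for y[p:].
    """
    y = np.asarray(y, dtype=float)
    _, linear = fit_ar(y, p)
    resid = y[p:] - linear                 # what the linear model missed
    net = fit_residual_net(exog[p:], resid)
    return linear, linear + net(exog[p:])  # hybrid = linear part + residual part


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))
```

Here `exog` would hold the explanatory inputs for the residual model, one row per day (for example the wind-weighted road length), and `rmse(y[p:], linear)` can be compared with `rmse(y[p:], hybrid)`. Because the output layer is a least-squares fit, the in-sample error of the hybrid cannot exceed that of the linear step alone; out-of-sample performance has to be checked separately on held-out data.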
2.5 The calculation of wind-weighted road length

To measure the impact of traffic on air pollution concentrations, road length was summed over circular buffers around the sampling location using GIS functions. Circular buffers with radii from 100 m to 500 m, at intervals of 100 m, were calculated. To account for the influence of wind direction on pollution concentrations, sixteen wind directions were considered, and each buffer was accordingly sub-divided into sixteen sectors. Hourly wind direction data were downloaded from the weather bureau. To calculate the wind-weighted road length, only the road length in the up-wind direction was counted. This approach is based on the assumption that the air pollution concentration is affected only by up-wind sources; for example, when the wind direction was north, the air pollution at the sampling location was regarded as being affected only by the roads to the north.

3 Results and discussion

Fig. 3 Daily variation of PM10 at Changsha in 2008

The time series of PM10 concentration observed in 2008 is plotted in Fig. 3. The daily average concentration of PM10 varies between about 15 and 400 μg/m³. The maximum concentrations were observed in winter and the minimum concentrations in summer. On most days in the winter months, the levels exceeded the regulatory limit stipulated by the Chinese national standard (150 μg/m³). Fig. 3 reveals that the observed concentrations fluctuate strongly, with low values during July, August and September. A possible reason is that the high atmospheric pressure in winter does not favor the diffusion and dilution of air pollutants. In contrast, the low pressure in summer leads to deposition as well as dispersion of pollutants, resulting in low concentration values.
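As an illustration of the wind-weighted road length described in Section 2.5, the following Python sketch maps each hourly wind direction to one of sixteen sectors and averages the road length of the up-wind sector over the day. Several details are assumptions made for the sketch rather than taken from the study: wind direction is given in degrees as the direction the wind blows from, sector 0 is centered on north, the daily value is the mean over the available hours, and the names `sector_index` and `wind_weighted_road_length` are hypothetical. The per-sector road lengths are assumed to have been extracted beforehand with GIS tools for one buffer radius.

```python
import numpy as np

N_SECTORS = 16  # sixteen wind directions, each sector 22.5 degrees wide


def sector_index(wind_dir_deg):
    """Map a wind direction in degrees (0 = north, 90 = east) to a sector 0..15.

    Assumption: sector 0 is centered on north, so it spans 348.75 to 11.25 degrees.
    """
    width = 360.0 / N_SECTORS
    return int(((wind_dir_deg % 360.0) + width / 2.0) // width) % N_SECTORS


def wind_weighted_road_length(hourly_dirs_deg, sector_road_length_m):
    """Mean up-wind road length (m) over the hours supplied for one day.

    sector_road_length_m: road length inside each of the 16 sectors of one buffer.
    Only the sector the wind blows from contributes in each hour.
    """
    lengths = np.asarray(sector_road_length_m, dtype=float)
    if lengths.shape != (N_SECTORS,):
        raise ValueError("expected one road length per sector (16 values)")
    if len(hourly_dirs_deg) == 0:
        raise ValueError("need at least one hourly wind direction")
    idx = [sector_index(d) for d in hourly_dirs_deg]
    return float(lengths[idx].mean())
```

Dividing the result by the sector area would give a density rather than a length; whether and how the study normalizes by area is not specified in the text, so that step is left out here.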
Fig. 4 Comparison of computed and observed PM10 concentration: (a) Model 1; (b) Model 2

The results of the two models are given in Fig. 4 (a) and (b), respectively. The model using only meteorological variables is not shown in the present work. Fig. 4 shows that the forecasted values of both models follow the observed trend quite well. However, Model 1 slightly under-predicts the observed concentrations, especially the highest concentration in January. The maximum concentration predicted by Model 1 is 311.5 μg/m³ and the minimum is 18.7 μg/m³. The maximum concentration predicted by Model 2 is 322.9 μg/m³ and the minimum is 10.8 μg/m³. From the above analysis, Model 2 is the better of the two prediction models.

4 Conclusions

In this paper, a hybrid approach was developed to forecast air pollutant concentrations by combining the distinct strengths of a linear autoregressive model and a nonlinear model. An ARIMA model was used to analyze the linear part of the problem, and the residuals from the ARIMA model, which carry the non-linear part, were then modeled using a neural network model. The hybrid model was applied to the time series of PM10 concentration, and the autoregressive and nonlinear models were also applied individually for comparison. The prediction performance results show that the hybrid model performed better when road length was considered together with meteorological parameters than the models using meteorological parameters alone. The developed model can also be applied to predict the concentrations of other pollutants. The comparison between forecasted and observed values for Changsha suggests that this model can be reliably used for air pollution prediction in other cities of the world, provided that vector road data and some meteorological parameters are available.

Acknowledgments

This study was supported by the National Key Project of Scientific and Technical Supporting Programs of China (No. 2008BAJ12B03).

References

[1] Saffarini G and Odat S. 2008. Time series analysis of air pollution in Al-Hashimeya Town Zarqa. Journal of Earth and Environmental Sciences, 1, 63-72.
[2] Ismail M, Mohd Z.I, et al. 2011. Time series analysis of surface ozone monitoring records in Kemaman, Malaysia, 40, 411-417.
[3] Kumar A and Goyal P. 2011. Forecasting of daily air quality index in Delhi. Science of the Total Environment, 409, 5517-5523.
[4] Jiang D, Zhang Y, et al. 2004. Progress in developing an ANN model for air pollution index forecast. Atmospheric Environment, 38, 7055-7064.
[5] Shiva S.M.N and Khare M. 2005. Modelling urban air quality using artificial neural network. Clean Technology Environment Policy, 7, 116-126.
[6] Luis A. D, Ortega J.C, et al. 2008. A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: The case of Temuco, Chile. Atmospheric Environment, 42, 8331-8340.
[7] Faruk D.O. 2010. A hybrid neural network and ARIMA model for water quality time series prediction. Engineering Applications of Artificial Intelligence, 23, 586-594.