arXiv:0811.0659v1 [stat.ME] 5 Nov 2008

Estimation of missing data by using the filtering process in a time series modeling

Ahmad Mahir R. and Al-khazaleh A. M. H.
School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Selangor D. E., MALAYSIA
e-mail: mahir@ukm.my
e-mail: ahmed 2005kh@yahoo.com

Abstract: This paper proposes a new method for estimating missing data by using a filtering process. We use a dataset without missing data and the same dataset with randomly missing data to evaluate the new method, applying the Box-Jenkins modeling technique to predict monthly average rainfall for site 5504035, Lahar Ikan Mati at Kepala Batas, P. Pinang station in Malaysia. The rainfall data were collected at the station from 1 January 1969 to 31 December 1997. The data used to develop the rainfall prediction model were represented by an autoregressive integrated moving-average (ARIMA) model. The model for both datasets was ARIMA(1, 0, 0)(0, 1, 1)_s. The result was checked against the naive model using Theil's U statistic, which was found to be U = 0.72086 for the complete data and U = 0.726352 for the data with missing values, which means they were good models.

Keywords and phrases: ARIMA model, monthly average rainfall, filtering process, forecasting method.

1. Introduction

A time series is a set of observations recorded over time. Autoregressive integrated moving average (ARIMA) models are especially suited to short-term forecasting because most ARIMA models place heavy emphasis on the recent past rather than the distant past. An observed series theoretically consists of two parts: the series generated by the real process, and noise that results from outside disturbances. Eliminating this noise is the main aim of time series analysis. Early developments in eliminating noise came from introducing the autoregressive and moving average approaches (ARIMA).
The Box-Jenkins procedure consists of several stages: identification, estimation, diagnostic checking and forecasting. Identifying an appropriate Box-Jenkins model involves transforming the data, if necessary, into a stationary time series and determining a tentative model by analyzing the autocorrelation and partial autocorrelation functions (2; 5). Estimates of the constant and the coefficients of the equation must then be obtained. The main purpose of this investigation is to analyze the data collected automatically, to evaluate a predictive model, and then to produce a set of forecasts for the site at which the data were collected. According to Pankratz, the Box-Jenkins method produced the best forecast for 74% of the series that he evaluated (4). The cost associated with the Box-Jenkins approach in a given situation is generally greater than that of many other quantitative methods. The Box-Jenkins model is the most general way of approaching forecasting: unlike other models, it does not require assuming a fixed pattern initially, and it is not limited to a specific kind of pattern. These models can be fitted to any set of time series data by selecting appropriate values of the parameters p, d, q to suit the individual series.

A problem frequently encountered in data collection is missing observations, or observations that are virtually impossible to obtain because of time or cost constraints. To replace such observations, several options are available to the researcher: replace them with the mean of the series; replace them with the naive forecast; replace them with a simple trend forecast; or replace them with the average of the last two known observations that bound the missing observations.

2. Description of the data set

The rainfall data were collected at the station from 1 January 1969 to 31 December 1997. In this research, rainfall amounts were collected and recorded daily. The monthly averages were calculated by summing the rainfall recorded in a particular month and dividing by the number of days in that month, for each year.

3. Methodology for missing observations

A problem frequently encountered in data collection is a missing observation in a data series.
To replace such an observation, several options are available to the researcher. First, replace it with the mean of the series; this mean can be calculated over the entire range of the sample. Second, replace it with the naive forecast. The naive model is the simplest form of univariate forecast model: it uses the current period's value as the forecast for the next period, that is, $\hat{Y}_{t+1} = Y_t$. Third, replace it with a simple trend forecast. This is accomplished by estimating a regression equation of the form $Y_t = a + b\,t$ (where t is time) for the periods prior to the missing value, and then using the equation to fit the missing periods. Finally, replace it with the average of the last two known observations that bound the missing observations. This paper suggests a new method of estimating the missing data by using the filtering process (1). The filtering process is:

$$y_t = \frac{1}{\sum_{i=0}^{M} w_{i+1}} \sum_{i=0}^{M} w_{i+1}\, x_{t-i} = \sum_{i=0}^{M} w'_{i+1}\, x_{t-i} \tag{1}$$

where $w'_{i+1} = w_{i+1} / \sum_{i=0}^{M} w_{i+1}$ is the weight and M is the number of observations in the moving average. We substitute $\phi^{i+1}$ for the weight $w'_{i+1}$, where $\phi$ is the correlation of the entire data. Therefore, the corresponding moving average is

$$y_t = \phi x_t + \phi^2 x_{t-1} + \dots + \phi^{M} x_{t-m} \tag{2}$$

where $x_t$ are the original observations. We transformed the complete data by using equation (2) and built an appropriate model. We then assumed that there were holes spaced randomly in the data. If $y_s$ was missing (where s is the index of the hole), we substituted the average of the complete data for $x_s$ and then calculated the value

$$y_s = \phi \bar{y} + \phi^2 x_{s-1} + \dots + \phi^{M} x_{s-m} \tag{3}$$

We then built the model for the data that contained the holes, applying the same model to the new data. In the next section we compare the results for the two datasets using the Box-Jenkins ARIMA model.

4. Box-Jenkins ARIMA models

The Box-Jenkins method is a procedure for modeling a time series in terms of past values of the time series variable and past values of the error terms. The Box-Jenkins approach consists of extracting the predictable component from the observed data through a series of iterations. The most common ARIMA model includes three parameters, p, d and q, where p is the number of autoregressive parameters, d is the order of differencing and q is the number of moving average parameters. A general ARIMA model has the form

$$z_t = C + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \dots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q} \tag{4}$$

where:
t is the time period;
$z_t$ is the numerical value of an observation;
$\phi_i$, for i = 1, 2, ..., p, are the autoregressive parameters;
$\theta_j$, for j = 1, 2, ..., q, are the moving average parameters;
$a_t$ is the shock element at time t.

To estimate the parameters $\phi_i$ and $\theta_j$ for fixed p and q, we perform the linear multiple regression

$$\hat{z}_t = \mu + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \dots + \phi_p z_{t-p} - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q} \tag{5}$$
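Returning to the filtering step of Section 3, equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `filter_series` and `filter_with_holes` are hypothetical. The window is assumed to hold exactly M terms (lags 0 to M-1), the fill value is the mean of the observed values rather than of a separately available complete series, and $\phi$ is passed in as a plain number because the text does not say how "the correlation of the entire data" is computed.

```python
from statistics import mean


def filter_series(x, phi, M):
    """Weighted moving average in the spirit of equation (2).

    y[t] = phi*x[t] + phi**2*x[t-1] + ... + phi**M*x[t-M+1]
    Assumption: the window holds exactly M terms, so the first M-1
    positions have no full window and are returned as None.
    """
    y = []
    for t in range(len(x)):
        if t < M - 1:
            y.append(None)
            continue
        y.append(sum(phi ** (i + 1) * x[t - i] for i in range(M)))
    return y


def filter_with_holes(x, phi, M):
    """Equation (3): fill each missing value (None) with the series mean.

    Assumptions: the mean is taken over the observed values only, and
    the filled value is also used when a hole falls inside later windows.
    """
    fill = mean(v for v in x if v is not None)
    filled = [fill if v is None else v for v in x]
    return filter_series(filled, phi, M)


if __name__ == "__main__":
    # Hypothetical toy data, not rainfall values from the paper.
    complete = [2.0, 4.0, 6.0, 8.0]
    with_hole = [2.0, 4.0, None, 8.0]
    print(filter_series(complete, phi=0.5, M=2))
    print(filter_with_holes(with_hole, phi=0.5, M=2))
```

Comparing the two printed series on toy data mirrors the paper's design: the same filter is applied to the complete series and to the series with holes, and the downstream models are then compared.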

Fig 1. Plot of the original data

Fig 2. Plot of the transformed data

There are two phases in the identification of an appropriate Box-Jenkins model: transforming the data, if necessary, into a stationary time series, and determining a tentative model by observing the behavior of the autocorrelation and partial autocorrelation functions. A stationary time series contains no trend; that is, it fluctuates around a constant mean. The time series plot (Figure 1, the data without transformation or differencing) shows that the rainfall data in Pinang needed a transformation. Taking logarithms transforms the series into a stationary time series, as can be seen in Figure 2. The first differencing was applied to the seasonal part by subtracting observations twelve periods apart, that is, $z_t = \nabla_{12} Y_t = Y_t - Y_{t-12}$. Differencing can be written in terms of the backshift operator B, defined by $B z_t = z_{t-1}$. After transformation, it is clear that the observations fluctuate around a constant mean.

Box and Jenkins suggest computing no more than n/4 autocorrelations. The autocorrelation coefficient measures the correlation between a set of observations and a lagged set of observations in a time series. The autocorrelation between $z_t$ and $z_{t+k}$ measures the correlation between the pairs $(z_1, z_{1+k}), (z_2, z_{2+k}), \dots, (z_{n-k}, z_n)$. The sample autocorrelation coefficient $r_k$ is an estimate of $\rho_k$, where

$$r_k = \frac{\sum_t (z_t - \bar{z})(z_{t+k} - \bar{z})}{\sum_t (z_t - \bar{z})^2} \tag{6}$$

with $z_t$ the data from the stationary time series, $z_{t+k}$ the data k time periods ahead of t, and $\bar{z}$ the mean of the stationary time series.

The estimated partial autocorrelation function (PACF) is used as a guide, along with the estimated autocorrelation function (ACF), in choosing one or more ARIMA models that might fit the available data. The idea of partial autocorrelation analysis is to measure how $\hat{z}_t$ and $\hat{z}_{t+k}$ are related. The equation that gives a good estimate of the partial autocorrelation is

$$\hat{\phi}_{kj} = \hat{\phi}_{k-1,j} - \hat{\phi}_{kk}\,\hat{\phi}_{k-1,k-j} \tag{7}$$

for k = 3, 4, ...; j = 1, 2, ..., k-1.

The multiplicative seasonal ARIMA model (p, d, q)(P, D, Q)_s is a generalization that extends the method to series in which a pattern repeats seasonally over time, where the parameters (p, d, q) are for the nonseasonal part and the parameters (P, D, Q)_s are for the seasonal part. Once a stationary time series has been selected (the ACF cuts off or dies down quickly), we can identify a tentative model by examining the behavior of the ACF and PACF. In a mixed model, both the ACF and the PACF die down exponentially. The shapes of the ACF and PACF for the seasonal model are shown in Figures 3 and 4.

Table 1. Parameters for complete data

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MA(1,1)     0.85667    0.02923          29.31     <0.0001           12
AR(1,1)     0.15889    0.05410          2.94      0.0035            1

5. Results

The t-statistics associated with $\Theta_{12}$ and $\phi$, shown in Tables 1 and 2, are greater than 2 in absolute value, indicating that these parameters should be retained in the model for both datasets.
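The t values in Tables 1 and 2 are the estimates divided by their standard errors, which can be checked directly. The short Python sketch below recomputes them from the reported figures; the dictionary layout and the helper name `t_value` are illustrative only.

```python
# Reported (estimate, standard error) pairs from Tables 1 and 2.
reported = {
    "complete data, MA(1,1)": (0.85667, 0.02923),
    "complete data, AR(1,1)": (0.15889, 0.05410),
    "missing data, MA(1,1)": (0.85383, 0.02969),
    "missing data, AR(1,1)": (0.17818, 0.05390),
}


def t_value(estimate, std_error):
    """t statistic of a coefficient: estimate over its standard error."""
    return estimate / std_error


for name, (estimate, std_error) in reported.items():
    t = t_value(estimate, std_error)
    # The paper's rule of thumb: keep the parameter when |t| > 2.
    print(f"{name}: t = {t:.2f}, retain = {abs(t) > 2}")
```

All four recomputed values exceed 2 in absolute value, in line with the decision to retain both parameters.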
From these tables we deduce that the model for the complete data is

$$z_t = 0.159\, z_{t-1} - a_t + 0.857\, a_{t-12}$$

and the model for the data with missing values is

$$z_t = 0.178\, z_{t-1} - a_t + 0.854\, a_{t-12}$$

Fig 3. Autocorrelation Function

Fig 4. Partial Autocorrelation Function

Table 2. Parameters for missing data

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MA(1,1)     0.85383    0.02969          28.76     <0.0001           12
AR(1,1)     0.17818    0.05390          3.31      0.0010            1

At the estimation stage, we obtain precise estimates of a small number of parameters. We tentatively chose an ARIMA(1, 0, 0)(0, 1, 1)_s model and fitted it to the data to obtain precise estimates of the parameters: $\phi_1$ for the nonseasonal AR part and $\Theta_{12}$ for the seasonal MA part. We dropped the mean $\mu$ from the model, since the mean of the working series is -0.00207 with standard deviation 0.897664 for the complete data, and -0.00051 with standard deviation 0.889339 for the dataset with missing data. The first value of $z_t$ that can be calculated is $z_{13} = y_{13} - y_1$, so b = 13. The t-statistic for the mean of the complete data is

$$\frac{|\bar{z}|}{s_z/\sqrt{n-b+1}} = \frac{0.00207}{0.897664/\sqrt{432-13+1}} = 0.04726$$

which is less than 2. For the missing data it is

$$\frac{|\bar{z}|}{s_z/\sqrt{n-b+1}} = \frac{0.00051}{0.889339/\sqrt{432-13+1}} = 0.01175$$

which is also less than 2. We conclude that $\bar{z}$ is statistically close to zero and that it should be omitted from the model for both datasets.

6. Diagnostic checking

At the diagnostic checking stage, we used the Ljung-Box statistic (denoted by Q, Equation (8)) to check the adequacy of the model by examining the autocorrelation and partial autocorrelation of the residuals (2; 6):

$$Q = n'(n'+2) \sum_{l=1}^{K} (n'-l)^{-1}\, r_l^2(\hat{a}) \tag{8}$$

Here $n' = n - d$, where n is the number of observations in the original time series, $r_l(\hat{a})$ is the sample autocorrelation of the residuals at lag l, and d is the degree of nonseasonal differencing used to transform the original time series values into stationary time series values. The p-values associated with Q indicate that the model $z_t = 0.159\, z_{t-1} - a_t + 0.857\, a_{t-12}$ is adequate for the complete data, since the p-value is greater than 0.05 and Q is less than the chi-square critical value for K = 6, 12, 18, 24 and 36.
For example, since d = 0 is the degree of nonseasonal differencing, the $n'$ used to calculate Q is $n' = n - d = 432 - 0 = 432$. Therefore, if we let K = 6,

$$Q = n'(n'+2) \sum_{l=1}^{6} (n'-l)^{-1}\, r_l^2(\hat{a})$$

$$= (336)(336+2)\left[(336-1)^{-1}(0.01576)^2 + (336-2)^{-1}(0.08733)^2 + (336-3)^{-1}(0.08257)^2 + (336-4)^{-1}(0.04915)^2 + (336-5)^{-1}(0.00493)^2 + (336-6)^{-1}(0.0019)^2\right] = 5.8385165$$

We use the rejection point $\chi^2_{[\alpha]}(K - 0) = \chi^2_{[0.05]}(6) = 12.5916$. Since Q = 5.84 < 12.5916, we cannot reject the adequacy of the model at $\alpha = 0.05$.
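The arithmetic of this worked example can be reproduced with a few lines of Python. The sketch below implements equation (8) and evaluates it with the six residual autocorrelations and the value 336 used in the calculation above. The function names are hypothetical, and the p-value helper is an assumption-laden convenience: it uses the closed-form chi-square tail for exactly 4 degrees of freedom, the value listed for lag 6 in the D.F. column of Table 3, so that no statistics library is needed.

```python
import math


def ljung_box_q(residual_acf, n_eff):
    """Ljung-Box Q of equation (8) for lags 1..K, with K = len(residual_acf)."""
    return n_eff * (n_eff + 2) * sum(
        r ** 2 / (n_eff - lag) for lag, r in enumerate(residual_acf, start=1)
    )


def chi2_upper_tail_df4(q):
    """P(X > q) for a chi-square variable with exactly 4 degrees of freedom.

    For df = 4 the tail has the closed form exp(-q/2) * (1 + q/2).
    """
    return math.exp(-q / 2) * (1 + q / 2)


# Residual autocorrelation magnitudes at lags 1..6 from the worked example.
r = [0.01576, 0.08733, 0.08257, 0.04915, 0.00493, 0.0019]
q = ljung_box_q(r, n_eff=336)
p = chi2_upper_tail_df4(q)
print(f"Q = {q:.4f}, p-value (df = 4) = {p:.4f}")
```

Under these assumptions Q agrees with the 5.84 reported for lag 6, and the tail probability is close to the 0.2116 reported in Table 3; small differences are expected because the autocorrelations are rounded.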

Table 3. Parameters for complete data

Lag   Q       D.F.   P > Q    Autocorrelations
6     5.84    4      0.2116   -0.016   0.087   0.083  -0.049   0.005  -0.002
12    11.77   10     0.3004   -0.069   0.021  -0.023   0.054   0.065  -0.064
18    23.41   16     0.1032    0.087   0.089   0.020  -0.001   0.124  -0.039
24    29.00   22     0.1448    0.033   0.010   0.050   0.064  -0.064   0.060
30    36.54   28     0.1292   -0.097  -0.011   0.093  -0.037   0.018  -0.028
36    40.38   34     0.2089    0.017   0.068  -0.006   0.064   0.027  -0.022

Table 4. Parameters for missing data

Lag   Q       D.F.   P > Q    Autocorrelations
6     4.70    4      0.3191   -0.016   0.076   0.071  -0.045   0.015  -0.023
12    13.02   10     0.2227   -0.076   0.004  -0.049   0.092   0.048  -0.070
18    26.77   16     0.0441    0.073   0.114  -0.009  -0.033   0.135  -0.035
24    31.58   22     0.0848    0.030  -0.004   0.022   0.067  -0.019   0.084
30    37.08   28     0.1172   -0.094  -0.012   0.062   0.008   0.032  -0.033
36    45.83   34     0.0847    0.039   0.104  -0.002   0.091  -0.022  -0.047

The p-value is the area under the curve of the chi-square distribution, with the 4 degrees of freedom listed in Table 3, to the right of Q = 5.84; here the p-value is 0.2116. Since p-value = 0.2116 > 0.05 = $\alpha$, we cannot reject the adequacy of the model at $\alpha = 0.05$. This demonstrates that comparing the p-value with $\alpha$ yields the same conclusion as comparing Q with $\chi^2_{[\alpha]}(K - n_c)$. Tables 3 and 4 show that the p-values associated with Q for K = 6, 12, 18, 24, 30 and 36 are greater than 0.05 for the two datasets with and without missing data, with the exception of K = 18 for the data with missing values (p = 0.0441), and there are no spikes in the plots of the residual autocorrelations (Figures 5 and 6); we conclude that the model is adequate. The Q statistics for the model with missing data were obtained in the same way.

In order to forecast the natural logarithm of the monthly amount of rainfall in the next 2 years (months 337 through 349), we note that since $z_t = y_t - y_{t-12}$, where $y_t = \ln Y_t$, we can express the model for the complete data as

$$y_t = y_{t-12} + 0.159\,(y_{t-1} - y_{t-13}) - a_t + 0.857\, a_{t-12}$$

and the model for the data with holes as

$$y_t = y_{t-12} + 0.178\,(y_{t-1} - y_{t-13}) - a_t + 0.854\, a_{t-12}$$

The forecasts, computed using the least squares point estimates, are shown in Table 5. Several models were examined. The forecasts of monthly average rainfall, with 95% confidence intervals, are presented in Table 5.
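The transformation chain used here (natural logarithm, then lag-12 differencing) and its inversion can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be strictly positive so that the logarithm exists, and the toy numbers in the usage lines are not rainfall data from the paper.

```python
import math


def log_seasonal_difference(Y, season=12):
    """Return z_t = ln(Y_t) - ln(Y_{t-season}) for every t >= season."""
    y = [math.log(v) for v in Y]  # assumes every value is > 0
    return [y[t] - y[t - season] for t in range(season, len(y))]


def undo_log_seasonal_difference(z_next, Y, season=12):
    """Map a forecast of the next z back to the original units.

    From y_t = y_{t-season} + z_t it follows that
    Y_t = Y_{t-season} * exp(z_t), where Y_{t-season} is the value
    observed one season before the period being forecast.
    """
    return Y[-season] * math.exp(z_next)


if __name__ == "__main__":
    history = [2.0] * 12 + [4.0]            # hypothetical series
    print(log_seasonal_difference(history))  # one value, ln(4) - ln(2)
    print(undo_log_seasonal_difference(0.1, history))
```

Working on the differenced log scale and mapping back only at the end keeps the model equations in the form used in the text.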

Fig 5. Autocorrelation Plot of Residuals for the complete data

Fig 6. Partial Autocorrelation Plot of Residuals for the complete data

Table 5. Parameters for Complete Data

Obs   F.C.D    95% lower   95% upper   F.M.D    95% lower   95% upper
349   4.1241   2.8177      5.4305      4.2168   2.9284      5.5052
350   4.1645   2.8417      5.4873      4.1798   2.8711      5.4884
351   4.5674   3.2442      5.8907      4.5886   3.2792      5.8979
352   5.2507   3.9275      6.5740      5.2552   3.9458      6.5645
353   5.0672   3.7439      6.3904      5.0774   3.7681      6.3867
354   4.7286   3.4054      6.0519      4.7334   3.4241      6.0428
355   4.5431   3.2199      5.8663      4.5432   3.2339      5.8525
356   5.1298   3.8065      6.4530      5.1340   3.8247      6.4434
357   5.4154   4.0922      6.7387      5.3290   4.0197      6.6384
358   5.5640   4.2408      6.8872      5.5598   4.2505      6.8692
359   5.4724   4.1492      6.7957      5.4668   4.1574      6.7761
360   4.6711   3.3478      5.9943      4.6728   3.3635      5.9822

Obs = Observation; F.C.D = Forecast for complete data; F.M.D = Forecast for missing data.

7. Theil's statistics for accuracy of the forecast

The accuracy of the forecast was examined by using Theil's U test, which compares the accuracy of the ARIMA model with that of a naive model. The naive model simply uses the actual value for the last time period, $Y_t$, as the forecast $\hat{Y}_{t+1}$. The formula for Theil's U is (4):

$$U = \frac{RMSE(\text{ARIMA})}{RMSE(\text{naive})} \tag{9}$$

where RMSE is the root mean squared error, defined in Eq. (10):

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} e_t^2} \tag{10}$$

where n is the number of observations in the series and $e_t$ is the error term. The MSE and RMSE for the ARIMA(1, 0, 0)(0, 1, 1)_s and naive models are given in Tables 6 and 7. Theil's U is 0.720864 for the complete data and 0.726352 for the data with missing observations. Both values are less than 1, which means the chosen model is a good model: a Theil's U greater than 1.0 indicates that the forecast model is worse than the naive model, and a value less than 1.0 indicates that it is better. The closer U is to 0, the better the model [6]. The two values are close to each other, which means that the method used to estimate the missing data was suitable, at least for the data used in this paper.
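Equations (9) and (10) are simple enough to check numerically. The Python sketch below defines both quantities and recomputes Theil's U from the MSE values reported in Tables 6 and 7; the function names `rmse` and `theil_u` are illustrative and not taken from the paper.

```python
import math


def rmse(errors):
    """Root mean squared error of equation (10)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def theil_u(rmse_model, rmse_naive):
    """Theil's U of equation (9): model RMSE relative to the naive RMSE."""
    return rmse_model / rmse_naive


# MSE values reported in Tables 6 and 7 (ARIMA first, naive second).
u_complete = theil_u(math.sqrt(0.441654), math.sqrt(0.849915))
u_missing = theil_u(math.sqrt(0.429536), math.sqrt(0.814152))
print(f"U (complete data) = {u_complete:.4f}")
print(f"U (missing data)  = {u_missing:.4f}")
```

Both ratios come out below 1 and within rounding of the values quoted in the text, so the reported U statistics are consistent with the tabulated errors.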

Table 6. Values of MSE and RMSE for the ARIMA and naive models for the complete data

Model (complete data)        MSE        RMSE
ARIMA(1, 0, 0)(0, 1, 1)_s    0.441654   0.66457
Naive                        0.849915   0.921908

Table 7. Values of MSE and RMSE for the ARIMA and naive models with missing observations

Model (missing data)         MSE        RMSE
ARIMA(1, 0, 0)(0, 1, 1)_s    0.429536   0.65539
Naive                        0.814152   0.902304

8. Conclusion

This paper investigates the application of the Box-Jenkins technique to predict monthly average rainfall at Pinang station, using the suggested new method to estimate the missing values. Model parameters are estimated using an autoregressive integrated moving average (ARIMA) model over the period from 1 January 1969 to 31 December 1997. The model was tested in forecasting against the observed monthly average data in the same period. The best estimated ARIMA model for forecasting monthly average rainfall was found to be the ARIMA(1, 0, 0)(0, 1, 1)_s model. We compared the results of this model for both datasets, with and without missing data. The equation for the model without missing data is

$$z_t = 0.159\, z_{t-1} - a_t + 0.857\, a_{t-12}$$

and the model with missing data is

$$z_t = 0.178\, z_{t-1} - a_t + 0.854\, a_{t-12}$$

The result was checked against the naive model: Theil's U is 0.72086 for the first model and 0.726352 for the second. The results are close to each other, that is, ARIMA(1, 0, 0)(0, 1, 1)_s was a good model. The results indicate that time series techniques can be used to develop highly accurate short-term forecasts of the monthly average rainfall based on past observations for Pinang station.

References

[1] Gencay R., Selcuk F. and Whitcher B. (2002). An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. Permissions Department, Harcourt, Inc.
[2] James W. T. and Kurtz, T. G. (2007). A Comparison of Univariate Time Series Methods for Forecasting Intraday Arrivals at a Call Center. Said Business School, University of Oxford.
[3] John C. B. and David A. D. (2003). SAS for Forecasting Time Series, Second Edition. Cary, NC: SAS Institute Inc.
[4] Pankratz A. (1983). Forecasting with Univariate Box-Jenkins Models. Wiley, New York.
[5] Paolo B., Alberto M. and Roberto R. (1996). Forecasting of storm rainfall by combined use of radar, rain gages and linear models. Atmospheric Research, Vol. 42, issue 1-4, pp. 199-216.
[6] Patricia E. G. (1994). Introduction to Time Series Modeling and Forecasting in Business and Economics. McGraw-Hill, Inc.
[7] Richard T. B., Sfetsos A. and Sang-Kuck Ch. (2002). Modeling and forecasting from trend-stationary long memory models with applications to climatology. International Journal of Forecasting, Vol. 18, issue 2, pp. 215-226.
[8] Sabry M., Abd-El-Latif H., Yousif S. and Badra N. (2007). Use of Univariate Box and Jenkins Time Series Technique in Rainfall Forecasting. Australian Journal of Basic and Applied Sciences, (4) pp. 386-394.
[9] Sfetsos A. and Coonick A. H. (1966). Univariate and multivariate forecasting of hourly solar radiation with artificial intelligence techniques. Solar Energy, Vol. 68, No. 2, pp. 169-178.