Univariate and Multivariate Time Series Models to Forecast Train Passengers in Indonesia

Similar documents
S-GSTAR-SUR Model for Seasonal Spatio Temporal Data Forecasting ABSTRACT

Available online at ScienceDirect. Procedia Computer Science 72 (2015 )

Univariate ARIMA Models

A Comparison of the Forecast Performance of. Double Seasonal ARIMA and Double Seasonal. ARFIMA Models of Electricity Load Demand

Modelling Multi Input Transfer Function for Rainfall Forecasting in Batu City

Comparison between VAR, GSTAR, FFNN-VAR and FFNN-GSTAR Models for Forecasting Oil Production

TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA

FORECASTING THE INVENTORY LEVEL OF MAGNETIC CARDS IN TOLLING SYSTEM

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia

Analysis. Components of a Time Series

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1

5 Autoregressive-Moving-Average Modeling

Modeling and forecasting global mean temperature time series

at least 50 and preferably 100 observations should be available to build a proper model

A STUDY OF ARIMA AND GARCH MODELS TO FORECAST CRUDE PALM OIL (CPO) EXPORT IN INDONESIA

Forecasting Bangladesh's Inflation through Econometric Models

Dynamic Time Series Regression: A Panacea for Spurious Correlations

Forecasting Network Activities Using ARIMA Method

Empirical Market Microstructure Analysis (EMMA)

Automatic seasonal auto regressive moving average models and unit root test detection

FE570 Financial Markets and Trading. Stevens Institute of Technology

Frequency Forecasting using Time Series ARIMA model

Austrian Inflation Rate

Seasonal Autoregressive Integrated Moving Average Model for Precipitation Time Series

Empirical Approach to Modelling and Forecasting Inflation in Ghana

Development of Demand Forecasting Models for Improved Customer Service in Nigeria Soft Drink Industry_ Case of Coca-Cola Company Enugu

FORECASTING YIELD PER HECTARE OF RICE IN ANDHRA PRADESH

Author: Yesuf M. Awel 1c. Affiliation: 1 PhD, Economist-Consultant; P.O Box , Addis Ababa, Ethiopia. c.

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation

SOME COMMENTS ON THE THEOREM PROVIDING STATIONARITY CONDITION FOR GSTAR MODELS IN THE PAPER BY BOROVKOVA et al.

Firstly, the dataset is cleaned and the years and months are separated to provide better distinction (sample below).

Improved the Forecasting of ANN-ARIMA Model Performance: A Case Study of Water Quality at the Offshore Kuala Terengganu, Terengganu, Malaysia

The ARIMA Procedure: The ARIMA Procedure

Forecasting Simulation with ARIMA and Combination of Stevenson-Porter-Cheng Fuzzy Time Series

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis

Comparing the Univariate Modeling Techniques, Box-Jenkins and Artificial Neural Network (ANN) for Measuring of Climate Index

Ch 6. Model Specification. Time Series Analysis

S 9 Forecasting Consumer Price Index of Education, Recreation, and Sport, using Feedforward Neural Network Model

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Design of Time Series Model for Road Accident Fatal Death in Tamilnadu

Estimation and application of best ARIMA model for forecasting the uranium price.

MODELLING TIDE PREDICTION USING LINEAR MODEL AND ADAPTIVE NEURO FUZZY INFERENCE SYSTEM (ANFIS)IN SEMARANG, INDONESIA

MODELING INFLATION RATES IN NIGERIA: BOX-JENKINS APPROACH. I. U. Moffat and A. E. David Department of Mathematics & Statistics, University of Uyo, Uyo

Asitha Kodippili. Deepthika Senaratne. Department of Mathematics and Computer Science,Fayetteville State University, USA.

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

A SARIMAX coupled modelling applied to individual load curves intraday forecasting

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator

FORECASTING OF COTTON PRODUCTION IN INDIA USING ARIMA MODEL

Using Analysis of Time Series to Forecast numbers of The Patients with Malignant Tumors in Anbar Provinc

Time Series Analysis of Currency in Circulation in Nigeria

A HYBRID MODEL OF SARIMA AND ANFIS FOR MACAU AIR POLLUTION INDEX FORECASTING. Eason, Lei Kin Seng (M-A ) Supervisor: Dr.

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

INTRODUCTION TO TIME SERIES ANALYSIS. The Simple Moving Average Model

Basics: Definitions and Notation. Stationarity. A More Formal Definition

Forecasting of the Austrian Inflation Rate

Comparison of ARCH / GARCH model and Elman Recurrent Neural Network on data return of closing price stock

Forecasting Automobile Sales using an Ensemble of Methods

Econometría 2: Análisis de series de Tiempo

STAT Financial Time Series

arxiv: v1 [stat.me] 5 Nov 2008

ECON/FIN 250: Forecasting in Finance and Economics: Section 8: Forecast Examples: Part 1

A stochastic modeling for paddy production in Tamilnadu

Intervention Model of Sinabung Eruption to Occupancy of the Hotel in Brastagi, Indonesia

Univariate linear models

Economics 618B: Time Series Analysis Department of Economics State University of New York at Binghamton

Chapter 2: Unit Roots

STAT 436 / Lecture 16: Key

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity

Econometric Forecasting

ECONOMETRIA II. CURSO 2009/2010 LAB # 3

Marcel Dettling. Applied Time Series Analysis SS 2013 Week 05. ETH Zürich, March 18, Institute for Data Analysis and Process Design

ARIMA model to forecast international tourist visit in Bumthang, Bhutan

2. An Introduction to Moving Average Models and ARMA Models

Lecture 5: Estimation of time series

Minitab Project Report - Assignment 6

Forecasting Area, Production and Yield of Cotton in India using ARIMA Model

Econometrics for Policy Analysis A Train The Trainer Workshop Oct 22-28, 2016 Organized by African Heritage Institution

Scenario 5: Internet Usage Solution. θ j

Chapter 8: Model Diagnostics

MCMC analysis of classical time series algorithms.

ANALYZING THE IMPACT OF HISTORICAL DATA LENGTH IN NON SEASONAL ARIMA MODELS FORECASTING

Estimating AR/MA models

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Box-Jenkins ARIMA Advanced Time Series

A Diagnostic for Seasonality Based Upon Autoregressive Roots

22/04/2014. Economic Research

Stationarity and cointegration tests: Comparison of Engle - Granger and Johansen methodologies

On Monitoring Shift in the Mean Processes with. Vector Autoregressive Residual Control Charts of. Individual Observation

A Comparison of Time Series Models for Forecasting Outbound Air Travel Demand *

Oil price and macroeconomy in Russia. Abstract

Statistical Methods for Forecasting

The log transformation produces a time series whose variance can be treated as constant over time.

TMA4285 December 2015 Time series models, solution.

Modeling of Foreign Direct Investment Inflow from Asean Countries To Indonesia Using Vector Autoregression (VAR) Method

ARIMA modeling to forecast area and production of rice in West Bengal

Solar irradiance forecasting for Chulalongkorn University location using time series models

Review Session: Econometrics - CLEFIN (20192)

Transcription:

PROCEEDING OF RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, MAY 0 Univariate and Multivariate Time Series Models to Forecast Train Passengers in Indonesia Lusi Indah Safitri, Suhartono, and Dedy Dwi Prastyo,, Department of Statistics, Institut Teknologi uluh Nopember Email: lusi@mhs.statistika.its.ac.id Abstract Time series model is one of quantitative methods that frequently used for forecasting a number of train passengers in certain route. In general, there are two types of time series models, i.e. univariate and multivariate time series. The objective of this paper is to apply ARIMA model as a univariate method and VARIMA as a multivariate method for forecasting a number of executive train passengers in Indonesia, particularly Surabaya-Jakarta route. The number of daily train passengers in three types of executive classes that departure from Surabaya Pasar Turi station, i.e. Argo Bromo Anggrek Pagi, Argo Bromo Anggrek Malam, and Sembrani, are used as case study. The data are consisted observations and recorded from uary st, 0 till ruary th, 0 and divided into two parts, i.e. uary st, 0 to uary 0 th, 0 and - ruary 0 as training and testing data, respectively. Root mean of squares error (RMSE) in testing data is used as criteria to select the best forecasting model. The results show that ARIMA yields more accurate forecast at two data, i.e. number of passengers at Argo Bromo Anggrek Pagi and Sembrani, whereas VARIMA gives better forecast at Argo Bromo Anggrek Malam. Hence, this result inlines with the first conclusion of M competition, i.e. statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones. Keywords: forecasting, train passengers, ARIMA, VAR, RMSE. M I. INTRODUCTION The train is known as the mode of transport that has multiple advantages, such as energy saving, landsaving, environmentally friendly, high safety levels, able to transport large amounts, as well as adaptive to technological development []. The number of train tickets is sold uncertainty (fluctuatively) everyday. In general, the demand of train tickets at the weekend usually increase compared to normal days. Moreover, the peak of train tickets demand usually occurs one to five days ahead of Idul Fitri due to an annual mudik tradition in Indonesia. The term mudik refers to the exodus of Indonesian workers from the cities back to their hometowns ahead of Idul Fitri. Not only the Muslim community of Indonesia will return to their places of origin, but also people adhering to other religions traditionally use this public holiday to visit their parents or make a short holiday. This demand peak usually continues after Idul Fitri due to they must going back after this holiday. A problem often faced by the railway operator is the large supply train ticket quotas are not appropriate to the number of train passengers. The number of train passengers usually increase in the days ahead of the national holidays or certain religious holidays. Due to the railway operator usually only provides the number of tickets as a normal day, this can lead to frustration of the passengers train because many passengers did not get a ticket. Hence, an accurate prediction of the number of rail passengers in the future is important to minimize the number of train passengers who did not get a ticket. There are some researches on forecasting the train passengers have been conducted such as Andalita [] who studied about forecasting the number of train passengers in economy class using ARIMA (Autoregressive Integrated Moving Average) and ANFIS (Adaptive Neuro Fuzzy Inference System) methods. Furthermore, Hermawan [] applied the model NN (Neural Network) for forecasting the number of railway passengers in Jabodetabek. Additionally, Rosyidah [] employed ARIMA modeling for forecasting of passenger trains on DAOP IX Jember. M - 9

ISBN 98-0-9-0-9 In this research, forecasting the number of train passengers in executive class with univariate and multivariate time series. The data will be used is the number of train passengers in three executive trains, i.e. Argo Bromo Anggrek Pagi, Argo Bromo Anggrek Malam and Sembrani. The univariate time series modeling used ARIMA while the multivariate time series modeling employed Vector Autoregressive Integrated Moving Average (VARIMA). Modeling of multivariate time series are not only able to predict the number of passengers in the future, but also could explain the relevance of other types of executives trains. Once the best model of univariate and multivariate time series was obtained, the models were compared based on the accuracy to predict the number of train passengers. In forecasting, multivariate methods are usually more complicated than the univariate method. However as said by Makridakis and Hibon [] that statistical methods are more sophisticated or more complicated does not always provide more accurate estimates than a simple method. Through the comparison of the two models time series derived from both methods will be obtained the best model to predict the number of rail passengers in the executive class Pasar Turi station. II. LITERATURE REVIEW A. Autoregressive Integrated Moving Average (ARIMA) ARIMA(p, d, q) model is a combination of the AR (Autoregressive) order p model and MA (Moving Average) order q model with differencing order d. ARIMA model can be used in the seasonal and nonseasonal data. The ARIMA (p, d, q) can be written as follows []: d ( B)( B) Y ( B) a, p t q t Where is a constant, p( B) ( B pb ) is polynomial backshift operator for AR q and p( B) ( B qb ) polynomial backshift operator for MA. The procedures used to obtain the forecasting value of using ARIMA consists of four steps starting from the model identification, parameter estimation, diagnostic testing and selection of the best models, forecasting. The identification can be done using ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots. This step is valid if the time series is stationary that can be visually checked through time series plot. If the data is not stationary in mean, then do differencing whereas if the data is not stationary in variance, then the Box-Cox transformation can be used. The complete steps to do ARIMA modeling can be seen in []. B. Vector Autoregressive (VAR) The VAR model with order one denoted as VAR() follows this equation []: p Yt Y a, o t- t The model in () that consists of two series can be written in matrix form as follows: Y c Y a,, t, t, t Y, t c Y, t a, t In general, the VAR model of order p denotes as VAR (p) is formulates as []: Y t Y Y a. o t- p t- p t The complete description of VAR model can be seen in []. C. Model Identification, Diagnostic Checking, and Model Selection Identification of time series model can be done by creating ACF and PACF plots for the univariate models. The identification step for the multivariate models can be done by employing MPACF (Matrix of Partial Autoregression Function) and the AIC (Akaike Information Criterion) []. M - 80

PROCEEDING OF RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, MAY 0 Checking the assumptions of model is conducted after the identification and parameter estimation steps. The purpose of this step is to determine whether the model fulfills the assumptions. The ARIMA model assumes that the residual is white noise and normally distributed whereas the VARIMA model assumes that the vector of residual is white noise and follows multivariate normal distribution []. Selection of the appropriate model was done based on smallest RMSE (Root Mean Squared Error) of out of sample data. The RMSE is calculated as follows []. ˆ n. L RMSE out of sample = Y Y l L l nl with is number of observation in out of sample data, is observation l in out sample, and is the value of l-step forecasting. III. RESEARCH METHODOLOGY A. Data and Variables The data used in the study were the data of the number passengers in daily train executive class with Surabaya-Jakarta route in the period from uary, 0, until ruary 9, 0. These data are secondary data which obtained from Pasar Turi Train Station. There are three types of train executive class data, i.e. the number of passengers in the train Argo Bromo Anggrek Pagi, the number of Sembrani train's passengers, and the number of Argo Bromo Anggrek Malam passengers. The variables in this study are denoted by Y with m is stating the type of trains and t is stating the mt, time (days). Following is the details of the variables in the study: Y : The daily number of Argo Bromo Anggrek Pagi train passengers,t Y,t Y,t : The daily number of Argo Bromo Anggrek Malam train passengers : The daily number of Sembrani train passengers. B. Steps of Analysis. ARIMA modeling with the following steps: a) Stationary inspection of data by looking at the data, ACF and PACF plots. If the data is not stationary variance, we can transform this data. If the data is not stationary in mean, we can difference this data. Formally, mented Dickey-Fuller test is used to check the data in the stationary mean. b) Identification of the model by ACF and PACF plots to determine the order of AR and MA c) Parameter estimation using OLS method. d) Selection of the best model by AIC criterion. e) Diagnosis check. f) Forecasting for data out samples with the selected order.. VAR modeling with the following steps: a) Inspection stationary of data such as the ARIMA model. b) To identify the model by MPACF and minimum AIC value thus obtained VAR order. c) Diagnosis check. d) Forecasting for data out samples with the model selected.. Compare the best model between univariate and multivariate models that have been selected. IV. ANALYSIS AND RESULTS A. Descriptive Statistics The first step in this study was dividing the data into in sample, i.e. uary, 0 until uary, 0, and out of sample data span from uary until ruary 9, 0. The descriptive statistics of the data are presented in Table. The largest average of the number of train passengers is for train ABAP (Argo Bromo Anggrek Pagi), followed by train ABAM (Argo Bromo Anggrek Malam) and Sembrani. However, the variance for Sembrani is the largest. The correlation between number of train passengers across trains are shown in Table. M - 8

ISBN 98-0-9-0-9 M - 8 TABEL. DESCRIPTIVE STATISICS FOR THE NUMBER OF TRAIN PASSENGERS Train Average Variance Minimum Maximum ABAP 0 0 9 ABAM 8 0 09 0 Sembrani 0 98 TABEL. CORRELATION OF NUMBER OF PASSENGERS OF BETWEEN TRAINS Train ABAP ABAM Sembrani ABAP - - ABAM 0. - Sembrani 0. 0. Table shows that number of passengers between trains are correlated. The strongest correlation is between ABAM and Sembrani. All the correlation are significant with p-values are less than 0.0. These facts motivated the use of multivariate time series to model the data. B. ARIMA Model ARIMA modeling begins with the identification step given the series stationary. Following is time series plot of the three train s passenger data: ABAP (left), ABAM (middle), and Sembrani (right). Based on time series plot in Figure, the data seems to be not stationary. They have high fluctuations. In order to know the stationary of these data, the Box-Cox transformation was used to check it where the results were reported in Table. Tahun Bulan Hari 0 0 0 00 00 00 00 00 00 00 Banyaknya penumpang Kereta Api Argo Bromo Pagi 8-9 -8 Year Month Day 0 0 0 00 00 00 00 00 00 Banyaknya Penumpang Kereta Api Argo Bromo Malam 8-9 -8 0 Year Month Day 0 0 0 00 00 00 00 00 00 Banyaknya Penumpang Kereta Api Sembrani 8-9 -8 0 FIGURE. TIME SERIES PLOT OF NUMBER OF PASSENGERS TRAINS TABEL. BOX-COX TRANSFORMATION Variable Rounded Value Lower Limit Upper Limit ABAP 0.8.8 ABAM..0. Sembrani..0. In the ABAP, the rounded value for the parameter estimate in Box-Cox transformation is one. This indicates that only the ABAP variables are stationary in variance. The series for ABAM and Sembrani are required to be transformed. The next step is to check the stationary in the mean. The checking can be viewed via time series plot and ACF. ACF plot shows a form of a cut off pattern. This shows that the data has not been stationary in the mean. It is necessary to do differencing, i.e. differencing order one and seven. After differencing, the next step is to look at the ACF and PACF plot of the data that has been stationary. Through ACF and PACF plot can be determined the order of ARIMA models. The ACF and PACF plots had been stationary using differencing order one and seven. The ARIMA models for the ABAP data is ARIMA (0,, ) (0,, ) because the ACF plot occurs in the lag,,,8 are cut off. The modeling for ABAM and Sembrani series need an outlier detection approach because the normality assumption in ARIMA models was not fulfilled. The model for ABAM series is ARIMA ([,,8],,)(0,,) with few outliers whereas the model for Sembrani is ARIMA (,, ) (0,, ) with few outliers. All models have fulfilled the assumptions, i.e. white noise and normality distribution. Thus, the model for Y,t is:,,. 0.0 0.8 t t t t t t Y Y Y a a a

PROCEEDING OF RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, MAY 0 The model for Y,t is: (0.9 B0. B )(0.8 B ) ( T ), t 8 t AO t Y a W I (0.0 B 0.B 0. B ) ( T ) with W I is: AO t () (8) () () () () () ().It 0.It.It.8It 0.It 90.It 00.It 8.It (9) () (8) () (0) () () (08) 8.It.It 89.It 08.It.9It 0.It.It 0.It () (89) (09) (0) (8) () (0) (9) 9.9It.It.It 9.0It.It 0.It 8.0It 00.9It () (0) (0) () (0) () (8) (9) 99.It 0.It 08.It 0.It 9.0It 00.It.It 8.It (90) () (9) (8) () () (0) (8) () 98.8It 8.8It 9.9It 9.8It 88.It 8.It.It 09.9It 88.It () (8) (80) () () () (9) () (9).It 09.It 80.It.It 9.It 00.It.It 0.It.It (99) () (8) () (0) (9) () () () 9.It 8.9It 90.It.It 8.It 9.It 9.It.It 8.It () () (8) (8) (8) ().It 0.It.8It 99.It 9.It 9.9 It. The model for Y,t is : (0. B)(0.88 B) Y a W I W I ( 0.09B 0.8B 0. B ) ( T) ( T),, t 8 t AO t ls t ( T ) with W I and W I AO t ls t ( T ) is: () () () (8) () (8) () (9).9It 9.It 8.0It.It.9It 99.It.It 8.It (0) (8) () () (08) (0) () (8).It 8.8It 9.It 89.9It 9.It.It 9.It.It (98) (9) (8) (8) (8) () () () () It 0.9It 098.0It 0.It 00.It 0It 000.It 888.9It 9.It () (0) () () (0) () () () 98.It 89.8It.It 0.It.It.It 00.0It 9.It ().8 I t. C. VARIMA Model VARIMA modeling begins with the identification step based on MACF and MPACF plots and AIC. The series for three train passenger were differenced order one and seven because the data are not stationary in the mean. These series were transformed using Box-Cox transformation because they are not stationary in variance. The minimum value of the AIC indicated that the proper model is VARIMA (0,,0)(0,,0). The results of parameter estimation of VARIMA (0,,0)(0,,0) indicates that the model has 0 parameters. However, the p-value of each parameter some parameters were not significant. So, the model need restrict some parameters. This restriction process start from the parameter estimate with the highest p-value until all parameter estimates had p-value less than Type-I error (α = 0.0). The results of parameter estimation VARIMA (0,,0)(0,,0) with restriction showed that there are 8 parameters that were significant in the model. The VARIMA (0,,0)(0,,0) model for the series of the number of train passengers is: M - 8

q ISBN 98-0-9-0-9 0 0 0. 0, 0 0. 0. 0 0. 0. 0 0. 0. 0 0 0. 0 0. 0 0 0 0. 0 B 0. 0. 0 B 0. 0. 0. B 0. 0. 0 B 0 0. 0. B 0 0 0 0 0. 0 0 0. 0 0 0. 0. 0 0. 0. 0 0. 0. 0 0 0. 0 0 0. 0 0. 0. 0 0 0. 0 0 0. 0 0 8 9 0 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 8 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0 0 B 0 0. 0 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0 0 0 0 0. 0 0 0. 0 0 0. 0 0 0 0 0. 0. 0 0 0. 0 0 9 0 0 0 0 B 0 0 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0. 0 B 0 0 0 B 0. 0 0 0. 0 0 0 0 0. 0 0 0. 0 0 0. 0 0 0. 0 0 0 0. 0 0 0. 0 0 ( B)( B ) Y a 8 9,t,t 0 0. 0 B 0 0. 0 B ( B)( B ) Y a,t,t 0 0 0. 0 0 0 ( B)( B ) Y a,t,t. Based on the model that had been established, the next step is testing the assumptions on residual. In multivariate time series modeling, to testing the assumption of white noise on the residual can be done by looking at the results of the portmanteau test. In this study, testing the white noise used AIC criterion calculated from the residual of VARIMA (0,,0)(0,,0) model. Based on Table, the smallest AIC value is in AR (0) and MA (0). This suggests that the residuals of the model have fulfilled the white noise assumption. TABEL. MINIMUM INFORMATION CRITERION OF RESIDUAL Lag MA 0 MA MA MA MA MA AR(0).8.9..89.. AR().88..9.0.. AR().9.9..8.9.9 AR().0..80..8. AR().08..0.8.9.9 AR()..0...98.099 The next assumption that must be fulfilled is multivariate normal distribution for the vector of residual. The null hypothesis of the test is that the residuals follows have multivariate normal distribution. The null hypothesis will be fail to be rejected if the p-value of the statistic test exceed the value of Type-I error. The evaluation of the multivariate normal assumption test can also be done visually through QQ plot of the residual. The assumption is fulfilled when residual plot tends to form a straight diagonal line as displayed by Figure. Moreover, if the proportion of values which generated through calculation are at least 0% greater than the value of statistic test, in this study is about 8%, the null hypothesis is fail to be rejected. Therefore, residuals of the VARIMA (0,,0)(0,,0) model have multivariate normal distribution. 8 0 8 0 0 8 dj 0 8 FIGURE. QQ PLOT FOR RESIDUALS Once the best model for ARIMA and VARIMA were obtained, the out of sample forecast can be calculated for the series. The RMSE out of the samples were reported in Table. The models for ABAP M - 8

PROCEEDING OF RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, MAY 0 and Sembrani have relatively large RMSE for both univariate and multivariate models. This indicates that the two models are less good than the model for ABAM. The comparison of RMSE out samples for univariate and multivariate models indicated that multivariate model, which is more complex, were not always resulted in better prediction than the simpler model. Often in some studies suggest that more complicated method will yield better accuracy but the reality is not always so. Table reported the forecasting value obtained from ARIMA model for each train. TABEL. THE COMPARISON RMSE OUT OF SAMPLE FOR UNIVARIATE AND MULTIVARIATE Variable RMSE Out of Sample for ARIMA RMSE Out of Sample for VARIMA ABAP..8 ABAM 9.. Sembrani 0. 0.0 TABEL. THE RESULTS OF FORECASTING FOR EACH TRAIN Dates ARIMA ARIMA Dates ABAP ABAM Sembrani ABAP ABAM Sembrani -- 8 -- 8 0 -- 9 8 -- 9 08 -- 8 9 8-- 0 8-- 9 0 9-- 9 8 9-- 0 8 98 0-- 90 0-- 9 -- 8 -- 8 0 -- 8 9 -- 0 9 9 -- 9 8 -- 8 8 8 -- 0 9 -- 8 0 -- 0 -- 8 0 98 -- 8 8 0 -- 8 80 -- 8 -- 8 8 9 8-- 99 80 0 8-- 80 88 9-- 0 9-- 0-- 8 0-- 8 -- 8 8 -- 8 0 -- 99 -- 9 99 8 -- 9 8 -- 8 -- 9 -- 0 8 -- 8 8 -- 8 -- 90 -- -- 0 V. CONCLUSIONS AND RECOMMENDATIONS The empirical results found that the best model for univariate approach to model number of passenger train are ARIMA (0,,)(0,,) for Argo Bromo Anggrek Pagi, ARIMA([,,8],,)(0,,) with outliers detection for Argo Bromo Anggrek Malam, and ARIMA (,,)(0,,) with outliers detection for Sembrani. The multivariate models appropriate to model these three series is VARIMA (0,,0) (0,,0). Based on the RMSE value for out of sample data, it can be concluded that the ARIMA model had better prediction for two series whereas the VARIMA model outperformed in one series, i.e. series for Argo Bromo Anggrek Malam. The empirical results also showed that many outliers were found in the data and influenced the forecast accuracy of both univariate and multivariate models. Hence, more detail about outlier detection can be done for further research. Moreover, non linear time series models such as ANN (Artificial Neural Network) which is flexible to overcome data with outliers also could be considered as a future research for forecasting train passengers in Indonesia. M - 8

ISBN 98-0-9-0-9 REFERENCES [] PT. Kereta Api Indonesia (0). www.kereta-api.co.id. Situs Resmi PT. Kereta Api Indonesia (Persero). [] Andalita, I. (0). Peramalan Jumlah Penumpang Keret Api Kelas Ekonomi Kertajaya Menggunakan ARIMA dan ANFIS. Tugas Akhir Statitika ITS. Surabaya. [] Hermawan, N. (0). Aplikasi Model Neural Network dan Neuron Fuzzy Untuk Peramalan Banyaknya Penumpang Kereta Api Jabotadetabek. Tugas Akhir UNY. Yogyakarta. [] Rosyiidah, U. (0). Pemodelan ARIMA Dalam Peramalan Penumpang Kereta Api Pada DAOP IX Jember. Jurnal. Jember. [] Makridakis, S., & Hibon, M. (000). The M-Competition: Results, Conclusions, and Implications. International Journal of Forecasting,, - [] W. W. Wei. Time Series Analysis (Univariate and Multivariate Methods). United States of America:Pearson Education,Inc. (00) M - 8