Forecasting Network Activities Using ARIMA Method


Journal of Advances in Computer Networks, Vol., No., September 4

Haviluddin and Rayner Alfred

Abstract: This paper presents an approach to network traffic characterization using the ARIMA (Autoregressive Integrated Moving Average) technique. The dataset used in this study was obtained from the internet network traffic activities of Mulawarman University over a period of one week. The results are obtained using the Box-Jenkins methodology. Five ARIMA models are considered under this methodology: ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), and ARIMA (,, ) (,, ). In this paper, ARIMA (,, ) (,, ) was selected as the best model for the internet network traffic.

Index Terms: Network traffic, forecasting, ARIMA, time series.

Manuscript received December 5, ; revised March 9, 4. Haviluddin is with the School of Natural Science, Department of Computer Science, Universitas Mulawarman, 759 Samarinda, Indonesia (e-mail haviluddin@gmail.com). Rayner Alfred is with the COESA, Universiti Malaysia Sabah, 884 Kota Kinabalu, Sabah, Malaysia (e-mail ralfred@ums.edu.my). DOI.776JACN.4.V.6

I. INTRODUCTION

Today, the internet plays an important role in any organization. In particular, it is used in most universities to support teaching and learning activities. These activities are part of the civitas academica, so the internet infrastructure must be well organized in order to optimize internet usage. In this sense, the Information Technology department plays an important role in ensuring that a high-quality internet service is accessible to all university staff. As part of the organization's performance, good planning is needed in order to provide an acceptable internet service. Furthermore, internet service performance is normally measured by the capacity of the network and by the traffic requirements arising from the university's activities. As a result, forecasting the demand for the internet service that supports teaching and learning is crucial to the effectiveness and efficiency of those processes. In this paper, the network activities are characterized in order to visualize the internet network traffic requirements []-[]. Knowledge of these requirements can be used to plan better infrastructure development, to provide high-quality services, and to optimize the design.

Modeling and forecasting are among the best phenomenological approaches for solving problems related to the future planning, construction, and evaluation of internet service provision. The purpose of this paper is to model and forecast the internet network traffic activities by using time series analysis. The paper is organized as follows. Section II discusses the theoretical basis relevant to the work and the techniques used to perform time series analysis. Section III presents the experimental design and the results obtained from the analyzed series, and Section IV concludes the paper with some recommendations for future work.

II. NETWORK TRAFFIC DATA AND ARIMA (AUTOREGRESSIVE INTEGRATED MOVING AVERAGE)

A forecasting method used to model data that continue to evolve over time is called a time series forecasting method. Such a method forecasts by analyzing the relationship between the variable to be estimated and the time variable. A time series model assumes that some pattern, or combination of patterns, recurs over time. Thus, by identifying and extrapolating these patterns, future values can be predicted. The time series forecasting method therefore focuses on the type or pattern of the data.

There are four kinds of time series data patterns [], [], [4] (Fig. a-Fig. d):

1) trend (T): a long-term tendency in the data, which can be either an increase or a decrease;
2) seasonal variation (S): fluctuations of the data that recur periodically within one year, such as monthly, weekly, or daily;
3) cycles (C): fluctuations of the data over a period of more than one year; and
4) random component (R): time series data that are influenced by seasonal variation, trends, cycles and random factors.

Fig. a. Types of patterns normally identified in time series data: trend.

Fig. b. Types of patterns normally identified in time series data: seasonal variation.
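The pattern types listed in this section can be illustrated with a small additive decomposition. The sketch below is not the procedure used in the paper; it assumes an additive model (series = trend + seasonal + random), an odd seasonal period, and a hypothetical toy series, and the function names `moving_average` and `decompose` are introduced here only for illustration.

```python
from statistics import mean


def moving_average(series, window):
    """Centered moving average; None where the (odd) window does not fit."""
    half = window // 2
    smoothed = [None] * len(series)
    for i in range(half, len(series) - half):
        smoothed[i] = mean(series[i - half:i + half + 1])
    return smoothed


def decompose(series, period):
    """Additive split: series[t] = trend[t] + seasonal[t] + random[t]."""
    trend = moving_average(series, period)  # assumes an odd period
    detrended = [x - t if t is not None else None
                 for x, t in zip(series, trend)]
    # Average the detrended values at each position within the period.
    by_position = [
        mean(d for i, d in enumerate(detrended)
             if d is not None and i % period == k)
        for k in range(period)
    ]
    centre = mean(by_position)  # force the seasonal effects to sum to zero
    seasonal = [by_position[i % period] - centre for i in range(len(series))]
    random_part = [x - t - s if t is not None else None
                   for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, random_part


# Hypothetical series: a rising trend plus a repeating three-step pattern.
pattern = [1, 0, -1]
series = [2 * i + pattern[i % 3] for i in range(12)]
trend, seasonal, random_part = decompose(series, period=3)
```

For this toy series the trend estimate recovers the straight line, the seasonal estimate recovers the repeating pattern, and the random part is zero wherever the moving average is defined. Real traffic data would leave a non-zero random part.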

Fig. c. Types of patterns normally identified in time series data: cycles.

Fig. d. Types of patterns normally identified in time series data: random component.

A. Network Traffic Data

According to [], network traffic can be defined as large datasets of network packets that are generated during data communication over time. The network traffic pattern can be described as seasonal variation, in which a significant trend develops at specific seasonal periods. Therefore, network traffic datasets can be analyzed as a time series [], [], [5].

The manageable switch, or core switch, is the center of the network traffic. The core switch network can be customized, for example by configuring switches into a VLAN (virtual local area network) that is able to send packets very quickly. Its main function is to determine core access, routing, and filtering.

In this research, the network traffic data were captured with the CACTI software. The network traffic graph has two areas, shown in green and blue. Green indicates the inbound traffic and blue indicates the outbound traffic. Inbound traffic is data that comes from outside (e.g., a computer) into the network, whereas outbound traffic is data that goes out from the network. Fig. and Fig. show the network traffic plots. The internet network traffic is a seasonal time series, and in this paper an ARIMA model of order (p, d, q) (P, D, Q)s is used to model the network traffic activities.

Fig. Network traffic data, captured by the CACTI software.

Fig. Network traffic plot.

B. ARIMA

A time series is an ordered sequence of observations, and there are many ways to forecast time series data. In principle, a time series model is used to predict the values (yt +, y+,..., yt-n) based on the data (xt +, xt+,..., xt-n). To learn from time series data, one may use a method based on mathematical or machine learning concepts. Methods based on a mathematical model are among the oldest used in time series forecasting. However, such a method runs into difficulties because, in practice, time series data are rarely purely linear or purely non-linear and often contain both components. Nevertheless, this kind of model still achieves very good accuracy for short-term forecasting, under the condition that the time series data are non-stationary and linear [6]. On the other hand, machine learning models can be used to forecast non-linear time series, because these models do not require the data to be stationary. The basic principle of this approach reflects real life, where many problems involve data that contain linearity and non-linearity simultaneously [4]-[6].

One of the best-known methods for forecasting time series data is ARIMA. The ARIMA method analyzes a time series by integrating the AR (autoregressive) and MA (moving average) methods. ARIMA (p, d, q) is a general model formulated for data series that are stationary, where p is the order of the AR process, d is the number of times the series is differenced to make it stationary, and q is the order of the MA process [6]. In this section, the ARIMA model is briefly described.

A condition for using ARIMA is that the observed series must be stationary. A series is said to be stationary when the process does not change as time changes, meaning that the average of the observed series stays constant throughout time. In general, a non-stationary time series has two factors that need to be considered: mean and variance. If the time series is not stationary, it is processed by a transformation, which addresses the variance, and by differencing, which addresses the mean. For the variance, the rules for transforming the time series are: (1) the transformation applies only to series whose values Zt are positive; (2) the transformation is performed before the differencing process; and (3) the value of λ is chosen based on the Sum of Squared Errors (SSE) obtained from the transformed series. Usually, the λ with the smallest SSE gives a constant variance. For the mean, differencing is performed between the data of one period and the data of another period.
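The variance-stabilizing transformation and the differencing described for non-stationary series can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it uses the standard Box-Cox formula, it does not reproduce the SSE-based choice of λ, and the series `z` and the seasonal period of 4 are hypothetical.

```python
import math


def box_cox(series, lam):
    """Box-Cox transform of strictly positive values; lam = 0 gives the log."""
    if any(z <= 0 for z in series):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(z) for z in series]
    return [(z ** lam - 1.0) / lam for z in series]


def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` items shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical positive series: a period-4 pattern on a slowly rising level.
z = [10, 20, 15, 5, 12, 22, 17, 7, 14, 24, 19, 9]

stabilized = box_cox(z, 0)                # transformation comes first
seasonal_diff = difference(z, lag=4)      # seasonal differencing (D = 1, s = 4)
both = difference(seasonal_diff, lag=1)   # then regular differencing (d = 1)
```

In this toy case the seasonal difference removes the repeating pattern and the regular difference removes the remaining level shift. Consistent with rule (2) in the text, the transformation is applied before any differencing.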

According to the Box-Jenkins methodology [6], there are four forecasting stages: (1) model identification, (2) parameter estimation, (3) model checking (hypothesis and diagnostic tests), and (4) forecasting, as shown in Fig. 4.

Fig. 4. ARIMA (Box-Jenkins) stages.

Stage 1: Model identification

In any kind of time series analysis, the first step is to plot the data. The data series is examined carefully in order to determine whether it contains a trend, seasonality, cycles or random phenomena. After that, the sample ACF and PACF of the original series are computed and examined in order to confirm whether the time series data are stationary. If the sample ACF decays very slowly, differencing is needed. Incorrect model identification carries the risk of producing an incorrect model estimation, in which case model re-identification will be needed.

Stage 2: Parameter estimation

Once the ARIMA parameter estimates are obtained, the identified model can be validated. The purpose of model validation is to ensure that the right model is used. This can be done by using the t-statistics and p-values.

Autocorrelation is the correlation that exists between observations of a time series separated by k time units. The autocorrelation function (ACF) is therefore plotted against the lag k. The partial autocorrelation function (PACF), in turn, measures the correlation between observations k time units apart after the effect of the intermediate observations has been removed. The values of p, d and q can be determined from the ACF and PACF, as shown in Table I.

TABLE I: ACF AND PACF IDENTIFICATION

Model             ACF                   PACF
AR (p)            tails off             cut-off after lag p
MA (q)            cut-off after lag q   tails off
ARMA (p,q)        tails off             tails off
AR (p) or MA (q)  cut-off after lag q   cut-off after lag p

Source: [4]

Stage 3: Model checking (hypothesis and diagnostic test)

The proposed model needs to be checked for adequacy before it can be used for forecasting. Many statistical tests can be used for this purpose. One of the best-known is based on the Ljung-Box Q statistic, which tests the residuals for white noise and is passed when p-value > α.5. The Kolmogorov-Smirnov test checks the residuals for a normal distribution, again under the condition p-value > α.5. In contrast, the parameters in the ARIMA estimation should have p-value < α.5, even though a few points of the normally distributed data may not fall on the line.

Stage 4: Forecasting

The last step in the Box-Jenkins method is forecasting. The results of the ARIMA forecasting process can be analyzed as three values: the upper limit, the lower limit, and the forecast. The upper and lower limits give the 95% confidence interval. In other words, forecast values that fall within the confidence limits are accepted.

III. RESULTS AND DISCUSSION

The network traffic inbound event was analyzed over three days (June -, ) with samples of 7 data (Fig. 5). The first ARIMA step is the transformation process: the Box-Cox transformation is applied twice, after which data that are stationary in variance, with λ = , are obtained (Fig. 6).

Fig. 5. Network traffic inbound plot for three days in June (time series plot of Inbound against Date and Time).

Fig. 6. ARIMA transformation with Box-Cox (Box-Cox plot of inbboxcox: StDev against Lambda, with lower and upper confidence limits at 95% confidence).
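The sample autocorrelations examined at the identification stage, and the Ljung-Box Q statistic used at the model-checking stage, can be sketched as follows. This is a minimal illustration with the standard textbook formulas, not the Minitab routine used in the paper. The function names are hypothetical, and the p-value lookup against the chi-square distribution is left out.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def significance_limit(n):
    """Approximate 5% limit for a single autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_k r_k^2 / (n - k) over lags 1..max_lag."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k)
                             for k, rk in enumerate(r, start=1))


# Hypothetical short series, used only to exercise the functions.
example = [1, 2, 3, 4, 5]
r = acf(example, 2)
q = ljung_box_q(example, 1)
```

An autocorrelation outside plus or minus `significance_limit(n)` would be flagged as significant on an ACF plot. A small Q relative to the chi-square reference distribution is consistent with white-noise residuals.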

The second step is to check the autocorrelation by observing the ACF and PACF plots with λ >.5. Since the data are still not stationary in the mean, both non-seasonal and seasonal differencing are performed. The non-seasonal differencing, with lag , is applied to the result of the Box-Cox transformation, and the seasonal differencing is performed with lag , as shown in Fig. 7 and Fig. 8.

Fig. 7. ACF plot data λ = (autocorrelation function for ACFInbl, with 5% significance limits for the autocorrelations).

Fig. 8. PACF plot data λ = (partial autocorrelation function for ACFInbl, with 5% significance limits for the partial autocorrelations).

The third step comprises the ARIMA estimation and model checking processes. The ARIMA (,, ) results are estimated for five candidate models: ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), ARIMA (,, ) (,, ), and ARIMA (,, ) (,, ). Afterward, the most significant model is identified and tested, using the significant parameters, in a white noise test. The test results determine a good model for analysis and forecasting. The best result in this test is obtained from ARIMA (,, ) (,, ), as shown in Fig. 9. The normality of the model is then checked using p-value > α.5, as shown in Fig..

Fig. 9. Final estimates of ARIMA (,,)(,,) parameters (estimation output listing Coef, SE Coef, T and P for the SAR, SMA and Constant terms; regular and seasonal differencing; number of observations before and after differencing; residual SS, MS and DF; and the modified Box-Pierce (Ljung-Box) chi-square statistic with its DF and P-value).

Fig. Normality test parameters (normal probability plot of ARIMAr against Percent, reporting Mean, StDev, N, KS and P-Value).

The fourth step is forecasting. The internet network traffic forecast obtained from the best model is plotted in Fig..

Fig. Forecasting plots (time series plot for InbBoxCox, with forecasts and their 95% confidence limits).

IV. CONCLUSION AND FUTURE RESEARCH

In this paper, the ARIMA method is applied to forecast the internet network traffic usage. The best result is obtained from ARIMA (,, ) (,, ), whose residuals pass the normality test and whose parameters are estimated with p-value < α.5. However, the results show that there is only a weak relationship among the residual values, because p-value >.9. Even so, the ARIMA estimation is quite reliable for forecasting because the iterations converged, with the relative change in each estimate less than..

For future work, in order to improve the time series forecasting model, combining machine learning methods with statistical concepts will be the best option, since time series data consist of both linear and non-linear components [], [5]-[9], under the assumption

$$Z_t = L_t + N_t \tag{1}$$

where

$Z_t$ is the time series, $L_t$ is the linear component of the model, and $N_t$ is the non-linear component of the model.

REFERENCES

[1] T. C. Fu, "A review on time series data mining," Engineering Applications of Artificial Intelligence, vol. 4, pp. 64-8,.
[2] E. J. Palomo, J. North, D. Elizondo, R. M. Luque, and T. Watson, "special issue application of growing hierarchical SOM for visualisation of network forensics traffic data," Neural Networks, vol., pp. 75 84,.
[3] A. C. F. Santos, J. D. S. D. Silva, L. D. S. Silva, and M. P. D. C. Sene, "Network traffic characterization based on Time Series Analysis and computational intelligence," Journal of Computational Interdisciplinary Sciences, vol., pp. 97-5,.
[4] W. W. S. Wei, Time Series Analysis Univariate and Multivariate Methods Second Edition, 6.
[5] C. Li and T. W. Chiang, "Complex neurofuzzy ARIMA forecasting a new approach using complex fuzzy sets," IEEE Transactions on Fuzzy Systems, vol.,.
[6] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis Forecasting and Control, 4th ed., 8.
[7] M. Khashei and M. Bijari, "An artificial neural network (p, d, q) model for timeseries forecasting," Expert Systems with Applications, vol. 7, pp. 479 489,.
[8] W. Qiu, X. Liu, and H. Li, "A generalized method for forecasting based on fuzzy time series," Expert Systems with Applications, vol. 8, pp. 446 45,.
[9] G. S. D. S. Gomes and T. B. Ludermir, "Optimization of the weights and asymmetric activation function family of neural network for time series forecasting," Expert Systems with Applications, vol. 4, pp. 648 6446,.

Haviluddin was born in Loa Tebu, East Kalimantan, Indonesia, on May 8 th, 97. He graduated from STMIK WCD Samarinda in 5 in the field of management information, and he completed a master's degree at Gadjah Mada University, Yogyakarta in 9 in the field of computer science. He is also a lecturer in the Faculty of Natural Science, University Mulawarman, East Kalimantan, Indonesia. Currently, he is pursuing his PhD in the field of computer science at the School of Engineering and Information Technology, Universiti Malaysia Sabah, Malaysia.

Rayner Alfred was born in Kota Kinabalu, Sabah. He completed a PhD in 8 looking at intelligent techniques to model and optimize the complex, dynamic and distributed processes of knowledge discovery for structured and unstructured data. He holds a PhD degree in computer science from York University (United Kingdom), a master's degree in computer science from Western Michigan University, Kalamazoo (USA) and a computer science degree from Polytechnic University of Brooklyn, New York (USA). Dr. Rayner leads and defines projects around knowledge discovery and information retrieval at Universiti Malaysia Sabah. One focus of Dr. Rayner's work is to build smarter mechanisms that enable knowledge discovery in relational databases. His work addresses the challenges related to the big data problem: how can we create and apply smarter collaborative knowledge discovery technologies that cope with big data? Dr. Rayner has authored and co-authored more than 75 journals, book chapters and conference papers, editorials, and served on the program and organizing committees of numerous national and international conferences and workshops. He is a member of the Institute of Electrical and Electronic Engineers (IEEE) and Association for Computing Machinery (ACM) societies.