Forecasting Stock Prices using Hidden Markov Models and Support Vector Regression with Firefly Algorithm

Similar documents
Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis

A Data-Driven Model for Software Reliability Prediction

Empirical Market Microstructure Analysis (EMMA)

Econ 423 Lecture Notes: Additional Topics in Time Series 1

MODELING INFLATION RATES IN NIGERIA: BOX-JENKINS APPROACH. I. U. Moffat and A. E. David Department of Mathematics & Statistics, University of Uyo, Uyo

FORECASTING THE INVENTORY LEVEL OF MAGNETIC CARDS IN TOLLING SYSTEM

MCMC analysis of classical time series algorithms.

Minitab Project Report - Assignment 6

Forecasting Bangladesh's Inflation through Econometric Models

Suan Sunandha Rajabhat University

FE570 Financial Markets and Trading. Stevens Institute of Technology

at least 50 and preferably 100 observations should be available to build a proper model

SOME BASICS OF TIME-SERIES ANALYSIS

Empirical Approach to Modelling and Forecasting Inflation in Ghana

Stochastic Processes

Available online Journal of Scientific and Engineering Research, 2017, 4(10): Research Article

Estimation and application of best ARIMA model for forecasting the uranium price.

ECON/FIN 250: Forecasting in Finance and Economics: Section 7: Unit Roots & Dickey-Fuller Tests

22/04/2014. Economic Research

Paper SA-08. Are Sales Figures in Line With Expectations? Using PROC ARIMA in SAS to Forecast Company Revenue

An economic application of machine learning: Nowcasting Thai exports using global financial market data and time-lag lasso

Dynamic Time Series Regression: A Panacea for Spurious Correlations

Lecture 2: Univariate Time Series

Trend and Variability Analysis and Forecasting of Wind-Speed in Bangladesh

TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS

Forecasting the Prices of Indian Natural Rubber using ARIMA Model

FORECASTING YIELD PER HECTARE OF RICE IN ANDHRA PRADESH

Economics 618B: Time Series Analysis Department of Economics State University of New York at Binghamton

Forecasting Network Activities Using ARIMA Method

Forecasting Egyptian GDP Using ARIMA Models

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity

Basics: Definitions and Notation. Stationarity. A More Formal Definition

arxiv: v1 [stat.me] 5 Nov 2008

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator

Analysis of Violent Crime in Los Angeles County

Transformations for variance stabilization

Forecasting. Simon Shaw 2005/06 Semester II

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Time Series Analysis -- An Introduction -- AMS 586

176 Index. G Gradient, 4, 17, 22, 24, 42, 44, 45, 51, 52, 55, 56

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia

INTRODUCTORY REGRESSION ANALYSIS

Unit root problem, solution of difference equations Simple deterministic model, question of unit root

Forecasting Area, Production and Yield of Cotton in India using ARIMA Model

Forecasting using R. Rob J Hyndman. 2.3 Stationarity and differencing. Forecasting using R 1

Sample Exam Questions for Econometrics

MODELING MAXIMUM MONTHLY TEMPERATURE IN KATUNAYAKE REGION, SRI LANKA: A SARIMA APPROACH

The data was collected from the website and then converted to a time-series indexed from 1 to 86.

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Univariate linear models

Testing for Unit Roots with Cointegrated Data

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation

Lecture 5: Estimation of time series

Short-Term Load Forecasting Using ARIMA Model For Karnataka State Electrical Load

MODELLING TIME SERIES WITH CONDITIONAL HETEROSCEDASTICITY

Forecasting Gold Price. A Comparative Study

Time Series Forecasting Model for Chinese Future Marketing Price of Copper and Aluminum

Design of Time Series Model for Road Accident Fatal Death in Tamilnadu

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Modelling Wind Farm Data and the Short Term Prediction of Wind Speeds

Time Series I Time Domain Methods

Automatic seasonal auto regressive moving average models and unit root test detection

Forecasting: principles and practice 1

Predicting Time of Peak Foreign Exchange Rates. Charles Mulemi, Lucio Dery 0. ABSTRACT

Acta Universitatis Carolinae. Mathematica et Physica

ANALYZING THE IMPACT OF HISTORICAL DATA LENGTH IN NON SEASONAL ARIMA MODELS FORECASTING

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

Univariate ARIMA Models

Frequency Forecasting using Time Series ARIMA model

ECONOMETRIA II. CURSO 2009/2010 LAB # 3

Pattern Matching and Neural Networks based Hybrid Forecasting System

Univariate, Nonstationary Processes

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Review Session: Econometrics - CLEFIN (20192)

Agricultural Price Forecasting Using Neural Network Model: An Innovative Information Delivery System

FORECASTING FLUCTUATIONS OF ASPHALT CEMENT PRICE INDEX IN GEORGIA

Justin Appleby CS 229 Machine Learning Project Report 12/15/17 Kevin Chalhoub Building Electricity Load Forecasting

A SEASONAL TIME SERIES MODEL FOR NIGERIAN MONTHLY AIR TRAFFIC DATA

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL

ARIMA Models. Jamie Monogan. January 25, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 25, / 38

Implementation of ARIMA Model for Ghee Production in Tamilnadu

Revisiting linear and non-linear methodologies for time series prediction - application to ESTSP 08 competition data

Time Series and Forecasting

Gaussian Copula Regression Application

Is the Basis of the Stock Index Futures Markets Nonlinear?

IS THE NORTH ATLANTIC OSCILLATION A RANDOM WALK? A COMMENT WITH FURTHER RESULTS

Ross Bettinger, Analytical Consultant, Seattle, WA

BJEST. Function: Usage:

EASTERN MEDITERRANEAN UNIVERSITY ECON 604, FALL 2007 DEPARTMENT OF ECONOMICS MEHMET BALCILAR ARIMA MODELS: IDENTIFICATION

Forecasting: principles and practice. Rob J Hyndman 1.1 Introduction to Forecasting

TRANSFER FUNCTION MODEL FOR GLOSS PREDICTION OF COATED ALUMINUM USING THE ARIMA PROCEDURE

n. Detection of Nonstationarity

Part 1. Multiple Choice (40 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 6 points each)

Report for Fitting a Time Series Model

Forecasting Simulation with ARIMA and Combination of Stevenson-Porter-Cheng Fuzzy Time Series

Decision 411: Class 9. HW#3 issues

Firstly, the dataset is cleaned and the years and months are separated to provide better distinction (sample below).

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria

Transcription:

June 20 22, 2017 Forecasting Sck Prices using Hidden Markov Models and Support Vecr Regression with Firefly Algorithm Joshua Reno S. Cantuba 1,*, Patrick Emilio U. Nicolas 1,*, and Frumencio F. Co 1 1Mathematics Department, De La Salle University *Corresponding Authors: joshua_cantuba@dlsu.edu.ph, patrick_nicolas@dlsu.edu.ph Abstract: The auregressive integrated moving average (ARIMA) model is extensively used in the fields of economics and finance for forecasting sck prices. Using vast amounts of hisrical data, it is used in the Philippine Sck Exchange (PSE) forecast future sck price movements. In this paper, two new forecasting techniques are introduced: hidden markov models (HMM) and support vecr regression with firefly algorithm (SVR-FA). Both methods are compared ARIMA in analyzing closing sck prices of five selected Philippine companies: SM, Ayala Corporations (AC), Philippine Long Distance Telephone (TEL), JG Summit (JGS), and Manila Electric Railroad and Light (MER). All five companies present closing sck price movements that present a challenge the proposed models as well as for ARIMA. The HMMs are trained using observable states that can emit possible movement of hidden states. In sck price forecasting, we may assume an underlying hidden movement that governs the actual increases (decreases) in sck prices for model estimation. The SVR-FA model uses the ε-sensitive loss function and a kernel function for analyzing the best regression hyperplane that relates the current day s closing sck prices with the future closing sck prices. Results show that SVRFA and HMM performed better than ARIMA in forecasting the closing sck prices ofthe selected companies with SVR-FA yielding forecasts with the lowest mean absolutepercentage errors (s) and mean absolute deviations (MADs). Keywords: Scks, ARIMA, HMM, MAD,, SVR-FA 1. INTRODUCTION Sck price forecasting has been a challenge due its random patterns and unpredictable movements. The primary motivation for studying such price movement is determine the times or schedules for which scks can be bought and sold at profitable positions. Several studies have been made for forecasting the trend of sck prices; although there are a lot of methods that can be used for the process, an intelligent prediction model for sck price forecasting is highly desirable (Nath& Hassan, 2005). One of the most famous models used in predicting sck prices is the Auregressive Integrated Moving Average (ARIMA) model. ARIMA has been extensively used in the fields of economics and finance. In the Philippines, the Philippine Sck Exchange (PSE) forecasts future sck movement by applying the ARIMA model on vast amounts of

June 20 22, 2017 hisrical data (Trading Economics, n. d.). Although the use of the said model has been proven trustworthy over for so many years, is it the best model available for forecasting Philippine sck prices? In this paper, we would like introduce the use of two new forecasting methods namely: HMM and Support Vecr Regression (SVR). It is also the objective of the paper use ARIMA in Philippine sck prices as the benchmark for the two proposed methods. 2. METHODOLOGY 2.1 Data Sck prices of SM, AC, TEL, JGS, and MER from January 1, 2012 February 23, 2016 (daily data) are used for the study. The opening price, closing price, highest price, and lowest price of the day were selected as variables of interest. The last ten closing sck prices from February 10, 2016 February 23, 2016 of the five selected companies were set aside test the forecasting capabilities of the constructed models and how good it will be in forecasting sck prices for at most two weeks. The other observations were then split in two parts: the training set and the testing set with the ratio of 90-10%, respectively. The ARIMA models are constructed using the closing sck prices of the entire training and testing set of each company. Various HMM and SVR FA models are constructed using the training set of each company and the best model is selected through the testing set. HMM incorporates all four variables of interest. On the other hand, similar ARIMA, SVR FA incorporates closing sck prices only. The following software are used: Murphy s Hidden Markov Model MATLAB Toolbox for HMM, Chang and Lin s LIBSVM MATLAB Toolbox for SVR and the forecast and lmtest olbox in R for the ARIMA. 2.2 Model Fitting and Forecasting The ARIMA model fitting and the validation of the presented model through series of statistical tests in our study were facilitated by the aumatic selection option of R. The primary concern is construct a model that satisfies the assumptions of an ARIMA model and select the most accurate one based on forecast errors. HMMs shall be constructed on the assumption that a particular hidden movement influences the increase or decrease of closing sck prices. This involves computation of log-likelihood values of observable movement of scks which would be handy in finding hisric patterns for forecasting. The goal is create multiple HMM models with different structures (in states) extract loglikelihood values from the testing sets and locate a particular hisric pattern with the closest loglikelihood value. The increases/decreases in the said pattern will be used compute for the forecasts. For the SVR, the firefly algorithm was used carefully select the optimal input parameters C, γ, and ε. SVR-FA will refer the firefly optimized SVR. An initial population of 25 fireflies was chosen because it makes the process faster. The parameter δ was set 0.97 so that the optimization process can capture more of the random movements of fireflies. The task is run the algorithm and search for SVR- FA with optimal parameters that can forecast future closing sck prices with the lowest Mean Absolute Percent Error (). All other parameters be used will follow that of Table 1. Table 1. Validated Parameters for SVR-FA Parameter Estimates Validation Source α 0.1 Considered best γ 0.01 through simulation β 1.0 [0.95, δ 0.97] Firefly [25, 40] population Based on usage experience Arora and Singh, 2013 Yang, 2008 Validation of the constructed models was done by computing the values of the forecasts on the testing set. After selecting the final models, measures of accuracy are obtained from the forecasts of the last ten observations that were set aside

June 20 22, 2017 through the and Mean Absolute Deviation (MAD). 3. RESULTS AND DISCUSSION 3.1 ARIMA Model Fitting The closing sck prices after the removal of the last ten observations was first subjected the Goldfeld-Quandt Test as shown in Table 2. Of the five companies, only TEL and JGS closing sck prices showed significance with p-values less than 0.0001 indicating that their variances are significantly different at the beginning and the latter part of the time series. They were then logtransformed eliminate non-stationarity in variance. Augmented Dickey-Fuller was then used as shown in Table 2. Results show that after first differencing, all closing sck prices are significantly mean stationary. Table 2. P-values of the Goldfeld-Quandt Test and Augmented Dickey-Fuller Test Goldfeld- Quandt Test Augmented Dickey-Fuller Test Before Differencing After First Differencing SM 1.0000 0.5376 0.0100 AC 0.1680 0.2558 0.0100 TEL <0.0001 0.7164 0.0100 MER 1.000 0.3884 0.0100 JGS <0.0001 0.1219 0.0100 In accordance with the mean-stationarity tests, max ordinary differencing for the aumatic selection was set at 1 since all five closing sck prices are stationary after first differencing and needless differencing produce less satisfacry models (Pankratz, 1983).The ACF and PACF plots after log transformation and ordinary differencing show no signs of seasonal components. As such, seasonal components and differencing are not included in the aumatic selection criteria. Starting AR and MA operars for the aumatic stepwise selection process are set at 0. The final constructed models for each of the five selected companies are shown in Table 3 while the validation of the constructed models is done using the Ljung-Box test as shown in Table 4. Through the said test with a maximum consideration of 250 lags, each constructed ARIMA model has shown p-values greater than 0.05 which is an indication that the said models are not significantly different from a white noise process. Table 3. Final ARIMA Models Table 4.Ljung-Box Test p-value SM 0.3930 AC 0.0638 TEL 0.9944 MER 0.3267 JGS Final Model With Mean 0.4754 Log Transformed SM ARIMA(5,1,0) No No AC ARIMA(1,1,1) No No TEL ARIMA(4,1,3) No Yes MER ARIMA(0,1,0) Yes No JGS ARIMA(2,1,2) Yes Yes 3.2 HMM Model Fitting HMMs were used get the log-likelihood value of the occurrence of the observation sequences in the testing set. The observation sequence in the training set was then partitioned in sizes similar the testing set and the partition with the closest log-likelihood value was chosen as the best hisric pattern that can replicate the movement of the testing set. This procedure was done in different HMM structures. The HMMs that

June 20 22, 2017 located the best pattern and has the least parameters (states) for each company are selected as the final models. Table 5 shows the selected HMM for each of the 5 companies.additionally, the log-likelihood values of the selected models are shown in Table 6. Table 5. s selected for the HMM procedure for the five companies Number of Hidden States 2 3 4 5 Table 5 (Continuation) Number of Hidden States 2 3 4 5 from 06/04/14 03/31/15 08/24/15 03/31/15 08/24/15 10/29/14 03/30/15 From SM TEL 1.4965% 1.8191% 1.8191% 1.7645% from 10/17/12 03/13/13 03/30/201 5 08/20/15 03/14/13 08/06/13 From 03/14/13 08/06/13 08/07/13 01/08/14 03/27/15 03/27/15 AC MER 1.7008% 1.7202% 1.9833% 2.0721% 2.0128% 1.7255% 1.2230% 1.2230% Table 5 (Continuation) Number of Hidden States 2 3 4 5 Table 6. Loglikehood of the patterns selected SM -157.2148 AC -155.8995 TEL -153.3345 MER -157.8273 JGS -155.7367 From 03/12/13 08/01/13 01/03/14 05/26/14 10/20/14 03/18/15 05/27/14 10/17/14 JGS Log-likelihood Value Testing Date Set 06/04/14-156.9371-156.2866-151.9672-158.0489 03/27/15 01/03/14-155.5603 05/26/14 2.0164% 1.8000% 2.0121% 1.9658% Date 09/08/ 2015 02/09/ 2016 It can be seen from Table 6 that the final HMMs has located hisric patterns that have very close log-likelihood values the corresponding testing set of each company. 3.3 SVR-FA Model Fitting Table 7 shows the firefly-optimized parameters for SVR that are used construct the final SVR model for each of the five selected companies. Note that all parameters are used for analyzing normalized closing sck price movement. Thus, only normalized data may be fed in the constructed SVR models for forecasting. Their corresponding s for the testing set are included

June 20 22, 2017 as well. The parameters C*, γ*, and ε* represent the optimal parameters for the SVR model. Table 7. Firefly-optimized parameters for SVR Figure 1. Forecasts for the actual (blue) closing sck prices of the five selected companies from ARIMA (red), HMM (green), and SVR-FA(violet) models C* γ* ε* SM 0.6483 1.0500 0.0005 1.3896% AC 3.3967 4.8345 0.0390 1.3733% TEL 2.6189 0.0261 0.0002 1.7694% JGS 94.4953 83.3669 0.0201 1.6218% MER 23.4454 33.9205 0.0009 1.0834% 3.4 Forecasting In this stage, the three constructed models are used in forecasting the last ten closing sck prices of the five companies that were set aside. Forecasting future closing sck prices in R using the constructed ARIMA models is done using the forecast command. As for HMM, the observation sequence of the last ten observations were considered as the new testing set and the remaining data were partitioned in sizes of ten. The partition with log-likelihood value closest the sequence in the testing set is considered as the best hisric pattern. The changes in the closing sck prices of the selected pattern were then extracted and used make forecasts. For SVR-FA, the last ten closing sck prices were normalized and each normalized current day closing sck prices was fed the final SVR models for each company generate forecasts. The plots in Figure 1 show the comparison between the movements of the forecasts versus the actual prices. As shown in the previous figure, the plots portray reasonable forecast values in each of the five company s actual prices.the of the forecasts yielded by the three models on the five companies are shown in Table 9. Table 9. of the forecasts for the last 10 observations Mean Absolute Percent Error ARIMA HMM SVR-FA SM 1.1178% 1.8522% 1.1061% AC 1.5506% 1.3226% 0.6867% TEL 3.2931% 2.2285% 1.3879% MER 1.3831% 0.8178% 0.5461% JGS 3.6994% 1.4548% 0.9501% It can be seen from Table 9 that among the three models, SVR-FA s forecasts have the lowest s. Table 10 shows the mean of the last 10 observations for each company that were set aside and the corresponding MADs for the three forecasting methods which portrays the same result.

June 20 22, 2017 Table 10. Mean of the last 10 observations and MAD of their corresponding forecasts (in Php) Mean Mean Absolute Deviation ARIMA HMM SVR-FA SM 822.40 9.17 15.25 9.13 AC 693.75 10.79 9.15 4.77 TEL 2223.6 0 74.45 49.6 31.18 MER 319.18 4.43 2.62 1.74 JGS 67.97 2.54 0.99 0.65 5. CONCLUSION It is clear from the results that the SVR-FA is the best pick in forecasting the closing sck prices of the five selected companies since it has the lowest s and MADs. ARIMA gives stationary forecasts which are not good representations of actual sck price movements. HMM still did well in the study, however, there are sudden increases/decreases in the forecasts which give relatively high or low forecasts compared the actual values. The HMMs used in the study also incorporated all four variables of interest unlike the other two methods; even though SVR-FA only incorporated one variable it was able yield better forecasts for the five selected companies. 6. RECOMMENDATIONS ARIMA, HMM, and SVR FA require complicated structures. For the HMM, continuous variations and new ways determine the number of hidden states may be considered for more accurate forecasts. For researchers that will consider discrete HMMs, they can consider trying different ways on how transform the data in a discrete variable. Researchers may try different optimization procedures for SVR and analyze their corresponding accuracy. All models and procedures specified in this paper may also be applied other sck prices and with longer time periods test whether the same reasonable results can be obtained. 7. REFERENCES Arora, S., & Singh, S. (2013). The Firefly Optimization Algorithm: Convergence Analysis and Parameter Selection. International Journal of Computer Applications IJCA, 69(3), 48-52.doi:10.5120/11826-7528 kernel-based learning methods. Cambridge: Cambridge University Press. Hassan, M., &Nath, B. (2005). Sck Market Forecasting using Hidden Markov model: A new Approach. 5th International Conference on Intelligent Systems Design and Applicationsc(ISDA'05). Retrieved March 30, 2016. Kazem, A., Sharifi, E., Hussain, F. K., Saberi, M., &Hussain, O. K. (2013). Support Vecr Rgression with Chaos-based Firefly Algorithm for Sck Market Price Forecasting. Applied Soft Computing, 13(2), 947-958. doi:10.1016/j.asoc.2012.09.024 Magadia, J. C. (2012, May 28-29). Risk Measures for the PSE Index Using Hidden Markov Models. Philippines Sck Market (PSEi) Forecast 2016-2020. (n.d.). Retrieved June 12, 2016, from http://www.tradingeconomics.com /philippines/sck-market/forecast Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models (Concepts and Cases).New York: John Wiley & Sons. Yang, X. (2008). Nature-inspired Metaheuristic AlgorithmsFrame, U.K.: LuniverPress.