
Foreign Exchange Rates Forecasting with a C-Ascending Least Squares Support Vector Regression Model

Lean Yu, Xun Zhang, and Shouyang Wang

Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
{yulean,zhangxun,sywang}@amss.ac.cn

Abstract. In this paper, a modified least squares support vector regression (LSSVR) model, called C-ascending least squares support vector regression (C-ALSSVR), is proposed for foreign exchange rates forecasting. The generic idea of the proposed C-ALSSVR model is based on the prior knowledge that different data points often provide different information for modeling, and that more weight should be given to those data points containing more information. The C-ALSSVR can be obtained by a simple modification of the regularization parameter in LSSVR, whereby more weight is given to the recent least squares errors than to the distant least squares errors while keeping the regularized term in its original form. For verification purposes, the performance of the C-ALSSVR model is evaluated using three typical foreign exchange rates. The experimental results demonstrate that the C-ALSSVR model is a very promising tool for foreign exchange rates forecasting.

Keywords: Foreign exchange rates forecasting, least squares support vector regression, regularization parameter.

1 Introduction

The foreign exchange market is a nonlinear dynamic market with high volatility, and thus foreign exchange rate series are inherently noisy, non-stationary and deterministically chaotic [1]. Due to this irregularity, foreign exchange rates forecasting is regarded as a rather challenging task. For traditional linear forecasting methods such as the autoregressive integrated moving average (ARIMA) model and the exponential smoothing model (ESM) [2], it is extremely difficult to capture this irregularity because they are unable to capture the subtle nonlinear patterns hidden in foreign exchange rate series. To remedy this gap, many emerging nonlinear techniques, such as artificial neural networks (ANNs), have been widely used in foreign exchange rates forecasting over the past decades and have obtained good results relative to traditional linear modeling techniques. For example, De Matos [3] compared the strength of a multilayer feed-forward neural network (MLFNN) with that of a recurrent neural network (RNN) based on the forecasting of Japanese yen futures.

Kuan and Liu [4] provided a comparative evaluation of the performance of an MLFNN and an RNN on the prediction of an array of commonly traded exchange rates. In the article of Tenti [5], the RNN is directly applied to exchange rates forecasting. Hsu et al. [6] developed a clustering neural network (CNN) model to predict the direction of movements in the USD/DEM exchange rate; their experimental results suggested that the proposed model achieved better forecasting performance relative to other indicators. In a more recent study, Leung et al. [7] compared the forecasting accuracy of the MLFNN with that of the general regression neural network (GRNN); the study showed that the GRNN possessed a greater forecasting strength relative to the MLFNN with respect to a variety of currency exchange rates. Similarly, Chen and Leung [8] adopted an error correction neural network (ECNN) model to predict foreign exchange rates. Yu et al. [9] proposed an adaptive smoothing neural network (ASNN) model that adaptively adjusts error signals to predict foreign exchange rates and obtained good performance.

Although ANN models have achieved great success in foreign exchange rates forecasting, they still have some disadvantages in practical applications. For example, ANN models often suffer from over-fitting when training is performed for too long or when training examples are scarce. Furthermore, the local minimum problem often occurs due to the adoption of the empirical risk minimization (ERM) principle in ANN learning. To address these issues, a competing learning model, called support vector machines (SVMs), was proposed by Vapnik and his colleagues in 1995 [10]. The SVM is a novel way to train polynomial neural networks based on the structural risk minimization (SRM) principle, which seeks to minimize an upper bound on the generalization error rather than minimize the empirical error as in other neural networks. The generic idea is based on the fact that the generalization error is bounded by the sum of the empirical error and a confidence interval term that depends on the Vapnik-Chervonenkis (VC) dimension [11]. Using the SRM principle, the SVM obtains a globally optimal solution by adopting a suitable trade-off between the empirical error and the VC confidence interval. Due to this distinct characteristic, the SVM has been widely applied to pattern classification and function approximation or regression estimation problems [12].

In terms of classification and regression problems, the SVM can be categorized into support vector classification (SVC) and support vector regression (SVR). In this paper, we focus on SVR for foreign exchange rates forecasting. In SVR, the problem is formulated by employing Vapnik's so-called ε-insensitive loss function, treating the regression problem as an inequality-constrained convex programming problem, more specifically a quadratic programming (QP) problem, and using the Mercer condition to relate the nonlinear feature mapping to the chosen kernel function [13]. Usually, this QP problem leads to a higher computational cost. For this reason, the least squares support vector machine (LSSVM), a variant of the SVM, avoids this shortcoming and obtains an analytical solution directly by solving a set of linear equations instead of a QP problem.

In such a way, a least squares support vector regression (LSSVR) can be formulated by replacing Vapnik's ε-insensitive loss function with a least squares cost function corresponding to a form of ridge regression [13]. By introducing the least squares cost function, the LSSVM can be extended to solve nonlinear regression problems.

In practical time series forecasting, many studies have shown that the relationship between the input variables and the output variable gradually changes over time, and that recent data often contain more information than distant data. It is therefore advisable to give more weight to the recent data containing more information. In view of this idea, an innovative approach was proposed by Tay and Cao [11], who used an ascending regularization parameter in SVR to predict financial time series, based on the work of Refenes et al. [14]. The ascending SVR is obtained by a simple modification of the regularized risk function in SVR, whereby the recent ε-insensitive errors are penalized more heavily than the distant ε-insensitive errors. The ascending SVR is reported to be very effective in financial time series forecasting.

This paper is motivated by the ascending SVR model and generalizes the idea to LSSVR, whereby more weight is given to the recent least squares errors than to the distant least squares errors through the regularization parameter. The primary objective of this paper is to propose a new forecasting paradigm, called C-ascending LSSVR, that can significantly reduce the computational cost and improve the prediction capability of standard SVR, and to examine whether the prior knowledge that recent data should provide more information than distant data can also be utilized by LSSVR in financial time series forecasting, especially foreign exchange rates forecasting in this study.

The remainder of this paper is organized as follows. Section 2 briefly overviews the formulation of the least squares support vector regression (LSSVR) model. In Section 3, the formulation of the C-ascending LSSVR (C-ALSSVR) is presented in detail. For further illustration, three typical foreign exchange rates prediction experiments are conducted in Section 4. Finally, some concluding remarks are drawn in Section 5.

2 Least Squares Support Vector Regression (LSSVR)

Suppose that there is a given training set D of n data points \(\{(x_i, y_i)\}_{i=1}^{n}\) with input data \(x_i \in \mathbb{R}^n\) and output \(y_i \in \mathbb{R}\). An important idea of the SVM is to map the input into a high-dimensional feature space F via a nonlinear mapping function \(\varphi(x)\) and to find an unknown function of the form

\[ y = f(x; w, b) = w^T \varphi(x) + b \tag{1} \]

where \(\varphi(\cdot)\) is the nonlinear mapping function, w is the weight vector, and b is a bias. In least squares support vector regression, the following optimization problem is formulated:

\[ \min J = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} e_i^2 \tag{2} \]

subject to the equality constraints

\[ y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, n, \tag{3} \]

where C is the regularization parameter and \(e_i\) is the ith approximation error between the predicted and actual values. Combining (2) and (3), one can define the Lagrangian function

\[ L(w, b, e_i; \alpha_i) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} e_i^2 - \sum_{i=1}^{n} \alpha_i \left[ w^T \varphi(x_i) + b + e_i - y_i \right] \tag{4} \]

where \(\alpha_i\) is the ith Lagrange multiplier. The optimality conditions are obtained by differentiating (4):

\[
\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n} \alpha_i \varphi(x_i), \quad
\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i = 0, \quad
\frac{\partial L}{\partial e_i} = 0 \Rightarrow e_i = \alpha_i / C, \quad
\frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^T \varphi(x_i) + b + e_i - y_i = 0,
\tag{5}
\]

for \(i = 1, 2, \ldots, n\). After elimination of \(e_i\) and w, the solution is given by the following set of linear equations:

\[
\begin{cases}
\sum_{j=1}^{n} \alpha_j \varphi(x_i)^T \varphi(x_j) + b + \alpha_i / C - y_i = 0, \\
\sum_{i=1}^{n} \alpha_i = 0,
\end{cases}
\quad i = 1, 2, \ldots, n.
\tag{6}
\]

Using the Mercer condition, the kernel function can be defined as \(K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)\) for \(i, j = 1, 2, \ldots, n\). Typical kernel functions include the linear kernel \(K(x_i, x_j) = x_i^T x_j\), the polynomial kernel \(K(x_i, x_j) = (x_i^T x_j + 1)^d\), the Gaussian or RBF kernel \(K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)\), and the MLP kernel \(K(x_i, x_j) = \tanh(k x_i^T x_j + \theta)\), where d, σ, k and θ are kernel parameters specified by the user beforehand. Accordingly, (6) can be rewritten as

\[
\begin{cases}
\Omega \alpha + b\,\mathbf{1} = y, \\
\mathbf{1}^T \alpha = 0,
\end{cases}
\tag{7}
\]

where b is a scalar and \(\Omega\), \(\alpha\), y, and \(\mathbf{1}\) are the matrix and vectors defined, respectively, by

\[ \Omega = K(x_i, x_j) + (1/C) I, \tag{8} \]
\[ \alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T, \tag{9} \]
\[ y = (y_1, y_2, \ldots, y_n)^T, \tag{10} \]
\[ \mathbf{1} = (1, 1, \ldots, 1)^T, \tag{11} \]

where I in (8) is the n × n identity matrix.
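For concreteness, these four kernels can be written in a few lines of code. The following Python/NumPy sketch is purely illustrative; the paper's own experiments use the Matlab LS-SVMlab toolbox cited in Section 4, and all function names here are assumptions:

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(x_i, x_j) = x_i^T x_j
    return xi @ xj

def polynomial_kernel(xi, xj, d=2):
    # K(x_i, x_j) = (x_i^T x_j + 1)^d
    return (xi @ xj + 1.0) ** d

def gaussian_kernel(xi, xj, sigma=1.0):
    # K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)
    return np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2)

def mlp_kernel(xi, xj, k=1.0, theta=0.0):
    # K(x_i, x_j) = tanh(k x_i^T x_j + theta)
    return np.tanh(k * (xi @ xj) + theta)

def gram_matrix(X, kernel, **params):
    """Kernel matrix with entries K[i, j] = kernel(x_i, x_j)."""
    n = len(X)
    return np.array([[kernel(X[i], X[j], **params) for j in range(n)]
                     for i in range(n)])
```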

Equivalently, in matrix form, the linear equations of (6) can be expressed as

\[
\begin{bmatrix} \Omega & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix}
\begin{bmatrix} \alpha \\ b \end{bmatrix}
=
\begin{bmatrix} y \\ 0 \end{bmatrix}.
\tag{12}
\]

From (8), \(\Omega\) is positive definite, so the solution for the Lagrange multipliers \(\alpha\) can be obtained from (12), i.e.,

\[ \alpha = \Omega^{-1}(y - b\,\mathbf{1}). \tag{13} \]

Substituting (13) into the second equation in (12), we obtain

\[ b = \frac{\mathbf{1}^T \Omega^{-1} y}{\mathbf{1}^T \Omega^{-1} \mathbf{1}}. \tag{14} \]

Here, since \(\Omega\) is positive definite, \(\Omega^{-1}\) is also positive definite and thus \(\mathbf{1}^T \Omega^{-1} \mathbf{1} > 0\), so b can always be obtained. Substituting (14) into (13), \(\alpha\) is easily obtained, and the solution for w then follows from the first equation in (5). Using w and b, the function f shown in (1) can be determined.

The distinct advantages of LSSVR are two-fold. On the one hand, the optimal solution of (1) can be found by solving a set of linear equations instead of the quadratic programming (QP) problem used in standard SVR, thus reducing the computational cost. On the other hand, relative to the standard SVR with Vapnik's ε-insensitive loss function, there is no need to determine the additional accuracy parameter ε of the ε-insensitive loss function, which reduces the chance of over-fitting.
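The closed-form training procedure of (12)-(14) is simple enough to sketch directly. The following Python/NumPy snippet is a minimal illustration, not the authors' implementation (they use LS-SVMlab in Matlab); the Gaussian kernel choice, the helper names and the toy lagged-input setup are assumptions:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """Gaussian kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def lssvr_fit(X, y, C, sigma):
    """Solve the LSSVR linear system: b via eq. (14), alpha via eq. (13)."""
    n = len(y)
    Omega = rbf_kernel(X, X, sigma) + np.eye(n) / C   # eq. (8)
    Oinv_y = np.linalg.solve(Omega, y)                # Omega^{-1} y
    Oinv_1 = np.linalg.solve(Omega, np.ones(n))       # Omega^{-1} 1
    b = Oinv_y.sum() / Oinv_1.sum()                   # eq. (14)
    alpha = Oinv_y - b * Oinv_1                       # eq. (13)
    return alpha, b

def lssvr_predict(X_train, alpha, b, X_new, sigma):
    """f(x) = sum_i alpha_i K(x, x_i) + b, from eqs. (1) and (5)."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy usage: one-step-ahead prediction from 4 lagged values of a
# synthetic series (illustrative data only, not the paper's data).
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))
lags = 4
X = np.array([series[t - lags:t] for t in range(lags, len(series))])
y = series[lags:]
alpha, b = lssvr_fit(X[:-20], y[:-20], C=10.0, sigma=2.0)
y_hat = lssvr_predict(X[:-20], alpha, b, X[-20:], sigma=2.0)
```

Because only one n × n linear system has to be solved, the training cost is dominated by the factorization of \(\Omega\), which is exactly the computational advantage over the QP of standard SVR noted above.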

3 C-Ascending Least Squares Support Vector Regression

As is well known to researchers in the field of machine learning, the regularization parameter C in (2) determines the trade-off between the regularized term and the tolerable empirical errors. As C increases, the relative importance of the empirical errors grows relative to the regularized term, and vice versa. Usually, in standard SVR and LSSVR, the empirical risk function assigns an equal weight C to all ε-insensitive losses [11] or least squares errors \(e_i\) (i = 1, 2, ..., n) between the predicted and actual values [13]. That is, the regularization parameter C is a constant. However, many time series forecasting experiments (e.g., [11]) have shown that a fixed regularization parameter is unsuitable for prediction tasks with certain prior knowledge. Considering that different data might contain different information, more weight should be given to those data offering more information. In the case of LSSVR for financial time series forecasting, more weight should be given to the recent data, taking into account the prior knowledge that recent data might offer more information than distant data. For this purpose, the regularization parameter C should be replaced by a variable regularization parameter \(C_i\) to capture the variation of the information contained in the time series data \(\{(x_i, y_i)\}_{i=1}^{n}\).

In terms of the prior knowledge that recent data might offer more information than distant data, the variable regularization parameter should satisfy \(C_i > C_{i-1}\) (i = 2, ..., n). Since the variable regularization parameters \(C_i\) grow from the distant data points to the recent data points, \(C_i\) is called the ascending regularization parameter, which gives more weight to the more recent data points. In practical applications, the form of \(C_i\) often depends on the prior knowledge we have. In financial time series forecasting, two typical forms are often used: the linear form and the exponential form [11].

For the linear ascending form, the number of training data points is often used to determine the value of \(C_i\). A common linear form is

\[ C_i = \frac{i}{n(n+1)/2}\, C = \frac{2i}{n(n+1)}\, C \tag{15} \]

where i is a time factor, n is the number of training data points, and C is a constant that needs tuning. According to (15), the more recent the training data point is, the larger the regularization parameter \(C_i\) is. A distinct advantage of the linear form is its simplicity and ease of implementation; when the training data set is small and low computational cost is expected, it is a good choice.

For the exponential ascending form, another parameter r is introduced into the exponential function in order to control the rate of ascent:

\[ C_i = \frac{C}{1 + \exp(r - 2ri/n)}. \tag{16} \]

The exponential form offers more ways to describe the changing information content of the training data. When the training data set is large and computational cost is not a concern, the exponential form can be adopted. Besides the linear and exponential forms, other forms that control the ascending rate can also be employed in practical applications.

Based on the ascending regularization parameter \(C_i\), a new LSSVR, called C-ascending LSSVR (C-ALSSVR), can be introduced. Similar to (2) and (3), the optimization problem of C-ALSSVR for time series prediction can be formulated as follows:

\[
\min J = \frac{1}{2} w^T w + \frac{1}{2} \sum_{i=1}^{n} C_i e_i^2
\quad \text{s.t.} \quad y_i = w^T \varphi(x_i) + b + e_i, \; i = 1, 2, \ldots, n.
\tag{17}
\]

Using the Lagrangian theorem, the final solution is similar to (13) and (14). The only difference is the value of \(\Omega\), due to the introduction of the variable regularization parameter \(C_i\). In the case of C-ALSSVR, the value of \(\Omega\) is calculated by

\[ \Omega = K(x_i, x_j) + (1/C_i) I. \tag{18} \]

According to (17) and (18), the LSSVR algorithm can still be used, except that the regularization parameter value \(C_i\) is different for every training data point; thus the computation and simulation procedures of LSSVR can be reused with a simple modification of the regularization parameter from a fixed value C to a variable parameter \(C_i\) in terms of (17), as sketched below.
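As a minimal sketch of that modification, the two ascending forms (15) and (16) and the modified matrix (18) might be coded as follows. This is an illustrative Python/NumPy rendering under the assumption that K is a precomputed n × n kernel matrix whose rows are ordered from the most distant to the most recent observation; the function names are not from the paper:

```python
import numpy as np

def ascending_C_linear(n, C):
    """Linear ascending form, eq. (15): C_i = 2i / (n(n+1)) * C, i = 1..n."""
    i = np.arange(1, n + 1)
    return 2.0 * i / (n * (n + 1)) * C

def ascending_C_exponential(n, C, r):
    """Exponential ascending form, eq. (16): C_i = C / (1 + exp(r - 2ri/n))."""
    i = np.arange(1, n + 1)
    return C / (1.0 + np.exp(r - 2.0 * r * i / n))

def c_alssvr_fit(K, y, C_i):
    """C-ALSSVR training: identical to the LSSVR solve of (13)-(14),
    except that Omega carries a per-sample penalty, eq. (18)."""
    Omega = K + np.diag(1.0 / C_i)        # eq. (18): diagonal now uses 1/C_i
    Oinv_y = np.linalg.solve(Omega, y)
    Oinv_1 = np.linalg.solve(Omega, np.ones(len(y)))
    b = Oinv_y.sum() / Oinv_1.sum()       # eq. (14), unchanged
    alpha = Oinv_y - b * Oinv_1           # eq. (13), unchanged
    return alpha, b

# One way to obtain the C-descending variant compared in Section 4 is
# simply to reverse the weight vector, e.g. c_alssvr_fit(K, y, C_i[::-1]).
```

Note that only the diagonal of \(\Omega\) changes relative to standard LSSVR, so the solution steps (13) and (14) carry over unchanged.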

4 Experimental Results

In this section, three real-world foreign exchange rates are used to test the effectiveness of the proposed C-ascending LSSVR model. The data are monthly and are obtained from the Pacific Exchange Rate Service provided by Professor Werner Antweiler, University of British Columbia, Vancouver, Canada. They consist of the US dollar against each of the three currencies studied in this paper: the British pound (GBP), the euro (EUR) and the Japanese yen (JPY). We take monthly data from January 1971 to December 2000 as the in-sample (training period) data sets (360 observations, including 60 samples for cross-validation). We take the data from January 2001 to November 2008 as the out-of-sample (testing period) data sets (95 observations), which are used to evaluate the prediction performance on the basis of several evaluation measurements.

For evaluation, two typical indicators, the normalized mean squared error (NMSE) [1] and the directional statistic (\(D_{stat}\)) [1], are used. Given N pairs of actual values (or targets, \(x_t\)) and predicted values (\(\hat{x}_t\)), the NMSE, which normalizes the MSE by dividing it by the variance of the respective series, can be defined as

\[
\text{NMSE} = \frac{\sum_{t=1}^{N} (x_t - \hat{x}_t)^2}{\sum_{t=1}^{N} (x_t - \bar{x})^2}
= \frac{1}{\delta^2} \cdot \frac{1}{N} \sum_{t=1}^{N} (x_t - \hat{x}_t)^2
\tag{19}
\]

where \(\delta^2\) is the estimated variance of the data and \(\bar{x}\) its mean. The NMSE is only a level prediction criterion, which is not enough for financial forecasting. For this reason, the directional statistic (\(D_{stat}\)) is also used, which is defined by

\[
D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t \times 100\%
\tag{20}
\]

where \(a_t = 1\) if \((x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \geq 0\), and \(a_t = 0\) otherwise.
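Both criteria can be computed directly from their definitions. The following small Python/NumPy sketch uses illustrative names and follows the convention of (20) that each movement compares \(x_{t+1}\) and \(\hat{x}_{t+1}\) against \(x_t\):

```python
import numpy as np

def nmse(actual, predicted):
    """Normalized mean squared error, eq. (19): MSE divided by the
    variance of the actual series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2)

def d_stat(actual, predicted):
    """Directional statistic, eq. (20): percentage of steps where the
    predicted movement (x_hat_{t+1} - x_t) has the same sign as the
    actual movement (x_{t+1} - x_t)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    hits = (actual[1:] - actual[:-1]) * (predicted[1:] - actual[:-1]) >= 0
    return 100.0 * hits.mean()
```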

In addition, for comparison purposes, a reverse model relative to C-ALSSVR, called the C-descending LSSVR (C-DLSSVR), a standard LSSVR that does not use the most recent 60 data points (LSSVR-60), and the standard LSSVR (LSSVR) are also used here. If the prediction performance of the C-DLSSVR and LSSVR-60 is worse than that of the LSSVR, the prior knowledge that recent training data points offer more information for modeling than distant training data points will be confirmed more clearly.

In the experiments with C-ALSSVR, the Gaussian function is used as the kernel function and the two ascending forms are examined. The regularization parameter C and the kernel parameter σ are determined by the cross-validation method. In particular, the validation set is used to choose the best combination of C, σ, and the optimal control rate r of the exponential ascending form. These values can vary across the different foreign exchange rates due to their different characteristics. The LS-SVMlab 1.5 toolbox [15] is used to solve the regression problem, and the program is developed in Matlab. For the LSSVR, the LSSVR-60, and the C-descending LSSVR, which puts more weight on the more distant training data points, the same settings as for C-ALSSVR are adopted.

Using the above experimental design, the corresponding computational results are reported in Tables 1 and 2 from the points of view of level prediction and direction prediction. The two tables present a clear comparison of the various methods for the three currencies via NMSE and \(D_{stat}\). Generally speaking, the results in the two tables indicate that the prediction performance of the proposed C-ascending LSSVR model is better than that of the standard LSSVR, LSSVR-60 and C-DLSSVR models for the three main currencies.

Table 1. The NMSE comparisons of different models for three foreign exchange rates (NMSE and rank for each of GBP, EUR and JPY; models compared: LSSVR, LSSVR-60, C-DLSSVR with linear and exponential forms, and C-ALSSVR with linear and exponential forms).

Table 2. The \(D_{stat}\) comparisons of different models for three foreign exchange rates (\(D_{stat}\) in percent and rank for each of GBP, EUR and JPY, for the same six models).

Focusing on the NMSE indicator, the proposed C-ascending LSSVR model with the exponential ascending form performs the best in all cases, followed by the C-ascending LSSVR model with the linear ascending form, the standard LSSVR and the LSSVR-60 model, while the C-descending LSSVR model is the worst. This indicates that the proposed C-ascending LSSVR model is more suitable for foreign exchange rates prediction than the standard LSSVR and C-descending LSSVR models. Interestingly, for the JPY test case, the NMSEs of all models are larger than those of the other two currencies. This might be because the JPY is more volatile than the other two currencies.

However, a low NMSE does not necessarily mean a high hit ratio in predicting the direction of foreign exchange movements. Thus the \(D_{stat}\) comparison is necessary for business practitioners. Focusing on the \(D_{stat}\) values in Table 2, it is not hard to find that the proposed C-ascending LSSVR model outperforms the other three model families according to the ranking. Furthermore, from the business practitioners' point of view, \(D_{stat}\) is more important than NMSE because the former is an important decision criterion in foreign exchange trading. With reference to Table 2, the differences between the models are very significant. For instance, for the EUR test case, the \(D_{stat}\) for the C-DLSSVR model with the exponential form is only 55.79%, for the LSSVR-60 method it is 66.32%, and for the standard LSSVR model it is 73.68%, while for the C-ALSSVR model with the exponential form, \(D_{stat}\) reaches 84.21%. Furthermore, as with the NMSE indicator, the proposed C-ALSSVR model with the exponential form performs the best in all cases, followed by the C-ALSSVR model with the linear form and the standard LSSVR model, while the C-DLSSVR model with the exponential form performs the worst. The main reason is that in foreign exchange rates forecasting the recent training data points are more important than the distant training data points; that is, the recent data offer more information for modeling. By incorporating this prior knowledge into the LSSVR method, the C-ALSSVR models are more effective in foreign exchange rates forecasting than the standard LSSVR models.

5 Concluding Remarks

In this study, a new least squares support vector regression (LSSVR) model, called the C-ascending LSSVR (C-ALSSVR) model, is proposed for foreign exchange rates prediction. The empirical results show that, across the different models for the test cases of the three main currencies, the British pound (GBP), the euro (EUR) and the Japanese yen (JPY), and on the basis of different evaluation criteria, our C-ascending LSSVR models perform the best. In the presented forecasting cases, their NMSE is the lowest and their \(D_{stat}\) is the highest, indicating that the proposed C-ascending LSSVR models can be used as a promising tool for foreign exchange rates prediction. An interesting research direction is to explore more complex ascending functions that can closely follow the dynamics of foreign exchange rate series. Furthermore, the proposed C-ascending LSSVR models can easily be extended to other financial time series forecasting problems. We will look into these issues in future research.

Acknowledgements

This work is partially supported by grants from the National Natural Science Foundation of China (NSFC) and the Knowledge Innovation Program of the Chinese Academy of Sciences.

References

1. Yu, L., Wang, S.Y., Lai, K.K.: Foreign-Exchange-Rate Forecasting with Artificial Neural Networks. Springer, New York (2007)
2. Lai, K.K., Yu, L., Wang, S.Y., Huang, W.: Hybridizing Exponential Smoothing and Neural Network for Financial Time Series Prediction. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006. LNCS, vol. 3994. Springer, Heidelberg (2006)
3. De Matos, G.: Neural Networks for Forecasting Exchange Rates. M.Sc. Thesis, The University of Manitoba, Canada (1994)
4. Kuan, C.M., Liu, T.: Forecasting Exchange Rates Using Feedforward and Recurrent Neural Networks. Journal of Applied Econometrics 10 (1995)
5. Tenti, P.: Forecasting Foreign Exchange Rates Using Recurrent Neural Networks. Applied Artificial Intelligence 10 (1996)
6. Hsu, W., Hsu, L.S., Tenorio, M.F.: A Neural Network Procedure for Selecting Predictive Indicators in Currency Trading. In: Refenes, A.N. (ed.) Neural Networks in the Capital Markets. John Wiley and Sons, New York (1995)
7. Leung, M.T., Chen, A.S., Daouk, H.: Forecasting Exchange Rates Using General Regression Neural Networks. Computers & Operations Research 27 (2000)
8. Chen, A.S., Leung, M.T.: Regression Neural Network for Error Correction in Foreign Exchange Rate Forecasting and Trading. Computers & Operations Research 31 (2004)
9. Yu, L., Wang, S.Y., Lai, K.K.: Adaptive Smoothing Neural Networks in Foreign Exchange Rate Forecasting. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3516. Springer, Heidelberg (2005)
10. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
11. Tay, F.E.H., Cao, L.J.: Modified Support Vector Machines in Financial Time Series Forecasting. Neurocomputing 48 (2002)
12. Schölkopf, B., Burges, C., Smola, A. (eds.): Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge (1998)
13. Suykens, J.A.K., Lukas, L., Vandewalle, J.: Sparse Approximation Using Least Squares Support Vector Machines. In: Proceedings of the 2000 IEEE International Symposium on Circuits and Systems, vol. 2. IEEE Press, New York (2000)
14. Refenes, A.N., Bentz, Y., Bunn, D.W., Burgess, A.N., Zapranis, A.D.: Financial Time Series Modelling with Discounted Least Squares Back-Propagation. Neurocomputing 14 (1997)
15. Pelckmans, K., Suykens, J.A.K., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., De Moor, B., Vandewalle, J.: LS-SVMlab: A Matlab/C Toolbox for Least Squares Support Vector Machines. Internal Report 02-44, ESAT-SISTA, K.U. Leuven, Leuven, Belgium (2002)
