Long-Term Prediction of Time Series by combining Direct and MIMO Strategies

Souhaib Ben Taieb, Gianluca Bontempi, Antti Sorjamaa and Amaury Lendasse

Gianluca Bontempi and Souhaib Ben Taieb are with the Machine Learning Group, Computer Science Department, Faculty of Sciences, Université Libre de Bruxelles (email: {gbonte, sbentaie}@ulb.ac.be). Antti Sorjamaa and Amaury Lendasse are with the Department of Information and Computer Science, Helsinki University of Technology, Finland (email: {lendasse, Antti.Sorjamaa}@hut.fi).

Abstract: Reliable and accurate prediction of time series over large future horizons has become the new frontier of the forecasting discipline. Current approaches to long-term time series forecasting rely either on iterated predictors, direct predictors or, more recently, Multi-Input Multi-Output (MIMO) predictors. The iterated approach suffers from the accumulation of errors, the Direct strategy makes a conditional independence assumption which does not necessarily preserve the stochastic properties of the time series, and the MIMO technique is limited by the reduced flexibility of the predictor. This paper compares the Direct and MIMO strategies and discusses their respective limitations for the problem of long-term time series prediction. It also proposes a new methodology that is an intermediate between the Direct and the MIMO techniques. The paper presents the results obtained with the ESTSP 2007 competition dataset.

I. INTRODUCTION

The prediction of the future behavior of an observed time series over a long horizon H is still an open problem in forecasting [1]. Currently, the most common approaches to long-term forecasting rely either on iterated [2] or direct [3] prediction techniques. Given a time series {ϕ_1, ..., ϕ_t}, in the first case an H-step-ahead prediction problem is tackled by iterating, H times, a one-step-ahead predictor. Once the predictor has estimated the future series value, this value is fed back as an input to the following prediction. Hence, the predictor takes estimated values as inputs instead of actual observations, with evident negative consequences in terms of error propagation. Examples of iterated approaches are recurrent neural networks [4] and local learning iterated techniques [5], [6].

Another way to perform H-step-ahead forecasting consists in estimating a set of H prediction models, each returning a direct forecast of ϕ_{t+h}, with h ∈ {1, ..., H} [3]. Direct methods often require higher functional complexity than iterated ones in order to model the stochastic dependency between two series values at two distant instants. The computational intelligence literature also contains examples of research works where the two approaches have been successfully combined [7].

In spite of their diversity, iterated and direct techniques for multi-step-ahead prediction share a common feature: they model from data a multi-input single-output mapping, whose output is the variable ϕ_{t+1} in the iterated case and the variable ϕ_{t+h} in the direct case. In [8], the author proposed a MIMO approach for long-term time series prediction, where the predicted value is no longer a scalar quantity but a vector of future values {ϕ_{t+1}, ..., ϕ_{t+H}} of the time series ϕ. This approach replaces the H models of the Direct approach with one multi-output model, which aims to preserve, between the predicted values, the stochastic dependency characterizing the time series.
This paper goes one step further in that direction by proposing a new methodology for long-term prediction which aims to preserve the most appealing aspects of both the Direct and MIMO approaches. The Direct and MIMO approaches can indeed be seen as two distinct instances of the same prediction approach, which decomposes long-term prediction into multi-output tasks. In the direct case, the number of prediction tasks is equal to the size of the horizon H and the size of each output is 1. In the MIMO case, the number of prediction tasks is equal to one and the size of the output is H. Intermediate configurations can be imagined by transforming the original task into n = H/s prediction tasks, each with multiple outputs of size s, where s ∈ {1, ..., H}. This approach, called Multi-Input Several Multi-Output (MISMO), trades off the property of preserving the stochastic dependency between future values against a greater flexibility of the predictor. For instance, having n > 1 allows the selection of different inputs for different horizons. In other terms, s addresses the bias/variance trade-off of the predictor by constraining the degree of dependency between predictions (null in the case s = 1 and maximal for s = H).

This paper introduces and assesses the MISMO methodology by implementing each prediction model with a Lazy Learning model [9], [10]. Lazy Learning (LL) is a local modeling technique which is query-based, in the sense that the whole learning procedure (i.e. structural and parametric identification) is deferred until a prediction is required. [11], [12] presented a LL algorithm that selects, on a query-by-query basis and by means of a local cross-validation scheme, the optimal number of neighbors. Iterated versions of Lazy Learning have been successfully applied to multi-step-ahead time series prediction [13], [14]. A LL-MIMO version was presented in [8]. This paper presents a LL method for the MISMO prediction of multiple and dependent outputs in the context of long-term time series prediction.

Section II introduces the new MISMO strategy with respect to existing approaches to long-term prediction.

Section III details the MISMO methodology. First, the Lazy Learning model and the validation procedure are explained. Next, the way to choose the parameter s of the MISMO model is presented. Then, the input selection procedure for the MISMO model is explained. Section IV presents the MISMO prediction results obtained on a series proposed by the ESTSP 2007 competition [15], [16].

II. LONG-TERM PREDICTION TECHNIQUES

This section first introduces the Direct and the MIMO strategies and then discusses how the original MISMO approach situates itself with respect to the state of the art. Let us consider a regular and univariate time series {ϕ_1, ϕ_2, ..., ϕ_t} for which we intend to predict the continuation for the next H steps. Suppose that we can embed the time series into an input-output format and that an input selection strategy is adopted. Suppose also that the maximum embedding order is d and that the embedding order after input selection is m (m ≤ d).

A. Direct strategy

The Direct strategy first embeds the original series into H datasets

$$D_1 = \{(x_{i1}, y_{i1}) \in (\mathbb{R}^m \times \mathbb{R})\}_{i=1}^{N}, \qquad (1)$$
$$\vdots$$
$$D_H = \{(x_{iH}, y_{iH}) \in (\mathbb{R}^m \times \mathbb{R})\}_{i=1}^{N}, \qquad (2)$$

where

$$x_{ih} \subseteq \{\varphi_i, \ldots, \varphi_{i+d-1}\}, \qquad (3)$$
$$y_{ih} = \varphi_{i+d-1+h}, \quad h \in \{1, \ldots, H\}, \qquad (4)$$

and where x_{ih} stands for the subset of inputs returned by the input selection procedure. Once the time series is embedded, the Direct prediction strategy learns H direct models f_h(·) with

$$y_{ih} = f_h(x_{ih}) + w_{ih}, \qquad (5)$$

where w_{ih} denotes the additive noise. An attractive property of this method is that it is not prone to the accumulation of prediction errors [17]. However, the conditional independence of the H trained models does not allow the technique to take into consideration the complex dependency patterns existing between the variables ϕ_{t+h} [8].

B. MIMO strategy

In order to remove the conditional independence assumption which is implicit in the Direct approach, a MIMO strategy was introduced in [8]. Unlike the Direct strategy, MIMO considers a single dataset

$$D = \{(x_i, y_i) \in (\mathbb{R}^m \times \mathbb{R}^H)\}_{i=1}^{N}, \qquad (6)$$

where

$$x_i \subseteq \{\varphi_i, \ldots, \varphi_{i+d-1}\}, \qquad (7)$$
$$y_i = \{\varphi_{i+d}, \ldots, \varphi_{i+d+H-1}\}. \qquad (8)$$

The MIMO strategy estimates a single multi-output model f(·) with

$$y_i = f(x_i) + w_i, \qquad (9)$$

which returns a vector of H predictions. The MIMO strategy constrains all the horizons to be predicted with the same model structure, for instance with the same set of inputs x. This constraint greatly reduces the flexibility of the prediction approach and can excessively bias the returned model. This is not the case for the Direct strategy, where each model is allowed to use a different set of input variables.

C. MISMO strategy

A solution to the shortcomings of the previously discussed techniques comes from the adoption of an intermediate approach, where the constraint of MIMO is relaxed by tuning an integer parameter s, which calibrates the dimensionality of the output on the basis of a validation criterion. For a given s, the training set of the MISMO technique is composed of n = H/s portions

$$D_1 = \{(x_{i1}, y_{i1}) \in (\mathbb{R}^m \times \mathbb{R}^s)\}_{i=1}^{N}, \qquad (10)$$
$$\vdots$$
$$D_n = \{(x_{in}, y_{in}) \in (\mathbb{R}^m \times \mathbb{R}^s)\}_{i=1}^{N}, \qquad (11)$$

where

$$x_{ip} \subseteq \{\varphi_i, \ldots, \varphi_{i+d-1}\}, \qquad (12)$$
$$y_{ip} = \{\varphi_{i+d+(p-1)s}, \ldots, \varphi_{i+d+ps-1}\}, \qquad (13)$$

and p ∈ {1, ..., n}. The MISMO technique trains n = H/s models f_p(·) with

$$y_{ip} = f_p(x_{ip}) + w_{ip}, \qquad (14)$$

where p ∈ {1, ..., n}. We can see that for s = 1 the approach boils down to the Direct approach, while for s = H we have the MIMO predictor.
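To make the decomposition concrete, the following minimal Python sketch (our own illustration, not the authors' code) embeds a univariate series into the n = H/s datasets of equations (10)-(13); for simplicity no input selection is applied, so m = d.

```python
import numpy as np

def mismo_datasets(phi, d, H, s):
    """Embed a univariate series into n = H/s multi-output datasets.

    Each dataset D_p pairs a window x_i of d past values with the p-th
    block y_ip of s future values, as in equations (10)-(13).
    """
    assert H % s == 0, "s must divide H (otherwise pad the horizon)"
    n = H // s
    phi = np.asarray(phi, dtype=float)
    N = len(phi) - d - H + 1                      # number of usable windows
    X = np.array([phi[i:i + d] for i in range(N)])
    datasets = []
    for p in range(1, n + 1):
        lo = d + (p - 1) * s                      # start of block p (eq. 13)
        Y = np.array([phi[i + lo:i + lo + s] for i in range(N)])
        datasets.append((X, Y))
    return datasets

# Example: a horizon H = 100 split into n = 2 tasks of size s = 50
series = np.sin(np.linspace(0, 50, 500))
D = mismo_datasets(series, d=10, H=100, s=50)
print(len(D), D[0][0].shape, D[0][1].shape)       # 2 (391, 10) (391, 50)
```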
For instance, given a 100-step-ahead prediction task, the dataset of the MISMO strategy with s = 50 is composed of two parts

$$D_1 = \{(x_{i1}, y_{i1}) \in (\mathbb{R}^m \times \mathbb{R}^{50})\}_{i=1}^{N}, \qquad (15)$$
$$D_2 = \{(x_{i2}, y_{i2}) \in (\mathbb{R}^m \times \mathbb{R}^{50})\}_{i=1}^{N}, \qquad (16)$$

and the MISMO strategy estimates two models, f_1 and f_2, with

$$y_{i1} = f_1(x_{i1}) + w_{i1}, \qquad (17)$$
$$y_{i2} = f_2(x_{i2}) + w_{i2}. \qquad (18)$$

Note that, throughout the paper, we assume that the value of the parameter s is a divisor of H. If this is not the case, it is sufficient to increase the horizon H by the quantity s − (H mod s), where mod stands for the modulo operation.
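As a quick check of the divisibility rule above: with H = 100 and s = 30, H mod s = 10, so the horizon is increased by 30 − 10 = 20 to H = 120, giving n = 4 tasks. A one-line helper (the name adjust_horizon is ours) captures this:

```python
def adjust_horizon(H, s):
    """Pad the horizon H so that the output size s divides it exactly."""
    return H if H % s == 0 else H + (s - H % s)

assert adjust_horizon(100, 50) == 100   # already divisible: n = 2 tasks
assert adjust_horizon(100, 30) == 120   # padded by 30 - (100 mod 30) = 20, n = 4 tasks
```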

III. GLOBAL METHODOLOGY

The general principle underlying the MISMO strategy can lead to distinct implementations according to the learner used to estimate the dependencies f_p and the procedure adopted to tune the value of the parameter s. In the following, we detail the choices made in this paper in order to design a long-term predictor to be used in real forecasting tasks.

A. Learning and validation procedure

The estimation of the functions f_p in (14) relies on a nearest neighbor approach, where the problem of adjusting the size of the neighborhood is solved by a Lazy Learning strategy [9], [10]. Lazy Learning algorithms are query-based local learning algorithms, i.e. they defer the whole learning process until a specific query needs to be answered, and once the prediction is returned, they discard both the answer and the constructed model [18]. In this work, we use a multi-output extension of the Lazy Learning model, called LL-MIMO, previously discussed in [8].

Once we have embedded the time series, the forecasting problem boils down to n = H/s multi-input multi-output supervised learning tasks

$$D_1 = \{(x_{i1}, y_{i1}) \in (\mathbb{R}^m \times \mathbb{R}^s)\}_{i=1}^{N}, \qquad (19)$$
$$\vdots$$
$$D_n = \{(x_{in}, y_{in}) \in (\mathbb{R}^m \times \mathbb{R}^s)\}_{i=1}^{N}, \qquad (20)$$

where H is the horizon of the prediction and s is the size of the output vectors y_{ip} (see equation (11)). Given a query point x_q ∈ R^m and a metric on the space R^m, we estimate the n prediction vectors ŷ_{qp} of x_q, for each dataset D_p, as follows:

- given k, the number of neighbors,
- order the set of vectors {x_{ip}}_{i=1}^{N} by increasing distance to x_q,
- denote by [j] the index of the j-th closest neighbor of x_q,
- compute ŷ_{qp} = (1/k) Σ_{j=1}^{k} y_{[j]p}, where y_{[j]p} is the output of the j-th closest neighbor of x_q in D_p.

After the calculation of ŷ_{qp} for each p ∈ {1, ..., n}, the long-term prediction is given by the concatenated vector (ŷ_{q1}, ..., ŷ_{qn}).

The adoption of a local approach to solve a prediction task requires the choice of a set of model parameters (e.g. the number k of neighbors, the kernel function, the distance metric). In the following, we present two criteria to assess and compare local models with different numbers of neighbors. The first criterion is a multi-output extension of the local Leave-One-Out (LOO) error and the second is a measure of discrepancy between the training series and the forecasted sequence, which relies either on linear or nonlinear measures [19].

A computationally efficient way to perform LOO cross-validation and to assess the generalization performance of local linear models is the PRESS statistic, proposed in 1974 by Allen [20]. By assessing the performance of each local model, alternative configurations can be tested and compared in order to select the best one in terms of expected prediction error. Let D denote a multi-input single-output dataset

$$D = \{(x_i, y_i) \in (\mathbb{R}^m \times \mathbb{R})\}_{i=1}^{N} \qquad (21)$$

and suppose we want to estimate the output for a query point x_q ∈ R^m. The idea consists of associating a LOO error e_LOO(k) to the estimation

$$\hat{y}_k = \frac{1}{k} \sum_{j=1}^{k} y_{[j]} \qquad (22)$$

returned by k neighbors. In the case of a constant model, the LOO term can be derived as follows [18]:

$$e_{LOO}(k) = \frac{1}{k} \sum_{j=1}^{k} \big(e_j(k)\big)^2, \qquad (23)$$

with

$$e_j(k) = y_{[j]} - \frac{1}{k-1} \sum_{i=1,\, i \neq j}^{k} y_{[i]} = \frac{k \big(y_{[j]} - \hat{y}_k\big)}{k-1}. \qquad (24)$$

The optimal number of neighbors is then defined as

$$k^{*} = \arg\min_{k} e_{LOO}(k), \qquad (25)$$

which minimizes the LOO error.
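A minimal sketch of the constant-model LL-MIMO predictor described above: the k-nearest-neighbor average of equation (22) and the LOO assessment of equations (23)-(25). The helper names are ours; when the output is a vector, the squared LOO residuals are also averaged over the horizons, anticipating the multi-output extension formalized below.

```python
import numpy as np

def ll_mimo_predict(X, Y, x_q, k):
    """Average of the outputs of the k nearest neighbors of x_q (eq. 22)."""
    order = np.argsort(np.linalg.norm(X - x_q, axis=1))
    return Y[order[:k]].mean(axis=0)

def ll_mimo_loo(X, Y, x_q, k):
    """LOO error of the constant model with k neighbors (eqs. 23-24), k >= 2.

    For vector outputs the squared residuals are averaged over the horizons too.
    """
    order = np.argsort(np.linalg.norm(X - x_q, axis=1))
    Yk = Y[order[:k]]                        # outputs of the k nearest neighbors
    y_hat = Yk.mean(axis=0)
    e = k * (Yk - y_hat) / (k - 1)           # closed-form LOO residuals (eq. 24)
    return float(np.mean(e ** 2))            # eq. 23, averaged over the outputs as well

def ll_mimo_cv(X, Y, x_q, k_range):
    """Prediction with the LOO-optimal number of neighbors (eq. 25)."""
    k_best = min(k_range, key=lambda k: ll_mimo_loo(X, Y, x_q, k))
    return ll_mimo_predict(X, Y, x_q, k_best), k_best
```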
Now, in a multi-input multi-output setting with an output of size s, where

$$D = \{(x_i, y_i) \in (\mathbb{R}^m \times \mathbb{R}^s)\}_{i=1}^{N}, \qquad (26)$$

we can define as multi-step LOO error the quantity

$$E_{LOO}(k) = \frac{1}{s} \sum_{h=1}^{s} \big(e^{h}_{LOO}(k)\big)^2, \qquad (27)$$

where e^h_LOO(k) is the LOO error (23) for the horizon h. In the following, LL-MIMO-CV will refer to a Lazy Learner which selects the number of neighbors according to this multiple LOO assessment. Note that, in a MISMO setting, there are several datasets D_p and therefore an optimal number of neighbors k_p has to be found for each dataset D_p.

The second criterion we propose is not based on cross-validation but rather on a measure of stochastic discrepancy between the forecasted sequence and the training time series. The rationale is that the lower the discrepancy between the descriptors of the prediction and the descriptors of the training series, the better the quality of the returned forecast [8]. Suppose, for instance, that we have a training time series which behaves like a sine wave and that we want to predict H = 20 steps ahead. Suppose that we have to choose between two alternative models (e.g. two LL models with different numbers of neighbors) whose predictions (in bold in Figures 1(a) and 1(b)) do not have significantly different LOO errors. The question is: which one is the best, knowing that they do not have significantly different LOO errors?

Fig. 1. Training series (normal line) and forecasted sequence (bold line) for two models whose LOO errors are not significantly different: (a) first model, (b) second model.

Intuitively, we would choose the model of Figure 1(a), since its predicted continuation has a look more similar to the training time series. We now have to define the term "look" mathematically. Several measures of discrepancy, both linear and non-linear, can be defined. For example, we can use the autocorrelation or the partial autocorrelation in the linear case and the likelihood in the non-linear case. In this work, we restrict ourselves to linear measures.

Let ϕ denote the training time series and ŷ_k the estimation returned by k neighbors. The linear discrepancy measure Δ_k is

$$\Delta_k = \big(1 - \mathrm{cor}[\rho(\varphi \oplus \hat{y}_k), \rho(\varphi)]\big) + \big(1 - \mathrm{cor}[\pi(\varphi \oplus \hat{y}_k), \pi(\varphi)]\big),$$

where ϕ ⊕ ŷ_k is the concatenation of the training time series ϕ and the forecasted sequence ŷ_k, ρ(x) is the autocorrelation of the series x, π(x) is the partial autocorrelation of the series x, and cor[x, y] is the correlation between the series x and y. In this way we can associate with the long-term forecast ŷ_k a measure of quality which is not based on cross-validation but rather on the preservation of the stochastic properties of the series. The corresponding selection criterion is then

$$k^{*} = \arg\min_{k} \Delta_k, \qquad (28)$$

which aims to find the number of neighbors k for which the predicted sequence is the closest, in terms of stochastic properties, to the training series. In the following, LL-MIMO-D will refer to a LL predictor where the neighbor selection relies on this criterion.
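Below is a sketch, under our own naming and lag choices, of the Δ_k criterion just defined; it assumes the acf and pacf routines of the statsmodels package for the (partial) autocorrelations, and it can be fed with candidate forecasts produced, for instance, by the ll_mimo_predict routine sketched earlier.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def discrepancy(phi_train, y_forecast, nlags=20):
    """Linear discrepancy Delta_k between the training series and a forecast.

    Compares the (partial) autocorrelations of the training series with those
    of the series extended by the forecast, as in the LL-MIMO-D criterion.
    """
    extended = np.concatenate([phi_train, y_forecast])
    d_acf = 1.0 - np.corrcoef(acf(extended, nlags=nlags), acf(phi_train, nlags=nlags))[0, 1]
    d_pacf = 1.0 - np.corrcoef(pacf(extended, nlags=nlags), pacf(phi_train, nlags=nlags))[0, 1]
    return d_acf + d_pacf

def select_by_discrepancy(phi_train, candidate_forecasts):
    """Pick the number of neighbors whose forecast best preserves the series' structure.

    candidate_forecasts: dict mapping k -> forecast vector obtained with k neighbors.
    """
    return min(candidate_forecasts, key=lambda k: discrepancy(phi_train, candidate_forecasts[k]))
```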
B. Choice of the parameter s

In the MISMO model (14), the parameter s addresses the bias/variance trade-off by constraining the degree of dependency between the predictions. The value of this parameter is expected to play a major role in the accuracy of the method. A possible way to choose it is to analyze the performance of the MISMO model for different values of s on the learning set, and then use the best value obtained to estimate the outputs. The following algorithm illustrates a cross-validated strategy to choose a good value of s (a code sketch is given at the end of this subsection).

Algorithm 1: Selection of the parameter s
Input: ϕ = {ϕ_1, ..., ϕ_t}, the time series
Input: m, the embedding order
Input: H, the horizon
Input: K, the maximum number of neighbors
Output: s*, the best value of the parameter s

1. for s in {1, ..., H} do
     n = H/s
     build the datasets D_p^(s) = {(x_{ip}, y_{ip}) ∈ (R^m × R^s)}_{i=1}^{N}, p = 1, ..., n
2.   for p in {1, ..., n} do
       E_learning^(p) = vector of size K
       for nn in {2, ..., K} do
         E_learning^(p)[nn] = LOO error of the model learned on D_p^(s) with the neighbor range {2, ..., nn}
       end
     end
     E_learning(s) = vector of size K
     for nn in {2, ..., K} do
       E_learning(s)[nn] = (1/n) Σ_{p=1}^{n} E_learning^(p)[nn]
     end
     Error(s) = (1/K) Σ_{nn=2}^{K} E_learning(s)[nn]
   end
   s* = arg min_s Error(s)
   return s*

The first loop (line 1) ranges over all the values of the parameter s. Given a value of s, the output y is divided into n = H/s portions (y_1, ..., y_n). The second loop (line 2) processes each portion separately by measuring the LOO error for a number nn ∈ {2, ..., K} of neighbors. For a given s, we obtain as a result a vector E_learning^(p) of size K for each output y_p, p ∈ {1, ..., n}. We then average over p to obtain a vector E_learning(s) of length K, whose k-th term represents the LOO performance of the MISMO model with parameter s and k neighbors. Several methods could be used to derive a global (i.e. independent of the number of neighbors) estimate Error(s) of the MISMO performance for a given s; for instance, we could take the minimum or the mean value of the vector E_learning(s). In this work, we adopt an averaging approach and estimate the performance of the MISMO model by taking the mean value of the vector E_learning(s).
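The following condensed sketch mirrors Algorithm 1. It takes as arguments two callables that stand in for routines sketched earlier (their names, make_datasets and loo_error, are ours): the first builds the n = H/s datasets for a given s, the second returns the LOO error of the local model on one dataset for a given number of neighbors.

```python
import numpy as np

def select_s(phi, d, H, K, candidate_s, make_datasets, loo_error):
    """Cross-validated choice of the parameter s (Algorithm 1).

    make_datasets(phi, d, H, s) -> list of the n = H/s datasets D_p
    loo_error(D_p, nn)          -> LOO error of the local model on D_p with nn neighbors
    """
    errors = {}
    for s in candidate_s:
        datasets = make_datasets(phi, d, H, s)
        # E_learning(s)[nn]: LOO error averaged over the n output blocks
        e_learning = [np.mean([loo_error(Dp, nn) for Dp in datasets])
                      for nn in range(2, K + 1)]
        # Global estimate Error(s): mean over the whole neighbor range
        errors[s] = float(np.mean(e_learning))
    return min(errors, key=errors.get), errors

# Usage (with the helpers of the previous sketches and a suitable loo_error):
# s_best, err = select_s(series, d=10, H=100, K=160,
#                        candidate_s=[1, 5, 10, 50, 100],
#                        make_datasets=mismo_datasets, loo_error=my_loo_error)
```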

C. Input Selection Procedure

The flexibility of the MISMO approach allows a different input selection for each of the n prediction tasks. Input selection [21], [22] consists of choosing the subset of (x_1, x_2, ..., x_d) that has the maximal predictive power (x_i being the i-th variable of the vector x). Each input selection procedure relies on two elements: a relevance criterion and a search procedure [23]. The relevance criterion is a statistical measure of the relevance of the selected variables. Several relevance criteria, such as mutual information or the Gamma test [23], have been proposed in the literature. In this work we adopt as relevance criterion the 2-fold cross-validation of a 1-NN approximator [24]. Since in the MIMO technique the size of the output y can be greater than one, the relevance criterion is adapted to multiple outputs by averaging over the prediction horizon H, as shown in the following pseudo-code (a runnable sketch is given at the end of this subsection).

Calculation of the relevance criterion:
- Given a dataset D = {(x_i, y_i) ∈ (R^d × R^s)}_{i=1}^{N}, a set of variables V of size m (m ≤ d) with V ⊆ {x_1, ..., x_d}, and a metric on the space R^m.
- Divide the N input-output pairs into two parts D1 and D2.
- For each point (x_i, y_i) in D1: find the nearest neighbor x_i* of x_i in D2 according to the metric and the set V, and calculate err_{x_i} = (1/s) Σ_{j=1}^{s} (y_{ij} − y*_{ij})^2, where y_{ij} is the j-th component of y_i and y*_{ij} is the j-th component of y_i*.
- For each point (x_i, y_i) in D2: find the nearest neighbor x_i* of x_i in D1 according to the metric and the set V, and calculate err_{x_i} in the same way.
- Calculate C(V) = (1/N) Σ_{i=1}^{N} err_{x_i}, which is the statistical measure of the relevance of the set of variables V.

As far as the search is concerned, we adopt a Forward-Backward Search (FBS) procedure [21], [22], which is a combination of the conventional forward and backward searches. FBS is flexible in the sense that a variable is able to return to the selected set once it has been dropped and, vice versa, a previously selected variable can be discarded later. This method can start from any initial set of input variables: the empty set, the full set, a custom set or a randomly initialized set. In our experiments, we use the Forward-Backward method with four sets of initial variables: the first one is the empty set and the three others are randomly initialized sets. After calculating the value of the criterion for these four sets, the final set of variables is the one which minimizes the relevance criterion. We do not start the FBS from the full set because our experiments have shown that it leads to far more selected variables and less accurate results, due to the local optima problem of the search.
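For reference, here is a small sketch of the 2-fold 1-NN relevance criterion C(V) described in the pseudo-code above (our own implementation; the function name relevance and the plain Euclidean metric are assumptions). A forward or backward search step then simply evaluates this criterion for each candidate subset obtained by adding or removing one variable.

```python
import numpy as np

def relevance(X, Y, variables):
    """2-fold cross-validation of a 1-NN approximator, averaged over the s outputs.

    X: (N, d) inputs, Y: (N, s) outputs, variables: indices of the subset V.
    Returns C(V); lower values indicate a more relevant variable subset.
    """
    N = len(X)
    idx = np.arange(N)
    folds = (idx[: N // 2], idx[N // 2:])          # the two halves D1 and D2
    errs = []
    for a, b in ((0, 1), (1, 0)):
        Xa, Ya = X[folds[a]][:, variables], Y[folds[a]]
        Xb, Yb = X[folds[b]][:, variables], Y[folds[b]]
        for xi, yi in zip(Xa, Ya):
            j = np.argmin(np.linalg.norm(Xb - xi, axis=1))   # 1-NN in the other half
            errs.append(np.mean((yi - Yb[j]) ** 2))          # (1/s) sum over the outputs
    return float(np.mean(errs))                              # C(V): average over the N points
```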
IV. EXPERIMENTS

In order to assess the performance of the MISMO strategy, we tested it on the ESTSP 2007 competition dataset [15], [16]. The training series, composed of 875 values, is shown in Figure 2.

Fig. 2. ESTSP 2007 Competition dataset.

We set the maximum embedding order d to 55 [15], the maximum number of neighbors in LL to K = 160, and we consider prediction tasks with H = 100. For time reasons, the selection of the s value was restricted to the five values of the set {1, 5, 10, 50, 100}.

Note that the value s = 1 corresponds to the Direct strategy, s = 100 to the MIMO strategy, and the values s ∈ {5, 10, 50} to intermediate MISMO strategies. Figure 3 plots the LOO errors obtained on the training series as a function of the number of neighbors for the five values of the parameter s. This figure suggests that the MISMO model with s = 50 is on average better than the others.

The following figures compare the predicted continuation of the five MISMO models (both for LL-MIMO-CV and LL-MIMO-D) to the real continuation of the series. Figure 4 shows the prediction returned by the MISMO with s = 1, i.e. by the Direct strategy. The Mean Square Error (MSE) is 1.68 for the LL-MIMO-CV model. Note that the LL-MIMO-D prediction is not present here, since the too small size of the output prevents the autocorrelation computation.

Fig. 3. Performance of MISMO on the 10-fold cross-validation (leave-one-out error versus number of neighbors) with s = 1 (solid line), s = 5 (dashed line), s = 10 (dotted line), s = 50 (dot-dashed line) and s = 100 (large dashed line).

Fig. 4. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 1, which corresponds to the Direct strategy. The solid thick line represents the real values and the solid thin line the prediction with the LL-MIMO-CV model.

Fig. 5. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 5. The solid thick line represents the real values, the solid thin line the prediction with the LL-MIMO-CV model and the dotted line the prediction with the LL-MIMO-D model.

Fig. 6. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 10. The solid thick line represents the real values, the solid thin line the prediction with the LL-MIMO-CV model and the dotted line the prediction with the LL-MIMO-D model.

Figure 5 shows the s = 5 predictions. The Mean Square Error amounts to 1.24 for the LL-MIMO-CV model and to 1.57 for the LL-MIMO-D model.

The s = 10 case is illustrated by Figure 6. The accuracy is worse than in the s = 5 case: the MSE is 1.38 for the LL-MIMO-CV model and 1.05 for the LL-MIMO-D model.

As suggested by the results in cross-validation, the best performance is attained for s = 50 (Figure 7). The MSE drops to 0.82 for the LL-MIMO-CV model and to 0.67 for the LL-MIMO-D model.

The MIMO case (s = 100) is illustrated by Figure 8. The MSE is 1.34 for the LL-MIMO-CV model and 0.95 for the LL-MIMO-D model.

In Table I, the errors on the learning set and the test set are summarized for the different values of the parameter s.

Fig. 7. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 50. The solid thick line represents the real values, the solid thin line the prediction with the LL-MIMO-CV model and the dotted line the prediction with the LL-MIMO-D model.

Fig. 8. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 100. The solid thick line represents the real values, the solid thin line the prediction with the LL-MIMO-CV model and the dotted line the prediction with the LL-MIMO-D model.

TABLE I. LOO error of the learning and the test set for different values of the parameter s.

                          s = 1    s = 5    s = 10    s = 50    s = 100
E_learning
E_test (LL-MIMO-CV)       1.68     1.24     1.38      0.82      1.34
E_test (LL-MIMO-D)         -       1.57     1.05      0.67      0.95

The results of Table I and the related figures clearly show the impact of the parameter s on the accuracy of the prediction and justify the adoption of a MISMO strategy. The value of s effectively controls the trade-off between bias and variance, as is also evident from the nature of the predicted profiles (strongly varying in the s = 1 case and progressively smoother for higher values of s). A careful choice of s is therefore recommended in the case of long-term forecasting.

The experimental results also confirm the important role of non-cross-validated criteria in long-term prediction tasks. Indeed, LL-MIMO-D appears competitive with, and sometimes better than, the conventional cross-validated selection criterion. Once more than a single prediction is required, criteria which summarize the global behavior of the prediction become attractive.

Finally, it is worth adding that the MSE of the prediction returned by MISMO (s = 50) with the LL-MIMO-D model for a horizon H = 50 is better than the one obtained by the winner of the ESTSP 2007 competition (MSE = 0.506) [15], [16].

V. CONCLUSIONS

Predictive modelling for single-output tasks is known to require careful procedures of assessment and selection. The extension of these procedures to long-term forecasting, and consequently to multi-output modelling, requires the tuning of additional parameters and the definition of specific calibration procedures. This paper discussed how the size of the multiple outputs plays a major role in the generalization accuracy of a long-term forecasting predictor. We showed that the Direct and MIMO strategies implicitly constrain the size of the output target, with a consequent impact on the bias/variance trade-off and on the resulting accuracy. We proposed a new strategy, called MISMO, together with an associated calibration procedure, which aims to decompose a long-term prediction task into the optimal number of subtasks. Preliminary results on the ESTSP 2007 competition dataset show that the approach is promising.

REFERENCES

[1] A. S. Weigend and N. A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past. Harlow, UK: Addison Wesley.
[2] Y. Ji, J. Hao, N. Reyhani, and A. Lendasse, "Direct and recursive prediction of time series using mutual information selection," in Computational Intelligence and Bioinspired Systems: 8th International Workshop on Artificial Neural Networks, IWANN 05, Vilanova i la Geltrú, Barcelona, Spain, ser. Lecture Notes in Computer Science. Springer-Verlag.
[3] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse, "Methodology for long-term prediction of time series," Neurocomputing.
[4] R. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, vol. 1.
[5] J. D. Farmer and J. J. Sidorowich, "Predicting chaotic time series," Physical Review Letters, vol. 59, no. 8.
[6] J. McNames, "A nearest trajectory strategy for time series prediction," in Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling. Leuven, Belgium: K.U. Leuven, 1998.
[7] T. Sauer, "Time series prediction by using delay coordinate embedding," in Time Series Prediction: Forecasting the Future and Understanding the Past, A. S. Weigend and N. A. Gershenfeld, Eds. Harlow, UK: Addison Wesley, 1994.
[8] G. Bontempi, "Long term time series prediction with multi-input multi-output local learning," in Proceedings of the 2nd European Symposium on Time Series Prediction, ESTSP'08.
[9] G. Bontempi, M. Birattari, and H. Bersini, "Lazy learning for local modeling and control design." [Online].
[10] M. Birattari, G. Bontempi, and H. Bersini, "Lazy learning meets the recursive least-squares algorithm," in Advances in Neural Information Processing Systems 11. MIT Press, 1999.
[11] G. Bontempi, M. Birattari, and H. Bersini, "Lazy learning for modeling and control design," International Journal of Control, vol. 72, no. 7/8.
[12] M. Birattari, G. Bontempi, and H. Bersini, "Lazy learning meets the recursive least-squares algorithm," in NIPS 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds. Cambridge: MIT Press, 1999.
[13] G. Bontempi, M. Birattari, and H. Bersini, "Lazy learning for iterated time series prediction," in Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, J. A. K. Suykens and J. Vandewalle, Eds. Katholieke Universiteit Leuven, Belgium, 1998.
[14] G. Bontempi, M. Birattari, and H. Bersini, "Local learning for iterated time-series prediction," in Machine Learning: Proceedings of the Sixteenth International Conference, I. Bratko and S. Dzeroski, Eds. San Francisco, CA: Morgan Kaufmann Publishers, 1999.
[15] A. Lendasse, Ed., ESTSP 2007: Proceedings.
[16] A. Lendasse, ESTSP'07 homepage, 2007. [Online].
[17] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse, "Methodology for long-term prediction of time series," Neurocomputing, vol. 70.
[18] G. Bontempi, "Local learning techniques for modeling, prediction and control," Ph.D. dissertation, IRIDIA, Université Libre de Bruxelles, Belgium.
[19] T. W. Anderson, The Statistical Analysis of Time Series. New York: Wiley.
[20] D. M. Allen, "The relationship between variable selection and data augmentation and a method for prediction," Technometrics, vol. 16.
[21] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3.
[22] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. (Computer Science and Scientific Computing Series). Academic Press.
[23] D. François, "High-dimensional data analysis: optimal metrics and feature selection," Ph.D. dissertation, Université catholique de Louvain, Louvain-la-Neuve, Belgium.
[24] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press.


More information

Linear Dependency Between and the Input Noise in -Support Vector Regression

Linear Dependency Between and the Input Noise in -Support Vector Regression 544 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 3, MAY 2003 Linear Dependency Between the Input Noise in -Support Vector Regression James T. Kwok Ivor W. Tsang Abstract In using the -support vector

More information

Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network

Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network Fadwa DAMAK, Mounir BEN NASR, Mohamed CHTOUROU Department of Electrical Engineering ENIS Sfax, Tunisia {fadwa_damak,

More information

Model-Based Reinforcement Learning with Continuous States and Actions

Model-Based Reinforcement Learning with Continuous States and Actions Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters: Model-Based Reinforcement Learning with Continuous States and Actions in Proceedings of the 16th European Symposium on Artificial Neural Networks

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Performance of Cross Validation in Tree-Based Models

Performance of Cross Validation in Tree-Based Models Performance of Cross Validation in Tree-Based Models Seoung Bum Kim, Xiaoming Huo, Kwok-Leung Tsui School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, Georgia 30332 {sbkim,xiaoming,ktsui}@isye.gatech.edu

More information

Analysis of Multiclass Support Vector Machines

Analysis of Multiclass Support Vector Machines Analysis of Multiclass Support Vector Machines Shigeo Abe Graduate School of Science and Technology Kobe University Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract Since support vector machines for pattern

More information

Development of a Data Mining Methodology using Robust Design

Development of a Data Mining Methodology using Robust Design Development of a Data Mining Methodology using Robust Design Sangmun Shin, Myeonggil Choi, Youngsun Choi, Guo Yi Department of System Management Engineering, Inje University Gimhae, Kyung-Nam 61-749 South

More information

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape

A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape WCCI 200 IEEE World Congress on Computational Intelligence July, 8-23, 200 - CCIB, Barcelona, Spain CEC IEEE A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape Liang Shen and

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Expectation Propagation in Dynamical Systems

Expectation Propagation in Dynamical Systems Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department

More information

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information