Long-Term Prediction of Time Series by combining Direct and MIMO Strategies
|
|
- Owen Waters
- 6 years ago
- Views:
Transcription
1 Long-Term Prediction of Series by combining Direct and MIMO Strategies Souhaib Ben Taieb, Gianluca Bontempi, Antti Sorjamaa and Amaury Lendasse Abstract Reliable and accurate prediction of time series over large future horizons has become the new frontier of the forecasting discipline. Current approaches to long-term time series forecasting rely either on iterated predictors, direct predictors or, more recently, on the Multi-Input Multi- Output (MIMO) predictors. The iterated approach suffers from the accumulation of errors, the Direct strategy makes a conditional independence assumption, which does not necessarily preserve the stochastic properties of the time series, while the MIMO technique is limited by the reduced flexibility of the predictor. The paper compares the Direct and MIMO strategy and discusses their respective limitations to the problem of longterm time series prediction. It also proposes a new methodology that is a sort of intermediate way between the Direct and the MIMO technique. The paper presents the results obtained with the ESTSP 2007 competition dataset. I. INTRODUCTION The prediction of the future behavior of an observed time series over a long horizon H is still an open problem in forecasting [1]. Currently, the most common approaches to long-term forecasting rely either on iterated [2] or direct prediction techniques [3]. Given a time series {ϕ 1,..., ϕ t }, in the first case, an H-step-ahead prediction problem is tackled by iterating, H times, a one-step-ahead predictor. Once the predictor has estimated the future series value, this value is fed back as an input to the following prediction. Hence, the predictor takes as inputs estimated values, instead of actual observations with evident negative consequences in terms of error propagation. Examples of iterated approaches are recurrent neural networks [4] or local learning iterated techniques [5], [6]. Another way to perform H-step-ahead forecasting consists in estimating a set of H prediction models, each returning a direct forecast of ϕ t+h with h {1,..., H} [3]. Direct methods often require higher functional complexity than iterated ones in order to model the stochastic dependency between two series values at two distant instants. In computational intelligence literature, we have also examples of research works, where the two approaches have been successfully combined [7]. In spite of their diversity, iterated and direct techniques for multi-step-ahead prediction share a common feature, in the sense that they model from data, a multi-input single-output Gianluca Bontempi and Souhaib Ben Taieb are with the Machine Learning Group, Computer Science Department, Faculty of Sciences, Université Libre de Bruxelles ( {gbonte, sbentaie}@ulb.ac.be). Antti Sorjamaa and Amaury Lendasse are with the Department of Information and Computer Science, Helsinki University of Technology, Finland ( {lendasse, Antti.Sorjamaa}@hut.fi). mapping, whose output is the variable ϕ t+1 in the iterated case and the variable ϕ t+h in the direct case. In [8], the author proposed a MIMO approach for longterm time series prediction, where the predicted value is no more a scalar quantity but a vector of future values {ϕ t+1,..., ϕ t+h } of the time series ϕ. This approach replaces the H models of the Direct approach with one multioutput model, which aims to preserve, between the predicted values, the stochastic dependency characterizing the time series. This paper goes one step further in that direction by proposing a new methodology for long-term prediction, aiming to preserve the most appealing aspects of both the Direct and MIMO approaches. The Direct and the MIMO approach can be indeed seen as two distinct instances of the same prediction approach, which decomposes long-term prediction into multi-output tasks. In the direct case, the number of prediction tasks is equal to the size of the horizon H and the size of the outputs is 1. In the MIMO case, the number of prediction tasks is equal to one and the size of the output is H. Intermediate configurations can be imagined by transforming the original task into n = H s prediction tasks, each with multiple outputs of size s, where s {1,..., H}. This approach, called Multi-Input Several Multi- Output (MISMO), trades off the property of preserving the stochastic dependency between future values with a greater flexibility of the predictor. For instance, the fact of having n > 1 allows the selection of different inputs for different horizons. In other terms, s addresses the bias/variance tradeoff of the predictor by constraining the degree of dependency between predictions (null in the case s = 1 and maximal for s = H). This paper introduces and assesses the MISMO methodology by implementing each prediction model with a Lazy Learning model [9], [10]. Lazy Learning (LL) is a local modeling technique, which is query-based in the sense that the whole learning procedure (i.e. structural and parametric identification) is deferred until a prediction is required. [11], [12] presented a LL algorithm that selects on a queryby-query basis and by means of a local cross-validation scheme the optimal number of neighbors. Iterated versions of Lazy Learning were successfully applied to multi-step-ahead time series prediction [13], [14]. A LL-MIMO version was presented in [8]. This paper presents a LL method for the MISMO prediction of multiple and dependent outputs in the context of long-term time series prediction. Section II introduces the new MISMO strategy with respect to existing approaches to long-term prediction. Sec-
2 tion III details the MISMO methodology. First, the Lazy Learning model and the validation procedure is explained. Next, the way to choose the parameter s of the MISMO model is presented. Then, the input selection procedure for the MISMO model is explained. Section IV presents the MISMO prediction results obtained with a series proposed by the ESTSP 2007 competition [15], [16]. II. LONG-TERM PREDICTION TECHNIQUES This section first introduces the Direct and the MIMO strategies and then discusses how the original MISMO approach situates with respect to the state of the art. Let us consider a regular and univariate time series {ϕ 1, ϕ 2,..., ϕ t } for which we intend to predict the continuation for the next H steps. Suppose that we can embed the time series into an input-output format and that an input selection strategy is adopted. Suppose also that the maximum embedding order is d and that the embedding order after the input selection is m (m d). A. Direct strategy The Direct strategy first embeds the original series into H datasets with D 1 = {(x i1, y i1 ) (R m R)} N i=1, (1). D H = {(x ih, y ih ) (R m R)} N i=1. (2) x ih {ϕ i,..., ϕ i+d 1 }, (3) y ih = {ϕ i+d 1+h }, h {1,..., H}. (4) and where x h stands for the subset of inputs returned by the input selection procedure Once the time series is embedded, the Direct prediction strategy learns H direct models f h ( ) with y ih = f h (x ih ) + w ih, (5) where w ih denotes the additive noise. An attractive property of this method is that it is not prone to the accumulation of the prediction errors [17]. However, the conditional independence of the H trained models does not allow the technique to keep into consideration complex dependency patterns existing between the variables ϕ t+h [8]. B. MIMO Strategy In order to remove the conditional independence assumption which is implicit in the Direct approach, a MIMO strategy was introduced in [8]. Unlike the Direct strategy, MIMO considers a single dataset D = {(x i, y i ) (R m R H )} N i=1, (6) where x i {ϕ i,..., ϕ i+d 1 }, (7) y i = {ϕ i+d,..., ϕ i+d+h 1 }. (8) The MIMO strategy estimates a single multi-output model f( ) with {y i } = f(x i ) + w i, (9) which returns a vector of H predictions. The MIMO strategy constrains all the horizons to be predicted with the same model structure, for instance with the same set of inputs x. This constraint greatly reduces the flexibility of the prediction approach and can excessively bias the returned model. This is not the case of the Direct strategy, where each model is allowed to use a different set of input variables. C. MISMO strategy A solution to the shortcomings of the previously discussed techniques comes from the adoption of an intermediate approach, where the constraint of MIMO is relaxed by tuning an integer parameter s, which calibrates the dimensionality of the output on the basis of a validation criterion. For a given s, the training set of the MISMO techniques is composed of n = H s portions where D 1 = {(x i1, y i1 ) (R m R s )} N i=1, (10). D n = {(x in, y in ) (R m R s )} N i=1. (11) x ip {ϕ i,..., ϕ i+d 1 }, (12) y ip = {ϕ i+d+(p 1)s,..., ϕ i+d+ps 1 }. (13) and p {1,..., n}. The MISMO technique trains n = H s models f p( ) with {y ip } = f p (x ip ) + w ip, (14) where p {1,..., n}. We can see that for s = 1, the approach boils down to the direct approach, while for s = H we have the MIMO predictor. For instance, given an 100-steps-ahead prediction task, the dataset of the MISMO strategy with s = 50 is composed of 2 parts D 1 = {(x i1, y i1 ) (R m R 50 )} N i=1, (15) D 2 = {(x i2, y i2 ) (R m R 50 )} N i=1. (16) and the MISMO strategy estimates 2 models, f 1 and f 2 with {y i1 } = f 1 (x i1 ) + w i1, (17) {y i2 } = f 2 (x i2 ) + w i2. (18) Note that, throughout the paper, we assume that the value of the parameter s is a divisor of H. If it is not the case, it is sufficient to increase the horizon H of the quantity [s (H mod s)] where mod stands for the modulo operation.
3 III. GLOBAL METHODOLOGY The general principle underlying the MISMO strategy can lead to distinct implementations according to the learner used to estimate the dependencies f p and the procedure adopted to tune the value of the parameter s. In the following, we will detail the choices made in this paper in order to design a long-term predictor to be used in real forecasting tasks. A. Learning and validation procedure The estimation of the functions f p in (14) relies on a nearest neighbor approach, where the problem of adjusting the size of the neighborhood is solved by a Lazy Learning strategy [9], [10]. Lazy Learning algorithms are query-based local learning algorithms, i.e. they defer the whole learning process until a specific query needs to be answered, and once the prediction is returned, they discard both the answer and the constructed model [18]. In this work, we use a multi-output extension of the Lazy Learning model called LL-MIMO previously discussed in [8]. Once we have embedded the time series, the forecasting problem boils down to n = H s multi-input multi-output supervised learning tasks D 1 = {(x i1, y i1 ) (R m R s )} N i=1, (19). D n = {(x in, y in ) (R m R s )} N i=1. (20) where H is the horizon of the prediction and s is the size of the output vectors y ip (see equation 11). Given a query point x q R m and a metric on the space R m, we estimate the n prediction vectors y qp of x q, for each dataset D p as follows : Given k, the number of neighbors Order increasingly the set of vectors {x ip } N i=1 with respect to the distance to x q Denote by [j] the index of the jth closest neighbor of x q ŷ qp = 1 k k j=1 y [j]p where y [j]p is the output of the jth closest neighbor of x q in D p After the calculation of ŷ qp for each p {1,..., n}, the long term prediction is given by the concatenated vector (ŷ q1,..., ŷ qn ). The adoption of local approach to solve a prediction task requires the choice of a set of model parameters (e.g. the number k of neighbors, the kernel function, the distance metric). In the following, we will present two criteria to assess and compare local models with different number of neighbors. The first criterion is a multi-ouput extension of the local Leave-One-Out (LOO) and the second is a measure of discrepancy between the training series and the forecasted sequence, which rely either on linear or nonlinear measures [19]. A computationally efficient way to perform LOO crossvalidation and to assess the performance in generalization of local linear models is the PRESS statistic, proposed in 1974 by Allen [20]. By assessing the performance of each local model, alternative configurations can be tested and compared in order to select the best one in terms of expected prediction. Let D denote a multi-input single-output dataset D = {(x i, y i ) (R m R)} N i=1 (21) and suppose we want to estimate the output for a query point x q R m. The idea consists of associating a LOO error e LOO (k) to the estimation ŷ k = 1 k k y [j] (22) j=1 returned by k neighbors. In case of constant model, the LOO term can be derived as follows [18]: e j (k) = y [j] e LOO (k) = 1 k k (e j (k)) 2, (23) j=1 k i=1(i j) y [i] k 1 = k y [j] ŷ k k 1. (24) The optimal number of neighbors is then defined as the number k = arg min e LOO (k), (25) which minimizes the LOO error. Now, in a multi-input multi-output setting with an output of size s, where D = {(x i, y i ) (R m R s )} N i=1, (26) we can define as multi-step LOO error the quantity E LOO (k) = 1 s (e h s LOO(k)) 2, (27) h=1 where e h LOO (k) is the LOO error for the horizon h in (23). In the following, LL-MIMO-CV will refer to a Lazy Learner which selects the number of neighbors according to the multiple LOO assessment. Note that, in a MISMO setting, there are several datasets D p and then, an optimal number of neighbors k p has to be found for each dataset D p. The second criterion we propose, is not based on crossvalidation but rather on a measure of stochastic discrepancy between the forecasted sequence and the training time series. The rationale is that, the lower the discrepancy between the descriptors of the prediction and the descriptors of the training series, the better is the quality of the returned forecast [8]. Suppose, for instance, that we have a training time series, which behaves like a sine wave and that we want to predict H = 20 steps ahead. Suppose, that we have to choose betwen two alternative models (e.g. two LL models with different number of neighbors) whose predictions (in bold on the Figures 1(a) and 1(b)) have not significantly different LOO errors. The question is which one is the
4 Series (a) Training series (normal line) and the forecasted sequence (bold line) Series (b) Training series (normal line) and the forecasted sequence (bold line) Fig. 1. Training series and forecasted sequence of two models, which have not significantly different LOO errors. best, knowing that they do not have significantly different LOO errors?. Intuitively, we would choose for the model in Figure 1(a) since its predicted continuation has a look more similar to the training time series. Now, we have to define mathematically the term look. We can define several measures of discrepancy, both linear and non-linear. For example, we can use the autocorrelation or the partial autocorrelation for the linear case and the likelihood for the non-linear case. In this work, we will restrict to consider linear measures. Let ϕ denote the training time series and ŷ k the estimation returned by k neighbors. The linear discrepancy measure k is k = 1 cor[ρ(ϕ ŷ k ), ρ(ϕ)] + 1 cor[π(ϕ ŷ k ), π(ϕ)], where ϕ ŷ k is the concatenation of the training time series ϕ and the forecasted sequence ŷ k, ρ(x) is the autocorrelation of the series x, π(x) is the partial autocorrelation of the series x and cor[x, y] is the correlation of the series x and the series y. So, we can associate to the long-term forecasting ŷ k, a measure of quality, which is not based on cross-validation but rather on the preservation of the stochastic properties of the series. The corresponding selection criterion is then k = arg min k, (28) which aims to find the number of neighbors k for which the predicted sequence is the closest, in terms of stochastic properties, to the training series. In the following, LL- MIMO-D will refer to a LL predictor where the neighbor selection relies on such criterion. B. Choice of the parameter s In the MISMO model (14) the parameter s addresses the bias/variance trade-off by constraining the degree of dependency between the predictions. The value of this parameter s is expected to play a major role in the accuracy of the method. A possible way to choose the value of this parameter, is to analyze the performance of the MISMO model for different values of s on the learning set, and then use the best value obtained so far, to estimate the outputs. The following algorithm illustrates a cross-validated strategy to choose a good value of s. Algorithm 1: Selection of the parameter s Input : ϕ = {ϕ 1,..., ϕ t }, the time series Input : m = Embedding order Input : H = Horizon Input : K= Max range of number of neighbors Output: s, best value of the parameter s 1 2 for s in {1,..., H} do n = H s D (s) 1 = {(x i1, y i1 ) (R m R s )} N i=1. D n (s) = {(x in, y in ) (R m R s )} N i=1 for p in {1,..., n} do D p (s) = {(x ip, y ip ) (R d R s )} N i=1 = vector of size K E (p) learning for nn in {2,..., K} do E (p) learning [nn] Learn model on D p (s) with range {2,..., nn} end end E learning (s) = vector of size K for nn in {2,..., K} do n p=1 E(p) learning [nn] E learning (s)[nn] 1 n end Error(s) 1 K K nn=2 E learning(s)[nn] end s = arg min Error(s) return s The first loop (line 1) ranges over all the values of the parameter s. Given a value of the parameter s, the output y is divided into n = H s portions (y 1,..., y n ). The second loop (line 2) processes each portions separately by measuring the LOO error for a number nn {2,..., K} of neighbors. For a given s, we obtain as a result a vector E (p) learning of size K for each ouput y p, p {1,..., n}. Now, we average over p to obtain a vector E learning (s) of length K whose k term represents the LOO performance of the MISMO model with the parameter s and k neighbors. Several methods could be used to derive a global (i.e. independent of the number of neighbors) estimate Error(s) of the MISMO performance for a given s. For instance, we
5 could take the minimum or the mean value of the vector E learning (s). In this work, we adopt an averaging approach and we estimate the performance of the MISMO model by taking the mean value of the vector E learning (s). C. Input Selection Procedure The flexibility of the MISMO approach allows a different input selection for each of the n prediction tasks. Input selection [21],[22] consists of choosing a subset of (x 1, x 2,..., x d ), that has the maximal predictive power (x i is the ith variable of the vector x). Each input selection procedure relies on two elements: a relevance criterion and a search procedure [23]. The relevance criterion is a statistical measure of the relevance of the variable selected. Several relevance criteria like Mutual information, Gamma test,etc [23] have been proposed in litterature. In this work we adopted as relevance criterion the 2-fold cross-validation of a 1-NN approximator [24]. Since in the MIMO technique the size of the output y can be greater than one, the relevance criterion is adapted for multiple outputs by taking an average over the prediction horizon H, as shown in the following pseudo-code. Calculation of the relevance criterion Given a dataset D = {(x i, y i ) (R d R s )} N i=1 Given a set of variables V of size m (m d), with V {x 1,..., x d } Given a metric on the space R m Divide the N input-outputs pairs in two parts D1 and D2 For each point (x i, y i ) in D1 Find the nearest neighbor, say x i, of x i in D2 according to the metric and the set V. Calculate err xi = 1 s s j=1 (y ij yij )2 where y ij is the jth component of y i and yij is the jth component of yi For each point (x i, y i ) in D2 Find the nearest neighbor, say x i, of x i in D1 according to the metric and the set V. Calculate err xi = 1 s s j=1 (y ij yij )2 where y ij is the jth component of y i and yij is the jth component of yi Calculate C(V ) = 1 N N i=1 err x i which is the statistical measure of the relevance of the set of variables V. As far as the search is concerned we adopted a Forward- Backward Search procedure(fbs) [21],[22] which is a combination of conventional Forward and Backward search. FBS is flexible in the sense that a variable is able to return to the selected set once it has been dropped and vice versa, a previously selected variable can be discarded later. This method can start from any initial input variable set: empty set, full set, custom set or randomly initialized set. In our experiments, we use the Forward-Backward method with four sets of initial variables, the first one is the empty set, and the three others are randomly initialized sets. After calculating the value of the criterion for these four sets, the final set of variables is the one which minimizes the relevance criterion. We are not starting the FBS from the full set, because our experiments have shown that it leads to far more selected variables and less accurate results. This is due to the local optima problem of the search. IV. EXPERIMENTS In order to assess the performance of the MISMO strategy, we tested it on the ESTSP 2007 competition dataset [15], [16]. The training series, composed of 875 values, is shown in Figure Fig. 2. ESTSP 2007 Competition dataset. We set the maximum embedding order d to 55 [15], the maximum number of neighbors in LL to K = 160 and we consider prediction tasks with H = 100. For time reasons, the selection of the s value was restricted to the five values of the set s = {1, 5, 10, 50, 100}. Note that the value s = 1 corresponds to the Direct strategy, s = 100 to the MIMO strategy and the values s = {5, 10, 50} to intermediate MISMO strategies. Figure 3 plots the vector of the LOO errors obtained for the training series as a function of the number of neighbors for the five values of the parameter s. This figure suggests that the MISMO model with s = 50 is on average better than the others. The following figures compare the predicted continuation of the 5 MISMO models (both for LL-MIMO-CV and LL- MIMO-D) to the real continuation of the series. Figure 4 shows the prediction returned by the MISMO with s = 1, i.e. by the Direct strategy. The Mean Square Error (MSE) is 1.68 for the LL-MIMO-CV model. Note
6 Leave one out error s=1 s=5 s=10 s=50 s=100 Fig. 5. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 5. Solid thick line represents the real value, the solid thin line is the prediction with the LL-MIMO-CV model and the dotted one is the prediction with the LL-MIMO-D model The s = 10 case is illustrated by Figure 6. The accuracy is worse than in the s = 5 case. The MSE is 1.38 for the LL-MIMO-CV model and 1.05 for the LL-MIMO-D model. Number of neighbors Fig. 3. Performance of MISMO on the 10-fold cross validation with s = 1 (solid line), s = 5 (dashed line), s = 10 (dotted line), s = 50 (dotdashed line) and s = 100 (large dashed line). that the LL-MIMO-D prediction is not present here since the too small size of the output prevents the autocorrelation computation. Fig. 6. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 10. Solid thick line represents the real value, the solid thin line is the prediction with the LL-MIMO-CV model and the dotted one is the prediction with the LL-MIMO-D model. As suggested by the results in cross-validation, the best performance is attained for s = 50 (Figure 7). The MSE drops to 0.82 for the LL-MIMO-CV model and to 0.67 for the LL-MIMO-D model. Fig. 4. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 1, which corresponds to the Direct strategy. Solid thick line represents the real value and the solid thin line is the prediction with the LL-MIMO-CV model. The MIMO case (s = 100) is ilustrated by Figure 8. The MSE is 1.34 for the LL-MIMO-CV model and 0.95 for the LL-MIMO-D model. Figure 5 shows the s = 5 predictions. The Mean Square Error (MSE) amounts to 1.24 for the LL-MIMO-CV model and to 1.57 for the LL-MIMO-D model. In Table I, the errors for the learning and the test set are summarized for the different values of the parameter s. The results of Table I and the related figures show clearly the impact of the parameter s on the accuracy of the prediction and justify the adoption of a MISMO strategy. The value of s controls effectively the trade-off between bias and
7 Fig. 7. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 50. Solid thick line represents the real value, the solid thin line is the prediction with the LL-MIMO-CV model and the dotted one is the prediction with the LL-MIMO-D model. for an horizon H = 50 is Note that this accuracy is better than the one obtained by the winner of the ESTSP 2007 competition (MSE= 0.506) [15], [16]. V. CONCLUSIONS Predictive modelling for single output tasks is known to require careful procedures of assessment and selection. The extension of this procedure to long-term forecasting and consequently to multi-output modelling requires the tuning of additional parameters and the definition of specific calibration procedures. This paper discussed how the size of the multiple outputs play a major role in the generalization accuracy of a long-term forecasting predictor. We showed that Direct and MIMO strategies implicitly constrain the size of the output target with consequent impact on the bias/variance tradeoff and the consequent accuracy. We proposed a new strategy called MISMO and the associated calibration procedure which aims to decompose a longterm prediction task in the optimal number of subtasks. Preliminary results on the ESTSP 2007 competition show that the approach is promising. Fig. 8. ESTSP Competition dataset, prediction of 100 values by using the MISMO strategy with s = 100. Solid thick line represents the real value, the solid thin line is the prediction with the LL-MIMO-CV model and the dotted one is the prediction with the LL-MIMO-D model. s = 1 s = 5 s = 10 s = 50 s = 100 E learning E test LL-MIMO-CV E test LL-MIMO-D TABLE I LOO ERROR OF THE LEARNING AND THE TEST SET, FOR DIFFERENT VALUE OF THE PARAMETER s. variance as made evident also by the nature of the predicted profile (strongly variant in the s = 1 case and progressively smoother for higher values of s). A careful choice of s should be then recommended in case of long-term forecasting. The experimental results confirm also the important role of non cross-validated criteria in long term prediction tasks. Indeed it appears that LL-MIMO-D is competitive and sometimes better than the conventional cross-validated selection criterion. Once, more than a single prediction is required, criteria which resume the global behavior of the prediction become attractive. Finally, it is worth adding that the MSE of the prediction returned by MISMO (s = 50) with the LL-MIMO-D model
8 REFERENCES [1] A. Weigend and N. Gershenfeld, Series Prediction: forecasting the future and understanding the past. Harlow, UK: Addison Wesley, [2] Y. Ji, J. Hao, N. Reyhani, and A. Lendasse, Direct and recursive prediction of time series using mutual information selection, in Computational Intelligence and Bioinspired Systems: 8th International Workshop on Artificial Neural Networks, IWANN 05, Vilanova i la Geltra, Barcelona, Spain, ser. Lecture Notes in Computer Science, vol Springer-Verlag GmbH, June , pp [3] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse, Methodology for long-term prediction of time series, Neurocomputing, [4] R. Williams and D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Computation, vol. 1, pp , [5] J. D. Farmer and J. J. Sidorowich, Predicting chaotic time series. Physical Review Letters, vol. 8, no. 59, pp , [6] J. McNames, A nearest trajectory strategy for time series prediction, in Proceedings of the InternationalWorkshop on Advanced Black-Box Techniques for Nonlinear Modeling. Belgium: K.U. Leuven, 1998, pp [7] T. Sauer, series prediction by using delay coordinate embedding, in Series Prediction: forecasting the future and understanding the past, A. S. Weigend and N. A. Gershenfeld, Eds. Harlow, UK: Addison Wesley, 1994, pp [8] G. Bontempi, Long term time series prediction with multi-input multioutput local learning. in Proceedings of the 2nd European Symposium on Series Prediction (TSP), ESTSP08, [9] B. Birattari and M. Bersini, Lazy learning for local modeling and control design, [Online]. Available: [10] M. Birattari, G. Bontempi, and H. Bersini, Lazy learning meets the recursive least-squares algorithm, in Advances in Neural Information Processing Systems 11. MIT Press, 1999, pp [11] G. Bontempi, M. Birattari, and H. Bersini, Lazy learning for modeling and control design, International Journal of Control, vol. 72, no. 7/8, pp , [12] M. Birattari, G. Bontempi, and H. Bersini, Lazy learning meets the recursive least-squares algorithm, in NIPS 11, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds. Cambridge: MIT Press, 1999, pp [13] G. Bontempi, M. Birattari, and H. Bersini, Lazy learning for iterated time series prediction, in Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, J. A. K. Suykens and J. Vandewalle, Eds. Katholieke Universiteit Leuven, Belgium, 1998, pp [14] G. Bontempi, M. Birattari, and H. Bersin, Local learning for iterated time-series prediction, in Machine Learning: Proceedings of the Sixteenth International Conference, I. Bratko and S. Dzeroski, Eds. San Francisco, CA: Morgan Kaufmann Publishers, 1999, pp [15] A. Lendasse, Ed., ESTSP 2007: Proceedings, [16] A. Lendasse. (2007) ESTSP 07 homepage. [Online]. Available: [17] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse, Methodology for long-term prediction of time series, Neurocomput., vol. 70, no , pp , [18] G. Bontempi, Local learning techniques for modeling, prediction and control, Ph.D., IRIDIA-Universit Libre de Bruxelles, Louvainla-Neuve, BELGIUM, [19] T. W. Anderson, The statistical analysis of time series [by] T. W. Anderson. Wiley New York,, [20] D. M. Allen, The relationship between variable selection and data augmentation and a method for prediction. Technometrics, vol. 16, pp , [21] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol. 3, pp , [Online]. Available: [22] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Edition (Computer Science and Scientific Computing Series). Academic Press, September [Online]. Available: [23] D. Franois, High-dimensional data analysis: optimal metrics and feature selection, Ph.D., Universit catholique de Louvain, Louvainla-Neuve, BELGIUM, [24] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, November [Online]. Available:
EM-algorithm for Training of State-space Models with Application to Time Series Prediction
EM-algorithm for Training of State-space Models with Application to Time Series Prediction Elia Liitiäinen, Nima Reyhani and Amaury Lendasse Helsinki University of Technology - Neural Networks Research
More informationLong Term Time Series Prediction with Multi-Input Multi-Output Local Learning
Long Term Time Series Prediction wit Multi-Input Multi-Output Local Learning Gianluca Bontempi Macine Learning Group, Département d Informatique Faculté des Sciences, ULB, Université Libre de Bruxelles
More informationTime series prediction
Chapter 12 Time series prediction Amaury Lendasse, Yongnan Ji, Nima Reyhani, Jin Hao, Antti Sorjamaa 183 184 Time series prediction 12.1 Introduction Amaury Lendasse What is Time series prediction? Time
More informationAnalysis of Fast Input Selection: Application in Time Series Prediction
Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,
More informationEnsembles of Nearest Neighbor Forecasts
Ensembles of Nearest Neighbor Forecasts Dragomir Yankov 1, Dennis DeCoste 2, and Eamonn Keogh 1 1 University of California, Riverside CA 92507, USA, {dyankov,eamonn}@cs.ucr.edu, 2 Yahoo! Research, 3333
More informationLazy learning for control design
Lazy learning for control design Gianluca Bontempi, Mauro Birattari, Hugues Bersini Iridia - CP 94/6 Université Libre de Bruxelles 5 Bruxelles - Belgium email: {gbonte, mbiro, bersini}@ulb.ac.be Abstract.
More informationDirect and Recursive Prediction of Time Series Using Mutual Information Selection
Direct and Recursive Prediction of Time Series Using Mutual Information Selection Yongnan Ji, Jin Hao, Nima Reyhani, and Amaury Lendasse Neural Network Research Centre, Helsinki University of Technology,
More informationUsing the Delta Test for Variable Selection
Using the Delta Test for Variable Selection Emil Eirola 1, Elia Liitiäinen 1, Amaury Lendasse 1, Francesco Corona 1, and Michel Verleysen 2 1 Helsinki University of Technology Lab. of Computer and Information
More informationInput Selection for Long-Term Prediction of Time Series
Input Selection for Long-Term Prediction of Time Series Jarkko Tikka, Jaakko Hollmén, and Amaury Lendasse Helsinki University of Technology, Laboratory of Computer and Information Science, P.O. Box 54,
More informationMachine learning techniques for decision support in anesthesia
Machine learning techniques for decision support in anesthesia Olivier Caelen 1, Gianluca Bontempi 1, and Luc Barvais 2 1 Machine Learning Group, Département d Informatique, Université Libre de Bruxelles,
More informationOptimal Kernel Shapes for Local Linear Regression
Optimal Kernel Shapes for Local Linear Regression Dirk Ormoneit Trevor Hastie Department of Statistics Stanford University Stanford, CA 94305-4065 ormoneit@stat.stanjord.edu Abstract Local linear regression
More informationVariable Selection in Data Mining Project
Variable Selection Variable Selection in Data Mining Project Gilles Godbout IFT 6266 - Algorithmes d Apprentissage Session Project Dept. Informatique et Recherche Opérationnelle Université de Montréal
More informationNon-parametric Residual Variance Estimation in Supervised Learning
Non-parametric Residual Variance Estimation in Supervised Learning Elia Liitiäinen, Amaury Lendasse, and Francesco Corona Helsinki University of Technology - Lab. of Computer and Information Science P.O.
More informationFast Bootstrap for Least-square Support Vector Machines
ESA'2004 proceedings - European Symposium on Artificial eural etworks Bruges (Belgium), 28-30 April 2004, d-side publi., ISB 2-930307-04-8, pp. 525-530 Fast Bootstrap for Least-square Support Vector Machines
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationLong-Term Time Series Forecasting Using Self-Organizing Maps: the Double Vector Quantization Method
Long-Term Time Series Forecasting Using Self-Organizing Maps: the Double Vector Quantization Method Geoffroy Simon Université catholique de Louvain DICE - Place du Levant, 3 B-1348 Louvain-la-Neuve Belgium
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More informationIterative ARIMA-Multiple Support Vector Regression models for long term time series prediction
and Machine Learning Bruges (Belgium), 23-25 April 24, i6doccom publ, ISBN 978-2874995-7 Available from http://wwwi6doccom/fr/livre/?gcoi=2843244 Iterative ARIMA-Multiple Support Vector Regression models
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationGaussian Fitting Based FDA for Chemometrics
Gaussian Fitting Based FDA for Chemometrics Tuomas Kärnä and Amaury Lendasse Helsinki University of Technology, Laboratory of Computer and Information Science, P.O. Box 54 FI-25, Finland tuomas.karna@hut.fi,
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationLogistic Regression and Boosting for Labeled Bags of Instances
Logistic Regression and Boosting for Labeled Bags of Instances Xin Xu and Eibe Frank Department of Computer Science University of Waikato Hamilton, New Zealand {xx5, eibe}@cs.waikato.ac.nz Abstract. In
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationEvaluating nonlinearity and validity of nonlinear modeling for complex time series
Evaluating nonlinearity and validity of nonlinear modeling for complex time series Tomoya Suzuki, 1 Tohru Ikeguchi, 2 and Masuo Suzuki 3 1 Department of Information Systems Design, Doshisha University,
More informationOutliers Treatment in Support Vector Regression for Financial Time Series Prediction
Outliers Treatment in Support Vector Regression for Financial Time Series Prediction Haiqin Yang, Kaizhu Huang, Laiwan Chan, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering
More informationCMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors. Furong Huang /
CMSC 422 Introduction to Machine Learning Lecture 4 Geometry and Nearest Neighbors Furong Huang / furongh@cs.umd.edu What we know so far Decision Trees What is a decision tree, and how to induce it from
More informationAutonomous learning algorithm for fully connected recurrent networks
Autonomous learning algorithm for fully connected recurrent networks Edouard Leclercq, Fabrice Druaux, Dimitri Lefebvre Groupe de Recherche en Electrotechnique et Automatique du Havre Université du Havre,
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationUsing reservoir computing in a decomposition approach for time series prediction.
Using reservoir computing in a decomposition approach for time series prediction. Francis wyffels, Benjamin Schrauwen and Dirk Stroobandt Ghent University - Electronics and Information Systems Department
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More informationModel Selection for LS-SVM : Application to Handwriting Recognition
Model Selection for LS-SVM : Application to Handwriting Recognition Mathias M. Adankon and Mohamed Cheriet Synchromedia Laboratory for Multimedia Communication in Telepresence, École de Technologie Supérieure,
More informationAdvances in Extreme Learning Machines
Advances in Extreme Learning Machines Mark van Heeswijk April 17, 2015 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs
More informationMulti-Model Integration for Long-Term Time Series Prediction
Multi-Model Integration for Long-Term Time Series Prediction Zifang Huang, Mei-Ling Shyu, James M. Tien Department of Electrical and Computer Engineering University of Miami, Coral Gables, FL, USA z.huang3@umiami.edu,
More informationModeling and Predicting Chaotic Time Series
Chapter 14 Modeling and Predicting Chaotic Time Series To understand the behavior of a dynamical system in terms of some meaningful parameters we seek the appropriate mathematical model that captures the
More informationSVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION
International Journal of Pure and Applied Mathematics Volume 87 No. 6 2013, 741-750 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v87i6.2
More informationGeometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat
Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat What we know so far Decision Trees What is a decision tree, and how to induce it from data Fundamental
More informationPublication IV Springer-Verlag Berlin Heidelberg. Reprinted with permission.
Publication IV Emil Eirola and Amaury Lendasse. Gaussian Mixture Models for Time Series Modelling, Forecasting, and Interpolation. In Advances in Intelligent Data Analysis XII 12th International Symposium
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationA CUSUM approach for online change-point detection on curve sequences
ESANN 22 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges Belgium, 25-27 April 22, i6doc.com publ., ISBN 978-2-8749-49-. Available
More informationARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD
ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided
More informationGaussian process for nonstationary time series prediction
Computational Statistics & Data Analysis 47 (2004) 705 712 www.elsevier.com/locate/csda Gaussian process for nonstationary time series prediction Soane Brahim-Belhouari, Amine Bermak EEE Department, Hong
More informationMultiple-step Time Series Forecasting with Sparse Gaussian Processes
Multiple-step Time Series Forecasting with Sparse Gaussian Processes Perry Groot ab Peter Lucas a Paul van den Bosch b a Radboud University, Model-Based Systems Development, Heyendaalseweg 135, 6525 AJ
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationDiscussion About Nonlinear Time Series Prediction Using Least Squares Support Vector Machine
Commun. Theor. Phys. (Beijing, China) 43 (2005) pp. 1056 1060 c International Academic Publishers Vol. 43, No. 6, June 15, 2005 Discussion About Nonlinear Time Series Prediction Using Least Squares Support
More informationA First Attempt of Reservoir Pruning for Classification Problems
A First Attempt of Reservoir Pruning for Classification Problems Xavier Dutoit, Hendrik Van Brussel, Marnix Nuttin Katholieke Universiteit Leuven - P.M.A. Celestijnenlaan 300b, 3000 Leuven - Belgium Abstract.
More informationCombination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters
Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,
More informationEnsemble Methods and Random Forests
Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization
More informationDistinguishing Causes from Effects using Nonlinear Acyclic Causal Models
JMLR Workshop and Conference Proceedings 6:17 164 NIPS 28 workshop on causality Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationDual Estimation and the Unscented Transformation
Dual Estimation and the Unscented Transformation Eric A. Wan ericwan@ece.ogi.edu Rudolph van der Merwe rudmerwe@ece.ogi.edu Alex T. Nelson atnelson@ece.ogi.edu Oregon Graduate Institute of Science & Technology
More informationTrain the model with a subset of the data. Test the model on the remaining data (the validation set) What data to choose for training vs. test?
Train the model with a subset of the data Test the model on the remaining data (the validation set) What data to choose for training vs. test? In a time-series dimension, it is natural to hold out the
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationElectric Load Forecasting Using Wavelet Transform and Extreme Learning Machine
Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Song Li 1, Peng Wang 1 and Lalit Goel 1 1 School of Electrical and Electronic Engineering Nanyang Technological University
More informationAny Reasonable Cost Function Can Be Used for a Posteriori Probability Approximation
Any Reasonable Cost Function Can Be Used for a Posteriori Probability Approximation Marco Saerens, Patrice Latinne & Christine Decaestecker Université Catholique de Louvain and Université Libre de Bruxelles
More informationA Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationA Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe
More informationLong-Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point?
Engineering Letters, 5:, EL_5 Long-Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point? Pilar Gómez-Gil Abstract This paper presents the advances of a research using a combination
More informationInteger weight training by differential evolution algorithms
Integer weight training by differential evolution algorithms V.P. Plagianakos, D.G. Sotiropoulos, and M.N. Vrahatis University of Patras, Department of Mathematics, GR-265 00, Patras, Greece. e-mail: vpp
More informationA New Look at Nonlinear Time Series Prediction with NARX Recurrent Neural Network. José Maria P. Menezes Jr. and Guilherme A.
A New Look at Nonlinear Time Series Prediction with NARX Recurrent Neural Network José Maria P. Menezes Jr. and Guilherme A. Barreto Department of Teleinformatics Engineering Federal University of Ceará,
More informationDeep Learning Architecture for Univariate Time Series Forecasting
CS229,Technical Report, 2014 Deep Learning Architecture for Univariate Time Series Forecasting Dmitry Vengertsev 1 Abstract This paper studies the problem of applying machine learning with deep architecture
More informationOutline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren
1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information
More informationLinear Least-Squares Based Methods for Neural Networks Learning
Linear Least-Squares Based Methods for Neural Networks Learning Oscar Fontenla-Romero 1, Deniz Erdogmus 2, JC Principe 2, Amparo Alonso-Betanzos 1, and Enrique Castillo 3 1 Laboratory for Research and
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationImmediate Reward Reinforcement Learning for Projective Kernel Methods
ESANN'27 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), 25-27 April 27, d-side publi., ISBN 2-9337-7-2. Immediate Reward Reinforcement Learning for Projective Kernel Methods
More informationLinear Regression. S. Sumitra
Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D
More informationLinear and Non-Linear Dimensionality Reduction
Linear and Non-Linear Dimensionality Reduction Alexander Schulz aschulz(at)techfak.uni-bielefeld.de University of Pisa, Pisa 4.5.215 and 7.5.215 Overview Dimensionality Reduction Motivation Linear Projections
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationAutomatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries
Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 Probabilistic
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationComparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees
Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland
More informationClassification of Ordinal Data Using Neural Networks
Classification of Ordinal Data Using Neural Networks Joaquim Pinto da Costa and Jaime S. Cardoso 2 Faculdade Ciências Universidade Porto, Porto, Portugal jpcosta@fc.up.pt 2 Faculdade Engenharia Universidade
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationBootstrap for model selection: linear approximation of the optimism
Bootstrap for model selection: linear approximation of the optimism G. Simon 1, A. Lendasse 2, M. Verleysen 1, Université catholique de Louvain 1 DICE - Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium,
More informationSemi-Supervised Learning through Principal Directions Estimation
Semi-Supervised Learning through Principal Directions Estimation Olivier Chapelle, Bernhard Schölkopf, Jason Weston Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany {first.last}@tuebingen.mpg.de
More informationSINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES
SINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES JIANG ZHU, SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, P. R. China E-MAIL:
More informationLinear Dependency Between and the Input Noise in -Support Vector Regression
544 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 3, MAY 2003 Linear Dependency Between the Input Noise in -Support Vector Regression James T. Kwok Ivor W. Tsang Abstract In using the -support vector
More informationConvergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network
Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network Fadwa DAMAK, Mounir BEN NASR, Mohamed CHTOUROU Department of Electrical Engineering ENIS Sfax, Tunisia {fadwa_damak,
More informationModel-Based Reinforcement Learning with Continuous States and Actions
Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters: Model-Based Reinforcement Learning with Continuous States and Actions in Proceedings of the 16th European Symposium on Artificial Neural Networks
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationPerformance of Cross Validation in Tree-Based Models
Performance of Cross Validation in Tree-Based Models Seoung Bum Kim, Xiaoming Huo, Kwok-Leung Tsui School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, Georgia 30332 {sbkim,xiaoming,ktsui}@isye.gatech.edu
More informationAnalysis of Multiclass Support Vector Machines
Analysis of Multiclass Support Vector Machines Shigeo Abe Graduate School of Science and Technology Kobe University Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract Since support vector machines for pattern
More informationDevelopment of a Data Mining Methodology using Robust Design
Development of a Data Mining Methodology using Robust Design Sangmun Shin, Myeonggil Choi, Youngsun Choi, Guo Yi Department of System Management Engineering, Inje University Gimhae, Kyung-Nam 61-749 South
More informationGaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer
Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationComparative Performance Analysis of Three Algorithms for Principal Component Analysis
84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.
More informationA Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape
WCCI 200 IEEE World Congress on Computational Intelligence July, 8-23, 200 - CCIB, Barcelona, Spain CEC IEEE A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape Liang Shen and
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationAn Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM
An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department
More informationLearning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract
Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More information