Fast Bootstrap for Least-square Support Vector Machines


ESANN'2004 proceedings - European Symposium on Artificial Neural Networks, Bruges (Belgium), 28-30 April 2004, d-side publi., ISBN 2-930307-04-8, pp. 525-530.

A. Lendasse 1, G. Simon 2, V. Wertz 3, M. Verleysen 2

1 Helsinki University of Technology, Laboratory of Computer and Information Science, Otaniemi 0204, Finland, lendasse@cis.hut.fi.
2 Université catholique de Louvain, Electricity Dept., 3 pl. du Levant, B-1348 Louvain-la-Neuve, Belgium, {simon, verleysen}@dice.ucl.ac.be.
3 Université catholique de Louvain, CESAME, 4 av. G. Lemaître, B-1348 Louvain-la-Neuve, Belgium, wertz@auto.ucl.ac.be.

Abstract. The Bootstrap resampling method can be used efficiently to estimate the generalization error of nonlinear regression models, such as artificial neural networks and especially Least-square Support Vector Machines. Nevertheless, the use of the Bootstrap implies a high computational load. In this paper we present a simple procedure to obtain a fast approximation of this generalization error with a reduced computation time. The proposal is based on empirical evidence and is demonstrated within a simulation procedure.

1. Introduction

Model design has attracted a considerable research effort for decades, on linear models, nonlinear ones, artificial neural networks, and many others. Model design includes the need to compare models (for example, models of different complexities) in order to select the best one among several. For this purpose, it is necessary to obtain a good approximation of the generalization error of each model, the generalization error being the average error that the model would make on an infinite-size, unknown test set independent from the learning one. Nowadays there exist well-known and widely used methods able to fulfil this task [1-6], and the Bootstrap is one of the best-performing among them. The main problem when using the Bootstrap is that computing its results is very time consuming. In this paper we show that, under reasonable and simple hypotheses usually fulfilled in real-world applications, it is possible to provide a good estimate of the Bootstrap results with a considerably reduced number of modeling stages, thus saving a considerable amount of computation time. The Fast Bootstrap has previously been developed for Radial Basis Function Networks (RBFN). In this paper, the Fast Bootstrap is extended to Least-square Support Vector Machines (LS-SVM) [7-9]. The LS-SVM is presented in section 2. Then, the Bootstrap is presented in section 3 and, finally, the Fast Bootstrap is introduced in section 4 using a toy example.

2. Least-Square Support Vector Machines (LS-SVM)

Consider a given training set of N data points {x_k, y_k}, k = 1, ..., N, with x_k an n-dimensional input and y_k a one-dimensional output. In feature space, SVM models take the form

y(x) = \omega^T \varphi(x) + b,   (1)

where the nonlinear mapping \varphi(\cdot) maps the input data into a higher-dimensional feature space. In least squares support vector machines for function estimation, the following optimization problem is formulated:

\min_{\omega, e} J(\omega, e) = \frac{1}{2}\,\omega^T \omega + \gamma\,\frac{1}{2}\sum_{k=1}^{N} e_k^2,   (2)

subject to the equality constraints

y_k = \omega^T \varphi(x_k) + b + e_k,  k = 1, ..., N.   (3)

This corresponds to a form of ridge regression. The Lagrangian is given by

L(\omega, b, e; \alpha) = J(\omega, e) - \sum_{k=1}^{N} \alpha_k \left\{ \omega^T \varphi(x_k) + b + e_k - y_k \right\},   (4)

with Lagrange multipliers \alpha_k. The conditions for optimality are

\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{k=1}^{N} \alpha_k \varphi(x_k), \quad
\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{N} \alpha_k = 0, \quad
\frac{\partial L}{\partial e_k} = 0 \Rightarrow \alpha_k = \gamma e_k, \quad
\frac{\partial L}{\partial \alpha_k} = 0 \Rightarrow \omega^T \varphi(x_k) + b + e_k - y_k = 0,   (5)

for k = 1, ..., N. After elimination of e_k and \omega, the solution is given by the following set of linear equations:

\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ y \end{bmatrix},   (6)

where y = [y_1; ...; y_N], \mathbf{1} = [1; ...; 1], \alpha = [\alpha_1; ...; \alpha_N], and the Mercer condition

\Omega_{kl} = \varphi(x_k)^T \varphi(x_l) = \psi(x_k, x_l),  k, l = 1, ..., N,   (7)

has been applied. This finally results in the following LS-SVM model for function estimation:

y(x) = \sum_{k=1}^{N} \alpha_k \psi(x, x_k) + b,   (8)

where \alpha and b are the solution of (6). For the choice of the kernel function \psi(\cdot,\cdot) one has several possibilities [7-9]. In this paper, Gaussian kernels are used, \psi(x, x_k) = \exp(-\|x - x_k\|^2 / \sigma^2), and the remaining unknowns are \sigma and \gamma. These model hyperparameters are selected according to the model selection procedure detailed in the remainder of this paper.
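As a minimal illustration of how the dual system (6) and the model (8) can be implemented, the following Python sketch builds the Gaussian kernel matrix, solves the linear system for α and b, and evaluates the resulting model. The function names and structure are ours, not from the paper; for large N an iterative solver would typically replace the dense solve, which is used here only to keep the sketch short.

```python
import numpy as np

def gaussian_kernel_matrix(X1, X2, sigma):
    """Gaussian kernel psi(x, x') = exp(-||x - x'||^2 / sigma^2), as in the paper."""
    # Squared Euclidean distances between all pairs of rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, sigma, gamma):
    """Solve the linear system (6) for the dual variables alpha and the bias b."""
    n = X.shape[0]
    Omega = gaussian_kernel_matrix(X, X, sigma)      # Omega_kl = psi(x_k, x_l), eq. (7)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # first row: [0, 1^T]
    A[1:, 0] = 1.0                                   # first column: [0; 1]
    A[1:, 1:] = Omega + np.eye(n) / gamma            # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                           # alpha, b

def lssvm_predict(X_train, alpha, b, sigma, X_new):
    """Evaluate the LS-SVM model (8): y(x) = sum_k alpha_k psi(x, x_k) + b."""
    K = gaussian_kernel_matrix(X_new, X_train, sigma)
    return K @ alpha + b
```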

3. Bootstrap for Model Structure Selection

The bootstrap [4] is a resampling method that was developed to estimate statistical parameters (such as the mean, the variance, etc.). In the case of model structure selection, the parameter to be estimated is the generalization error, i.e. the average error that the model would make on an infinite-size and unknown test set. When using the bootstrap, this error is not computed directly. Rather, the bootstrap estimates the difference between the generalization error and the training error calculated on the initial data set. This difference is called the optimism. The estimated generalization error is thus the sum of the training error and of the estimated optimism. The training error is computed using all data from the training set. The optimism is estimated using a resampling technique based on drawing within the training set with replacement.

Using a notation where the first exponent A_j denotes the training set and the second exponent indicates the set used to estimate the model error, the Bootstrap method can be decomposed into the following stages:

1. From the initial set I, one randomly draws N points with replacement. The new set A_j thus has the same size as the initial set and constitutes a new training set. This stage is called the resampling.

2. The training of the various model structures q is done on the same training set A_j. One can compute the training error on this single set:

E_j^{A_j, A_j}(q) = \frac{1}{N} \sum_{i=1}^{N} \left( h_q\!\left(x_i^{A_j}, \theta_q(A_j)\right) - y_i^{A_j} \right)^2,   (9)

with \theta_q(A_j) the model parameters after learning, h_q the q-th model that is used, x_i^{A_j} the i-th input vector from set A_j, y_i^{A_j} the i-th output and N the number of elements in this set. Index j means that the error is evaluated on the j-th new sample.

3. One can also compute the validation error on the initial sample, which now plays the role of the validation set V:

E_j^{A_j, V}(q) = \frac{1}{N} \sum_{i=1}^{N} \left( h_q\!\left(x_i^{V}, \theta_q(A_j)\right) - y_i^{V} \right)^2.   (10)

Here again, index j means that the error is evaluated on the j-th new sample.

4. The difference between the two errors (9) and (10) is calculated and defined as the optimism by Efron [6]:

optimism_j(q) = E_j^{A_j, V}(q) - E_j^{A_j, A_j}(q).   (11)

5. Steps 1 to 4 are repeated J times. The estimate of the optimism is then calculated as the average of the J values from (11):

\widehat{optimism}(q) = \frac{1}{J} \sum_{j=1}^{J} optimism_j(q).   (12)

6. The training of the q model structures is done on the initial data set I, and the training error is calculated on the same set. The two exponents I indicate that the initial data set is used for both training and error estimation:

E^{I, I}(q) = \frac{1}{N} \sum_{i=1}^{N} \left( h_q\!\left(x_i^{I}, \theta_q(I)\right) - y_i^{I} \right)^2.   (13)

7. An approximation of the generalization error is finally obtained by

\hat{E}_{gen}(q) = \widehat{optimism}(q) + E^{I, I}(q).   (14)

\hat{E}_{gen}(q) is an approximation of the generalization error for each model structure q. The structure that is selected is the one that minimizes this estimate of the generalization error.

4. Fast Bootstrap and Toy Example

In this section, an improvement of the Bootstrap method is presented. This method, called Fast Bootstrap, reduces the computational time of the traditional Bootstrap [10-11]. It is based on experimental observations and is presented on a function approximation example (represented in Figure 1). In this example, 200 inputs x have been drawn from a uniform random law between 0 and 1. The output y has been generated by the function

y = \sin(5x) + \sin(15x) + \sin(25x) + \varepsilon,   (15)

with \varepsilon drawn from a uniform random law on [-0.5, 0.5].

Figure 1: Example of the function (dots) and its approximation (solid line).
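To make the procedure concrete, here is a small Python sketch of steps 1 to 7 (equations (9)-(14)) applied to toy data generated as in this section. It reuses the lssvm_train and lssvm_predict helpers from the previous sketch; the function names, the random seed and the call at the end are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mse(model, X, y):
    """Mean squared error of a fitted LS-SVM on the set (X, y)."""
    alpha, b, X_tr, sigma = model
    return float(np.mean((lssvm_predict(X_tr, alpha, b, sigma, X) - y) ** 2))

def bootstrap_generalization_error(X, y, sigma, gamma, J=100, seed=None):
    """Bootstrap estimate of E_gen = E^{I,I} + optimism_hat (steps 1-7, eqs. 9-14)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    optimisms = []
    for _ in range(J):
        idx = rng.integers(0, n, size=n)              # step 1: resample A_j with replacement
        Xa, ya = X[idx], y[idx]
        alpha, b = lssvm_train(Xa, ya, sigma, gamma)  # step 2: train on A_j
        model = (alpha, b, Xa, sigma)
        e_train = mse(model, Xa, ya)                  # eq. (9): training error on A_j
        e_valid = mse(model, X, y)                    # eq. (10): validation error on I
        optimisms.append(e_valid - e_train)           # eq. (11): optimism_j
    optimism_hat = np.mean(optimisms)                 # eq. (12)
    alpha, b = lssvm_train(X, y, sigma, gamma)        # step 6: train on the initial set I
    e_apparent = mse((alpha, b, X, sigma), X, y)      # eq. (13): apparent error E^{I,I}
    return e_apparent + optimism_hat                  # eq. (14)

# Toy data of section 4: 200 uniform inputs, y = sin(5x) + sin(15x) + sin(25x) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = (np.sin(5 * x) + np.sin(15 * x) + np.sin(25 * x)).ravel() + rng.uniform(-0.5, 0.5, size=200)
e_gen = bootstrap_generalization_error(x, y, sigma=0.1, gamma=10.0, J=100, seed=1)
```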

An LS-SVM is used to approximate this function. Two parameters still have to be determined, namely σ and γ. For a fixed σ = 0.1, the optimal γ is determined using the Bootstrap method. The set of γ values that is tested ranges from 0 to 100 with a 0.1 step. The number of resamplings J in (12) is equal to 100. The apparent error defined in (13) is computed and represented in Fig. 2A. The optimism is computed using (12) and represented in Fig. 2B. The generalization error is computed using (14) and represented in Fig. 3. The value of γ that minimizes the generalization error is 11.

In Fig. 2B, the optimism is very close to an exponential function of γ. This has been observed on other examples and benchmarks as well. Using this information, the number of γ values to be tested can be considerably reduced: in this example, the set is reduced to the range 5 to 100 with an incremental step of 5, and an exponential approximation of the optimism is fitted to these values. Thanks to this approximation, the number of Bootstrap replications in (12) is also reduced by a factor of 10. The new optimism and generalization error are represented as dashed lines in Fig. 2B and Fig. 3 respectively. The optimum is close to the one selected by the traditional Bootstrap method. This new method, denoted Fast Bootstrap, is in this toy example 500 times quicker than the traditional Bootstrap. In other examples, the Fast Bootstrap is at least 100 times quicker than the traditional Bootstrap for the selection of the γ parameter of an LS-SVM, without loss of precision.

Figure 2A: Apparent Error with respect to γ. Figure 2B: Optimism with respect to γ, using Bootstrap (solid line) and Fast Bootstrap (dashed line).

Figure 3: Generalization Error with respect to γ, using Bootstrap (solid line) and Fast Bootstrap (dashed line).
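The Fast Bootstrap idea can be sketched as follows in Python, again reusing lssvm_train and mse (and the toy data x, y) from the previous sketches. The paper does not specify how the exponential approximation is fitted; the log-linear least-squares fit below, optimism ≈ exp(b0 + b1·γ), is our assumption, as are the function name, the grids and the seed.

```python
import numpy as np

def fast_bootstrap_select_gamma(X, y, sigma, gammas_coarse, gammas_fine,
                                J_small=10, seed=None):
    """Fast Bootstrap sketch: few replications on a coarse gamma grid, exponential fit
    of the optimism, then E_gen = apparent error + fitted optimism on a fine grid."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Averaged optimism (eqs. 11-12) on the coarse grid, with only J_small replications.
    opt_coarse = []
    for g in gammas_coarse:
        opts = []
        for _ in range(J_small):
            idx = rng.integers(0, n, size=n)
            Xa, ya = X[idx], y[idx]
            alpha, b = lssvm_train(Xa, ya, sigma, g)
            model = (alpha, b, Xa, sigma)
            opts.append(mse(model, X, y) - mse(model, Xa, ya))
        opt_coarse.append(np.mean(opts))
    # Exponential approximation: fit log(optimism) as a linear function of gamma.
    opt_coarse = np.maximum(opt_coarse, 1e-12)        # guard against non-positive estimates
    b1, b0 = np.polyfit(gammas_coarse, np.log(opt_coarse), 1)
    # Generalization error estimate (eq. 14) on the fine grid: one training per gamma.
    e_gen = []
    for g in gammas_fine:
        alpha, b = lssvm_train(X, y, sigma, g)
        e_app = mse((alpha, b, X, sigma), X, y)       # apparent error, eq. (13)
        e_gen.append(e_app + np.exp(b0 + b1 * g))     # fitted optimism
    return gammas_fine[int(np.argmin(e_gen))]

# Coarse grid 5..100 (step 5) for the fit, fine grid 0.1..100 (step 0.1) for the search.
best_gamma = fast_bootstrap_select_gamma(x, y, sigma=0.1,
                                          gammas_coarse=np.arange(5.0, 101.0, 5.0),
                                          gammas_fine=np.arange(0.1, 100.1, 0.1),
                                          J_small=10, seed=2)
```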

5. Conclusions

In this paper we have shown that the optimism term of the Bootstrap estimator of the prediction error is approximately an exponential function of the γ parameter of an LS-SVM. According to the results shown here and to others not illustrated in this paper, we recommend a conservative value of 10 for the number of Bootstrap replications used in the approximation. We would like to emphasize that the very limited loss of accuracy is balanced by a considerable saving in computational load, the latter being the main disadvantage of the Bootstrap resampling procedure in practical situations. This saving is due to the reduced number of tested models and to the limited number of Bootstrap replications.

Acknowledgements

Michel Verleysen is Senior Research Associate of the Belgian National Fund for Scientific Research (FNRS). G. Simon is funded by the Belgian F.R.I.A. Part of the work of V. Wertz is supported by the Interuniversity Attraction Poles (IAP), initiated by the Belgian Federal State, Ministry of Sciences, Technologies and Culture. The scientific responsibility rests with the authors.

References

[1] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd Int. Symp. on Information Theory, 267-281, Budapest, 1973.
[2] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6, 461-464, 1978.
[3] L. Ljung, System Identification - Theory for the User, 2nd Ed., Prentice Hall, 1999.
[4] B. Efron, R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.
[5] M. Stone, An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, J. Royal Statist. Soc., B39, 44-47, 1977.
[6] R. Kohavi, A study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Proc. of the 14th Int. Joint Conf. on A.I., Vol. 2, Canada, 1995.
[7] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.
[8] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters, vol. 9, no. 3, Jun. 1999, pp. 293-300.
[9] J.A.K. Suykens, J. De Brabanter, L. Lukas, J. Vandewalle, Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing, vol. 48, no. 1-4, Oct. 2002, pp. 85-105.
[10] G. Simon, A. Lendasse, V. Wertz, M. Verleysen, Fast Approximation of the Bootstrap for Model Selection, ESANN 2003, European Symposium on Artificial Neural Networks, Bruges (Belgium), 23-25 April 2003, pp. 99-106.
[11] G. Simon, A. Lendasse, M. Verleysen, Bootstrap for Model Selection: Linear Approximation of the Optimism, IWANN 2003, International Work-Conference on Artificial and Natural Neural Networks, Maó, Menorca (Spain), June 3-6, 2003. Computational Methods in Neural Modeling, J. Mira, J.R. Alvarez (eds.), Springer-Verlag, Lecture Notes in Computer Science 2686, 2003, pp. 182-189.