Statistical Learning Theory

Statistical Learning Theory Fundamentals
Miguel A. Veganzones
Grupo Inteligencia Computacional, Universidad del País Vasco (UPV/EHU)

Outline
1. Introduction: the learning problem; statistical learning theory.
2. The pattern recognition problem; the regression problem; the density estimation problem (Fisher-Wald setting); induction principles for minimizing the risk functional on the basis of empirical data.


Approaches to the learning problem
Learning problem: the problem of choosing the desired dependence on the basis of empirical data. Two approaches:
- To choose an approximating function from a given set of functions.
- To estimate the desired stochastic dependences (densities, conditional densities, conditional probabilities).

First approach: to choose an approximating function from a given set of functions
This is a rather general problem, with three subproblems: pattern recognition, regression, and density estimation. It is based on the idea that the quality of the chosen function can be evaluated by a risk functional, so it becomes the problem of minimizing the risk functional on the basis of empirical data.

Second approach: to estimate the desired stochastic dependences
Using the estimated stochastic dependence, the pattern recognition, regression, and density estimation problems can be solved as well. This requires solving integral equations in which some elements of the equation are unknown. It gives much more detail, but it is an ill-posed problem.

General problem of learning from examples
[Figure: a model of learning from examples — the Generator G produces situations, the Supervisor S labels them, and the Learning Machine LM observes the resulting pairs.]

Generator
The Generator, G, determines the environment in which the Supervisor and the Learning Machine act. Simplest environment: G generates vectors x ∈ X independently and identically distributed (i.i.d.) according to some unknown but fixed probability distribution function F(x).

Supervisor
The Supervisor, S, transforms the input vectors x into the output values y. Supposition: S returns the output y on the vector x according to a conditional distribution function F(y|x), which includes the case when the supervisor uses some function y = f(x).

Learning Machine
The Learning Machine, LM, observes the l pairs (x_1, y_1), ..., (x_l, y_l), the training set, which is drawn randomly and independently according to the joint distribution function F(x, y) = F(y|x)F(x). Using the training set, LM constructs some operator which will be used to predict the supervisor's answer y_i on any specific vector x_i generated by G.
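
As a concrete illustration of this data-generating model, here is a minimal sketch in Python, assuming a one-dimensional standard-normal generator and a noisy linear supervisor (both choices are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(l):
    # G: draws inputs x i.i.d. from a fixed but (to the learner) unknown F(x);
    # here F(x) is taken to be the standard normal, an arbitrary choice.
    return rng.normal(size=l)

def supervisor(x):
    # S: returns y according to a conditional distribution F(y|x); here
    # y = 2x + 1 plus Gaussian noise, a special case of that setting.
    return 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.shape)

l = 50
x_train = generator(l)
y_train = supervisor(x_train)  # training set (x_1, y_1), ..., (x_l, y_l)
```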

Two different goals
1. To imitate the supervisor's operator: try to construct an operator which provides, for a given G, the best predictions of the supervisor's outputs.
2. To identify the supervisor's operator: try to construct an operator which is close to the supervisor's operator.
Both problems are based on the same general principles. The learning process is a process of choosing an appropriate function from a given set of functions.

Statistical learning theory

Functional
Among the totality of possible functions, one looks for the one that satisfies the given quality criterion in the best possible manner. Formally: on a subset Z of the vector space R^n, a set of admissible functions {g(z)}, z ∈ Z, is given, and a functional R = R(g(z)) is defined. It is required to find the function g*(z) from the set {g(z)} which minimizes the functional R = R(g(z)).

Two cases
1. When the set of functions {g(z)} and the functional R(g(z)) are explicitly given: calculus of variations.
2. When a p.d.f. F(z) is defined on Z and the functional is defined as the mathematical expectation

R(g(z)) = ∫ L(z, g(z)) dF(z),   (1)

where the function L(z, g(z)) is integrable for any g(z) ∈ {g(z)}. The problem is then to minimize (1) when F(z) is unknown but a sample z_1, ..., z_l of observations is available.
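
For instance (an illustrative special case, not part of the original slides): taking z = (x, y) and the quadratic loss turns (1) into the expected squared prediction error:

```latex
% Special case of (1): z = (x,y), L(z, g(z)) = (y - g(x))^2, giving
R(g) = \int (y - g(x))^2 \, dF(x, y),
% which must be minimized over {g} using only the sample z_1, ..., z_l
% when F is unknown.
```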

Problem definition
1. Imitation problem: how can we obtain the minimum of the functional in the given set of functions?
2. Identification problem: what should be minimized in order to select from the set {g(z)} a function which will guarantee that the functional (1) is small?
The minimization of the functional (1) on the basis of empirical data z_1, ..., z_l is one of the main problems of mathematical statistics.

Parametrization
The set of functions {g(z)} will be given in a parametric form {g(z, α), α ∈ Λ}. The study of only parametric functions is not a restriction on the problem, since the set Λ is arbitrary: a set of scalar quantities, a set of vectors, or a set of abstract elements. The functional (1) can be rewritten as

R(α) = ∫ Q(z, α) dF(z),   α ∈ Λ,   (2)

where Q(z, α) = L(z, g(z, α)) is called the loss function.

The expected loss
It is assumed that each function Q(z, α*) determines the amount of loss resulting from the realization of the vector z for a fixed α = α*. The expected loss with respect to z for the function Q(z, α*) is determined by the integral

R(α*) = ∫ Q(z, α*) dF(z),   (3)

which is called the risk functional or the risk.

Problem redefinition
Definition: The problem is to choose in the set Q(z, α), α ∈ Λ, a function Q(z, α_0) which minimizes the risk when the probability distribution function is unknown but random independent observations z_1, ..., z_l are given.

The pattern recognition problem

Informal definition
Definition: A supervisor observes occurring situations and determines to which of k classes each one of them belongs. It is required to construct a machine which, after observing the supervisor's classification, carries out the classification in approximately the same manner as the supervisor.

Formal definition
Definition: In a certain environment characterized by a p.d.f. F(x), situations x appear randomly and independently. The supervisor classifies each situation into one of k classes. We assume that the supervisor carries out this classification according to F(ω|x), where ω ∈ {0, 1, ..., k-1}. Neither F(x) nor F(ω|x) is known, but they exist. Thus, a joint distribution function F(ω, x) = F(ω|x)F(x) exists.

Loss function
Let a set of functions φ(x, α), α ∈ Λ, which take only the k values {0, 1, ..., k-1} (a set of decision rules) be given. We shall consider the simplest loss function:

L(ω, φ) = 0 if ω = φ,   L(ω, φ) = 1 if ω ≠ φ.

The problem of pattern recognition is to minimize the functional

R(α) = ∫ L(ω, φ(x, α)) dF(ω, x)

on the set of functions φ(x, α), α ∈ Λ, where the p.d.f. F(ω, x) is unknown but a random independent sample of pairs (x_1, ω_1), ..., (x_l, ω_l) is given.
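
A small sketch of this setting in Python for k = 2, assuming a hypothetical one-parameter family of threshold decision rules (the family, the sample sizes, and the data-generating choices below are all illustrative):

```python
import numpy as np

def zero_one_loss(omega, phi):
    # The simplest loss: L(omega, phi) = 0 if omega == phi, 1 otherwise.
    return (omega != phi).astype(float)

def decision_rule(x, alpha):
    # A hypothetical family phi(x, alpha) of threshold classifiers.
    return (x > alpha).astype(int)

# F(omega, x) is unknown; the risk R(alpha) can only be estimated from a sample.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
omega = (x + rng.normal(scale=0.5, size=x.shape) > 0).astype(int)
estimated_risk = zero_one_loss(omega, decision_rule(x, 0.0)).mean()
```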

Restrictions
The problem of pattern recognition has been reduced to the problem of minimizing the risk on the basis of empirical data, where the set of loss functions Q(z, α), α ∈ Λ, is not arbitrary as in the general case. The following restrictions are imposed:
- The vector z consists of n + 1 coordinates: the coordinate ω (which takes a finite number of values) and the n coordinates x_1, x_2, ..., x_n which form the vector x.
- The set of functions Q(z, α), α ∈ Λ, is given by Q(z, α) = L(ω, φ(x, α)), α ∈ Λ, and also takes on only a finite number of values.

The regression problem

Stochastic dependences
There exist relationships (stochastic dependences) where to each vector x there corresponds a number y which we obtain as a result of random trials; F(y|x) expresses that stochastic relationship. Estimating this stochastic dependence from the empirical data (x_1, y_1), ..., (x_l, y_l) is quite a difficult (ill-posed) problem. However, knowledge of F(y|x) is often not required, and it is sufficient to determine one of its characteristics.

Regression
The function of conditional mathematical expectation,

r(x) = ∫ y dF(y|x),

is called the regression. Estimating the regression in the set of functions f(x, α), α ∈ Λ, is referred to as the problem of regression estimation.
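
A standard example (stated here for illustration): when the supervisor's answers are a deterministic function corrupted by zero-mean noise, the regression recovers that function:

```latex
% If y = f(x) + \xi with E[\xi \mid x] = 0, then
r(x) = \int y \, dF(y \mid x) = f(x) + E[\xi \mid x] = f(x).
```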

Conditions
The problem of regression estimation is reduced to the model of minimizing risk based on empirical data under the following conditions:

∫ y² dF(y, x) < ∞,   ∫ r²(x) dF(y, x) < ∞.

On the set f(x, α), α ∈ Λ, with f(x, α) ∈ L_2, the minimum (if it exists) of the functional

R(α) = ∫ (y − f(x, α))² dF(y, x)

is attained at:
- the regression function, if r(x) ∈ {f(x, α), α ∈ Λ};
- the function f(x, α*) which is the closest to r(x) in the L_2 metric, if r(x) ∉ {f(x, α), α ∈ Λ}.

Demonstration
Denote Δf(x, α) = f(x, α) − r(x). The functional can be rewritten as:

R(α) = ∫ (y − r(x))² dF(y, x) + ∫ (Δf(x, α))² dF(y, x) + 2 ∫ Δf(x, α)(y − r(x)) dF(y, x).

The first term does not depend on α, and the cross term ∫ Δf(x, α)(y − r(x)) dF(y, x) = 0, so minimizing R(α) amounts to minimizing the L_2 distance ∫ (Δf(x, α))² dF(y, x).
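
The vanishing of the cross term follows by integrating over y first and using the definition of the regression; a short derivation filling in the step the slide leaves implicit:

```latex
\int \Delta f(x,\alpha)\,(y - r(x))\, dF(y,x)
  = \int \Delta f(x,\alpha) \Big[ \int (y - r(x))\, dF(y \mid x) \Big] dF(x)
% the inner integral is \int y \, dF(y|x) - r(x) = 0 by definition of r(x), so
  = \int \Delta f(x,\alpha) \cdot 0 \; dF(x) = 0.
```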

Restrictions
The problem of estimating the regression may also be reduced to the scheme of minimizing the risk. The following restrictions are imposed:
- The vector z consists of n + 1 coordinates: the coordinate y and the n coordinates x_1, x_2, ..., x_n which form the vector x. However, the coordinate y, as well as the function f(x, α), may take any value on the interval (−∞, ∞).
- The set of functions Q(z, α), α ∈ Λ, is of the form Q(z, α) = (y − f(x, α))², and can take on arbitrary non-negative values.

The density estimation problem (Fisher-Wald setting)

Problem definition
Let p(x, α), α ∈ Λ, be a set of probability densities containing the required density

p(x, α_0) = dF(x)/dx.

Considering the functional

R(α) = − ∫ ln p(x, α) dF(x),   (4)

the problem of estimating the density in the L_1 metric is reduced to the minimization of the functional (4) on the basis of empirical data (Fisher-Wald formulation).
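
It may help to note (a standard identity, not on the original slide) that the excess risk of (4) is exactly the Kullback-Leibler divergence between the required density and the candidate:

```latex
R(\alpha) - R(\alpha_0)
  = \int \ln \frac{p(x,\alpha_0)}{p(x,\alpha)} \, p(x,\alpha_0)\, dx
  = KL\big(p(\cdot,\alpha_0) \,\|\, p(\cdot,\alpha)\big) \;\ge\; 0,
% so minimizing (4) over \alpha is equivalent to minimizing the
% KL divergence to the required density.
```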

Assertions (I): the functional's minimum
The minimum of the functional (4) (if it exists) is attained at functions p(x, α*) which may differ from p(x, α_0) only on a set of zero measure.
Demonstration: Jensen's inequality implies

∫ ln [p(x, α)/p(x, α_0)] dF(x) ≤ ln ∫ [p(x, α)/p(x, α_0)] p(x, α_0) dx = ln 1 = 0.

The assertion then follows from

∫ ln [p(x, α)/p(x, α_0)] dF(x) = ∫ ln p(x, α) dF(x) − ∫ ln p(x, α_0) dF(x) ≤ 0,

i.e., R(α) ≥ R(α_0) for all α ∈ Λ.

Assertions (II): the Bretagnolle-Huber inequality
The Bretagnolle-Huber inequality

∫ |p(x, α) − p(x, α_0)| dx ≤ 2 √(1 − exp{R(α_0) − R(α)})

is valid.

Conclusion
The functions p(x, α*) which are ε-close to the minimum,

R(α*) − inf_{α ∈ Λ} R(α) < ε,

will be 2√(1 − exp{−ε})-close to the required density in the L_1 metric.

Restrictions
The density estimation problem in the Fisher-Wald setting restricts the set of functions Q(z, α) as follows:
- The vector z coincides with the vector x.
- The set of functions Q(z, α), α ∈ Λ, is of the form Q(z, α) = −ln p(x, α), where p(x, α) is a set of density functions.
- The loss function takes on arbitrary values on the interval (−∞, ∞).

Induction principles for minimizing the risk functional on the basis of empirical data

Introduction
We have seen that the pattern recognition, regression, and density estimation problems can all be reduced to this scheme by specifying a loss function in the risk functional. Now, how can we minimize the risk functional when the probability distribution function is unknown?
- Classical principle: empirical risk minimization (ERM).
- Newer principle: structural risk minimization (SRM).

Empirical Risk Minimization: definition
Instead of minimizing the risk functional

R(α) = ∫ Q(z, α) dF(z),   α ∈ Λ,

minimize the empirical risk functional

R_emp(α) = (1/l) Σ_{i=1}^{l} Q(z_i, α),   α ∈ Λ,

on the basis of the empirical data z_1, ..., z_l obtained according to the distribution function F(z).
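
In code, the ERM principle is just a plug-in: replace F(z) by the empirical distribution of the sample. A minimal sketch (the function names and toy data are my own, for illustration):

```python
import numpy as np

def empirical_risk(Q, alpha, z_sample):
    # R_emp(alpha) = (1/l) * sum_i Q(z_i, alpha): the risk functional with
    # the unknown F(z) replaced by the empirical distribution of z_1..z_l.
    return np.mean([Q(z, alpha) for z in z_sample])

# Example with quadratic loss, z = (x, y) and f(x, alpha) = alpha * x:
Q = lambda z, alpha: (z[1] - alpha * z[0]) ** 2
z_sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
print(empirical_risk(Q, 2.0, z_sample))
```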

Empirical Risk Minimization: considerations
The empirical risk functional is explicitly defined and is subject to minimization. The problem is to establish the conditions under which the minimizer of the empirical risk functional, Q(z, α_l), is close to the desired one, Q(z, α_0).

Empirical Risk Minimization: pattern recognition problem (I)
The pattern recognition problem is considered as the minimization of the functional

R(α) = ∫ L(ω, φ(x, α)) dF(ω, x),   α ∈ Λ,

on a set of functions φ(x, α), α ∈ Λ, that take on only a finite number of values, on the basis of the empirical data (x_1, ω_1), ..., (x_l, ω_l).

Empirical Risk Minimization: pattern recognition problem (II)
Consider the empirical risk functional

R_emp(α) = (1/l) Σ_{i=1}^{l} L(ω_i, φ(x_i, α)),   α ∈ Λ.

Since L(ω, φ) ∈ {0, 1} (0 if ω = φ and 1 if ω ≠ φ), minimization of the empirical risk functional is equivalent to minimizing the number of training errors.
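
A sketch of ERM under the 0-1 loss, again with the hypothetical threshold family from before: the empirical risk of each candidate α is its training-error rate, and ERM picks the α with the fewest training errors.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
omega = (x > 0.3).astype(int)  # labels from a hypothetical supervisor

# ERM over the threshold family: R_emp(alpha) is the training-error rate.
alphas = np.linspace(-2.0, 2.0, 401)
emp_risk = [np.mean((x > a).astype(int) != omega) for a in alphas]
alpha_hat = alphas[int(np.argmin(emp_risk))]  # minimizes training errors
```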

Empirical Risk Minimization: regression problem (I)
The regression problem is considered as the minimization of the functional

R(α) = ∫ (y − f(x, α))² dF(y, x),   α ∈ Λ,

on a set of functions f(x, α), α ∈ Λ, on the basis of the empirical data (x_1, y_1), ..., (x_l, y_l).

Empirical Risk Minimization: regression problem (II)
Consider the empirical risk functional

R_emp(α) = (1/l) Σ_{i=1}^{l} (y_i − f(x_i, α))²,   α ∈ Λ.

The method of minimizing this empirical risk functional is known as the least-squares method.
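
For the linear family f(x, α) = α_1 x + α_0 this empirical risk has a closed-form minimizer; a sketch using NumPy's ordinary least-squares fit (the data below are synthetic, chosen only to illustrate):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.shape)

# np.polyfit(deg=1) minimizes sum_i (y_i - (a1 * x_i + a0))^2, i.e. l * R_emp.
a1, a0 = np.polyfit(x, y, deg=1)
```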

Empirical Risk Minimization: density estimation problem (I)
The density estimation problem is considered as the minimization of the functional

R(α) = − ∫ ln p(x, α) dF(x),   α ∈ Λ,

on a set of densities p(x, α), α ∈ Λ, using the i.i.d. empirical data x_1, ..., x_l.

Empirical Risk Minimization: density estimation problem (II)
Consider the empirical risk functional

R_emp(α) = −(1/l) Σ_{i=1}^{l} ln p(x_i, α),   α ∈ Λ.

Its minimization yields the same solution as the Maximum Likelihood method (in the Maximum Likelihood method a plus sign is used in front of the sum instead of the minus sign, and the sum is maximized).
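
A sketch for one concrete family (my choice, for illustration): over the Gaussian family N(μ, σ²), minimizing R_emp has a closed form and gives the familiar maximum-likelihood estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, scale=0.7, size=500)

# Minimizing R_emp(mu, sigma) = -(1/l) * sum_i ln p(x_i; mu, sigma) over the
# Gaussian family yields the sample mean and the (biased) sample variance.
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()
```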

Appendix: For Further Reading
- The Nature of Statistical Learning Theory. Vladimir N. Vapnik. ISBN 0-387-98780-0. 1995.
- Statistical Learning Theory. Vladimir N. Vapnik. ISBN 0-471-03003-1. 1998.

Appendix: Questions?
Thank you very much for your attention.
Contact: Miguel Angel Veganzones, Grupo Inteligencia Computacional, Universidad del País Vasco - UPV/EHU (Spain). E-mail: miguelangel.veganzones@ehu.es. Web page: http://www.ehu.es/computationalintelligence