Functional Radial Basis Function Networks (FRBFN)


Bruges (Belgium), 28-30 April 2004, d-side publi., ISBN 2-930307-04-8, pp. 313-318

N. Delannay 1, F. Rossi 2, B. Conan-Guez 2, M. Verleysen 1

1 Université catholique de Louvain (UCL), DICE - Machine Learning Group, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium
{delannay, verleysen}@dice.ucl.ac.be

2 INRIA - Projet AxIS - Université Paris-Dauphine
Fabrice.Rossi@dauphine.fr, Brieuc.Conan-Guez@inria.fr

Abstract. There has recently been a lot of interest in functional data analysis [1] and in extensions of well-known methods to functional inputs (clustering algorithms [2], non-parametric models [3], MLP [4]). The main motivation of these methods is to benefit from the enforced inner structure of the data. This paper presents how functional data can be used with RBFN, and how the inner structure of the former can help in designing the network.

1 Introduction

In a variety of problems, the data at our disposal are samplings of a functional entity. Working with functional data instead of vectorial data can be an efficient way to account for this within-data structure and can help in solving the modelling problem. In addition, radial basis function networks (RBFN) have been shown to be powerful tools for constructing non-linear regression models. In this paper, we show how the RBFN model can be extended to accept functional data on the input side. Furthermore, we explain how the functional features can direct the design of the network.

In section 2, functional data are presented and some examples are given. Section 3 recalls what an RBFN is and how it can be designed. The extension to functional inputs is presented in section 4. Section 5 applies the ideas expressed in this paper to spectrometric data. Section 6 gives concluding comments.

N.D. is a Research Fellow of the Belgian National Fund for Scientific Research. M.V. is a Senior Research Associate of the Belgian National Fund for Scientific Research.

2 Functional data

Consider a set of N observations {x_i}_{i=1}^N, where the i-th observation x_i is itself composed of p_i pairs (t_i^{(j)}, x_i^{(j)}), j = 1, ..., p_i. Each t_i^{(j)} \in T may be considered as the argument of the corresponding value x_i^{(j)} \in R, with T an open set in R^D. For simplicity, we will assume T \subset R. Each observation is a functional entity:

x_i : t \in T \mapsto x_i(t) \in R.   (1)

The idea behind functional data is to estimate the underlying function of each observation and to work with this functional expression instead of the p_i-dimensional vector. The usual way to handle functional data is to use a basis expansion such that

\tilde{x}_i(t) = \sum_{k=1}^{K} \xi_i^{(k)} h_k(t) = \xi_i^T h(t),   (2)

where h(t) = [h_1(t), ..., h_K(t)]^T is a set of basis functions defined over T, \xi_i is the vector of coefficients characterizing observation i, and \tilde{x}_i(t) is a good approximation of the true function x_i(t). Well-known basis functions are Fourier series, B-splines and wavelets.

Figure 1 shows three examples for which functional data should be used. The original observations are sets of points, but it is obvious that there is a functional entity behind each observation. Using a smooth functional representation is of particular interest when the data are sampled from a smooth function, when the sampling differs between two observations, or when a smooth function is polluted by noise, as illustrated in figures 1(a), 1(b) and 1(c) respectively.

Figure 1: (a) the observation is obviously a smooth function, (b) two observations x_i and x_l having different sampling arguments t_i^{(j)} and t_l^{(j)}, (c) a sampled curve with additional noise.
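
To make the basis-expansion step concrete, the following Python sketch fits the coefficient vector \xi_i of eq. (2) to one irregularly sampled, noisy observation, using a cubic B-spline basis and ordinary least squares. The basis size, the uniform knot placement, the helper name bspline_design_matrix and the toy signal are illustrative assumptions, not choices taken from the paper.

import numpy as np
from scipy.interpolate import BSpline

def bspline_design_matrix(t, n_basis=30, degree=3, domain=(0.0, 1.0)):
    # Evaluate the K = n_basis B-spline basis functions h_k of eq. (2) at the points t.
    n_knots = n_basis + degree + 1
    inner = np.linspace(domain[0], domain[1], n_knots - 2 * degree)
    knots = np.concatenate([[domain[0]] * degree, inner, [domain[1]] * degree])
    H = np.empty((len(t), n_basis))
    for k in range(n_basis):
        coef = np.zeros(n_basis)
        coef[k] = 1.0
        H[:, k] = BSpline(knots, coef, degree)(t)
    return H

# One noisy observation sampled at irregular arguments t_i^(j) (cf. Figures 1(b)-(c)).
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 1.0, size=80))
x_obs = np.sin(2 * np.pi * t_obs) + 0.05 * rng.standard_normal(80)

# Least-squares fit of the coefficient vector xi of eq. (2).
H_obs = bspline_design_matrix(t_obs)
xi, *_ = np.linalg.lstsq(H_obs, x_obs, rcond=None)

Whatever its own sampling grid, every observation is thus summarized by a coefficient vector of the same length K, which is what makes the functional representation convenient for the RBFN described below.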

3 RBFN model

The general problem of modelling is to estimate the relation between an input variable x \in R^p and an output variable y \in R from a set of observed data (x_i, y_i)_{i=1}^N. With an RBFN, the relation takes the following form:

\tilde{y} = \sum_{m=1}^{M} \alpha_m \Phi_m(d_m(x, c_m)),   (3)

where the \Phi_m(\cdot) are radial basis functions, the c_m are centers belonging to the space of x, the d_m(\cdot,\cdot) are associated distances and the \alpha_m are weighting coefficients. According to the universal approximation property of RBFN [5], an adequate choice of these parameters leads to a good approximation \tilde{y} of y.

As such, an RBFN may be applied to functional data instead of vectorial data: one just has to define a functional distance measure d_m(\cdot,\cdot), leading to a so-called Functional RBFN. There is too much freedom in such a network to tackle its design directly. A way to proceed is to start by choosing a family of RBFN and testing different definitions of the distance d_m(\cdot,\cdot). Next, the other parameters are computed with a fast algorithm, even if it is suboptimal with respect to a global optimization procedure [6].

It should be mentioned that one of the main advantages of RBFN is their simplified training, compared to MLP (Multi-Layer Perceptrons) for example. Indeed, the centers c_m and scaling factors \sigma_m may be computed by unsupervised learning or by a deterministic algorithm. Model learning then becomes a linear problem in the coefficients \alpha_m, leading to a low computational cost and avoiding local minima issues. Despite this simplification, an RBFN performs very efficiently if well designed.

A classical choice for the distance is

d_m(x, c_m) = \frac{1}{\sigma_m} \left( (x - c_m)^T W (x - c_m) \right)^{1/2},   (4)

where \sigma_m is a scaling factor adjusted by the learning algorithm and W is a positive semi-definite matrix. The usual Euclidean distance is obtained with \sigma_m = 1 and W = I, the identity matrix. It is interesting to note that working with W = I on linearly preprocessed vectors x' = A x, c_m' = A c_m is the same as working on the original vectors x and c_m with W = A^T A. This remark shows that it can be equivalent to think in terms of distance definition or in terms of preprocessing of the original data.
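
As a small numerical illustration of this remark, the sketch below (Python with numpy, as in the previous sketch) evaluates an RBFN output of the form (3) with Gaussian basis functions and the weighted distance (4), and checks that using W = A^T A on raw vectors gives the same distance as the Euclidean metric on preprocessed vectors x' = A x. The Gaussian choice of \Phi_m and all variable names are illustrative assumptions.

import numpy as np

def weighted_distance(x, c, W, sigma=1.0):
    # d_m of eq. (4): scaled W-weighted distance between x and a center c.
    diff = x - c
    return np.sqrt(diff @ W @ diff) / sigma

def rbfn_output(x, centers, sigmas, alphas, W):
    # \tilde{y} = sum_m alpha_m * Phi_m(d_m(x, c_m)), with Gaussian Phi_m assumed.
    d = np.array([weighted_distance(x, c, W, s) for c, s in zip(centers, sigmas)])
    return alphas @ np.exp(-d ** 2)

rng = np.random.default_rng(1)
p = 5
A = rng.standard_normal((p, p))
W = A.T @ A                                   # metric induced by preprocessing x' = A x
x, c = rng.standard_normal(p), rng.standard_normal(p)

# Distance with metric W equals Euclidean distance between preprocessed vectors.
assert np.isclose(weighted_distance(x, c, W), np.linalg.norm(A @ x - A @ c))

# A toy prediction with three random centers and unit scaling factors.
centers = rng.standard_normal((3, p))
y_tilde = rbfn_output(x, centers, np.ones(3), rng.standard_normal(3), W)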

4 RBFN with functional data

Let us now consider that the input variable x is a function on the domain T. As mentioned earlier, it is straightforward to extend the RBFN model (3). The functional extension of (4) is

d_m(x(t), c_m(t)) = \frac{1}{\sigma_m} \left( \int_T w(t) \, (x(t) - c_m(t))^2 \, dt \right)^{1/2},   (5)

where w(t) is the functional version of W when this matrix is diagonal. The extension of a non-diagonal weighting matrix W to the functional case would be a convolution integral.

In theory, the scaling function w(t) could be almost any function. Nevertheless, in order to keep the functional character of the data in (5), it seems natural to constrain w(t) to exhibit about the same complexity as x(t) (the complexity of a function is often characterized by its second derivative). Hence, working with functional data can help reduce the excessive freedom of the parameters of the RBFN model.

One could also work with a functional semi-metric [3]. For example,

d_m(x(t), c_m(t)) = \frac{1}{\sigma_m} \left( \int_T \left( D(x(t) - c_m(t)) \right)^2 \, dt \right)^{1/2},   (6)

where D(\cdot) is a differential operator.

There is no obvious way to design this extended network efficiently. Once again, the choice of the distance is crucial. As it is equivalent to think in terms of distance measure or data preprocessing, the functional approach may help to find an adequate preprocessing accounting for the inner structure of the data.

A usual way to handle functions is to work with basis expansions. With x(t) = \xi^T h(t), c_m(t) = \gamma_m^T h(t) and w(t) = 1, (5) becomes

d_m(x(t), c_m(t)) = \frac{1}{\sigma_m} \left( (\xi - \gamma_m)^T H (\xi - \gamma_m) \right)^{1/2},   (7)

where H = \int_T h(t) h(t)^T \, dt. Hence, a functional RBFN can be handled like any vectorial RBFN on the coefficients \xi and \gamma_m, provided the appropriate matrix H is used.

5 Application on spectrometric data

In this section, we apply the ideas developed in this paper to spectrometric data (Tecator data set: http://lib.stat.cmu.edu/datasets/tecator). Each observation is a 100-channel spectrum (wavelength \lambda \in [850, 1050] nm) of a meat sample; note that we use the symbol \lambda for the argument instead of the generic t. The output to be predicted is the fat content of the sample. Spectra are smooth functions. The original vectorial data are transformed to be expressed on a B-spline basis expansion with 30 splines.

Figure 2 illustrates the four choices of preprocessing that will be used below; they were selected in an exploratory way. Fig. 2(a) shows five original spectra. It is assumed that the mean of each observation does not account much for the output, so the mean of each observation is subtracted in Fig. 2(b). To enhance the correlation between inputs and outputs, the spectra preprocessed in (b) are scaled by a function a(\lambda) (Fig. 2(c)); this function a(\lambda) comes from the mean of the first two components of a partial least squares (PLS) analysis. The last preprocessing presented here is simply the first derivative of the original spectra (Fig. 2(d)).

Figure 2: the four preprocessings applied to five spectra: (a) original spectra, x'(\lambda) = x(\lambda); (b) the same spectra translated to have zero mean, x'(\lambda) = x(\lambda) - mean(x(\lambda)); (c) the translated spectra scaled by the function a(\lambda), x'(\lambda) = (x(\lambda) - mean(x(\lambda))) a(\lambda) + C; (d) the first derivative of the original spectra, x'(\lambda) = D(x(\lambda)).

The RBFN are constructed with the OLS algorithm [7] and with the distance (7) applied to the preprocessed functions.
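
To make eq. (7) concrete, the following sketch continues the previous ones (numpy and the bspline_design_matrix helper are reused): it approximates the matrix H of eq. (7) by numerical quadrature on a fine grid and evaluates the corresponding distance between two coefficient vectors. The grid size, the trapezoidal rule and the variable names are illustrative assumptions.

# Gram matrix H = \int_T h(t) h(t)^T dt, approximated on a fine grid of T = (0, 1).
t_grid = np.linspace(0.0, 1.0, 2001)
H_basis = bspline_design_matrix(t_grid)            # shape (len(t_grid), K)
w_quad = np.full(len(t_grid), t_grid[1] - t_grid[0])
w_quad[[0, -1]] /= 2.0                             # trapezoidal quadrature weights
H_mat = H_basis.T @ (w_quad[:, None] * H_basis)    # K x K matrix H of eq. (7)

def functional_distance(xi, gamma_m, H_mat, sigma_m=1.0):
    # d_m of eq. (7), computed on basis coefficients rather than on raw samples.
    diff = xi - gamma_m
    return np.sqrt(diff @ H_mat @ diff) / sigma_m

For an orthonormal basis H would be the identity and (7) would reduce to the ordinary Euclidean distance between coefficient vectors; for B-splines, whose basis functions overlap only with their neighbours, H is banded but not diagonal.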

The additional scaling factors \sigma_m are determined heuristically according to the mean of the distances to the neighbouring centers.

The original data set holds 215 spectra. It is partitioned into two parts: a training set and a test set with 172 and 43 spectra respectively. A cross-validation scheme is applied on the training set (each validation set contains 43 of the 172 spectra) to select the number of centers of each network. Finally, the root mean squared error (RMSE) over the test set is computed to give an idea of the overall performance of the network. This procedure is illustrated in Figure 3 for preprocessing (d).

Figure 3: (a) cross-validation RMSE versus the number of centers M, used to select the number of centers (RMSE validation = 1.7 at M = 24); (b) error over the training and test sets versus the number of centers, with the selected number of centers (RMSE test = 0.61 at M = 24).
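
The sketch below (still Python, reusing numpy and H_mat from the previous sketches) illustrates the kind of fast, mostly linear training described in sections 3 and 5: for a fixed set of M centers, each \sigma_m is set from the mean distance to its nearest neighbouring centers, and the output weights are obtained by linear least squares. Note that the centers are picked at random here for brevity, whereas the paper uses the OLS forward-selection algorithm of [7]; the Gaussian form of \Phi_m, the two-neighbour rule and all names are illustrative assumptions.

def fit_frbfn(Xi, y, M, H_mat, n_neighbors=2, seed=0):
    # Xi: (N, K) basis coefficients of the preprocessed spectra; y: (N,) fat content.
    rng = np.random.default_rng(seed)
    centers = Xi[rng.choice(len(Xi), size=M, replace=False)]
    # Pairwise H-weighted distances between centers.
    diff_cc = centers[:, None, :] - centers[None, :, :]
    D_cc = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff_cc, H_mat, diff_cc))
    # sigma_m: mean distance to the few nearest other centers.
    sigmas = np.sort(D_cc, axis=1)[:, 1:1 + n_neighbors].mean(axis=1)
    # Design matrix of Gaussian activations Phi_m(d_m(x_i, c_m)).
    diff_xc = Xi[:, None, :] - centers[None, :, :]
    D_xc = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff_xc, H_mat, diff_xc))
    Phi = np.exp(-(D_xc / sigmas) ** 2)
    # Learning the output weights alpha_m is a linear least-squares problem.
    alphas, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, sigmas, alphas

def predict_frbfn(Xi_new, centers, sigmas, alphas, H_mat):
    diff = Xi_new[:, None, :] - centers[None, :, :]
    D = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff, H_mat, diff))
    return np.exp(-(D / sigmas) ** 2) @ alphas

The number of centers M would then be chosen by the cross-validation loop of Figure 3(a), keeping the value with the lowest validation RMSE before computing the test error.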

Table 1 summarizes the results obtained with the four preprocessings. We see that:

- Working directly with the original spectra gives very poor results.
- Applying the scaling function a(\lambda) slightly improves the performance.
- The best performance is obtained with the first derivative of the spectra.

The corresponding test error is not the smallest found in the literature; for example, a test error of 0.36 is reported in [8]. But the major advantages of our procedure are its low computational cost and its ability to take the functional character of the observations into account in a natural way.

Preprocessing | nbr. of centers | mean validation RMSE | test RMSE
(a)           | 6               | 1.2                  | 11.7
(b)           | 23              | 2.6                  | 1.64
(c)           | 18              | 1.95                 | 1.25
(d)           | 24              | 1.7                  | 0.61

Table 1: Results of the constructed F-RBFN.

6 Conclusion

This paper shows that there are three levels at which functional data can help in designing an RBFN. Firstly, it is a convenient way to express data having a functional structure. Secondly, by thinking in terms of data preprocessing rather than distance measures, the functional approach helps to design an RBFN that takes the functional nature of the data into account. Finally, it may be a way to constrain the parameters of the general RBFN model. Further work should try to incorporate these advantages into a more automatic learning algorithm and thus reduce the exploratory part of the current scheme.

References

[1] J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer-Verlag, New York, 1997.
[2] C. Abraham et al. Unsupervised curve clustering using B-splines. Technical report, ENSAM-INRIA-UM II-Montpellier, June 2002.
[3] F. Ferraty and P. Vieu. The functional nonparametric model and application to spectrometric data. Computational Statistics, 17(4):545-564, 2002.
[4] F. Rossi, B. Conan-Guez, and F. Fleuret. Theoretical properties of functional multi layer perceptrons. In Proceedings of ESANN 2002, pages 7-12, Bruges, Belgium, April 2002.
[5] J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246-257, 1991.
[6] F. Schwenker, H. A. Kestler, and G. Palm. Three learning phases for radial-basis-function networks. Neural Networks, 14(4-5):439-458, May 2001.
[7] S. Chen, C. F. N. Cowan, and P. M. Grant. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2(2):302-309, 1991.
[8] H. H. Thodberg. A review of Bayesian neural networks with an application to near infrared spectroscopy. IEEE Transactions on Neural Networks, 7(1):56-72, 1996.