Radial-Basis Function Networks


Radial-Basis Function Networks
Michel Verleysen

Outline
- Origin: Cover's theorem
- Interpolation problem
- Regularization theory
- Generalized RBFN
- Universal approximation
- RBFN and kernel regression
- Learning
  - Centers: vector quantization
  - Widths
  - Multiplying factors
- Other forms
- Vector quantization

Origin: Cover's theorem
- Cover's theorem on the separability of patterns (1965)
- Patterns x^1, x^2, ..., x^P are assigned to two classes C_1 and C_2
- ϕ-separability: there exists a weight vector w such that
  w^T ϕ(x) > 0 if x ∈ C_1
  w^T ϕ(x) < 0 if x ∈ C_2
- Cover's theorem: with non-linear functions ϕ(x), and a hidden-space dimension larger than the input-space dimension, the probability of ϕ-separability gets closer to 1
- Example: linear vs. quadratic separation [figure]

Interpolation problem
- Given P points (x^p, t^p), with x^p ∈ R^D, t^p ∈ R, p = 1, ..., P:
- Find F: R^D → R that satisfies F(x^p) = t^p, p = 1, ..., P
- RBF technique (Powell, 1988):
  F(x) = Σ_{p=1}^{P} w_p ϕ(||x - x^p||)
  - as many functions as data points
  - centers fixed at the known points x^p
  - the ϕ(||x - x^p||) are arbitrary non-linear functions (RBF)

Interpolation problem (matrix form)
- In matrix form: Φw = t, where Φ_kl = ϕ(||x^k - x^l||), w = (w_1, ..., w_P)^T and t = (t^1, ..., t^P)^T; hence w = Φ^{-1} t.
- Vital question: is Φ non-singular?

Micchelli's theorem
- If the points x^k are distinct, Φ is non-singular (regardless of the dimension of the input space).
- Valid for a large class of RBF functions, for example:
  - ϕ(x, c) = (||x - c||² + l²)^{1/2} (l > 0): non-localized function
  - ϕ(x, c) = (||x - c||² + l²)^{-1/2} (l > 0): localized function
  - ϕ(x, c) = exp(-||x - c||²/(2σ²)) (σ > 0): localized function
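To make the interpolation scheme concrete, here is a minimal numerical sketch (not from the slides) that builds Φ for Gaussian basis functions on synthetic 1-D data and solves Φw = t; the data, the width σ and all names are illustrative assumptions.

```python
import numpy as np

def gaussian_rbf(r, sigma):
    """Localized RBF: phi(r) = exp(-r^2 / (2 sigma^2))."""
    return np.exp(-r**2 / (2.0 * sigma**2))

# Synthetic data: P points (x^p, t^p), here in 1-D
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, size=10))          # P = 10 data points = centers
t = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(10)
sigma = 0.1                                      # illustrative width

# Interpolation matrix Phi_kl = phi(||x^k - x^l||); Micchelli: non-singular for distinct points
Phi = gaussian_rbf(np.abs(X[:, None] - X[None, :]), sigma)
w = np.linalg.solve(Phi, t)                      # w = Phi^{-1} t

# F(x) = sum_p w_p phi(||x - x^p||) passes exactly through the data
F = lambda x: gaussian_rbf(np.abs(np.asarray(x)[:, None] - X[None, :]), sigma) @ w
print(np.allclose(F(X), t))                      # True: exact interpolation
```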

Learning: an ill-posed problem
- [figure: noisy training samples t^p versus x]
- Necessity for regularization
- Error criterion:
  E(F) = (1/P) Σ_{p=1}^{P} (t^p - F(x^p))² + λ C(w)
  (MSE term + regularization term)

Solution to the regularization problem
- Poggio & Girosi (1990): if C(w) is a (problem-dependent) linear differential operator, the solution to
  E(F) = (1/P) Σ_{p=1}^{P} (t^p - F(x^p))² + λ C(w)
  is of the following form:
  F(x) = Σ_{p=1}^{P} w_p G(x, x^p)
  where G(·) is a Green's function and w = (G + λI)^{-1} t, with G_kl = G(x^k, x^l).

Interpolation vs. regularization
- Interpolation: F(x) = Σ_{p=1}^{P} w_p ϕ(||x - x^p||), with w = Φ^{-1} t
  - exact interpolator
  - possible RBF: ϕ(x, x^p) = exp(-||x - x^p||²/(2σ²))
- Regularization: F(x) = Σ_{p=1}^{P} w_p G(x, x^p), with w = (G + λI)^{-1} t
  - equal to the interpolation solution (hence an exact interpolator) iff λ = 0
  - example of Green's function: G(x, x^p) = exp(-||x - x^p||²/(2σ²))
- One RBF / Green's function for each learning pattern!
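A similar sketch for the regularized variant, again with an illustrative Gaussian Green's function, synthetic data and an arbitrary λ; it only illustrates the formula w = (G + λI)^{-1} t.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """Green's function / RBF: G(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, size=10))
t = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(10)
sigma, lam = 0.1, 1e-2                               # illustrative width and regularization

G = gaussian_kernel(X, X, sigma)
w = np.linalg.solve(G + lam * np.eye(len(X)), t)     # w = (G + lambda I)^{-1} t

F = lambda x: gaussian_kernel(np.asarray(x), X, sigma) @ w
# With lam > 0 the fit is smoothed; with lam = 0 it reduces to exact interpolation.
```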

Generalized RBFN (GRBFN)
- With as many radial functions as learning patterns:
  - computationally (too) intensive (inverting a P×P matrix grows as P³)
  - ill-conditioned matrix
  - regularization not easy (problem-specific)
- → Generalized RBFN approach, typically with K << P:
  F(x) = Σ_{i=1}^{K} w_i ϕ(||x - c_i||), with ϕ(||x - c_i||) = exp(-||x - c_i||²/(2σ_i²))
  Parameters: c_i, σ_i, w_i

Radial-Basis Function Networks (RBFN)
- F(x) = Σ_{i=1}^{K} w_i ϕ(||x - c_i||), with ϕ(||x - c_i||) = exp(-||x - c_i||²/(2σ_i²))
- [figure: two-layer architecture: inputs x_1, ..., x_D plus an optional bias input x_0, a first (hidden) layer of radial units ϕ(||x - c_i||) with widths σ_i, and a second (output) layer combining them with weights w_i into F(x)]
- Possibilities:
  - several outputs (common hidden layer)
  - bias (recommended) (see extensions)

RBFN: universal approximation
- Park & Sandberg (1991): for any continuous input-output mapping function f(x), there is an RBFN F(x) = Σ_{i=1}^{K} w_i ϕ(||x - c_i||) such that L_p(f(x), F(x)) < ε (ε > 0, p ∈ [1, ∞])
- The theorem is even stronger (radial symmetry is not needed)
- K is not specified
- Provides a theoretical basis for practical RBFN!
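As a sketch of the generalized model with K Gaussian units and an optional bias, the following forward pass is one possible implementation; the function name, array shapes and bias argument are assumptions, not part of the slides.

```python
import numpy as np

def rbfn_forward(X, centers, sigmas, weights, bias=0.0):
    """Generalized RBFN: F(x) = sum_i w_i exp(-||x - c_i||^2 / (2 sigma_i^2)) + bias."""
    # X: (N, D) inputs, centers: (K, D), sigmas: (K,), weights: (K,)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
    Phi = np.exp(-d2 / (2.0 * sigmas[None, :] ** 2))
    return Phi @ weights + bias
```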

RBFN and kernel regression
- Non-linear regression model: t^p = f(x^p) + ε^p = y^p + ε^p, p = 1, ..., P
- Estimation of f(x): average of t around x. More precisely,
  f(x) = E[y | x] = ∫ y f_{Y|X}(y | x) dy = ∫ y f_{X,Y}(x, y) dy / f_X(x)
- Need for estimates of f_{X,Y}(x, y) and f_X(x) → Parzen-Rosenblatt density estimator

Parzen-Rosenblatt density estimator
- f̂_X(x) = (1/(P h^D)) Σ_{p=1}^{P} K((x - x^p)/h),
  with K(·) continuous, bounded, symmetric about the origin, with maximum value at 0 and unit integral, is consistent (asymptotically unbiased).
- Estimation of f_{X,Y}(x, y):
  f̂_{X,Y}(x, y) = (1/(P h^{D+1})) Σ_{p=1}^{P} K((x - x^p)/h) K((y - y^p)/h)

RBFN and kernel regression (continued)
- f̂(x) = ∫ y f̂_{X,Y}(x, y) dy / f̂_X(x) = Σ_{p=1}^{P} y^p K((x - x^p)/h) / Σ_{p=1}^{P} K((x - x^p)/h)
- Weighted average of the y^p
- Called the Nadaraya-Watson estimator (1964)
- Equivalent to a normalized RBFN in the unregularized context

RBFN vs. MLP
- RBFN:
  - single hidden layer
  - non-linear hidden layer, linear output layer
  - argument of the hidden units: Euclidean norm
  - universal approximation property
  - local approximators
  - split learning
- MLP:
  - single or multiple hidden layers
  - non-linear hidden layer, linear or non-linear output layer
  - argument of the hidden units: scalar product
  - universal approximation property
  - global approximators
  - global learning
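A compact sketch of the Nadaraya-Watson estimator for 1-D data with a Gaussian kernel; the bandwidth h and the function name are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def nadaraya_watson(x_query, X, y, h):
    """Nadaraya-Watson estimate: weighted average of y with Gaussian kernel weights."""
    # x_query: (M,) query points, X, y: (P,) training data, h: bandwidth
    K = np.exp(-0.5 * ((x_query[:, None] - X[None, :]) / h) ** 2)   # (M, P) kernel weights
    return (K @ y) / K.sum(axis=1)
```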

RBFN: learning strategies
- F(x) = Σ_{i=1}^{K} w_i ϕ(||x - c_i||), with ϕ(||x - c_i||) = exp(-||x - c_i||²/(2σ_i²))
- Parameters to be determined: c_i, σ_i, w_i
- Traditional learning strategy: split computation
  1. centers c_i
  2. widths σ_i
  3. weights w_i

RBFN: computation of centers
- Idea: the centers c_i must have the (density) properties of the learning points x^k → vector quantization (seen in detail hereafter)
- This phase only uses the x^k information, not the t^k

RBFN: computation of widths
- Universal approximation property: valid with identical widths
- In practice (limited learning set): variable widths σ_i
- Idea: RBFN uses local clusters → choose σ_i according to the standard deviation of the clusters

RBFN: computation of weights
- F(x) = Σ_{i=1}^{K} w_i ϕ(||x - c_i||), with ϕ(||x - c_i||) = exp(-||x - c_i||²/(2σ_i²)); once the c_i and σ_i are fixed, the radial functions are constants.
- The problem becomes linear! Minimizing E(F) = (1/P) Σ_{p=1}^{P} (t^p - F(x^p))² is a least-squares problem whose solution is
  w = Φ⁺ t = (Φ^T Φ)^{-1} Φ^T t, where Φ_ki = ϕ(||x^k - c_i||)
- In practice: use the SVD!

RBFN: gradient descent
- 3-step method for F(x) = Σ_{i=1}^{K} w_i exp(-||x - c_i||²/(2σ_i²)): steps 1 (centers c_i) and 2 (widths σ_i) are unsupervised, step 3 (weights w_i) is supervised.
- Once c_i, σ_i, w_i have been set by the previous method, gradient descent on all parameters is possible.
- Some improvement, but:
  - learning speed
  - local minima
  - risk of non-local basis functions
  - etc.
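A possible implementation of this linear step, using NumPy's SVD-based least-squares solver; the centers and widths are assumed to be already fixed, and all names are illustrative.

```python
import numpy as np

def fit_rbfn_weights(X, t, centers, sigmas):
    """Least-squares weights for fixed centers and widths (SVD-based solver)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)    # (P, K)
    Phi = np.exp(-d2 / (2.0 * sigmas[None, :] ** 2))
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)      # pseudo-inverse solution w = Phi^+ t
    return w
```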

More elaborate models
- Add constant and linear terms:
  F(x) = Σ_{i=1}^{K} w_i exp(-||x - c_i||²/(2σ_i²)) + Σ_{i=1}^{D} w'_i x_i + w'_0
  → good idea (it is very difficult to approximate a constant with kernels)
- Use a normalized RBFN:
  F(x) = Σ_{i=1}^{K} w_i exp(-||x - c_i||²/(2σ_i²)) / Σ_{j=1}^{K} exp(-||x - c_j||²/(2σ_j²))
  → the basis functions are bounded in [0, 1] and can be interpreted as probability values (classification)

Back to the widths
- Choose σ_i according to the standard deviation of the clusters.
- In the literature:
  - σ = d_max / √(2K), where d_max is the maximum distance between centroids [1]
  - σ_i = (1/q Σ_{j=1}^{q} ||c_i - c_j||²)^{1/2}, where the index j scans the q nearest centroids to c_i [2]
  - σ_i = r · min_j ||c_i - c_j||, where r is an overlap constant [3]
  - ...

[1] S. Haykin, "Neural Networks: a Comprehensive Foundation", Prentice-Hall, second edition, 1999.
[2] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units", Neural Computation, vol. 1, pp. 281-294, 1989.
[3] A. Saha and J. D. Keeler, "Algorithms for Better Representation and Faster Learning in Radial Basis Function Networks", Advances in Neural Information Processing Systems, edited by David S. Touretzky, pp. 482-489, 1989.
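The three width heuristics above could be coded along the following lines; this is a sketch under the reconstructed formulas [1]-[3], and the function signature (rule names, q, r) is invented for illustration.

```python
import numpy as np

def widths_from_centroids(centers, rule="haykin", q=2, r=1.0, K=None):
    """Width heuristics [1]-[3] applied to a set of centroids (signature illustrative)."""
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(D, np.inf)
    K = K or len(centers)
    if rule == "haykin":                      # [1] single width d_max / sqrt(2K)
        d_max = D[np.isfinite(D)].max()
        return np.full(len(centers), d_max / np.sqrt(2 * K))
    if rule == "moody-darken":                # [2] RMS distance to the q nearest centroids
        nearest = np.sort(D, axis=1)[:, :q]
        return np.sqrt((nearest ** 2).mean(axis=1))
    if rule == "saha-keeler":                 # [3] overlap constant times nearest-centroid distance
        return r * D.min(axis=1)
    raise ValueError(rule)
```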

Basic example
- Approximation of a constant function f(x) with a d-dimensional RBFN
- In theory: identical w_i
- Experimentally: side effects → only the middle of the domain is taken into account
- [figure: error versus width]

Basic example: errors vs. space dimension
- [figure: error versus width for several input-space dimensions]

Basic example: local decomposition?
- [figure: decomposition of the RBFN approximation into its individual kernels]

Multiple local minima in the error curve
- Choose the first minimum to preserve the locality of the clusters.
- The first local minimum is usually less sensitive to variability.

Some concluding comments
- RBFN: easy learning (compared to MLP); important in a cross-validation scheme!
- Many RBFN models
- Even more RBFN learning schemes
- Results are not very sensitive to the unsupervised part of the learning (c_i, σ_i)
- Open problem: a priori (problem-dependent) choice of the widths σ_i

Back to the centers: vector quantization
- Aim and principle: what is a vector quantizer?
- Vector vs. scalar quantization
- Lloyd's principle and algorithm
- Initialization
- Neural algorithms
  - Competitive learning
  - Frequency Sensitive Learning
  - Winner-take-all vs. winner-take-most
    - Soft Competition Scheme
    - Stochastic Relaxation Scheme
    - Neural gas

Aim of vector quantization
- To reduce the size of a database: P vectors of N features are replaced by Q vectors of N features, with Q < P.
- [figure: a table of P data vectors reduced to a codebook of Q vectors]

Principle of vector quantization
- To project a continuous input space onto a discrete output space, while minimizing the loss of information.
- [figure: P points in the input space represented by Q centroids]

Principle of vector quantization (continued)
- Define zones in the space; the set of points contained in each zone is projected onto a representative vector (centroid).
- [figure: example in 2-dimensional spaces]

What is a vector quantizer?
- A vector quantizer consists of:
  1. a codebook (set of centroids, or codewords) {y^j, 1 ≤ j ≤ Q}
  2. a quantization function q: q(x^i) = y^j
- Usually q is defined by the nearest-neighbour rule (according to some distance measure).

Distance measures
- Least-square error: d_2(x^i, y^j) = ||x^i - y^j||² = (x^i - y^j)^T (x^i - y^j) = Σ_{k=1}^{D} (x^i_k - y^j_k)²
- r-norm error: d_r(x^i, y^j) = Σ_{k=1}^{D} |x^i_k - y^j_k|^r
- Mahalanobis distance (Γ is the covariance matrix of the inputs): d_W(x^i, y^j) = (x^i - y^j)^T Γ^{-1} (x^i - y^j)
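For instance, the squared Mahalanobis distance could be evaluated as follows; this is a small illustrative helper, not from the slides.

```python
import numpy as np

def mahalanobis2(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T Gamma^{-1} (x - y)."""
    d = x - y
    return float(d @ np.linalg.solve(cov, d))
```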

Vector vs. scalar quantization
- Shannon: a vector quantizer always gives better results than the product of scalar quantizers, even if the probability densities are independent.
- Example #1: uniform 2-D distribution; the expected distortion E[d(x, y)] of the vector quantizer is lower than that of the product of scalar quantizers. [figure: the two partitions and their distortions]
- Example #2: uniform 2-D distribution, with the same ratio of centroids per unit surface; the vector quantizer achieves E_VQ ≈ 0.96 E_SQ. [figure]

Lloyd's principle
- [diagram: x^i → encoder → index j → decoder → y^j]
- Three properties:
  1. the first one gives the best encoder, once the decoder is known;
  2. the second one gives the best decoder, once the encoder is known;
  3. there is no point on the borders between Voronoï regions (probability = 0).
- Optimal quantizer: these properties are necessary, but not sufficient.

Lloyd: property #1
- For a given decoder β, the best encoder is given by
  α(x^i) = argmin_j d(x^i, y^j), where y^j = β(j)
  → nearest-neighbour rule!

Lloyd: property #2
- For a given encoder α, the best decoder is given by
  β(j) = argmin_{y^j} E[d(x^i, y^j) | α(x^i) = j]
  → centre-of-gravity rule!

Lloyd: property #3
- The probability of finding a point x^i on a border (between Voronoï regions) is zero. [figure]

Lloyd's algorithm
1. Choose an initial codebook.
2. Encode all points x^i; evaluate E_VQ.
3. If E_VQ is small enough, stop.
4. Replace each centroid y^j by the centre of gravity of the data x^i associated with y^j in step 2.
5. Go back to step 2.
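A minimal sketch of the batch Lloyd (k-means) loop described above, initialized with Q randomly chosen data points; the names and the stopping tolerance are illustrative.

```python
import numpy as np

def lloyd(X, Q, n_iter=50, tol=1e-6, seed=0):
    """Batch Lloyd / k-means sketch: alternate nearest-neighbour encoding and
    centre-of-gravity decoding until the distortion E_VQ stops decreasing."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), Q, replace=False)].copy()    # init: Q random data points
    prev = np.inf
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # (P, Q)
        assign = d2.argmin(axis=1)                                # encoder: nearest neighbour
        E_vq = d2[np.arange(len(X)), assign].mean()               # distortion
        if prev - E_vq < tol:
            break
        prev = E_vq
        for j in range(Q):                                        # decoder: centre of gravity
            pts = X[assign == j]
            if len(pts):
                Y[j] = pts.mean(axis=0)
    return Y, assign, E_vq
```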

Lloyd: example. Initialization of the codeboook y y y 3 y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 5 Lloyd: example. Encoding (nearest-neighbor) y y y 3 y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 5 6

Lloyd: example 4. Decoding (center-of-gravity) y y 3 y y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 53 Lloyd: example. Encoding (nearest-neighbor) new borders y y y 3 y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 54 7

Lloyd: example 4. Decoding (center-of-gravity) y y 3 new positions of centroids y y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 55 Lloyd: example. Encoding (nearest-neighbor) new borders y y 3 y y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 56 8

Lloyd: example 4. Decoding (center-of-gravity) new positions of centroids y y 3 y y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 57 Lloyd: example. Encoding (nearest-neighbor) final borders (convergence) y y 3 y y 4 x i : data y j : centroids Michel Verleysen Radial-Basis Function Networks - 58 9

Lloyd's algorithm: the names
- Lloyd's algorithm
- Generalized Lloyd's algorithm
- Linde-Buzo-Gray (LBG) algorithm
- k-means
- ISODATA
- All based on the same principle!

Lloyd's algorithm: properties
- The codebook is modified only after the presentation of the whole dataset.
- The mean square error (E_VQ) decreases at each iteration.
- The risk of getting trapped in local minima is high.
- The final quantizer depends on the initial one.

How to initialize Lloyd's algorithm?
1. Randomly in the input space.
2. The Q first data points x^i.
3. Q randomly chosen data points x^i.
4. Product codes: the product of scalar quantizers.
5. Growing initial set:
   - a first centroid y_1 is randomly chosen (in the data set);
   - a second centroid y_2 is randomly chosen (in the data set); if d(y_1, y_2) > threshold, y_2 is kept;
   - a third centroid y_3 is randomly chosen (in the data set); if d(y_1, y_3) > threshold AND d(y_2, y_3) > threshold, y_3 is kept;
   - ...

How to initialize Lloyd's algorithm? (continued)
6. Pairwise nearest neighbour:
   - a first codebook is built with all data points x^i;
   - the two centroids y^j nearest to one another are merged (centre of gravity);
   - ...
   - Variant: the increase of distortion (E_VQ) is evaluated for the merge of each pair of centroids y^j; the pair giving the lowest increase is merged.
7. Splitting:
   - a first centroid y_1 is randomly chosen (in the data set);
   - a second centroid y_1 + ε is created; Lloyd's algorithm is applied to the new codebook;
   - two new centroids are created by perturbing the two existing ones; Lloyd's algorithm is applied to this 4-centroid codebook;
   - ...

Vector quantization: "neural" algorithms
- Principle: the codebook is (partly or fully) modified at each presentation of one data vector x^i.
- Advantages:
  - simplicity
  - adaptive algorithm (with varying data)
  - possible parallelisation
  - speed?
  - avoids local minima?

Competitive learning
- Algorithm: for each input vector x^i, the "winner" y^k is selected:
  d(x^i, y^k) ≤ d(x^i, y^j), ∀ j, 1 ≤ j, k ≤ Q
- Adaptation rule: the winner is moved towards the input vector:
  y^k(t+1) = y^k(t) + α (x^i - y^k(t))
  [figure: the winner y^k moving towards x^i]

Competitive learning: properties
- The adaptation rule is a stochastic gradient descent on
  E = ∫ ||x - y^{j(x)}||² p(x) dx, where y^{j(x)} is the centroid nearest to x
  → convergence to a (local) minimum
- Robbins-Monro conditions on α: Σ_{t=0}^{∞} α(t) = ∞ and Σ_{t=0}^{∞} α²(t) < ∞
- Local minima!
- Some centroids may be "lost" (never the winner)! [figure]
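An online competitive-learning sketch with a constant learning rate α; the slides call for a decreasing α satisfying the Robbins-Monro conditions, so the constant rate here is a simplifying assumption.

```python
import numpy as np

def competitive_learning(X, Q, alpha=0.05, n_epochs=10, seed=0):
    """Online competitive learning: move only the winning centroid towards each sample."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), Q, replace=False)].copy()
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            k = np.argmin(((X[i] - Y) ** 2).sum(axis=1))   # winner: nearest centroid
            Y[k] += alpha * (X[i] - Y[k])                   # move winner towards x^i
    return Y
```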

Frequency sensitive learning
- In competitive learning, some centroids may be lost during learning → centroids often chosen (as winners) are penalized!
- The choice of the winner is replaced by
  u_k d(x^i, y^k) ≤ u_j d(x^i, y^j), ∀ j, 1 ≤ j, k ≤ Q
  where u_j, u_k are incremented each time the corresponding centroid is chosen as winner (starting at 1).
- [figure: with u_1 = 1 and u_2 = 3 for example, u_1 d(x^i, y^1) ≤ u_2 d(x^i, y^2), so the previously "lost" centroid y_1 becomes a possible winner]
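The frequency-sensitive variant only changes the winner selection; a sketch with illustrative names and, again, a constant α.

```python
import numpy as np

def frequency_sensitive_learning(X, Q, alpha=0.05, n_epochs=10, seed=0):
    """Like competitive learning, but the winner minimizes u_j * d(x, y_j);
    u_j counts how often centroid j has already won, penalizing frequent winners."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), Q, replace=False)].copy()
    u = np.ones(Q)                                          # win counters, starting at 1
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            d2 = ((X[i] - Y) ** 2).sum(axis=1)
            k = np.argmin(u * d2)                           # frequency-sensitive winner
            Y[k] += alpha * (X[i] - Y[k])
            u[k] += 1
    return Y
```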

Winner-take-all vs. winner-take-most
- Most VQ algorithms: if two centroids are close, it will be hard to separate them!
- (Competitive learning and LVQ: one or two winners are adapted.)
- Solution: adapt the winner and other centroids too:
  - with respect to the distance between the centroid and x^i (Soft Competition Scheme)
  - stochastically with respect to the distance (Stochastic Relaxation Scheme)
  - with respect to the order of proximity to x^i (neural gas)
  - those in a neighbourhood (on a grid) of the winner (Kohonen maps)
- These algorithms accelerate the VQ too!

Soft Competition Scheme (SCS)
- Adaptation rule on all centroids:
  y^k(t+1) = y^k(t) + α G(k, x^i) (x^i - y^k(t))
  with
  G(k, x^i) = exp(-||x^i - y^k(t)||²/T) / Σ_{j=1}^{Q} exp(-||x^i - y^j(t)||²/T)
  (T is made decreasing over time)

Stochastic Relaxation Scheme (SRS)
- Adaptation rule on all centroids:
  y^k(t+1) = y^k(t) + α G(k, x^i) (x^i - y^k(t))
  with
  G(k, x^i) = 1 with probability P_k, 0 with probability 1 - P_k, where
  P_k = exp(-||x^i - y^k(t)||²/T) / Σ_{j=1}^{Q} exp(-||x^i - y^j(t)||²/T)
  (T is made decreasing over time)
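A sketch of a single SCS update step; the temperature T is assumed to be decreased over time by the caller, and the squared-distance softmax follows the reconstruction above.

```python
import numpy as np

def soft_competition_step(x, Y, alpha, T):
    """One SCS update: every centroid moves towards x, weighted by a softmax
    of negative squared distances with temperature T."""
    d2 = ((x - Y) ** 2).sum(axis=1)
    G = np.exp(-d2 / T)
    G /= G.sum()
    return Y + alpha * G[:, None] * (x - Y)
```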

Neural gas
- Principle: the centroids are ranked according to their distance to x^i:
  h(j, x^i) = c - 1 if y^j is the c-th nearest centroid to x^i (so the winner has rank 0)
- Adaptation rule:
  y^k(t+1) = y^k(t) + α exp(-h(k, x^i)/λ) (x^i - y^k(t))

Neural gas: properties
- If λ → 0: competitive learning
- Partial sorting is possible
- Improvement: possibility of frequency sensitive learning
- The adaptation rule is a stochastic gradient descent on
  E = (1/(2 C(λ))) Σ_{j=1}^{Q} ∫ exp(-h(j, x)/λ) ||x - y^j||² p(x) dx, with C(λ) = Σ_{l=1}^{Q} exp(-h(l, x)/λ)
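One neural-gas update step could look as follows; ranks start at 0 for the nearest centroid, and the names are illustrative.

```python
import numpy as np

def neural_gas_step(x, Y, alpha, lam):
    """One neural-gas update: each centroid moves towards x, weighted by
    exp(-rank / lambda), where rank 0 is the nearest centroid."""
    d2 = ((x - Y) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d2))          # rank 0 for the closest centroid
    h = np.exp(-ranks / lam)
    return Y + alpha * h[:, None] * (x - Y)
```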

Sources and references (RBFN) p Most of the basic concepts developed in these slides come from the excellent book: p Neural networks a comprehensive foundation, S. Haykin, Macmillan College Publishing Company, 994. p Some supplementary comments come from the tutorial on RBF: p An overview of Radial Basis Function Networks, J. Ghosh & A. Nag, in: Radial Basis Function Networks, R.J. Howlett & L.C. Jain eds., Physica-Verlag, 00. p The results on the RBFN basic exemple were generated by my colleague N. Benoudjit, and are submitted for publication. Michel Verleysen Radial-Basis Function Networks - 77 Sources and references (VQ) p A classical tutorial on vector quantization p Vector quantization, R.M. Gray, IEEE ASSP Mag., vol., pp. 4-9, April 984. p Most concepts in this chapter come from the following (specialized) papers: p Vector quantization is speech coding, K. Makhoul, S. Roucos, H. Gish, Proceedings IEEE, vol. 73, n., November 985. p Neural-gas network for vector quantization and its application to timeseries prediction, T.M. Martinetz, S.G. Berkovich, K.J. Schulten, IEEE T. Neural Networks, vol.4, n.4, July 993. p Habituation in Learning Vector Quantization, T.? Gestzi, I. Csabai, Complex Systems, n.6, 99. p DVQ: Dynamic Vector Quantization an incremental LVQ, F. Poirier, A. Ferrieux, in: Artificial Neural Networks, T. Kohonen et al. eds., Elsevier, 99. p Representation of nonlinear data structures thourgh a fast VQP neural network, P. Demartines, J. Hérault, Proc. Neuro-Nîmes 993. Michel Verleysen Radial-Basis Function Networks - 78 39