Radial-Basis Function Networks

Michel Verleysen

Outline:
- Origin: Cover's theorem
- Interpolation problem
- Regularization theory
- Generalized RBFN
- Universal approximation
- Comparison with MLP
- RBFN = kernel regression
- Learning: centers, widths, multiplying factors
- Other forms

Origin: Cover's theorem

- Cover's theorem on the separability of patterns (1965).
- Points x_1, x_2, ..., x_P are assigned to two classes C_1 and C_2.
- φ-separability: there exists a weight vector w such that
  w^T \varphi(x) > 0 if x \in C_1
  w^T \varphi(x) < 0 if x \in C_2
  where the φ(x) are non-linear functions.
- If the dimension of the hidden space exceeds the dimension of the input space, the probability of separability gets closer to 1.
- Example: a quadratic mapping separates patterns that a linear one cannot (a small numerical illustration follows below).

Interpolation problem

- Given P points (x_i, t_i), with x_i ∈ R^d and t_i ∈ R, i = 1, ..., P:
- find f: R^d → R that satisfies f(x_i) = t_i, i = 1, ..., P.
- RBF technique (Powell, 1988):
  f(x) = \sum_{i=1}^{P} w_i \varphi(\|x - x_i\|)
- the φ are arbitrary non-linear functions (RBF);
- there are as many functions as data points;
- the centers are fixed at the known points x_i.
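As a small numerical illustration of Cover's idea (a sketch that is not part of the original slides; the points and the quadratic feature map are chosen purely for illustration): four XOR-like points are not linearly separable in R², but become linearly separable after mapping to a higher-dimensional space.

```python
import numpy as np

# XOR-like points: classes C1 = {(0,0), (1,1)}, C2 = {(0,1), (1,0)}
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
labels = np.array([1, 1, -1, -1])  # +1 for C1, -1 for C2

# Quadratic feature map phi(x) = (x1, x2, x1*x2, 1): higher-dimensional hidden space
Phi = np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1], np.ones(len(X))])

# A separating weight vector, found here by least squares on the targets
w, *_ = np.linalg.lstsq(Phi, labels.astype(float), rcond=None)
print(np.sign(Phi @ w))  # [ 1.  1. -1. -1.]: phi-separable, as the theorem suggests
```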

Interpolation problem (matrix form)

- The conditions f(x_k) = \sum_{l=1}^{P} w_l \varphi(\|x_k - x_l\|) = t_k can be written as

  \begin{pmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1P} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2P} \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{P1} & \varphi_{P2} & \cdots & \varphi_{PP} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_P \end{pmatrix} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_P \end{pmatrix}

  where \varphi_{kl} = \varphi(\|x_k - x_l\|).
- In matrix form: \Phi w = t, hence w = \Phi^{-1} t.
- Vital question: is Φ non-singular?

Micchelli's theorem

- If the points x_i are distinct, Φ is non-singular (regardless of the dimension of the input space).
- Valid for a large class of RBF functions, with r = \|x - x_i\|:
  multiquadrics: \varphi(r) = (r^2 + c^2)^{1/2}, c > 0 (non-localized function)
  inverse multiquadrics: \varphi(r) = (r^2 + c^2)^{-1/2}, c > 0 (localized function)
  Gaussian: \varphi(r) = \exp(-r^2 / (2\sigma^2)), σ > 0 (localized function)
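A minimal sketch of exact RBF interpolation as just described (the Gaussian φ and the width σ = 0.5 are arbitrary choices): build the P×P matrix Φ and solve Φw = t; by Micchelli's theorem Φ is non-singular as long as the x_i are distinct.

```python
import numpy as np

def gaussian_rbf(r, sigma=0.5):
    """Localized RBF phi(r) = exp(-r^2 / (2 sigma^2)); sigma is arbitrary here."""
    return np.exp(-r**2 / (2.0 * sigma**2))

# P distinct one-dimensional learning points (x_i, t_i)
x = np.linspace(0.0, 4.0, 9)
t = np.sin(x)

# Phi_kl = phi(||x_k - x_l||): P x P interpolation matrix
Phi = gaussian_rbf(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(Phi, t)           # w = Phi^{-1} t

# f(x) = sum_i w_i phi(||x - x_i||) reproduces the targets exactly
f = lambda xq: gaussian_rbf(np.abs(np.atleast_1d(xq)[:, None] - x[None, :])) @ w
assert np.allclose(f(x), t)
```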

Learning: an ill-posed problem

[Figure: training points in the (x, t) plane.]

- Necessity for regularization.
- Error criterion:
  E(f) = \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2 + \lambda \, C(f)
  (MSE term + regularization term).

Solution to the regularization problem

- Poggio & Girosi (1990): if C(f) is defined through a (problem-dependent) linear differential operator, the solution to
  \min_f \; \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2 + \lambda \, C(f)
  has the form
  f(x) = \sum_{i=1}^{P} w_i G(x, x_i)
  where G(·,·) is a Green's function and, with G_{kl} = G(x_k, x_l),
  w = (G + \lambda I)^{-1} t
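Continuing the interpolation sketch above, with the same Gaussian now playing the role of the Green's function (an assumption; the operator C determines the true G), regularization only changes the linear system:

```python
import numpy as np

def green(xa, xb, sigma=0.5):
    """Gaussian Green's function G(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    return np.exp(-(xa - xb)**2 / (2.0 * sigma**2))

x = np.linspace(0.0, 4.0, 9)
t = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(x.size)  # noisy targets

G = green(x[:, None], x[None, :])
lam = 1e-2                                        # regularization parameter lambda
w = np.linalg.solve(G + lam * np.eye(x.size), t)  # w = (G + lambda I)^{-1} t
# lam = 0 recovers the exact interpolator; lam > 0 trades fit for smoothness
```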

Interpolation vs. regularization

- Interpolation:
  f(x) = \sum_{i=1}^{P} w_i \varphi(\|x - x_i\|), with \Phi w = t
  - exact interpolator
  - possible RBF: \varphi(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))
- Regularization:
  f(x) = \sum_{i=1}^{P} w_i G(x, x_i), with w = (G + \lambda I)^{-1} t
  - exact interpolator
  - equal to the "interpolation" solution iff λ = 0
  - example of Green's function: G(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))
- In both cases: one RBF / Green's function for each learning pattern!

Generalized RBFN (GRBFN)

- An RBFN with as many radial functions as learning patterns is:
  - computationally too intensive (the inversion of a P×P matrix grows with P³);
  - ill-conditioned;
  - not easy to regularize (problem-specific).
- Hence the generalized RBFN approach. Typically K << P:
  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)
- Parameters: the centers c_k, the widths σ_k and the weights w_k.
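A minimal sketch of evaluating this generalized model (the function name and the values K = 3, d = 2 are illustrative assumptions):

```python
import numpy as np

def grbfn_predict(Xq, centers, sigmas, weights):
    """Generalized RBFN: f(x) = sum_k w_k exp(-||x - c_k||^2 / (2 sigma_k^2))."""
    # squared Euclidean distances between queries (N, d) and centers (K, d)
    d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigmas**2)) @ weights

# K = 3 radial units in d = 2 dimensions (all values illustrative)
centers = np.array([[0., 0.], [1., 0.], [0., 1.]])
sigmas = np.array([0.5, 0.5, 0.7])
weights = np.array([1.0, -0.5, 0.3])
print(grbfn_predict(np.array([[0.2, 0.1]]), centers, sigmas, weights))
```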

Radial-Basis Function Networks (RBFN)

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

[Figure: network diagram with inputs x_1, ..., x_d and a bias input x_0; 1st layer: K Gaussian units with centers c_j and widths σ_k; 2nd layer: linear combination with weights w_k.]

- Possibilities:
  - several outputs (common hidden layer);
  - a bias term (recommended; see the extensions below).

RBFN: universal approximation

- Park & Sandberg, 1991: for any continuous input-output mapping function f(x) there exists
  \hat{f}(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|)
  such that L_p(f(x), \hat{f}(x)) < \varepsilon, for any ε > 0 and p ∈ [1, ∞].
- The theorem is actually stronger (radial symmetry is not needed).
- K is not specified.
- Provides a theoretical basis for practical RBFN!

RBFN and kernel regression

- Non-linear regression model:
  t_i = f(x_i) + \varepsilon_i = y_i + \varepsilon_i, \quad i = 1, ..., P
- Estimation of f(x): average of t around x. More precisely:
  f(x) = E[y \mid x] = \int y \, f_{Y|X}(y \mid x) \, dy = \frac{\int y \, f_{X,Y}(x, y) \, dy}{f_X(x)}
- Estimates of f_{X,Y}(x, y) and f_X(x) are needed: the Parzen-Rosenblatt density estimator.

Parzen-Rosenblatt density estimator

- The estimator
  \hat{f}_X(x) = \frac{1}{P h^d} \sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right)
  with K(·) continuous, bounded, symmetric about the origin, with its maximum value at 0 and with unit integral, is consistent (asymptotically unbiased).
- Estimation of f_{X,Y}(x, y):
  \hat{f}_{X,Y}(x, y) = \frac{1}{P h^{d+1}} \sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right) K\left(\frac{y - y_i}{h}\right)
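A sketch of the estimator for d = 1, with a Gaussian kernel as one common choice of K(·) satisfying the stated conditions (the bandwidth h = 0.3 is an arbitrary choice):

```python
import numpy as np

def parzen_density(xq, samples, h=0.3):
    """Parzen-Rosenblatt estimate f_X(x) = 1/(P h^d) sum_i K((x - x_i)/h), d = 1."""
    K = lambda u: np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)  # Gaussian: unit integral
    u = (np.atleast_1d(xq)[:, None] - samples[None, :]) / h
    return K(u).sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(0)
samples = rng.standard_normal(500)               # P = 500 draws from N(0, 1)
print(parzen_density(np.array([0.0]), samples))  # close to the true density ~0.399
```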

RBFN and kernel regression (cont.)

- Substituting the two estimates:
  \hat{f}(x) = \frac{\int y \, \hat{f}_{X,Y}(x, y) \, dy}{\hat{f}_X(x)} = \frac{\sum_{i=1}^{P} y_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right)}
- This is a weighted average of the y_i,
- called the Nadaraya-Watson estimator (1964),
- and is equivalent to a normalized RBFN in the unregularized context.

RBFN vs. MLP

- RBFN:
  - single hidden layer
  - non-linear hidden layer, linear output layer
  - argument of the hidden units: Euclidean norm
  - universal approximation property
  - local approximators
  - split learning
- MLP:
  - single or multiple hidden layers
  - non-linear hidden layer, linear or non-linear output layer
  - argument of the hidden units: scalar product
  - universal approximation property
  - global approximators
  - global learning
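The corresponding Nadaraya-Watson sketch, under the same Gaussian-kernel assumption; note that the kernel's normalization constant cancels in the ratio:

```python
import numpy as np

def nadaraya_watson(xq, x, y, h=0.2):
    """f(x) = sum_i y_i K((x - x_i)/h) / sum_i K((x - x_i)/h)."""
    u = (np.atleast_1d(xq)[:, None] - x[None, :]) / h
    K = np.exp(-u**2 / 2.0)              # unnormalized Gaussian: constant cancels
    return (K * y[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 4.0, 200))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)
print(nadaraya_watson(np.array([1.0, 2.0]), x, y))  # approx. sin(1), sin(2)
```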

RBFN: learning strategies

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

- Parameters to be determined: c_k, σ_k, w_k.
- Traditional learning strategy: split the computation into
  1. centers c_k;
  2. widths σ_k;
  3. weights w_k.
  (A sketch combining the three steps follows the weights slide below.)

RBFN: computation of the centers

- Idea: the centers c_k must reflect the (density) properties of the learning points x_i, which is a vector quantization problem:
  - centers selected at random (in the learning set);
  - competitive learning;
  - frequency-sensitive learning;
  - Kohonen maps.
- This phase only uses the x_i information, not the t_i.

RBFN: computation of the widths

- The universal approximation property remains valid with identical widths.
- In practice (limited learning set): variable widths σ_k.
- Idea: RBFN uses local clusters, so choose σ_k according to the standard deviation of the clusters.

RBFN: computation of the weights

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

- Once the c_k and σ_k are fixed, the \varphi(\|x - c_k\|) are constants: the problem becomes linear!
- The solution of the least square criterion
  E = \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2
  leads to
  w = \Phi^{+} t = (\Phi^T \Phi)^{-1} \Phi^T t, where \Phi_{ik} = \varphi(\|x_i - c_k\|).
- In practice: use SVD!
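A sketch tying the three steps together, under assumptions: a plain Lloyd/k-means loop stands in for any of the vector-quantization options above, each width is the standard deviation of its cluster, and the weights come from numpy's lstsq (which uses an SVD internally, matching the slide's advice).

```python
import numpy as np

def fit_grbfn(X, t, K=5, n_iter=50, seed=0):
    """Split RBFN learning: 1) centers by vector quantization (k-means here),
    2) widths from the cluster spread, 3) weights by linear least squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):                                   # step 1: centers
        assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    # step 2: width of each unit = standard deviation of its cluster
    sigmas = np.array([X[assign == k].std() if np.any(assign == k) else 1.0
                       for k in range(K)]) + 1e-6
    # step 3: with centers and widths fixed, the problem is linear in w
    Phi = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
                 / (2.0 * sigmas**2))
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)               # w = Phi^+ t
    return centers, sigmas, w

rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
t = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)
centers, sigmas, w = fit_grbfn(X, t)
```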

RBFN: gradient descent

- The 3-step method sets the parameters of
  f(x) = \sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)
  with c_k and σ_k computed in an unsupervised way and w_k in a supervised way.
- Once c_k, σ_k, w_k have been set by the previous method, gradient descent on all parameters is possible.
- It brings some improvement, but consider:
  - learning speed;
  - local minima;
  - the risk of non-local basis functions;
  - etc.

More elaborate models

- Add constant and linear terms:
  f(x) = \sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right) + w'_0 + \sum_{j=1}^{d} w'_j x_j
  A good idea: it is very difficult to approximate a constant with kernels.
- Use a normalized RBFN:
  f(x) = \frac{\sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)}{\sum_{j=1}^{K} \exp\left(-\frac{\|x - c_j\|^2}{2\sigma_j^2}\right)}
  The basis functions are bounded in [0, 1] and can be interpreted as probability values (classification).
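A sketch of both extensions (function names and all numerical values are illustrative):

```python
import numpy as np

def rbf_activations(Xq, centers, sigmas):
    """phi_k(x) = exp(-||x - c_k||^2 / (2 sigma_k^2)) for queries Xq of shape (N, d)."""
    d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigmas**2))

def predict_with_linear_term(Xq, centers, sigmas, w, w_lin, w0):
    """f(x) = sum_k w_k phi_k(x) + w'_0 + sum_j w'_j x_j."""
    return rbf_activations(Xq, centers, sigmas) @ w + Xq @ w_lin + w0

def predict_normalized(Xq, centers, sigmas, w):
    """Normalized RBFN: each activation is divided by the sum over all units,
    so the basis functions are bounded in [0, 1] and sum to one."""
    Phi = rbf_activations(Xq, centers, sigmas)
    return (Phi / Phi.sum(axis=1, keepdims=True)) @ w

centers = np.array([[0.0], [1.0]])   # K = 2 units in d = 1 (illustrative)
sigmas = np.array([0.5, 0.5])
print(predict_normalized(np.array([[0.25]]), centers, sigmas, np.array([1.0, 2.0])))
```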

Back to the widths

- Rule of thumb: choose σ_k according to the standard deviation of the clusters.
- In the literature:
  - \sigma = d_{max} / \sqrt{2K}, where d_max is the maximum distance between centroids [1];
  - \sigma_k = \frac{1}{p} \sum_{j=1}^{p} \|c_k - c_j\|, where the index j scans the p nearest centroids to c_k [2];
  - \sigma_k = r \min_j \|c_k - c_j\|, where r is an overlap constant [3];
  - ...

[1] S. Haykin, "Neural Networks - a Comprehensive Foundation", Prentice-Hall Inc., second edition, 1999.
[2] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units", Neural Computation, pp. 281-294, 1989.
[3] A. Saha and J. D. Keeler, "Algorithms for Better Representation and Faster Learning in Radial Basis Function Networks", in Advances in Neural Information Processing Systems, edited by David S. Touretzky, pp. 482-489, 1989.

Basic example

- Approximation of f(x) = 1 with a d-dimensional RBFN.
- In theory: identical weights w_k.
- Experimentally: side effects, so only the middle of the domain is taken into account.

[Figure: error versus width.]
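The three rules as a sketch (p = 2 and r = 1.0 are illustrative defaults; the references [1]-[3] above define the heuristics):

```python
import numpy as np

def widths_from_rules(centers, p=2, r=1.0):
    """Width heuristics [1]-[3]: d_max/sqrt(2K); mean distance to the p nearest
    centroids; r times the distance to the nearest other centroid."""
    K = len(centers)
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    sigma_haykin = D.max() / np.sqrt(2.0 * K)   # [1]: one sigma for all units
    D_sorted = np.sort(D, axis=1)[:, 1:]        # drop the zero self-distance
    sigma_moody = D_sorted[:, :p].mean(axis=1)  # [2]: p nearest centroids
    sigma_saha = r * D_sorted[:, 0]             # [3]: nearest centroid, overlap r
    return sigma_haykin, sigma_moody, sigma_saha

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(widths_from_rules(centers))
```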

Basic example: errors vs. space dimension

[Figure: error versus the input space dimension.]

Basic example: local decomposition?

[Figure.]

Multiple local minima in the error curve

- Choose the first minimum of the error curve to preserve the locality of the clusters.
- The first local minimum is usually less sensitive to variability.

Some concluding comments

- RBFN: easy learning (compared to MLP), which is important in a cross-validation scheme!
- There are many RBFN models,
- and even more RBFN learning schemes.
- The results are not very sensitive to the unsupervised part of the learning (c_k, σ_k).
- An a priori (problem-dependent) choice of the widths σ_k remains open work.

Sources and references

- Most of the basic concepts developed in these slides come from the excellent book:
  "Neural Networks - a Comprehensive Foundation", S. Haykin, Macmillan College Publishing Company, 1994.
- Some supplementary comments come from the tutorial on RBF:
  "An Overview of Radial Basis Function Networks", J. Ghosh & A. Nag, in: Radial Basis Function Networks, R.J. Howlett & L.C. Jain, eds., Physica-Verlag, 2001.
- The results on the basic example were generated by my colleague N. Benoudjit and are submitted for publication.