Radial Basis-Function Networks

Back-Propagation
Stochastic Back-Propagation Algorithm: Step by Step Example
Radial Basis-Function Networks
Gaussian response function
Location of center u
Determining sigma
Why does RBF network work
Back-propagation

The algorithm gives a prescription for changing the weights w_ij in any feedforward network to learn a training set of input-output pairs {x^d, t^d}. We consider a simple two-layer network.

[Figure: the two-layer network, with input units x_1, ..., x_5.]
Given the pattern x^d, hidden unit j receives a net input

net_j^d = Σ_{k=1}^{5} w_jk x_k^d

and produces the output

V_j^d = f(net_j^d) = f(Σ_{k=1}^{5} w_jk x_k^d)

Output unit i thus receives

net_i^d = Σ_{j=1}^{3} W_ij V_j^d = Σ_{j=1}^{3} W_ij f(Σ_{k=1}^{5} w_jk x_k^d)

and produces the final output

o_i^d = f(net_i^d) = f(Σ_{j=1}^{3} W_ij V_j^d) = f(Σ_{j=1}^{3} W_ij f(Σ_{k=1}^{5} w_jk x_k^d))
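The forward pass above can be sketched in NumPy (an illustrative implementation, not from the slides; the 5-3-2 shape matches the network used here):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, W):
    """Forward pass of the two-layer network.

    x: input vector (5,), w: hidden weights (3, 5), W: output weights (2, 3).
    Returns the hidden outputs V_j and the final outputs o_i.
    """
    V = sigmoid(w @ x)   # V_j = f(sum_k w_jk x_k)
    o = sigmoid(W @ V)   # o_i = f(sum_j W_ij V_j)
    return V, o
```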
In our example E becomes

E[w] = (1/2) Σ_{d=1}^{m} Σ_{i=1}^{2} (t_i^d - o_i^d)^2 = (1/2) Σ_{d=1}^{m} Σ_{i=1}^{2} (t_i^d - f(Σ_{j=1}^{3} W_ij f(Σ_{k=1}^{5} w_jk x_k^d)))^2

E[w] is differentiable given f is differentiable, so gradient descent can be applied.

For the hidden-to-output connections the gradient descent rule gives:

ΔW_ij = -η ∂E/∂W_ij = η Σ_{d=1}^{m} (t_i^d - o_i^d) f'(net_i^d) V_j^d

Defining δ_i^d = f'(net_i^d)(t_i^d - o_i^d), this becomes

ΔW_ij = η Σ_{d=1}^{m} δ_i^d V_j^d
For the input-to-hidden connections w_jk we must differentiate with respect to w_jk. Using the chain rule we obtain

Δw_jk = -η ∂E/∂w_jk = -η Σ_{d=1}^{m} (∂E/∂V_j^d)(∂V_j^d/∂w_jk)

Δw_jk = η Σ_{d=1}^{m} Σ_{i=1}^{2} (t_i^d - o_i^d) f'(net_i^d) W_ij f'(net_j^d) x_k^d

With δ_i^d = f'(net_i^d)(t_i^d - o_i^d),

Δw_jk = η Σ_{d=1}^{m} Σ_{i=1}^{2} δ_i^d W_ij f'(net_j^d) x_k^d

Defining δ_j^d = f'(net_j^d) Σ_{i=1}^{2} W_ij δ_i^d, this becomes

Δw_jk = η Σ_{d=1}^{m} δ_j^d x_k^d
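The two delta rules can be combined into a single stochastic update step (a sketch, assuming the sigmoid activation used in the example that follows; `eta` is the learning rate η):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, w, W, eta=1.0):
    """One stochastic gradient-descent update for a single pattern (x, t).

    w: hidden weights (3, 5), W: output weights (2, 3).
    Returns the updated weight matrices.
    """
    V = sigmoid(w @ x)                           # hidden outputs V_j
    o = sigmoid(W @ V)                           # final outputs o_i
    delta_i = (t - o) * o * (1.0 - o)            # delta_i = f'(net_i)(t_i - o_i)
    delta_j = V * (1.0 - V) * (W.T @ delta_i)    # delta_j = f'(net_j) sum_i W_ij delta_i
    return w + eta * np.outer(delta_j, x), W + eta * np.outer(delta_i, V)
```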
Example

w_1 = {w_11=0.1, w_12=0.1, w_13=0.1, w_14=0.1, w_15=0.1}
w_2 = {w_21=0.1, w_22=0.1, w_23=0.1, w_24=0.1, w_25=0.1}
w_3 = {w_31=0.1, w_32=0.1, w_33=0.1, w_34=0.1, w_35=0.1}
W_1 = {W_11=0.1, W_12=0.1, W_13=0.1}
W_2 = {W_21=0.1, W_22=0.1, W_23=0.1}

X_1 = {1,1,0,0,0}; t_1 = {1,0}
X_2 = {0,0,0,1,1}; t_2 = {0,1}

f(x) = σ(x) = 1/(1 + e^(-x))
f'(x) = σ'(x) = σ(x)(1 - σ(x))

net_1 = Σ_{k=1}^{5} w_1k x_k = 1*0.1 + 1*0.1 + 0*0.1 + 0*0.1 + 0*0.1 = 0.2
V_1 = f(net_1) = 1/(1 + exp(-0.2)) = 0.54983
net_2 = Σ_{k=1}^{5} w_2k x_k = 0.2
V_2 = f(net_2) = 1/(1 + exp(-0.2)) = 0.54983
net_3 = Σ_{k=1}^{5} w_3k x_k = 0.2
V_3 = f(net_3) = 1/(1 + exp(-0.2)) = 0.54983
net_1 = Σ_{j=1}^{3} W_1j V_j = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
o_1 = f(net_1) = 1/(1 + exp(-0.16495)) = 0.54114
net_2 = Σ_{j=1}^{3} W_2j V_j = 0.16495
o_2 = f(net_2) = 1/(1 + exp(-0.16495)) = 0.54114

The batch rule is ΔW_ij = η Σ_{d=1}^{m} (t_i^d - o_i^d) f'(net_i^d) V_j^d

We will use stochastic gradient descent with η = 1, updating after each pattern:
ΔW_ij = (t_i - o_i) f'(net_i) V_j

With f'(x) = σ'(x) = σ(x)(1 - σ(x)):
ΔW_ij = (t_i - o_i) σ(net_i)(1 - σ(net_i)) V_j
δ_i = (t_i - o_i) σ(net_i)(1 - σ(net_i))
ΔW_ij = δ_i V_j
δ_1 = (t_1 - o_1) σ(net_1)(1 - σ(net_1))
δ_1 = (1 - 0.54114) * (1/(1 + exp(-0.16495))) * (1 - 1/(1 + exp(-0.16495))) = 0.11394
ΔW_1j = δ_1 V_j

δ_2 = (t_2 - o_2) σ(net_2)(1 - σ(net_2))
δ_2 = (0 - 0.54114) * (1/(1 + exp(-0.16495))) * (1 - 1/(1 + exp(-0.16495))) = -0.13437
ΔW_2j = δ_2 V_j

For the input-to-hidden weights:
Δw_jk = Σ_{i=1}^{2} δ_i W_ij f'(net_j) x_k = Σ_{i=1}^{2} δ_i W_ij σ(net_j)(1 - σ(net_j)) x_k
δ_j = σ(net_j)(1 - σ(net_j)) Σ_{i=1}^{2} W_ij δ_i
Δw_jk = δ_j x_k
δ_1 = σ(net_1)(1 - σ(net_1)) Σ_{i=1}^{2} W_i1 δ_i
δ_1 = 1/(1+exp(-0.2)) * (1 - 1/(1+exp(-0.2))) * (0.1*0.11394 + 0.1*(-0.13437))
δ_1 = -5.0568e-04
δ_2 = σ(net_2)(1 - σ(net_2)) Σ_{i=1}^{2} W_i2 δ_i = -5.0568e-04
δ_3 = σ(net_3)(1 - σ(net_3)) Σ_{i=1}^{2} W_i3 δ_i = -5.0568e-04

First adaptation for x_1 (one epoch = adaptation over all training patterns, in our case x_1 and x_2):

Δw_jk = δ_j x_k    ΔW_ij = δ_i V_j

Hidden deltas: δ_1 = -5.0568e-04, δ_2 = -5.0568e-04, δ_3 = -5.0568e-04
Output deltas: δ_1 = 0.11394, δ_2 = -0.13437
Inputs: x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0, x_5 = 0
Hidden outputs: V_1 = 0.54983, V_2 = 0.54983, V_3 = 0.54983
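The numbers in the example can be checked with a few lines of plain Python, following the slides step by step:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass for x1 = {1,1,0,0,0} with all weights equal to 0.1
net_hidden = 1*0.1 + 1*0.1                  # = 0.2, same for all three hidden units
V = sigmoid(net_hidden)                     # 0.54983
net_out = 0.1 * 3 * V                       # 0.16495
o = sigmoid(net_out)                        # 0.54114, same for both output units

# Output deltas for t1 = {1, 0}
delta1 = (1 - o) * o * (1 - o)              # 0.11394
delta2 = (0 - o) * o * (1 - o)              # -0.13437

# Hidden deltas (identical for all three hidden units)
# about -5.057e-04 at full precision; the slides' -5.0568e-04 uses rounded deltas
delta_h = V * (1 - V) * (0.1*delta1 + 0.1*delta2)
```

Note that carrying full precision through the hidden deltas gives about -5.0575e-04; the slides' -5.0568e-04 results from plugging in the rounded values 0.11394 and -0.13437.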
5/4/

Radial Basis-Function Networks

RBF networks train rapidly:
No local minima problems
No oscillation

Universal approximators:
Can approximate any continuous function
Share this property with feedforward networks having a hidden layer of nonlinear neurons (units)

Disadvantage: after training they are generally slower to use
Gaussian response function

Each hidden layer unit computes

h_i = e^(-D_i^2 / (2σ^2))

where x is an input vector, u_i is the weight (center) vector of hidden layer neuron i, and

D_i^2 = (x - u_i)^T (x - u_i)

The output neuron produces the linear weighted sum

o = Σ_{i=0}^{n} w_i h_i

The weights have to be adapted (LMS):

Δw_i = η(t - o) h_i
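A minimal sketch of these two formulas (assumed layout: `centers` is an (n, dim) array of the u_i, with one σ shared by all hidden units; not from the slides):

```python
import numpy as np

def rbf_forward(x, centers, sigma, w):
    """Hidden responses h_i = exp(-D_i^2 / (2 sigma^2)) and output o = sum_i w_i h_i."""
    D2 = ((centers - x) ** 2).sum(axis=1)     # D_i^2 = (x - u_i)^T (x - u_i)
    h = np.exp(-D2 / (2.0 * sigma ** 2))
    return h, float(w @ h)

def lms_update(x, t, centers, sigma, w, eta=0.1):
    """LMS (delta) rule for the output weights: w_i <- w_i + eta (t - o) h_i."""
    h, o = rbf_forward(x, centers, sigma, w)
    return w + eta * (t - o) * h
```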
The operation of the hidden layer

One dimensional input:

h = e^(-(x-u)^2 / (2σ^2))

[Figure: Gaussian response curves for one-dimensional and two-dimensional inputs.]
Every hidden neuron has a receptive field defined by the basis function:
At x = u, the output is maximum
The output drops as x deviates from u
The output has a significant response to the input x only over a range of values of x called the receptive field
The size of the receptive field is defined by σ
u may be called the mean and σ the standard deviation
The function is radially symmetric around the mean u

Location of centers u

The location of the receptive field is critical
Apply clustering to the training set: each determined cluster center would correspond to a center u of a receptive field of a hidden neuron
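The clustering step can be sketched with plain k-means (an illustrative choice; the slides only say "apply clustering" and do not prescribe a particular algorithm):

```python
import numpy as np

def kmeans_centers(X, n_centers, n_iter=20, seed=0):
    """Run k-means on the training inputs X (m, dim); the resulting cluster
    centers serve as the receptive-field centers u_i of the hidden neurons."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    for _ in range(n_iter):
        # assign every training point to its nearest center
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # move each center to the mean of the points assigned to it
        for j in range(n_centers):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers
```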
Determining σ

The objective is to cover the input space with receptive fields as uniformly as possible
If the spacing between centers is not uniform, it may be necessary for each hidden layer neuron to have its own σ
For hidden layer neurons whose centers are widely separated from others, σ must be large enough to cover the gap
The following heuristic will perform well in practice:
For each hidden layer neuron, find the RMS distance between u_i and the centers of its N nearest neighbors c_k:

RMS_i = sqrt( (1/N) Σ_{k=1}^{N} Σ_{l=1}^{n} (u_l - c_lk)^2 )

Assign this value to σ_i
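The heuristic can be written down directly (a sketch, assuming σ_i is set to the RMS Euclidean distance from u_i to its N nearest neighbouring centers; `centers` holds all hidden-unit centers):

```python
import numpy as np

def assign_sigmas(centers, N=2):
    """sigma_i = RMS Euclidean distance from center u_i to its N nearest
    neighbouring centers (one reading of the slide's heuristic)."""
    centers = np.asarray(centers, dtype=float)
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    sigmas = []
    for i in range(len(centers)):
        nearest = np.sort(d2[i])[1:N + 1]   # skip d2[i, i] == 0 (the center itself)
        sigmas.append(np.sqrt(nearest.mean()))
    return np.array(sigmas)
```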
Why does a RBF network work?

The hidden layer applies a nonlinear transformation from the input space to the hidden space
In the hidden space a linear discrimination can be performed
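The classic XOR problem illustrates this (a small demo, not from the slides; centers u_1 = (0,0), u_2 = (1,1) and σ = 1 are assumed). XOR is not linearly separable in the input space, but after the Gaussian hidden-layer transform the two classes can be split by a single threshold on h_1 + h_2:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])                       # XOR targets
centers = np.array([[0, 0], [1, 1]], dtype=float)

# Gaussian hidden layer with sigma = 1
D2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
H = np.exp(-D2 / 2.0)

# In the hidden space the classes no longer interleave:
# class 1 maps to (0.607, 0.607); class 0 maps to (1, 0.368) and (0.368, 1)
s = H.sum(axis=1)
pred = (s < 1.3).astype(int)                     # a single linear threshold suffices
```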
Back-Propagation
Stochastic Back-Propagation Algorithm: Step by Step Example
Radial Basis-Function Networks
Gaussian response function
Location of center u
Determining sigma
Why does RBF network work

Bibliography

Wasserman, P. D., Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993
Simon Haykin, Neural Networks, Second edition, Prentice Hall, 1999
Support Vector Machines