Lecture Neural Networks (Neuronale Netze) - Radial Basis Function (RBF) Networks - SS 2004. Holger Fröhlich (adapted from the lecture by S. Kaushik¹), Lehrstuhl Rechnerarchitektur (Chair of Computer Architecture), Prof. Dr. A. Zell. ¹www.cse.iitd.ernet.in/~saroj
A Radial Basis Function (RBF) is a (typically non-increasing) function of the distance between an input vector and a center vector; RBFs therefore act as local receptors.
The hidden layer uses neurons with RBF activation functions describing local receptors. The output node combines the outputs of the hidden neurons linearly.
(Figure: the output for the red vector is interpolated from three green vectors with weights $w_1, w_2, w_3$; each vector contributes according to its weight and to its distance from the red point.)
RBF Architecture
One hidden layer with RBF activation functions; output layer with linear activation function:
$f(x) = w_1\,\varphi(\|x - t_1\|) + \dots + w_m\,\varphi(\|x - t_m\|)$
where $\|x - t\|$ denotes the distance of $x = (x_1, \dots, x_d)$ from the center vector $t$.
(Figure: inputs $x_1, \dots, x_d$ feed $m$ hidden RBF units $\varphi$, whose outputs are combined with weights $w_1, \dots, w_m$ into $f(x_1, \dots, x_d)$.)
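As an illustration, here is a minimal sketch of this forward pass with Gaussian basis functions; the names rbf_forward, centers, weights and sigma are chosen for illustration and are not taken from the lecture:

```python
import numpy as np

def rbf_forward(x, centers, weights, sigma):
    """Output of an RBF network: f(x) = sum_j w_j * phi(||x - t_j||)."""
    # Distances of the input x to all m centers t_j
    dists = np.linalg.norm(centers - x, axis=1)
    # Gaussian basis functions phi(r) = exp(-r^2 / (2 sigma^2))
    phi = np.exp(-dists**2 / (2.0 * sigma**2))
    # Linear output layer
    return weights @ phi

# Example: two centers in a 2-dimensional input space
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([0.5, -0.5])
print(rbf_forward(np.array([0.2, 0.1]), centers, weights, sigma=1.0))
```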
Hidden Neuron Model
Hidden units use radial basis functions $\varphi_\sigma(\|x - t\|)$: the output depends on the distance of the input $x$ from the center $t$.
$t$ is called the center, $\sigma$ is called the spread; center and spread are parameters.
(Figure: inputs $x_1, \dots, x_d$ feed a hidden unit with activation $\varphi_\sigma(\|x - t\|)$.)
Hidden Neurons
A hidden neuron is more sensitive to data points near its center. For a Gaussian RBF this sensitivity can be tuned by adjusting the spread $\sigma$: a larger spread implies less sensitivity.
Biological example: cochlear stereocilia cells (in our ears ...) have locally tuned frequency responses.
Gaussian RBF
$\varphi_\sigma(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)$ with $r = \|x - t\|$, centered at $t$.
$\sigma$ is a measure of how spread out the curve is.
(Figure: a wide Gaussian for large $\sigma$, a narrow Gaussian for small $\sigma$.)
Types of $\varphi$ (with $r = \|x - t\|$):
Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, $c > 0$
Inverse multiquadrics: $\varphi(r) = (r^2 + c^2)^{-1/2}$, $c > 0$
Gaussian functions (most used): $\varphi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)$, $\sigma > 0$
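A small sketch of the three basis function types as Python functions; the function names are illustrative:

```python
import numpy as np

def multiquadric(r, c=1.0):
    # phi(r) = (r^2 + c^2)^(1/2), c > 0
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # phi(r) = (r^2 + c^2)^(-1/2), c > 0
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2)), sigma > 0
    return np.exp(-r**2 / (2.0 * sigma**2))

r = np.linspace(0.0, 3.0, 7)   # r = ||x - t||
print(gaussian(r), multiquadric(r), inverse_multiquadric(r), sep="\n")
```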
Example: the XOR problem
Input space: the four points $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$ in the $(x_1, x_2)$-plane. Output space: $y \in \{0, 1\}$.
Construct an RBF pattern classifier such that
$(0,0)$ and $(1,1)$ are mapped to 0 (class $C_1$),
$(1,0)$ and $(0,1)$ are mapped to 1 (class $C_2$).
Example: the XOR problem
In the feature (hidden layer) space:
$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$ with $t_1 = (1,1)$,
$\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$ with $t_2 = (0,0)$.
(Figure: in the $(\varphi_1, \varphi_2)$-plane, $(0,0)$ and $(1,1)$ lie on one side and $(0,1)$ and $(1,0)$ on the other side of a straight decision boundary.)
When mapped into the feature space $\langle \varphi_1, \varphi_2 \rangle$ (hidden layer), $C_1$ and $C_2$ become linearly separable. So a linear classifier with $\varphi_1(x)$ and $\varphi_2(x)$ as inputs can be used to solve the XOR problem.
RBF NN for the XOR problem
$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$, $\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$ with $t_1 = (1,1)$ and $t_2 = (0,0)$.
$f(x_1, x_2) = -e^{-\|x - t_1\|^2} - e^{-\|x - t_2\|^2} + 1$
If $f(x) > 0$ then class 1, otherwise class 0.
(Figure: network with inputs $x_1, x_2$, hidden units centered at $t_1$ and $t_2$, and a linear output unit with weights $-1$, $-1$ and bias $+1$.)
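A quick sketch that checks this network on the four XOR inputs, assuming the weights $-1, -1$ and bias $+1$ shown above:

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def f(x):
    # f(x) = -exp(-||x - t1||^2) - exp(-||x - t2||^2) + 1
    phi1 = np.exp(-np.sum((x - t1)**2))
    phi2 = np.exp(-np.sum((x - t2)**2))
    return -phi1 - phi2 + 1.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = f(np.array(x, dtype=float))
    print(x, "-> class", 1 if out > 0 else 0)   # (0,1), (1,0) -> 1; (0,0), (1,1) -> 0
```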
RBF network parameters
What do we have to learn for an RBF NN with a given architecture?
- the centers of the RBF activation functions
- the spreads of the Gaussian RBF activation functions
- the weights from the hidden to the output layer
Different learning algorithms may be used for learning the RBF network parameters:
- (pseudo-)inverse method
- regularization
- iterative training
Learning the weights (1)
Direct computation of the weights: for an example $(x_i, y_i)$ consider the output of the network
$f(x_i) = w_1\,\varphi(\|x_i - t_1\|) + \dots + w_N\,\varphi(\|x_i - t_N\|)$.
We would like $f(x_i) = y_i$ for each of the $N$ examples, that is
$w_1\,\varphi(\|x_i - t_1\|) + \dots + w_N\,\varphi(\|x_i - t_N\|) = y_i$.
Learning the weights (2)
This can be re-written in matrix form for one example:
$\bigl[\varphi(\|x_i - t_1\|)\ \dots\ \varphi(\|x_i - t_N\|)\bigr]\,[w_1 \dots w_N]^T = y_i$
and for all $N$ examples at the same time:
$\begin{pmatrix} \varphi(\|x_1 - t_1\|) & \dots & \varphi(\|x_1 - t_N\|) \\ \vdots & & \vdots \\ \varphi(\|x_N - t_1\|) & \dots & \varphi(\|x_N - t_N\|) \end{pmatrix} [w_1 \dots w_N]^T = [y_1 \dots y_N]^T$
Learning the weights (3)
Let
$\Phi = \begin{pmatrix} \varphi(\|x_1 - t_1\|) & \dots & \varphi(\|x_1 - t_N\|) \\ \vdots & & \vdots \\ \varphi(\|x_N - t_1\|) & \dots & \varphi(\|x_N - t_N\|) \end{pmatrix}$
then we can write
$\Phi \begin{pmatrix} w_1 \\ \vdots \\ w_N \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$
and hence $[w_1 \dots w_N]^T = \Phi^{-1}\,[y_1 \dots y_N]^T$.
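A minimal sketch of this exact interpolation: one Gaussian center per training point, then solve $\Phi w = y$. The helper names and the toy sine data are chosen for illustration only:

```python
import numpy as np

def gaussian_design_matrix(X, T, sigma):
    """Phi[i, j] = exp(-||x_i - t_j||^2 / (2 sigma^2))."""
    d = np.linalg.norm(X[:, None, :] - T[None, :, :], axis=2)
    return np.exp(-d**2 / (2.0 * sigma**2))

# Toy data: interpolate y = sin(x) at N points, centers = training points
X = np.linspace(0.0, 2 * np.pi, 10).reshape(-1, 1)
y = np.sin(X).ravel()
Phi = gaussian_design_matrix(X, X, sigma=1.0)   # N x N, square
w = np.linalg.solve(Phi, y)                      # w = Phi^{-1} y
print(np.allclose(Phi @ w, y))                   # exact fit on the training points
```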
Approximation
Idea: restrict to $m < N$ basis functions => better generalization capability, fewer weights and better running time.
$\Phi = \begin{pmatrix} \varphi(\|x_1 - t_1\|) & \dots & \varphi(\|x_1 - t_m\|) \\ \vdots & & \vdots \\ \varphi(\|x_N - t_1\|) & \dots & \varphi(\|x_N - t_m\|) \end{pmatrix}$ (now an $N \times m$ matrix)
Approximation (2)
We have
$\Phi \begin{pmatrix} w_1 \\ \vdots \\ w_m \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$
and hence $[w_1 \dots w_m]^T = \Phi^{+}\,[y_1 \dots y_N]^T$,
where $\Phi^{+} = (\Phi^T \Phi)^{-1}\,\Phi^T$ is the Moore-Penrose pseudoinverse.
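A sketch of the pseudoinverse solution with fewer centers than training points; here the centers are simply a subset of the data, and all names are illustrative:

```python
import numpy as np

def gaussian_design_matrix(X, T, sigma):
    d = np.linalg.norm(X[:, None, :] - T[None, :, :], axis=2)
    return np.exp(-d**2 / (2.0 * sigma**2))

X = np.linspace(0.0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()
T = X[::10]                                     # m = 5 centers, m < N
Phi = gaussian_design_matrix(X, T, sigma=1.0)   # N x m
w = np.linalg.pinv(Phi) @ y                     # w = Phi^+ y (Moore-Penrose)
print(np.mean((Phi @ w - y)**2))                # small but nonzero training error
```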
Solution by means of regularization [calculus of variations; Poggio & Girosi, 1989]
Idea: minimize the regularized risk
$R_{reg}[f] = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda\,\|Pf\|^2$
where $P$ is a differential operator.
Possible choice: e.g. $\|Pf\|^2 = \sum_{i,j=1}^{m} w_i w_j\,\varphi(\|t_i - t_j\|)$
Regularization (2)
With
$\Phi' = \begin{pmatrix} \varphi(\|t_1 - t_1\|) & \dots & \varphi(\|t_1 - t_m\|) \\ \vdots & & \vdots \\ \varphi(\|t_m - t_1\|) & \dots & \varphi(\|t_m - t_m\|) \end{pmatrix}$:
$R_{reg}[f] = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda\,\|Pf\|^2$
$= \sum_{i=1}^{N} \Bigl(y_i - \sum_{j=1}^{m} w_j\,\varphi(\|x_i - t_j\|)\Bigr)^2 + \lambda \sum_{i,j=1}^{m} w_i w_j\,\varphi(\|t_i - t_j\|)$
$= \sum_{i=1}^{N} y_i^2 - 2\sum_{i=1}^{N}\sum_{j=1}^{m} y_i w_j\,\varphi(\|x_i - t_j\|) + \sum_{i=1}^{N}\Bigl(\sum_{j=1}^{m} w_j\,\varphi(\|x_i - t_j\|)\Bigr)^2 + \lambda \sum_{i,j=1}^{m} w_i w_j\,\varphi(\|t_i - t_j\|)$
$= y^T y - 2\,w^T \Phi^T y + w^T (\Phi^T \Phi)\,w + \lambda\,w^T \Phi' w$
$= w^T (\Phi^T \Phi + \lambda \Phi')\,w - 2\,w^T \Phi^T y + y^T y$
Regularization (3)
Setting the gradient with respect to $w$ to zero:
$\frac{\partial R_{reg}[f]}{\partial w} = 2\,(\Phi^T \Phi + \lambda \Phi')\,w - 2\,\Phi^T y \stackrel{!}{=} 0$
$(\Phi^T \Phi + \lambda \Phi')\,w = \Phi^T y$
$w = (\Phi^T \Phi + \lambda \Phi')^{-1}\,\Phi^T y$
With $\lambda = 0$ we get exactly the same solution as with the pseudo-inverse method!
$\lambda$ is a regularization constant which models the trade-off between model complexity and training error.
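A sketch of the regularized weight computation as a ridge-style linear solve; `lam` stands for the regularization constant $\lambda$, and the data and names are illustrative:

```python
import numpy as np

def gaussian_design_matrix(X, T, sigma):
    d = np.linalg.norm(X[:, None, :] - T[None, :, :], axis=2)
    return np.exp(-d**2 / (2.0 * sigma**2))

X = np.linspace(0.0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.randn(50)   # noisy targets
T = X[::10]                                          # m = 5 centers
sigma, lam = 1.0, 0.1

Phi = gaussian_design_matrix(X, T, sigma)            # N x m
Phi_prime = gaussian_design_matrix(T, T, sigma)      # m x m, entries phi(||t_i - t_j||)
# w = (Phi^T Phi + lambda * Phi')^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * Phi_prime, Phi.T @ y)
print(np.mean((Phi @ w - y)**2))
```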
Computing the RBF spreads and centers
Centers:
1. centers are chosen randomly from the training set, or
2. centers are computed using LVQ or some clustering algorithm (e.g. k-means).
Spreads are chosen by normalization:
$\sigma = \frac{d_{max}}{\sqrt{2m}} = \frac{\text{maximum distance between any 2 centers}}{\sqrt{2 \cdot \text{number of centers}}}$
Then the activation function of hidden neuron $i$ becomes:
$\varphi(\|x - t_i\|) = \exp\!\Bigl(-\frac{m}{d_{max}^2}\,\|x - t_i\|^2\Bigr)$
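A sketch of this normalization heuristic: pick centers (here simply by random choice from the data) and set $\sigma$ from the maximum pairwise center distance; data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # training inputs
m = 10
centers = X[rng.choice(len(X), size=m, replace=False)]

# d_max = maximum distance between any two centers
pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
d_max = pairwise.max()
sigma = d_max / np.sqrt(2 * m)                # sigma = d_max / sqrt(2m)

# phi(||x - t_i||) = exp(-(m / d_max^2) * ||x - t_i||^2), i.e. the Gaussian with this sigma
x = X[0]
phi = np.exp(-(m / d_max**2) * np.linalg.norm(x - centers, axis=1)**2)
print(sigma, phi)
```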
LVQ clustering: a clustering algorithm for finding the centers.
1. Initialization: $t_k(0)$ random, $k = 1, \dots, m$
2. Sampling: draw $x$ from the input space
3. Similarity matching: find the index of the center closest to $x(n)$: $k(x) = \arg\min_k \|x(n) - t_k(n)\|$
4. Updating: adjust the centers
$t_k(n+1) = t_k(n) + \eta\,[x(n) - t_k(n)]$ if $k = k(x)$,
$t_k(n+1) = t_k(n) - \eta\,[x(n) - t_k(n)]$ otherwise
5. Continuation: increment $n$ by 1, go to step 2 and continue until no noticeable changes of the centers occur.
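A sketch of the sampling / matching / updating loop on unlabeled toy data. For simplicity only the winning center is adapted here (the common winner-only competitive-learning variant); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data drawn from three clusters
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0.0, 0.0], [2.0, 2.0], [0.0, 2.0])])

m, eta = 3, 0.05
t = X[rng.choice(len(X), size=m, replace=False)].copy()    # 1. initialization

for n in range(3000):
    x = X[rng.integers(len(X))]                             # 2. sampling
    k_win = np.argmin(np.linalg.norm(x - t, axis=1))        # 3. similarity matching
    t[k_win] += eta * (x - t[k_win])                        # 4. updating (winner only)
print(t)   # centers should end up near the three cluster means
```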
Iterative Training of RBF networks
Apply the gradient descent method for finding centers, spreads and weights, by minimizing the (instantaneous) squared error
$E = \frac{1}{2}\sum_{i=1}^{N} (f(x_i) - y_i)^2$
Updates:
centers: $\Delta t_j = -\eta\,\frac{\partial E}{\partial t_j}$
spreads: $\Delta \sigma_j = -\eta\,\frac{\partial E}{\partial \sigma_j}$
weights: $\Delta w_j = -\eta\,\frac{\partial E}{\partial w_j}$
Iterative Training with Gaussian RBFs
$\frac{\partial E}{\partial w_j} = -\sum_{i=1}^{N} (y_i - f(x_i))\,\varphi(\|x_i - t_j\|)$
$\frac{\partial E}{\partial t_j} = -\sum_{i=1}^{N} (y_i - f(x_i))\,w_j\,\varphi(\|x_i - t_j\|)\,\frac{(x_i - t_j)}{\sigma_j^2}$
$\frac{\partial E}{\partial \sigma_j} = -\sum_{i=1}^{N} (y_i - f(x_i))\,w_j\,\varphi(\|x_i - t_j\|)\,\frac{\|x_i - t_j\|^2}{\sigma_j^3}$
This is for batch mode, but online training is also possible. More than one output neuron is also possible.
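A compact sketch of one batch gradient step for all three parameter groups, assuming Gaussian basis functions and a single output; all names are illustrative:

```python
import numpy as np

def phi(X, t, sigma):
    # Gaussian basis exp(-||x - t||^2 / (2 sigma^2)), evaluated for all rows of X
    return np.exp(-np.sum((X - t)**2, axis=1) / (2.0 * sigma**2))

def gradient_step(X, y, T, sigmas, w, eta=0.01):
    """One batch gradient-descent step on E = 1/2 sum_i (y_i - f(x_i))^2."""
    m = len(w)
    Phi = np.stack([phi(X, T[j], sigmas[j]) for j in range(m)], axis=1)  # N x m
    err = y - Phi @ w                                                    # y_i - f(x_i)
    for j in range(m):
        dw = -np.sum(err * Phi[:, j])
        dt = -np.sum((err * w[j] * Phi[:, j])[:, None] * (X - T[j]), axis=0) / sigmas[j]**2
        ds = -np.sum(err * w[j] * Phi[:, j] * np.sum((X - T[j])**2, axis=1)) / sigmas[j]**3
        w[j] -= eta * dw
        T[j] -= eta * dt
        sigmas[j] -= eta * ds
    return w, T, sigmas
```

For online training, the same derivatives would simply be evaluated on a single example instead of the whole sum.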
Possible extension: Hyper Basis Function (HBF) networks
Up to now: linear activation function in the output unit.
Now: sigmoid activation function, plus addition of the weighted input vector to the output (realized by shortcut connections from the input to the output units):
$f(x) = \varsigma\Bigl(\sum_{j=1}^{m} w_j\,\varphi(\|x - t_j\|) + \sum_{k=1}^{d} \omega_k x_k + \theta\Bigr)$
with $\varsigma$ being a sigmoidal function.
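A minimal sketch of this extended forward pass, using the logistic function as the sigmoid $\varsigma$; all names are illustrative:

```python
import numpy as np

def hbf_forward(x, centers, w, omega, theta, sigma):
    """f(x) = sigmoid( sum_j w_j phi(||x - t_j||) + sum_k omega_k x_k + theta )."""
    r = np.linalg.norm(centers - x, axis=1)
    rbf_part = w @ np.exp(-r**2 / (2.0 * sigma**2))   # weighted RBF outputs
    shortcut = omega @ x                               # shortcut connections from the inputs
    return 1.0 / (1.0 + np.exp(-(rbf_part + shortcut + theta)))

x = np.array([0.5, -0.2])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
print(hbf_forward(x, centers, w=np.array([1.0, -1.0]),
                  omega=np.array([0.3, 0.3]), theta=0.1, sigma=1.0))
```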
Comparison with MLP networks
RBF networks are used for regression and for performing complex (non-linear) pattern classification tasks.
Comparison between RBF networks and MLPs:
Both are examples of non-linear layered feed-forward networks.
Both are universal approximators.
Comparison with MLPs
Architecture: RBF networks (usually) have a single hidden layer. MLP networks may have more hidden layers.
Neuron model: In RBF networks the neuron model of the hidden neurons differs from that of the output nodes. Typically in MLPs hidden and output neurons share a common neuron model. The hidden layer of an RBF network is non-linear, its output layer is linear. Hidden and output layers of an MLP are usually non-linear.
Comparison with MLPs (2)
Activation functions: The argument of the activation function of each hidden neuron in an RBF NN is the Euclidean distance between the input vector and the center of that unit. The argument of the activation function of each hidden neuron in an MLP is the inner product of the input vector and the synaptic weight vector of that neuron.
Approximation: RBF NNs using Gaussian functions construct local approximations to a non-linear I/O mapping. MLPs construct global approximations to a non-linear I/O mapping.