Supervised (BPL) versus Hybrid (RBF) Learning
By: Shahed Shahir
Outline
I. Introduction
II. Supervised Learning
III. Hybrid Learning
IV. BPL versus RBF
V. Supervised versus Hybrid Learning
VI. Conclusion
I. Introduction

- Hebbian learning by Donald Hebb, 1949 [1]
- Perceptron by Frank Rosenblatt, 1958 [2]
- ADALINE by Bernard Widrow and Ted Hoff, 1960 [3], [4]
- Associative memory by Teuvo Kohonen, 1972 [5]
- Hopfield networks by John Hopfield, 1982 [6]
- Backpropagation by David Rumelhart and James McClelland, 1986 [7]
- Radial-basis functions by M. J. D. Powell, 1987 [8]

References:
1. D. O. Hebb, The Organization of Behavior, New York: Wiley, 1949.
2. F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review, Vol. 65, pp. 386-408, 1958.
3. B. Widrow and M. E. Hoff, "Adaptive Switching Circuits," IRE WESCON Convention Record, New York: IRE, Part 4, pp. 96-104, 1960.
4. M. Minsky and S. Papert, Perceptrons, Cambridge, MA: MIT Press, 1969.
5. T. Kohonen, "Correlation Matrix Memories," IEEE Transactions on Computers, Vol. 21, pp. 353-359, 1972.
6. J. J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
7. D. E. Rumelhart and J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Cambridge, MA: MIT Press, 1986.
8. M. J. D. Powell, "Radial Basis Functions for Multivariable Interpolation: A Review," in Algorithms for the Approximation of Functions and Data, England: Clarendon Press, pp. 143-167, 1987.
II. Supervised Learning
i. Hebbian Learning
ii. Perceptron Learning
iii. Adaline Learning
iv. Backpropagation Learning
i. Hebbian Learning

$w(k+1) = w(k) + \alpha\, t(k)\, x(k)^T$

α is the learning rate.
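As a minimal sketch of this rule (the shapes, sample values, and learning rate below are illustrative assumptions, not from the slides):

```python
import numpy as np

def hebbian_update(W, x, t, alpha=0.1):
    """One Hebbian step: w(k+1) = w(k) + alpha * t(k) * x(k)^T."""
    return W + alpha * np.outer(t, x)

# Hypothetical example: 2 outputs, 3 inputs.
W = np.zeros((2, 3))
x = np.array([1.0, 0.0, -1.0])
t = np.array([1.0, -1.0])
W = hebbian_update(W, x, t)
```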
ii. Perceptron Learning

Rosenblatt introduced perceptron learning to overcome the limitations of Hebbian learning.

$w(k+1) = w(k) + \alpha\,(t(k) - o(k))\,x(k)^T$
$b(k+1) = b(k) + \alpha\,(t(k) - o(k))$

x(k) is the input vector at the moment k, t is the desired target, o is the network output corresponding to the input, b is the bias vector, α is the learning rate, and w(k) is the weight matrix at the moment k.

Issue: perceptron learning cannot classify classes that are not linearly separable.
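A sketch of one perceptron step, assuming a hard-limit activation (the slide gives only the update rule):

```python
import numpy as np

def perceptron_step(W, b, x, t, alpha=0.1):
    """One perceptron step: weights move by the output error t - o."""
    o = np.where(W @ x + b >= 0, 1.0, 0.0)   # hard-limit activation (assumed)
    e = t - o
    return W + alpha * np.outer(e, x), b + alpha * e
```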
iii. Adaline Learning

The Adaline network is similar to the perceptron network; the difference is the activation function, which is linear for Adaline.

$w(k+1) = w(k) + 2\alpha\,(t(k) - o(k))\,x(k)^T$
$b(k+1) = b(k) + 2\alpha\,(t(k) - o(k))$

x(k) is the input vector at the moment k, t is the desired target, o is the network output corresponding to the input, b is the bias vector, α is the learning rate, and w(k) is the weight matrix at the moment k.

Issue: Adaline learning cannot classify classes that are not linearly separable.
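The corresponding LMS step, as a sketch under the same assumed shapes; the only changes from the perceptron step are the linear output and the factor of 2:

```python
import numpy as np

def adaline_step(W, b, x, t, alpha=0.05):
    """One Adaline (LMS) step: linear output, delta rule with factor 2*alpha."""
    o = W @ x + b                            # linear activation
    e = t - o
    return W + 2 * alpha * np.outer(e, x), b + 2 * alpha * e
```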
iv. Backpropagation Learning

Backpropagation learning (BPL) is a powerful tool that can be used to classify classes that are not linearly separable. BPL consists of the following three steps in each epoch.

1. Propagate the input forward:
$a^0 = p$
$a^{m+1} = f^{m+1}\!\left(W^{m+1} a^m + b^{m+1}\right)$

2. Propagate the error (sensitivities) backward:
$s^M = -2\,\dot{F}^M(n^M)\,(t - a^M)$
$s^{m-1} = \dot{F}^{m-1}(n^{m-1})\,(W^m)^T s^m$

3. Update weights and biases:
$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T$
$b^m(k+1) = b^m(k) - \alpha\, s^m$
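A compact sketch of the three steps for a two-layer (1-S-1) network, assuming a tanh hidden layer and a linear output layer (the slides do not fix the activations):

```python
import numpy as np

def bpl_step(W1, b1, W2, b2, p, t, alpha=0.1):
    """One BPL iteration for a 1-S-1 network: tanh hidden layer, linear output."""
    # 1. Propagate the input forward.
    a1 = np.tanh(W1 @ p + b1)
    a2 = W2 @ a1 + b2                        # linear output layer
    # 2. Propagate the sensitivities backward.
    s2 = -2 * (t - a2)                       # F'(n) = 1 for the linear layer
    s1 = (1 - a1**2) * (W2.T @ s2)           # tanh'(n) = 1 - a^2
    # 3. Update weights and biases.
    W2 = W2 - alpha * np.outer(s2, a1)
    b2 = b2 - alpha * s2
    W1 = W1 - alpha * np.outer(s1, p)
    b1 = b1 - alpha * s1
    return W1, b1, W2, b2
```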
BPL Numerical Example

Approximate the following function:
$t(p) = 1 + \cos\!\left(\dfrac{\pi}{4} p\right)$
Numerical Example (cont'd)
1. Propagate the input forward
2. Estimate the error
3. Propagate the error backward
4. Update the two-layer neural network
5. Propagate the input forward with the updated network
6. Estimated error: 0.63
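To make the walkthrough concrete, a full training loop might look like the sketch below, reusing bpl_step from the earlier sketch; the hidden-layer size, initialization, sample range, and learning rate are all assumptions, since the worked matrices are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 2                                        # hidden neurons (assumed)
W1, b1 = rng.normal(size=(S, 1)), rng.normal(size=S)
W2, b2 = rng.normal(size=(1, S)), rng.normal(size=1)

for epoch in range(2000):
    for p in np.linspace(-2, 2, 21):         # sample points (assumed)
        t = 1 + np.cos(np.pi * p / 4)        # target function from the example
        W1, b1, W2, b2 = bpl_step(W1, b1, W2, b2,
                                  np.array([p]), np.array([t]), alpha=0.05)
```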
Numerical Example (cont'd)

FFN for dynamic system simulation:
$y(k+1) = \dfrac{y(k)}{1 + y(k)^2} + u(k)$

Two-layer feedforward neural network versus the system dynamics.
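As a sketch of how training data for the FFN could be generated from this system (the excitation signal and horizon are assumptions, and the plant equation is the reconstruction above):

```python
import numpy as np

def plant(y, u):
    """Assumed reconstruction of the slide's dynamics: y(k+1) = y(k)/(1+y(k)^2) + u(k)."""
    return y / (1 + y**2) + u

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, size=200)             # random excitation (assumed)
y = np.zeros(201)
for k in range(200):
    y[k + 1] = plant(y[k], u[k])

# Training pairs: network input (y(k), u(k)) -> target y(k+1).
X = np.column_stack([y[:-1], u])
T = y[1:]
```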
III. Hybrid Learning
i. Hybrid Learning Theory
ii. Radial-basis Function Network
   i. Hidden Layer Training
   ii. Feedforward Layer Training
iii. RBFN Example
i. Hybrid Learning Theory

Hybrid learning combines two or more learning techniques. It is used to:
- overcome the weaknesses of supervised learning, and
- train networks for better training performance.

The radial-basis function network (RBFN) and the adaptive neuro-fuzzy inference system (ANFIS) are the best-known networks that exploit hybrid learning.
ii. RBFN Applications and Theory

Radial-basis function network applications:
- Dynamic system simulation
- Pattern classification
- Prediction
- Control

General formula for the RBFN:
$O_j(x) = \sum_{i=1}^{n} w_{ij}\, g_i(x)$, where $g_i(x) = e^{-\frac{\|x - m_i\|^2}{2\sigma_i^2}}$

x is the input vector, $m_i$ is the center of the i-th receptive field, $\sigma_i$ is the width of the i-th receptive field, $w_{ij}$ is the connection weight between the i-th receptive field unit and the j-th output, and $g_i$ is the output of the i-th receptive field unit.
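A direct transcription of this formula as a sketch (the centers, widths, and weights below are illustrative, not from the slides):

```python
import numpy as np

def rbfn_output(x, centers, widths, W):
    """RBFN forward pass: O_j(x) = sum_i w_ij * exp(-||x - m_i||^2 / (2*sigma_i^2))."""
    d2 = np.sum((centers - x)**2, axis=1)    # squared distance to each center
    g = np.exp(-d2 / (2 * widths**2))        # receptive-field outputs g_i(x)
    return W.T @ g                           # weighted sum per output unit

# Illustrative shapes (assumed): n=3 receptive fields, 2-D input, 1 output.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
widths = np.array([0.5, 0.8, 0.6])
W = np.ones((3, 1))
print(rbfn_output(np.array([0.2, 0.1]), centers, widths, W))
```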
A. RBFN Hidden Layer Training
1) Forward pass training
   a) Classical statistical training
   b) Learning vector quantization
2) Backward pass training
1) Forward Pass Training

a) Classical statistical training:
$m_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} x_j$
$\sigma_i = \sqrt{\dfrac{1}{n_i}\sum_{j=1}^{n_i} (x_j - m_i)^T (x_j - m_i)}$

$n_i$ is the number of data samples that contribute to forming the i-th RBF.

b) Learning vector quantization:
If $\|x - m_i\| = \min_{k=1,\dots,L} \|x - m_k\|$, then $x \in C_i$ (the i-th cluster wins).
The center and width of the winning cluster are updated as follows:
$m_i(t+1) = m_i(t) + \alpha\,[x - m_i(t)]$
$\sigma_i^2(t+1) = \sigma_i^2(t) + \dfrac{(x_j - m_i(t+1))^T (x_j - m_i(t+1)) - \sigma_i^2(t)}{n_i - 1}$
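A sketch of the winner-take-all center update (the width update is omitted; α and the data shapes are assumptions):

```python
import numpy as np

def lvq_center_update(x, centers, alpha=0.1):
    """Find the winning cluster and pull its center toward the sample x."""
    i = np.argmin(np.linalg.norm(centers - x, axis=1))  # winner: closest center
    centers[i] += alpha * (x - centers[i])              # m_i(t+1) = m_i(t) + a[x - m_i(t)]
    return i
```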
2) Backward Pass Training

The error propagates from the output end toward the input. The centers and widths of the RBF nodes are modified in the same way as described in the BPL section, while the weights in the feedforward layer are kept fixed.
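A sketch of one such backward-pass step for a squared-error cost, with the output weights W held fixed; the gradient expressions follow from the Gaussian $g_i$, and the loss and step size are assumptions:

```python
import numpy as np

def rbf_backward_step(x, t, centers, widths, W, alpha=0.01):
    """Gradient step on RBF centers and widths; output weights W stay fixed."""
    d = x - centers                          # (n, dim): x - m_i for each field
    d2 = np.sum(d**2, axis=1)
    g = np.exp(-d2 / (2 * widths**2))        # receptive-field outputs
    e = t - W.T @ g                          # output error for E = ||t - o||^2
    dEdg = -2 * (W @ e)                      # dE/dg_i
    centers -= alpha * (dEdg * g / widths**2)[:, None] * d
    widths -= alpha * dEdg * g * d2 / widths**3
    return centers, widths
```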
iii. RBFN Example

Figure 1: RBF network approximation
IV. BPL versus RBFN

Approximate the following function:
$t(p) = 1 + \sin\!\left(\dfrac{\pi}{4} p\right)$
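A sketch of why RBFN training differs from BPL here: placing one center at each sample point reduces training to a single linear solve, so the network fits the samples exactly (sample locations and the common width are assumptions):

```python
import numpy as np

p = np.linspace(-2, 2, 9)                    # centers at the sample points (assumed)
t = 1 + np.sin(np.pi * p / 4)                # target values from the slide
sigma = 1.0                                  # common width (assumed)

# Interpolation matrix G[j, i] = g_i(p_j); weights come from one linear solve.
G = np.exp(-(p[:, None] - p[None, :])**2 / (2 * sigma**2))
w = np.linalg.solve(G, t)

def rbfn(q):
    """Evaluate the fitted RBFN at a new point q."""
    return np.exp(-(q - p)**2 / (2 * sigma**2)) @ w
```

The linear solve replaces iterative gradient descent, which is one reason RBFN training is faster and cannot get trapped in a local minimum.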
BPL versus RBFN (cont'd)

a) RBFN is an exact approximation network, while BPL may fail if gradient descent gets trapped in a local minimum.
b) RBFN training is faster than training an FFN with BPL.
c) To add a new constraint to a trained RBFN, we can train the network with the new constraint only; adding a new constraint to a trained FFN requires repeating the whole training process, which is not advisable.
V. Supervised versus Hybrid Learning

a) A single training algorithm is not robust enough to guarantee the highest efficiency.
b) Smarter networks can be trained by employing hybrid learning.
c) In supervised learning, if the algorithm fails, the network performance declines.
d) In hybrid learning, if one algorithm fails, the other algorithm can recover the failure.
e) Combining two or more algorithms can speed up training.
End