Neural Networks Lecture 4: Radial Basis Function Networks

Neural Networks Lecture 4: Radial Basis Function Networks
H. A. Talebi, Farzaneh Abdollahi
Department of Electrical Engineering
Amirkabir University of Technology
Winter 2011

Outline
  Introduction
  Commonly Used Radial Basis Functions
  Training RBFN
    Basis Function Optimization: Unsupervised Methods, Supervised Training Method
    Finding the Output Weights
  RBF Applications: Classification
  Comparison of RBFN and MLP

Radial Basis Function Networks (RBFNs) were first proposed by Broomhead and Lowe in 1988.
Main features:
  They are two-layer feed-forward networks.
  The hidden nodes implement a set of radial basis functions (e.g. Gaussian functions).
  The output nodes implement linear summation functions (similar to an MLP).
  Network training is divided into two stages:
    1. The weights from the input to the hidden layer are determined.
    2. Then the weights from the hidden to the output layer are found.
  Training/learning is fairly fast.
  RBF nets perform better than MLPs in some classification problems and in function interpolation.

An RBFN approximates f(x) by the following equation:

  f(x) = Σ_{i=1}^{n} w_i φ(r),   r = ‖x − c_i‖

where
  x ∈ R^n: input vector
  c_i: vector-valued parameter, the centroid (first-layer weight)
  w_i: connection weights in the second layer (from hidden layer to output)
  φ: activation function, which should be radially symmetric (i.e. if ‖x_1‖ = ‖x_2‖ then φ(‖x_1‖) = φ(‖x_2‖))

Considering φ as a Gaussian function: φ(r) = exp(−‖x − c_i‖^2 / (2σ_i^2)), where σ_i is a positive-valued shaping parameter (the width).

Training an RBFN is the process of finding appropriate values of w_kj, c_ij and σ_j.
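As a concrete illustration of the expression above, here is a minimal NumPy sketch (illustrative only, not code from the lecture; the function name and array shapes are my own assumptions) that evaluates a Gaussian RBF network for a batch of inputs:

```python
import numpy as np

def rbf_forward(X, centers, sigmas, W):
    """Evaluate a Gaussian RBF network.

    X:       (P, d) input vectors
    centers: (M, d) RBF centers c_i
    sigmas:  (M,)   widths sigma_i
    W:       (K, M) second-layer weights w_ki
    Returns  (P, K) network outputs.
    """
    # r^2 = ||x - c_i||^2 for every input/center pair -> shape (P, M)
    r2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-r2 / (2.0 * sigmas**2))   # hidden activations phi_i(x)
    return Phi @ W.T                        # linear output layer

# tiny usage example with random parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
out = rbf_forward(X, centers=rng.normal(size=(3, 2)),
                  sigmas=np.ones(3), W=rng.normal(size=(1, 3)))
print(out.shape)  # (5, 1)
```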

Commonly Used Radial Basis Functions
1. Linear Function: φ(r) = r
2. Cubic Function: φ(r) = r^3
3. Gaussian Function: φ(r) = exp(−r^2 / (2σ^2))
4. Multi-Quadric: φ(r) = (r^2 + σ^2)^{1/2}
5. Generalized Multi-Quadric: φ(r) = (r^2 + σ^2)^β, 0 < β < 1
6. Inverse Multi-Quadric: φ(r) = (r^2 + σ^2)^{−1/2}
7. Generalized Inverse Multi-Quadric: φ(r) = (r^2 + σ^2)^{−α}, α > 0
8. Thin Plate Spline: φ(r) = r^2 ln(r)
9. Shifted Logarithm: φ(r) = log(r^2 + σ^2)
where r = ‖x − c‖_2
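A few of these basis functions written out in Python (a small sketch of my own, not from the slides), which makes the role of the width σ explicit:

```python
import numpy as np

sigma = 1.0
basis = {
    "linear":               lambda r: r,
    "cubic":                lambda r: r**3,
    "gaussian":             lambda r: np.exp(-r**2 / (2 * sigma**2)),
    "multiquadric":         lambda r: np.sqrt(r**2 + sigma**2),
    "inverse_multiquadric": lambda r: 1.0 / np.sqrt(r**2 + sigma**2),
    "thin_plate_spline":    lambda r: r**2 * np.log(r),   # defined for r > 0
}

r = np.linspace(0.5, 3.0, 4)
for name, phi in basis.items():
    print(name, np.round(phi(r), 3))
```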

The Gaussian and Inverse Multi-Quadric functions are localized, in the sense that φ(r) → 0 as r → ∞.
For all the other functions mentioned above, φ(r) → ∞ as r → ∞.
In an RBFN the hidden layer and the output layer play very different roles, so it is appropriate to use a different learning algorithm for each:
  First the hidden node centers are determined.
  Then the output layer weights are trained.

Training RBFN: Basis Function Optimization
One major advantage of RBFNs is the possibility of choosing suitable hidden units without having to perform a full nonlinear optimization of the whole network.
The methods for doing this fall into two categories:
  Unsupervised methods: particularly useful in situations where labelled data is in short supply but there is plenty of unlabelled data (i.e. inputs without output targets).
  Supervised methods: usually give better results.

Unsupervised Methods
1. Fixed centers selected at random:
  It is the simplest and quickest approach.
  M centers are selected at random from the N data points (c_i ∈ {x^p}) and kept fixed.
  Their widths are equal and fixed at a size appropriate for the distribution of the data points.
  The normalized RBF centered at c_j is defined as φ_j(x) = exp(−(M / d_m^2) ‖x − c_j‖^2), where d_m is the maximum distance between the chosen centers.
  The widths are σ_j = d_m / √(2M).
  This ensures that the individual RBFs are neither too peaked nor too flat.
  For large training sets, this method provides reasonable results.
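A short sketch of this selection rule, assuming Gaussian units with a common width (function and variable names are mine, not the lecture's):

```python
import numpy as np

def random_centers_and_width(X, M, seed=0):
    """Pick M centers at random from the data and set a common width
    sigma = d_max / sqrt(2M), where d_max is the largest distance
    between the chosen centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)]
    # pairwise distances between the chosen centers
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    sigma = d.max() / np.sqrt(2 * M)
    return centers, sigma

X = np.random.default_rng(1).normal(size=(100, 2))
centers, sigma = random_centers_and_width(X, M=5)
print(centers.shape, round(sigma, 3))
```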

Unsupervised Methods
2. K-Means Clustering (a code sketch follows this list):
  Using clustering techniques provides an improved approach which more accurately reflects the distribution of the data points.
  It partitions the data points x^p into K disjoint subsets S_j such that the sum of squared errors is minimized.
  Each subset contains N_j data points, assigned by the minimum-distance rule.
  K-Means algorithm:
    2.1 Choose a set of centers c_1, ..., c_K arbitrarily.
    2.2 Assign the N samples to the K subsets using the minimum Euclidean distance rule: x^p ∈ S_i if ‖x^p − c_i‖ < ‖x^p − c_j‖ for all j ≠ i.
    2.3 After all data points have been assigned, go to step 2.4.
    2.4 Compute the new subset centers c_i so as to minimize the cost function J_i = Σ_{x ∈ S_i} ‖x − c_i‖^2.
    2.5 If any subset center has changed, return to step 2.2; otherwise stop.
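A minimal NumPy implementation of steps 2.1 to 2.5 (my own sketch, not the lecture's code), returning the centers used for the RBF hidden layer:

```python
import numpy as np

def kmeans_centers(X, K, n_iter=100, seed=0):
    """Alternate nearest-center assignment (2.2) and center updates (2.4)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # 2.1 arbitrary init
    for _ in range(n_iter):
        # 2.2 assign each sample to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 2.4 the mean of each subset minimizes J_i = sum ||x - c_i||^2
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):                # 2.5 stop when unchanged
            break
        centers = new_centers
    return centers

X = np.random.default_rng(2).normal(size=(200, 2))
print(kmeans_centers(X, K=4))
```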

Supervised Training Method
This method generally gives better results than the unsupervised procedures, but the computational costs are usually enormous.
Similar to an MLP, this approach performs gradient descent on a sum-squared output error function.
The error function is

  E = Σ_p Σ_k (y_k(x^p) − t_k^p)^2 = Σ_p Σ_k (Σ_{j=0}^{M} w_kj φ_j(x^p, c_j, σ_j) − t_k^p)^2

where t_k^p is the kth element of the target vector of the pth sample, x^p is the input vector of the pth sample, and y_k is the network output.

The weights and basis function parameters are updated as

  Δw_jk = −η_w ∂E/∂w_jk,   Δc_ij = −η_c ∂E/∂c_ij,   Δσ_j = −η_σ ∂E/∂σ_j

The learning rates η should be selected carefully to avoid local minima and to obtain an acceptable convergence rate and error.

Finding the Output Weights
The output weights can be obtained by either supervised or unsupervised methods.
1. Unsupervised method: After the input-to-hidden weights (the centers) are found, they are kept fixed for the second stage of training, during which the hidden-to-output weights are learned.
  Since the second stage involves just a single layer of weights w_jk, they can easily be found analytically by solving a set of linear equations.
  Usually this can be done quickly, without the need for iterative weight updates as in gradient descent learning.
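To make the gradient updates above concrete, here is one full-batch update for a single-output Gaussian RBF network, y(x) = Σ_j w_j exp(−‖x − c_j‖^2 / (2σ_j^2)). This is a hedged sketch with my own derivation of the Gaussian derivatives, not code from the lecture:

```python
import numpy as np

def rbf_gradient_step(X, t, centers, sigmas, w, lr=(1e-3, 1e-3, 1e-3)):
    """One gradient-descent update of w_j, c_j, sigma_j for E = sum_p (y(x^p) - t^p)^2."""
    eta_w, eta_c, eta_s = lr
    diff = X[:, None, :] - centers[None, :, :]          # (P, M, d)
    r2 = (diff ** 2).sum(axis=2)                        # (P, M)
    Phi = np.exp(-r2 / (2 * sigmas**2))                 # hidden activations
    err = Phi @ w - t                                   # (P,)  y(x^p) - t^p
    grad_w = 2 * Phi.T @ err
    grad_c = 2 * ((err[:, None] * Phi * w / sigmas**2)[:, :, None] * diff).sum(axis=0)
    grad_s = 2 * (err[:, None] * Phi * w * r2 / sigmas**3).sum(axis=0)
    return w - eta_w * grad_w, centers - eta_c * grad_c, sigmas - eta_s * grad_s

# usage: repeat the step until the error stops decreasing
rng = np.random.default_rng(0)
X, t = rng.normal(size=(30, 2)), None
t = np.sin(X[:, 0])
w, C, s = rng.normal(size=4), rng.normal(size=(4, 2)), np.ones(4)
w, C, s = rbf_gradient_step(X, t, C, s, w)
```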

Training: Finding the Output Weights
1. Unsupervised method (cont'd): Given the hidden unit activations φ_j(x, c_j, σ_j), the following set of simple linear equations should be solved:

  y_k(x^p) = Σ_{j=0}^{M} w_kj φ_j(x^p) = t_k^p

This can be rewritten as Φ W^T = T, where W = {w_kj}, Φ = {Φ_pj} = {φ_j(x^p)}, T = {T_pk} = {t_k^p}.
Therefore W^T = Φ† T, where Φ† = (Φ^T Φ)^{−1} Φ^T is the pseudo-inverse of Φ.
In practice we tend to use singular value decomposition (SVD) to avoid possible ill-conditioning of Φ.
2. Supervised method: the weights of the second layer can be trained by a supervised learning method such as the BP algorithm.
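A sketch of this analytic solve (names are mine) using np.linalg.pinv, which computes the pseudo-inverse via SVD and so already addresses the ill-conditioning concern:

```python
import numpy as np

def solve_output_weights(Phi, T):
    """Solve Phi @ W.T = T for the output weights in the least-squares sense.

    Phi: (P, M+1) hidden activations (include a column of ones for a bias term).
    T:   (P, K)   target vectors.
    Returns W with shape (K, M+1).
    """
    W_T = np.linalg.pinv(Phi) @ T        # SVD-based pseudo-inverse
    return W_T.T

# usage: Phi comes from the fixed centers/widths found in the first stage
Phi = np.random.default_rng(0).uniform(size=(50, 6))
T = np.random.default_rng(1).uniform(size=(50, 2))
print(solve_output_weights(Phi, T).shape)   # (2, 6)
```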

Training by Orthogonal Least Squares [?]
Incrementally add RBFs:
1. Start with one hidden neuron; its center is the data point giving the lowest error.
2. Update the weights and calculate the errors.
3. If the error is greater than the desired error:
   Incrementally add basis functions, one by one.
   Each time, choose a center from the remaining data points.
   Go to step 2.
4. Otherwise, stop the training.
A simplified greedy sketch of this procedure follows.
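The following is a simplified greedy version of the idea above (my own sketch; it refits the output weights by least squares at every step rather than using the orthogonal decomposition of the full OLS algorithm):

```python
import numpy as np

def greedy_rbf_selection(X, t, sigma, target_error, max_centers=20):
    """Grow an RBF net by repeatedly adding the candidate center (taken from
    the data points) that most reduces the sum-squared error."""
    def phi(X, C):
        r2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-r2 / (2 * sigma**2))

    chosen, remaining = [], list(range(len(X)))
    w, err = None, np.inf
    while err > target_error and len(chosen) < max_centers and remaining:
        best = None
        for i in remaining:                     # try each remaining data point
            Phi = phi(X, X[chosen + [i]])
            w_i, *_ = np.linalg.lstsq(Phi, t, rcond=None)
            e = ((Phi @ w_i - t) ** 2).sum()
            if best is None or e < best[0]:
                best = (e, i, w_i)
        err, idx, w = best
        chosen.append(idx)
        remaining.remove(idx)
    return X[chosen], w, err

X = np.random.default_rng(0).uniform(-1, 1, size=(40, 1))
t = np.sin(3 * X[:, 0])
centers, w, err = greedy_rbf_selection(X, t, sigma=0.3, target_error=0.05)
print(len(centers), round(err, 4))
```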


RBF Applications: Classification
Suppose we have a data set that falls into three classes:
  An MLP would separate the classes with hyperplanes in the input space.
  An RBFN models the separate class distributions with localized basis functions.

Example: XOR Problem
A single-layer perceptron with step or sigmoidal activation functions cannot form the correct outputs, since it can only generate a single decision boundary.
Earlier we addressed this by introducing an MLP (an extra layer).

Using an RBFN:
The Gaussian activation function is φ_j(x) = exp(−(M / d_max^2) ‖x − c_j‖^2).
There are 4 patterns and 2 classes.
Choose M = 2.
Choose µ_1 = (0, 0) and µ_2 = (1, 1) as the centers of the basis functions; then d_max = √2.
So the two basis functions are
  φ_1(x) = exp(−‖x − µ_1‖^2), µ_1 = (0, 0)
  φ_2(x) = exp(−‖x − µ_2‖^2), µ_2 = (1, 1)

In the (φ_1, φ_2) plane the patterns are now linearly separable.
Now the output weights should be adjusted: y(x) = w_1 φ_1(x) + w_2 φ_2(x) − θ
Considering the four patterns:
  0 = 1·w_1 + 0.1353·w_2 − θ
  1 = 0.3678·w_1 + 0.3678·w_2 − θ
  1 = 0.3678·w_1 + 0.3678·w_2 − θ
  0 = 0.1353·w_1 + 1·w_2 − θ
Solving (e.g. by the pseudo-inverse) gives w_1 = w_2 = −2.5018 and θ = −2.8404, i.e. negative hidden-to-output weights with a positive bias of 2.8404 added to the weighted sum.
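To check this arithmetic, a small NumPy sketch (illustrative, not from the lecture) that builds the two Gaussian features and solves for (w_1, w_2, θ) by least squares; it reproduces the magnitudes quoted above, with the signs as discussed:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])              # XOR targets
centers = np.array([[0.0, 0.0], [1.0, 1.0]])    # mu_1, mu_2

# Gaussian features; with M = 2 and d_max = sqrt(2) the exponent coefficient is 1
Phi = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))

# model y = w1*phi1 + w2*phi2 - theta  ->  append a constant column of -1
A = np.hstack([Phi, -np.ones((4, 1))])
w1, w2, theta = np.linalg.lstsq(A, t, rcond=None)[0]
print(round(w1, 3), round(w2, 3), round(theta, 3))  # approx -2.503 -2.503 -2.841
print(np.round(A @ [w1, w2, theta], 3))             # [0. 1. 1. 0.]
```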

Other applications of RBFNs include model prediction, speech and handwriting recognition, control of high-dimensional and highly nonlinear systems, image processing, time series analysis, and medical diagnosis.

Comparison of RBFN and MLP
Similarities
1. They are both nonlinear feed-forward networks.
2. They are both universal approximators.
3. They are used in similar application areas.
There always exists an RBFN that accurately mimics a specified MLP, and vice versa.
Differences
1. An RBFN (in its most basic form) has a single hidden layer, whereas an MLP may have one or more hidden layers. (An RBF network proposed by Lapendes (1991) has two hidden layers, where the second hidden layer and the output layer are both linear.)

Differences (cont'd)
2. MLPs construct global approximations to nonlinear input-output mappings, so they are capable of generalizing in regions of the input space where little or no training data are available. RBF networks instead use localized nonlinearities (e.g. Gaussian functions) at the hidden layer to construct local approximations, so they can learn fast.
3. Because of this local approximation, RBF networks may require a larger number of hidden nodes to span the input space adequately compared to an MLP.
4. In an MLP the computational nodes in different layers share a common neuronal model (though not necessarily the same activation function). In an RBFN, the hidden nodes (basis functions) operate very differently from, and serve a different purpose than, the output nodes.
   In an RBFN, the argument of each hidden unit's activation function is the distance between the input and the weights (the RBF centers); in an MLP it is the inner product of the input and the weights (see the short comparison below).
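A two-line illustration of difference 4 (hypothetical values; not from the slides):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([1.0, 0.3, -0.7])   # MLP weight vector, reused here as an RBF center

mlp_argument = w @ x                    # inner product, fed to e.g. a sigmoid
rbf_argument = np.linalg.norm(x - w)    # distance to the center, fed to e.g. a Gaussian
print(round(mlp_argument, 3), round(rbf_argument, 3))
```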