Radial Basis Function Networks: Algorithms


Introduction to Neural Networks: Lecture 13
John A. Bullinaria, 2004

1. The RBF Mapping
2. The RBF Network Architecture
3. Computational Power of RBF Networks
4. Training an RBF Network
5. Unsupervised Optimization of the Basis Functions
6. Finding the Output Weights

The Radial Basis Function (RBF) Mapping

We are working within the standard framework of function approximation. We have a set of N data points in a multi-dimensional space, such that every D dimensional input vector x^p = { x_i^p : i = 1, ..., D } has a corresponding K dimensional target output t^p = { t_k^p : k = 1, ..., K }. The target outputs will generally be generated by some underlying functions g_k(x) plus random noise. The goal is to approximate the g_k(x) with functions y_k(x) of the form

    y_k(x) = \sum_{j=0}^{M} w_{jk} \, \phi_j(x)

We shall concentrate on the case of Gaussian basis functions

    \phi_j(x) = \exp\left( -\frac{\| x - \mu_j \|^2}{2 \sigma_j^2} \right)

in which we have basis centres { \mu_j } and widths { \sigma_j }. Naturally, the way to proceed is to develop a process for finding the appropriate values for M, { w_{jk} }, { \mu_{ij} } and { \sigma_j }.
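
For concreteness, here is a minimal NumPy sketch of this mapping for a single input vector. The names phi, y, centres, widths and W are illustrative choices, not part of the lecture, and the j = 0 term is treated as a constant bias basis function φ_0(x) = 1:

    import numpy as np

    def phi(x, mu_j, sigma_j):
        """Gaussian basis function: exp(-||x - mu_j||^2 / (2 sigma_j^2))."""
        return np.exp(-np.sum((x - mu_j) ** 2) / (2.0 * sigma_j ** 2))

    def y(x, centres, widths, W):
        """RBF mapping y_k(x) = sum_{j=0..M} w_jk phi_j(x) for one input vector x.

        centres : (M, D) array of basis centres mu_j
        widths  : (M,)   array of widths sigma_j
        W       : (M + 1, K) weights, row 0 multiplying the bias term phi_0(x) = 1.
        """
        activations = [1.0] + [phi(x, centres[j], widths[j]) for j in range(len(centres))]
        return np.array(activations) @ W        # the K outputs y_k(x)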

The RBF Network Architecture

We can cast the RBF mapping into a form that resembles a neural network:

[Network diagram: D inputs x_i connect, via the "weights" µ_ij, to M basis functions φ_j(x_i, µ_ij, σ_j); the basis function activations connect, via the weights w_jk, to K outputs y_k.]

The hidden to output layer part operates like a standard feed-forward MLP network, with the sum of the weighted hidden unit activations giving the output unit activations. The hidden unit activations are given by the basis functions φ_j(x_i, µ_ij, σ_j), which depend on the "weights" { µ_ij, σ_j } and input activations { x_i } in a non-standard manner.
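
A vectorised sketch of the same two-layer computation, under the same assumptions as before (illustrative names, batched inputs, a bias basis function handled as row 0 of the assumed weight matrix W), might look like this:

    import numpy as np

    def rbf_forward(X, centres, widths, W):
        """Forward pass of an RBF network of the kind sketched above.

        Hidden layer: phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)),
                      depending on the non-standard "weights" mu_j, sigma_j.
        Output layer: y_k(x) = sum_j w_jk phi_j(x), an ordinary linear layer.

        X : (N, D) inputs, centres : (M, D), widths : (M,), W : (M + 1, K).
        """
        sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)   # (N, M)
        hidden = np.exp(-sq_dist / (2.0 * widths ** 2))                      # hidden unit activations
        hidden = np.hstack([np.ones((len(X), 1)), hidden])                   # prepend phi_0 = 1
        return hidden @ W                                                    # (N, K) output activations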

Computational Power of RBF Networks

Intuitively, we can easily understand why linear superpositions of localised basis functions are capable of universal approximation. More formally:

Hartman, Keeler & Kowalski (1990, Neural Computation, vol. 2, pp. 210-215) provided a formal proof of this property for networks with Gaussian basis functions in which the widths { σ_j } are treated as adjustable parameters.

Park & Sandberg (1991, Neural Computation, vol. 3, pp. 246-257; and 1993, Neural Computation, vol. 5, pp. 305-316) showed that with only mild restrictions on the basis functions, the universal function approximation property still holds.

As with the corresponding proofs for MLPs, these are existence proofs which rely on the availability of an arbitrarily large number of hidden units (i.e. basis functions). However, they do provide a theoretical foundation on which practical applications can be based with confidence.

Training RBF Networks

The proofs about computational power tell us what an RBF network can do, but nothing about how to find all its parameters/weights { w_jk, µ_ij, σ_j }.

Unlike in MLPs, in RBF networks the hidden and output layers play very different roles, and the corresponding weights have very different meanings and properties. It is therefore appropriate to use different learning algorithms for them.

The input to hidden weights (i.e. the basis function parameters { µ_ij, σ_j }) can be trained (or set) using any of a number of unsupervised learning techniques. Then, after the input to hidden weights are found, they are kept fixed while the hidden to output weights are learned.

Since this second stage of training involves just a single layer of weights { w_jk } and linear output activation functions, the weights can easily be found analytically by solving a set of linear equations. This can be done very quickly, without the need for a set of iterative weight updates as in gradient descent learning.

Basis Function Optimization

One major advantage of RBF networks is the possibility of choosing suitable hidden unit/basis function parameters without having to perform a full non-linear optimization of the whole network. We shall now look at several ways of doing this:

1. Fixed centres selected at random
2. Orthogonal least squares
3. K-means clustering

These are all unsupervised techniques, which will be particularly useful in situations where labelled data is in short supply, but there is plenty of unlabelled data (i.e. inputs without output targets). Next lecture we shall look at how we might get better results by performing a full supervised non-linear optimization of the network instead.

With either approach, determining a good value for M remains a problem. It will generally be appropriate to compare the results for a range of different values, following the same kind of validation/cross-validation methodology used for optimizing MLPs.

Fixed Centres Selected At Random

The simplest and quickest approach to setting the RBF parameters is to have their centres fixed at M points selected at random from the N data points, and to set all their widths to be equal and fixed at an appropriate size for the distribution of data points.

Specifically, we can use normalised RBFs centred at { µ_j } defined by

    \phi_j(x) = \exp\left( -\frac{\| x - \mu_j \|^2}{2 \sigma_j^2} \right)

where { µ_j } ⊂ { x^p } and the σ_j are all related in the same way to the maximum or average distance between the chosen centres µ_j. Common choices are

    \sigma_j = \frac{d_{\max}}{\sqrt{2M}}   or   \sigma_j = 2 \, d_{\text{ave}}

which ensure that the individual RBFs are neither too wide, nor too narrow, for the given training data. For large training sets, this approach gives reasonable results.
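
A possible NumPy implementation of this heuristic is sketched below; the function name and the width_rule argument are invented for the sketch:

    import numpy as np

    def random_centres_and_widths(X, M, rng=None, width_rule="d_max"):
        """Pick M centres at random from the data and set one shared width.

        Implements sigma = d_max / sqrt(2M) or sigma = 2 * d_ave, where the
        distances are measured between the chosen centres.
        """
        rng = np.random.default_rng(rng)
        centres = X[rng.choice(len(X), size=M, replace=False)]    # subset of the data points

        # Pairwise distances between the chosen centres.
        diffs = centres[:, None, :] - centres[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=2))

        if width_rule == "d_max":
            sigma = dists.max() / np.sqrt(2.0 * M)
        else:  # "d_ave": average over the M(M-1) off-diagonal pairs
            sigma = 2.0 * dists.sum() / (M * (M - 1))
        widths = np.full(M, sigma)
        return centres, widths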

Orthogonal Least Squares

A more principled approach to selecting a sub-set of data points as the basis function centres is based on the technique of orthogonal least squares. This involves the sequential addition of new basis functions, each centred on one of the data points.

At each stage, we try out each potential Lth basis function by using the N – L other data points to determine the network's output weights. The potential Lth basis function which leaves the smallest residual sum-squared output error is kept, and we move on to choose which (L+1)th basis function to add.

This sounds wasteful, but if we construct a set of orthogonal vectors in the space S spanned by the vectors of hidden unit activations for each pattern in the training set, we can calculate directly which data point should be chosen as the next basis function.

To get good generalization, we generally use validation/cross-validation to stop the process when an appropriate number of data points have been selected as centres.
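
The orthogonalisation that makes this efficient is not shown in these notes, so the sketch below implements only the naive, "wasteful" greedy selection described above, with an assumed shared width sigma and illustrative function names:

    import numpy as np

    def gaussian_design(X, centres, sigma):
        """Design matrix Phi with a bias column and Gaussian basis functions."""
        sq = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        Phi = np.exp(-sq / (2.0 * sigma ** 2))
        return np.hstack([np.ones((len(X), 1)), Phi])

    def greedy_centre_selection(X, T, sigma, L_max):
        """Naive (non-orthogonalised) greedy selection of basis function centres.

        At each stage every remaining data point is tried as the next centre,
        the output weights are re-fitted by least squares, and the candidate
        leaving the smallest residual sum-squared error is kept.
        """
        selected = []                        # indices of data points chosen as centres
        remaining = list(range(len(X)))
        for _ in range(L_max):
            best = None
            for idx in remaining:
                trial = selected + [idx]
                Phi = gaussian_design(X, X[trial], sigma)
                W, *_ = np.linalg.lstsq(Phi, T, rcond=None)   # least-squares output weights
                err = ((Phi @ W - T) ** 2).sum()              # residual sum-squared error
                if best is None or err < best[1]:
                    best = (idx, err)
            selected.append(best[0])
            remaining.remove(best[0])
            # In practice, stop earlier when a validation error starts to rise.
        return X[selected]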

K-Means Clustering

A potentially even better approach is to use clustering techniques to find a set of centres which more accurately reflects the distribution of the data points. The K-Means Clustering Algorithm picks the number K of centres in advance, and then follows a simple re-estimation procedure to partition the data points { x^p } into K disjoint sub-sets S_j, containing N_j data points each, so as to minimize the sum-squared clustering function

    J = \sum_{j=1}^{K} \sum_{p \in S_j} \| x^p - \mu_j \|^2

where µ_j is the mean/centroid of the data points in set S_j, given by

    \mu_j = \frac{1}{N_j} \sum_{p \in S_j} x^p

Once the basis centres have been determined in this way, the widths can then be set according to the variances of the points in the corresponding cluster.
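
Below is a minimal sketch of this re-estimation procedure, with the widths set afterwards from the spread of each cluster. The function and variable names are illustrative, and the empty-cluster guard is an added safeguard rather than part of the lecture:

    import numpy as np

    def kmeans_centres(X, K, n_iters=100, rng=None):
        """K-means: assign points to the nearest centre, move each centre to the
        mean of its cluster, and repeat until the assignments stop changing."""
        rng = np.random.default_rng(rng)
        centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        assign = None
        for _ in range(n_iters):
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)   # (N, K)
            new_assign = d2.argmin(axis=1)
            if assign is not None and np.array_equal(new_assign, assign):
                break
            assign = new_assign
            for j in range(K):
                members = X[assign == j]
                if len(members):                 # keep the old centre if a cluster empties
                    centres[j] = members.mean(axis=0)
        # Width of each basis function: RMS distance of the cluster's points from its centroid.
        widths = np.array([
            np.sqrt(((X[assign == j] - centres[j]) ** 2).sum(axis=1).mean())
            if (assign == j).any() else 1.0
            for j in range(K)
        ])
        return centres, widths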

Dealing with the Output Layer

Given that the hidden unit activations φ_j(x^p, µ_j, σ_j) are fixed while we determine the output weights { w_jk }, we essentially only have to find the weights that optimise a single layer linear network. As with MLPs, we can define a sum-squared output error measure

    E = \frac{1}{2} \sum_{p} \sum_{k} \left( y_k(x^p) - t_k^p \right)^2

but here the outputs are a simple linear combination of the hidden unit activations, i.e.

    y_k(x^p) = \sum_{j=0}^{M} w_{jk} \, \phi_j(x^p)

At the minimum of E the gradients with respect to all the weights w_{ik} will be zero, so

    \frac{\partial E}{\partial w_{ik}} = \sum_{p} \left( \sum_{j=0}^{M} w_{jk} \, \phi_j(x^p) - t_k^p \right) \phi_i(x^p) = 0

and linear equations like this are well known to be easy to solve analytically.

Computing the Output Weights

Our equations for the weights are most conveniently written in matrix form by defining matrices with components (W)_{jk} = w_{jk}, (Φ)_{pj} = φ_j(x^p), and (T)_{pk} = t_k^p. This gives

    \Phi^{\mathrm{T}} ( \Phi W - T ) = 0

and the formal solution for the weights is

    W = \Phi^{\dagger} \, T

in which we have the standard pseudo-inverse of Φ

    \Phi^{\dagger} \equiv ( \Phi^{\mathrm{T}} \Phi )^{-1} \Phi^{\mathrm{T}}

which can be seen to have the property Φ†Φ = I. We see that the network weights can be computed by fast linear matrix inversion techniques. In practice we tend to use singular value decomposition (SVD) to avoid possible ill-conditioning of Φ, i.e. Φ^T Φ being singular or near singular.
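
In NumPy this amounts to a single SVD-based least-squares solve. The function name below is illustrative, and Phi is assumed to already include the bias column φ_0 = 1:

    import numpy as np

    def solve_output_weights(Phi, T):
        """Solve Phi^T (Phi W - T) = 0 for the output weight matrix W.

        Phi : (N, M + 1) matrix of hidden unit activations phi_j(x^p)
        T   : (N, K)     matrix of targets t_k^p
        np.linalg.lstsq uses an SVD-based solver, which copes with Phi^T Phi
        being singular or near singular.
        """
        W, _residuals, _rank, _svals = np.linalg.lstsq(Phi, T, rcond=None)
        return W

    # Equivalent direct pseudo-inverse form (also computed via SVD internally):
    # W = np.linalg.pinv(Phi) @ T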

Overview and Reading

1. We began by defining Radial Basis Function (RBF) mappings and the corresponding network architecture.
2. Then we considered the computational power of RBF networks.
3. We then saw how the two layers of network weights were rather different, and that different techniques were appropriate for training each of them.
4. We first looked at several unsupervised techniques for carrying out the first stage, namely optimizing the basis functions.
5. We then saw how the second stage, determining the output weights, could be performed by fast linear matrix inversion techniques.

Reading

1. Bishop: Sections 5.2, 5.3, 5.9, 5.10, 3.4
2. Haykin: Sections 5.4, 5.9, 5.10, 5.13