Radial-Basis Function Networks

A function is a radial basis function (RBF) if its output depends on (is a non-increasing function of) the distance of the input from a given stored vector. RBFs represent local receptors, as illustrated below, where each point is a stored vector used in one RBF. In an RBF network one hidden layer uses neurons with RBF activation functions describing local receptors; one output node then combines linearly the outputs of the hidden neurons.

(Figure: a point P surrounded by three stored vectors with weights w_1, w_2, w_3. The vector P is interpolated using the three vectors; each vector gives a contribution that depends on its weight and on its distance from the point P.)

ARCHITECTURE

(Figure: network with inputs x_1, ..., x_m, one hidden layer of units phi_1, ..., phi_m, weights w_1, ..., w_m, and a single output y.)

One hidden layer with activation functions phi_1, ..., phi_m; output layer with linear activation function:

  y = w_1 phi_1(||x - t_1||) + ... + w_m phi_m(||x - t_m||)

where ||x - t|| is the distance of the input x = (x_1, ..., x_m) from the center vector t.
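As a concrete illustration of this architecture, here is a minimal NumPy sketch of the forward pass. The function name, the Gaussian choice of phi, and the example dimensions are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def rbf_forward(x, centers, spreads, weights):
    """Output of an RBF network: y = sum_j w_j * phi_j(||x - t_j||).

    x        : (m,) input vector
    centers  : (k, m) matrix whose rows are the centers t_j
    spreads  : (k,) spread sigma_j of each (assumed Gaussian) hidden unit
    weights  : (k,) output-layer weights w_j
    """
    dists = np.linalg.norm(centers - x, axis=1)         # ||x - t_j||
    phi = np.exp(-dists**2 / (2.0 * spreads**2))        # Gaussian local receptors
    return weights @ phi                                 # linear output layer

# Tiny illustrative example: 3 hidden units in a 2-D input space.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]])
spreads = np.array([0.5, 0.5, 0.5])
weights = np.array([1.0, -1.0, 0.5])
print(rbf_forward(np.array([0.2, 0.1]), centers, spreads, weights))
```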

HIDDEN NEURON MODEL

Hidden units use radial basis functions phi_sigma(||x - t||): the output depends on the distance of the input x from the center t.

(Figure: a hidden neuron receiving x_1, ..., x_m and computing phi_sigma(||x - t||).)

t is called the center and sigma is called the spread; center and spread are the parameters of the hidden unit.

Hidden Neurons

A hidden neuron is more sensitive to data points near its center. For Gaussian RBFs this sensitivity may be tuned by adjusting the spread sigma: a larger spread implies less sensitivity.

Gaussian RBF

(Figure: Gaussian curves phi centered at t; sigma is a measure of how spread out the curve is: a large sigma gives a wide curve, a small sigma a narrow one.)

Interpolation with RBF

The interpolation problem: given a set of N different points {x_i in R^m, i = 1, ..., N} and a set of real numbers {d_i in R, i = 1, ..., N}, find a function F: R^m -> R that satisfies the interpolation condition F(x_i) = d_i.

If F(x) = sum_i w_i phi(||x - x_i||), we have, in matrix form:

  [ phi(||x_1 - x_1||)  ...  phi(||x_1 - x_N||) ]   [ w_1 ]   [ d_1 ]
  [        ...                     ...          ] . [ ... ] = [ ... ]
  [ phi(||x_N - x_1||)  ...  phi(||x_N - x_N||) ]   [ w_N ]   [ d_N ]

that is, Phi w = d.
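The exact interpolation scheme above can be sketched in a few lines, assuming a Gaussian phi and a fixed spread (both choices are illustrative assumptions): one basis function is placed on every data point and the square system Phi w = d is solved directly.

```python
import numpy as np

def interpolate_rbf(X, d, sigma=1.0):
    """Exact RBF interpolation: one Gaussian centred on each data point.

    X : (N, m) data points x_i;  d : (N,) target values d_i.
    Returns the weight vector w solving Phi w = d.
    """
    diffs = X[:, None, :] - X[None, :, :]                 # pairwise x_i - x_j
    Phi = np.exp(-np.sum(diffs**2, axis=2) / (2 * sigma**2))
    return np.linalg.solve(Phi, d)                         # the N x N interpolation matrix

X = np.array([[0.0], [0.5], [1.0]])
d = np.array([0.0, 1.0, 0.0])
w = interpolate_rbf(X, d, sigma=0.3)
# F(x_i) = d_i holds by construction:
Phi = np.exp(-(X - X.T)**2 / (2 * 0.3**2))
print(Phi @ w)   # approximately [0, 1, 0]
```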

Types of phi

Micchelli's theorem: let {x_i}, i = 1, ..., N, be a set of distinct points in R^m. Then the N-by-N interpolation matrix Phi, whose ji-th element is phi_ji = phi(||x_j - x_i||), is nonsingular.

Multiquadrics:           phi(r) = (r^2 + c^2)^(1/2),   c > 0
Inverse multiquadrics:   phi(r) = (r^2 + c^2)^(-1/2),  c > 0
Gaussian functions (most used):  phi(r) = exp(-r^2 / (2 sigma^2)),  sigma > 0,  with r = ||x - t||

Example: the XOR problem

(Figure: input space with the four points (0,0), (0,1), (1,0), (1,1); output space with the values 0 and 1.)

Construct an RBF pattern classifier such that:
(0,0) and (1,1) are mapped to 0, class C_1;
(1,0) and (0,1) are mapped to 1, class C_2.

Example: the XOR problem

In the feature (hidden-layer) space:

  phi_1(||x - t_1||) = e^(-||x - t_1||^2),   phi_2(||x - t_2||) = e^(-||x - t_2||^2),   with t_1 = (1,1) and t_2 = (0,0).

(Figure: the images of (0,0), of (1,1), and of (0,1)/(1,0) in the <phi_1, phi_2> plane, with a linear decision boundary separating the two classes.)

When mapped into the feature space <phi_1, phi_2> (hidden layer), C_1 and C_2 become linearly separable. So a linear classifier with phi_1(x) and phi_2(x) as inputs can be used to solve the XOR problem.

RBF network for the XOR problem

With t_1 = (1,1) and t_2 = (0,0):

  y = -e^(-||x - t_1||^2) - e^(-||x - t_2||^2) + 1

(Figure: the corresponding network, with the two Gaussian hidden units weighted by -1 and -1 and a bias of +1 feeding the output y.)

If y > 0 then class 1, otherwise class 0.
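The XOR construction can be checked numerically. The sketch below simply evaluates the output expression just given (weights -1, -1 and bias +1, as read off the diagram) at the four inputs and applies the threshold y > 0.

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def y(x):
    """XOR RBF network: y = -phi1 - phi2 + 1; class 1 if y > 0, else class 0."""
    phi1 = np.exp(-np.sum((x - t1)**2))
    phi2 = np.exp(-np.sum((x - t2)**2))
    return -phi1 - phi2 + 1.0

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    out = y(np.array(x, dtype=float))
    print(x, round(out, 3), int(out > 0))   # (0,0),(1,1) -> class 0 ; (0,1),(1,0) -> class 1
```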

RBF network parameters

What do we have to learn for an RBF network with a given architecture?
- The centers of the RBF activation functions
- The spreads of the Gaussian activation functions
- The weights from the hidden to the output layer

Different learning algorithms may be used for learning the network parameters. We describe three possible methods for learning centers, spreads and weights.

Learning Algorithm 1

Centers: selected at random, i.e. chosen randomly from the training set.

Spreads: chosen by normalization,

  sigma = d_max / sqrt(2m)

where d_max is the maximum distance between any two centers and m is the number of centers. Then the activation function of hidden neuron i becomes:

  phi_i(||x - t_i||) = exp( -(m / d_max^2) ||x - t_i||^2 )

Learning Algorithm 1

Weights: computed by means of the pseudo-inverse method. For an example (x_i, d_i) consider the output of the network:

  y(x_i) = w_1 phi_1(||x_i - t_1||) + ... + w_m phi_m(||x_i - t_m||)

We would like y(x_i) = d_i for each example, that is:

  w_1 phi_1(||x_i - t_1||) + ... + w_m phi_m(||x_i - t_m||) = d_i

Learning Algorithm 1

This can be re-written in matrix form for one example,

  [ phi_1(||x_i - t_1||)  ...  phi_m(||x_i - t_m||) ] [ w_1 ... w_m ]^T = d_i

and for all the N examples at the same time,

  [ phi_1(||x_1 - t_1||)  ...  phi_m(||x_1 - t_m||) ]
  [          ...                       ...          ]  [ w_1 ... w_m ]^T = [ d_1 ... d_N ]^T
  [ phi_1(||x_N - t_1||)  ...  phi_m(||x_N - t_m||) ]

Learning Algorithm 1

Let

  Phi = [ phi_1(||x_1 - t_1||)  ...  phi_m(||x_1 - t_m||) ]
        [          ...                       ...          ]    w = [ w_1 ... w_m ]^T,   d = [ d_1 ... d_N ]^T.
        [ phi_1(||x_N - t_1||)  ...  phi_m(||x_N - t_m||) ]

Then we can write Phi w = d. So the unknown vector w is the solution of the linear system Phi w = d, or equivalently Phi^T Phi w = Phi^T d.

Pseudoinverse matrix

If we define the matrix Phi+ = (Phi^T Phi)^(-1) Phi^T as the pseudo-inverse of Phi, we can obtain the weights using the formula:

  [ w_1 ... w_m ]^T = Phi+ [ d_1 ... d_N ]^T

The evaluation of Phi+ usually requires storing the entire matrix Phi in memory.
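Putting the pieces of Learning Algorithm 1 together, here is a minimal sketch; the use of numpy.linalg.pinv (in place of the explicit (Phi^T Phi)^(-1) Phi^T) and the toy sine-fitting data are assumptions for illustration. For very large training sets the block-wise scheme discussed next avoids forming Phi all at once.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rbf_pinv(X, d, n_centers):
    """Learning Algorithm 1: random centers, normalised spread, pseudo-inverse weights."""
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]    # centers from the training set
    d_max = np.max(np.linalg.norm(centers[:, None] - centers[None, :], axis=2))
    sigma = d_max / np.sqrt(2 * n_centers)                            # sigma = d_max / sqrt(2m)
    Phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :], axis=2)**2 / (2 * sigma**2))
    w = np.linalg.pinv(Phi) @ d                                       # w = Phi+ d
    return centers, sigma, w

# Toy regression problem: fit sin(x) on [0, 2*pi].
X = np.linspace(0, 2 * np.pi, 50)[:, None]
d = np.sin(X).ravel()
centers, sigma, w = train_rbf_pinv(X, d, n_centers=10)
Phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :], axis=2)**2 / (2 * sigma**2))
print(np.max(np.abs(Phi @ w - d)))   # approximation error on the training points
```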

Pseudoinverse matrix

A good idea is to divide Phi into Q subsets of rows, each made of an arbitrary number a_i of rows (i = 1, ..., Q). We then build a set of new matrices A_i: in A_i the only rows different from zero are those of the i-th subset of Phi, so that

  Phi = sum_{i=1..Q} A_i

(Figure: Phi drawn as the sum of the block matrices A_1, A_2, A_3.)

Pseudoinverse matrix

Replacing Phi = sum_i A_i in the definition of Phi+, the pseudo-inverse can then be built up subset by subset, so that the whole matrix Phi never has to be stored in memory at once (the detailed equations of this block-wise construction are omitted here).

Learning Algorithm 1: summary

1. Choose the centers randomly from the training set.
2. Compute the spread for the RBF functions using the normalization method.
3. Find the weights using the pseudo-inverse method.

Learning Algorithm 2: centers

Clustering algorithm for finding the centers:

1. Initialization: t_k(0) random, k = 1, ..., m.
2. Sampling: draw a sample x(n) from the input space.
3. Similarity matching: find the index of the center closest to x(n): k(x) = arg min_k ||x(n) - t_k(n)||.
4. Updating: adjust the centers,
     t_k(n+1) = t_k(n) + eta [x(n) - t_k(n)]   if k = k(x),
     t_k(n+1) = t_k(n)                         otherwise.
5. Continuation: increment n by 1, go to step 2 and continue until no noticeable changes of the centers occur.
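A minimal sketch of this center-update loop follows; the learning rate, the fixed number of epochs used in place of the "no noticeable changes" stopping rule, and the toy two-cluster data are illustrative assumptions.

```python
import numpy as np

def cluster_centers(X, n_centers, eta=0.1, n_epochs=20, seed=0):
    """Online clustering for RBF centers: move the closest center towards each sample."""
    rng = np.random.default_rng(seed)
    t = X[rng.choice(len(X), size=n_centers, replace=False)].copy()   # 1. initialization
    for _ in range(n_epochs):                                         # 5. continuation
        for x in rng.permutation(X):                                  # 2. sampling
            k = np.argmin(np.linalg.norm(x - t, axis=1))              # 3. similarity matching
            t[k] += eta * (x - t[k])                                  # 4. update the winning center
    return t

X = np.concatenate([np.random.default_rng(1).normal(0, 0.1, (50, 2)),
                    np.random.default_rng(2).normal(1, 0.1, (50, 2))])
print(cluster_centers(X, n_centers=2))   # centers near (0,0) and (1,1)
```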

Learning Algorithm 2: summary

Hybrid learning process:
- Clustering for finding the centers.
- Spreads chosen by normalization.
- LMS algorithm (see Adaline) for finding the weights.

Learning Algorithm 3

Apply the gradient descent method for finding centers, spreads and weights, by minimizing the (instantaneous) squared error

  E = 1/2 (y(x) - d)^2

with updates

  Delta t_j  = -eta_{t_j}  dE/dt_j       (centers)
  Delta sigma_j = -eta_{sigma_j} dE/dsigma_j   (spreads)
  Delta w_ij = -eta_{w_ij} dE/dw_ij      (weights)
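The sketch below shows one such gradient step for a single-output Gaussian RBF network. The derivative expressions follow from E = 1/2 (y(x) - d)^2 with y(x) = sum_j w_j exp(-||x - t_j||^2 / (2 sigma_j^2)); the learning-rate values are illustrative assumptions.

```python
import numpy as np

def gradient_step(x, d, centers, spreads, weights,
                  eta_w=0.05, eta_t=0.02, eta_s=0.02):
    """One stochastic-gradient step on E = 1/2 (y(x) - d)^2 for a Gaussian RBF network."""
    diff = x - centers                                          # rows: x - t_j
    sq = np.sum(diff**2, axis=1)                                # ||x - t_j||^2
    phi = np.exp(-sq / (2 * spreads**2))                        # hidden-unit outputs
    e = weights @ phi - d                                       # error y(x) - d
    grad_w = e * phi                                            # dE/dw_j
    grad_t = (e * weights * phi / spreads**2)[:, None] * diff   # dE/dt_j
    grad_s = e * weights * phi * sq / spreads**3                # dE/dsigma_j
    weights -= eta_w * grad_w
    centers -= eta_t * grad_t
    spreads -= eta_s * grad_s
    return weights, centers, spreads

# Example usage with two hidden units in a 2-D input space.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
spreads = np.array([1.0, 1.0])
weights = np.array([0.0, 0.0])
weights, centers, spreads = gradient_step(np.array([0.5, 0.2]), 1.0, centers, spreads, weights)
```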

Comparison with FF NN

RBF networks are used for regression and for performing complex (non-linear) pattern classification tasks. Comparison between RBF networks and FF NN:
- Both are examples of non-linear layered feed-forward networks.
- Both are universal approximators.

Comparison with multilayer NN

Architecture: RBF networks have one single hidden layer, while FF NN may have more hidden layers.

Neuron model: in RBF networks the neuron model of the hidden neurons is different from that of the output nodes, whereas in FF NN hidden and output neurons typically share a common neuron model. The hidden layer of an RBF network is non-linear and its output layer is linear; hidden and output layers of FF NN are usually both non-linear.

Comparison with multilayer NN

Activation functions: the argument of the activation function of each hidden neuron in an RBF network is the Euclidean distance between the input vector and the center of that unit; the argument of the activation function of each hidden neuron in a FF NN is the inner product of the input vector and the synaptic weight vector of that neuron.

Approximation: RBF networks using Gaussian functions construct local approximations to the non-linear input-output mapping, while FF NN construct global approximations to the non-linear input-output mapping.

Example: Astronomical image processing

We used neural networks to verify the similarity of real astronomical images to predefined reference profiles, so we analyse realistic images and deduce from them a set of parameters able to describe their discrepancy with respect to the ideal, non-aberrated image.

In the ideal case the focal-plane image is the Airy pattern (blue line in the figure),

  I(r) proportional to [ 2 J_1(pi r) / (pi r) ]^2

where J_1 is the Bessel function of the first kind. Usually the image is perturbed by aberrations associated with the characteristics of real-world instruments (the red line represents the perturbed image).

Example: Astronomical image processing

A realistic image is associated with the Fourier transform of the aperture function,

  I(r, phi) proportional to | integral over the aperture of rho e^{i Phi(rho, theta)} e^{-i pi r rho cos(theta - phi)} drho dtheta |^2

which includes the contributions of the classical (Seidel) aberrations:

  Phi(rho, theta) = (2 pi / lambda) [ A_s rho^4 + A_c rho^3 cos(theta) + A_a rho^2 cos^2(theta) + A_d rho^2 + A_t rho cos(theta) ]

  A_s: spherical aberration
  A_c: coma
  A_a: astigmatism
  A_d: field curvature (d = defocus)
  A_t: distortion (t = tilt)

Example: Astronomical image processing

- The maximum range of variation considered is +/- 0.3 lambda for the training set and +/- 0.5 lambda for the test set.
- To ease the computational task the image is encoded using its first moments,

    mu_nm = [ integral of (x / sigma_x)^n (y / sigma_y)^m I(x, y) dx dy ] / [ integral of I(x, y) dx dy ]
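As a small sketch of how the aberration model above can be evaluated, the function below computes the Seidel phase Phi(rho, theta) over a pupil grid; the grid resolution, the coefficient values and the choice lambda = 1 (coefficients expressed in wavelengths) are illustrative assumptions, not values from the study.

```python
import numpy as np

def seidel_phase(rho, theta, A_s, A_c, A_a, A_d, A_t, wavelength=1.0):
    """Seidel aberration phase Phi(rho, theta) from the expansion above."""
    return (2 * np.pi / wavelength) * (A_s * rho**4
                                       + A_c * rho**3 * np.cos(theta)
                                       + A_a * rho**2 * np.cos(theta)**2
                                       + A_d * rho**2
                                       + A_t * rho * np.cos(theta))

# Phase map over a polar grid of the unit pupil for a coma-dominated aberration.
rho, theta = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 2 * np.pi, 64))
phase = seidel_phase(rho, theta, A_s=0.0, A_c=0.3, A_a=0.0, A_d=0.1, A_t=0.0)
print(phase.shape, float(phase.max()))
```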

Example: Astronomical image processing

We performed this task using a FF neural network (with 60 hidden nodes) and an RBF neural network (with 75 hidden nodes). Usually the former requires fewer internal nodes, but more training iterations, than the latter. The training required 1000 iterations (FF) and 75 iterations (RBF); both optimised networks provide reasonably good results in terms of approximation of the desired unknown function relating aberrations and moments.

Example: Astronomical image processing

(Figure: performances of the sigmoidal (left) and radial (right) neural networks; from top to bottom, coma, astigmatism, defocus and distortion are shown. The blue solid line follows the test-set targets, whereas the green dots are the values computed by the networks.)