Radial Basis-Function Networks


Radial Basis-Function Networks

Outline: Back-Propagation; Stochastic Back-Propagation Algorithm; Step by Step Example; Radial Basis-Function Networks; Gaussian response function; Location of center u; Determining sigma; Why does the RBF network work.

Back-propagation

The algorithm gives a prescription for changing the weights $w_{ij}$ in any feedforward network so that it learns a training set of input-output pairs $\{x^d, t^d\}$. We consider a simple two-layer network with five inputs $x_1, \dots, x_5$, three hidden units and two output units.

Given the pattern $x^d$, hidden unit $j$ receives a net input

$$net_j^d = \sum_{k=1}^{5} w_{jk} x_k^d$$

and produces the output

$$V_j^d = f(net_j^d) = f\!\left(\sum_{k=1}^{5} w_{jk} x_k^d\right).$$

Output unit $i$ thus receives

$$net_i^d = \sum_{j=1}^{3} W_{ij} V_j^d = \sum_{j=1}^{3} W_{ij}\, f\!\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)$$

and produces the final output

$$o_i^d = f(net_i^d) = f\!\left(\sum_{j=1}^{3} W_{ij} V_j^d\right) = f\!\left(\sum_{j=1}^{3} W_{ij}\, f\!\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right).$$
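As a concrete check of these formulas, here is a minimal NumPy sketch of the forward pass for the 5-3-2 network, using the uniform 0.1 weights from the step-by-step example below (the function and variable names are my own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes follow the slides: 5 inputs, 3 hidden units, 2 outputs.
w = np.full((3, 5), 0.1)   # input-to-hidden weights w_jk
W = np.full((2, 3), 0.1)   # hidden-to-output weights W_ij

def forward(x, w, W):
    """Return hidden activations V_j and outputs o_i for one pattern x."""
    net_hidden = w @ x          # net_j = sum_k w_jk x_k
    V = sigmoid(net_hidden)     # V_j = f(net_j)
    net_out = W @ V             # net_i = sum_j W_ij V_j
    o = sigmoid(net_out)        # o_i = f(net_i)
    return V, o

x1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
V, o = forward(x1, w, W)
print(V)   # ~[0.54983 0.54983 0.54983]
print(o)   # ~[0.54114 0.54114]
```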

In our example $E$ becomes

$$E[\mathbf{w}] = \frac{1}{2}\sum_{d=1}^{m}\sum_{i=1}^{2}\left(t_i^d - o_i^d\right)^2 = \frac{1}{2}\sum_{d=1}^{m}\sum_{i=1}^{2}\left(t_i^d - f\!\left(\sum_{j=1}^{3} W_{ij}\, f\!\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)\right)^2.$$

$E[\mathbf{w}]$ is differentiable given that $f$ is differentiable, so gradient descent can be applied.

For the hidden-to-output connections the gradient descent rule gives:

$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = \eta \sum_{d=1}^{m}\left(t_i^d - o_i^d\right) f'(net_i^d)\, V_j^d$$

With $\delta_i^d = f'(net_i^d)\left(t_i^d - o_i^d\right)$ this becomes

$$\Delta W_{ij} = \eta \sum_{d=1}^{m} \delta_i^d\, V_j^d.$$

For the input-to-hidden connections $w_{jk}$ we must differentiate with respect to $w_{jk}$. Using the chain rule we obtain

$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = -\eta \sum_{d=1}^{m} \frac{\partial E}{\partial V_j^d}\,\frac{\partial V_j^d}{\partial w_{jk}}$$

$$\Delta w_{jk} = \eta \sum_{d=1}^{m}\sum_{i=1}^{2}\left(t_i^d - o_i^d\right) f'(net_i^d)\, W_{ij}\, f'(net_j^d)\, x_k^d = \eta \sum_{d=1}^{m}\sum_{i=1}^{2} \delta_i^d\, W_{ij}\, f'(net_j^d)\, x_k^d$$

with $\delta_i^d = f'(net_i^d)\left(t_i^d - o_i^d\right)$. Defining

$$\delta_j^d = f'(net_j^d) \sum_{i=1}^{2} W_{ij}\,\delta_i^d$$

the rule becomes

$$\Delta w_{jk} = \eta \sum_{d=1}^{m} \delta_j^d\, x_k^d.$$
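The two rules can be collected into one batch update. The following NumPy sketch (my own naming; a direct transcription of the formulas above, not code from the slides) accumulates $\Delta W_{ij}$ and $\Delta w_{jk}$ over all patterns:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_batch(X, T, w, W, eta=1.0):
    """One batch gradient-descent step for the two-layer network.
    X: (m, 5) input patterns, T: (m, 2) target vectors."""
    dW = np.zeros_like(W)
    dw = np.zeros_like(w)
    for x, t in zip(X, T):
        V = sigmoid(w @ x)                              # hidden activations V_j
        o = sigmoid(W @ V)                              # outputs o_i
        delta_out = (t - o) * o * (1.0 - o)             # delta_i = f'(net_i)(t_i - o_i)
        delta_hid = V * (1.0 - V) * (W.T @ delta_out)   # delta_j = f'(net_j) sum_i W_ij delta_i
        dW += eta * np.outer(delta_out, V)              # Delta W_ij = eta * delta_i * V_j
        dw += eta * np.outer(delta_hid, x)              # Delta w_jk = eta * delta_j * x_k
    return W + dW, w + dw
```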

Example

$w_1 = \{w_{11}=0.1,\ w_{12}=0.1,\ w_{13}=0.1,\ w_{14}=0.1,\ w_{15}=0.1\}$
$w_2 = \{w_{21}=0.1,\ w_{22}=0.1,\ w_{23}=0.1,\ w_{24}=0.1,\ w_{25}=0.1\}$
$w_3 = \{w_{31}=0.1,\ w_{32}=0.1,\ w_{33}=0.1,\ w_{34}=0.1,\ w_{35}=0.1\}$
$W_1 = \{W_{11}=0.1,\ W_{12}=0.1,\ W_{13}=0.1\}$
$W_2 = \{W_{21}=0.1,\ W_{22}=0.1,\ W_{23}=0.1\}$
$X^1 = \{1,1,0,0,0\}$; $t^1 = \{1,0\}$
$X^2 = \{0,0,0,1,1\}$; $t^2 = \{0,1\}$

$f(x) = \sigma(x) = \dfrac{1}{1+e^{-x}}$, $\quad f'(x) = \sigma'(x) = \sigma(x)\,(1-\sigma(x))$

For the first pattern $X^1$:

$net_1 = \sum_{k=1}^{5} w_{1k} x_k = 1\cdot 0.1 + 1\cdot 0.1 + 0\cdot 0.1 + 0\cdot 0.1 + 0\cdot 0.1 = 0.2$, $\quad V_1 = f(net_1) = 1/(1+\exp(-0.2)) = 0.54983$
$net_2 = \sum_{k=1}^{5} w_{2k} x_k = 0.2$, $\quad V_2 = f(net_2) = 1/(1+\exp(-0.2)) = 0.54983$
$net_3 = \sum_{k=1}^{5} w_{3k} x_k = 0.2$, $\quad V_3 = f(net_3) = 1/(1+\exp(-0.2)) = 0.54983$

$net_1 = \sum_{j=1}^{3} W_{1j} V_j = 0.54983\cdot 0.1 + 0.54983\cdot 0.1 + 0.54983\cdot 0.1 = 0.16495$, $\quad o_1 = f(net_1) = 1/(1+\exp(-0.16495)) = 0.54114$
$net_2 = \sum_{j=1}^{3} W_{2j} V_j = 0.16495$, $\quad o_2 = f(net_2) = 1/(1+\exp(-0.16495)) = 0.54114$

The batch rule is $\Delta W_{ij} = \eta \sum_{d=1}^{m}\left(t_i^d - o_i^d\right) f'(net_i^d)\, V_j^d$. We will use stochastic gradient descent with $\eta = 1$:

$\Delta W_{ij} = (t_i - o_i)\, f'(net_i)\, V_j$

With $f'(x) = \sigma'(x) = \sigma(x)(1-\sigma(x))$:

$\Delta W_{ij} = (t_i - o_i)\,\sigma(net_i)\,(1-\sigma(net_i))\, V_j$
$\delta_i = (t_i - o_i)\,\sigma(net_i)\,(1-\sigma(net_i))$
$\Delta W_{ij} = \delta_i V_j$

$\delta_1 = (t_1 - o_1)\,\sigma(net_1)\,(1-\sigma(net_1))$, $\quad \Delta W_{1j} = \delta_1 V_j$
$\delta_1 = (1 - 0.54114)\cdot\left(1/(1+\exp(-0.16495))\right)\cdot\left(1 - 1/(1+\exp(-0.16495))\right) = 0.11394$

$\delta_2 = (t_2 - o_2)\,\sigma(net_2)\,(1-\sigma(net_2))$, $\quad \Delta W_{2j} = \delta_2 V_j$
$\delta_2 = (0 - 0.54114)\cdot\left(1/(1+\exp(-0.16495))\right)\cdot\left(1 - 1/(1+\exp(-0.16495))\right) = -0.13437$

For the hidden layer:

$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i\, W_{ij}\, f'(net_j)\, x_k = \sum_{i=1}^{2} \delta_i\, W_{ij}\,\sigma(net_j)\,(1-\sigma(net_j))\, x_k$
$\delta_j = \sigma(net_j)\,(1-\sigma(net_j)) \sum_{i=1}^{2} W_{ij}\,\delta_i$
$\Delta w_{jk} = \delta_j x_k$

$\delta_1 = \sigma(net_1)\,(1-\sigma(net_1)) \sum_{i=1}^{2} W_{i1}\,\delta_i = \left(1/(1+\exp(-0.2))\right)\cdot\left(1 - 1/(1+\exp(-0.2))\right)\cdot\left(0.1\cdot 0.11394 + 0.1\cdot(-0.13437)\right) = -5.0568\mathrm{e}{-04}$
$\delta_2 = \sigma(net_2)\,(1-\sigma(net_2)) \sum_{i=1}^{2} W_{i2}\,\delta_i = -5.0568\mathrm{e}{-04}$
$\delta_3 = \sigma(net_3)\,(1-\sigma(net_3)) \sum_{i=1}^{2} W_{i3}\,\delta_i = -5.0568\mathrm{e}{-04}$

First adaptation for $x^1$ (one epoch = adaptation over all training patterns, in our case $x^1$ and $x^2$):

$\Delta w_{jk} = \delta_j x_k$, $\quad \Delta W_{ij} = \delta_i V_j$

Hidden-layer deltas: $\delta_1 = \delta_2 = \delta_3 = -5.0568\mathrm{e}{-04}$
Output-layer deltas: $\delta_1 = 0.11394$, $\delta_2 = -0.13437$
Inputs: $x_1 = 1$, $x_2 = 1$, $x_3 = 0$, $x_4 = 0$, $x_5 = 0$
Hidden outputs: $V_1 = V_2 = V_3 = 0.54983$
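To verify the arithmetic of this step-by-step example, here is a small NumPy sketch (names mine) that recomputes the quantities above for the first pattern with $\eta = 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.full((3, 5), 0.1)                    # all w_jk = 0.1
W = np.full((2, 3), 0.1)                    # all W_ij = 0.1
x1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0])    # first pattern
t1 = np.array([1.0, 0.0])

V = sigmoid(w @ x1)                         # ~[0.54983 0.54983 0.54983]
o = sigmoid(W @ V)                          # ~[0.54114 0.54114]

delta_out = (t1 - o) * o * (1.0 - o)        # ~[ 0.11394 -0.13437]
delta_hid = V * (1.0 - V) * (W.T @ delta_out)
# each ~ -5.06e-04 (the slides report -5.0568e-04, computed from rounded deltas)

# Stochastic updates with eta = 1:
W_new = W + np.outer(delta_out, V)          # Delta W_ij = delta_i V_j
w_new = w + np.outer(delta_hid, x1)         # Delta w_jk = delta_j x_k
print(delta_out, delta_hid)
```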

Radial Basis-Function Networks

RBF networks train rapidly: there are no local-minima problems and no oscillation. They are universal approximators: they can approximate any continuous function, a property they share with feedforward networks having a hidden layer of nonlinear neurons (units). Disadvantage: after training they are generally slower to use.

Gaussian response function

Each hidden layer unit computes

$$h_i = e^{-\frac{D_i^2}{2\sigma^2}}$$

where $x$ is the input vector, $u_i$ is the weight (center) vector of hidden layer neuron $i$, and

$$D_i^2 = (x - u_i)^T (x - u_i).$$

The output neuron produces the linear weighted sum

$$o = \sum_{i=0}^{n} w_i h_i.$$

The weights have to be adapted by LMS:

$$\Delta w_i = \eta\,(t - o)\, h_i.$$
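A minimal sketch of this hidden layer and of the LMS rule for the output weights (NumPy; the per-unit widths and the bias unit $h_0 = 1$ are my reading of the $i = 0$ term in the sum, not spelled out in the slides):

```python
import numpy as np

def rbf_hidden(x, U, sigma):
    """Gaussian responses h_i = exp(-D_i^2 / (2 sigma_i^2)).
    U: (n_hidden, dim) array of centers u_i, sigma: (n_hidden,) widths."""
    D2 = np.sum((U - x) ** 2, axis=1)        # D_i^2 = (x - u_i)^T (x - u_i)
    return np.exp(-D2 / (2.0 * sigma ** 2))

def lms_step(x, t, U, sigma, wts, eta=0.1):
    """One LMS update of the linear output weights: Delta w_i = eta (t - o) h_i."""
    h = rbf_hidden(x, U, sigma)
    h = np.concatenate(([1.0], h))           # h_0 = 1 acts as a bias term (assumption)
    o = wts @ h                              # o = sum_i w_i h_i
    wts_new = wts + eta * (t - o) * h
    return o, wts_new
```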

The operation of the hidden layer: for a one-dimensional input the response is $h = e^{-\frac{(x-u)^2}{2\sigma^2}}$, a Gaussian bump centered at $u$; for a two-dimensional input it is the corresponding radially symmetric Gaussian surface centered at $u$.

Every hidden neuron has a receptive field defined by the basis function: at $x = u$ the output is maximal, and the output drops as $x$ deviates from $u$. The output has a significant response to the input $x$ only over a range of values of $x$ called the receptive field. The size of the receptive field is defined by $\sigma$; $u$ may be called the mean and $\sigma$ the standard deviation, and the function is radially symmetric around the mean $u$.

Location of centers u

The location of the receptive fields is critical. Apply clustering to the training set: each determined cluster center would correspond to a center $u$ of a receptive field of a hidden neuron.
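As a sketch of this clustering step, a plain NumPy k-means could pick the centers (the number of centers and the iteration count below are arbitrary illustration choices, not prescribed by the slides):

```python
import numpy as np

def kmeans_centers(X, n_centers, n_iter=100, seed=0):
    """Pick RBF centers u_i as k-means cluster centers of the training inputs X."""
    rng = np.random.default_rng(seed)
    U = X[rng.choice(len(X), n_centers, replace=False)].astype(float)  # initial centers
    for _ in range(n_iter):
        # Assign each training point to its nearest center.
        d2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for i in range(n_centers):
            pts = X[labels == i]
            if len(pts) > 0:
                U[i] = pts.mean(axis=0)
    return U
```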

Determining σ

The objective is to cover the input space with receptive fields as uniformly as possible. If the spacing between centers is not uniform, it may be necessary for each hidden layer neuron to have its own σ. For hidden layer neurons whose centers are widely separated from the others, σ must be large enough to cover the gap. The following heuristic will perform well in practice: for each hidden layer neuron $i$, find the RMS distance between $u_i$ and the centers $c_k$ of its $N$ nearest neighbors,

$$\mathrm{RMS}_i = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\frac{1}{n}\sum_{l=1}^{n}\left(u_{il} - c_{kl}\right)^2},$$

and assign this value to $\sigma_i$.
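A sketch of this heuristic in NumPy (it implements the RMS formula as reconstructed above, so treat the exact normalization as an assumption):

```python
import numpy as np

def rbf_widths(U, n_neighbors=2):
    """sigma_i = RMS distance between center u_i and its N nearest neighboring centers."""
    n_centers, dim = U.shape
    sigmas = np.empty(n_centers)
    for i in range(n_centers):
        d2 = ((U - U[i]) ** 2).sum(axis=1)            # squared distances to all centers
        nearest = np.argsort(d2)[1:n_neighbors + 1]   # skip the center itself
        # Average over the N neighbors and the n coordinates, then take the square root.
        sigmas[i] = np.sqrt(((U[nearest] - U[i]) ** 2).mean())
    return sigmas
```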


Why does an RBF network work?

The hidden layer applies a nonlinear transformation from the input space to the hidden space. In the hidden space a linear discrimination can be performed.

Summary: Back-Propagation; Stochastic Back-Propagation Algorithm; Step by Step Example; Radial Basis-Function Networks; Gaussian response function; Location of center u; Determining sigma; Why does the RBF network work.

Bibliography

Wasserman, P. D., Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993.
Haykin, S., Neural Networks, Second edition, Prentice Hall, 1999.

Support Vector Machines