Lecture on Neural Networks - Radial Basis Function (RBF) Networks


Lecture on Neural Networks - Radial Basis Function (RBF) Networks, SS 2004. Holger Fröhlich (adapted from a lecture by S. Kaushik¹), Lehrstuhl Rechnerarchitektur (Chair of Computer Architecture), Prof. Dr. A. Zell. ¹www.cse.iitd.ernet.in/~saroj

Radial Basis Function (RBF) = non-increasing function of the distance between two input vectors. RBFs represent local receptors.

The hidden layer uses neurons with RBF activation functions describing local receptors; the output node combines the outputs of the hidden neurons linearly.

[Figure: the output at a red query point is interpolated from three green vectors with weights $w_1$, $w_2$, $w_3$; each vector contributes according to its weight and its distance from the red point.]

RBF Architecture

[Figure: network with input nodes $x_1, \dots, x_d$, one hidden layer of RBF units $\varphi$, and a single output $f(x_1, \dots, x_d)$ formed with weights $w_1, \dots, w_m$.]

- One hidden layer with RBF activation functions.
- Output layer with linear activation function.

$f(\mathbf{x}) = w_1\,\varphi(\|\mathbf{x}-\mathbf{t}_1\|) + \dots + w_m\,\varphi(\|\mathbf{x}-\mathbf{t}_m\|)$

where $\|\mathbf{x}-\mathbf{t}\|$ is the distance of $\mathbf{x} = (x_1, \dots, x_d)$ from the center vector $\mathbf{t}$.
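To make the architecture concrete, here is a minimal NumPy sketch of this forward pass, assuming Gaussian hidden units with a single shared spread (the function name and parameter defaults are illustrative, not from the slides):

```python
import numpy as np

def rbf_forward(X, centers, weights, sigma=1.0):
    """Output of an RBF network: f(x) = sum_j w_j * phi(||x - t_j||).

    X       : (n, d) array of input vectors
    centers : (m, d) array of centers t_1, ..., t_m
    weights : (m,)   linear output weights w_1, ..., w_m
    sigma   : shared spread of the Gaussian hidden units
    """
    # distances ||x_i - t_j|| between every input and every center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # hidden layer: Gaussian RBF activations
    Phi = np.exp(-dists**2 / (2 * sigma**2))
    # output layer: linear combination of the hidden activations
    return Phi @ weights
```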

Hidden Neuron Model

Hidden units use radial basis functions $\varphi_\sigma(\|\mathbf{x}-\mathbf{t}\|)$: the output depends on the distance of the input $\mathbf{x}$ from the center $\mathbf{t}$.

[Figure: a hidden neuron with inputs $x_1, \dots, x_d$ and output $\varphi_\sigma(\|\mathbf{x}-\mathbf{t}\|)$.]

- $\mathbf{t}$ is called the center.
- $\sigma$ is called the spread.
- Center and spread are parameters.

Hidden Neurons

A hidden neuron is more sensitive to data points near its center. For a Gaussian RBF this sensitivity may be tuned by adjusting the spread $\sigma$: a larger spread implies less sensitivity.

Biological example: cochlear stereocilia cells (in our ears) have locally tuned frequency responses.

Gaussian RBF φ

[Figure: Gaussian bells centered at the center $\mathbf{t}$; $\sigma$ is a measure of how spread out the curve is: a large $\sigma$ gives a wide, flat curve, a small $\sigma$ a narrow, peaked one.]

Types of φ

With $r = \|\mathbf{x}-\mathbf{t}\|$:

- Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, $c > 0$
- Inverse multiquadrics: $\varphi(r) = (r^2 + c^2)^{-1/2}$, $c > 0$
- Gaussian functions (most used): $\varphi(r) = \exp\left(-\dfrac{r^2}{2\sigma^2}\right)$, $\sigma > 0$
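The three basis function types written out as small NumPy helpers (the default parameter values c=1.0 and sigma=1.0 are arbitrary and only for illustration):

```python
import numpy as np

def multiquadric(r, c=1.0):
    # phi(r) = (r^2 + c^2)^(1/2), c > 0
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # phi(r) = (r^2 + c^2)^(-1/2), c > 0
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2)), sigma > 0
    return np.exp(-r**2 / (2 * sigma**2))
```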

Example: the XOR problem

Input space: the four points $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$ in the $(x_1, x_2)$ plane. Output space: $\{0, 1\}$.

Construct an RBF pattern classifier such that:

- $(0,0)$ and $(1,1)$ are mapped to 0 (class $C_1$),
- $(1,0)$ and $(0,1)$ are mapped to 1 (class $C_2$).

Example: the XOR problem (continued)

In the feature (hidden layer) space use

$\varphi_1(\mathbf{x}) = e^{-\|\mathbf{x}-\mathbf{t}_1\|^2}$, $\varphi_2(\mathbf{x}) = e^{-\|\mathbf{x}-\mathbf{t}_2\|^2}$ with $\mathbf{t}_1 = (1,1)$ and $\mathbf{t}_2 = (0,0)$.

[Figure: the four input points plotted in the $(\varphi_1, \varphi_2)$ plane; $(0,0)$ and $(1,1)$ lie on one side of a linear decision boundary, $(0,1)$ and $(1,0)$ on the other.]

When mapped into the feature space $\langle \varphi_1, \varphi_2 \rangle$ (hidden layer), $C_1$ and $C_2$ become linearly separable. So a linear classifier with $\varphi_1(\mathbf{x})$ and $\varphi_2(\mathbf{x})$ as inputs can be used to solve the XOR problem.

RBF NN for the XOR problem

$\varphi_1(\mathbf{x}) = e^{-\|\mathbf{x}-\mathbf{t}_1\|^2}$, $\varphi_2(\mathbf{x}) = e^{-\|\mathbf{x}-\mathbf{t}_2\|^2}$ with $\mathbf{t}_1 = (1,1)$ and $\mathbf{t}_2 = (0,0)$.

[Figure: network with inputs $x_1, x_2$, two hidden units centered at $\mathbf{t}_1$ and $\mathbf{t}_2$, output weights $-1$, $-1$ and bias $+1$.]

$f(\mathbf{x}) = -e^{-\|\mathbf{x}-\mathbf{t}_1\|^2} - e^{-\|\mathbf{x}-\mathbf{t}_2\|^2} + 1$

If $f(\mathbf{x}) > 0$ then class 1, otherwise class 0.
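A quick numerical check of this XOR classifier, using the centers, weights and threshold rule reconstructed above (a sketch, not part of the slides):

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def f(x):
    phi1 = np.exp(-np.sum((x - t1) ** 2))   # phi_1(x) = exp(-||x - t1||^2)
    phi2 = np.exp(-np.sum((x - t2) ** 2))   # phi_2(x) = exp(-||x - t2||^2)
    return -phi1 - phi2 + 1.0

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(p, dtype=float)
    print(p, round(f(x), 3), "-> class", 1 if f(x) > 0 else 0)
# (0,0) and (1,1) give f(x) < 0 (class 0); (0,1) and (1,0) give f(x) > 0 (class 1)
```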

RBF network parameters

What do we have to learn for an RBF NN with a given architecture?

- the centers of the RBF activation functions,
- the spreads of the Gaussian RBF activation functions,
- the weights from the hidden to the output layer.

Different learning algorithms may be used for learning the RBF network parameters:

- (pseudo-)inverse method,
- regularization,
- iterative training.

Learning the weights (1)

Direct computation of the weights: for an example $(\mathbf{x}_i, y_i)$ consider the output of the network

$f(\mathbf{x}_i) = w_1\,\varphi(\|\mathbf{x}_i-\mathbf{t}_1\|) + \dots + w_N\,\varphi(\|\mathbf{x}_i-\mathbf{t}_N\|)$

We would like $f(\mathbf{x}_i) = y_i$ for each of the $N$ examples, that is

$w_1\,\varphi(\|\mathbf{x}_i-\mathbf{t}_1\|) + \dots + w_N\,\varphi(\|\mathbf{x}_i-\mathbf{t}_N\|) = y_i$

Learning the weights (2)

This can be rewritten in matrix form for one example,

$\big[\varphi(\|\mathbf{x}_i-\mathbf{t}_1\|)\;\dots\;\varphi(\|\mathbf{x}_i-\mathbf{t}_N\|)\big]\,[w_1 \dots w_N]^T = y_i$

and for all $N$ examples at the same time,

$\begin{bmatrix} \varphi(\|\mathbf{x}_1-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_1-\mathbf{t}_N\|) \\ \vdots & & \vdots \\ \varphi(\|\mathbf{x}_N-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_N-\mathbf{t}_N\|) \end{bmatrix} [w_1 \dots w_N]^T = [y_1 \dots y_N]^T$

Learning the weights (3)

Let

$\Phi = \begin{bmatrix} \varphi(\|\mathbf{x}_1-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_1-\mathbf{t}_N\|) \\ \vdots & & \vdots \\ \varphi(\|\mathbf{x}_N-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_N-\mathbf{t}_N\|) \end{bmatrix}$

then we can write

$\Phi \begin{bmatrix} w_1 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$

and hence

$[w_1 \dots w_N]^T = \Phi^{-1}\,[y_1 \dots y_N]^T$
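A sketch of this direct computation: with one Gaussian unit per training example the matrix Φ is square, and the weights follow from a single linear solve (the toy data, the spread of 0.5 and the helper name are assumptions for illustration):

```python
import numpy as np

def gaussian_design_matrix(X, centers, sigma):
    """Phi[i, j] = phi(||x_i - t_j||) with Gaussian basis functions."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(10, 2))       # N = 10 training inputs
y = np.sin(X[:, 0]) + np.cos(X[:, 1])      # toy targets

Phi = gaussian_design_matrix(X, centers=X, sigma=0.5)   # centers = training points
w = np.linalg.solve(Phi, y)                # w = Phi^{-1} y
print(np.allclose(Phi @ w, y))             # True: all N examples are interpolated exactly
```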

Approximation

Idea: restrict to $m < N$ basis functions => better generalization capability, fewer weights and better running time.

$\Phi = \begin{bmatrix} \varphi(\|\mathbf{x}_1-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_1-\mathbf{t}_m\|) \\ \vdots & & \vdots \\ \varphi(\|\mathbf{x}_N-\mathbf{t}_1\|) & \dots & \varphi(\|\mathbf{x}_N-\mathbf{t}_m\|) \end{bmatrix}$

Approximation (2)

We have

$\Phi \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$

and hence

$[w_1 \dots w_m]^T = \Phi^{+}\,[y_1 \dots y_N]^T$

where $\Phi^{+} = (\Phi^T \Phi)^{-1} \Phi^T$ is the Moore-Penrose pseudoinverse.
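The corresponding least-squares sketch with m < N centers chosen from the data (toy data and the spread value are again illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))                       # N = 100 examples
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
centers = X[rng.choice(len(X), size=10, replace=False)]     # m = 10 < N centers

d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
Phi = np.exp(-d**2 / (2 * 0.5**2))                          # N x m design matrix

w = np.linalg.pinv(Phi) @ y      # Moore-Penrose pseudoinverse: w = (Phi^T Phi)^{-1} Phi^T y
print(np.mean((Phi @ w - y) ** 2))   # training mean squared error
```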

Solution by means of regularization [calculus of variations; Poggio & Girosi, 1989]

Idea: minimize the regularized risk

$R_{reg}[f] = \sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))^2 + \lambda\,\|Pf\|^2$

where $P$ is a differential operator. A possible choice is, e.g.,

$\|Pf\|^2 = \sum_{i,j=1}^{m} w_i w_j\,\varphi(\mathbf{t}_i, \mathbf{t}_j)$

Regularization (2)

With

$\Phi' = \begin{bmatrix} \varphi(\mathbf{t}_1,\mathbf{t}_1) & \dots & \varphi(\mathbf{t}_1,\mathbf{t}_m) \\ \vdots & & \vdots \\ \varphi(\mathbf{t}_m,\mathbf{t}_1) & \dots & \varphi(\mathbf{t}_m,\mathbf{t}_m) \end{bmatrix}$

we get

$R_{reg}[f] = \sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))^2 + \lambda\,\|Pf\|^2$

$= \sum_{i=1}^{N} \Big(y_i - \sum_{j=1}^{m} w_j\,\varphi(\mathbf{x}_i,\mathbf{t}_j)\Big)^2 + \lambda \sum_{i,j=1}^{m} w_i w_j\,\varphi(\mathbf{t}_i,\mathbf{t}_j)$

$= \sum_{i=1}^{N} y_i^2 - 2\sum_{i=1}^{N}\sum_{j=1}^{m} y_i w_j\,\varphi(\mathbf{x}_i,\mathbf{t}_j) + \sum_{i=1}^{N}\Big(\sum_{j=1}^{m} w_j\,\varphi(\mathbf{x}_i,\mathbf{t}_j)\Big)^2 + \lambda \sum_{i,j=1}^{m} w_i w_j\,\varphi(\mathbf{t}_i,\mathbf{t}_j)$

$= \mathbf{y}^T\mathbf{y} - 2\,\mathbf{w}^T\Phi^T\mathbf{y} + \mathbf{w}^T(\Phi^T\Phi)\,\mathbf{w} + \lambda\,\mathbf{w}^T\Phi'\,\mathbf{w}$

$= \mathbf{w}^T(\Phi^T\Phi + \lambda\Phi')\,\mathbf{w} - 2\,\mathbf{w}^T\Phi^T\mathbf{y} + \mathbf{y}^T\mathbf{y}$

Regularization (3)

Setting the gradient of $R_{reg}[f]$ with respect to $\mathbf{w}$ to zero gives

$(\Phi^T\Phi + \lambda\Phi')\,\mathbf{w} - \Phi^T\mathbf{y} = 0$

$(\Phi^T\Phi + \lambda\Phi')\,\mathbf{w} = \Phi^T\mathbf{y}$

$\mathbf{w} = (\Phi^T\Phi + \lambda\Phi')^{-1}\,\Phi^T\mathbf{y}$

With $\lambda = 0$ we get exactly the same solution as with the pseudoinverse method. $\lambda$ is a regularization constant which models the trade-off between model complexity and training error.
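A sketch of this regularized solution with Gaussian basis functions (the function name, the spread and the value of λ are illustrative assumptions; setting λ = 0 reduces it to the pseudoinverse solution):

```python
import numpy as np

def rbf_weights_regularized(X, y, centers, sigma=0.5, lam=1e-2):
    """w = (Phi^T Phi + lambda * Phi')^{-1} Phi^T y with Gaussian basis functions.

    Phi[i, j]  = phi(||x_i - t_j||)   (N x m design matrix)
    Phi'[i, j] = phi(||t_i - t_j||)   (m x m matrix over the centers)
    """
    def gaussian(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return np.exp(-d**2 / (2 * sigma**2))

    Phi = gaussian(X, centers)
    Phi_prime = gaussian(centers, centers)
    return np.linalg.solve(Phi.T @ Phi + lam * Phi_prime, Phi.T @ y)
```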

Computing the RBF spreads and centers

Centers:

1. Centers are chosen randomly from the training set, or
2. centers are computed using LVQ or some clustering algorithm (e.g. k-means).

Spreads are chosen by normalization:

$\sigma = \dfrac{d_{\max}}{\sqrt{2m}} = \dfrac{\text{maximum distance between any two centers}}{\sqrt{2 \cdot \text{number of centers}}}$

Then the activation function of hidden neuron $i$ becomes

$\varphi(\|\mathbf{x}-\mathbf{t}_i\|) = \exp\left(-\dfrac{m}{d_{\max}^2}\,\|\mathbf{x}-\mathbf{t}_i\|^2\right)$
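A small sketch of this spread heuristic (the random center selection and the toy data are assumptions for illustration):

```python
import numpy as np

def normalized_spread(centers):
    """sigma = d_max / sqrt(2 m), with d_max the maximum distance between
    any two centers and m the number of centers."""
    m = len(centers)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return d.max() / np.sqrt(2 * m)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))                      # toy training set
centers = X[rng.choice(len(X), size=8, replace=False)]     # centers chosen randomly from the data
sigma = normalized_spread(centers)
```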

LVQ clustering

A clustering algorithm for finding the centers:

1. Initialization: choose $\mathbf{t}_k(0)$ randomly, $k = 1, \dots, m$.
2. Sampling: draw $\mathbf{x}(n)$ from the input space.
3. Similarity matching: find the index of the center closest to $\mathbf{x}(n)$: $k(\mathbf{x}) = \arg\min_k \|\mathbf{x}(n) - \mathbf{t}_k(n)\|$.
4. Updating: adjust the centers:
   $\mathbf{t}_k(n+1) = \begin{cases} \mathbf{t}_k(n) + \eta\,[\mathbf{x}(n) - \mathbf{t}_k(n)] & \text{if } k = k(\mathbf{x}) \\ \mathbf{t}_k(n) - \eta\,[\mathbf{x}(n) - \mathbf{t}_k(n)] & \text{otherwise} \end{cases}$
5. Continuation: increment $n$ by 1, go to step 2 and continue until no noticeable changes of the centers occur.
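A sketch of this center-placement procedure; the decaying learning rate and the fixed iteration budget (instead of the "no noticeable changes" stopping rule) are assumptions added for illustration:

```python
import numpy as np

def lvq_centers(X, m, eta0=0.1, n_iter=2000, seed=0):
    """Competitive placement of m centers following the update rule above:
    the winning center moves toward the sample, the other centers move away."""
    rng = np.random.default_rng(seed)
    t = X[rng.choice(len(X), size=m, replace=False)].astype(float)     # 1. initialization
    for n in range(n_iter):
        eta = eta0 / (1.0 + n / 100.0)                                 # decaying learning rate (assumption)
        x = X[rng.integers(len(X))]                                    # 2. sampling
        k_win = int(np.argmin(np.linalg.norm(x - t, axis=1)))          # 3. similarity matching
        for k in range(m):                                             # 4. updating
            step = eta * (x - t[k])
            t[k] += step if k == k_win else -step
    return t                                                           # 5. stop after a fixed number of steps
```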

Iterative Training of RBF networks

Apply the gradient descent method for finding centers, spreads and weights by minimizing the (instantaneous) squared error

$E = \frac{1}{2}\sum_{i=1}^{N} (f(\mathbf{x}_i) - y_i)^2$

Updates:

- centers: $\Delta \mathbf{t}_j = -\eta_t\,\dfrac{\partial E}{\partial \mathbf{t}_j}$
- spreads: $\Delta \sigma_j = -\eta_\sigma\,\dfrac{\partial E}{\partial \sigma_j}$
- weights: $\Delta w_j = -\eta_w\,\dfrac{\partial E}{\partial w_j}$

Iterative Training with Gaussian RBFs

$\dfrac{\partial E}{\partial w_j} = -\sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))\,\varphi(\|\mathbf{x}_i-\mathbf{t}_j\|)$

$\dfrac{\partial E}{\partial \mathbf{t}_j} = -\sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))\,w_j\,\varphi(\|\mathbf{x}_i-\mathbf{t}_j\|)\,\dfrac{\mathbf{x}_i-\mathbf{t}_j}{\sigma_j^2}$

$\dfrac{\partial E}{\partial \sigma_j} = -\sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))\,w_j\,\varphi(\|\mathbf{x}_i-\mathbf{t}_j\|)\,\dfrac{\|\mathbf{x}_i-\mathbf{t}_j\|^2}{\sigma_j^3}$

This is for batch mode, but online training is also possible. More than one output neuron is possible as well.
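A batch-mode sketch of one such gradient-descent step for a Gaussian RBF network (the learning rates and the function name are illustrative assumptions):

```python
import numpy as np

def rbf_gradient_step(X, y, centers, sigmas, weights, eta=(0.01, 0.01, 0.01)):
    """One batch gradient-descent step on E = 1/2 * sum_i (f(x_i) - y_i)^2."""
    eta_t, eta_s, eta_w = eta
    diff = X[:, None, :] - centers[None, :, :]          # (N, m, d): x_i - t_j
    sq = np.sum(diff**2, axis=2)                        # ||x_i - t_j||^2
    Phi = np.exp(-sq / (2 * sigmas**2))                 # (N, m) hidden activations
    err = y - Phi @ weights                             # (N,)   y_i - f(x_i)

    grad_w = -(Phi.T @ err)                                                        # dE/dw_j
    grad_t = -np.einsum('i,ij,ijk->jk', err, Phi, diff) * (weights / sigmas**2)[:, None]  # dE/dt_j
    grad_s = -(err @ (Phi * sq)) * weights / sigmas**3                             # dE/dsigma_j

    return (centers - eta_t * grad_t,
            sigmas - eta_s * grad_s,
            weights - eta_w * grad_w)
```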

Possible extension: Hyper Basis Function (HBF) networks

Up to now: linear activation function in the output unit. Now: a sigmoid activation function, plus addition of the weighted input vector to the output (realized by shortcut connections from the input to the output units):

$f(\mathbf{x}) = \varsigma\left(\sum_{j=1}^{m} w_j\,\varphi(\|\mathbf{x}-\mathbf{t}_j\|) + \sum_{k=1}^{d} \omega_k x_k + \theta\right)$

with $\varsigma$ being a sigmoidal function.
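A minimal sketch of this HBF output unit, using the logistic function as one possible choice of ς (the names and the single shared spread are assumptions):

```python
import numpy as np

def hbf_forward(x, centers, w, omega, theta, sigma=1.0):
    """f(x) = sigmoid( sum_j w_j phi(||x - t_j||) + sum_k omega_k x_k + theta )."""
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * sigma**2))  # RBF part
    z = w @ phi + omega @ x + theta                                     # plus shortcut connections and bias
    return 1.0 / (1.0 + np.exp(-z))                                     # logistic sigmoid
```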

Comparison with MLP networks

RBF networks are used for regression and for performing complex (non-linear) pattern classification tasks.

Comparison between RBF networks and MLPs:

- Both are examples of non-linear layered feed-forward networks.
- Both are universal approximators.

Comparison with MLPs

Architecture:

- RBF networks (usually) have a single hidden layer.
- MLP networks may have more hidden layers.

Neuron model:

- In an RBF network the neuron model of the hidden neurons differs from that of the output nodes. Typically, in an MLP hidden and output neurons share a common neuron model.
- The hidden layer of an RBF network is non-linear, its output layer is linear. Hidden and output layers of an MLP are usually non-linear.

Comparison with MLPs (2)

Activation functions:

- The argument of the activation function of each hidden neuron in an RBF NN is the Euclidean distance between the input vector and the center of that unit.
- The argument of the activation function of each hidden neuron in an MLP is the inner product of the input vector and the synaptic weight vector of that neuron.

Approximation:

- RBF NNs using Gaussian functions construct local approximations to a non-linear input-output mapping.
- MLPs construct global approximations to a non-linear input-output mapping.