Radial Basis Function (RBF) Networks

CSE 5526: Introduction to Neural Networks. Radial Basis Function (RBF) Networks.

Function approximation. We have been using MLPs as pattern classifiers, but in general they are function approximators, depending on the output layer nonlinearity and the error function being minimized. As function approximators, MLPs are nonlinear, semiparametric, and universal.

Function approximation. Radial basis function (RBF) networks are similar function approximators: also nonlinear, semiparametric, and universal. They can likewise be visualized as a layered network of nodes. RBF nets are easier to train than MLPs and do not require backpropagation, but they do not necessarily find an optimal solution.

RBF net illustration (figure).

Function approximation background. Before getting into RBF networks, let's discuss approximating scalar functions of a single variable. The Weierstrass approximation theorem states that any continuous real function on an interval can be approximated arbitrarily well by polynomials. The Taylor expansion approximates any differentiable function by a polynomial in the neighborhood of a point. The Fourier series gives a way of approximating any periodic function by a sum of sines and cosines.

Linear projection. Approximate a function $f(x)$ by a linear combination of simpler functions: $F(x) = \sum_j w_j \phi_j(x)$. If the weights $w_j$ can be chosen so that the approximation error is arbitrarily small for any function $f(x)$ over the domain of interest, then $\{\phi_j\}$ has the property of universal approximation, or $\{\phi_j\}$ is complete.
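
A minimal sketch (not from the lecture) of this idea in code: fit the weights $w_j$ of a linear combination of basis functions to samples of a target function by ordinary least squares. The monomial basis and the target $\sin(\pi x)$ are illustrative assumptions, not choices from the slides.

```python
import numpy as np

def design_matrix(x, n_basis=6):
    """Columns are the basis functions phi_j(x) = x**j, j = 0..n_basis-1."""
    return np.column_stack([x**j for j in range(n_basis)])

x = np.linspace(-1.0, 1.0, 50)
f = np.sin(np.pi * x)                         # target function f(x)

Phi = design_matrix(x)                        # N x J matrix of phi_j(x_i)
w, *_ = np.linalg.lstsq(Phi, f, rcond=None)   # weights minimizing ||Phi w - f||^2

F = Phi @ w                                   # approximation F(x) = sum_j w_j phi_j(x)
print("max |F(x) - f(x)| =", float(np.max(np.abs(F - f))))
```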

Example of an incomplete basis: sinc. $\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$, with basis functions $\phi_j(x) = \mathrm{sinc}(x - \mu_j)$. Can approximate any smooth function.

Example of an orthogonal complete basis: sinusoids. $\phi_{2n}(x) = \sin(2\pi n x)$, $\phi_{2n+1}(x) = \cos(2\pi n x)$. Complete on the interval [0,1].

Example of an orthogonal complete basis: Chebyshev polynomials. $T_0(x) = 1$, $T_1(x) = x$, and $T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$, giving $T_2(x) = 2x^2 - 1$, $T_3(x) = 4x^3 - 3x$, etc. Complete on the interval [-1,1]. (Figure: "Chebyshev Polynomials of the 1st Kind (n=0-5, x=(-1,1))" by Inductiveload, own work, public domain, via Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Chebyshev_polynomials_of_the_1st_Kind_(n%3D0-5,_x%3D(-1,1)).svg)
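
As a small illustration (not from the lecture), the recurrence above can be used directly to evaluate the Chebyshev polynomials at a set of points:

```python
import numpy as np

def chebyshev_values(x, n_max):
    """Return [T_0(x), ..., T_{n_max}(x)] via T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)."""
    x = np.asarray(x, dtype=float)
    T = [np.ones_like(x), x.copy()]           # T_0 and T_1
    for n in range(1, n_max):
        T.append(2.0 * x * T[n] - T[n - 1])   # the three-term recurrence
    return T[: n_max + 1]

x = np.linspace(-1.0, 1.0, 5)
for n, Tn in enumerate(chebyshev_values(x, 3)):
    print(f"T_{n}(x) =", np.round(Tn, 3))
```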

Radial basis functions. A radial basis function (RBF) is a basis function of the form $\phi_j(x) = \phi(\|x - \mu_j\|)$, where $\phi(r)$ is positive with a monotonic derivative for $r > 0$. Consider a Gaussian RBF: $\phi_j(x) = \exp\left(-\frac{1}{2\sigma^2}\|x - x_j\|^2\right) = G(\|x - x_j\|)$, a local basis function that falls off from the center. (Figure: plot of $G(\|x - x_j\|)$ versus $x$, peaking at 1 at $x = x_j$.)

Radial basis functions (cont.). Thus approximation by Gaussian RBFs becomes $F(x) = \sum_j w_j G(\|x - x_j\|)$. Gaussians are universal approximators, i.e., they form a complete basis.
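
A minimal sketch (not from the lecture) of evaluating such an expansion; the centers, weights, and width below are placeholder values chosen only for illustration:

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    """N x K matrix of G(||x_i - mu_j||) = exp(-||x_i - mu_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def rbf_expansion(X, centers, weights, sigma):
    """F(x_i) = sum_j w_j G(||x_i - mu_j||) for each row x_i of X."""
    return gaussian_rbf(X, centers, sigma) @ weights

X = np.array([[0.0], [0.5], [1.0]])           # three 1-D inputs
centers = np.array([[0.0], [1.0]])            # two Gaussian centers
weights = np.array([1.0, -0.5])               # weights w_j
print(rbf_expansion(X, centers, weights, sigma=0.5))
```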

RBF net illustration (figure).

Remarks (cont.). Other RBFs exist, but we won't be using them. Multiquadrics: $\phi(x) = \sqrt{\|x\|^2 + c^2}$. Inverse multiquadrics: $\phi(x) = \frac{1}{\sqrt{\|x\|^2 + c^2}}$. Micchelli's theorem (1986): let $\{x_i\}$ be a set of $N$ distinct points and $\phi(\cdot)$ be an RBF; then the matrix with entries $\phi_{ij} = \phi(\|x_i - x_j\|)$ is non-singular.
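
A small numerical illustration (not from the lecture) of Micchelli's theorem: for distinct points, the interpolation matrix $\phi_{ij}$ has full rank, so exact interpolation weights exist. The random points and the value of $c$ are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2))               # 8 distinct points in 2-D
c = 1.0

d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Phi_mq = np.sqrt(d**2 + c**2)                 # multiquadric interpolation matrix
Phi_g = np.exp(-d**2 / 2.0)                   # Gaussian interpolation matrix (sigma = 1)

for name, Phi in [("multiquadric", Phi_mq), ("Gaussian", Phi_g)]:
    print(name, "rank:", np.linalg.matrix_rank(Phi), "of", len(X))
```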

Four questions to answer for RBF nets. If we want to use Gaussian RBFs to approximate a function specified by training data:
1. How do we choose the Gaussian centers?
2. How do we determine the Gaussian widths?
3. How do we determine the weights $w_j$?
4. How do we select the number of bases?

1. How do we choose the Gaussian centers? Easy way: select K data points at random. Potentially better way: unsupervised clustering, e.g., using the K-means algorithm.

K-means algorithm. Goal: divide the N input patterns into K clusters with minimum total variance. In other words, partition the patterns into K clusters $C_j$ so as to minimize the cost function $J = \sum_{j=1}^{K} \sum_{i \in C_j} \|x_i - u_j\|^2$, where $u_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$ is the mean (center) of cluster $j$.

K-means algorithm.
1. Choose a set of K cluster centers randomly from the input patterns.
2. Assign the N input patterns to the K clusters using the squared Euclidean distance rule: $x$ is assigned to $C_j$ if $\|x - u_j\|^2 \le \|x - u_i\|^2$ for all $i \ne j$.
3. Update the cluster centers: $u_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$.
4. If any cluster center changes, go to step 2; otherwise stop.
A minimal implementation sketch follows.
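
The sketch below (not the lecture's code) implements the four steps on toy 2-D data; the data and the choice of K are illustrative assumptions:

```python
import numpy as np

def kmeans(X, K, rng=None, max_iter=100):
    rng = np.random.default_rng(rng)
    # 1. Choose K cluster centers randomly from the input patterns
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # 2. Assign each pattern to the nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # 3. Update each center to the mean of its assigned patterns
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(K)])
        # 4. Stop when no center changes; otherwise repeat the assignment step
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
centers, labels = kmeans(X, K=2, rng=1)
print("centers:\n", np.round(centers, 2))
```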

K-means illustration: sequence of figures from Bishop (2006).

K-means cost function: figure from Bishop (2006).

K-means algorithm remarks. The K-means algorithm always converges, but only to a local minimum.

2. How do we determine the Gaussian widths? Once the cluster centers are determined, the width of each cluster can be set to its variance: $\sigma_j^2 = \frac{1}{|C_j|} \sum_{i \in C_j} \|x_i - u_j\|^2$. Remark: to simplify the RBF net design, all clusters can assume the same Gaussian width, $\sigma = \frac{d_{\max}}{\sqrt{2K}}$, where $d_{\max}$ is the maximum distance between the K cluster centers.
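
A minimal sketch (not the lecture's code) of both width rules. It assumes `X`, `centers`, and `labels` come from a K-means run such as the one sketched above:

```python
import numpy as np

def cluster_variances(X, centers, labels):
    """sigma_j^2 = mean squared distance of cluster j's points to its center."""
    return np.array([((X[labels == j] - centers[j]) ** 2).sum(axis=1).mean()
                     for j in range(len(centers))])

def shared_width(centers):
    """sigma = d_max / sqrt(2K), with d_max the largest inter-center distance."""
    K = len(centers)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return d.max() / np.sqrt(2.0 * K)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
print("per-cluster sigma^2:", np.round(cluster_variances(X, centers, labels), 3))
print("shared sigma:", round(float(shared_width(centers)), 3))
```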

3. How do we determine the weights $w_j$? With the hidden layer decided, weight training can be treated as a linear regression problem: $\Phi w = d$. This can be solved using the LMS algorithm; the textbook also discusses recursive least squares (RLS) solutions. It can also be solved in one shot using the pseudo-inverse: $w = \Phi^+ d = (\Phi^T \Phi)^{-1} \Phi^T d$. Note that a bias term needs to be included in $\Phi$.
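
A minimal sketch (not the lecture's code) of the one-shot solution: build $\Phi$ from Gaussian RBF activations plus a bias column, then solve $\Phi w = d$ by least squares, which is equivalent to applying the pseudo-inverse. The centers, width, and toy target below are illustrative assumptions:

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def solve_weights(X, d, centers, sigma):
    Phi = gaussian_rbf(X, centers, sigma)
    Phi = np.column_stack([Phi, np.ones(len(X))])   # append the bias column
    # lstsq gives the minimum-norm least-squares solution, matching w = Phi^+ d
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return w                                        # last entry is the bias weight

X = np.linspace(-1, 1, 20).reshape(-1, 1)
d = np.sin(np.pi * X).ravel()                       # desired outputs
centers = np.linspace(-1, 1, 5).reshape(-1, 1)
w = solve_weights(X, d, centers, sigma=0.4)
print("weights (last = bias):", np.round(w, 3))
```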

4. How do we select the number of bases? This is the same problem as selecting the size of an MLP for classification. The short answer: (cross-)validation. The long answer: by balancing bias and variance.

Bias and variance. Bias: training error, the difference between the desired output and the actual output for a particular training sample. Variance: generalization error, the difference between the function learned from a particular training sample and the function derived from all training samples. Two extreme cases: zero bias and zero variance. A good-sized model is one where both bias and variance are low.

RBF net training summary. To train:
1. Choose the Gaussian centers using K-means, etc.
2. Determine the Gaussian widths as the variance of each cluster, or using $d_{\max}$.
3. Determine the weights $w_j$ using linear regression.
Select the number of bases using (cross-)validation.

RBF net illustration (figure).

Comparison between RBF net and MLP. For RBF nets, the bases are local, while for an MLP the bases are global. Generally, more bases are needed for an RBF net than hidden units for an MLP. Training is more efficient for RBF nets.

XOR problem, again. RBF nets can also be applied to pattern classification problems. Revisiting the XOR problem, let $\phi_1(x) = \exp(-\|x - t_1\|^2)$ and $\phi_2(x) = \exp(-\|x - t_2\|^2)$, where $t_1 = (1,1)^T$ and $t_2 = (0,0)^T$.
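
A minimal sketch (not the lecture's code) that computes $\phi_1$ and $\phi_2$ for the four XOR inputs; in this feature space the two classes become linearly separable, so a single linear output unit suffices:

```python
import numpy as np

t1 = np.array([1.0, 1.0])
t2 = np.array([0.0, 0.0])

def features(x):
    """Map an input to (phi_1(x), phi_2(x)) using the Gaussian RBFs above."""
    x = np.asarray(x, dtype=float)
    return np.array([np.exp(-np.sum((x - t1) ** 2)),
                     np.exp(-np.sum((x - t2) ** 2))])

for x, label in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    phi1, phi2 = features(x)
    print(f"x={x}  class={label}  (phi_1, phi_2)=({phi1:.3f}, {phi2:.3f})")
```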

XOR problem (cont.) (figure).

RBF net on double moon data, d = -5 (figures).

RBF net on double moon data, d = -6 (figures).