Radial-Basis Function Networks

Michel Verleysen

Outline:
- Origin: Cover's theorem
- Interpolation problem
- Regularization theory
- Generalized RBFN
- Universal approximation
- Comparison with MLP
- RBFN = kernel regression
- Learning: centers, widths, multiplying factors
- Other forms

Origin: Cover's theorem

- Cover's theorem on the separability of patterns (1965).
- Points x_1, x_2, ..., x_P are assigned to two classes C_1 and C_2.
- φ-separability: there exists a weight vector w such that
  w^T \varphi(x) > 0 if x \in C_1
  w^T \varphi(x) < 0 if x \in C_2
  where the φ(x) are non-linear functions.
- If the dimension of the hidden space exceeds the dimension of the input space, the probability of separability gets closer to 1.
- Example: a quadratic mapping separates patterns that a linear one cannot (a small numerical illustration follows below).

Interpolation problem

- Given P points (x_i, t_i), with x_i ∈ R^d and t_i ∈ R, i = 1, ..., P:
- find f: R^d → R that satisfies f(x_i) = t_i, i = 1, ..., P.
- RBF technique (Powell, 1988):
  f(x) = \sum_{i=1}^{P} w_i \varphi(\|x - x_i\|)
- the φ are arbitrary non-linear functions (RBF);
- there are as many functions as data points;
- the centers are fixed at the known points x_i.
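As a small numerical illustration of Cover's idea (a sketch that is not part of the original slides; the points and the quadratic feature map are chosen purely for illustration): four XOR-like points are not linearly separable in R², but become linearly separable after mapping to a higher-dimensional space.

```python
import numpy as np

# XOR-like points: classes C1 = {(0,0), (1,1)}, C2 = {(0,1), (1,0)}
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
labels = np.array([1, 1, -1, -1])  # +1 for C1, -1 for C2

# Quadratic feature map phi(x) = (x1, x2, x1*x2, 1): higher-dimensional hidden space
Phi = np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1], np.ones(len(X))])

# A separating weight vector, found here by least squares on the targets
w, *_ = np.linalg.lstsq(Phi, labels.astype(float), rcond=None)
print(np.sign(Phi @ w))  # [ 1.  1. -1. -1.]: phi-separable, as the theorem suggests
```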

Interpolation problem (matrix form)

- The conditions f(x_k) = \sum_{l=1}^{P} w_l \varphi(\|x_k - x_l\|) = t_k can be written as

  \begin{pmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1P} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2P} \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{P1} & \varphi_{P2} & \cdots & \varphi_{PP} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_P \end{pmatrix} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_P \end{pmatrix}

  where \varphi_{kl} = \varphi(\|x_k - x_l\|).
- In matrix form: \Phi w = t, hence w = \Phi^{-1} t.
- Vital question: is Φ non-singular?

Micchelli's theorem

- If the points x_i are distinct, Φ is non-singular (regardless of the dimension of the input space).
- Valid for a large class of RBF functions, with r = \|x - x_i\|:
  multiquadrics: \varphi(r) = (r^2 + c^2)^{1/2}, c > 0 (non-localized function)
  inverse multiquadrics: \varphi(r) = (r^2 + c^2)^{-1/2}, c > 0 (localized function)
  Gaussian: \varphi(r) = \exp(-r^2 / (2\sigma^2)), σ > 0 (localized function)
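A minimal sketch of exact RBF interpolation as just described (the Gaussian φ and the width σ = 0.5 are arbitrary choices): build the P×P matrix Φ and solve Φw = t; by Micchelli's theorem Φ is non-singular as long as the x_i are distinct.

```python
import numpy as np

def gaussian_rbf(r, sigma=0.5):
    """Localized RBF phi(r) = exp(-r^2 / (2 sigma^2)); sigma is arbitrary here."""
    return np.exp(-r**2 / (2.0 * sigma**2))

# P distinct one-dimensional learning points (x_i, t_i)
x = np.linspace(0.0, 4.0, 9)
t = np.sin(x)

# Phi_kl = phi(||x_k - x_l||): P x P interpolation matrix
Phi = gaussian_rbf(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(Phi, t)           # w = Phi^{-1} t

# f(x) = sum_i w_i phi(||x - x_i||) reproduces the targets exactly
f = lambda xq: gaussian_rbf(np.abs(np.atleast_1d(xq)[:, None] - x[None, :])) @ w
assert np.allclose(f(x), t)
```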

Learning: an ill-posed problem

[Figure: training points in the (x, t) plane.]

- Necessity for regularization.
- Error criterion:
  E(f) = \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2 + \lambda \, C(f)
  (MSE term + regularization term).

Solution to the regularization problem

- Poggio & Girosi (1990): if C(f) is defined through a (problem-dependent) linear differential operator, the solution to
  \min_f \; \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2 + \lambda \, C(f)
  has the form
  f(x) = \sum_{i=1}^{P} w_i G(x, x_i)
  where G(·,·) is a Green's function and, with G_{kl} = G(x_k, x_l),
  w = (G + \lambda I)^{-1} t
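Continuing the interpolation sketch above, with the same Gaussian now playing the role of the Green's function (an assumption; the operator C determines the true G), regularization only changes the linear system:

```python
import numpy as np

def green(xa, xb, sigma=0.5):
    """Gaussian Green's function G(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    return np.exp(-(xa - xb)**2 / (2.0 * sigma**2))

x = np.linspace(0.0, 4.0, 9)
t = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(x.size)  # noisy targets

G = green(x[:, None], x[None, :])
lam = 1e-2                                        # regularization parameter lambda
w = np.linalg.solve(G + lam * np.eye(x.size), t)  # w = (G + lambda I)^{-1} t
# lam = 0 recovers the exact interpolator; lam > 0 trades fit for smoothness
```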

Interpolation vs. regularization

- Interpolation:
  f(x) = \sum_{i=1}^{P} w_i \varphi(\|x - x_i\|), with \Phi w = t
  - exact interpolator
  - possible RBF: \varphi(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))
- Regularization:
  f(x) = \sum_{i=1}^{P} w_i G(x, x_i), with w = (G + \lambda I)^{-1} t
  - exact interpolator
  - equal to the "interpolation" solution iff λ = 0
  - example of Green's function: G(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))
- In both cases: one RBF / Green's function for each learning pattern!

Generalized RBFN (GRBFN)

- An RBFN with as many radial functions as learning patterns is:
  - computationally too intensive (the inversion of a P×P matrix grows with P³);
  - ill-conditioned;
  - not easy to regularize (problem-specific).
- Hence the generalized RBFN approach. Typically K << P:
  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)
- Parameters: the centers c_k, the widths σ_k and the weights w_k.
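A minimal sketch of evaluating this generalized model (the function name and the values K = 3, d = 2 are illustrative assumptions):

```python
import numpy as np

def grbfn_predict(Xq, centers, sigmas, weights):
    """Generalized RBFN: f(x) = sum_k w_k exp(-||x - c_k||^2 / (2 sigma_k^2))."""
    # squared Euclidean distances between queries (N, d) and centers (K, d)
    d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigmas**2)) @ weights

# K = 3 radial units in d = 2 dimensions (all values illustrative)
centers = np.array([[0., 0.], [1., 0.], [0., 1.]])
sigmas = np.array([0.5, 0.5, 0.7])
weights = np.array([1.0, -0.5, 0.3])
print(grbfn_predict(np.array([[0.2, 0.1]]), centers, sigmas, weights))
```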

Radial-Basis Function Networks (RBFN)

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

[Figure: network diagram with inputs x_1, ..., x_d and a bias input x_0; 1st layer: K Gaussian units with centers c_j and widths σ_k; 2nd layer: linear combination with weights w_k.]

- Possibilities:
  - several outputs (common hidden layer);
  - a bias term (recommended; see the extensions below).

RBFN: universal approximation

- Park & Sandberg, 1991: for any continuous input-output mapping function f(x) there exists
  \hat{f}(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|)
  such that L_p(f(x), \hat{f}(x)) < \varepsilon, for any ε > 0 and p ∈ [1, ∞].
- The theorem is actually stronger (radial symmetry is not needed).
- K is not specified.
- Provides a theoretical basis for practical RBFN!

RBFN and kernel regression

- Non-linear regression model:
  t_i = f(x_i) + \varepsilon_i = y_i + \varepsilon_i, \quad i = 1, ..., P
- Estimation of f(x): average of t around x. More precisely:
  f(x) = E[y \mid x] = \int y \, f_{Y|X}(y \mid x) \, dy = \frac{\int y \, f_{X,Y}(x, y) \, dy}{f_X(x)}
- Estimates of f_{X,Y}(x, y) and f_X(x) are needed: the Parzen-Rosenblatt density estimator.

Parzen-Rosenblatt density estimator

- The estimator
  \hat{f}_X(x) = \frac{1}{P h^d} \sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right)
  with K(·) continuous, bounded, symmetric about the origin, with its maximum value at 0 and with unit integral, is consistent (asymptotically unbiased).
- Estimation of f_{X,Y}(x, y):
  \hat{f}_{X,Y}(x, y) = \frac{1}{P h^{d+1}} \sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right) K\left(\frac{y - y_i}{h}\right)
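A sketch of the estimator for d = 1, with a Gaussian kernel as one common choice of K(·) satisfying the stated conditions (the bandwidth h = 0.3 is an arbitrary choice):

```python
import numpy as np

def parzen_density(xq, samples, h=0.3):
    """Parzen-Rosenblatt estimate f_X(x) = 1/(P h^d) sum_i K((x - x_i)/h), d = 1."""
    K = lambda u: np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)  # Gaussian: unit integral
    u = (np.atleast_1d(xq)[:, None] - samples[None, :]) / h
    return K(u).sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(0)
samples = rng.standard_normal(500)               # P = 500 draws from N(0, 1)
print(parzen_density(np.array([0.0]), samples))  # close to the true density ~0.399
```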

RBFN and kernel regression (cont.)

- Substituting the two estimates:
  \hat{f}(x) = \frac{\int y \, \hat{f}_{X,Y}(x, y) \, dy}{\hat{f}_X(x)} = \frac{\sum_{i=1}^{P} y_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{P} K\left(\frac{x - x_i}{h}\right)}
- This is a weighted average of the y_i,
- called the Nadaraya-Watson estimator (1964),
- and is equivalent to a normalized RBFN in the unregularized context.

RBFN vs. MLP

- RBFN:
  - single hidden layer
  - non-linear hidden layer, linear output layer
  - argument of the hidden units: Euclidean norm
  - universal approximation property
  - local approximators
  - split learning
- MLP:
  - single or multiple hidden layers
  - non-linear hidden layer, linear or non-linear output layer
  - argument of the hidden units: scalar product
  - universal approximation property
  - global approximators
  - global learning
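The corresponding Nadaraya-Watson sketch, under the same Gaussian-kernel assumption; note that the kernel's normalization constant cancels in the ratio:

```python
import numpy as np

def nadaraya_watson(xq, x, y, h=0.2):
    """f(x) = sum_i y_i K((x - x_i)/h) / sum_i K((x - x_i)/h)."""
    u = (np.atleast_1d(xq)[:, None] - x[None, :]) / h
    K = np.exp(-u**2 / 2.0)              # unnormalized Gaussian: constant cancels
    return (K * y[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 4.0, 200))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)
print(nadaraya_watson(np.array([1.0, 2.0]), x, y))  # approx. sin(1), sin(2)
```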

RBFN: learning strategies

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

- Parameters to be determined: c_k, σ_k, w_k.
- Traditional learning strategy: split the computation into
  1. centers c_k;
  2. widths σ_k;
  3. weights w_k.
  (A sketch combining the three steps follows the weights slide below.)

RBFN: computation of the centers

- Idea: the centers c_k must reflect the (density) properties of the learning points x_i, which is a vector quantization problem:
  - centers selected at random (in the learning set);
  - competitive learning;
  - frequency-sensitive learning;
  - Kohonen maps.
- This phase only uses the x_i information, not the t_i.

RBFN: computation of the widths

- The universal approximation property remains valid with identical widths.
- In practice (limited learning set): variable widths σ_k.
- Idea: RBFN uses local clusters, so choose σ_k according to the standard deviation of the clusters.

RBFN: computation of the weights

  f(x) = \sum_{k=1}^{K} w_k \varphi(\|x - c_k\|), with \varphi(\|x - c_k\|) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)

- Once the c_k and σ_k are fixed, the \varphi(\|x - c_k\|) are constants: the problem becomes linear!
- The solution of the least square criterion
  E = \frac{1}{P} \sum_{i=1}^{P} \left(t_i - f(x_i)\right)^2
  leads to
  w = \Phi^{+} t = (\Phi^T \Phi)^{-1} \Phi^T t, where \Phi_{ik} = \varphi(\|x_i - c_k\|).
- In practice: use SVD!
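A sketch tying the three steps together, under assumptions: a plain Lloyd/k-means loop stands in for any of the vector-quantization options above, each width is the standard deviation of its cluster, and the weights come from numpy's lstsq (which uses an SVD internally, matching the slide's advice).

```python
import numpy as np

def fit_grbfn(X, t, K=5, n_iter=50, seed=0):
    """Split RBFN learning: 1) centers by vector quantization (k-means here),
    2) widths from the cluster spread, 3) weights by linear least squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):                                   # step 1: centers
        assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    # step 2: width of each unit = standard deviation of its cluster
    sigmas = np.array([X[assign == k].std() if np.any(assign == k) else 1.0
                       for k in range(K)]) + 1e-6
    # step 3: with centers and widths fixed, the problem is linear in w
    Phi = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
                 / (2.0 * sigmas**2))
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)               # w = Phi^+ t
    return centers, sigmas, w

rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
t = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)
centers, sigmas, w = fit_grbfn(X, t)
```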

RBFN: gradient descent

- The 3-step method sets the parameters of
  f(x) = \sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)
  with c_k and σ_k computed in an unsupervised way and w_k in a supervised way.
- Once c_k, σ_k, w_k have been set by the previous method, gradient descent on all parameters is possible.
- It brings some improvement, but consider:
  - learning speed;
  - local minima;
  - the risk of non-local basis functions;
  - etc.

More elaborate models

- Add constant and linear terms:
  f(x) = \sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right) + w'_0 + \sum_{j=1}^{d} w'_j x_j
  A good idea: it is very difficult to approximate a constant with kernels.
- Use a normalized RBFN:
  f(x) = \frac{\sum_{k=1}^{K} w_k \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right)}{\sum_{j=1}^{K} \exp\left(-\frac{\|x - c_j\|^2}{2\sigma_j^2}\right)}
  The basis functions are bounded in [0, 1] and can be interpreted as probability values (classification).
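A sketch of both extensions (function names and all numerical values are illustrative):

```python
import numpy as np

def rbf_activations(Xq, centers, sigmas):
    """phi_k(x) = exp(-||x - c_k||^2 / (2 sigma_k^2)) for queries Xq of shape (N, d)."""
    d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigmas**2))

def predict_with_linear_term(Xq, centers, sigmas, w, w_lin, w0):
    """f(x) = sum_k w_k phi_k(x) + w'_0 + sum_j w'_j x_j."""
    return rbf_activations(Xq, centers, sigmas) @ w + Xq @ w_lin + w0

def predict_normalized(Xq, centers, sigmas, w):
    """Normalized RBFN: each activation is divided by the sum over all units,
    so the basis functions are bounded in [0, 1] and sum to one."""
    Phi = rbf_activations(Xq, centers, sigmas)
    return (Phi / Phi.sum(axis=1, keepdims=True)) @ w

centers = np.array([[0.0], [1.0]])   # K = 2 units in d = 1 (illustrative)
sigmas = np.array([0.5, 0.5])
print(predict_normalized(np.array([[0.25]]), centers, sigmas, np.array([1.0, 2.0])))
```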

Back to the widths

- Rule of thumb: choose σ_k according to the standard deviation of the clusters.
- In the literature:
  - \sigma = d_{max} / \sqrt{2K}, where d_max is the maximum distance between centroids [1];
  - \sigma_k = \frac{1}{p} \sum_{j=1}^{p} \|c_k - c_j\|, where the index j scans the p nearest centroids to c_k [2];
  - \sigma_k = r \min_j \|c_k - c_j\|, where r is an overlap constant [3];
  - ...

[1] S. Haykin, "Neural Networks - a Comprehensive Foundation", Prentice-Hall Inc., second edition, 1999.
[2] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units", Neural Computation, pp. 281-294, 1989.
[3] A. Saha and J. D. Keeler, "Algorithms for Better Representation and Faster Learning in Radial Basis Function Networks", in Advances in Neural Information Processing Systems, edited by David S. Touretzky, pp. 482-489, 1989.

Basic example

- Approximation of f(x) = 1 with a d-dimensional RBFN.
- In theory: identical weights w_k.
- Experimentally: side effects, so only the middle of the domain is taken into account.

[Figure: error versus width.]
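The three rules as a sketch (p = 2 and r = 1.0 are illustrative defaults; the references [1]-[3] above define the heuristics):

```python
import numpy as np

def widths_from_rules(centers, p=2, r=1.0):
    """Width heuristics [1]-[3]: d_max/sqrt(2K); mean distance to the p nearest
    centroids; r times the distance to the nearest other centroid."""
    K = len(centers)
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    sigma_haykin = D.max() / np.sqrt(2.0 * K)   # [1]: one sigma for all units
    D_sorted = np.sort(D, axis=1)[:, 1:]        # drop the zero self-distance
    sigma_moody = D_sorted[:, :p].mean(axis=1)  # [2]: p nearest centroids
    sigma_saha = r * D_sorted[:, 0]             # [3]: nearest centroid, overlap r
    return sigma_haykin, sigma_moody, sigma_saha

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(widths_from_rules(centers))
```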

Basic example: errors vs. space dimension

[Figure: error versus the input space dimension.]

Basic example: local decomposition?

[Figure.]

Multiple local minima in the error curve

- Choose the first minimum of the error curve to preserve the locality of the clusters.
- The first local minimum is usually less sensitive to variability.

Some concluding comments

- RBFN: easy learning (compared to MLP), which is important in a cross-validation scheme!
- There are many RBFN models,
- and even more RBFN learning schemes.
- The results are not very sensitive to the unsupervised part of the learning (c_k, σ_k).
- An a priori (problem-dependent) choice of the widths σ_k remains open work.

Sources and references

- Most of the basic concepts developed in these slides come from the excellent book:
  "Neural Networks - a Comprehensive Foundation", S. Haykin, Macmillan College Publishing Company, 1994.
- Some supplementary comments come from the tutorial on RBF:
  "An Overview of Radial Basis Function Networks", J. Ghosh & A. Nag, in: Radial Basis Function Networks, R.J. Howlett & L.C. Jain, eds., Physica-Verlag, 2001.
- The results on the basic example were generated by my colleague N. Benoudjit and are submitted for publication.