IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 5, SEPTEMBER 1997

Diffusion Approximation of Frequency Sensitive Competitive Learning

Aristides S. Galanopoulos, Member, IEEE, Randolph L. Moses, Senior Member, IEEE, and Stanley C. Ahalt, Member, IEEE

Abstract: The focus of this paper is a convergence study of the frequency sensitive competitive learning (FSCL) algorithm. We approximate the final phase of FSCL learning by a diffusion process described by a Fokker-Planck equation. Sufficient and necessary conditions are presented for the convergence of the diffusion process to a local equilibrium. The analysis parallels that by Ritter and Schulten for Kohonen's self-organizing map (SOM). We show that the convergence conditions involve only the learning rate and that they are the same as the conditions for weak convergence described previously. Our analysis thus broadens the class of algorithms that have been shown to have these types of convergence characteristics.

Index Terms: Diffusion processes, Fokker-Planck equations, neural networks, learning systems, vector quantization.

Manuscript received June 1, 1994; revised September 16, 1996 and April 2, 1997. A. S. Galanopoulos was with the Department of Electrical Engineering, The Ohio State University, Columbus, OH 43210 USA. He is now with Northern Telecom, Richardson, TX 75082 USA. R. L. Moses and S. C. Ahalt are with the Department of Electrical Engineering, The Ohio State University, Columbus, OH 43210 USA.

I. INTRODUCTION

The frequency sensitive competitive learning (FSCL) algorithm is a conscience-type competitive learning algorithm developed by Ahalt et al. [1] to overcome problems associated with simple competitive learning (CL) and Kohonen's self-organizing feature maps (SOMs) in vector quantization applications. The FSCL algorithm is a modification of simple CL in which units are penalized in proportion to some function of their frequency of winning, so that eventually all units participate in the quantization of the data space as representative vectors of data cells of nonzero probability. This frequency-sensitive conscience mechanism overcomes the codeword underutilization problem of simple CL [4], [13]. The SOM algorithm [6], [7] also successfully solves the above problem; however, it appears less suitable for some vector quantization applications for the following reasons. First, since SOM was developed to establish feature maps, an essential feature of the SOM algorithm is the definition of a topology on the set of codewords, which is a difficult problem for high-dimensional data spaces. Second, the required update of several codewords at each iteration makes the algorithm more computationally intensive than CL and FSCL.

One of the main issues in vector quantizer design is the ability of the algorithm to generate a codebook that is as near-optimal as possible in a reasonable amount of time. Simple CL is a stochastic gradient descent algorithm; like the batch LBG algorithm [9], it may become stuck in a local minimum of the total distortion function, although it has some ability to escape local minima because of its stochastic nature. The strongest type of convergence for CL has been shown for the case of sufficiently sparse patterns in [4]; otherwise, sufficient and necessary conditions for convergence with probability one to a local minimum are the corresponding conditions of the Robbins-Monro process [12]. More relaxed conditions are required for weak convergence to a local minimum [8].
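For reference, the two sets of learning-rate conditions contrasted above can be written out explicitly. The notation below, with \epsilon(n) denoting the learning rate at step n, is ours rather than the paper's; the condition sets themselves are the standard ones associated with [12] and [8].

    % Robbins-Monro conditions [12] (convergence with probability one):
    \sum_{n=1}^{\infty} \epsilon(n) = \infty, \qquad \sum_{n=1}^{\infty} \epsilon(n)^2 < \infty.
    % Weak-convergence conditions [8] (the more relaxed set):
    \epsilon(n) \to 0 \ \text{as} \ n \to \infty, \qquad \sum_{n=1}^{\infty} \epsilon(n) = \infty.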
The convergence of the SOM algorithm has been analyzed by Ritter and Schulten [11] under the assumption that the state of the algorithm is close to some local equilibrium in the limit of small learning rate. Ritter and Schulten analyzed a Fokker-Planck equation (FPE) describing SOMs and found necessary and sufficient conditions that make the mean and variance of the state deviation from equilibrium vanish. Their conditions correspond to the weak convergence conditions presented in [8]. Cottrell and Fort [2] analyzed a similar process, although restricted to a uniform input data probability density function and data spaces of dimension one and two, formulated as a Robbins-Monro recursion, thus arriving at the conditions in [12].

In this paper, we study the final phase of FSCL algorithm learning. We assume that both the time index and the learning rate are small, and we follow the same analysis as described in [11] for deriving the FPE that approximates the evolution of the process. From the analysis point of view, this work should be considered as an extension of Ritter and Schulten's work [11] to a different class of learning algorithms. The results show that in the limit of large time, i.e., the final learning phase, only conditions on the learning rate must be imposed to guarantee the convergence of the diffusion process to a local equilibrium. This result supports the use of the FSCL algorithm in applications, e.g., video and speech encoding, in which accurate representation, computation, and underutilization all must be managed simultaneously in an on-line coding process.

A global convergence analysis, not restricted to the neighborhood of some local equilibrium, would certainly involve all the algorithm parameters and not just the learning rate. We are unaware of any global convergence analysis for self-organizing maps or conscience-type algorithms.

The only algorithm that has been proved to converge to the global minimum of the distortion function is simulated annealing (SA) [5], [10], and heuristic combinations of SA with the previously mentioned algorithms have been used to cope with the very large computational requirements of SA [14].

II. FSCL ALGORITHM

In FSCL, a network of units, or codewords, is trained by a set of input data vectors. The data vectors belong to a d-dimensional space and their distribution is described by a probability density function. The units have associated positions w_i in the d-dimensional space and associated update frequencies u_i(n) = c_i(n)/n, with the count c_i(n) being the number of times that unit i has been updated up to time n. Clearly, \sum_i u_i(n) = 1. The time index n takes on integer values. At time n, an input data vector x(n) is presented to the network and a winning unit is selected as the one that minimizes the product of a fairness function F(u_i(n)), which is an increasing function of the update frequency, times the distance (distortion measure) from the input data vector. Denoting the winner by i*, the selection rule is

    i^{*} = \arg\min_i \, F(u_i(n)) \, d(x(n), w_i(n)).    (1)

The winning unit's position is updated as

    w_{i^{*}}(n+1) = w_{i^{*}}(n) + \epsilon(n) \, [ x(n) - w_{i^{*}}(n) ],    (2)

where \epsilon(n) is the learning rate, and the winning unit's count is incremented by one. All other units keep the same counts and positions.

III. DIFFUSION APPROXIMATION

The FSCL network can be described as a Markov process whose state collects the positions and update frequencies of all units, arranged as a matrix with one row per unit. Assuming unit i is the winner, the update rule for the state of unit i follows from (1) and (2): its position moves toward the input and its update frequency is recomputed from the incremented count. The analysis of FSCL convergence parallels the analysis of the SOM algorithm by Ritter and Schulten in [11]. We consider an ensemble of networks whose states at time n are distributed according to a density function defined over the set of all possible network states. This density function obeys the Chapman-Kolmogorov equation, which expresses the state density at time n+1 as a sum, over possible winners, of the density at time n weighted by the corresponding transition probabilities. We work with two objects: for each unit, the set of possible previous inputs given the previous winner and the present state; and the transformation that maps the state and input at time n to the state at time n+1. Every term in the Chapman-Kolmogorov sum is conditional upon the winner at time n. Given the winner at time n, the inverse transformation uniquely defines the state at time n from the state at time n+1 and the input; each unit's previous position and count are recovered by inverting the update rule (2). To express the right-hand side in terms of the state at time n+1, instead of the state at time n, we integrate with respect to the input through this change of variables. The Jacobian of the change of variables is independent of the winner, and the transition probability from one state to the next can then be written explicitly; in that expression, only the winner-selection probability depends on the winner selection rule, and on the fairness function in particular. Our goal is to derive a linear FPE that approximates the evolution of the process.
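The update cycle described in Section II is compact enough to state in code. The sketch below is our own minimal illustration, not the paper's implementation: it assumes Euclidean distortion, the simple increasing fairness choice F(u) = u, the learning-rate schedule \epsilon(n) = 1/n, counts initialized to one, and a hypothetical function name and signature.

    import numpy as np

    def fscl_train(data, num_units, rng=None):
        """Minimal FSCL sketch: fairness-weighted winner selection plus a
        winner-only update, in the notation of Section II (our choices)."""
        rng = np.random.default_rng(rng)
        # Initialize codewords by sampling input vectors; counts start at one
        # so every unit has a nonzero update frequency from the first step.
        w = data[rng.choice(len(data), size=num_units, replace=False)].astype(float)
        counts = np.ones(num_units)

        for n, x in enumerate(data, start=1):
            u = counts / counts.sum()             # update frequencies, summing to one
            dist = np.linalg.norm(w - x, axis=1)  # Euclidean distortion to each codeword
            winner = np.argmin(u * dist)          # eq. (1) with F(u) = u
            eps = 1.0 / n                         # a schedule whose sum diverges
            w[winner] += eps * (x - w[winner])    # eq. (2): move the winner toward the input
            counts[winner] += 1                   # only the winner's count changes
        return w

    # Example: quantize 2-D Gaussian data with eight codewords.
    if __name__ == "__main__":
        samples = np.random.default_rng(0).normal(size=(5000, 2))
        print(fscl_train(samples, num_units=8, rng=1))

Because every unit starts with a nonzero count and the fairness weight grows with the winning frequency, no codeword is starved, which is the conscience mechanism the paper describes.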

We expand, keeping only derivatives up to second order (the FPE neglects higher-order derivatives) and, of these, only the leading order in the learning rate.

Let V_i be the set of inputs that make unit i the winning unit when the network is at a given state. The sets V_i cover the whole space without overlapping. In contrast, the sets of possible previous inputs may, in general, overlap; i.e., given the present state, there are inputs for which two (or more) possible previous states exist. Furthermore, these sets do not necessarily cover the whole space. As a result of the fairness function weighting, the winning regions are not separated by hyperplanes but by higher-order surfaces, even if the Euclidean distance is used as the distortion measure. Assuming a fairness function of a simple parametric form with a positive parameter, we could write the approximation (3) for the case of one-dimensional data; in d dimensions we would proceed with a rough approximation of the same kind. The important conclusion is that the difference between the two families of sets is of the order of the learning rate, so it can be neglected in our expansion since we keep only the leading-order terms. Consequently, the following analysis is valid for any fairness function that preserves this close relation between them. The difference between the first moments would be even smaller (if the two sets have similar shapes), while the difference between the second moments is slightly larger (they would roughly relate through a slightly different factor).

Assume that the system is close to the equilibrium state. We express (3) in terms of the deviation from equilibrium and keep only the leading terms, obtaining the FPE for large times. We make the notation more compact by defining an indicator that singles out the winning unit and is zero otherwise.
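To make the higher-order-surface remark concrete, here is a small illustration of our own rather than the paper's derivation; the symbols F, u_i, and w_i follow the notation introduced above. For one-dimensional data with Euclidean distortion and two neighboring codewords w_i < w_j, the boundary point x* between their winning regions satisfies the tie condition in (1):

    F(u_i)\,(x^{*} - w_i) = F(u_j)\,(w_j - x^{*})
    \quad\Longrightarrow\quad
    x^{*} = \frac{F(u_i)\, w_i + F(u_j)\, w_j}{F(u_i) + F(u_j)} .

The boundary is pulled toward the codeword with the larger fairness value, so a frequently updated unit receives a smaller cell. In higher dimensions the tie condition F(u_i) ||x - w_i|| = F(u_j) ||x - w_j|| fixes the ratio of two distances, which defines an Apollonius-type sphere rather than a hyperplane whenever F(u_i) differs from F(u_j); this is the sense in which the winning regions are separated by higher-order surfaces.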

Thus, we obtain the following linear FPE with time-dependent coefficients (4). Assuming a given value of the deviation at the initial time, the mean of this process at a later time is obtained by solving the corresponding first-moment equation, and the necessary and sufficient condition for this mean value to vanish as time grows is a condition on the learning-rate sequence alone. The covariances obey a differential equation whose solution shows that the covariance matrix vanishes, as time grows, if and only if its diagonal terms (the variances) vanish. The variances in turn obey a differential equation whose solution tends to zero under a second condition on the learning rate. Thus, the necessary and sufficient conditions for the mean and covariance to converge to zero are conditions on the learning rate only.

These conditions are of the same type as those encountered in [8] to guarantee weak convergence of stochastic approximation processes. The similarity is not surprising, since weak convergence is similar to convergence in distribution, and in our analysis we demand that the mean and variance of deviations from the equilibrium vanish. We also note that the final phase of FSCL learning imposes conditions only on the learning rate and not on the fairness function. As discussed earlier in the analysis, the same results are obtained for any reasonable choice of the fairness function. The exact form of the fairness function affects the global behavior of the algorithm, i.e., its ability to approach the global minimum of the total distortion measure, and it also determines the set of possible equilibrium states of the algorithm. The effect of the fairness function on the equilibrium codeword distribution has been studied in [3].
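The role of these learning-rate conditions can be illustrated with a toy computation of our own, not taken from the paper: in a linearized, one-dimensional caricature of the final learning phase, the mean deviation after N steps is attenuated by a product of factors (1 - c*eps(n)) for some constant c in (0, 1], so it vanishes only when the learning-rate sum diverges. The schedules and the constant below are illustrative assumptions.

    import math

    def damping_after(num_steps, eps, c=0.5):
        """Product of (1 - c*eps(n)) for n = 1..num_steps: the factor by which
        the mean deviation of a linearized scalar recursion is attenuated."""
        factor = 1.0
        for n in range(1, num_steps + 1):
            factor *= (1.0 - c * eps(n))
        return factor

    schedules = {
        "eps(n) = 1/n       (sum diverges) ": lambda n: 1.0 / n,
        "eps(n) = 1/n**2    (sum converges)": lambda n: 1.0 / n**2,
        "eps(n) = 0.5/n**0.5 (sum diverges)": lambda n: 0.5 / math.sqrt(n),
    }

    for name, eps in schedules.items():
        print(name, [round(damping_after(N, eps), 6) for N in (10, 1000, 100000)])
    # Schedules whose sum diverges drive the damping factor toward zero, so the
    # mean deviation from equilibrium dies out; a summable schedule leaves a
    # residual bias, matching the type of condition discussed above.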
IV. CONCLUSION

Our conclusion is that, by selecting learning rates that offer sufficient excitation, one expects, regardless of the specific fairness function, to converge to a solution that is locally optimal. The fairness function can then be independently selected, e.g., to yield a desired codeword distribution [3]. Thus, this result supports the use of FSCL for clustering applications such as VQ codebook design, unsupervised clustering and mixture density analysis, and design of radial basis functions.

REFERENCES

[1] S. C. Ahalt, A. K. Krishnamurthy, P. Chen, and D. E. Melton, "Competitive learning algorithms for vector quantization," Neural Networks, vol. 3, pp. 277-290, 1990.
[2] M. Cottrell and J. C. Fort, "A stochastic model of retinotopy: A self-organizing process," Biol. Cybern., vol. 53, pp. 405-411, 1986.
[3] A. S. Galanopoulos and S. C. Ahalt, "Codeword distribution for frequency sensitive competitive learning with one-dimensional input data," IEEE Trans. Neural Networks, vol. 7, pp. 752-756, May 1996.
[4] S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," Cognitive Sci., vol. 11, pp. 23-63, 1987.
[5] B. Hajek, "A tutorial survey of theory and applications of simulated annealing," in Proc. 24th Conf. Decision Contr., Ft. Lauderdale, FL, Dec. 1985.
[6] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., vol. 43, pp. 59-69, 1982.
[7] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[8] H. J. Kushner and D. S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. Berlin: Springer-Verlag, 1978.
[9] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. 28, pp. 84-95, 1980.
[10] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," J. Chem. Phys., vol. 21, no. 6, pp. 1087-1092, June 1953.
[11] H. Ritter and K. Schulten, "Convergence properties of Kohonen's topology conserving maps: Fluctuations, stability, and dimension selection," Biol. Cybern., vol. 60, pp. 59-71, 1988.
[12] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, pp. 400-407, 1951.
[13] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Sci., vol. 9, pp. 75-112, 1985.

[14] E. Yair, K. Zeger, and A. Gersho, "Competitive learning and soft competition for vector quantizer design," IEEE Trans. Signal Processing, vol. 40, Feb. 1992.

Aristides S. Galanopoulos (S'86, M'87) received the B.S. degree in electrical engineering from the National Technical University of Athens, Greece, in 1987, and the M.S. and Ph.D. degrees in electrical engineering from The Ohio State University, Columbus, in 1989 and 1996, respectively. Since 1995, he has been with Northern Telecom, Broadband Systems Engineering, Richardson, TX. His research interests include communication networks and neural networks.

Stanley C. Ahalt (S'77, M'79) received the B.S. and M.S. degrees in electrical engineering from Virginia Polytechnic Institute and State University, Blacksburg, in 1978 and 1980, respectively. He received the Ph.D. degree in electrical and computer engineering from Clemson University, SC, in 1986. He was a member of the Technical Staff at Bell Telephone Laboratories from 1980 to 1981. He has been with The Ohio State University Department of Electrical Engineering, Columbus, since 1987, where he is currently an Associate Professor. During 1996, he was on sabbatical leave as a Visiting Professor at the Universidad Politecnica de Madrid, Spain. His research is in neural networks, with application to image analysis, automatic target recognition, image processing, and special-purpose hardware to support real-time processing.

Randolph L. Moses (S'78, M'83, SM'90) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Virginia Polytechnic Institute and State University, Blacksburg, in 1979, 1980, and 1984, respectively. During the summer of 1983, he was a SCEEE Summer Faculty Research Fellow at Rome Air Development Center, Rome, NY. From 1984 to 1985, he was with the Eindhoven University of Technology, Eindhoven, The Netherlands, as a NATO Postdoctoral Fellow. Since 1985, he has been with the Department of Electrical Engineering, The Ohio State University, Columbus, and is currently a Professor there. During 1994 to 1995, he was on sabbatical leave as a visiting researcher with the Systems and Control Group at Uppsala University in Sweden. His research interests are in digital signal processing, and include parametric time series analysis, radar signal processing, and sensor array processing. Dr. Moses served on the Technical Committee on Statistical Signal and Array Processing of the IEEE Signal Processing Society from 1991 to 1994. He is a member of Eta Kappa Nu, Tau Beta Pi, Phi Kappa Phi, and Sigma Xi.