
Codeword Distribution for Frequency Sensitive Competitive Learning with One Dimensional Input Data

Aristides S. Galanopoulos and Stanley C. Ahalt
Department of Electrical Engineering, The Ohio State University

Abstract

We study the codeword distribution for a conscience type competitive learning algorithm, Frequency Sensitive Competitive Learning (FSCL), using one dimensional input data. We prove that the asymptotic codeword density in the limit of a large number of codewords is given by a power law of the form Q(x) = C P(x)^α, where P(x) is the input data density and α depends on the algorithm and the form of the distortion measure to be minimized. The algorithm can be adjusted to minimize any L_p distortion measure with p ranging in (0, 2].

I. Introduction

Competitive learning type algorithms are used for vector quantization and pattern recognition, among other applications. Compared to batch algorithms, like the well known LBG algorithm [1], they offer the advantage of operating on-line and being adaptive. They are also attractive when implemented in parallel neural network architectures, thus increasing the speed of encoding/decoding operations. Simple Competitive Learning (CL), even though computationally very efficient, is known to suffer from the problem of codeword underutilization: codewords initially located away from the data set never win the competition, resulting in a nonoptimal final codebook [2] [3] [4] [5]. Kohonen's Self-Organizing Feature Map (KSFM) [6] [7] more successfully approaches the optimal codebook design and overcomes the underutilization problem. However, it is computationally more intensive and its properties depend on the selection of an appropriate neighborhood function, which is not an easy problem, especially for multi-dimensional data spaces. A more recent algorithm, Frequency Sensitive Competitive Learning (FSCL) [8], overcomes the codeword underutilization problem and is considerably simpler than KSFM. In simulation studies it has been shown to perform at least as well as the previous CL-type algorithms with respect to the optimality of the codebook design and the speed of convergence to the equilibrium state. The characterization of the equilibrium state of the CL type algorithms, and more specifically the asymptotic codeword distribution for a large number of codewords, is a generally open problem for data spaces with two or more dimensions. The problem has only been completely solved for simple CL, by Zador [9]. For KSFM, analytical results have been obtained only in the case of a one dimensional data space [10]. In this brief paper we analyze the asymptotic codeword distribution for the FSCL algorithm in the case of one dimensional data. We show that the algorithm can be adjusted to minimize a continuous range of distortion measures. The results have direct application to the problem of probability density estimation.
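The FSCL iteration analyzed in the following sections can be sketched in a few lines of Python. This is our own illustrative implementation, not the authors' code: the variable names, the constant learning rate, and the unit initial counts are assumptions.

```python
import numpy as np

def fscl_1d(data, n_codewords=30, beta=4.0, lr=0.05, seed=0):
    """Sketch of one-dimensional Frequency Sensitive Competitive Learning.

    Winner: argmin_i F(f_i) * |w_i - x| with fairness F(f) = f**beta,
    where f_i = c_i / t is the update frequency of unit i.
    Only the winner moves toward the input sample.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(data.min(), data.max(), n_codewords)  # random initial codewords
    c = np.ones(n_codewords)  # update counts (start at 1 to avoid zero frequencies)
    for x in data:
        f = c / c.sum()                           # current update frequencies
        j = np.argmin(f**beta * np.abs(w - x))    # conscience-weighted winner
        w[j] += lr * (x - w[j])                   # move only the winner toward x
        c[j] += 1.0
    return np.sort(w)

# Truncated Gaussian training data, as in the simulations of Section IV.
samples = np.random.default_rng(1).normal(size=100_000)
samples = samples[np.abs(samples) < 5.0]
codebook = fscl_1d(samples)
```

With beta > 0, frequent winners are handicapped, so even codewords that start far from the data eventually win some portion of the time and the underutilization problem disappears.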

II. FSCL algorithm and equilibrium conditions

The FSCL algorithm belongs to the class of conscience type competitive learning algorithms. In FSCL, N units or codewords are assigned corresponding positions {w_i}_{i=1}^N and update frequencies {f_i}_{i=1}^N, where the update frequencies are defined as f_i = c_i / t, with the count c_i being the number of times that unit i has been updated up to time t. Clearly, Σ_i f_i(t) = 1. At time t, an input data vector x(t) is presented to the network and the winner unit is selected as the one that minimizes the product of a fairness function F, which is an increasing function of the update frequency, times the distance (distortion measure) from the input data vector, i.e., F(f_i(t)) ‖w_i(t) − x(t)‖. Only the winner unit j is updated, using

w_j(t+1) = w_j(t) + ε(t) (x(t) − w_j(t)).

The underutilization problem is solved because frequent winners are "penalized" so that eventually all units win the competition some portion of the time. At equilibrium, each unit i wins the competition for all input data inside a neighborhood (cell) C_i of w_i, and w_i is the centroid

w_i = ∫_{C_i} u P(u) du / ∫_{C_i} P(u) du,

where P(u) is the input data probability density function, and the equilibrium update frequencies are given by

f_i = ∫_{C_i} P(u) du.

For the calculation of the equilibrium positions, given any number of codewords, we can use a modification of the Lloyd-Max iterative algorithm [11] for scalar quantizer design. If the input data probability density is nonzero only on a continuous interval, the algorithm consists of a search for the first codeword position. The algorithm begins by assuming a position for the first codeword. If we know the positions of the first n−1 codewords, then we can exactly determine the position of the n-th codeword so that the equilibrium conditions are satisfied. After each iteration through the set of codewords, we correct the first codeword's position depending on the error associated with the last codeword's previous position.

III. Asymptotic codeword density

We assume one dimensional input data distributed with a probability density function P(x). We consider fairness functions of the form F(f_i) = f_i^β and we prove the following theorem.

Theorem. The asymptotic codeword density Q(x), for a large number of codewords, is given as

Q(x) = C P(x)^((3β+1)/(3β+3)),

where C is a normalizing constant.

Proof. Let {w_i}_{i=1}^N be the codeword positions and {l_i}_{i=0}^N the boundaries defining the partition of the real line into quantizer intervals (cells) at equilibrium. Position w_i is given as the centroid

w_i = ∫_{l_{i−1}}^{l_i} u P(u) du / ∫_{l_{i−1}}^{l_i} P(u) du    (1)

and, from the winner condition, the boundaries must satisfy

F(f_i)(l_i − w_i) = F(f_{i+1})(w_{i+1} − l_i),    (2)

where

f_i = ∫_{l_{i−1}}^{l_i} P(u) du.

Let u_n = (l_{n−1} + l_n)/2 and Δ_n = l_n − l_{n−1}. In the limit of a large number of codewords we can approximate

P(u_n + x) ≈ P(u_n) + x P'(u_n).    (3)

Thus, equation (1) takes the form

w_n = u_n + (P'(u_n)/P(u_n)) (Δ_n²/12),    (4)

and equation (2) becomes

Δ_{n−1}^{β+1} [1 − ((3β+1)/6) (P'(l_{n−1})/P(l_{n−1})) Δ_{n−1}] = Δ_n^{β+1} [1 + ((3β+1)/6) (P'(l_n)/P(l_n)) Δ_n].

Ignoring higher order terms and noticing that P'(l_n)/P(l_n) ≈ P'(l_{n−1})/P(l_{n−1}), the above equation becomes

(P'(l_{n−1})/P(l_{n−1})) ((3β+1)/6) = (Δ_{n−1}^{β+1} − Δ_n^{β+1}) / (Δ_{n−1}^{β+2} + Δ_n^{β+2}).    (5)

In terms of the codeword density Q(x) we have the approximations

1/Δ_n = Q(l_{n−1}) + Q'(l_{n−1}) Δ_n/2,
1/Δ_{n−1} = Q(l_{n−1}) − Q'(l_{n−1}) Δ_{n−1}/2.

Keeping the leading order terms we have

(P'(l_{n−1})/P(l_{n−1})) ((3β+1)/6) = ((β+1)/2) (Δ_{n−1} − Δ_n)/(Δ_{n−1} Δ_n) = ((β+1)/4) Q'(l_{n−1}) (Δ_{n−1} + Δ_n)    (6)

and, observing that

Q(l_{n−1}) ≈ 2/(Δ_{n−1} + Δ_n),    (7)

we obtain

Q'(l_{n−1})/Q(l_{n−1}) = ((3β+1)/(3β+3)) (P'(l_{n−1})/P(l_{n−1})),    (8)

which implies that

Q(x) = C P(x)^((3β+1)/(3β+3)).    (9)

Q.E.D.

The case where β = 0 describes the codeword density for simple CL, which is a stochastic gradient descent minimization of the mean square error, i.e., the L_2 distortion measure, as is the case in the Lloyd-Max quantizer. Our result, a 1/3 power law for β = 0, is in agreement with the asymptotic density of the Lloyd-Max quantizer presented in [9]. We notice that FSCL generates a codeword distribution closer to the data probability density, while simple Competitive Learning assigns relatively more codewords to regions of small data density and fewer codewords to regions of large data density. Codeword distributions that obey input data probability density power laws correspond to asymptotically optimal quantizers, as shown by Zador [9]; a distribution power of r/(r+d), with r the dimension of the data space, minimizes the L_d distortion measure. From our theorem, we can conclude that FSCL generates a scalar quantizer that minimizes the L_{2/(3β+1)} distortion measure. Thus, by adjusting the

parameter β we can design an asymptotically optimal quantizer for the L_p distortion with p taking any value in the interval (0, 2]. By comparison, Kohonen's Self-Organizing Feature Map (KSFM) can design scalar quantizers that asymptotically minimize the L_p distortion measure only with p taking the values

p = 1/2 + 3/(2(2(2n+1)² − 1)),  n = 0, 1, 2, ...  [10].

Thus, FSCL offers greater flexibility; e.g., FSCL with β = 0.5 minimizes the L_{0.8} distortion, which cannot be minimized using KSFM (with rectangular-type neighborhoods, as have been analyzed so far).

IV. Simulation Results

The codeword density predicted from our theoretical results very closely describes the actual codeword distribution even for a moderate number of codewords. We simulated the FSCL algorithm using 30 codewords and a training set of 100,000 input data samples obtained from a Gaussian distribution truncated to a finite interval. The initial codeword positions were selected randomly inside the data interval. We plot the cumulative codeword distribution as obtained by simulation versus the theoretically predicted cumulative distribution. In Fig. 1 we show the results for β = 0 and in Fig. 2 we show the results for β = 4. In practice, the parameter β is usually a function of time; a large initial β is used to solve the codeword underutilization problem, and β subsequently converges to the value that minimizes the desired distortion measure.

References

[1] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. 28(1), pp. 84–95, 1980.

[2] S. Grossberg, "Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors," Biological Cybernetics, vol. 23, pp. 121–134, 1976.

[3] S. Grossberg, "Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions," Biological Cybernetics, vol. 23, pp. 187–202, 1976.

[4] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Science, vol. 9, pp. 75–112, 1985.

[5] S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," Cognitive Science, vol. 11, pp. 23–63, 1987.

[6] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, pp. 59–69, 1982.

[7] T. Kohonen, Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1984.

[8] S. C. Ahalt, A. K. Krishnamurthy, P. Chen, and D. E. Melton, "Competitive learning algorithms for vector quantization," Neural Networks, vol. 3, pp. 277–290, 1990.

[9] P. L. Zador, "Asymptotic quantization error of continuous signals and the quantization dimension," IEEE Transactions on Information Theory, vol. IT-28, pp. 139–149, March 1982.

[10] H. Ritter, "Asymptotic level density for a class of vector quantization processes," IEEE Transactions on Neural Networks, vol. 2, January 1991.

[11] J. Max, "Quantizing for minimum distortion," IRE Transactions on Information Theory, vol. 6, 1960.
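As a quick numerical companion to the theorem, the short script below (our own, purely illustrative) evaluates the density exponent α(β) = (3β+1)/(3β+3), the matching distortion order p(β) = 2/(3β+1), and checks the consistency with Zador's optimality condition α = 1/(1+p):

```python
def fscl_exponents(beta):
    """Power-law exponent and matching L_p order for FSCL with fairness f**beta."""
    alpha = (3 * beta + 1) / (3 * beta + 3)  # Q(x) = C * P(x)**alpha (Theorem)
    p = 2 / (3 * beta + 1)                   # FSCL minimizes the L_p distortion
    assert abs(alpha - 1 / (1 + p)) < 1e-12  # Zador's optimal power for 1-D data
    return alpha, p

# beta = 0 reduces to simple CL: the classic 1/3 power law, L_2 distortion.
assert fscl_exponents(0.0) == (1/3, 2.0)

# beta = 0.5 gives the L_0.8 distortion, unreachable by KSFM.
alpha, p = fscl_exponents(0.5)
```

As beta grows from 0 to infinity, p sweeps the whole interval (0, 2], which is the flexibility claimed for FSCL in the text.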

Figure 1: Codeword distribution for Gaussian data and β = 0. (Cumulative codeword distribution vs. position x; theoretical curve and simulation points.)

Figure 2: Codeword distribution for Gaussian data and β = 4. (Cumulative codeword distribution vs. position x; theoretical curve and simulation points.)