Codeword Distribution for Frequency Sensitive Competitive Learning with One Dimensional Input Data

Aristides S. Galanopoulos and Stanley C. Ahalt
Department of Electrical Engineering
The Ohio State University

Abstract

We study the codeword distribution for a conscience type competitive learning algorithm, Frequency Sensitive Competitive Learning (FSCL), using one dimensional input data. We prove that the asymptotic codeword density in the limit of a large number of codewords is given by a power law of the form Q(x) = C P(x)^\alpha, where P(x) is the input data density and \alpha depends on the algorithm and on the form of the distortion measure to be minimized. The algorithm can be adjusted to minimize any L_p distortion measure with p ranging in (0, 2].

I. Introduction

Competitive learning algorithms are used for vector quantization and pattern recognition, among other applications. Compared to batch algorithms, like the well known LBG algorithm [1], they offer the advantage of operating on-line and being adaptive. They are also attractive when implemented in parallel neural network architectures, thus increasing the speed of encoding/decoding operations. Simple Competitive Learning (CL), even though computationally very efficient, is known to suffer from the problem of codeword underutilization: codewords
initially located away from the data set never win the competition, resulting in a nonoptimal final codebook [2], [3], [4], [5]. Kohonen's Self Organizing Feature Map (KSFM) [6], [7] more successfully approaches the optimal codebook design and overcomes the underutilization problem. However, it is computationally more intensive, and its properties depend on the selection of an appropriate neighborhood function, which is not an easy problem, especially for multi-dimensional data spaces. A more recent algorithm, Frequency Sensitive Competitive Learning (FSCL) [8], overcomes the codeword underutilization problem and is considerably simpler than KSFM. In simulation studies it has been shown to perform at least as well as the previous CL-type algorithms with respect to the optimality of the codebook design and the speed of convergence to the equilibrium state.

The characterization of the equilibrium state of CL-type algorithms, and more specifically the asymptotic codeword distribution for a large number of codewords, is a generally open problem for data spaces with two or more dimensions. The problem has only been completely solved for simple CL, by Zador [9]. For KSFM, analytical results have been obtained only in the case of a one dimensional data space [10].

In this brief paper we analyze the asymptotic codeword distribution for the FSCL algorithm in the case of one dimensional data. We show that the algorithm can be adjusted to minimize a continuous range of distortion measures. The results have direct application to the problem of probability density estimation.
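As a concrete preview of the update rule analyzed in the next section, a minimal one dimensional FSCL iteration can be sketched as follows. This is only a sketch: the learning rate, the initialization, and all variable names are our own illustrative choices, with the fairness function taken as F(f) = f^β.

```python
import numpy as np

def fscl(data, n_codewords, beta=1.0, lr=0.05, seed=0):
    """Minimal 1-D Frequency Sensitive Competitive Learning (FSCL) sketch.

    Winner: argmin_i F(f_i) * |w_i - x| with fairness F(f) = f**beta,
    where f_i = c_i / t is the update frequency of unit i.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(data.min(), data.max(), n_codewords)  # codeword positions
    c = np.ones(n_codewords)        # update counts (start at 1 to avoid zero frequencies)
    for t, x in enumerate(data, start=1):
        f = c / t                   # current update frequencies
        j = np.argmin((f ** beta) * np.abs(w - x))   # frequency-penalized winner
        w[j] += lr * (x - w[j])     # move only the winner toward the input
        c[j] += 1
    return np.sort(w)

# Usage: codewords spread over the data, and every unit wins part of the time.
samples = np.random.default_rng(1).normal(size=20000)
codewords = fscl(samples, n_codewords=10, beta=1.0)
```

Because the winner is chosen by the frequency-weighted distance F(f_i)|w_i - x|, frequent winners are handicapped, so no codeword remains permanently underutilized.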
II. FSCL algorithm and equilibrium conditions

The FSCL algorithm belongs to the class of conscience type competitive learning algorithms. In FSCL, N units or codewords are assigned corresponding positions \{w_i\}_{i=1}^N and update frequencies \{f_i\}_{i=1}^N, where the update frequencies are defined as f_i = c_i / t, with the count c_i being the number of times that unit i has been updated up to time t. Clearly, \sum_i f_i(t) = 1. At time t, an input data vector x(t) is presented to the network, and the winner unit is selected as the one that minimizes the product of a fairness function F, an increasing function of the update frequency, times the distance (distortion measure) from the input data vector, i.e., F(f_i(t)) \|w_i(t) - x(t)\|. Only the winner unit j is updated, using

w_j(t+1) = w_j(t) + \varepsilon(t) (x(t) - w_j(t)).

The underutilization problem is solved because frequent winners are "penalized", so that eventually all units win the competition some portion of the time.

At equilibrium, each unit i wins the competition for all input data inside a neighborhood (cell) C_i of w_i, and w_i is the centroid

w_i = \frac{\int_{C_i} u P(u)\, du}{\int_{C_i} P(u)\, du}

where P(u) is the input data probability density function, and the equilibrium update frequencies are given by

f_i = \int_{C_i} P(u)\, du.

For the calculation of the equilibrium positions, given any number of codewords,
we can use a modification of the Lloyd-Max iterative algorithm [11] for scalar quantizer design. If the input data probability density is nonzero only on a continuous interval, the algorithm consists of a search for the first codeword position. The algorithm begins by assuming a position for the first codeword. If we know the positions of the first n-1 codewords, then we can exactly determine the position of the n-th codeword so that the equilibrium conditions are satisfied. After each iteration through the set of codewords, we correct the first codeword's position depending on the error associated with the last codeword's previous position.

III. Asymptotic codeword density

We assume one dimensional input data distributed with a probability density function P(x). We consider fairness functions of the form F(f_i) = f_i^\beta and we prove the following theorem.

Theorem. The asymptotic codeword density Q(x), for a large number of codewords, is given as

Q(x) = C P(x)^{\frac{3\beta+1}{3\beta+3}}

where C is a normalizing constant.

Proof. Let \{w_i\}_{i=1}^N be the codeword positions and \{l_i\}_{i=0}^N the boundaries defining the partition of the real line into quantizer intervals (cells) at equilibrium. Position w_i is given as the centroid

w_i = \frac{\int_{l_{i-1}}^{l_i} u P(u)\, du}{\int_{l_{i-1}}^{l_i} P(u)\, du}    (1)
and, from the winner condition, the boundaries must satisfy

F(f_i)(l_i - w_i) = F(f_{i+1})(w_{i+1} - l_i)    (2)

where

f_i = \int_{l_{i-1}}^{l_i} P(u)\, du.

Let u_n = (l_{n-1} + l_n)/2 and \Delta_n = l_n - l_{n-1}. In the limit of a large number of codewords we can approximate

P(u_n + x) \approx P(u_n) + x P'(u_n).    (3)

Thus, equation (1) takes the form

w_n = u_n + \frac{P'(u_n)}{P(u_n)} \frac{\Delta_n^2}{12}    (4)

and equation (2) becomes

\Delta_{n-1}^{\beta+1} \left[ 1 - \left( \frac{\beta}{2} + \frac{1}{6} \right) \frac{P'(l_{n-1})}{P(l_{n-1})}\, \Delta_{n-1} \right] = \Delta_n^{\beta+1} \left[ 1 + \left( \frac{\beta}{2} + \frac{1}{6} \right) \frac{P'(l_n)}{P(l_n)}\, \Delta_n \right].

Ignoring higher order terms and noticing that P'(l_n)/P(l_n) \approx P'(l_{n-1})/P(l_{n-1}), the above equation becomes

\frac{P'(l_{n-1})}{P(l_{n-1})} \frac{3\beta + 1}{6} = \frac{\Delta_{n-1}^{\beta+1} - \Delta_n^{\beta+1}}{\Delta_{n-1}^{\beta+2} + \Delta_n^{\beta+2}}.    (5)

In terms of the codeword density Q(x) we have the approximations

\frac{1}{\Delta_n} = Q(l_{n-1}) + Q'(l_{n-1}) \frac{\Delta_n}{2}, \qquad \frac{1}{\Delta_{n-1}} = Q(l_{n-1}) - Q'(l_{n-1}) \frac{\Delta_{n-1}}{2}.
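The centroid approximation (4) used in the proof is easy to check numerically. The short test below compares the exact centroid of a narrow cell with u_n + (P'(u_n)/P(u_n)) \Delta_n^2 / 12; the Gaussian density and the cell parameters are our own illustrative choices.

```python
import numpy as np

# Standard Gaussian density and its derivative (illustrative choice of P).
P  = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
dP = lambda u: -u * P(u)

u_n, delta = 0.7, 0.05                      # cell center and width (illustrative)
n = 20000
h = delta / n
xs = (u_n - delta / 2) + (np.arange(n) + 0.5) * h   # midpoint grid over the cell

# Exact centroid of the cell: integral of u P(u) over integral of P(u).
w_exact = np.sum(xs * P(xs)) / np.sum(P(xs))

# Equation (4): w_n ~ u_n + (P'(u_n) / P(u_n)) * delta**2 / 12.
w_approx = u_n + (dP(u_n) / P(u_n)) * delta**2 / 12

# The correction term captures the centroid shift to higher order in delta.
assert abs(w_exact - w_approx) < 1e-6
assert abs(w_exact - u_n) > abs(w_exact - w_approx)
```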
Keeping the leading order terms, we have

\frac{P'(l_{n-1})}{P(l_{n-1})} \frac{3\beta + 1}{6} = (\beta + 1) \frac{\Delta_{n-1} - \Delta_n}{2 \Delta_{n-1} \Delta_n} = (\beta + 1)\, \frac{Q'(l_{n-1})}{4}\, (\Delta_{n-1} + \Delta_n)    (6)

and observing that

Q(l_{n-1}) \approx \frac{2}{\Delta_{n-1} + \Delta_n}    (7)

we obtain

\frac{Q'(l_{n-1})}{Q(l_{n-1})} = \frac{3\beta + 1}{3\beta + 3} \frac{P'(l_{n-1})}{P(l_{n-1})}    (8)

which implies that

Q(x) = C P(x)^{\frac{3\beta+1}{3\beta+3}}.    (9)

Q.E.D.

The case \beta = 0 describes the codeword density for simple CL, which is a stochastic gradient descent minimization of the mean square error, i.e., the L_2 distortion measure, as is the case in the Lloyd-Max quantizer. Our result, a 1/3 power law, is in agreement with the asymptotic density of the Lloyd-Max quantizer presented in [9]. We notice that FSCL generates a codeword distribution closer to the data probability density, while simple Competitive Learning assigns relatively more codewords to regions of small data density and fewer codewords to regions of large data density. Codeword distributions that obey input data probability density power laws correspond to asymptotically optimal quantizers, as shown by Zador [9]; a distribution power of r/(r + d), with r the dimension of the data space, minimizes the L_d distortion measure. From our theorem, we can conclude that FSCL generates a scalar quantizer that minimizes the L_{2/(3\beta+1)} distortion measure. Thus, by adjusting the
parameter \beta we can design an asymptotically optimal quantizer for the L_p distortion with p taking any value in the interval (0, 2]. By comparison, Kohonen's Self-Organizing Feature Map (KSFM) can design scalar quantizers that asymptotically minimize the L_p distortion measure with p taking the values p = 0.5 + \frac{3}{2(2n+1)^2}, n = 0, 1, 2, \ldots [10]. Thus, FSCL offers greater flexibility; e.g., FSCL with \beta = 1 minimizes the L_{0.5} distortion, which cannot be minimized using KSFM (with rectangular-type neighborhoods, as have been analyzed so far).

IV. Simulation Results

The codeword density predicted by our theoretical results very closely describes the actual codeword distribution, even for a moderate number of codewords. We simulated the FSCL algorithm using 30 codewords and a training set of 100,000 input data samples obtained from a Gaussian distribution truncated to a finite interval. The initial codeword positions were selected randomly inside the data interval. We plot the cumulative codeword distribution as obtained by simulation versus the theoretically predicted cumulative distribution. In Fig. 1 we show the results for \beta = 0 and in Fig. 2 we show the results for \beta = 4. In practice, the parameter \beta is usually a function of time; a large initial \beta is used to solve the codeword underutilization problem, and \beta subsequently converges to the value that minimizes the desired distortion measure.

References

[1] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. 28(1), pp. 84-95, 1980.
[2] S. Grossberg, "Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors," Biological Cybernetics, vol. 23, pp. 121-134, 1976.

[3] S. Grossberg, "Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions," Biological Cybernetics, vol. 23, pp. 187-202, 1976.

[4] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Science, vol. 9, pp. 75-112, 1985.

[5] S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," Cognitive Science, vol. 11, pp. 23-63, 1987.

[6] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, pp. 59-69, 1982.

[7] T. Kohonen, Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1984.

[8] S. C. Ahalt, A. K. Krishnamurthy, P. Chen, and D. E. Melton, "Competitive learning algorithms for vector quantization," Neural Networks, vol. 3, pp. 277-290, 1990.

[9] P. L. Zador, "Asymptotic quantization error of continuous signals and the quantization dimension," IEEE Transactions on Information Theory, vol. IT-28, pp. 139-149, March 1982.

[10] H. Ritter, "Asymptotic level density for a class of vector quantization processes," IEEE Transactions on Neural Networks, vol. 2, January 1991.

[11] J. Max, "Quantizing for minimum distortion," IRE Transactions on Information Theory, vol. 6, 1960.
[Plot: cumulative codeword distribution versus position x; theoretical curve and simulation points; \beta = 0, Gaussian pdf.]

Figure 1: Codeword distribution for Gaussian data and \beta = 0.
[Plot: cumulative codeword distribution versus position x; theoretical curve and simulation points; \beta = 4, Gaussian pdf.]

Figure 2: Codeword distribution for Gaussian data and \beta = 4.
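The comparison reported in Section IV can be reproduced, at least qualitatively, with a short simulation. The script below is a sketch under our own assumptions (truncation interval [-5, 5], a slowly decaying learning rate, and our variable names); it checks the empirical codeword distribution against the predicted P(x)^{(3\beta+1)/(3\beta+3)} power law.

```python
import numpy as np

def fscl_1d(data, n_units, beta, lr0=0.1):
    """Run 1-D FSCL and return the sorted codeword positions."""
    rng = np.random.default_rng(0)
    w = rng.uniform(data.min(), data.max(), n_units)
    c = np.ones(n_units)
    for t, x in enumerate(data, start=1):
        f = c / t
        j = np.argmin((f ** beta) * np.abs(w - x))
        w[j] += (lr0 / (1 + 1e-4 * t)) * (x - w[j])   # decaying step size (our choice)
        c[j] += 1
    return np.sort(w)

beta = 4.0
rng = np.random.default_rng(1)
data = rng.normal(size=100_000)
data = data[np.abs(data) < 5]                # Gaussian truncated to a finite interval

w = fscl_1d(data, n_units=30, beta=beta)

# Theoretical cumulative codeword distribution: normalized integral of P(x)**alpha.
alpha = (3 * beta + 1) / (3 * beta + 3)
xs = np.linspace(-5, 5, 2001)
pdf_pow = np.exp(-xs**2 / 2) ** alpha        # unnormalized P(x)**alpha
cdf = np.cumsum(pdf_pow)
cdf /= cdf[-1]

# Empirical cumulative distribution of the codewords, evaluated at their positions.
emp = (np.arange(1, len(w) + 1) - 0.5) / len(w)
theo = np.interp(w, xs, cdf)
print(np.max(np.abs(emp - theo)))            # small deviation indicates agreement
```

A small maximum deviation between the empirical and theoretical cumulative curves mirrors the agreement shown in Figs. 1 and 2; rerunning with beta = 0 reproduces the simple-CL case.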