Storage capacity of hierarchically coupled associative memories

Rogério M. Gomes, CEFET/MG - LSI, Av. Amazonas, 7675, Belo Horizonte, MG, Brasil, rogerio@lsi.cefetmg.br
Antônio P. Braga, PPGEE-UFMG - LITC, Av. Antônio Carlos, 6627, Belo Horizonte, MG, Brasil, apbraga@cpdee.ufmg.br
Henrique E. Borges, CEFET/MG - LSI, Av. Amazonas, 7675, Belo Horizonte, MG, Brasil, henrique@lsi.cefetmg.br

Abstract

This paper, taking as inspiration the ideas proposed by the TNGS (Theory of Neuronal Group Selection), presents a study of the convergence capacity of two-level associative memories based on coupled Generalized-Brain-State-in-a-Box (GBSB) neural networks. In this model, memory processes are described as being organized functionally in hierarchical levels, where the higher levels coordinate sets of functions of the lower levels. Simulations were carried out to illustrate the behaviour of the capacity of the system for a wide range of system parameters, considering linearly independent (LI) and orthogonal vectors. The results obtained show the relations amongst convergence, intensity and density of coupling.

1. Introduction

The brain-state-in-a-box (BSB) neural model was proposed by Anderson and collaborators in 1977 [2] and may be viewed as a version of Hopfield's model [7] with continuous and synchronous updating of the neurons. Hui and Zak [8] extended the BSB neural model with the inclusion of a bias field. Their model is referred to as the generalized-brain-state-in-a-box (GBSB) neural network model. The BSB and GBSB models can be used in the implementation of associative memories, where each stored prototype pattern, i.e., a memory, is an asymptotically stable equilibrium point. Thus, when the system is initialized in a pattern close enough to a stored pattern, such that it lies within the basin of attraction of the memorized pattern, the state of the system will evolve in time towards that memorized pattern.
The design of artificial neural network associative memories has been dealt with over the last two decades, and several methods have been proposed in [7], [12], [13], [9], [11]. Despite the fact that associative memories have been intensively studied, they have only been analyzed as single systems; they have not been designed as parts of a hierarchical or coupled system. Therefore, taking as inspiration the Theory of Neuronal Group Selection (TNGS) proposed by Edelman [4], [3], a multi-level associative memory based on coupled GBSB neural networks was proposed and analyzed in [6], [5]. The TNGS establishes that the synapses of localized neural cells in the cortical area of the brain generate cluster units denoted as neuronal groups (clusters of neural cells), local maps (reentrant clusters of neuronal groups) and global maps (reentrant clusters of local maps). In accordance with this theory, a neuronal group is the most basic unit in the cortical area of the brain where memory processes arise. It is formed not by a single neuron, but by a cluster of neural cells. Each one of these clusters (neuronal groups) is a set of localized, tightly coupled neurons, firing and oscillating in synchrony, thus forming the building blocks of memory. These neuronal groups are our first-level memories. Some neurons located in a cluster, however, have synaptic connections with neurons belonging to other clusters, generating a second-level physical structure denoted in the TNGS as a Local Map. Each of these connection arrangements amongst clusters within a given Local Map results in a certain inter-cluster activity, yielding a second-level memory. This process of grouping and connecting smaller structures to generate a larger one, through synaptic interconnections between neurons of different neuronal groups, can be repeated recursively. Consequently, new hierarchical levels of memories would emerge through selected correlations of the lower-level memories [4].
In this paper, the generalized-brain-state-in-a-box (GBSB) neural model is used to create a first-level associative memory in a two-level system. The paper is organized as follows. In Section 2 we present the model of hierarchically coupled GBSB neural networks and show how multi-level memories may emerge from it. Section 3 presents an analysis of the storage capacity of hierarchically coupled associative memories. Section 4 illustrates the analysis through experiments, showing the probability of convergence of the system into global patterns taking into consideration orthogonal and linearly independent vectors. Finally, Section 5 concludes the paper and presents some relevant extensions of this work.

2. Two-level memories

The GBSB (Generalized-Brain-State-in-a-Box) model [8] is described by:

$x^{k+1} = \varphi\left((I_n + \beta W)x^k + \beta f\right)$  (1)

where $I_n$ is the $n \times n$ identity matrix, $\beta > 0$ is a step size, $W \in \mathbb{R}^{n \times n}$ is the weight matrix, which need not be symmetric, $f \in \mathbb{R}^{n}$ is the bias field, which allows better control of the extension of the basins of attraction of the fixed points of the system, and $\varphi$ is a linear saturating function [8].

In our two-level memories, each GBSB neural network plays the role of a first-level memory, or a Neuronal Group (TNGS). In order to build a second-level memory we couple any number of GBSB networks through bidirectional synapses. These new structures play the part of a second-level memory, in which global patterns can emerge as a selected coupling of the first-level stored patterns (Local Maps - TNGS). Fig. 1 illustrates a two-level hierarchical memory via the coupled GBSB model, where each one of the networks A, B and C is a GBSB network. In a given network, each neuron has synaptic connections with every other neuron, i.e., the GBSB is a fully connected non-symmetric neural network. Beyond this, some selected neurons in a network are bidirectionally connected with some selected neurons in other networks [14]. These inter-network connections can be represented by an inter-network weight matrix $W_{cor}$, which accounts for the contribution of one network to another due to the coupling. An analogous procedure could be carried out in order to establish higher levels in the hierarchy [4], [14].
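As an illustration, one synchronous GBSB iteration of Eq. (1) can be sketched in a few lines of plain Python. This is a minimal sketch; the names `phi` and `gbsb_step` are ours, not part of the original formulation:

```python
def phi(v):
    """Linear saturating activation: clip every component to [-1, 1]."""
    return [max(-1.0, min(1.0, u)) for u in v]

def gbsb_step(x, W, f, beta):
    """One synchronous GBSB update: x_{k+1} = phi((I_n + beta W) x_k + beta f)."""
    n = len(x)
    y = [x[i]
         + beta * sum(W[i][j] * x[j] for j in range(n))  # beta * (W x)_i
         + beta * f[i]                                   # bias-field term
         for i in range(n)]
    return phi(y)
```

Starting at a vertex of the hypercube that the weights reinforce, the update returns the same vertex, which is precisely the fixed-point property the associative memory relies on.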
In order to account for the effect on one given GBSB network due to its coupling with the remaining GBSB networks, one should, of course, extend Eq. (1) by adding to it a term which represents the inter-network coupling. Consequently, our multi-level associative memory model can be defined by [6]:

$x_a^{k+1} = \varphi\left((I_n + \beta_a W_a)x_a^k + \beta_a f_a + \gamma \sum_{b=1,\, b\neq a}^{N_r} W_{cor}\, x_b^k\right)$  (2)

where $x_a^k$ is the state vector of the $a$-th network at time $k$, $\beta_a > 0$ is the step size, $f_a$ is the bias field of the $a$-th network, $W_a$ is the synaptic weight matrix of the $a$-th network, $N_r$ is the number of networks, $W_{cor}$ is the inter-network weight matrix, $\gamma$ is the intensity of coupling of the synapses between the $a$-th network and the $b$-th network, and $x_b^k$ is the state vector of the $b$-th network at time $k$. To sum up, the first three terms account for the uncoupled GBSB network whilst the fourth term of Eq. (2) represents the intergroup connections.

Figure 1. Coupled neural network design (GBSB networks A, B and C act as first-level memories, with intra-network weights $W_{(i,a)(j,a)}$ and inter-network weights $W_{cor(i,a)(j,b)}$ of intensity $\gamma$; the coupled structure yields the second-level memories)

3. Storage capacity and stability analysis of the coupled model

In our coupled model, the first-level memories must be stored as asymptotically stable equilibrium points; moreover, it must be guaranteed that some of these stored patterns in each network form specific combinations, or globally stable emergent patterns, yielding a second-level memory. The weight matrix of each individual network was carefully designed following the algorithm proposed in [15]. This algorithm ensures that the negatives of the desired patterns are not automatically stored as asymptotically stable equilibrium points of the network, besides minimizing the number of spurious states.
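The coupled update of Eq. (2) admits an equally small sketch, again with illustrative names (`coupled_step`; `Wcor[a][b]` holds the inter-network matrix from network `b` to network `a`) and no claim to match the authors' implementation:

```python
def phi(v):
    """Linear saturating activation: clip every component to [-1, 1]."""
    return [max(-1.0, min(1.0, u)) for u in v]

def coupled_step(xs, Ws, fs, Wcor, beta, gamma):
    """One synchronous update of all coupled GBSB networks (Eq. 2).

    xs[a] is the state of network a; Ws[a] its weight matrix; fs[a] its bias;
    Wcor[a][b] the inter-network matrix from network b to network a (None on
    the diagonal, since a network is not coupled to itself).
    """
    new_xs = []
    for a, x in enumerate(xs):
        n = len(x)
        y = []
        for i in range(n):
            own = sum(Ws[a][i][j] * x[j] for j in range(n))
            # contribution of every other network b != a
            coup = sum(Wcor[a][b][i][m] * xs[b][m]
                       for b in range(len(xs)) if b != a
                       for m in range(len(xs[b])))
            y.append(x[i] + beta * own + beta * fs[a][i] + gamma * coup)
        new_xs.append(phi(y))
    return new_xs
```

With `gamma = 0` this reduces component-wise to the uncoupled GBSB iteration, which matches the remark that the first three terms of Eq. (2) are the uncoupled network.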
The weight matrix $W_a$ proposed by Zak and collaborators [15] is described as follows:

$W_a = (D_a V_a - B_a)V_a^{\dagger} + \Lambda_a (I_n - V_a V_a^{\dagger})$  (3)

where $D_a$ is an $n \times n$ strongly row diagonally dominant matrix, $V_a = [v^1, v^2, \ldots, v^r] \in \{-1,1\}^{n \times r}$ is the matrix of stored patterns, $B_a = [b, b, \ldots, b] \in \mathbb{R}^{n \times r}$ is the bias field matrix consisting of the column vector $b$ repeated $r$ times, $V_a^{\dagger}$ is the pseudo-inverse of the matrix of stored patterns, $I_n$ is the $n \times n$ identity matrix and $\Lambda_a$ is an $n \times n$ matrix whose entries satisfy:
$\lambda_{(i,a)(i,a)} < -\left(\sum_{j=1,\, j\neq i}^{n} \left|\lambda_{(i,a)(j,a)}\right| + |b_i|\right)$  (4)

In order to measure the storage capacity of the system, our two-level coupled network is initialized at time $k = 0$ as follows: one of the networks, chosen at random, is initialized in one of the first-level memories that compose a second-level memory, while the other networks are initialized randomly in one of the possible patterns. The storage capacity is then investigated in three analyses:

1. The storage capacity of the networks which are initialized in one of the first-level memories that compose a second-level memory;
2. The storage capacity of the networks which are initialized in one of their first-level memories, but which do not compose a second-level memory;
3. The storage capacity of the networks which are initialized in one of the possible patterns which do not belong to a first-level memory.

Analysis 1: The storage capacity of the networks which are initialized in one of the first-level memories that compose a second-level memory.

First of all, it will be assumed that $V_a^{\dagger} V_a = I$, and from [10] we find:

$W_a V_a = (D_a V_a - B_a)V_a^{\dagger} V_a + \Lambda_a (I_n - V_a V_a^{\dagger})V_a = D_a V_a - B_a$  (5)

Now, since in this case the network is initialized in a global pattern, we must verify the conditions under which this pattern remains at this stable equilibrium point. Therefore, replacing (5) into (2) (the bias contribution $\beta_a f_a$ cancels the $-\beta_a b$ term, since $B_a$ consists of repeated copies of the bias vector) and performing the operation $L$ that represents one iteration of the GBSB algorithm results in:

$(L(v_a^z))_i = \varphi\left\{\left((I_n + \beta_a D_a)\, v_a^z\right)_i + \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}\right\} = \varphi\left\{v_{(i,a)}^z + \beta_a \sum_{j=1}^{N_n} d_{(i,a)(j,a)}\, v_{(j,a)}^z + \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}\right\}$  (6)

where $v_a^z$ is the $z$-th state vector of the $a$-th network, $N_r$ is the number of networks, $N_n$ is the number of neurons of the individual networks and $N_p$ is the number of patterns chosen to function as both first- and second-level memories. From the former equation, for simplification, we denote the intra-network contribution and the coupling term, respectively, by:

$Desc = v_{(i,a)}^z + \beta_a \sum_{j=1}^{N_n} d_{(i,a)(j,a)}\, v_{(j,a)}^z, \qquad Corr = \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}$
Given that Desc has the same sign as $v_{(i,a)}^z$, instability requires that Corr and Desc have opposite signs and that Corr exceed Desc in modulus. Hence, this can occur in the following situations: when $v_{(i,a)}^z = -1$ and $(Corr + Desc) > 0$, or when $v_{(i,a)}^z = 1$ and $(Corr + Desc) < 0$. Consequently, the probability of error of neuron $v_{(i,a)}$ can be characterized as:

$P_{erro}^1 = P(v_{(i,a)} = -1)\,P\{(Corr + Desc) > 0\} + P(v_{(i,a)} = 1)\,P\{(Corr + Desc) < 0\}$  (7)

Considering that the vectors $v$ belong to the set of global patterns chosen randomly implies $P(v_{(i,a)}^z = -1) = P(v_{(i,a)}^z = 1) = \frac{1}{2}$. Thus, equation (7) can be expressed as follows:

$P_{erro}^1 = \frac{1}{2}\,P\{(Corr + Desc) > 0\} + \frac{1}{2}\,P\{(Corr + Desc) < 0\}$  (8)

Therefore, it is necessary to determine the probability density function of $\{(Corr + Desc) > 0\}$ and of $\{(Corr + Desc) < 0\}$, considering that the term Desc represents only a displacement. Regarding $U$ as the number of components of each vector whose value is $1$ in the term Corr, and $D$ as the density of inter-network coupling (i.e., the actual percentage of the value of Corr due to the interconnection of the inter-network neurons), we assume:

$Corr = \gamma D (N_r - 1)\left[U - (N_p - U)\right] = \gamma D (N_r - 1)\left[2U - N_p\right]$  (9)

Taking into account the fact that the stored vectors are chosen randomly, $P(U)$ can be defined by the following binomial distribution:

$P(U) = \binom{N_p(N_r - 1)}{U}\left(\frac{1}{2}\right)^{N_p(N_r - 1)}$  (10)

The binomial distribution defined in (10) can be approximated by a normal distribution with mean $E[U] = \frac{N_p(N_r - 1)}{2}$ and variance $\sigma^2(U) = \frac{N_p(N_r - 1)}{4}$. Then, the mean and variance of the term Corr can be obtained from equation (9), $E[U]$ and $\sigma^2(U)$, and can be expressed by:

$E[Corr] = E\left[\gamma D (N_r - 1)(2U - N_p)\right] = 0$  (11)

$\sigma_{Corr}^2 = E\left[\left(\gamma D (N_r - 1)(2U - N_p)\right)^2\right] - \left(E\left[\gamma D (N_r - 1)(2U - N_p)\right]\right)^2 = \gamma^2 D^2 (N_r - 1)^2\left(4E[U^2] - 4N_p E[U] + N_p^2\right)$  (12)

Since $E[U^2] = \sigma^2[U] + E^2[U] = \frac{N_p(N_r - 1)}{4} + \frac{\left(N_p(N_r - 1)\right)^2}{4}$, we have:

$\sigma_{Corr}^2 = \gamma^2 D^2 (N_r - 1)^2\left(N_p(N_r - 1) + \left(N_p(N_r - 1)\right)^2 - 2(N_r - 1)N_p^2 + N_p^2\right)$  (13)

Let $K = N_p(N_r - 1)$. Then:

$\sigma_{Corr}^2 = \gamma^2 D^2\left(N_p(N_r - 1)^3 + K^2(N_r - 2)^2\right)$  (14)

Finally, it can be easily verified that Corr is a random variable distributed in accordance with a normal distribution with mean 0 and variance $\sigma_{Corr}^2$. Furthermore, a normal distribution is symmetrical about its mean, leading to $P(Corr > 0) = P(Corr < 0)$. As a result, equation (8) can be rewritten in the form presented in (15), where the integrand is the normal probability density function and the term Desc represents the modulus of the displacement:

$P_{erro}^1 = P(Corr > |Desc|) = \int_{|Desc|}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma_{Corr}}\, e^{-\frac{(u - E[Corr])^2}{2\sigma_{Corr}^2}}\, du$  (15)

Analysis 2: The storage capacity of the networks which are initialized in one of their first-level memories, but which do not compose a second-level memory.

This analysis follows the same procedure as Analysis 1. However, differently from the previous analysis, the system is expected to be unstable in this condition; in other words, one expects the probability of error defined in (15) to become a probability of correctness.
In this case, the probability of error is the complement of equation (15) and can be defined by:

$P_{erro}^2 = P(Corr < |Desc|) = \int_{-\infty}^{|Desc|} \frac{1}{\sqrt{2\pi}\,\sigma_{Corr}}\, e^{-\frac{(u - E[Corr])^2}{2\sigma_{Corr}^2}}\, du$  (16)

Analysis 3: The storage capacity of the networks which are initialized in one of the possible patterns which do not belong to a first-level memory.

Lillo and collaborators [10] added a term to the right-hand side of (3) in which $(I_n - V_a V_a^{\dagger})$ represents an orthogonal projection onto the null space of $V_a^{\dagger}$. As a result, for a vector $y_a$ in this null space, the weight matrix of the individual networks yields:

$W_a y_a = (D_a V_a - B_a)V_a^{\dagger} y_a + \Lambda_a (I_n - V_a V_a^{\dagger})y_a = \Lambda_a (I_n - V_a V_a^{\dagger})y_a = \Lambda_a y_a$  (17)

Then, by substituting (17) into equation (2) and carrying out an $L$ transformation, which represents one iteration of the GBSB algorithm, one can verify under which conditions the network should not keep evolving towards the initialization vector, i.e., under which conditions the network is unstable for this vector, which was not stored and does not belong to a global pattern:

$(L(y_a))_i = \varphi\left\{\left(y_a + \beta_a(\Lambda_a y_a + b_a)\right)_i + \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}\right\} = \varphi\left\{y_{(i,a)} + \beta_a\left(\sum_{j=1}^{N_n} \lambda_{(i,a)(j,a)}\, y_{(j,a)} + b_{(i,a)}\right) + \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}\right\}$  (18)

As in Analysis 1, we define:

$Desc = y_{(i,a)} + \beta_a\left(\sum_{j=1}^{N_n} \lambda_{(i,a)(j,a)}\, y_{(j,a)} + b_{(i,a)}\right), \qquad Corr = \gamma \sum_{b=1,\, b\neq a}^{N_r}\sum_{m=1}^{N_n} w_{cor(i,a)(m,b)}\, x_{(m,b)}$

Since stability would be undesirable for patterns that have not been stored, the error now consists of the state keeping the sign of $y_{(i,a)}$: when $y_{(i,a)} = -1$ and $(Corr + Desc) < 0$, or when $y_{(i,a)} = 1$ and $(Corr + Desc) > 0$. This way, the probability that stability, i.e., an error, occurs at $y_{(i,a)}$ can be described generically by:

$P_{erro}^3 = P(y_{(i,a)} = -1)\,P\{(Corr + Desc) < 0\} + P(y_{(i,a)} = 1)\,P\{(Corr + Desc) > 0\}$  (19)

Considering the vectors $y$ chosen randomly, we obtain $P(y_{(i,a)} = -1) = P(y_{(i,a)} = 1) = \frac{1}{2}$. Thus, equation (19) can be expressed as follows:

$P_{erro}^3 = \frac{1}{2}\,P\{(Corr + Desc) < 0\} + \frac{1}{2}\,P\{(Corr + Desc) > 0\}$  (20)

Therefore, it is necessary to determine the probability density function of $P\{(Corr + Desc) < 0\}$ and of $P\{(Corr + Desc) > 0\}$, considering that the term Desc represents only a displacement. Hence, Corr can be expressed by (9), obtained in Analysis 1. At last, repeating the procedure developed in equations (10), (11), (12) and (14), we find that Corr is normally distributed with mean 0 and variance $\sigma_{Corr}^2$. Furthermore, by the symmetry of the normal distribution about its mean, $P(Corr > 0) = P(Corr < 0)$. As a result, equation (20) can be rewritten in the form presented in (21), where the term Desc again represents the modulus of the displacement:

$P_{erro}^3 = P(Corr < |Desc|) = \int_{-\infty}^{|Desc|} \frac{1}{\sqrt{2\pi}\,\sigma_{Corr}}\, e^{-\frac{(u - E[Corr])^2}{2\sigma_{Corr}^2}}\, du$  (21)

To sum up, the total probability of convergence $P_{conver}$ of the coupled system can be defined as the product of the complements of the probabilities of error of the previous analyses: $P_{conver} = (1 - P_{erro}^1)(1 - P_{erro}^2)(1 - P_{erro}^3)$.
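The quantities in Eqs. (14), (15) and the final product of complements can be evaluated numerically, writing the Gaussian tail in terms of the complementary error function. The sketch below is ours (the names `sigma2_corr`, `p_erro1` and `p_conver` are illustrative, and `Desc` is treated as a given displacement):

```python
import math

def sigma2_corr(gamma, D, Np, Nr):
    """Variance of Corr, Eq. (14): gamma^2 D^2 (Np (Nr-1)^3 + K^2 (Nr-2)^2)."""
    K = Np * (Nr - 1)
    return gamma**2 * D**2 * (Np * (Nr - 1)**3 + K**2 * (Nr - 2)**2)

def p_erro1(desc, gamma, D, Np, Nr):
    """Eq. (15): Gaussian tail P(Corr > |Desc|) for zero-mean normal Corr."""
    s = math.sqrt(sigma2_corr(gamma, D, Np, Nr))
    return 0.5 * math.erfc(abs(desc) / (s * math.sqrt(2.0)))

def p_conver(desc1, desc2, desc3, gamma, D, Np, Nr):
    """Product of complements (1-P1)(1-P2)(1-P3); P2 and P3 are the
    complements of the corresponding tails (Eqs. 16 and 21)."""
    p1 = p_erro1(desc1, gamma, D, Np, Nr)
    p2 = 1.0 - p_erro1(desc2, gamma, D, Np, Nr)
    p3 = 1.0 - p_erro1(desc3, gamma, D, Np, Nr)
    return (1.0 - p1) * (1.0 - p2) * (1.0 - p3)
```

With a zero displacement the tail probability is exactly 1/2, and it decreases monotonically as the displacement grows relative to $\sigma_{Corr}$, which is the qualitative behaviour exploited in the analyses above.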
4. Simulation results

The storage capacity of the system was measured by taking into consideration three GBSB networks connected as shown in Fig. 1. In our simulations each network contains 12 neurons, and 6 out of the 4096 possible patterns ($2^{12}$) are selected to be stored as our first-level memories. The weight matrix of the individual networks was carefully designed following the algorithm proposed in [10]. The set of 6 patterns stored as first-level memories was chosen randomly considering LI or orthogonal vectors. In addition, we chose randomly amongst the $6^3 = 216$ possible combinations of the 3 sets of first-level memories to be our second-level memories. The selected patterns extracted from the first-level memories to form a global pattern determine the inter-network weight matrix $W_{cor(a,b)}$ through a generalized Hebb rule, or Outer Product Method. Network A was initialized at time $k = 0$ in one of the two possible first-level memories which compose a second-level memory. Network B was initialized in one of the other 5 patterns which were stored as first-level memories but which do not compose second-level memories. Network C, in turn, was initialized randomly in one of the remaining patterns (4090) which do not belong to a first-level memory. Then, we measured the probability of convergence of the coupled system considering densities of coupling amongst the inter-network neurons of 20%, 40%, 60% and 100%. Neurons that took part in the inter-network connections were chosen randomly. Points in our experiments were averaged over 1000 trials for given values of $\gamma$ (intensity of coupling) and $\beta$ (intra-network step size). The results for LI and orthogonal vectors can be seen in Figs. 2 and 3, which show that even when only 20% of the inter-network neurons were connected our model presented a relevant probability of convergence. In addition, the differences between orthogonal and LI vectors were close to 10%.
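A minimal sketch of the generalized Hebb rule / Outer Product Method mentioned above follows. The normalization, if any, used by the authors is not stated here, so the sketch (with the illustrative name `hebb_wcor`) uses the plain sum of outer products over the selected pattern pairs:

```python
def hebb_wcor(global_pairs):
    """Inter-network weight matrix Wcor(a,b) as a sum of outer products
    v_a v_b^T over the first-level pattern pairs (v_a, v_b) selected to
    compose the second-level (global) patterns."""
    n = len(global_pairs[0][0])
    m = len(global_pairs[0][1])
    W = [[0.0] * m for _ in range(n)]
    for va, vb in global_pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += va[i] * vb[j]
    return W
```

Scaling such a matrix by the intensity $\gamma$ and masking a fraction $1 - D$ of its rows and columns would reproduce the intensity and density of coupling varied in the experiments.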
We have also analyzed the relation between the intensity ($\gamma$) and density ($D$) of coupling. We observed that when the value of $D$ decreases, an increase in the value of $\gamma$ is necessary in order to maintain the probability of convergence.

5. Conclusions

In this paper, we have presented a proposal for the evaluation of the capacity of a model of multi-level associative memories based on the TNGS [4], through artificial neural networks. We derived a set of equations that evaluates the probability of convergence of these coupled systems. Simulations were carried out on a two-level memory system and the relations between convergence, intensity and density of coupling were shown considering linearly independent and orthogonal vectors.

Figure 2. Probability of convergence for a density of coupling amongst the inter-network neurons of 20%, 40%, 60% and 100% ($\gamma$ on the horizontal axis, $\beta = 0.3$) - LI vectors

Figure 3. Probability of convergence for a density of coupling amongst the inter-network neurons of 20%, 40%, 60% and 100% ($\gamma$ on the horizontal axis, $\beta = 0.3$) - Orthogonal vectors

The storage capacity proved to be significant for both LI and orthogonal vectors, and it could also be noted that the probability of convergence achieved for orthogonal vectors exceeded that of LI vectors by about 10%, showing that it is possible to build multi-level memories where higher levels coordinate sets of memories of the lower levels. This work is currently being generalized in order to compare the capacity and convergence of multi-level memories with two-level memories which present the same number of first-level memories.

References

[1] I. Aleksander. What is thought? Nature, 429(6993):701-702, 2004.
[2] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones. Distinctive features, categorical perception, and probability learning: some applications of a neural model, pages 283-325. MIT Press, Cambridge, Massachusetts, 1985.
[3] W. J. Clancey. Situated Cognition: On Human Knowledge and Computer Representations. Learning in Doing. Cambridge University Press, Cambridge, U.K.; New York, NY, USA, 1997.
[4] G. M. Edelman. Neural Darwinism: The Theory of Neuronal Group Selection. Basic Books, New York, 1987.
[5] R. M. Gomes, A. P. Braga, and H. E. Borges. Energy analysis of hierarchically coupled generalized-brain-state-in-a-box GBSB neural networks. In Proceedings of the V Encontro Nacional de Inteligência Artificial - ENIA 2005, pages 771-780, São Leopoldo, Brazil, July 2005.
[6] R. M. Gomes, A. P. Braga, and H. E. Borges.
A model for hierarchical associative memories via dynamically coupled GBSB neural networks. In Proceedings of the International Conference on Artificial Neural Networks - ICANN 2005, Warsaw, Poland, September 2005. Springer-Verlag. (to be published).
[7] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences U.S.A., 81:3088-3092, May 1984.
[8] S. Hui and S. H. Zak. Dynamical analysis of the brain-state-in-a-box (BSB) neural models. IEEE Transactions on Neural Networks, 3(5):86-94, 1992.
[9] J. Li, A. N. Michel, and W. Porod. Analysis and synthesis of a class of neural networks: Variable structure systems with infinite gains. IEEE Transactions on Circuits and Systems, 36:713-731, May 1989.
[10] W. E. Lillo, D. C. Miller, S. Hui, and S. H. Zak. Synthesis of brain-state-in-a-box (BSB) based associative memories. IEEE Transactions on Neural Networks, 5(5):730-737, September 1994.
[11] A. N. Michel, J. A. Farrell, and W. Porod. Qualitative analysis of neural networks. IEEE Transactions on Circuits and Systems, 36:229-243, 1989.
[12] L. Personnaz, I. Guyon, and G. Dreyfus. Information storage and retrieval in spin-glass-like neural networks. Journal de Physique Lettres (Paris), 46:359-365, 1985.
[13] L. Personnaz, I. Guyon, and G. Dreyfus. Collective computational properties of neural networks: New learning mechanisms. Physical Review A, 34:4217-4228, 1986.
[14] J. P. Sutton, J. S. Beis, and L. E. H. Trainor. A hierarchical model of neocortical synaptic organization. Mathl. Comput. Modelling, 11:346-350, 1988.
[15] S. H. Zak, W. E. Lillo, and S. Hui. Learning and forgetting in generalized brain-state-in-a-box (BSB) neural associative memories. Neural Networks, 9(5):845-854, 1996.