Convolutional Associative Memory: FIR Filter Model of Synapse

Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1

1 International Institute of Information Technology, Hyderabad, India. rammurthy@iiit.ac.in, anil.rayala@students.iiit.ac.in
2 Indian Institute of Technology, Guwahati, India. d.munugoti@iitg.ernet.in

Abstract. In this research paper, a novel Convolutional Associative Memory is proposed. In the proposed model, the synapse of each neuron is modeled as a linear FIR filter. The dynamics of the Convolutional Associative Memory are discussed, and a new method called Sub-Sampling is introduced. A proof of the convergence theorem is given and an example depicting the convergence is shown. Special cases of the proposed Convolutional Associative Memory are discussed, and a new Vector Hopfield Associative Memory is proposed. Some potential applications of the proposed model are also outlined.

Keywords: Convolutional Associative Memory, FIR filter, Sub-Sampling Matrix, Hankel Matrix, Vector Hopfield Network.

1 Introduction

Artificial Neural Networks (ANNs) such as the Multi-Layer Perceptron (MLP) and the Hopfield Neural Network (HNN) are based on synaptic weights that are scalars. Researchers have also conceived of ANNs based on a dynamic synapse, i.e. a synapse modeled as a linear filter, for example a Finite Impulse Response (FIR) filter [1,2,3]. One of the motivations for such a model of the synapse is the ability to filter out noise corrupting the signals. In recent years, ANNs based on the deep learning paradigm have provided excellent results in many applications, with the ability to learn features from the data automatically. Specifically, Convolutional Neural Networks (CNNs) pioneered handwriting recognition systems and other related Artificial Intelligence (AI) systems.

One of the main goals of research in AI is to build a system with Multi-Modal Intelligence, i.e. a system combining and reasoning with different inputs such as vision, sound, smell and touch, similar to how the human brain works. It is biologically more appealing for such a system to have a memory that resembles human memory. Human memory is based on associations among the memories it contains; for example, a part of a well-known tune is enough to bring the whole song back to mind. Associative memories can be used for tasks such as:

- completing information when some part is missing,
- denoising when a noisy input is given,
- guessing information: if a pattern is presented, the most similar stored pattern is determined.

The Hopfield Neural Network is widely used as an associative memory. In this research paper, we propose and study a novel associative memory based on an FIR filter model of the synapse. Other researchers have proposed certain types of Convolutional Associative Memories [5], but the Sub-Sampling approach used in our method is very different from any other contribution. We expect our Convolutional Associative Memory to find many applications.

This research paper is organized as follows. In Section 2, the known literature on the Hopfield Network is reviewed. In Section 3, the dynamics of the proposed novel Convolutional Associative Memory are studied. In Section 4, special cases of the proposed model are discussed. In Section 5, examples of the proposed Convolutional Associative Memory are given. In Section 6, some applications of the proposed model are discussed, and Section 7 describes an implementation. The paper concludes in Section 8.

2 Review of Literature

In 1982, John Hopfield proposed a form of neural network that can act as an associative memory, called the Hopfield Neural Network. A Hopfield Neural Network is a recurrent neural network: a non-linear dynamical system based on a weighted, undirected graph. Each node of the graph represents an artificial neuron that assumes a binary value (+1 or -1), and the edge weights correspond to synaptic weights. The order of the network is the number of neurons. If all the synaptic weights of the network are collected in a synaptic weight matrix M, a Hopfield neural network can be represented by M and a threshold vector T. Since each neuron has a state of +1 or -1, the state space of an N-neuron Hopfield network is the N-dimensional unit hypercube.

Let the state of the non-linear dynamical system be represented by the N x 1 vector S(t), and let S_i(t) in {+1, -1} represent the state of the i-th neuron at time instant t. The state of the i-th neuron is updated by

S_i(t + 1) = Sign{ Σ_{j=1}^{N} M_{i,j} S_j(t) - T_i }.

Depending on the number of neurons whose state is updated at a given time instant, the network operation can be classified. The two main modes of operation are:

1) Serial mode: at any time instant, the state of only one node of the network is updated.
2) Fully parallel mode: at every time instant, the states of all the nodes are updated.
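To make the update rule concrete, here is a minimal Python sketch of serial-mode operation (the weight matrix M, thresholds T and initial state below are assumed toy values, not taken from any experiment in this paper):

import numpy as np

def hopfield_serial_sweep(S, M, T):
    # One serial-mode sweep: neurons are visited one at a time and
    # S_i is set to Sign( sum_j M[i, j] * S_j - T[i] ), with sign(0) taken as +1.
    S = S.copy()
    for i in range(len(S)):
        S[i] = 1 if M[i] @ S - T[i] >= 0 else -1
    return S

# assumed toy network: symmetric M with zero diagonal, zero thresholds
M = np.array([[ 0., 1., -1.],
              [ 1., 0.,  1.],
              [-1., 1.,  0.]])
T = np.zeros(3)
S = np.array([1, -1, -1])

while True:
    S_next = hopfield_serial_sweep(S, M, T)
    if np.array_equal(S_next, S):     # stable state reached: S(t+1) = S(t)
        break
    S = S_next
print("converged state:", S)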

The network is said to be stable (converged) if and only if

S(t + 1) = S(t) = Sign{ M S(t) - T }.

Thus, once the network has reached a stable state, it remains in that state forever, irrespective of the mode of operation. The following convergence theorem summarizes the dynamics of the Hopfield Network.

Theorem 1. Let the pair N = (M, T) specify a Hopfield neural network. Then the following hold true:

1) If the network is operating in the serial mode and the elements of the diagonal of M are non-negative, the network will always converge to a stable state (i.e. there are no cycles in the state space).
2) If the network is operating in the fully parallel mode, the network will always converge to a stable state or to a cycle of length 2 (i.e. the cycles in the state space are of length at most 2).

The proof of the above theorem can be found in [4].

3 Dynamics of Convolutional Associative Memory

Before we discuss the dynamics of the Convolutional Associative Memory, some important innovative concepts are introduced.

3.1 State as a Sequence

In the Convolutional Associative Memory, each neuron is modeled to have a state which is a sequence of binary values {+1, -1} of length L, rather than a single value (+1 or -1) as in the case of the Hopfield network. All the neurons are interconnected and it is assumed that there are no self-connections. For practical considerations, the sequence is taken to start at time n = 0.

Notation. S represents a matrix of size L x N, where N is the number of neurons and S_i (the i-th column of S) is the state of the i-th neuron.

3.2 Synapse as a Linear FIR Filter: Sub-Sampling

Based on biological motivation, the author first proposed the idea of modeling the synapse as a linear filter in [1]. The synapse of each connection is taken to be a linear discrete-time Finite Impulse Response filter of length F, rather than a single constant as in the case of the Hopfield Network. From practical and theoretical considerations, this type of model is considered to be more realistic. It is again assumed that the sequence starts at time n = 0.

The output of each synapse is the convolution of the state sequence of a neuron with a filter, so the output of the synapse is a sequence of length L+F-1. Since this sequence is used to update the state of a neuron, it needs to be compressed to a sequence of length L; this compression is called Sub-Sampling. The compression can be done in infinitely many ways; it can be observed that the compression is a linear operator and can therefore be realized by a matrix of size L x (L+F-1). Throughout, a matrix of size M x N has M rows and N columns. This length bookkeeping is illustrated in the sketch below.
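The following lines are a minimal illustration of the dimensions involved; the filter, state sequence and the uniform averaging compressor K are arbitrary assumed values, not a prescription:

import numpy as np

L, F = 4, 3                               # state length and synaptic FIR filter length
s = np.array([1, -1, -1, 1])              # state sequence of one neuron (length L)
h = np.array([0.5, -1.0, 0.25])           # assumed FIR impulse response (length F)

y = np.convolve(h, s)                     # synapse output has length L + F - 1 = 6
K = np.ones((L, L + F - 1)) / (L + F - 1) # one possible L x (L+F-1) linear compressor (assumption)
print(y.shape, (K @ y).shape)             # (6,) is compressed back to (4,)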

Let K denote the sub-sampling matrix which performs the compression. In digital signal processing, it is well known that windows such as the Hamming, Hanning and Blackman windows are utilized; we invoke such results in the design of the synaptic FIR filter.

3.3 Convolution as Matrix Multiplication

In the Convolutional Associative Memory, the convolution of the synapse filter sequence (between neurons i and j) with a state sequence of length L can be realized by multiplying a convolution matrix H_{i,j} of size (L+F-1) x L with that state sequence. The matrix H_{i,j} is a Toeplitz matrix: its first column is a vector of length L+F-1 in which only the first F elements are non-zero, and its first row is a vector of length L with only the first element non-zero. For L = 4 and F = 3, the H_{i,j} matrix of a synapse with impulse response {h(0), h(1), h(2)} is

    h(0)   0     0     0
    h(1)  h(0)   0     0
    h(2)  h(1)  h(0)   0
     0    h(2)  h(1)  h(0)
     0     0    h(2)  h(1)
     0     0     0    h(2)

It can be seen that multiplying H_{i,j} with a vector of length L is the same as convolving a sequence of length L (the state sequence of a neuron) with the impulse response {h(0), h(1), h(2)} of length F.

Notation. Let H denote an N x N cell (array) whose elements are matrices; H_{i,j} is the synaptic filter convolution matrix between neurons i and j. It is assumed that H_{i,j} and H_{j,i} are the same for all i and j. Since it is also assumed that there are no self-connections (Section 3.1), H_{i,i} is the null matrix for all i.

3.4 Serial Mode Update

In serial mode, the state sequence of the i-th neuron is updated by

S_i(t+1) = sign( Σ_j K * H_{i,j} * S_j(t) ),     (1)

where S_i(t+1) is the updated state sequence (of length L) of the i-th neuron, S_j(t) is the state sequence (of length L) of the j-th neuron, H_{i,j} is the convolution matrix of the synapse between neurons i and j, and * denotes matrix multiplication.
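The construction of H_{i,j} and its equivalence with convolution can be checked with a short sketch (the filter and state values below are assumed):

import numpy as np

def conv_matrix(h, L):
    # Build the (L+F-1) x L Toeplitz matrix H so that H @ s equals np.convolve(h, s).
    F = len(h)
    H = np.zeros((L + F - 1, L))
    for c in range(L):
        H[c:c + F, c] = h        # column c carries the impulse response shifted down by c
    return H

h = np.array([1.0, 2.0, 3.0])    # h(0), h(1), h(2): assumed filter with F = 3
s = np.array([1, -1, 1, -1])     # assumed state sequence with L = 4
H = conv_matrix(h, 4)

assert np.allclose(H @ s, np.convolve(h, s))   # matrix multiplication realizes the convolution
print(H)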

3.5 Energy of the Network

The energy function associated with the Convolutional Associative Memory is defined as

E = - Σ_{i,j} S_i(t)^T * K * H_{i,j} * S_j(t),     (2)

where S_i(t)^T denotes the transpose of S_i(t). It can be observed that, for an N-neuron network and for given H_{i,j}, the energy function defined above is bounded below.

3.6 Proof of Convergence of Energy in Serial Mode

If the state of the i-th neuron changes from S_i(t) to S_i(t+1), the change in energy is

E(S_i(t+1)) - E(S_i(t)) = -1 * { Σ_j (S_i(t+1) - S_i(t))^T * K * H_{i,j} * S_j(t) + Σ_j S_j(t)^T * K * H_{j,i} * (S_i(t+1) - S_i(t)) }.

Note. If the energy of the network is to converge, the change in energy must be either negative or zero at all time instants.

We know that S_i(t+1) - S_i(t) and Σ_j K * H_{i,j} * S_j(t) have the same sign element-wise, since S_i(t+1) = sign{ Σ_j K * H_{i,j} * S_j(t) }. Hence

Σ_j (S_i(t+1) - S_i(t))^T * K * H_{i,j} * S_j(t)

is always positive or zero, so the first term of the change in energy (which carries the factor -1) is always negative or zero. If the second term equals the first term, then the total change in energy of the network is always negative or zero.

If K * H_{j,i} is a symmetric matrix, then Σ_j S_j(t)^T * K * H_{j,i} * (S_i(t+1) - S_i(t)) and Σ_j (S_i(t+1) - S_i(t))^T * K * H_{i,j} * S_j(t) are equal; this is easily seen by taking the transpose of either of them and using the fact that H_{i,j} = H_{j,i} (Section 3.3). Therefore, E(S_i(t+1)) - E(S_i(t)) is always either negative or zero if K * H_{j,i} is a symmetric matrix. Since the energy is a lower-bounded function and it always decreases or remains constant, the energy of the system converges.

Since H_{j,i} is of the form given in Section 3.3 for all i and j, if K is a Hankel matrix then K * H_{j,i} is a symmetric matrix.
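The convergence argument can be checked numerically. The following sketch builds random Toeplitz synapse matrices and a random Hankel sub-sampling matrix (all values are assumed and generated at random), runs serial-mode updates, and asserts that the energy of equation (2) never increases:

import numpy as np

rng = np.random.default_rng(0)
N, L, F = 4, 4, 3                  # neurons, state length, filter length
M = L + F - 1                      # convolution output length

def conv_matrix(h, L):
    # (L+F-1) x L Toeplitz matrix of Section 3.3
    F = len(h)
    H = np.zeros((L + F - 1, L))
    for c in range(L):
        H[c:c + F, c] = h
    return H

# symmetric filter assignment (H[i][j] = H[j][i]), no self-connections
H = [[np.zeros((M, L)) for _ in range(N)] for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        H[i][j] = H[j][i] = conv_matrix(rng.integers(-5, 6, size=F), L)

# Hankel sub-sampling matrix K of size L x M built from parameters b1, ..., b_{L+M-1}
b = rng.integers(-3, 4, size=L + M - 1)
K = np.array([[b[r + c] for c in range(M)] for r in range(L)])

def sgn(x):                        # sign convention: 0 maps to +1
    return np.where(x >= 0, 1, -1)

def energy(S):                     # equation (2)
    return -sum(S[:, i] @ K @ H[i][j] @ S[:, j] for i in range(N) for j in range(N))

S = sgn(rng.standard_normal((L, N)))          # random L x N state matrix of +1/-1 values
prev = energy(S)
for sweep in range(20):                       # serial mode: one neuron at a time
    for i in range(N):
        S[:, i] = sgn(sum(K @ H[i][j] @ S[:, j] for j in range(N)))
        e = energy(S)
        assert e <= prev + 1e-9               # energy is non-increasing when K is Hankel
        prev = e
print("final energy:", prev)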

3.7 Proposed Form of K

K has to be a Hankel matrix. For L = 4 and F = 3, K is

    b1 b2 b3 b4 b5 b6
    b2 b3 b4 b5 b6 b7
    b3 b4 b5 b6 b7 b8
    b4 b5 b6 b7 b8 b9

The above form is one of many forms of sub-sampling for which the network's energy converges; it should not be mistaken for the only form.

3.8 Convergence of the Network to a Stable State

If the energy of the system has converged, there are two possible cases:

1) S_i(t+1) - S_i(t) = 0 and the change in energy is zero.
2) Not all elements of S_i(t+1) - S_i(t) are zero, but the corresponding elements of Σ_j K * H_{i,j} * S_j(t) are zero, again making the change in energy zero. This kind of change is only possible when one or more elements of S_i change from -1 to +1.

Hence, once the energy of the network has converged, it is clear from the foregoing facts that the network will reach a stable state after at most L*N^2 time intervals. It is well known that the convergence proof in parallel mode can be reduced to that in serial mode [6]. In view of the above proof, we have the following convergence theorem.

Theorem 2. Let the pair R = (S, H) represent the Convolutional Associative Memory, where S is of the form discussed in Section 3.1 and H is of the form discussed in Section 3.3. If K is a Hankel matrix, then the following hold true:

1) If the network is operating in the serial mode, the network will always converge to a stable state (i.e. there are no cycles in the state space).
2) If the network is operating in the fully parallel mode, the network will always converge to a stable state or to a cycle of maximum length 2.

The proof for the parallel mode is obtained by converting the parallel mode into the serial mode as proposed in [6].

4 Special Cases

4.1 K as H^T

If the sub-sampling matrix associated with every synapse is the transpose of the corresponding convolution matrix, i.e. K = (H_{i,j})^T, then K * H_{i,j} is a symmetric matrix, which leads to convergence of the energy and hence convergence of the state matrix.
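This choice can be verified directly: for the Toeplitz matrix of Section 3.3 with an assumed filter, taking K = H^T makes K * H symmetric and positive semi-definite.

import numpy as np

# 6 x 4 Toeplitz synapse matrix of Section 3.3 for an assumed filter h = (1, 2, 3), L = 4, F = 3
H = np.array([[1., 0., 0., 0.],
              [2., 1., 0., 0.],
              [3., 2., 1., 0.],
              [0., 3., 2., 1.],
              [0., 0., 3., 2.],
              [0., 0., 0., 3.]])
K = H.T                                        # per-synapse sub-sampling matrix K = (H_{i,j})^T
G = K @ H                                      # this is the Gram matrix H^T H

assert np.allclose(G, G.T)                     # symmetric, so the energy argument of Section 3.6 applies
assert np.all(np.linalg.eigvalsh(G) >= -1e-9)  # positive semi-definite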

Some interesting results can be observed in this case.

Gram matrix. The Gram matrix of a matrix A is A^T * A.

Note. A Gram matrix is always symmetric and positive semi-definite; this can be shown by a simple proof.

The energy of the system as given in Section 3.5 is E = - Σ_{i,j} S_i(t)^T * K * H_{i,j} * S_j(t). If K = (H_{i,j})^T, then E = - Σ_{i,j} S_i(t)^T * G_{i,j} * S_j(t), where G_{i,j} is the Gram matrix of H_{i,j}. Since G_{i,j} is positive semi-definite, if S_i(t) = S_j(t) for all i and j, i.e. if all neurons have the same state, the energy obtained is minimal. Hence a state matrix with all columns equal behaves as a local energy minimum.

Result. If we take K = (H_{i,j})^T, there is a very good chance that the network converges to a configuration in which the states of all neurons are the same.

4.2 Synapse as a Linear Time-Variant System

In the Convolutional Associative Memory proposed above, the synapse is modeled as a linear time-invariant system. If we instead take the synapse to be a non-causal linear time-variant system, the matrix H_{i,j} is no longer of Toeplitz form but assumes some arbitrary matrix form, since the system is still linear. For such a system to converge, K * H_{i,j} should be a symmetric matrix. Since for any given matrix H_{i,j} we can always find a K such that K * H_{i,j} is symmetric, the network also converges when the synapse is modeled as a linear time-variant system.

4.3 Vector Hopfield Network

From the above discussion it is clear that if K * H_{i,j} is a symmetric matrix and H_{i,j} = H_{j,i}, the network converges both when H_{i,j} is a Toeplitz matrix and when H_{i,j} is some arbitrary matrix. So if we take a symmetric matrix W_{i,j} equal to K * H_{i,j} for each synapse, this is sufficient to represent the vector Hopfield network.

The vector Hopfield network should not be mistaken for a normal Hopfield network operating in parallel mode that is merely arranged in layers. The difference is that in a vector Hopfield network, the state of any element of the i-th neuron is not influenced by the states of the other elements of the i-th neuron, whereas in a Hopfield network operating in parallel mode, the state of every neuron is influenced by all other neurons, including those that are updated in parallel.

For implementation purposes it is advisable to use a vector Hopfield network in place of a normal Hopfield network, since implementing a Hopfield network on real data requires converting the data to a binary form of +1's and -1's. If a value of 126 is obtained for a feature, storing it in a normal Hopfield network requires 7 neurons and the connections between them, whereas in a vector Hopfield network it can be stored in a single vector and does not require connections between its elements.

Hence, to store the same patterns as a Hopfield network, a vector Hopfield network requires less memory.

4.4 Information Storage Algorithm for the Vector Hopfield Network

If S is the state matrix to be stored, the weight matrices W of the vector Hopfield network are updated by

W_{i,j} = W_{i,j} + f( S_i * (S_j)^T ),

where S_i and S_j are the i-th and j-th columns of S, S_i * (S_j)^T is their outer product, and the function f is defined as f(A) = η(A + A^T), with A an arbitrary matrix and η the learning rate. If every synapse in the network has W of the above form, then the required state matrix S acts as a local minimum, so that it can be retrieved. The proof is similar to that of the Hopfield network. A sketch of this storage rule in code is given below.
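A minimal sketch of the storage rule, assuming the per-synapse weights are kept in a dictionary and using an assumed learning rate η = 0.1 and a toy pattern:

import numpy as np

def store_pattern(W, S, eta=0.1):
    # W: dict mapping (i, j) to an L x L weight matrix of the vector Hopfield network
    # S: L x N state matrix to be stored (column i is the state of neuron i)
    L, N = S.shape
    for i in range(N):
        for j in range(N):
            if i == j:                       # no self-connections
                continue
            A = np.outer(S[:, i], S[:, j])   # outer product S_i (S_j)^T
            W[(i, j)] = W.get((i, j), np.zeros((L, L))) + eta * (A + A.T)   # f(A) = eta (A + A^T)
    return W

# usage with an assumed 4 x 3 pattern of +1/-1 values (L = 4, N = 3)
S = np.array([[ 1, -1,  1],
              [-1, -1,  1],
              [ 1,  1, -1],
              [-1,  1,  1]])
W = store_pattern({}, S)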

5 Examples

5.1 Synapse as an LTI System

For such a system, H is of the form given in Section 3.3. Consider the case where the number of neurons is N = 4, the pattern length is L = 3 and the filter length is F = 2. The initial state matrix (each column denotes the state of a neuron) is

    1 1 1 1
    1 1 1 1
    1 1 1 1

Let H{i,j} denote the filter coefficients of the synapse between the i-th and j-th neurons, and take

H{1,2} = H{2,1} = {4, 1}
H{1,3} = H{3,1} = {-2, 0}
H{1,4} = H{4,1} = {1, 3}
H{2,3} = H{3,2} = {-5, 0}
H{2,4} = H{4,2} = {-1, 1}
H{3,4} = H{4,3} = {0, -3}

Let us take the sub-sampling matrix

    K = 6 2 4 4
        2 4 4 6
        4 4 6 2

which is of the required (Hankel) form. The initial energy of the system is E_in = 512.

Now let us update the first neuron; the state matrix of the network becomes 1 1 1 1 1 1 1 1 1 1 1 1 and the updated energy of the system is E_upd = 0. Updating the second neuron, the state matrix of the network becomes 1 1 1 1 and the updated energy is E_upd = -320. Updating the third neuron, the state matrix becomes 1 1 1 1 and the updated energy is E_upd = -432. Updating the fourth neuron, the state matrix becomes 1 1 1 1 and the updated energy is E_upd = -432. Updating the states of all the neurons again does not alter them, which implies that the system has converged. Hence the example we took converged.

5.2 Synapse as a Binary Filter

Now let us consider an example in which the filter coefficients take only the binary values +1 and -1. Consider N = 4, L = 3, F = 3. The initial state matrix (the i-th column denotes the state of the i-th neuron) is

    1 1 1 1
    1 1 1 1
    1 1 1 1

and the filters are

H(1,2) = H(2,1) = {-1, -1, -1}
H(1,3) = H(3,1) = {-1, 1, 1}
H(1,4) = H(4,1) = {1, -1, 1}
H(2,3) = H(3,2) = {1, -1, 1}
H(2,4) = H(4,2) = {-1, -1, -1}
H(3,4) = H(4,3) = {-1, -1, -1}

Let us take the sub-sampling matrix

    K = -10 -7  4  2 -1
         -7  4  2 -1  0
          4  2 -1  0 -5

which is a Hankel matrix. If we update the network in serial mode, after the first loop the state matrix becomes

    1 -1 1 -1 1 -1 1 1 1 1 -1 -1

After the second loop the state matrix is the same as above, which indicates convergence.

5.3 Synapse as a Non-Causal LTV System

Now let us consider a case in which the synapse between the neurons acts as an LTV (linear time-variant) filter instead of an LTI filter. Consider N = 4, L = 4, and take each H_{i,j} to be a matrix of size 7 x 4, with entries

H(1,2) = H(2,1) = -6 -5 3 0 -2 2 5 1 1 1 0 -9 4 9 -3 9 -4 7 -1 -2 -6 -8 -4 4

H(1,3) = H(3,1) = 5 3 -10 6 8 5 -10 -3 4 4 -6 -5 3 -1 2 -6 -7 6 5 8 -8 -7 -9 -1

H(1,4) = H(4,1) = -7 7 -9 -10 1 5 -4 -7 -4 -6 0 8 2 -8 -3 -9 0 -2 9 6 -1 7 -8 -3

H(2,3) = H(3,2) = 8 8 4 2 -4 8 -8 4 2 6 -3 4 6 -4 1 9 0 -4 2 -3 5 -2 -1 3

H(2,4) = H(4,2) = 9 -4 6 4 9 -10 -3 3 -5 -6 4 2 1 3 -10 -4 -1 -6 4 7 -5 4 -8 6

H(3,4) = H(4,3) = -8 1 -3 6 0 -1 7 -3 -2 9 -10 9 -7 3 1 3 -3 2 6 -10 -9 9 3 -6

Now take the sub-sampling matrix associated with the synapse between the i-th and j-th neurons to be K_{i,j} = (H(i,j))^T, so that the product K_{i,j} * H(i,j) is a symmetric matrix and convergence is guaranteed. The initial state matrix is

    -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 1

If we update the first, second, third and fourth neurons one after another in this order, then after the first loop the state matrix becomes

    -1 -1 -1 -1 -1 -1 -1 -1

After the second loop the state matrix is the same as above, which indicates convergence.

6 Applications: Multi-Modal Intelligence

1) One of the major challenges in Artificial Intelligence research has been multi-modal intelligence, i.e. a system combining and reasoning with different inputs such as vision, sound, smell and touch. This is akin to how we humans perceive the world around us: our sight works in conjunction with our senses of hearing, touch, etc. AI technologies that leverage multi-modal data are likely to be more accurate. The problem of multi-modal intelligence can be addressed using a Convolutional Associative Memory.

Since each neuron has a state which is a vector of length L, different elements of the vector can correspond to different inputs; for example, the last element of the vector at each neuron can correspond to speech, the element above it to visual data, and so on.

Examples:
I) A video camera might "recognize" a human more reliably if it also recognizes human voice and touch.
II) Many security systems take only a single input from a user to decide whether he or she is authorized to enter. Using the Convolutional Associative Memory proposed in this paper, several inputs (voice, fingerprint, visual data, etc.) can be taken from the user to build a better security system.

2) The proposed Convolutional Associative Memory can be used in robust speech recognition. Since each synapse is modeled as a linear FIR filter, such a system is more likely to filter out noise and produce more accurate results than existing systems.

7 Implementation

The Convolutional Associative Memory discussed above was used to classify emotions in speech. Other researchers have implemented Hopfield networks using LPC features for emotion recognition, and the efficiency observed is 46%. For emotion classification, the emotions are grouped into two sets (Anger and Happiness). The MFCCs of each set are extracted; the average of all the Anger MFCCs is computed and the network is trained to store this average, and similarly the average of all the Happiness MFCCs is computed and stored. A small part of the training data is used for validation. The network is tested on data not included in training, and the efficiency is observed to be 61%, which is higher than that of a normal Hopfield network. The observed errors can be explained as follows: associative memories are meant to store a finite set of patterns, to which noisy inputs should converge, whereas the data used for training and testing contained speech signals of different speakers, and the stored patterns are averages over all the Anger or all the Happiness signals. The Convolutional Associative Memory proposed in this paper outperformed the normal Hopfield memory in most of the cases and should be preferred to the Hopfield Network. A sketch of the feature-extraction step is given below.
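One plausible reading of the feature-extraction step is sketched here. Assumptions (not specified in the paper): the librosa library is used for MFCC computation, 13 coefficients are kept, MFCCs are averaged over time frames and then over recordings, the file names are placeholders, and the resulting sign-quantized vectors are what would be stored with the rule of Section 4.4.

import numpy as np
import librosa                      # assumed dependency for MFCC extraction

def average_mfcc(wav_paths, n_mfcc=13):
    # Average MFCC vector over a set of recordings of one emotion class.
    feats = []
    for path in wav_paths:
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
        feats.append(mfcc.mean(axis=1))                          # average over time frames
    return np.mean(feats, axis=0)

# placeholder file lists for the two emotion sets
anger_avg = average_mfcc(["anger_01.wav", "anger_02.wav"])
happy_avg = average_mfcc(["happy_01.wav", "happy_02.wav"])

# sign-quantize to +1/-1 patterns before storing them in the associative memory
anger_pattern = np.where(anger_avg >= 0, 1, -1)
happy_pattern = np.where(happy_avg >= 0, 1, -1)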

8 Conclusion

In this research paper, the synapse is modeled as a discrete-time linear FIR filter of length F, and the state of each neuron is modeled as a sequence of length L. It is proved that such a system converges under serial mode operation for the proposed type of sub-sampling matrix. Some special cases of the proposed Convolutional Associative Memory are discussed, and a novel vector Hopfield network is proposed. It is expected that such an artificial neural network will find many applications.

References

1. G. Rama Murthy, "Some Novel Real/Complex-Valued Neural Network Models," Advances in Soft Computing, Springer Series, Computational Intelligence, Theory and Applications, Proceedings of the 9th Fuzzy Days (International Conference on Computational Intelligence), Dortmund, Germany, September 18-20, 2006.
2. G. Rama Murthy, "Finite Impulse Response (FIR) Filter Model of Synapses: Associated Neural Networks," International Conference on Natural Computation, 2008.
3. G. Rama Murthy, Multi-Dimensional Neural Networks: Unified Theory, New Age International Publishers, 2007.
4. J. J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proceedings of the National Academy of Sciences, USA, Vol. 79, pp. 2554-2558, 1982.
5. Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi, "Convolutional Neural Associative Memories: Massive Capacity with Noise Tolerance," arxiv.org.
6. Jehoshua Bruck and Joseph W. Goodman, "A Generalized Convergence Theorem for Neural Networks," IEEE Transactions on Information Theory, Vol. 34, No. 5, September 1988.