Iterative Autoassociative Net: Bidirectional Associative Memory


POLYTECHNIC UNIVERSITY
Department of Computer and Information Science

Iterative Autoassociative Net: Bidirectional Associative Memory

K. Ming Leung

Abstract: Iterative associative neural networks are introduced. In particular, the Bidirectional Associative Memory (BAM) net is discussed.

Copyright (c) 2000 mleung@poly.edu
Last Revision Date: March 21, 2006

Table of Contents

1. Motivation
2. Bidirectional Associative Memory (BAM)
   2.1. BAM to Associate Letters with Simple Bipolar Codes
   2.2. BAM with Noisy Input
   2.3. BAM with Biases
3. Remarks
   3.1. Hamming Distance and Correlation Encoding
   3.2. Erasing a Stored Association
   3.3. Updating Strategies
   3.4. Convergence of a BAM and a Lyapunov Function
   3.5. Storage Capacity

1. Motivation

After storing a given pattern, an autoassociative net may still be unable to respond to an input signal immediately with a stored pattern; the response may, however, be close enough to a stored pattern to suggest using it as input to the net again. In a sense, we are connecting the output units back to the input units. The result is a recurrent autoassociative net.

To illustrate the basic idea behind such a recurrent net, we consider the following simple example. Suppose we want to store the single pattern [ 1 1 1 -1 ]. The Hebb rule gives the weight matrix

W = [  1  1  1 -1 ]
    [  1  1  1 -1 ]
    [  1  1  1 -1 ]
    [ -1 -1 -1  1 ]

or, if the NN has no self-connections,

W = [  0  1  1 -1 ]
    [  1  0  1 -1 ]
    [  1  1  0 -1 ]
    [ -1 -1 -1  0 ].

We first consider the case where the NN has no self-connections. The original pattern is of course stored successfully by the rule.

Now suppose we input a pattern with three missing values, such as [ 1 0 0 0 ]. We find

y_in = [ 1 0 0 0 ] W = [ 0 1 1 -1 ],

and so y = [ 0 1 1 -1 ], which is close to, but not the same as, the stored pattern [ 1 1 1 -1 ]. We then feed this output pattern back into the NN as input. This gives

y_in = [ 0 1 1 -1 ] W = [ 3 2 2 -2 ],

and so y = [ 1 1 1 -1 ], which is precisely the stored pattern.

On the other hand, if the NN is allowed to have self-connections, it is easy to see that the inputs [ 1 0 0 0 ], [ 0 1 0 0 ], and [ 0 0 1 0 ] all produce the correct output immediately. However, the input [ 0 0 0 1 ] converges to the negative of the stored pattern, [ -1 -1 -1 1 ], in one step. So the NN behaves better if self-connections are eliminated.
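As a quick check of this iteration, here is a minimal NumPy sketch of the example above (the stored pattern, the zero-diagonal weight matrix, and a simple sign activation that leaves a zero net input at 0 are all taken as assumptions from the worked example):

```python
import numpy as np

s = np.array([1, 1, 1, -1])      # the single stored pattern
W = np.outer(s, s)               # Hebb rule
np.fill_diagonal(W, 0)           # remove self-connections

x = np.array([1, 0, 0, 0])       # input with three missing values
for _ in range(3):               # feed each output back in as the next input
    x = np.sign(x @ W)           # sign activation; a zero net input stays 0
print(x)                         # [ 1  1  1 -1] -- the stored pattern
```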

2. Bidirectional Associative Memory (BAM)

The bidirectional associative memory (BAM), developed by Kosko in 1988 and 1992, has the following features:

1. Its architecture consists of two layers: an X-layer containing N neurons and a Y-layer containing M neurons.

2. The transfer function is taken to be

   f_BAM(y_in,j) =  1             if y_in,j > 0,
                    previous y_j  if y_in,j = 0,
                   -1             if y_in,j < 0.

   Notice that if the net input into a neuron is exactly 0, then its activation is kept at its previous value.

3. A set of input and target patterns s^(q) : t^(q), q = 1, 2, ..., Q, is given, where each input vector s^(q) has N components,

   s^(q) = [ s_1^(q)  s_2^(q)  ...  s_N^(q) ],

   and each target t^(q) has M components,

   t^(q) = [ t_1^(q)  t_2^(q)  ...  t_M^(q) ].

4. These patterns are stored using the Hebb rule to give an N x M weight matrix W, so that

   w_ij = sum_{q=1}^{Q} s_i^(q) t_j^(q).

   Biases are not used at all.

5. Inputs can be applied to either layer.

6. Signals are sent back and forth between the two layers until all neurons reach equilibrium (all activations remain the same for two successive steps).

7. The Hebb rule gives a weight matrix W whose elements are w_ij = sum_{q=1}^{Q} s_i^(q) t_j^(q). W is used as the weight matrix for signals sent from the X-layer to the Y-layer, and W^T is used as the weight matrix for signals sent from the Y-layer to the X-layer.

The BAM algorithm is:

1. Use the Hebb rule to store a set of Q associated vector pairs.
2. For a given input test vector x, do steps a-c.
   a. Set the activations of the X-layer to the input test pattern.
   b. Set the activations of the Y-layer to 0.
   c. While the activations have not converged, do steps i-iii.
      i. Update the Y-layer activations:
         y_in,j = sum_{i=1}^{N} x_i w_ij,    y_j = f_BAM(y_in,j).
      ii. Update the X-layer activations:
         x_in,i = sum_{j=1}^{M} w_ij y_j,    x_i = f_BAM(x_in,i).
      iii. Test for convergence of the activations in both layers.
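A compact NumPy sketch of this algorithm is given below; the function names (f_bam, train_bam, run_bam) and the fixed iteration cap are our own choices, and S and T are assumed to hold the Q training vectors as rows, following the notation above.

```python
import numpy as np

def f_bam(net, previous):
    # BAM transfer function: sign of the net input; keep the previous
    # activation wherever the net input is exactly 0
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

def train_bam(S, T):
    # Hebb rule: W = sum_q outer(s^(q), t^(q)); S is Q x N, T is Q x M
    return S.T @ T

def run_bam(W, x, max_iters=100):
    # Input a test pattern at the X-layer, start the Y-layer at 0, and
    # send signals back and forth until both layers stop changing
    y = np.zeros(W.shape[1], dtype=int)
    for _ in range(max_iters):
        y_new = f_bam(x @ W, y)          # X -> Y
        x_new = f_bam(y_new @ W.T, x)    # Y -> X
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```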

The above algorithm assumes that the test pattern is input into the X-layer. We can also input a test pattern into the Y-layer; in that case we need to exchange the roles of the X- and the Y-layers.

If we adopt matrix notation and represent each input vector as a row vector (a 1 x N matrix), then the net input into the Y-layer is written as y_in = x W, since W is an N x M matrix. Likewise, the net input into the X-layer is written as x_in = y W^T.

2.1. BAM to Associate Letters with Simple Bipolar Codes

We want to use a BAM to associate the letters A and C with simple bipolar codes:

.#.
#.#
###
#.#
#.#

with t^(1) = [ -1 1 ], and

.##
#..
#..
#..
.##

with t^(2) = [ 1 1 ].

For each of the two letters, we replace each "." with -1 and each "#" with 1, and concatenate the bits row-wise to form a bipolar row vector. Thus we have

s^(1) = [ -1 1 -1  1 -1 1  1 1 1  1 -1 1  1 -1 1 ]
s^(2) = [ -1 1 1  1 -1 -1  1 -1 -1  1 -1 -1  -1 1 1 ]

Notice that these two vectors have a squared magnitude of N = 15 and are as uncorrelated as they can be, since s^(1) . s^(2) = 1 and N is odd. The Hebb rule gives the weight matrix

W = s^(1)T [ -1 1 ] + s^(2)T [ 1 1 ] = [ s^(2)T - s^(1)T    s^(2)T + s^(1)T ].

In general, given two bipolar vectors s^(1) and s^(2), the sum s^(2) + s^(1) is a non-bipolar vector whose elements are 0 wherever the corresponding elements of s^(1) and s^(2) differ, and twice the corresponding element of s^(2) wherever they agree. Likewise, the difference s^(2) - s^(1) is a non-bipolar vector whose elements are 0 wherever the corresponding elements of s^(1) and s^(2) agree, and twice the corresponding element of s^(2) wherever they differ.

Explicitly, the two columns of the 15 x 2 matrix W are

s^(2)T - s^(1)T = [  0  0  2  0  0 -2  0 -2 -2  0  0 -2 -2  2  0 ]^T,
s^(2)T + s^(1)T = [ -2  2  0  2 -2  0  2  0  0  2 -2  0  0  0  2 ]^T.
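The encoding and the weight matrix can be reproduced with a short NumPy sketch (the helper name to_bipolar is our own; the '#' -> +1, '.' -> -1 convention is the one used above):

```python
import numpy as np

A = [".#.", "#.#", "###", "#.#", "#.#"]
C = [".##", "#..", "#..", "#..", ".##"]

def to_bipolar(rows):
    # '#' -> +1, '.' -> -1, rows concatenated into one 15-component vector
    return np.array([1 if ch == "#" else -1 for row in rows for ch in row])

s1, s2 = to_bipolar(A), to_bipolar(C)
t1, t2 = np.array([-1, 1]), np.array([1, 1])

W = np.outer(s1, t1) + np.outer(s2, t2)     # 15 x 2 Hebb-rule weight matrix
print(np.array_equal(W[:, 0], s2 - s1),     # True: first column is s2 - s1
      np.array_equal(W[:, 1], s2 + s1))     # True: second column is s2 + s1
```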

First we check to see whether the NN gives the correct output at the Y-layer when pattern A or C is input to the X-layer. We find that with x = s^(1),

y_in = s^(1) W = [ s^(1) (s^(2)T - s^(1)T)    s^(1) (s^(2)T + s^(1)T) ] = [ 1 - 15    1 + 15 ] = [ -14 16 ],

so y = [ -1 1 ], and with x = s^(2),

y_in = s^(2) W = [ s^(2) (s^(2)T - s^(1)T)    s^(2) (s^(2)T + s^(1)T) ] = [ 15 - 1    15 + 1 ] = [ 14 16 ],

so y = [ 1 1 ]. These are the correct results at the Y-layer. Notice that the above results are independent of the actual initial activations in the Y-layer, since none of the net inputs happens to be 0.

In the opposite direction, we want to input the targets at the Y-layer and see if the correct responses are obtained at the X-layer. We find that with y = t^(1),

x_in = t^(1) W^T = -(s^(2) - s^(1)) + (s^(2) + s^(1)) = 2 s^(1),

and so x = s^(1).

Similarly, with y = t^(2),

x_in = t^(2) W^T = (s^(2) - s^(1)) + (s^(2) + s^(1)) = 2 s^(2),

and so x = s^(2). These are the correct activations at the X-layer. Notice that the above results are independent of the actual initial activations in the X-layer, since none of the net inputs happens to be 0. In either of the cases above, the iteration converges after just one back-and-forth passage of signals.
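These one-pass checks can be confirmed numerically; the sketch below rebuilds s^(1), s^(2), and W as above (since none of the net inputs is 0 here, a plain sign function stands in for f_BAM):

```python
import numpy as np

bip = lambda rows: np.array([1 if c == "#" else -1 for r in rows for c in r])
s1 = bip([".#.", "#.#", "###", "#.#", "#.#"])    # letter A
s2 = bip([".##", "#..", "#..", "#..", ".##"])    # letter C
t1, t2 = np.array([-1, 1]), np.array([1, 1])
W = np.outer(s1, t1) + np.outer(s2, t2)

print(s1 @ W, np.sign(s1 @ W))                   # [-14  16] -> [-1  1] = t1
print(s2 @ W, np.sign(s2 @ W))                   # [ 14  16] -> [ 1  1] = t2
print(np.array_equal(np.sign(t1 @ W.T), s1),     # True: t1 recalls A
      np.array_equal(np.sign(t2 @ W.T), s2))     # True: t2 recalls C
```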

2.2. BAM with Noisy Input

Next let us test the BAM with a noisy input. Suppose the activation on Y is [ 0 1 ], and nothing is known about the activation on X, so we put x = [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]. We find that

x_in = [ 0 1 ] W^T = s^(2) + s^(1).

From what we said about the sum of any two bipolar vectors, we see that the activation on X is

x = f_BAM(x_in) = (1/2) (s^(2) + s^(1)).

This signal has to be sent back to the Y-layer, giving

y_in = (1/2) (s^(2) + s^(1)) [ s^(2)T - s^(1)T    s^(2)T + s^(1)T ]
     = (1/2) [ 0    s^(2) s^(2)T + s^(1) s^(1)T + 2 s^(2) s^(1)T ]
     = [ 0    N + 1 ].

This gives y = [ 0 1 ], which is exactly the input on Y that we started with. So the BAM has converged, but sadly to a spurious stable state.

This is certainly not a surprising result, since with knowledge of only the second element of y there is no way to expect the NN (or a human) to tell which of the two characters we are referring to.

So far we have followed the textbook closely. However, there is a very troubling property of this NN: even with knowledge of the first element of y, it still cannot tell which of the two characters we are referring to, although we humans would have no problem with that at all. To see this, we repeat the above calculation, this time with the activation on Y given by [ 1 0 ]. We find that

x_in = [ 1 0 ] W^T = s^(2) - s^(1).

From what we said about the difference of any two bipolar vectors, we see that the activation on X is

x = f_BAM(x_in) = (1/2) (s^(2) - s^(1)).

This signal has to be sent back to the Y-layer, giving

y_in = (1/2) (s^(2) - s^(1)) [ s^(2)T - s^(1)T    s^(2)T + s^(1)T ]
     = (1/2) [ s^(2) s^(2)T + s^(1) s^(1)T - 2 s^(2) s^(1)T    0 ]
     = [ N - 1    0 ].

This gives y = [ 1 0 ], which is exactly the input on Y that we started with. So with this input the BAM has again converged, but sadly to a spurious stable state. However, given that the first element of y is 1, even without any information about the second element we should know that we are referring to character C and not A. There is something about this NN that is not satisfactory at all. The main problem is that biases were not considered; it turns out that they are important for its proper operation.
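A small self-contained sketch of this failure, under the same encoding and weight matrix as above (the helper names are our own): starting from y = [ 1 0 ] with nothing known about the X-layer, the bias-free BAM settles into the spurious state x = (s^(2) - s^(1))/2, y = [ 1 0 ].

```python
import numpy as np

bip = lambda rows: np.array([1 if c == "#" else -1 for r in rows for c in r])
s1 = bip([".#.", "#.#", "###", "#.#", "#.#"])    # letter A
s2 = bip([".##", "#..", "#..", "#..", ".##"])    # letter C
W = np.outer(s1, [-1, 1]) + np.outer(s2, [1, 1])

f_bam = lambda net, prev: np.where(net > 0, 1, np.where(net < 0, -1, prev))

y = np.array([1, 0])                  # only the first target element is known
x = np.zeros(15, dtype=int)           # nothing known about the X-layer
for _ in range(10):
    x = f_bam(y @ W.T, x)             # Y -> X
    y = f_bam(x @ W, y)               # X -> Y
print(y, np.array_equal(2 * x, s2 - s1))    # [1 0] True -- a spurious state
```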

2.3. BAM with Biases

As before, the Hebb rule gives a weight matrix W whose elements are w_ij = sum_{q=1}^{Q} s_i^(q) t_j^(q), with i = 1, 2, ..., N and j = 1, 2, ..., M. W is used as the weight matrix for signals sent from the X-layer to the Y-layer, and W^T is used as the weight matrix for signals sent from the Y-layer to the X-layer. The biases for the Y-layer are given by

b_j^(Y) = sum_{q=1}^{Q} t_j^(q),    j = 1, 2, ..., M.

Since signals are also sent back to the X-layer, there are biases for the X-layer as well, given by

b_i^(X) = sum_{q=1}^{Q} s_i^(q),    i = 1, 2, ..., N.

The algorithm for the BAM with biases is:

1. Use the Hebb rule to store a set of Q associated vector pairs.
2. For a given input test vector x, do steps a-c.
   a. Set the activations of the X-layer to the input test pattern.
   b. Set the activations of the Y-layer to 0.
   c. While the activations have not converged, do steps i-iii.
      i. Update the Y-layer activations:
         y_in,j = sum_{i=1}^{N} x_i w_ij + b_j^(Y),    y_j = f_BAM(y_in,j).
      ii. Update the X-layer activations:
         x_in,i = sum_{j=1}^{M} w_ij y_j + b_i^(X),    x_i = f_BAM(x_in,i).
      iii. Test for convergence of the activations in both layers.
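The earlier sketch extends naturally to this biased version; again the helper names and the iteration cap are our own choices, with S and T holding the Q training vectors as rows.

```python
import numpy as np

def f_bam(net, previous):
    # sign of the net input; keep the previous activation where it is exactly 0
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

def train_bam_bias(S, T):
    # Hebb-rule weights plus the two bias vectors: W, b_X, b_Y
    return S.T @ T, S.sum(axis=0), T.sum(axis=0)

def run_bam_bias(W, b_X, b_Y, x, max_iters=100):
    # Input at the X-layer; send signals back and forth until nothing changes
    y = np.zeros(W.shape[1], dtype=int)
    for _ in range(max_iters):
        y_new = f_bam(x @ W + b_Y, y)         # X -> Y
        x_new = f_bam(y_new @ W.T + b_X, x)   # Y -> X
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```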

Again, the above algorithm assumes that the test pattern is input into the X-layer. We can also input a test pattern into the Y-layer; in that case we need to exchange the roles of the X- and the Y-layers.

We now revisit the character-association example, this time with biases. We find that

b^(Y) = [ 0 2 ],    b^(X) = s^(2) + s^(1).

Again we first check whether the NN gives the correct output at the Y-layer when pattern A or C is input to the X-layer. We find that with x = s^(1),

y_in = s^(1) W + b^(Y) = [ -14 16 ] + [ 0 2 ] = [ -14 18 ],

and therefore y = [ -1 1 ], and with x = s^(2),

y_in = s^(2) W + b^(Y) = [ 14 16 ] + [ 0 2 ] = [ 14 18 ],

and therefore y = [ 1 1 ]. These are the intended associations. Notice that the above results are independent of the actual initial activations in the Y-layer, since none of the net inputs happens to be 0.

In the opposite direction, we want to input the targets at the Y-layer and see if the correct responses are obtained at the X-layer. We find that with y = t^(1),

x_in = t^(1) W^T + b^(X) = -(s^(2) - s^(1)) + (s^(2) + s^(1)) + s^(2) + s^(1) = 3 s^(1) + s^(2),

and so x = s^(1). Similarly, with y = t^(2),

x_in = t^(2) W^T + b^(X) = (s^(2) - s^(1)) + (s^(2) + s^(1)) + s^(2) + s^(1) = 3 s^(2) + s^(1),

and so x = s^(2). These are the correct activations at the X-layer. Notice that the above results are independent of the actual initial activations in the X-layer, since none of the net inputs happens to be 0. In either of the cases above, the iteration converges after just one back-and-forth passage of signals.

Next we test the BAM with a noisy input at the Y-layer with activation [ 0 1 ], and with no knowledge about the activations on X, which are therefore set to 0. We find that

x_in = [ 0 1 ] W^T + b^(X) = (s^(2) + s^(1)) + (s^(2) + s^(1)) = 2 (s^(2) + s^(1)).

The activation on X is therefore given by

x = f_BAM(x_in) = (1/2) (s^(2) + s^(1)).

This signal has to be sent back to the Y-layer, giving

y_in = (1/2) (s^(2) + s^(1)) [ s^(2)T - s^(1)T    s^(2)T + s^(1)T ] + b^(Y)
     = (1/2) [ 0    s^(2) s^(2)T + s^(1) s^(1)T + 2 s^(2) s^(1)T ] + [ 0 2 ]
     = [ 0    N + 3 ].

This gives y = [ 0 1 ], which is exactly the input on Y that we started with.

So the BAM has converged, but again to a spurious stable state. This is not a surprising result, since with knowledge of only the second element of y there is no way to expect the NN (or a human) to tell which of the two characters we are referring to.

However, if we start off with appropriate partial knowledge of the activation at the X-layer, the intended association can be made. For example, with the noisy input [ 0 1 ] at the Y-layer and an initial activation at the X-layer of

x = [ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ],

the output at the X-layer is given by

x = [ -1 1 0 1 -1 1 1 0 0 1 -1 0 0 0 1 ],

which is neither s^(1) nor s^(2). However, with this as the input to the X-layer, the output at the Y-layer is [ -1 1 ]. With this as the input to the Y-layer, the output at the X-layer is then s^(1), which in turn produces the output [ -1 1 ] at the Y-layer. The iteration clearly converges to s^(1) at the X-layer and to t^(1) at the Y-layer.

Are the results reasonable? From the initial input at the Y-layer we cannot tell whether we are referring to an A or a C. However, the input vectors s^(1) and s^(2) differ in the 3rd, 6th, 8th, 9th, 12th, 13th and 14th positions. Given that the only other piece of information is that the 6th neuron at the X-layer is initially on, the evidence strongly suggests that it is the s^(1) : t^(1) association that we want, and this is indeed what we obtain.

Next we repeat the above calculation with the activation on Y given by [ 1 0 ]. This is the case where the previous BAM, without biases, failed to produce a reasonable output. Now we find that

x_in = [ 1 0 ] W^T + b^(X) = (s^(2) - s^(1)) + s^(2) + s^(1) = 2 s^(2).

We see that the activation on X is x = f_BAM(x_in) = s^(2). This signal has to be sent back to the Y-layer; however, for this input at the X-layer we already know that the activation at the Y-layer is y = [ 1 1 ], which is exactly t^(2). So the BAM has converged, and this time to the correct result: given that the first element of y is 1, even without any information about the second element, we should know that we are referring to character C and not A. Our new BAM with biases performs well, showing the importance of including biases in the net.
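Both recoveries can be reproduced with the biased BAM sketched earlier; the short, self-contained demo below (our own illustration, using the same encoding and conventions as above) runs the partial-knowledge case and the y = [ 1 0 ] case.

```python
import numpy as np

bip = lambda rows: np.array([1 if c == "#" else -1 for r in rows for c in r])
s1 = bip([".#.", "#.#", "###", "#.#", "#.#"])     # letter A
s2 = bip([".##", "#..", "#..", "#..", ".##"])     # letter C
t1, t2 = np.array([-1, 1]), np.array([1, 1])
W = np.outer(s1, t1) + np.outer(s2, t2)
bX, bY = s1 + s2, t1 + t2                          # the bias vectors

f_bam = lambda net, prev: np.where(net > 0, 1, np.where(net < 0, -1, prev))

def run_from_y(y, x, iters=10):
    # noisy input at the Y-layer: iterate Y -> X -> Y with biases
    for _ in range(iters):
        x = f_bam(y @ W.T + bX, x)
        y = f_bam(x @ W + bY, y)
    return x, y

x0 = np.zeros(15, dtype=int); x0[5] = 1            # 6th X-neuron known to be on
x, y = run_from_y(np.array([0, 1]), x0)
print(np.array_equal(x, s1), np.array_equal(y, t1))   # True True: recalls A

x, y = run_from_y(np.array([1, 0]), np.zeros(15, dtype=int))
print(np.array_equal(x, s2), np.array_equal(y, t2))   # True True: recalls C
```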

3. Remarks

3.1. Hamming Distance and Correlation Encoding

For any two binary or bipolar vectors x^(1) and x^(2), each of length N, the Hamming distance between the vectors is defined to be the number of corresponding bits that differ, and it is denoted by H(x^(1), x^(2)). It is related to the dot product between the vectors. Let A(x^(1), x^(2)) be the number of corresponding bits in x^(1) and x^(2) that agree with each other. Then it is clear that for bipolar vectors

x^(1) . x^(2) = A(x^(1), x^(2)) - H(x^(1), x^(2)),

because each pair of corresponding bits either agrees, contributing +1 to the dot product, or differs, contributing -1. Since a corresponding pair of bits must either agree or disagree,

N = A(x^(1), x^(2)) + H(x^(1), x^(2)).

Eliminating A(x^(1), x^(2)) from these two expressions and solving for H(x^(1), x^(2)) gives the normalized Hamming distance (our textbook calls it the averaged Hamming distance)

(1/N) H(x^(1), x^(2)) = (1/2) (1 - (1/N) x^(1) . x^(2)).

The dot product of two bipolar vectors ranges from -N for two anti-parallel vectors to N for two identical vectors, and is 0 if the two vectors are orthogonal to each other. The normalized Hamming distance between two bipolar vectors therefore ranges from 0 for two identical vectors to 1 for two anti-parallel vectors, and equals one half if the two vectors are orthogonal to each other.
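A small NumPy helper for the normalized Hamming distance (the function name is our own), applied to the letter example discussed next:

```python
import numpy as np

def normalized_hamming(x1, x2):
    # for bipolar vectors: H/N = (1 - x1.x2/N) / 2
    return 0.5 * (1.0 - np.dot(x1, x2) / len(x1))

bip = lambda rows: np.array([1 if c == "#" else -1 for r in rows for c in r])
s1 = bip([".#.", "#.#", "###", "#.#", "#.#"])          # letter A
s2 = bip([".##", "#..", "#..", "#..", ".##"])          # letter C
print(normalized_hamming(s1, s2))                      # 7/15 ~ 0.467
print(normalized_hamming(np.array([-1, 1]), np.array([1, 1])))   # 0.5
```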

In our example above, since s^(1) . s^(2) = 1 and t^(1) . t^(2) = 0, we have

(1/N) H(s^(1), s^(2)) = (1/2) (1 - 1/15) = 7/15,
(1/N) H(t^(1), t^(2)) = 1/2.

These two normalized Hamming distances do not differ much from each other. There is the idea of correlation encoding, in which a pair of input patterns separated by a small Hamming distance is mapped to a pair of output patterns that are similarly close, while a pair of input patterns separated by a large Hamming distance is mapped to a pair of output patterns that are also widely separated (dissimilar).

3.2. Erasing a Stored Association

The complement of a bipolar vector x, denoted by x^c, is the vector formed from x by flipping all of its bits. It is clear that storing the pattern pair s^c : t^c is the same as storing the pair s : t, since the Hebb rule gives exactly the same weight matrix.

To erase a given association s : t, we store the pair s^c : t or s : t^c.

3.3. Updating Strategies

Several strategies may be used to update the activations in a NN. The algorithm used here for the BAM uses a synchronous updating procedure, in which all neurons within a layer update their activations simultaneously. Updating may also be asynchronous (only one neuron updates its activation at each step of the iteration) or subset asynchronous (a group of neurons updates all of its members' activations at each stage).

3.4. Convergence of a BAM and a Lyapunov Function

The convergence of a BAM can be proved by introducing an energy, or Lyapunov, function. A Lyapunov function must be a scalar function that is bounded below, and its value cannot increase at any step of the BAM algorithm.

An appropriate choice for a Lyapunov function is the average signal energy for a forward and a backward pass,

L = -(1/2) (x W y^T + y W^T x^T).

Since x W y^T and y W^T x^T are scalars that are the transposes of each other, they must be identical. Thus we have

L = -x W y^T = -sum_{i=1}^{N} sum_{j=1}^{M} x_i w_ij y_j.

For binary or bipolar neurons, L is clearly bounded below by

L >= -sum_{i=1}^{N} sum_{j=1}^{M} |w_ij|.

It can be proved that this Lyapunov function decreases as the net iterates, and therefore the iteration converges, for either synchronous or subset-asynchronous updating.
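As an illustration (our own, using the same letter encoding and weight matrix as above), the energy L = -x W y^T can be tracked during a run of the bias-free BAM; it never increases from one pass to the next:

```python
import numpy as np

def lyapunov(x, W, y):
    # L = -x W y^T, the energy tracked during BAM iteration
    return -float(x @ W @ y)

bip = lambda rows: np.array([1 if c == "#" else -1 for r in rows for c in r])
s1 = bip([".#.", "#.#", "###", "#.#", "#.#"])     # letter A
s2 = bip([".##", "#..", "#..", "#..", ".##"])     # letter C
W = np.outer(s1, [-1, 1]) + np.outer(s2, [1, 1])

f_bam = lambda net, prev: np.where(net > 0, 1, np.where(net < 0, -1, prev))

x = np.where(np.arange(15) < 3, -s1, s1)   # letter A with its first 3 bits flipped
y = np.array([1, -1])                      # a poor initial guess at the Y-layer
print(lyapunov(x, W, y))                   # starting energy: 18.0
for _ in range(3):
    y = f_bam(x @ W, y)                    # X -> Y
    x = f_bam(y @ W.T, x)                  # Y -> X
    print(lyapunov(x, W, y))               # -30.0, -30.0, -30.0 -- never increases
```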

3.5. Storage Capacity

It can be shown that the upper bound on the memory capacity of the present BAM is min(N, M). However, there is good evidence, based on a combination of heuristics and exhaustive search, that the storage capacity can be extended to min(2^N, 2^M) if an appropriate nonzero threshold value is chosen for each neuron.

References

[1] See Chapter 3 in Laurene Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall, 1994.