Fixed Weight Competitive Nets: Hamming Net


POLYTECHNIC UNIVERSITY
Department of Computer and Information Science

Fixed Weight Competitive Nets: Hamming Net

K. Ming Leung

Abstract: A fixed-weight competitive net known as the Hamming net is discussed.

Copyright © 2007 mleung@poly.edu
Last Revision Date: April 10, 2007

Table of Contents

1. Introduction
2. MAXNET
   2.1. Example: MAXNET
   2.2. Best Choice of ε
3. Mexican Hat Network
4. Hamming Net
   4.1. Example: Hamming Net

1. Introduction

When a net is trained to classify the input signal into one of the output categories A, B, C, D, E, J, or K, the net sometimes responded that the signal was both a C and a K, or both an E and a K, or both a J and a K, due to similarities in these character pairs. In such cases it is better to include additional structure in the net to force it to make a definitive decision. The mechanism by which this can be accomplished is called competition. The most extreme form of competition among a group of neurons is called Winner-Take-All: only one neuron (the winner) in the group has a nonzero output signal when the competition is completed. An example of this is the MAXNET. A more general form of competition, useful for contrast enhancement, is known as the Mexican Hat. A simple clustering net known as the Hamming net is discussed here; it is based on the use of fixed exemplar vectors and the MAXNET subnet.

2. MAXNET

MAXNET is a neural net based on competition that can be used as a subnet to choose the neuron whose activation is the largest. (If a fully neural implementation is not required, one can certainly use any of a multitude of algorithms that find the maximum value of a set of numbers.) The MAXNET has M neurons that are fully interconnected: each neuron is connected to every other neuron in the net, including itself. The transfer function used by the neurons is the positive-linear function

$$f_{\mathrm{poslin}}(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$

There is no training for the MAXNET. The weights are symmetric, fixed, and given by

$$w_{ij} = \begin{cases} 1, & \text{if } i = j \\ -\varepsilon, & \text{if } i \ne j, \end{cases}$$

where, according to our textbook, $0 < \varepsilon < \frac{1}{M}$ is a predetermined positive constant. It has to be positive; the origin of its upper bound will be examined later. From the diagonal terms of the weight matrix, we see that each neuron is connected to itself with a positive weight of 1. This represents self-promotion in a competitive environment. The off-diagonal terms, −ε, are negative and thus represent inhibition: they suppress the signals of all the other neurons. Promoting one's own achievements while suppressing those of one's peers is commonly found in highly competitive sociological environments, since everyone wants to be a winner. Notice that since ε < 1, self-promotion is more important than suppressing the signals of any other neuron. The activations of the neurons are denoted by a_j, j = 1, ..., M, and they must all be non-negative for the MAXNET to work. The MAXNET algorithm is:

1. Choose an ε for the weight matrix (there is no need to set the matrix up explicitly), and initialize the activations a_j(0), j = 1, ..., M, to non-negative values.
2. For k = 1, 2, ... repeat steps a-c while the stopping condition is not met.
   a. For each neuron i = 1, ..., M, compute the net signal it receives for the next step:
      $$a_{\mathrm{in},i}(k) = a_i(k-1) - \varepsilon \sum_{j \ne i} a_j(k-1).$$
   b. Update the activations for i = 1, ..., M:
      $$a_i(k) = f_{\mathrm{poslin}}(a_{\mathrm{in},i}(k)).$$
   c. Test the stopping condition: stop only when exactly one activation is nonzero.

We assume here that in the real world no two (or more) neurons have the same maximal activation; otherwise

special precautions must be exercised to handle the situation. Since the activations are all non-negative, it is clear that for all i

$$a_{\mathrm{in},i}(k) \le a_i(k-1),$$

and so as the MAXNET iterates, the activations of all the neurons decrease. However, the smaller an activation is, the more it decreases fractionally. (The scheme hurts the poor more than the rich.) As the net iterates, the neurons with the smallest values of a_{in,i}(k) are driven negative first. Their transfer functions then yield zero values for their activations. Once an activation is driven to zero, it remains at zero in all subsequent iterations, until eventually the activations of all the neurons except one, the winner, are driven to zero. The activation of the winner then ceases to decrease any further. Given a group of neurons, the MAXNET is often used in competitive nets as a subnet to find the one that has the largest activation. With suitable modifications, it can also be used to determine the minimum of a set of given quantities. The operation of such a subnet is often encapsulated as a vector-valued function

$$\mathbf{y} = \operatorname{compet}(\mathbf{x}),$$

where x is the input and y is the corresponding output of the subnet. There should be only one positive element in y; all other elements must be identically zero.

2.1. Example: MAXNET

The following example shows the action of the MAXNET for the case of M = 4 neurons with initial activations a_1(0) = 0.6, a_2(0) = 0.2, a_3(0) = 0.8, a_4(0) = 0.4. We choose ε = 0.2. Since 1/M = 0.25, the chosen value of ε is within the required range.

step k   a_1(k)   a_2(k)   a_3(k)   a_4(k)
   0     0.6      0.2      0.8      0.4
   1     0.32     0        0.56     0.08
   2     0.192    0        0.48     0
   3     0.096    0        0.442    0
   4     0.008    0        0.422    0
   5     0        0        0.421    0

In increasing order of their activations, the neurons are a_2, a_4, a_1,

and a_3. In the first step, their activations decrease fractionally by 1, 0.8, 0.47, and 0.3, respectively: the scheme hurts the poor much more than the rich. They are driven to zero in this same order, until only one winner survives. The larger the value of ε, the faster the activations of the losing neurons are driven to zero. For example, if we choose ε = 0.3, then we obtain the following results.

step k   a_1(k)   a_2(k)   a_3(k)   a_4(k)
   0     0.6      0.2      0.8      0.4
   1     0.18     0        0.44     0
   2     0.048    0        0.386    0
   3     0        0        0.372    0

The winner now emerges after only three iterations. If ε lies in [3/7, 2/3), then only one iteration is needed to obtain a winner: ε ≥ 3/7 drives even the largest loser, a_1, nonpositive in one step (0.6 − 1.4ε ≤ 0), while ε < 2/3 keeps the winner positive (0.8 − 1.2ε > 0). For ε ≥ 2/3, no winner can be found, since all the activations are driven to zero in a single step. Clearly the question is how large a value of ε one can use for fast convergence.
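The iteration above is easy to sketch in code. The following is a minimal, illustrative Python sketch (the function names `poslin` and `maxnet` are our own, not from the notes); it reproduces the ε = 0.2 table above.

```python
def poslin(x):
    """Positive-linear transfer function: f(x) = x for x > 0, else 0."""
    return x if x > 0 else 0.0

def maxnet(a, eps):
    """Iterate MAXNET until at most one activation remains nonzero.

    a   : list of non-negative initial activations
    eps : mutual-inhibition weight, assumed 0 < eps < 1/M
    Returns the final activations and the number of iterations used.
    """
    a = list(a)
    steps = 0
    while sum(1 for v in a if v > 0) > 1:
        total = sum(a)
        # a_in,i = a_i - eps * sum_{j != i} a_j = (1 + eps) * a_i - eps * total
        a = [poslin((1 + eps) * v - eps * total) for v in a]
        steps += 1
    return a, steps

final, steps = maxnet([0.6, 0.2, 0.8, 0.4], eps=0.2)
print(steps, [round(v, 3) for v in final])  # neuron a_3 (index 2) wins after 5 steps
```

Rounded to three decimals, the surviving activation after five iterations is 0.421, in agreement with the table.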

2.2. Best Choice of ε

We want to determine a good choice of ε for fast convergence [2]. According to our textbook, 0 < ε < 1/M. We want to see whether this is true, and how the upper bound on the value of ε comes about. The fastest convergence is achieved if one can choose an ε such that the activations of all neurons except the winning one are driven to zero in one iteration. The signal received by a given neuron, say i, in step k is

$$a_{\mathrm{in},i}(k) = a_i(k-1) - \varepsilon \sum_{j \ne i} a_j(k-1),$$

and if we knew that neuron l had the largest activation, then by choosing ε to be slightly less than

$$\varepsilon_{\max} = \frac{a_l}{\sum_{j \ne l}^{M} a_j} = \frac{1}{\sum_{j \ne l}^{M} a_j / a_l},$$

then in a single iteration a_{in,l}(k) becomes only slightly larger than zero, and therefore so does a_l. This means that all the other a_{in,i}(k) become negative, and so the corresponding a_i become zero because of the transfer function. Of course we do not know which of the neurons has the largest activation; finding that is in fact the problem we are confronted with. So we cannot possibly know how large ε_max is. We will replace it by a smaller number, and as a result we will need to iterate the MAXNET a few times before convergence. This smaller number is obtained from the above expression by replacing the denominator with a larger number. Notice that a_j / a_l ≤ 1, and if we replace each such ratio by 1, the denominator becomes M − 1. Thus we want to choose

$$\varepsilon = \frac{1}{M-1}.$$

This value is larger than the one suggested in our textbook, and leads to slightly faster convergence, especially when M is not too large [2].
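As a quick numerical check of this claim (an illustrative Python sketch, with helper names of our own choosing), we can compare the number of iterations needed on the Section 2.1 example for ε = 0.2, which lies in the textbook range, against ε = 1/(M − 1) = 1/3:

```python
def poslin(x):
    """Positive-linear transfer function."""
    return x if x > 0 else 0.0

def maxnet_steps(a, eps):
    """Run MAXNET; return (index of the winner, number of iterations)."""
    a = list(a)
    steps = 0
    while sum(1 for v in a if v > 0) > 1:
        total = sum(a)
        a = [poslin((1 + eps) * v - eps * total) for v in a]
        steps += 1
    return max(range(len(a)), key=lambda i: a[i]), steps

a0 = [0.6, 0.2, 0.8, 0.4]            # M = 4 neurons
w1, slow = maxnet_steps(a0, 0.2)     # eps < 1/M = 0.25 (textbook choice)
w2, fast = maxnet_steps(a0, 1 / 3)   # eps = 1/(M-1), suggested here
print(w1, slow, w2, fast)
```

The winner is the same neuron in both cases; the larger ε simply drives the losing activations to zero in fewer iterations.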

3. Mexican Hat Network

The Mexican Hat network is good for the purpose of contrast enhancement. Common topological arrangements of the neurons include linear and two-dimensional geometries. Parameter R_1 specifies the radius of the region with positive reinforcement. Parameter R_2 (> R_1) specifies the radius of the region of interconnections. The weights are determined by

$$w_k = \begin{cases} c_1 > 0, & \text{for } |k| \le R_1 \\ c_2 < 0, & \text{for } R_1 < |k| \le R_2. \end{cases}$$

The transfer function used here is defined by

$$f_{\mathrm{satlin}}(x) = \begin{cases} 0, & \text{if } x < 0 \\ x, & \text{if } 0 \le x \le x_{\max} \\ x_{\max}, & \text{if } x \ge x_{\max}, \end{cases}$$

which can also be expressed as

$$f_{\mathrm{satlin}}(x) = \min(x_{\max}, \max(0, x)) = \max(0, \min(x_{\max}, x)),$$

where x_max is a fixed positive quantity. The Mexican Hat net has parameters R_1, R_2, c_1, c_2, x_max, and the maximum number of iterations, t_max. These are all model specific and must be set at the outset. The algorithm is:

1. Set the parameters R_1, R_2, c_1, c_2, x_max, and t_max. Set the weights
   $$w_k = \begin{cases} c_1 > 0, & \text{for } |k| \le R_1 \\ c_2 < 0, & \text{for } R_1 < |k| \le R_2. \end{cases}$$
2. Let x(0) = s.
3. For t = 1, ..., t_max repeat steps a-b.
   a. For each neuron i = 1, ..., N, compute the net input
      $$x_{\mathrm{in},i}(t) = c_1 \sum_{k=-R_1}^{R_1} x_{i+k}(t-1) + c_2 \sum_{k=-R_2}^{-R_1-1} x_{i+k}(t-1) + c_2 \sum_{k=R_1+1}^{R_2} x_{i+k}(t-1).$$
   b. Apply the transfer function for i = 1, ..., N:
      $$x_i(t) = f_{\mathrm{satlin}}(x_{\mathrm{in},i}(t)).$$
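One update of steps a-b can be sketched as follows (an illustrative Python sketch, not code from the notes; the signal and the parameter values R_1 = 1, R_2 = 2, c_1 = 0.6, c_2 = −0.4, x_max = 2 are assumptions chosen for demonstration, and neurons outside the line of N units are simply omitted from the sums):

```python
def satlin(x, x_max):
    """Saturating-linear transfer: clip x to the range [0, x_max]."""
    return max(0.0, min(x_max, x))

def mexican_hat_step(x, R1, R2, c1, c2, x_max):
    """One iteration: cooperation within radius R1, inhibition in the
    ring R1 < |k| <= R2. No wrap-around at the ends of the line."""
    N = len(x)
    x_new = []
    for i in range(N):
        net = 0.0
        for k in range(-R2, R2 + 1):
            j = i + k
            if 0 <= j < N:
                net += (c1 if abs(k) <= R1 else c2) * x[j]
        x_new.append(satlin(net, x_max))
    return x_new

s = [0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0]   # initial signal x(0) = s
x = mexican_hat_step(s, R1=1, R2=2, c1=0.6, c2=-0.4, x_max=2.0)
print([round(v, 2) for v in x])           # contrast enhanced around the peak
```

After one step the central peak is reinforced while the flanks are suppressed, which is the contrast-enhancement effect described above.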

4. Hamming Net

For the Hamming net, M exemplar bipolar vectors e^(1), e^(2), ..., e^(M) are given. They are used to form the weight matrix of the network; the Hamming net is therefore a fixed-weight net. For any given input vector x, the Hamming net finds the exemplar vector that is closest to x. Therefore, if a collection of input vectors is given, the Hamming net can be used to cluster these vectors into M different groups.

We need to define what we mean by saying that two vectors a and b are close to each other. Again we will be working with bipolar vectors. Denote the Hamming distance between two vectors (defined to be the number of corresponding bits that differ) by H(a, b), and the number of corresponding bits that agree by A(a, b). It is clear that for bipolar vectors

$$\mathbf{a} \cdot \mathbf{b} = A(\mathbf{a}, \mathbf{b}) - H(\mathbf{a}, \mathbf{b}),$$

because the corresponding bits are either the same, and so contribute +1 to the dot product, or different, and so contribute −1 to the dot product. A corresponding pair of bits must either agree

or disagree, and so

$$N = A(\mathbf{a}, \mathbf{b}) + H(\mathbf{a}, \mathbf{b}).$$

We eliminate the Hamming distance between these two equations and solve for A(a, b) to get

$$A(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{2} + \frac{N}{2}.$$

The two vectors a and b are close to each other if A(a, b) is large: the closer two vectors are to each other, the larger A(a, b) is. We need to work with positive quantities that are large when two vectors are close to each other, because the MAXNET will be used as a subnet to find the one exemplar vector that is closest to the given input vector. Therefore we can only work with A(a, b), rather than a · b or H(a, b). We want to identify a with an input vector x (assumed to be a row vector of length N), and b with one of the exemplar vectors. The M exemplar vectors can be scaled down by a half and stored

column-wise to form the weight matrix for the Hamming net. Thus the weight matrix is N × M, and its elements are

$$w_{ij} = \tfrac{1}{2} e^{(j)}_i, \qquad i = 1, \ldots, N, \quad j = 1, \ldots, M.$$

The bias vector is a row vector of length M whose elements are all N/2:

$$b_j = \tfrac{1}{2} N, \qquad j = 1, \ldots, M.$$

For a given input vector x, the numbers of bits of agreement that it has with each of the exemplar vectors form a vector

$$\mathbf{y}_{\mathrm{in}} = \mathbf{x} W + \mathbf{b}$$

of inputs to the neurons at the Y-layer. These neurons have identity transfer functions and send the signals that they receive to a MAXNET to find the one with the largest agreement. The algorithm of the Hamming net is:

1. Set the weights and biases:
   $$w_{ij} = \tfrac{1}{2} e^{(j)}_i, \qquad i = 1, \ldots, N, \quad j = 1, \ldots, M,$$
   $$b_j = \tfrac{1}{2} N, \qquad j = 1, \ldots, M.$$
2. For a given input vector x, repeat steps a-c.
   a. Compute the input to each unit j = 1, ..., M:
      $$y_{\mathrm{in},j} = \sum_{i=1}^{N} x_i w_{ij} + b_j.$$
   b. Initialize the activations for the MAXNET:
      $$y_j(0) = y_{\mathrm{in},j}, \qquad j = 1, \ldots, M.$$
   c. The MAXNET finds the best-matching exemplar vector for the given input vector.

4.1. Example: Hamming Net

We consider here an example of a Hamming net that has the following two exemplar vectors:

$$\mathbf{e}^{(1)} = \begin{bmatrix} 1 & -1 & -1 & -1 \end{bmatrix}, \qquad \mathbf{e}^{(2)} = \begin{bmatrix} -1 & -1 & -1 & 1 \end{bmatrix}.$$

Notice that these two exemplar vectors are orthogonal to each other. We are given the following four vectors:

$$\mathbf{x}^{(1)} = \begin{bmatrix} 1 & 1 & -1 & -1 \end{bmatrix}, \qquad \mathbf{x}^{(2)} = \begin{bmatrix} 1 & -1 & -1 & -1 \end{bmatrix},$$
$$\mathbf{x}^{(3)} = \begin{bmatrix} -1 & -1 & -1 & 1 \end{bmatrix}, \qquad \mathbf{x}^{(4)} = \begin{bmatrix} -1 & -1 & 1 & 1 \end{bmatrix}.$$

We will use the Hamming net to cluster them into two groups, one group for each of the exemplar vectors. Thus for each of the input vectors we need to find the exemplar vector that is closest to it.

First we set up the weights and biases:

$$W = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & -1 \\ -1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 2 & 2 \end{bmatrix}.$$

For x^(1), we find that

$$\mathbf{y}_{\mathrm{in}} = \mathbf{x}^{(1)} W + \mathbf{b} = \begin{bmatrix} 3 & 1 \end{bmatrix}.$$

With this as input to the MAXNET, one finds that the exemplar vector closest to input x^(1) is e^(1). This makes sense, since A(x^(1), e^(1)) = 3 and A(x^(1), e^(2)) = 1. We repeat the procedure for x^(2), x^(3), and x^(4), and find that x^(1) and x^(2) are closest to e^(1), while x^(3) and x^(4) are closest to e^(2). Thus these four input vectors are separated into two clusters.
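The whole example can be sketched in Python (illustrative code, not from the notes). The MAXNET stage is replaced by a direct argmax over y_in, which is equivalent when only the index of the winning exemplar is needed; the sign patterns of the vectors are those of the corresponding example in Fausett's textbook [1], consistent with y_in = [3 1] for x^(1).

```python
e = [[ 1, -1, -1, -1],    # exemplar e(1)
     [-1, -1, -1,  1]]    # exemplar e(2)
N, M = 4, 2

# Weights w_ij = e^(j)_i / 2 (exemplars stored column-wise), biases b_j = N/2.
W = [[e[j][i] / 2 for j in range(M)] for i in range(N)]
b = [N / 2] * M

def hamming_net(x):
    """Return the index of the exemplar closest to bipolar vector x."""
    # y_in[j] = A(x, e(j)), the number of agreeing bits.
    y_in = [sum(x[i] * W[i][j] for i in range(N)) + b[j] for j in range(M)]
    return max(range(M), key=lambda j: y_in[j])

xs = [[ 1,  1, -1, -1], [ 1, -1, -1, -1],
      [-1, -1, -1,  1], [-1, -1,  1,  1]]
print([hamming_net(x) for x in xs])  # prints [0, 0, 1, 1]
```

The first two inputs are assigned to e(1) and the last two to e(2), reproducing the clustering found above.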

References

[1] Laurene Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall, 1994, Chapter 4.
[2] J.-C. Yen and S. Chang, "Improved Winner-Take-All Neural Network," Electronics Letters, Vol. 28, No. 7, pp. 662-664, 1992.