Neural Network Essentials
Draft: Associative Memory Chapter

Anthony S. Maida
University of Louisiana at Lafayette, USA

June 6, 2015

Copyright © by Anthony S. Maida

Contents

0.1 Associative memory
    Linear threshold networks
        Notation
        Hypercubes and Hamming distance
        Improving symmetry
        Threshold functions
    Building an auto-associative memory
        Preliminaries
            Representing a pattern
            Fully connected network
            Notation
            Hologram metaphor
        Operation of an auto-associative memory
            Storing patterns
            Retrieving a stored pattern
        A Hopfield memory module
    Noiseless retrieval dynamics
        Synchronous update
        Asynchronous update
        Attractors
        Energy function
    Performance of attractor memories
        Limitations of the method
    Hetero-associative memory and central pattern generators
        Building a hetero-associative memory
        Limit cycles
        Pattern generators
    Significance of Hopfield networks
    Further reading
    References

0.1 Associative memory

Association in humans refers to the phenomenon of one thought causing us to think of another. The phenomenon of association is partly responsible for the structure of our stream of consciousness and has been pondered both by philosophers, such as Aristotle and Locke, and by psychologists, such as Ebbinghaus and William James. This chapter presents mathematical tools which can be used to model distributed associative memory.

We will conceptualize a thought as some kind of semantic item. Any concept you are capable of thinking of or conceiving is an example of a semantic item. In distributed memories, the semantic items are known as patterns and, for the simple memories of this chapter, they will be represented as binary feature vectors. The term distributed refers to the way the memory is built. Distributed models are designed so that the information corresponding to a stored pattern or semantic item is distributed throughout the neural units in the memory. Chapter 5 will discuss another type of memory model in which information for a particular semantic item is localized to a small number of units.

There are two main classifications of associative memories: auto-associative memories and hetero-associative memories. These models assume that an active thought in a neural network is represented by the instantaneous firing rates of the neurons in the network, sometimes called the network state vector. Hetero-associative memories provide a model of what we normally view as stream-of-consciousness association: how one thought sometimes reminds us of another. Auto-associative memories address the question of how thoughts can be reconstructive. They also address the question of how a thought can be momentarily stable when it is realized in a network of independently firing neurons.

This section begins by describing a type of dynamic auto-associative memory known as a Hopfield network. Hopfield (1982) [8] showed that this type of associative memory has a direct analog in a magnetic system, enabling theoretical analyses of neural systems which were previously impossible. The behavior of an auto-associative memory is similar to that of a spell-checking program: it offers the most likely correction to a presented pattern. The memory stores a large number of patterns. When you submit a degraded version of one of the stored patterns to the memory, it goes into a retrieval state that represents the most closely matching stored pattern. Many people feel that human memory is reconstructive: when one sees part of a pattern, one often thinks of the whole. Hopfield networks offer a concrete example of a brain-like system that operates as a reconstructive memory.

Hopfield networks illustrate how a set of neurons can collectively store information. These networks are not the first mathematical models of associative memory, nor do they have any direct biological realism. Their importance is that they are among the simplest models, are very well understood, and suggest how to approach the problem of memory storage in the brain.

0.1.1 Linear threshold networks

There is not a generally accepted name for the simplest classes of model neurons. We will simply call them units. Schematically, units always have the structure shown in Figure 1.

Figure 1: Idealized neuron (left) and LTU used as a mathematical model (right).

The oldest type of unit, going back to McCulloch and Pitts (1943) [0], is the linear threshold unit (LTU). The LTU uses the unit step function as its activation function. The name comes from the fact that the unit is active if its linearly weighted sum of inputs reaches a threshold. Hopfield networks are made of LTUs, occasionally called McCulloch-Pitts neurons, after their inventors. LTUs model drastically simplified biological neurons, as shown in Figure 1. We often use the term unit rather than neuron to avoid any misunderstandings about claiming biological realism. Networks that use LTUs may be inspired by biological neurons, but the modeling itself is divorced from brain structure and the biological implications of the models are very theoretical. Nevertheless, these models are interesting metaphors for brain function.

Notation

For the moment, assume that the units under discussion are embedded in a network having a synchronized, global clock. Assume further that the units are always in either the on or the off state, represented (for the moment) by the binary activation values of 0 and 1. The formula below, adapted from Hertz et al. 1991, describes the LTU shown in Figure 1. It states that the activation state of a neuron is a deterministic function of the activation states of the incoming neurons at the previous time step.

$$s_i(t+1) = \Psi\big(h_i(t) - \Theta_i\big) \tag{1}$$

The expression $s_i(t+1)$ denotes the activation state of unit $i$ at time $t+1$. The state of unit $i$ has value 1 if its net input at the previous time step, denoted $h_i(t)$, is greater than its threshold $\Theta_i$ ("theta"). The symbol $\Psi(\cdot)$ (spelled "psi", pronounced "sigh") denotes the activation function (the unit step function for now), which has value one if its input is greater than or equal to zero and has value zero otherwise.

(There will be several changes shortly. We won't necessarily assume a global clock, units will take on the values of -1 and +1, and $\Psi(\cdot)$ will denote an asymmetric sign function rather than the unit step function.)

The net input of unit $i$ is determined by the activation of the $n$ units connected to $i$ (along with their associated (synaptic) connection weights, $w$) at the previous time step, $t$. Net input, denoted by $h$, is defined below.

$$h_i(t) \equiv \sum_{j=1}^{n} w_{ij}\, s_j(t) = w_{i,1} s_1(t) + w_{i,2} s_2(t) + \cdots + w_{i,n} s_n(t) \tag{2}$$

The net input is defined to be the linear sum of activations of incoming units at time $t$, weighted by the strengths of the corresponding synaptic connections $w$. The notation $w_{ij}$ denotes the strength of the connection from unit $j$ to unit $i$. The subscripts in this notation may seem backwards, but there are notational reasons to use this convention. First, when $w_{ij}$ is embedded in a summation formula, the iteration is over the rightmost subscript. Second, when the $w_{ij}$ are embedded in a weight matrix, the $i$th row of the matrix holds the incoming connection weights $w_{ij}$ to unit $i$.

We have interpreted $t$ as referring to the $t$th clock tick. This presumes that there is a reference global clock. The expression below describes an update step for unit $i$ without referring to a global clock.

$$s_i' = \Psi\left(\sum_{j=1}^{n} w_{ij}\, s_j\right) \tag{3}$$

Here $s_i'$ refers to the next state of unit $i$, and the equation describes how the next state of unit $i$ is determined by the previous state. In this example, we also assume that the threshold for unit $i$, $\Theta_i$, is 0.

Hypercubes and Hamming distance

Consider the state of a network of LTUs having fixed connection weights (no learning). The only things that vary in such a network are the activation values of the individual units, so we can describe the state of the network by describing the activations of the individual units. It is geometrically useful to view the set of possible states of such a network as corresponding to the corners of a hypercube. For example, if you have a network of three units, then the set of possible states for the network as a whole corresponds to the set of eight ($2^3$) possible bit strings of length three. Each bit string specifies the Cartesian coordinates of one of the corners of a cube in three-dimensional space.

If you have two distinct bit strings $a$ and $b$, each of length $n$, and you want to judge their differences, it is natural to ask how many of the corresponding bits in the strings do not match. The number of mismatches is the Hamming distance between the two strings. If the bits in the strings are interpreted as coordinates, they map the strings onto two corners of an $n$-dimensional hypercube. If you wanted to travel from the corner for string $a$ to the corner for string $b$ by walking along the edges of the hypercube, the appropriate distance measure between the two points would be the city-block (as opposed to Euclidean) distance metric. It happens that the Hamming distance between two binary strings is the city-block distance between their coordinates in $n$-dimensional space.
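To make Equations (1) through (3) and the Hamming distance concrete, here is a minimal sketch in Python with NumPy. The code, function names, and example numbers are our own illustration, not part of the chapter.

    import numpy as np

    def unit_step(x):
        """Heaviside unit step: 1 if x >= 0, else 0."""
        return np.where(x >= 0, 1, 0)

    def ltu_update(w_i, s, theta_i=0.0):
        """One update of unit i, Equation (1): s_i(t+1) = Psi(h_i(t) - Theta_i).

        w_i     : incoming weight vector (w_i1, ..., w_in) of unit i
        s       : activation vector of all n units at time t
        theta_i : threshold of unit i
        """
        h_i = np.dot(w_i, s)                 # net input, Equation (2)
        return int(unit_step(h_i - theta_i))

    def hamming_distance(a, b):
        """Number of mismatching components between two state vectors."""
        return int(np.sum(np.asarray(a) != np.asarray(b)))

    # A unit with incoming weights (1, -1, 1), zero threshold, and inputs (1, 0, 1)
    print(ltu_update(np.array([1.0, -1.0, 1.0]), np.array([1, 0, 1])))   # -> 1
    print(hamming_distance([0, 0, 1], [1, 0, 0]))                        # -> 2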

Let us return to thinking about the state of a linear threshold network. Suppose that we are interested in two distinct states that the network could be in, and that we want to judge the distance between these states. For example, we might be interested in the distance between the current state of the system, state $a$, and the retrieval state, state $b$. We want to judge the distance from the current state to the retrieval state. When a unit in the network changes state, it corresponds to traversing an edge of the hypercube. For the network to make a transition from state $a$ to state $b$, each of the units in the network that mismatch state $b$ must make a state transition. The number of state transitions is the Hamming distance.

Improving symmetry

The hypercube specified by bit strings of length $N$ has one corner at the origin. We can move the center of the hypercube to the origin if we use activation values of -1 and +1 instead of 0 and 1. With this modification, all of the corners of the hypercube have the same distance from the origin. There is a trade-off, however. Now the city-block distance between two corners is twice the Hamming distance. State transitions of units in the network, however, still correspond to walking along the edges of the hypercube. As a rule, the more symmetry there is, the easier the system is to analyze.

Threshold functions

The two most common threshold functions are the Heaviside unit step function and the sgn (pronounced "signum") function. The unit step function is defined below.

$$H(x) \equiv \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases} \tag{4}$$

This definition is known as a piecewise definition and should be read "if $x < 0$ then $H(x) = 0$; if $x \ge 0$ then $H(x) = 1$". The sgn function is also defined using a piecewise definition.

$$\mathrm{sgn}(x) \equiv \begin{cases} -1 & x < 0 \\ +1 & x \ge 0 \end{cases} \tag{5}$$

The two functions are related by the following formula.

$$\mathrm{sgn}(x) = 2H(x) - 1 \tag{6}$$

The two functions are used to give binary-valued units activation functions and, since they change values discretely, are sometimes classified as hard non-linearities. If the binary units take on the values 0 and 1, then you would use the unit step function. If they take on the values -1 and +1, then you would use the sgn function.

Building an auto-associative memory

Preliminaries

Representing a pattern

Normally, we think of a visual pattern, say on a biological or artificial retina, as being two dimensional. The appropriate data structure to represent such a pattern would also be two dimensional, such as a matrix if the shape of the retina were rectilinear. This approach was taken in the chapter on vision computations.

Figure 2: Fully connected network consisting of three units. Each neuron has exactly one connection to every other neuron. The network is necessarily recurrent. Connections from units to themselves are symmetric but other connections need not be symmetric.

If it is not necessary to preserve neighborhood information (such as which points are close to which others) on the retina, then we can represent the pattern as a vector. A Hopfield network does not preserve or encode locality information. Although units in the network could represent points in an input pattern, the connection strength between any two units would be independent of how close the points were on the retina. Consequently, the input pattern can be represented as a vector, sometimes called a pattern vector or feature vector, without losing information that the Hopfield network would need.

Fully connected network

A fully connected neural network is one in which each neuron receives input from every other neuron, including itself, as shown in Figure 2. Let's suppose you have a vector of neurons and you are interested in describing the connection weights, strengths, or synaptic efficacies, between the neurons. The natural way to do this is to use a matrix, called a connection matrix, such as the one shown below.

$$W = \begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\ \vdots & & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,n} \end{pmatrix} = \begin{pmatrix} 0 & w_{1,2} & w_{1,3} & w_{1,4} \\ w_{2,1} & 0 & w_{2,3} & w_{2,4} \\ w_{3,1} & w_{3,2} & 0 & w_{3,4} \\ w_{4,1} & w_{4,2} & 0 & 0 \end{pmatrix} \tag{7}$$

This matrix, $W$, describes a network consisting of $n = 4$ neurons. Let's assume the neurons, or units, are labeled u1, u2, u3, and u4. The first row of $W$ describes the incoming connection strengths for unit u1, the second row describes the incoming connection strengths for unit u2, and so forth. The zeros along the diagonal indicate that the connection strength of each unit to itself is zero. This is a way of saying that there is no connection while still using the framework of a fully connected network. In fact, any connection topology is a special case of a fully connected network if you use the convention that absence of a connection is represented by a connection strength of zero. The matrix is symmetric, meaning that the transpose of the matrix is equal to itself, that is, $W = W^T$.

The interpretation is that for any two units, a and b, the connection strength from unit a to unit b is the same as that from unit b to unit a. Hopfield networks require that the connection strengths be symmetric and that the weights along the diagonal are non-negative. This assumption of symmetric connection weights is a step away from biological reality that we will return to later.

Notation

A common convention (cf. [7]) is to represent a stored pattern using the symbol ξ (spelled "xi", pronounced "ksee") and an input pattern with the symbol ζ (zeta). We will also use S to refer to the instantaneous activations of the units, sometimes called the state vector. The set of possible values of the state vector forms the configuration space; each point in the configuration space falls on a corner of a hypercube and is equidistant from the origin. All are vectors having $n$ components, where $n$ is the number of units. We want these vectors to be compatible with matrix notation, as is the convention in linear systems (see Appendix), so we will write them as column vectors, shown below.

$$\vec{\xi} \equiv \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} = (\xi_1, \ldots, \xi_n)^T \tag{8}$$

$$\vec{\zeta} \equiv \begin{pmatrix} \zeta_1 \\ \vdots \\ \zeta_n \end{pmatrix} = (\zeta_1, \ldots, \zeta_n)^T \tag{9}$$

$$\vec{S} \equiv \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} = (s_1, \ldots, s_n)^T \tag{10}$$

Since there may be many stored patterns in the network, one may use a superscript µ, as in $\vec{\xi}^{\,\mu}$. This vector refers to pattern µ and can be decomposed into components, as shown below.

$$\vec{\xi}^{\,\mu} \equiv (\xi^{\mu}_1, \ldots, \xi^{\mu}_n)^T \tag{11}$$

Thus $\xi^{\mu}_i$ refers to the $i$-th component of pattern µ.

Hologram metaphor

Karl Pribram suggested in the 1960s [2] that the brain's memory system is like a hologram because:

1. it was content addressable (or reconstructive); that is, a part of a pattern could be used to access the whole pattern.

2. it was distributed; the information for a stored pattern was not localized in one part of the network. This was viewed as desirable at the time because of Lashley's principle of mass action.

3. it obeyed the principle of superposition; the same neurons could be reused to represent more than one pattern.

Except for the brain, the hologram was the only known physical device which had these properties at the time. It was of great interest to understand how holograms worked, as a source of clues to understanding the operation of the brain. Hopfield networks are another concrete example of a mechanism which realizes this metaphor. The underlying mathematics of the basic Hopfield network is more accessible than that needed to understand a hologram, and the structure of a Hopfield network more closely resembles the structure of the brain than does a hologram.

Operation of an auto-associative memory

Storing patterns

Pattern vectors are stored in a Hopfield network by using a method known as correlation recording. Correlation recording sets the connection weights between the relevant units. For each matrix element $w_{ij}$, we set the connection weight to the product $\xi_i \xi_j$ of the desired activations of the units that are linked together by the synapse connecting unit $j$ to unit $i$ (recall that $w_{ij}$ is the connection from unit $j$ to unit $i$). This recipe is known as the generalized Hebbian learning rule. Suppose we have a network consisting of four units and wish to store the two patterns below into the network.

$$\vec{\xi}^{\,1} = (+1, -1, -1, +1)^T \tag{12}$$

$$\vec{\xi}^{\,2} = (+1, +1, -1, -1)^T \tag{13}$$

To store pattern $\vec{\xi}^{\,1}$, we compute the outer product $\vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T$. The outer product is a concise way of specifying the Hebbian rule. This is shown in Formula (14).

$$W = \vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T = \begin{pmatrix} \xi_1\xi_1 & \xi_1\xi_2 & \cdots & \xi_1\xi_n \\ \xi_2\xi_1 & \xi_2\xi_2 & \cdots & \xi_2\xi_n \\ \vdots & & \ddots & \vdots \\ \xi_n\xi_1 & \xi_n\xi_2 & \cdots & \xi_n\xi_n \end{pmatrix} = \begin{pmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \tag{14}$$

Since $\xi_i \xi_j = \xi_j \xi_i$, the matrix is always symmetric. Since $\xi_i \xi_i = 1$, the weights on the diagonal are one. Consequently, the weights on the diagonal do not encode information about the specific stored pattern, and no information about the pattern is lost if one sets these weights to zero. Hopfield networks sometimes work better if they do not have self-coupling terms (the connection strengths of units to themselves are zero).

The matrix in (14) stores the single pattern $\vec{\xi}^{\,1}$. What about storing both patterns $\vec{\xi}^{\,1}$ and $\vec{\xi}^{\,2}$ at the same time? To determine the proper connection weights for encoding these patterns, take the outer product of each pattern and then add the resulting matrices together. The general formula for correlation recording of multiple patterns is the following, where $p$ is the number of patterns.

$$W = \sum_{\mu=1}^{p} \vec{\xi}^{\,\mu} (\vec{\xi}^{\,\mu})^T \tag{15}$$
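As a quick check of the correlation-recording recipe, the sketch below (Python with NumPy; the code and variable names are our own illustration, not part of the chapter) builds the single-pattern matrix of Formula (14) and the superimposed matrix of Formula (15) for the two example patterns.

    import numpy as np

    xi1 = np.array([+1, -1, -1, +1])   # pattern of Formula (12)
    xi2 = np.array([+1, +1, -1, -1])   # pattern of Formula (13)

    # Formula (14): store a single pattern as an outer product
    W_single = np.outer(xi1, xi1)
    print(W_single)

    # Formula (15): superimpose the outer products of several patterns
    W_both = sum(np.outer(xi, xi) for xi in (xi1, xi2))
    print(np.array_equal(W_both, W_both.T))   # True: the connection matrix is symmetric
    print(np.diag(W_both))                    # [2 2 2 2]: diagonal weights are non-negative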

When one studies Formula (15), one sees that each $w_{ij}$ equals $\sum_{\mu=1}^{p} \xi^{\mu}_i \xi^{\mu}_j$. A worked-out example for pattern vectors $\vec{\xi}^{\,1}$ and $\vec{\xi}^{\,2}$ is shown below. Notice that the resulting connection matrices are symmetric and have non-negative weights along the diagonal.

$$W = \vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T + \vec{\xi}^{\,2} (\vec{\xi}^{\,2})^T = \begin{pmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & -2 & 0 \\ 0 & 2 & 0 & -2 \\ -2 & 0 & 2 & 0 \\ 0 & -2 & 0 & 2 \end{pmatrix}$$

This example illustrates the distributed and superposition characteristics of the hologram metaphor. The connection matrix produced by the outer product operation causes each unit to participate in the representation of each bit in a pattern. Adding the two connection matrices together corresponds to superimposing the memories onto the same substrate. It is natural to wonder whether the two stored patterns will interfere with one another if they are superimposed. The answer is that they will interfere with each other if the patterns are highly correlated or if the capacity of the memory is too small. The four-unit memory in this example is very small and does not have the capacity to represent two different patterns, so they will interfere with each other.

Retrieving a stored pattern

Let's suppose that we have stored the single pattern $\vec{\xi}^{\,1}$ from Formula (12) in the memory, and therefore the connection matrix $W$ is that shown in Equation (14). If we set the input pattern $\vec{\zeta}$ equal to the stored pattern $\vec{\xi}^{\,1}$, then if the memory is working correctly, the input pattern should retrieve itself. To temporarily oversimplify, the procedure to retrieve a pattern from memory is to multiply the connection matrix with the input pattern and run the result through the asymmetric sign function.

$$\Psi(W \vec{\zeta}) = \Psi(W \vec{\xi}^{\,1}) = \Psi\!\left( \begin{pmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \right) = \Psi\!\begin{pmatrix} 4 \\ -4 \\ -4 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} = \vec{\xi}^{\,1} \tag{16}$$

The value of the function $\Psi(\cdot)$ is 1 if its argument is greater than or equal to 0, and the value is -1 otherwise. We assume the notational convention that when $\Psi$ is given an array as argument, it computes the value for each component of the array and returns the array of results. We call the function asymmetric because it arbitrarily maps 0 into +1. When the units took on the values of 0 and 1, $\Psi(\cdot)$ was used to denote the unit step function. Its purpose is essentially the same in both cases.
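Retrieval per Formula (16) is one matrix-vector product followed by the asymmetric sign function. A small sketch (our own code; `psi` is our name for Ψ) that also probes the memory with a degraded pattern:

    import numpy as np

    def psi(x):
        """Asymmetric sign function: +1 if x >= 0, else -1, applied componentwise."""
        return np.where(x >= 0, 1, -1)

    xi1 = np.array([+1, -1, -1, +1])        # Formula (12)
    W = np.outer(xi1, xi1)                  # Formula (14)

    print(psi(W @ xi1))                     # -> [ 1 -1 -1  1], i.e., xi1, as in (16)

    # A degraded probe one bit away from xi1 is pulled back to xi1
    zeta = np.array([+1, +1, -1, +1])
    print(psi(W @ zeta))                    # -> [ 1 -1 -1  1]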

Figure 3: A Hopfield memory module consists of a storage controller, a retrieval controller, and the memory proper. The storage controller adjusts the weights to store a new pattern. The retrieval controller sets the state vector of the units to some input pattern and then sends a message to the memory to start the retrieval dynamics. After the update process converges, the retrieval controller reads out the memory state vector.

There is a principled reason that the example in (16) gave the correct answer. Let's redo this example substituting $\vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T$ for $W$. This gives us the following.

$$\Psi(W \vec{\zeta}) = \Psi(W \vec{\xi}^{\,1}) = \Psi\big( (\vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T)\, \vec{\xi}^{\,1} \big) \tag{17}$$

$$= \Psi\big( \vec{\xi}^{\,1} ((\vec{\xi}^{\,1})^T \vec{\xi}^{\,1}) \big) \tag{18}$$

$$= \Psi( \vec{\xi}^{\,1}\, n ) \tag{19}$$

$$= \vec{\xi}^{\,1}$$

We can go from step (17) to (18) because matrix multiplication is associative. Matrix multiplication is not commutative, and $\vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T$ is not equal to $(\vec{\xi}^{\,1})^T \vec{\xi}^{\,1}$. In fact, since the stored patterns are always plus and minus ones, $(\vec{\xi}^{\,1})^T \vec{\xi}^{\,1}$ simplifies to $n$. This gives us step (19).

A Hopfield memory module

If you were to implement a Hopfield memory module in software, it would consist of three parts. First, there would be the memory itself, consisting of a vector of units and an associated connection matrix. The connection matrix would be initialized to all zeroes because the memory is not yet storing any patterns. The second and third parts of the module would be two controllers: a storage controller and a retrieval controller, as seen in Figure 3.

Whenever one would want to add a new pattern to the memory, the storage controller would accept a pattern vector, $\vec{\xi}$, whose length is equal to the number of units in the memory. It would compute the outer product to generate a connection matrix, and then it would add this matrix to the connection matrix implementing the memory.

Whenever you would want to retrieve a pattern from the memory, you would submit an input pattern, $\vec{\zeta}$, to the retrieval controller. The retrieval controller would initialize the unit activations to match the bits in the input pattern. At this point, the unit activations in the memory would not necessarily be consistent with the values in the connection matrix. The retrieval process would update the units in the network to be as consistent as possible with the connection matrix. After the update process completes, the unit activations should be in a retrieval state corresponding to the stored pattern that has the least Hamming distance from the input pattern.

Noiseless retrieval dynamics

Expression (16) illustrated an oversimplified retrieval process. We now describe retrieval dynamics more thoroughly. There are two classes of update procedures associated with Hopfield networks, depending on whether you update all of the units simultaneously or one by one. The former is called synchronous update and the latter is called asynchronous update. These alternative procedures lead to different retrieval dynamics. Both types of update are based on Equation (1). Since the effect of an update event is deterministic, without a random component, these dynamics are called noiseless.

Synchronous update

A single synchronous update step was illustrated in Formula (16). In synchronous update, you assume that there is a global clock and each unit in the network updates itself simultaneously at each clock tick or time step. Normally, in a retrieval episode, the synchronous update step is repeated some number of times because the network rarely reaches a correct retrieval state after just one update step. If the units were linear, matrix multiplication alone would accurately model a single update step: you would simply multiply the connection matrix with the state vector describing the instantaneous activations of the units. Since the units are threshold units, we have to include thresholds in the update procedure, as we did in Expression (16). For the first update step, the unit activations have been initialized to the input vector $\vec{\zeta}$ by the retrieval controller. Thus, the first update step is described as follows.

$$\vec{S}(1) = \Psi(W \vec{\zeta}) \tag{20}$$

When there is a global clock, we use the notation $\vec{S}(t)$ to refer to the unit activations at time step $t$. Of course, the unit activations at time $t+1$ can be described thus.

$$\vec{S}(t+1) = \Psi(W \vec{S}(t)) \tag{21}$$

Asynchronous update

In asynchronous update, only one unit is updated at a time. The term asynchronous comes from the metaphor of an idealized physical system in which units update themselves randomly and the update event is instantaneous. Then, if the temporal grain size is taken to be small enough, no two units will update themselves simultaneously.
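The module of Figure 3 and the two update regimens can be sketched in a few lines of Python with NumPy. The class and method names below are our own invention, not an interface defined in the chapter, and random asynchronous update is shown with a fixed seed for reproducibility.

    import numpy as np

    class HopfieldMemory:
        """Toy Hopfield module: correlation recording plus noiseless retrieval."""

        def __init__(self, n):
            self.n = n
            self.W = np.zeros((n, n))       # no patterns stored yet

        def store(self, xi):
            """Storage controller: add the outer product of a +/-1 pattern (Formula (15))."""
            self.W += np.outer(xi, xi)

        def _psi(self, x):
            return np.where(x >= 0, 1, -1)  # asymmetric sign function

        def retrieve_sync(self, zeta, steps=10):
            """Synchronous update, Equation (21): every unit changes at each clock tick."""
            s = np.array(zeta)
            for _ in range(steps):
                s = self._psi(self.W @ s)
            return s

        def retrieve_async(self, zeta, sweeps=10, seed=0):
            """Random asynchronous update: one randomly chosen unit changes at a time."""
            rng = np.random.default_rng(seed)
            s = np.array(zeta)
            for _ in range(sweeps * self.n):
                i = rng.integers(self.n)
                s[i] = 1 if self.W[i] @ s >= 0 else -1
            return s

    mem = HopfieldMemory(4)
    mem.store(np.array([+1, -1, -1, +1]))                   # the pattern of Formula (12)
    print(mem.retrieve_sync(np.array([+1, +1, -1, +1])))    # -> [ 1 -1 -1  1]
    print(mem.retrieve_async(np.array([-1, -1, -1, +1])))   # -> [ 1 -1 -1  1]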

Table 1: The top row shows the network state at time $t$. The middle row shows the network state at time $t+1$ after being updated. The bottom row shows the Hamming distance between the corresponding states. This data is used to construct the transition graph in Figure 4.

In software, you would need a memory controller that randomly chooses one neuron to update at each time step. This procedure is called random, asynchronous update. There is another procedure, called sequential, asynchronous update, which updates one unit per time step but always cycles through the units in the same order. For a network with $n$ units, there are $n!$ possible sequential update regimens.

Attractors

Mapping out attractors helps us determine which of the two retrieval processes, synchronous or asynchronous update, gives better performance. Recall that a four-unit Hopfield network has 16 states corresponding to the corners of a four-dimensional hypercube. If the network is effectively acting as a memory, the states corresponding to stored patterns should form basins of attraction, sometimes called fixed-point attractors. Let us map out the basins of attraction for the memory formed from the connection matrix given in Formula (14). The attractor map depends on the update procedure as well as the connection matrix. We'll start by assuming synchronous update.

We need to name the possible network states so that we can indicate which state we are talking about. Since the states derive from the binary activations of the units, we can interpret the unit activations as a binary code and translate that into a decimal number (and add the prefix "s" to signify state). Thus, the states will have names s0 through s15. The state which is most important to us is s9, because it corresponds to the stored pattern $\vec{\xi}^{\,1}$ given in Formula (12). The state names are also displayed on the corners of the hypercube in Figure 8.

The state transition table and the associated attractor map are shown in Table 1 and Figure 4. The state transition table (Table 1) has three rows. The first row shows each of the 16 states at time $t$. The second row shows the new state the network reaches at time $t+1$ if it is updated synchronously. The third row shows the Hamming distance between the corresponding states at times $t$ and $t+1$. We can use this table to construct a very informative state transition graph, shown in Figure 4.

The state transition graph tells us how well the memory works. This particular graph forms three groups. The nodes in the graph depict network states and the arcs show the state transition when the network is updated in a given state. The labels on the arcs show the Hamming distance between the two adjoining states. State s9 is in the group on the left. If the network is in s9 when it is updated, it stays in s9, so the memory is stable. If the network is in any of s1, s8, s11, or s13, it goes into s9 when it is updated. In this sense s9 is a basin of attraction, because the network will stabilize in state s9 if it is in any of the surrounding states.
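The brute-force mapping just described is easy to automate. The sketch below (our own code) enumerates all 16 states, applies one synchronous update with the matrix of Formula (14), and prints each transition with its Hamming distance, which is the information summarized in Table 1 and Figure 4.

    import numpy as np

    xi1 = np.array([+1, -1, -1, +1])
    W = np.outer(xi1, xi1)                        # Formula (14)
    psi = lambda x: np.where(x >= 0, 1, -1)

    def state_to_vec(k, n=4):
        """State name sk -> +/-1 vector, treating unit u1 as the most significant bit."""
        return np.array([1 if (k >> (n - 1 - i)) & 1 else -1 for i in range(n)])

    def vec_to_state(s):
        """+/-1 vector -> decimal state name."""
        return int("".join("1" if x > 0 else "0" for x in s), 2)

    for k in range(16):
        s = state_to_vec(k)
        s_next = psi(W @ s)
        dist = int(np.sum(s != s_next))           # Hamming distance of the transition
        print(f"s{k} -> s{vec_to_state(s_next)}, distance {dist}")

For this connection matrix the run shows three groups of states, matching the transition graph discussed next.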

Figure 4: Transition graph showing state changes for the network using the weight matrix in Formula (14) and the synchronous update regimen given in Equation (21). Numbers on edges (arrows) show the Hamming distance traveled by the update step.

This is encouraging because it shows that s9 is acting like a retrieval state. However, there seem to be two anomalies. Although we stored only one pattern in the memory, there are two other basins of attraction, surrounding s6 and s15. State s6 is the complement state of s9, generated by flipping the bits. This is characteristic of Hopfield memories. When you store a pattern, you also store the pattern generated by complementing the bits. Notice that the complement of a pattern vector $\vec{\xi}$ is represented as

$$-\vec{\xi} = (-\xi_1, \ldots, -\xi_n)^T. \tag{22}$$

From this, it follows that Equation (23) is true.

$$\vec{\xi}(\vec{\xi})^T = (-\vec{\xi})(-\vec{\xi})^T \tag{23}$$

Therefore, storing a pattern or its complement will generate the same connection matrix, and the memory loses information about which pattern was stored.

The other basin of attraction, centered around s15, is spurious and should be minimized in memory design. Examining the Hamming distance between the states offers a clue as to why there is such a large spurious attractor basin at s15. First, notice that the attractor basin for our stored pattern s9 doesn't have any states whose Hamming distance from s9 is larger than one. In fact, this basin contains all of the states whose distance from s9 is one (in a four-bit pattern, there are four ways to have a one-bit mismatch). The same situation holds for pattern s6. So it seems that any pattern that is not within a Hamming distance of one from the stored patterns is captured by the spurious attractor s15. To make the memory work better, we need to make the attractor basins for stored memories larger and we need to make the spurious attractor basins smaller.

Figure 5 displays graphs showing how the attractor basins change when sequential, asynchronous update is used. With asynchronous update, the attractors have better characteristics. The graphs were constructed from the information in Table 2. If only one unit at a time is updated, then a state change in the network corresponds to traversing exactly one edge of the hypercube. Thus, the new state is guaranteed to have a Hamming distance of one from the previous state. More important, if the connection matrix is symmetric with non-negative diagonal weights, then each network state can be associated with an energy value.

Figure 5: Transition graph showing state changes for sequential, asynchronous update. An update step consists of updating the four units, one at a time, in the order u1, u2, u3, and u4. Numbers on edges indicate the Hamming distance traveled by the update step.

Table 2: Table showing state changes when exactly one unit is updated. This information is used to construct Figure 6.

A single asynchronous update event is guaranteed never to lead to a state of higher energy. The basins of attraction in the network are energy minima.

Table 2 describes the state changes when only one unit is updated at a time. The results are summarized in Figure 6. In the table, the row labeled s indicates which of the 16 states is under consideration. The row labeled E shows the energy of that state. The concept of energy will be explained in the next section. The row labeled u1 shows the new state that is reached if unit u1 is updated, and the row labeled ∆E shows the change in energy that resulted from this state transition. The rows labeled u2, u3, and u4 provide analogous information about the results of updating units u2, u3, and u4.

Energy function

The concept of energy will help us interpret Figure 6. Recall that the network state is given by the set of instantaneous unit activations and that this is encoded by the state vector $\vec{S}$.

Figure 6: Graph showing state changes when one unit is updated at a time. Numbers on edges of the graph indicate which of the four units was updated to cause the state transition. If a node has fewer than four arcs leaving it, then there is no state change associated with some of the unit updates. Nodes in the top row have energy of 0. Nodes in the middle row have energy of -2. Retrieval states in the bottom row have energy of -8.

The energy associated with $\vec{S}$, given by Formula (24), is a measure of the global consistency between the instantaneous unit activations and the connection weights. The lower the energy, the more consistent the network state is with the connection weights. In a noiseless network, the energy does not depend on the update procedure.

$$E(\vec{S}) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, s_i s_j \tag{24}$$

Figure 7 helps to interpret this formula. The figure shows a fully connected, four-unit Hopfield network in four different states (self-coupling terms are not shown). Since the connection strengths are symmetric, each arc depicts the two weights $w_{ij}$ and $w_{ji}$. Let us examine what happens when the network stores one pattern $\vec{\xi}^{\,1}$ and the instantaneous state of the unit activations $\vec{S}$ is equal to $\vec{\xi}^{\,1}$. This corresponds to the network labeled I in the figure. Recall that, when storing a single pattern, the connection weight needed on each $w_{ij}$ is $\xi_i \xi_j$. When $\vec{S} = \vec{\xi}^{\,1}$, then $s_i s_j = \xi_i \xi_j = w_{ij}$, and the quantity $w_{ij} s_i s_j$ within the scope of the double summation is equal to one. Thus the energy of the retrieval state of a single stored pattern is $-\frac{1}{2} n^2$. In a network of four units, the energy will be -8. Since the complement of $\vec{\xi}^{\,1}$ yields the same connection matrix, the energy associated with the complement state is also -8.
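A short check of Formula (24) (our own code): it reproduces the energies -8, -2, and 0 that come up in the discussion of Figure 7, and verifies that single-unit updates never increase the energy.

    import numpy as np
    from itertools import product

    xi1 = np.array([+1, -1, -1, +1])
    W = np.outer(xi1, xi1)                         # Formula (14), diagonal left in place

    def energy(W, s):
        """Formula (24): E(S) = -1/2 * sum_ij w_ij s_i s_j."""
        return -0.5 * s @ W @ s

    print(energy(W, xi1))                          # -> -8.0  (retrieval state)
    print(energy(W, np.array([+1, +1, -1, +1])))   # -> -2.0  (one bit flipped)
    print(energy(W, np.array([+1, +1, +1, +1])))   # -> 0.0   (two bits flipped)

    # Single-unit (asynchronous) updates never increase the energy
    for bits in product((-1, 1), repeat=4):
        s = np.array(bits)
        for i in range(4):
            s_new = s.copy()
            s_new[i] = 1 if W[i] @ s >= 0 else -1
            assert energy(W, s_new) <= energy(W, s) + 1e-9
    print("no single-unit update increased the energy")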

For the network labeled II in Figure 7, the state vector has a Hamming distance of one from the stored pattern $\vec{\xi}^{\,1}$. The activation of unit u3 has been flipped from the value of $+\xi_3$ to $-\xi_3$. The network is no longer globally consistent, and the dashed lines show which part of the network is inconsistent. Consider all the synaptic connections that make contact with u3. Since each of these weight values was consistent with the unit activations in the top figure, they must now be inconsistent in the middle figure. If u3 is updated, it will flip back to the value of $+\xi_3$ associated with the stored pattern. Since ten weights (including self-coupling terms) are consistent with the unit activations and six weights are inconsistent with the unit activations, the value of the double summation becomes 10 - 6 = 4 and the resulting energy is -2.

For the network labeled III, unit u2 has been flipped and $\vec{S}$ now has a Hamming distance of two from $\vec{\xi}^{\,1}$. Notice that the arc depicting weights $w_{2,3}$ and $w_{3,2}$ is no longer dashed, indicating that it is now consistent with its adjacent units. In this case eight weights are consistent and eight weights are inconsistent, so the summation becomes 8 - 8 = 0, yielding an energy of zero. Part IV of the figure shows that if we flip the activation of one more unit, u1, the global consistency of the network increases. This is because the instantaneous state of the network has a Hamming distance of one from the complement stored pattern $-\vec{\xi}^{\,1}$.

If the energies for all 16 states of our network are computed, one finds that the energy of a state is always either 0, -2, or -8. Only the retrieval states s6 and s9 have energies of -8. The states with energies of -2 have Hamming distances of one from a retrieval state, and those with energies of zero have Hamming distances of two. Our discussion assumed that the connection matrix W was symmetric, but it never made reference to specific values of the weights in the connection matrix. Therefore we know that the network associated with the matrix in Expression (14) has these energy values. Asynchronous update does not increase energy (the proof is in the appendix), and since, for this example, energy is positively correlated with Hamming distance, update events never increase the Hamming distance of the instantaneous state from a stored pattern. Figure 8 shows the single-unit-update transitions visualized on the corners of a hypercube.

Performance of attractor memories

A network having $n$ units can store about $0.138\,n$ random patterns without exceeding the memory capacity. To put this in concrete terms, if you have a memory consisting of 100 units, you can store 13 patterns, where each pattern consists of 100 random bits. This yields a storage capacity of 1300 random bits. The weight matrix will have 10,000 entries. The memory capacity is proportional to the square root of the number of weights, which is proportional to the number of units.

Hopfield networks also have spurious attractor states. One type of spurious attractor state is called a mixture state. A mixture state can be obtained by superimposing an odd number of stored patterns (such as three): the corresponding bits are added together, component by component, and the sign of the result is taken to produce a new ±1 vector of the same length. Thus, it is a mixture pattern. If the network state is initialized to the mixture pattern, the network will remain in that state during the asynchronous update procedure. That is, a mixture state is stable. This is undesirable because a mixture state does not correspond to a stored pattern. (We haven't introduced the concept of temperature; however, it is known that mixture states are unstable if the network is run at a high enough temperature.)
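The mixture-state phenomenon can be demonstrated with a small hand-picked example (our own construction: three mutually orthogonal ±1 patterns of length 8, chosen so that all three are stored exactly despite the small network; random patterns of this size would not behave as cleanly). The componentwise majority of the three patterns is a fixed point of the retrieval dynamics even though it is not a stored pattern.

    import numpy as np

    # Three mutually orthogonal +/-1 patterns of length 8 (hand-picked for this demo)
    xi1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
    xi2 = np.array([1, 1, -1, -1, 1, 1, -1, -1])
    xi3 = np.array([1, -1, 1, -1, 1, -1, 1, -1])

    W = sum(np.outer(xi, xi) for xi in (xi1, xi2, xi3))    # Formula (15)
    psi = lambda x: np.where(x >= 0, 1, -1)

    # Mixture state: componentwise sign of the sum of the three patterns
    mix = psi(xi1 + xi2 + xi3)

    print(np.array_equal(psi(W @ xi1), xi1))   # True: a stored pattern is stable
    print(np.array_equal(psi(W @ mix), mix))   # True: the mixture state is also stable
    print(any(np.array_equal(mix, xi) for xi in (xi1, xi2, xi3)))   # False: not a stored pattern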

Figure 7: How Hamming distance from a stored pattern relates to the energy state. Arcs show symmetric connections in a four-unit network; self-coupling terms are not shown. The stored pattern is $\vec{\xi}^{\,1}$. (I) Since $\vec{S} = \vec{\xi}^{\,1}$, there is no inconsistency; energy is at a minimum. (II) Unit u3 is flipped, giving $\vec{S}$ a Hamming distance of one from $\vec{\xi}^{\,1}$; three connection weights are inconsistent with $\vec{S}$. (III) Unit u2 is flipped and now four weights are inconsistent, but the $w_{2,3}$/$w_{3,2}$ connection, which was inconsistent, is now consistent, despite the fact that the Hamming distance of $\vec{S}$ from $\vec{\xi}^{\,1}$ has increased from one to two; energy is at a maximum. (IV) Flipping one more unit, u1, starts to make the network consistent with the complement retrieval state.

Figure 8: Single-unit-update transitions visualized on a hypercube. State s9 represents the stored pattern $\vec{\xi}^{\,1}$ and s6 represents its complement. Arrows indicate the transition from one state to another as a result of updating one unit.

Limitations of the method

We have mapped out attractor basins in a brute-force manner. This is impractical for networks large enough to store more than one pattern. In order to obtain a specific example of a Hopfield memory having spurious basins (called mixture states) for further analysis, one needs a memory large enough to store three patterns. In order for a memory to be reliable, the number of stored patterns cannot exceed 0.138 times the number of units. Therefore, you would need a memory consisting of 22 units to reliably store three patterns. The hypercube describing the instantaneous states of the network would have $2^{22}$ corners, which is approximately four million.

Hetero-associative memory and central pattern generators

We have so far only discussed auto-associative memory. We need to address the question of hetero-associative memory. Hetero-associative memory models the phenomenon of free and structured association: how does one idea or thought lead to the next? We can model this by having an attractor state become unstable after a certain period of time. But to do this right, we need some kind of (perhaps implicit) clocking mechanism in our network.

Let us explore this intuition in more detail. Suppose that we have stored $p$ patterns in an auto-associative memory. First, we would like the memory to go into a retrieval state for, say, pattern $\vec{\xi}^{\,1}$ and stay in that state long enough to have a meaningful thought, so to speak.

Next, we would like that retrieval state to become unstable and transition to the next retrieval state, say, for pattern $\vec{\xi}^{\,2}$. These operating characteristics raise two problems. First, how do we make a retrieval state become unstable, given that we have designed it to be an attractor state? Second, once it is unstable, how do we control the direction of the transition?

Building a hetero-associative memory

Let us approach this problem by first trying to build a basic hetero-associative memory. Let us return to patterns $\vec{\xi}^{\,1}$ and $\vec{\xi}^{\,2}$ given in Formulas (12) and (13). We want to build a device that allows us to think of pattern $\vec{\xi}^{\,2}$ when given the input pattern $\vec{\xi}^{\,1}$. For simplicity, we shall assume a synchronous update procedure. For auto-association we had used the storage prescription in Formula (14), repeated below. The superscript "aut" stands for auto-association.

$$W^{aut} = \vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T \tag{25}$$

For hetero-association, we shall use the storage prescription below. The superscript "het" stands for hetero-association.

$$W^{het} = \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T \tag{26}$$

To retrieve a hetero-associatively stored pattern we do the following.

$$\Psi(W^{het} \vec{\zeta}) = \Psi(W^{het} \vec{\xi}^{\,1}) = \Psi\big( \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T \vec{\xi}^{\,1} \big) = \Psi( \vec{\xi}^{\,2}\, n ) = \vec{\xi}^{\,2} \tag{27}$$

This logic is the same as that used in the derivation given in Formulas (17) through (19). We can store multiple hetero-associations by analogy with the correlation recording recipe given in Formula (15). Simply superimpose the hetero-associations as described below.

$$W^{het} = \sum_{\mu=1}^{p} \vec{\xi}^{\,\mu+1} (\vec{\xi}^{\,\mu})^T \tag{28}$$

This builds a memory where pattern $\vec{\xi}^{\,\mu}$ retrieves pattern $\vec{\xi}^{\,\mu+1}$. When multiple hetero-associations are stored in the memory, the retrieval operation can be described as shown below.

$$W^{het} \vec{\xi}^{\,1} = \big[ \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T + (W^{het} - \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T) \big]\, \vec{\xi}^{\,1} \tag{29}$$

$$= \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T \vec{\xi}^{\,1} + \big[ (W^{het} - \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T)\, \vec{\xi}^{\,1} \big] \tag{30}$$

$$= n\,\vec{\xi}^{\,2} + \big[ (W^{het} - \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T)\, \vec{\xi}^{\,1} \big] \tag{31}$$

The expression $(W^{het} - \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T)\, \vec{\xi}^{\,1}$ is the crosstalk term.
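The one-step hetero-associative retrieval of Formulas (26) and (27) can be checked directly with a small sketch (our own code; `psi` again stands for Ψ):

    import numpy as np

    xi1 = np.array([+1, -1, -1, +1])    # Formula (12)
    xi2 = np.array([+1, +1, -1, -1])    # Formula (13)
    psi = lambda x: np.where(x >= 0, 1, -1)

    W_het = np.outer(xi2, xi1)          # Formula (26): stores the association xi1 -> xi2

    print(psi(W_het @ xi1))             # -> [ 1  1 -1 -1], i.e., xi2, as in Formula (27)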

Limit cycles

The state-transition graphs we have seen so far did not contain any limit cycles. Instead of looking at the matrix in Formula (14), let's look at the matrix in Formula (7). If we use this matrix and apply a synchronous update step when the network is in state s1, the network goes to state s8. If we apply a synchronous update when in state s8, the network goes into state s1. So the network oscillates between states s1 and s8. When drawn as a graph, this is called a limit cycle attractor.

In this example, the oscillator depends on synchronous update, which assumes a global clock. Since an asynchronous network is biologically more realistic, it would be nice if we could wire up an asynchronous network to act as an oscillator. This is important because the brain, consisting of asynchronous neurons, does exhibit oscillations, as measured by EEG, and does exhibit synchronized firing. How do these properties emerge from an asynchronous system?

Pattern generators

A central pattern generator is a system of neurons, perhaps consisting of a few clusters, that acts as an oscillator, perhaps generating repetitive motion. It can also cycle through a sequence of states, such as what happens when a musical tune plays over and over in your head. One way to build a central pattern generator is to assume that asynchronous neurons have a primitive memory of how long they have been in a particular state (e.g., a firing state). Biologically this is plausible because neurons do show fatigue after sustained firing. Let's call this property the clocking primitive. With this primitive, it is possible to build an asynchronous network that, when in a stable state, jumps to a new stable state because the stable state becomes unstable after the neurons begin to show fatigue.

Let us suppose that we have three patterns of interest, $\vec{\xi}^{\,1}$, $\vec{\xi}^{\,2}$, and $\vec{\xi}^{\,3}$, and that we want to build the device described earlier. This system thinks of pattern $\vec{\xi}^{\,1}$ long enough to have a meaningful thought and then gets reminded of $\vec{\xi}^{\,2}$, then $\vec{\xi}^{\,3}$, and finally cycles back to $\vec{\xi}^{\,1}$. Let's suppose also that we have already built an auto-associative memory for these three patterns and it is stored in the matrix $W^{aut}$. Let's suppose that we have also built a hetero-associative memory along the lines described in Formula (28), and it is stored in the matrix $W^{het}$. Since superposition has worked in the past, let's try using the matrix $W$ below for the central pattern generator and see if we can make it work.

$$W = W^{aut} + W^{het} = \vec{\xi}^{\,1} (\vec{\xi}^{\,1})^T + \vec{\xi}^{\,2} (\vec{\xi}^{\,2})^T + \vec{\xi}^{\,3} (\vec{\xi}^{\,3})^T + \vec{\xi}^{\,2} (\vec{\xi}^{\,1})^T + \vec{\xi}^{\,3} (\vec{\xi}^{\,2})^T + \vec{\xi}^{\,1} (\vec{\xi}^{\,3})^T \tag{32}$$

The main obstacle to success is that the memory now has conflicting information. For instance, when pattern $\vec{\xi}^{\,1}$ is input to the memory, should it retrieve itself or $\vec{\xi}^{\,2}$? In order to solve this problem, we shall need to keep the matrices separate. We shall assume that the two matrix types have different hardware realizations, i.e., different types of synapses. Further, the hetero-associative weights operate on a different time scale than the auto-associative weights. In particular, the hetero-associative weights will be slower, as if they are realized by slow synapses, compared to the auto-associative synapses. This means we shall have two sets of synapses, one to store the auto-associative weights and the other to store the hetero-associative weights.

Figure 9: Operation of the central pattern generator.

Figure 9 illustrates how the separate sets of weights mediate the transitions from thinking of $\vec{\xi}^{\,1}$, $\vec{\xi}^{\,2}$, and $\vec{\xi}^{\,3}$ cyclically in sequence. The nodes in the graph depict the attractor states for the three patterns. The arcs, labeled with either $W^{aut}$ or $W^{het}$, show causal influences within the network dynamics. The $W^{aut}$ weights try to keep the pattern stable. The $W^{het}$ weights try to get the network to make a transition to the next state.

Let's assume that it takes a $W^{aut}$ synapse one time step to propagate its information from pre-synaptic to post-synaptic, but that it takes a $W^{het}$ synapse five time steps to do the same thing. Then, if the network goes into pattern $\vec{\xi}^{\,1}$, it will be controlled by the stabilizing $W^{aut}$ weights before the $W^{het}$ weights start to exert their effect. If the $W^{het}$ weights overwhelm the $W^{aut}$ weights, then the system will transition to the state representing pattern $\vec{\xi}^{\,2}$.

Equation (33) formally specifies how this idea works. Recall that we have used $h_i(t)$ to refer to the net input to unit $i$ at time $t$. Formula (33) says that we are going to separate the total net input into that caused by the auto-associative weights and that caused by the hetero-associative weights. Further, we are going to combine the two net inputs by superposition, and we are going to amplify the effect of the hetero-associative weights by the parameter $\lambda$, which is a positive number greater than one, so that the effect of the hetero-associative weights overwhelms that of the auto-associative weights. The parameter $\tau$ is a time delay, which in our previous example was five. This prevents the hetero-associative weights from kicking in for five time steps.

$$h^{total}_i(t) \equiv h^{aut}_i(t) + \lambda\, h^{het}_i(t - \tau) \tag{33}$$

Both $h^{aut}_i(t)$ and $h^{het}_i(t)$ are defined as you would expect, as shown below.

$$h^{aut}_i(t) = \sum_{j=1}^{n} w^{aut}_{ij}\, s_j(t) \tag{34}$$

$$h^{het}_i(t) = \sum_{j=1}^{n} w^{het}_{ij}\, s_j(t) \tag{35}$$

In summary, although the weight matrices for the auto- and hetero-associative mechanisms are kept separate, we do superimpose their effects when it is time to update a unit.
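Here is a sketch of Equation (33) in action. The code and parameter choices are our own, apart from the five-step delay mentioned in the text; for simplicity it uses synchronous updates and the hand-picked orthogonal patterns from the earlier sketch, rather than the asynchronous, fatigue-based clocking primitive the chapter describes.

    import numpy as np
    from collections import deque

    # Three mutually orthogonal +/-1 patterns (the same hand-picked set as before)
    patterns = [np.array([1, 1, 1, 1, -1, -1, -1, -1]),
                np.array([1, 1, -1, -1, 1, 1, -1, -1]),
                np.array([1, -1, 1, -1, 1, -1, 1, -1])]

    W_aut = sum(np.outer(x, x) for x in patterns)                                  # Formula (15)
    W_het = sum(np.outer(patterns[(m + 1) % 3], patterns[m]) for m in range(3))    # cyclic version of (28)

    psi = lambda x: np.where(x >= 0, 1, -1)
    lam, tau = 2.0, 5                        # lambda > 1 amplifies the slow hetero input; delay of five steps

    s = patterns[0].copy()                   # start the network thinking of pattern 1
    delay_line = deque([np.zeros(8)] * tau)  # delayed states feeding the slow hetero synapses

    for t in range(20):
        h_total = W_aut @ s + lam * (W_het @ delay_line[0])   # Equation (33), all units at once
        delay_line.popleft()
        delay_line.append(s.copy())
        s = psi(h_total)
        which = [m + 1 for m, x in enumerate(patterns) if np.array_equal(s, x)]
        print(t, which)    # the state dwells on each pattern for a few steps, then hops to the next

With these parameters the printed sequence dwells on pattern 1, then 2, then 3, and cycles back to 1, which is the qualitative behavior Figure 9 describes.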

7 Rate-Based Recurrent Networks of Threshold Neurons: Basis for Associative Memory

7 Rate-Based Recurrent Networks of Threshold Neurons: Basis for Associative Memory Physics 178/278 - David Kleinfeld - Fall 2005; Revised for Winter 2017 7 Rate-Based Recurrent etworks of Threshold eurons: Basis for Associative Memory 7.1 A recurrent network with threshold elements The

More information

Hopfield Neural Network and Associative Memory. Typical Myelinated Vertebrate Motoneuron (Wikipedia) Topic 3 Polymers and Neurons Lecture 5

Hopfield Neural Network and Associative Memory. Typical Myelinated Vertebrate Motoneuron (Wikipedia) Topic 3 Polymers and Neurons Lecture 5 Hopfield Neural Network and Associative Memory Typical Myelinated Vertebrate Motoneuron (Wikipedia) PHY 411-506 Computational Physics 2 1 Wednesday, March 5 1906 Nobel Prize in Physiology or Medicine.

More information

7 Recurrent Networks of Threshold (Binary) Neurons: Basis for Associative Memory

7 Recurrent Networks of Threshold (Binary) Neurons: Basis for Associative Memory Physics 178/278 - David Kleinfeld - Winter 2019 7 Recurrent etworks of Threshold (Binary) eurons: Basis for Associative Memory 7.1 The network The basic challenge in associative networks, also referred

More information

CHAPTER 3. Pattern Association. Neural Networks

CHAPTER 3. Pattern Association. Neural Networks CHAPTER 3 Pattern Association Neural Networks Pattern Association learning is the process of forming associations between related patterns. The patterns we associate together may be of the same type or

More information

Using a Hopfield Network: A Nuts and Bolts Approach

Using a Hopfield Network: A Nuts and Bolts Approach Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of

More information

Lecture 7 Artificial neural networks: Supervised learning

Lecture 7 Artificial neural networks: Supervised learning Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

More information

Artificial Intelligence Hopfield Networks

Artificial Intelligence Hopfield Networks Artificial Intelligence Hopfield Networks Andrea Torsello Network Topologies Single Layer Recurrent Network Bidirectional Symmetric Connection Binary / Continuous Units Associative Memory Optimization

More information

Hopfield Networks. (Excerpt from a Basic Course at IK 2008) Herbert Jaeger. Jacobs University Bremen

Hopfield Networks. (Excerpt from a Basic Course at IK 2008) Herbert Jaeger. Jacobs University Bremen Hopfield Networks (Excerpt from a Basic Course at IK 2008) Herbert Jaeger Jacobs University Bremen Building a model of associative memory should be simple enough... Our brain is a neural network Individual

More information

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required.

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In humans, association is known to be a prominent feature of memory.

More information

Computational Intelligence Lecture 6: Associative Memory

Computational Intelligence Lecture 6: Associative Memory Computational Intelligence Lecture 6: Associative Memory Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Fall 2011 Farzaneh Abdollahi Computational Intelligence

More information

Hopfield networks. Lluís A. Belanche Soft Computing Research Group

Hopfield networks. Lluís A. Belanche Soft Computing Research Group Lluís A. Belanche belanche@lsi.upc.edu Soft Computing Research Group Dept. de Llenguatges i Sistemes Informàtics (Software department) Universitat Politècnica de Catalunya 2010-2011 Introduction Content-addressable

More information

Neural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science

Neural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science Neural Networks Prof. Dr. Rudolf Kruse Computational Intelligence Group Faculty for Computer Science kruse@iws.cs.uni-magdeburg.de Rudolf Kruse Neural Networks 1 Hopfield Networks Rudolf Kruse Neural Networks

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning

More information

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

More information

Hopfield Neural Network

Hopfield Neural Network Lecture 4 Hopfield Neural Network Hopfield Neural Network A Hopfield net is a form of recurrent artificial neural network invented by John Hopfield. Hopfield nets serve as content-addressable memory systems

More information

Neural Nets and Symbolic Reasoning Hopfield Networks

Neural Nets and Symbolic Reasoning Hopfield Networks Neural Nets and Symbolic Reasoning Hopfield Networks Outline The idea of pattern completion The fast dynamics of Hopfield networks Learning with Hopfield networks Emerging properties of Hopfield networks

More information

Hopfield Network for Associative Memory

Hopfield Network for Associative Memory CSE 5526: Introduction to Neural Networks Hopfield Network for Associative Memory 1 The next few units cover unsupervised models Goal: learn the distribution of a set of observations Some observations

More information

Learning and Memory in Neural Networks

Learning and Memory in Neural Networks Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 2

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 2 EECS 6A Designing Information Devices and Systems I Spring 9 Lecture Notes Note Vectors and Matrices In the previous note, we introduced vectors and matrices as a way of writing systems of linear equations

More information

Using Variable Threshold to Increase Capacity in a Feedback Neural Network

Using Variable Threshold to Increase Capacity in a Feedback Neural Network Using Variable Threshold to Increase Capacity in a Feedback Neural Network Praveen Kuruvada Abstract: The article presents new results on the use of variable thresholds to increase the capacity of a feedback

More information

A. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network.

A. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network. III. Recurrent Neural Networks A. The Hopfield Network 2/9/15 1 2/9/15 2 Typical Artificial Neuron Typical Artificial Neuron connection weights linear combination activation function inputs output net

More information

A. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network.

A. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network. Part 3A: Hopfield Network III. Recurrent Neural Networks A. The Hopfield Network 1 2 Typical Artificial Neuron Typical Artificial Neuron connection weights linear combination activation function inputs

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 2

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 2 EECS 6A Designing Information Devices and Systems I Fall 08 Lecture Notes Note Vectors and Matrices In the previous note, we introduced vectors and matrices as a way of writing systems of linear equations

More information

) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.

) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2. 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2011 Recitation 8, November 3 Corrected Version & (most) solutions

More information

CSE 5526: Introduction to Neural Networks Hopfield Network for Associative Memory

CSE 5526: Introduction to Neural Networks Hopfield Network for Associative Memory CSE 5526: Introduction to Neural Networks Hopfield Network for Associative Memory Part VII 1 The basic task Store a set of fundamental memories {ξξ 1, ξξ 2,, ξξ MM } so that, when presented a new pattern

More information

Memories Associated with Single Neurons and Proximity Matrices

Memories Associated with Single Neurons and Proximity Matrices Memories Associated with Single Neurons and Proximity Matrices Subhash Kak Oklahoma State University, Stillwater Abstract: This paper extends the treatment of single-neuron memories obtained by the use

More information

Consider the way we are able to retrieve a pattern from a partial key as in Figure 10 1.

Consider the way we are able to retrieve a pattern from a partial key as in Figure 10 1. CompNeuroSci Ch 10 September 8, 2004 10 Associative Memory Networks 101 Introductory concepts Consider the way we are able to retrieve a pattern from a partial key as in Figure 10 1 Figure 10 1: A key

More information

Artificial Neural Network and Fuzzy Logic

Artificial Neural Network and Fuzzy Logic Artificial Neural Network and Fuzzy Logic 1 Syllabus 2 Syllabus 3 Books 1. Artificial Neural Networks by B. Yagnanarayan, PHI - (Cover Topologies part of unit 1 and All part of Unit 2) 2. Neural Networks

More information

Neural Networks. Hopfield Nets and Auto Associators Fall 2017

Neural Networks. Hopfield Nets and Auto Associators Fall 2017 Neural Networks Hopfield Nets and Auto Associators Fall 2017 1 Story so far Neural networks for computation All feedforward structures But what about.. 2 Loopy network Θ z = ቊ +1 if z > 0 1 if z 0 y i

More information

5.3 METABOLIC NETWORKS 193. P (x i P a (x i )) (5.30) i=1

5.3 METABOLIC NETWORKS 193. P (x i P a (x i )) (5.30) i=1 5.3 METABOLIC NETWORKS 193 5.3 Metabolic Networks 5.4 Bayesian Networks Let G = (V, E) be a directed acyclic graph. We assume that the vertices i V (1 i n) represent for example genes and correspond to

More information

Pattern Association or Associative Networks. Jugal Kalita University of Colorado at Colorado Springs

Pattern Association or Associative Networks. Jugal Kalita University of Colorado at Colorado Springs Pattern Association or Associative Networks Jugal Kalita University of Colorado at Colorado Springs To an extent, learning is forming associations. Human memory associates similar items, contrary/opposite

More information

Designing Information Devices and Systems I Fall 2017 Official Lecture Notes Note 2

Designing Information Devices and Systems I Fall 2017 Official Lecture Notes Note 2 EECS 6A Designing Information Devices and Systems I Fall 07 Official Lecture Notes Note Introduction Previously, we introduced vectors and matrices as a way of writing systems of linear equations more

More information

Hopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296

Hopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks and Boltzmann Machines Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks A Hopfield network is a neural network with a graph G = (U,C) that satisfies

More information

NOTES ON LINEAR ALGEBRA CLASS HANDOUT

NOTES ON LINEAR ALGEBRA CLASS HANDOUT NOTES ON LINEAR ALGEBRA CLASS HANDOUT ANTHONY S. MAIDA CONTENTS 1. Introduction 2 2. Basis Vectors 2 3. Linear Transformations 2 3.1. Example: Rotation Transformation 3 4. Matrix Multiplication and Function

More information

Linear Algebra, Summer 2011, pt. 2

Linear Algebra, Summer 2011, pt. 2 Linear Algebra, Summer 2, pt. 2 June 8, 2 Contents Inverses. 2 Vector Spaces. 3 2. Examples of vector spaces..................... 3 2.2 The column space......................... 6 2.3 The null space...........................

More information

Sequential Circuits Sequential circuits combinational circuits state gate delay

Sequential Circuits Sequential circuits combinational circuits state gate delay Sequential Circuits Sequential circuits are those with memory, also called feedback. In this, they differ from combinational circuits, which have no memory. The stable output of a combinational circuit

More information

Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc.

Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc. Finite State Machines Introduction Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc. Such devices form

More information

Hertz, Krogh, Palmer: Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company (1991). (v ji (1 x i ) + (1 v ji )x i )

Hertz, Krogh, Palmer: Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company (1991). (v ji (1 x i ) + (1 v ji )x i ) Symmetric Networks Hertz, Krogh, Palmer: Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company (1991). How can we model an associative memory? Let M = {v 1,..., v m } be a

More information

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann

More information

Introduction to Artificial Neural Networks

Introduction to Artificial Neural Networks Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline

More information

Stochastic Processes

Stochastic Processes qmc082.tex. Version of 30 September 2010. Lecture Notes on Quantum Mechanics No. 8 R. B. Griffiths References: Stochastic Processes CQT = R. B. Griffiths, Consistent Quantum Theory (Cambridge, 2002) DeGroot

More information

Iterative Autoassociative Net: Bidirectional Associative Memory

Iterative Autoassociative Net: Bidirectional Associative Memory POLYTECHNIC UNIVERSITY Department of Computer and Information Science Iterative Autoassociative Net: Bidirectional Associative Memory K. Ming Leung Abstract: Iterative associative neural networks are introduced.

More information

Effects of Interactive Function Forms in a Self-Organized Critical Model Based on Neural Networks

Effects of Interactive Function Forms in a Self-Organized Critical Model Based on Neural Networks Commun. Theor. Phys. (Beijing, China) 40 (2003) pp. 607 613 c International Academic Publishers Vol. 40, No. 5, November 15, 2003 Effects of Interactive Function Forms in a Self-Organized Critical Model

More information

Replay argument. Abstract. Tanasije Gjorgoski Posted on on 03 April 2006

Replay argument. Abstract. Tanasije Gjorgoski Posted on  on 03 April 2006 Replay argument Tanasije Gjorgoski Posted on http://broodsphilosophy.wordpress.com/, on 03 April 2006 Abstract Before a year or so ago, I was trying to think of an example so I can properly communicate

More information

Logic Learning in Hopfield Networks

Logic Learning in Hopfield Networks Logic Learning in Hopfield Networks Saratha Sathasivam (Corresponding author) School of Mathematical Sciences, University of Science Malaysia, Penang, Malaysia E-mail: saratha@cs.usm.my Wan Ahmad Tajuddin

More information

Module - 19 Gated Latches

Module - 19 Gated Latches Digital Circuits and Systems Prof. Shankar Balachandran Department of Electrical Engineering Indian Institute of Technology, Bombay And Department of Computer Science and Engineering Indian Institute of

More information

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction MAT4 : Introduction to Applied Linear Algebra Mike Newman fall 7 9. Projections introduction One reason to consider projections is to understand approximate solutions to linear systems. A common example

More information

Introduction Biologically Motivated Crude Model Backpropagation

Introduction Biologically Motivated Crude Model Backpropagation Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the

More information

Boolean circuits. Lecture Definitions

Boolean circuits. Lecture Definitions Lecture 20 Boolean circuits In this lecture we will discuss the Boolean circuit model of computation and its connection to the Turing machine model. Although the Boolean circuit model is fundamentally

More information

Reification of Boolean Logic

Reification of Boolean Logic 526 U1180 neural networks 1 Chapter 1 Reification of Boolean Logic The modern era of neural networks began with the pioneer work of McCulloch and Pitts (1943). McCulloch was a psychiatrist and neuroanatomist;

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Christian Mohr

Christian Mohr Christian Mohr 20.12.2011 Recurrent Networks Networks in which units may have connections to units in the same or preceding layers Also connections to the unit itself possible Already covered: Hopfield

More information

Week 4: Hopfield Network

Week 4: Hopfield Network Week 4: Hopfield Network Phong Le, Willem Zuidema November 20, 2013 Last week we studied multi-layer perceptron, a neural network in which information is only allowed to transmit in one direction (from

More information

1 What a Neural Network Computes

1 What a Neural Network Computes Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists

More information

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Chapter 9: The Perceptron

Chapter 9: The Perceptron Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer

Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer Associative Memory Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer Storage Analysis Sparse Coding Implementation on a

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5 Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5 "Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand

More information

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

Undirected graphical models

Undirected graphical models Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical

More information

Neural Networks Lecture 2:Single Layer Classifiers

Neural Networks Lecture 2:Single Layer Classifiers Neural Networks Lecture 2:Single Layer Classifiers H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi Neural

More information

Linear Regression, Neural Networks, etc.

Linear Regression, Neural Networks, etc. Linear Regression, Neural Networks, etc. Gradient Descent Many machine learning problems can be cast as optimization problems Define a function that corresponds to learning error. (More on this later)

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound

Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound Copyright 2018 by James A. Bernhard Contents 1 Vector spaces 3 1.1 Definitions and basic properties.................

More information

Computational Intelligence Lecture 3: Simple Neural Networks for Pattern Classification

Computational Intelligence Lecture 3: Simple Neural Networks for Pattern Classification Computational Intelligence Lecture 3: Simple Neural Networks for Pattern Classification Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Fall 2011 arzaneh Abdollahi

More information

2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net.

2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net. 2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net. - For an autoassociative net, the training input and target output

More information

Associative Memories (I) Hopfield Networks

Associative Memories (I) Hopfield Networks Associative Memories (I) Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Applied Brain Science - Computational Neuroscience (CNS) A Pun Associative Memories Introduction

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

Neural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET

Neural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET Unit-. Definition Neural network is a massively parallel distributed processing system, made of highly inter-connected neural computing elements that have the ability to learn and thereby acquire knowledge

More information

Information storage capacity of incompletely connected associative memories

Information storage capacity of incompletely connected associative memories Information storage capacity of incompletely connected associative memories Holger Bosch a, Franz J. Kurfess b, * a Department of Computer Science, University of Geneva, Geneva, Switzerland b Department

More information

INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS INTRODUCTION TO NEURAL NETWORKS R. Beale & T.Jackson: Neural Computing, an Introduction. Adam Hilger Ed., Bristol, Philadelphia and New York, 990. THE STRUCTURE OF THE BRAIN The brain consists of about

More information

6. Finite State Machines

6. Finite State Machines 6. Finite State Machines 6.4x Computation Structures Part Digital Circuits Copyright 25 MIT EECS 6.4 Computation Structures L6: Finite State Machines, Slide # Our New Machine Clock State Registers k Current

More information

Machine Learning. Neural Networks

Machine Learning. Neural Networks Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE

More information

The Hypercube Graph and the Inhibitory Hypercube Network

The Hypercube Graph and the Inhibitory Hypercube Network The Hypercube Graph and the Inhibitory Hypercube Network Michael Cook mcook@forstmannleff.com William J. Wolfe Professor of Computer Science California State University Channel Islands william.wolfe@csuci.edu

More information

Implementing Competitive Learning in a Quantum System

Implementing Competitive Learning in a Quantum System Implementing Competitive Learning in a Quantum System Dan Ventura fonix corporation dventura@fonix.com http://axon.cs.byu.edu/dan Abstract Ideas from quantum computation are applied to the field of neural

More information

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Jugal Kalita University of Colorado Colorado Springs Fall 2014 Logic Gates and Boolean Algebra Logic gates are used

More information

Matrices and Vectors

Matrices and Vectors Matrices and Vectors James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University November 11, 2013 Outline 1 Matrices and Vectors 2 Vector Details 3 Matrix

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

CHALMERS, GÖTEBORGS UNIVERSITET. EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD

CHALMERS, GÖTEBORGS UNIVERSITET. EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD CHALMERS, GÖTEBORGS UNIVERSITET EXAM for ARTIFICIAL NEURAL NETWORKS COURSE CODES: FFR 135, FIM 72 GU, PhD Time: Place: Teachers: Allowed material: Not allowed: October 23, 217, at 8 3 12 3 Lindholmen-salar

More information

The Perceptron algorithm

The Perceptron algorithm The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following

More information

DIMACS Technical Report March Game Seki 1

DIMACS Technical Report March Game Seki 1 DIMACS Technical Report 2007-05 March 2007 Game Seki 1 by Diogo V. Andrade RUTCOR, Rutgers University 640 Bartholomew Road Piscataway, NJ 08854-8003 dandrade@rutcor.rutgers.edu Vladimir A. Gurvich RUTCOR,

More information

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Marr's Theory of the Hippocampus: Part I

Marr's Theory of the Hippocampus: Part I Marr's Theory of the Hippocampus: Part I Computational Models of Neural Systems Lecture 3.3 David S. Touretzky October, 2015 David Marr: 1945-1980 10/05/15 Computational Models of Neural Systems 2 Marr

More information

Artificial Neural Networks. Part 2

Artificial Neural Networks. Part 2 Artificial Neural Netorks Part Artificial Neuron Model Folloing simplified model of real neurons is also knon as a Threshold Logic Unit x McCullouch-Pitts neuron (943) x x n n Body of neuron f out Biological

More information

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Lecture - 15 Momentum Energy Four Vector We had started discussing the concept of four vectors.

More information

1 Basic Combinatorics

1 Basic Combinatorics 1 Basic Combinatorics 1.1 Sets and sequences Sets. A set is an unordered collection of distinct objects. The objects are called elements of the set. We use braces to denote a set, for example, the set

More information

Lecture 4: Feed Forward Neural Networks

Lecture 4: Feed Forward Neural Networks Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training

More information

Sequential Logic (3.1 and is a long difficult section you really should read!)

Sequential Logic (3.1 and is a long difficult section you really should read!) EECS 270, Fall 2014, Lecture 6 Page 1 of 8 Sequential Logic (3.1 and 3.2. 3.2 is a long difficult section you really should read!) One thing we have carefully avoided so far is feedback all of our signals

More information

Neural networks. Chapter 19, Sections 1 5 1

Neural networks. Chapter 19, Sections 1 5 1 Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

More information

Example: 2x y + 3z = 1 5y 6z = 0 x + 4z = 7. Definition: Elementary Row Operations. Example: Type I swap rows 1 and 3

Example: 2x y + 3z = 1 5y 6z = 0 x + 4z = 7. Definition: Elementary Row Operations. Example: Type I swap rows 1 and 3 Linear Algebra Row Reduced Echelon Form Techniques for solving systems of linear equations lie at the heart of linear algebra. In high school we learn to solve systems with or variables using elimination

More information

Extending the Associative Rule Chaining Architecture for Multiple Arity Rules

Extending the Associative Rule Chaining Architecture for Multiple Arity Rules Extending the Associative Rule Chaining Architecture for Multiple Arity Rules Nathan Burles, James Austin, and Simon O Keefe Advanced Computer Architectures Group Department of Computer Science University

More information

Lecture notes on Turing machines

Lecture notes on Turing machines Lecture notes on Turing machines Ivano Ciardelli 1 Introduction Turing machines, introduced by Alan Turing in 1936, are one of the earliest and perhaps the best known model of computation. The importance

More information