Hopfield Networks (Excerpt from a Basic Course at IK 2008). Herbert Jaeger, Jacobs University Bremen


Building a model of associative memory should be simple enough... Our brain is a neural network Individual neurons are quite well understood Almost universally shared belief: memories are coded in synaptic connectivity So it only remains to find out how the memories go there and how they are retrieved

Our topic for today How can "information" be "coded" in the "connectivity" of "neural networks"? Eerh... hrmm... What does this mean? All of these concepts are so imprecise... Somebody's gotta set the rules. We need a decision. A tough decision. And stick to it. http://www.sillyjokes.co.uk/

The Sheriff says... We can store discrete items in memory (finitely many, individual, fundamental patterns). Fundamental patterns are addressed by auto-association: the fundamental pattern is reconstituted from a "similar" cue input. Example: "similar" = "corrupted": pattern restoration. Example: "similar" = "partial": pattern completion. Images from: Hertz et al. 1991

Other "similarity" relationships cue - pattern Thinking only of visual patterns for simplicity... "similar" = distorted "similar" = shifted, mirrored, rotated "similar" = B/W (pattern in color) "similar" = line drawing (pattern photo) "similar" = preceding in time (pattern appears in animated sequence) "similar" = still (cueing an animated scene)

The rolling-downhill-ball model of memory Consider the space of all possible neural "pattern states" (in fig.: a 2-dim vector space spanned by V1, V2). Within this space, some points are the fundamental patterns (ξ^1, ξ^2, ξ^3 in fig.). Above the pattern space, an energy landscape (or a potential) E is defined. Fundamental patterns lie at the minima of the energy. Any pattern may serve as cue. The process of associative retrieval of a pattern from a "similar" cue is determined by gradient descent on E, from the energy of the cue down to the energy of the retrieved fundamental pattern. Image: www.ift.uib.no/~antonych/protein.html

Aerial view of the same This contour plot shows the energy landscape as in a map. The pattern space becomes divided into basins of attraction. E.g., all cue patterns that are "attracted" by ξ^2 form the basin of attraction of the fixed-point attractor ξ^2. One may say that all these are instances of the category (concept, class) represented by ξ^2.

Same, with "real" patterns... Pixel images from Haykin 1999

Structure of a Hopfield network (HN) A HN is made of N binary neurons (in figure: N = 4). Each neuron is connected to all other neurons (except itself - no auto-feedback). The connection between neurons i and j is symmetric (same as the connection between j and i) and has a weight w_ij = w_ji ∈ R. At a given time, neuron i has state s_i ∈ {−1, +1}. The entire network has a state S = (s_1, ..., s_N)^T (written as a column vector). A simple demo HN: w_12 = w_21 = 3, w_13 = w_31 = −0.5, w_14 = w_41 = −0.2, w_23 = w_32 = 1, w_24 = w_42 = 0.1, w_34 = w_43 = −2; states s_1 = s_2 = s_3 = s_4 = −1, so S = (s_1, s_2, s_3, s_4)^T = (−1, −1, −1, −1)^T.

Energy of a state By definition, a state S = (s_1, ..., s_N)^T has an energy E(S) = −(1/N) Σ_{i<j; i,j=1,...,N} w_ij s_i s_j. Example (the demo network, all s_i = −1, so every product s_i s_j = +1): E(S) = −1/4 (w_12 + w_13 + w_14 + w_23 + w_24 + w_34) = −1/4 (+3 − 0.5 − 0.2 + 1 + 0.1 − 2) = −0.35. Electric metaphor: higher energy means more "neighboring opposite charges" (almost like a charged battery).
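The energy of the demo network can be checked numerically. A minimal sketch; the weight matrix below is my reading of the (garbled) demo slide, and `energy` uses the slide's 1/N normalization:

```python
import numpy as np

# Weight matrix of the 4-neuron demo network (symmetric, zero diagonal);
# the numeric values are a reconstruction of the demo slide.
W = np.array([
    [ 0.0,  3.0, -0.5, -0.2],
    [ 3.0,  0.0,  1.0,  0.1],
    [-0.5,  1.0,  0.0, -2.0],
    [-0.2,  0.1, -2.0,  0.0],
])

def energy(W, S):
    """E(S) = -(1/N) * sum_{i<j} w_ij s_i s_j. The quadratic form S.W.S
    counts every pair twice, hence the extra factor 1/2."""
    return -(S @ W @ S) / (2 * len(S))

S = np.array([-1, -1, -1, -1])
print(energy(W, S))  # approx. -0.35, as on the slide
```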

A closeup on the neuron model The (discrete-time) HN is made of McCulloch-Pitts neurons (same as the perceptron). Update rule: neuron i first sums its connection-weighted inputs, then takes the difference to a threshold Θ_i, then passes the result through a binary decision function, the sign function: sgn(x) = −1 if x < 0, sgn(x) = +1 if x ≥ 0. s_i(t+1) = sgn( Σ_{j=1,...,N; j≠i} w_ij s_j(t) − Θ_i ). Biology view, engineering view, math view (figures from the Maida lecture notes).

Convention: in HNs, all Θ_i = 0. Updating neurons and networks Single-neuron update, for instance of neuron 3: s_3(t+1) = sgn( Σ_{j=1,2,4} w_3j s_j(t) ) = sgn( −0.5·(−1) + 1·(−1) − 2·(−1) ) = sgn(1.5) = +1. Entire network update: 1. Pick one neuron at random (stochastic choice). 2. Update it as above (deterministic update). 3. Iterate. HNs evolve over time by individual updates of randomly picked neurons.

Updating neurons and networks 2 From the previous slide: the update of s_3 was s_3(t) = −1 → s_3(t+1) = +1. This is a network update S(t) = (−1, −1, −1, −1)^T → S(t+1) = (−1, −1, +1, −1)^T. The energy at time t was E(S(t)) = −0.35. Calculate: E(S(t+1)) = −1.1. Observation: E(S(t+1)) < E(S(t)). Fact: whenever S(t+1) ≠ S(t), then E(S(t+1)) < E(S(t)). "All state changes reduce the energy in a Hopfield network."
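This update-and-check can be replayed in code. A sketch under the same assumptions about the demo weights as in the earlier energy example (my reconstruction of the garbled slide values; sgn(0) is taken as +1):

```python
import numpy as np

# Demo network weights (reconstructed values)
W = np.array([
    [ 0.0,  3.0, -0.5, -0.2],
    [ 3.0,  0.0,  1.0,  0.1],
    [-0.5,  1.0,  0.0, -2.0],
    [-0.2,  0.1, -2.0,  0.0],
])

def energy(W, S):
    return -(S @ W @ S) / (2 * len(S))

def update(W, S, i):
    """Deterministic update of neuron i (threshold 0, sgn(0) = +1)."""
    S = S.copy()
    S[i] = 1 if W[i] @ S >= 0 else -1
    return S

S_t  = np.array([-1, -1, -1, -1])
S_t1 = update(W, S_t, 2)                  # update neuron 3 (index 2)
print(S_t1)                               # [-1 -1  1 -1]
print(energy(W, S_t), energy(W, S_t1))    # approx. -0.35 and -1.1: the energy dropped
```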

Representing patterns by HN states A "pattern" is identified with a state (thus, every neuron takes part in representing a pattern). HN "patterns" are therefore just N-dimensional {−1, +1} vectors. To represent real-world patterns, one must code them as binary vectors. Example: the pixel image on the slide is coded as a {−1, +1} state vector S.
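Coding a black-and-white image as a {−1, +1} state vector is a one-liner. A minimal sketch with a hypothetical 3x3 image of my own (the slide's original image is not recoverable); black pixel → +1, white pixel → −1:

```python
import numpy as np

# Hypothetical 3x3 black/white image: 1 = black pixel, 0 = white pixel
img = np.array([[1, 0, 1],
                [0, 1, 0],
                [1, 0, 1]])

# Code black as +1 and white as -1, then flatten into a state vector S
S = np.where(img.flatten() == 1, 1, -1)
print(S)  # [ 1 -1  1 -1  1 -1  1 -1  1]
```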

HN as memory: problem statement Given: p fundamental patterns ξ^1, ..., ξ^p, coded as N-dimensional {−1, +1} vectors. Wanted: a HN with N neurons which has these fundamental patterns as the minima of its energy landscape. Reward: we would have a neural network which, when started in some pattern (state) "similar" to a fundamental pattern, would evolve toward it by the stochastic state update and the energy minimization principle.

Remember our intuition... (the contour plot with the basins of attraction, as before)

Hypercube state space of HNs The states of a HN are not continuous but discrete. A HN with N neurons can host 2^N different states, a finite number. These states can conveniently be arranged at the corners of an N-dimensional hypercube. (Figure: hypercubes for N = 2, 3, and 4, with corners labeled by {−1, +1} coordinates.)

"Rolling downhill" in a hypercube Just to give you an impression... Pattern states in an N = 16 HN (slightly misleading though: the images have N = 120!). The shading indicates the energy at each state (hypercube corner); a single-neuron update flips one component of the state vector.

Problem statement, again Given: p fundamental patterns ξ^1, ..., ξ^p, coded as N-dimensional {−1, +1} vectors. Wanted: a HN with N neurons which has these fundamental patterns as the minima of the energy values at the hypercube corners... Seems a tough problem.

Solution: the HN should learn that by itself! Observation: the dynamic behaviour of a HN depends only on the weights w_ij. Idea: train the weights by "showing" the fundamental patterns ξ^1, ..., ξ^p repeatedly to the network. At each exposure, the weights are adapted a little bit such that the energy for that pattern is reduced a little. After numerous exposures, the energy landscape should have deep troughs at the states ξ^1, ..., ξ^p. Where do we get such a clever weight adaptation rule from?

Donald O. Hebb (1904-1985) Started out as an English teacher. Turned to an academic career in psychology. 1949: The Organization of Behavior: A Neuropsychological Theory. (Photo: williamcalvin.com/bk9/bk9inter.htm)

Hebb's postulate (1949) "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, The Organization of Behavior, 1949) Also known as "Cells that fire together wire together", or simply as "Hebb's learning rule".

Training HNs with Hebb's rule Hebb's rule spells out like this: "Presenting ξ = (ξ_1, ..., ξ_N) to the network" means just this: "set the N neuron states equal to (ξ_1, ..., ξ_N)." The neurobiological, Hebbian version of adaptation: "i and j fire together if sgn(ξ_i) = sgn(ξ_j). Then, increase w_ij a bit." This corresponds to energy reduction. Remember, E(ξ^μ) = −(1/N) Σ_{i<j; i,j=1,...,N} w_ij ξ_i^μ ξ_j^μ. Thus, if sgn(ξ_i) = sgn(ξ_j), increasing w_ij reduces E(ξ). This leads to the following training scheme. 1. Present ξ^1, ..., ξ^p to the HN repeatedly (in random or cyclic order). 2. When ξ^μ is presented, update all weights by w_ij ← w_ij + λ ξ_i^μ ξ_j^μ (where λ is a small learning rate).

Training HNs with Hebb's rule, continued Training scheme (repeated): 1. Present ξ^1, ..., ξ^p to the HN repeatedly. 2. When ξ^μ is presented, update all weights by w_ij ← w_ij + λ ξ_i^μ ξ_j^μ. Effect: on each presentation of some ξ^μ, the energy of the corresponding state is lowered a bit. HOPE: we will eventually get a network that has energy troughs at the fundamental pattern states (and thus can serve as a "ball-rolling-downhill" memory).
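The training scheme above can be sketched directly. In the following sketch the sizes N = 50, p = 3 and the learning rate λ = 0.01 are illustrative choices of my own; the patterns are presented cyclically, and the result is compared against the one-shot Hebbian weights w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ, which it matches up to a scaling constant:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, lam = 50, 3, 0.01                        # illustrative sizes, learning rate
patterns = rng.choice([-1, 1], size=(p, N))    # random fundamental patterns

# Iterative Hebbian training: repeated cyclic presentation of the patterns
W = np.zeros((N, N))
for t in range(3000):
    xi = patterns[t % p]                # present pattern xi
    W += lam * np.outer(xi, xi)         # w_ij <- w_ij + lam * xi_i * xi_j
np.fill_diagonal(W, 0)                  # no self-connections

# One-shot Hebbian weights (the shortcut formula)
W1 = (patterns.T @ patterns) / N
np.fill_diagonal(W1, 0)

# Up to a scaling constant c, the two weight sets coincide
c = (3000 / p) * lam * N
print(np.allclose(W, c * W1))  # True
```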

A shortcut to the iterative training scheme Fact: the iterative learning scheme will converge (up to an irrelevant scaling constant) to a unique set of weights. Luck: there exists a simple one-shot computation for this asymptotic weight set, given by w_ij = (1/N) Σ_{μ=1,...,p} ξ_i^μ ξ_j^μ.

Summary so far Given p fundamental patterns ξ^1, ..., ξ^p (coded as N-dimensional {−1, +1} vectors), we can "store" them in an N-neuron HN by setting the weights according to w_ij = (1/N) Σ_{μ=1,...,p} ξ_i^μ ξ_j^μ. These weights are equivalent (up to irrelevant scaling) to the weights that we would ultimately get by using Hebbian learning, which at each step "tries" to reduce the energy of one of the states ξ^μ.
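Putting it together: the sketch below stores random patterns with the one-shot rule and retrieves one of them from a corrupted cue. The sizes N = 100, p = 5 and the 10-bit corruption are illustrative choices of my own (a load p/N = 0.05, which, as discussed on the later slides, is in the regime where the network works well):

```python
import numpy as np

rng = np.random.default_rng(7)
N, p = 100, 5                                  # load p/N = 0.05
patterns = rng.choice([-1, 1], size=(p, N))

# One-shot Hebbian weights: w_ij = (1/N) sum_mu xi_i^mu xi_j^mu
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)

def recall(W, S, steps=3000):
    """Iterated stochastic network update:
    pick a random neuron, update it deterministically."""
    S = S.copy()
    for _ in range(steps):
        i = rng.integers(len(S))
        S[i] = 1 if W[i] @ S >= 0 else -1
    return S

# Corrupt 10 bits of pattern 0 and let the network restore it
cue = patterns[0].copy()
cue[rng.choice(N, size=10, replace=False)] *= -1
out = recall(W, cue)
print((out == patterns[0]).mean())  # fraction of matching bits; usually 1.0 at this load
```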

Natural questions Do we really get local energy minima at the fundamental pattern states? How many patterns can we store? Do we get energy minima only at the fundamental pattern states (or do we also create other local minima, that is, "false" memories)? Answers to these questions are known, and the analysis behind all of this has made Hopfield Networks so famous.

Q: fundamental patterns = local minima? Test for this: present a fundamental pattern ξ^μ to the trained network. If (and only if) the state corresponds to a local minimum, then all bits ξ_i^μ of that state vector must be stable under the state update rule, that is, ξ_i^μ(t+1) = sgn( Σ_{j≠i} w_ij ξ_j^μ(t) ) = ξ_i^μ(t). Outcome: unfortunately, not always are all fundamental patterns stable. Analytic result: if p patterns are stored in a size-N network, the probability that a randomly picked bit in a randomly picked pattern will flip when updated is P(i unstable) = Φ(−√(N/p)). Note: Φ(a) is the area under the standard Gaussian density to the left of a.

More on stability... Situation: p patterns are stored in a size-N network. Then the probability that a pattern bit is unstable when updated is P(i unstable) = Φ(−√(N/p)). Consequence: the pattern-bit stability is governed by p/N, called the load of the HN. The more patterns stored relative to the network size, the higher the chance that bits of fundamental patterns flip. Beware of avalanches! The bit stability refers only to one isolated update within a presented pattern. But... if the entire network is "run" by iterated state-bit updates, flipped bits may pull further bits with them...
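The flip probability Φ(−√(N/p)) is easy to tabulate. A small sketch (N = 1000 is an arbitrary choice; Φ, the standard Gaussian cumulative distribution, is written via the error function):

```python
from math import erf, sqrt

def Phi(a):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

def p_unstable(N, p):
    """P(i unstable) = Phi(-sqrt(N/p)): probability that one pattern bit flips."""
    return Phi(-sqrt(N / p))

# The flip probability grows with the load p/N
N = 1000
for load in [0.01, 0.05, 0.138, 0.3]:
    print(load, p_unstable(N, round(load * N)))
```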

Avalanche stability Situation: p patterns are stored in a size-N network. Pick a pattern and present it to the network (that is, make it the starting state). Some of its bits may be unstable under bit update. Two things may happen under iterated stochastic network update: 1. While some bits may flip, their flips don't induce more and more flips: the dynamics stabilizes in a pattern not too different from the original one. 2. The unstable bits trigger an avalanche, just like in a nuclear chain reaction. Analysis: whether or not avalanches occur depends on the load p/N. In the limit of large N, at a load of p/N > 0.138 every pattern becomes avalanche-unstable. Figure: % of changed pattern bits under iterated network update, vs. load (there denoted by α). (D. J. Amit et al., Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett. 55(14), 1985)
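Whether flips avalanche can be checked empirically: store p random patterns, start the network exactly at one of them, let it run, and measure how many bits end up changed. A small simulation sketch (N = 300 and the sweep count are my own choices; at such modest N the transition is washed out, but the contrast between low and high load is already visible):

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_changed(N, p, sweeps=30):
    """Store p random patterns, start at pattern 0, relax the network,
    and return the fraction of bits that differ from pattern 0 at the end."""
    patterns = rng.choice([-1, 1], size=(p, N))
    W = (patterns.T @ patterns) / N        # one-shot Hebbian weights
    np.fill_diagonal(W, 0)
    S = patterns[0].copy()
    for _ in range(sweeps * N):            # iterated stochastic update
        i = rng.integers(N)
        S[i] = 1 if W[i] @ S >= 0 else -1
    return (S != patterns[0]).mean()

N = 300
for load in [0.05, 0.10, 0.25]:
    print(load, fraction_changed(N, round(load * N)))
```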

Avalanche stability: comments Load below 0.138: essentially stable memories. Load above 0.138: suddenly all memories are completely unstable. p/N = 0.138 is a critical value of the load parameter; when it is crossed, the behaviour of the system changes abruptly and dramatically. This sudden change of qualitative behaviour at critical values is common in nature; here we have a case of a phase transition.

Q: are there "false memories" (other local minima)? Let's train a 120-neuron HN on these 8 fundamental patterns. Which minimal-energy patterns will emerge in the HN? (How many local troughs will the energy landscape have?)

Some "false memories" Starting from 43,970 random states, the network ended in these final stable states (from Haykin 1998). There are three kinds of "false memories": 1. Inverted states (occur necessarily). 2. Mixtures of an odd number of fundamental patterns (example: a mixture of patterns 1, 4, and 9). 3. Spurious states (aka spin glass states), which are uncorrelated with the stored patterns.

False memories and network load For all loads p/N: stable spin glass states exist. For p/N > 0.138: spin glass states are the only stable ones. For 0 < p/N < 0.138: stable states close to the desired fundamental patterns exist. For 0 < p/N < 0.05: pattern-related stable states have lower energy than spin glass states. For 0.05 < p/N < 0.138: spin glass states dominate (some of them have lower energy than the pattern-related states). For 0 < p/N < 0.03: additional mixture states exist, with energies not quite as low as the pattern-related states. In sum, a HN works really well only for a load 0.03 < p/N < 0.05. Results on this slide reported after: MacKay 2003 (chapter 42).

Hopfield networks: Pros Simple model, can be mathematically analysed. Biologically not immediately unrealistic. Has strongly influenced how neuroscientists think about memory. Connections to other fields of computational physics (spin glass models). Robust against "brain damage" (not discussed here). Historically helped to salvage neural network research, and J. J. Hopfield was for several years touted as a Nobel prize candidate.

Hopfield networks: Cons Small memory capacity: can store only about 5-10% as many patterns as the network has neurons (but is this really "small"? Considering how many neurons we have...). All the "nice" results hold for uncorrelated fundamental patterns, an unrealistic assumption. Not technically useful. Unwanted spurious, inverted, and superimposed memories. Has strongly influenced how neuroscientists think about memory.

Variants, ramifications Continuous-time, continuous-state HNs (see the Hopfield chapters in Haykin's and MacKay's textbooks). Have also been used to tackle hard optimization problems (e.g., google "Traveling Salesman" + "Hopfield network"). Still, again and again, the subject of new studies. Biologically plausible connectivity patterns have been studied, as well as small-world connectivity patterns (Davey et al. 2005).