Hopfield networks. Lluís A. Belanche, Soft Computing Research Group


Lluís A. Belanche (belanche@lsi.upc.edu), Soft Computing Research Group, Dept. de Llenguatges i Sistemes Informàtics (Software Department), Universitat Politècnica de Catalunya, 2010-2011

Introduction

Content-addressable autoassociative memory: it deals with incomplete/erroneous keys, e.g. "Julia xxxxázar: Rxxxela".

[Figure: energy function H, with attractors (memories) and their basins of attraction]

A mechanical system tends to minimum-energy states (e.g. a pole). Symmetric Hopfield networks have an analogous behaviour: low-energy states ↔ attractors ↔ memories.

Introduction

A Hopfield network is a single-layer recurrent neural network, where the only layer has n neurons (as many as the input dimension).

Introduction

The data sample is a list of keys (a.k.a. patterns) L = {ξ_i : 1 ≤ i ≤ l}, with ξ_i ∈ {0, 1}^n, that we want to associate with themselves.

What is the meaning of autoassociativity? When a new pattern Ψ is shown to the network, the output is the key in L closest (in Hamming distance) to Ψ. Evidently, this can be achieved by direct programming; yet in hardware it is quite complex (and the Hopfield net does it quite differently).

We therefore set two goals:
1. Content-addressable autoassociative memory (not easy)
2. Tolerance to small errors, which will be corrected (rather hard)

Intuitive analysis

The state of the network is the vector of activations e of the n neurons. The state space is {0, 1}^n.

[Figure: the cube {0, 1}^3, with corners 000 ... 111 and the keys ξ_1, ξ_2, ξ_3 placed at three of them]

When the network evolves, it traverses the hypercube.

Formal analysis (I)

For convenience, we change from {0, 1} to {−1, +1}, by s = 2e − 1.

Dynamics of a neuron:

    s_i := sgn( Σ_{j=1}^{n} w_{ij} s_j − θ_i ),   with sgn(z) = +1 if z ≥ 0 and −1 if z < 0

Let us simplify the analysis by using θ_i = 0, and uncorrelated keys ξ_i (drawn from a uniform distribution).
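As a minimal sketch of this update rule (Python/NumPy; the sgn(0) = +1 convention matches the definition above, and the function names are illustrative):

```python
import numpy as np

def sgn(z):
    # sign convention from the slides: sgn(0) = +1
    return np.where(z >= 0, 1, -1)

def update_neuron(W, s, i, theta=0.0):
    # one asynchronous update of neuron i: s_i := sgn( sum_j w_ij s_j - theta_i )
    s[i] = 1 if (W[i] @ s - theta) >= 0 else -1
    return s
```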

Formal analysis (II)

How are neurons chosen to update their activations?
1. Synchronously: all at once
2. Asynchronously: following a sequential order or a fair probability distribution

The stored memories do not change with this choice; the final attractor (the recovered memory) may. We will assume asynchronous updating.

When does the process stop? When there are no more changes in the network state (we have reached a stable state).
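A sketch of asynchronous recall under these assumptions (fair random order in each sweep, stop when a full sweep changes nothing; `recall` is an illustrative name, not from any library):

```python
import numpy as np

def recall(W, s0, max_sweeps=100, rng=None):
    """Asynchronously update neurons until a stable state (an attractor) is reached."""
    rng = np.random.default_rng() if rng is None else rng
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):          # fair, random order in each sweep
            new_si = 1 if W[i] @ s >= 0 else -1    # sgn with theta_i = 0
            if new_si != s[i]:
                s[i] = new_si
                changed = True
        if not changed:                            # no change in a full sweep: stable state
            return s
    return s
```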

Formal analysis (III)

Procedure:
1. Give a formula for setting the weights w_ij so as to store the keys ξ_i ∈ L
2. Sanity check that the keys ξ_i ∈ L are stable (they are autoassociated with themselves)
3. Check that small perturbations of the keys ξ_i ∈ L are corrected (they are associated with the closest pattern in L)

Formal analysis (IV)

Case l = 1 (only one pattern ξ):

Present ξ to the net: s_i := ξ_i, 1 ≤ i ≤ n

Stability: s_i := sgn( Σ_{j=1}^{n} w_{ij} s_j ) = sgn( Σ_{j=1}^{n} w_{ij} ξ_j ) = ... should be ... = ξ_i

One way to get this is w_ij = α ξ_i ξ_j, with α > 0, α ∈ ℝ (called Hebbian learning):

    sgn( Σ_{j=1}^{n} w_{ij} ξ_j ) = sgn( Σ_{j=1}^{n} α ξ_i ξ_j ξ_j ) = sgn( α ξ_i Σ_{j=1}^{n} 1 ) = sgn( α n ξ_i ) = sgn(ξ_i) = ξ_i

(using ξ_j ξ_j = 1, since ξ_j ∈ {−1, +1}).

Formal analysis (V)

Take α = 1/n. This has a normalizing effect: ‖w_i‖_2 = 1/√n.

Therefore W_{n×n} = (w_ij) = (1/n) ξ ξ^T, i.e. w_ij = (1/n) ξ_i ξ_j.

If the perturbation consists of b flipped bits, with b ≤ (n − 1)/2, then the sign of Σ_{j=1}^{n} w_{ij} s_j does not change! ⟹ if the number of correct bits exceeds the number of incorrect bits, the network is able to correct the erroneous bits.
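A quick numerical check of this claim, as a sketch under the stated assumptions (the values of n and b are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 101, 40                          # b <= (n - 1)/2 flipped bits
xi = rng.choice([-1, 1], size=n)        # one random key
W = np.outer(xi, xi) / n                # Hebbian weights with alpha = 1/n

s = xi.copy()
flip = rng.choice(n, size=b, replace=False)
s[flip] *= -1                           # perturb b bits

s_new = np.where(W @ s >= 0, 1, -1)     # one application of sgn to every neuron
assert np.array_equal(s_new, xi)        # the key is recovered exactly
```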

Formal analysis (VI)

Case l ≥ 1. Superposition:

    w_ij = (1/n) Σ_{k=1}^{l} ξ_{ki} ξ_{kj}    and    W_{n×n} = (1/n) Σ_{k=1}^{l} ξ_k ξ_k^T

Defining E_{n×l} = [ξ_1, ..., ξ_l], we get W_{n×n} = (1/n) E E^T.

Note that W_{n×n} is a symmetric matrix.
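In matrix form this is essentially one line of NumPy (a sketch; whether to zero the diagonal is discussed later, under spurious states):

```python
import numpy as np

def hebbian_weights(patterns):
    """patterns: array of shape (l, n) with entries in {-1, +1}."""
    E = np.asarray(patterns).T           # E is n x l; its columns are the keys
    n = E.shape[0]
    return (E @ E.T) / n                 # W = (1/n) E E^T, a symmetric n x n matrix
```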

Formal analysis (VII)

Stability of a pattern ξ_v ∈ L. Initialize the net to s_i := ξ_{vi}, 1 ≤ i ≤ n.

Stability: s_i := sgn( Σ_{j=1}^{n} w_{ij} ξ_{vj} ) = sgn( h_{vi} ), where

    h_{vi} = Σ_{j=1}^{n} w_{ij} ξ_{vj}
           = (1/n) Σ_{j=1}^{n} Σ_{k=1}^{l} ξ_{ki} ξ_{kj} ξ_{vj}
           = (1/n) Σ_{j=1}^{n} ( Σ_{k=1, k≠v}^{l} ξ_{ki} ξ_{kj} ξ_{vj} + ξ_{vi} ξ_{vj} ξ_{vj} )      [ξ_{vj} ξ_{vj} = +1]
           = ξ_{vi} + (1/n) Σ_{j=1}^{n} Σ_{k=1, k≠v}^{l} ξ_{ki} ξ_{kj} ξ_{vj}
           = ξ_{vi} + crosstalk(v, i)

If |crosstalk(v, i)| < 1 then sgn(h_{vi}) = sgn( ξ_{vi} + crosstalk(v, i) ) = sgn(ξ_{vi}) = ξ_{vi}.
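The crosstalk term can be inspected numerically with a short sketch (the function name is illustrative; it evaluates exactly the double sum above):

```python
import numpy as np

def crosstalk(patterns, v, i):
    """Crosstalk on bit i of pattern v for Hebbian weights (patterns: l x n, entries +/-1)."""
    X = np.asarray(patterns)
    l, n = X.shape
    others = [k for k in range(l) if k != v]            # all stored keys except v
    return sum(X[k, i] * (X[k] @ X[v]) for k in others) / n
```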

Formal analysis (and VIII)

Analogously to the case l = 1, if the number of correct bits of a pattern ξ_v ∈ L exceeds the number of incorrect bits, the network is able to correct the bits in error (the pattern ξ_v ∈ L is indeed a memory).

When is |crosstalk(v, i)| < 1? If l ≪ n nothing goes wrong, but as l gets closer to n the crosstalk term becomes quite strong compared to ξ_v: there is no stability.

The million-euro question: given a network of n neurons, what is the maximum value of l? What is the capacity of the network?

Capacity analysis

For random uncorrelated patterns (for convenience):

    l_max = n / ( 2 ln n + ln ln n )   for perfect recall
    l_max ≈ 0.138 n                    for small errors (≈ 1.6 %)

Realistic patterns will not be random, though some coding can make them more so.

An alternative setting is to have negatively correlated patterns: ξ_v · ξ_u ≤ 0 for v ≠ u. This condition is equivalent to stating that the number of different bits is at least the number of equal bits.
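A small Monte Carlo sketch of the ≈ 0.138 n regime (illustrative parameters; it only checks one-step stability of the stored keys, not the full basins of attraction):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
for l in (10, 20, 28, 40):                      # 0.138 * 200 is roughly 28
    X = rng.choice([-1, 1], size=(l, n))        # random uncorrelated keys
    W = (X.T @ X) / n
    np.fill_diagonal(W, 0.0)
    S = np.where(X @ W >= 0, 1, -1)             # one update of every bit of every key
    print(f"l = {l:3d}   stored-bit error rate = {np.mean(S != X):.3f}")
```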

Spurious states

When a pattern ξ is stored, the pattern −ξ is also stored. In consequence, all initial configurations of ξ where the majority of bits are incorrect will end up in −ξ! Hardly surprising: from the point of view of −ξ, there are more correct bits than incorrect bits.

The network can be made more stable (fewer spurious states) if we set w_ii = 0 for all i:

    W_{n×n} = (1/n) Σ_{k=1}^{l} ξ_k ξ_k^T − (l/n) I_{n×n} = (1/n) ( E E^T − l I_{n×n} )

Example

ξ_1 = (+1, −1, +1),  ξ_2 = (−1, +1, −1),  E_{3×2} = [ξ_1 ξ_2]

    ξ_1 ξ_1^T = ξ_2 ξ_2^T = [ +1 −1 +1 ; −1 +1 −1 ; +1 −1 +1 ]

    W = (1/3) ( E E^T − 2 I_{3×3} ) = (1/3) [ 0 −2 +2 ; −2 0 −2 ; +2 −2 0 ]

The states at Hamming distance 1 from ξ_1, namely (+1, +1, +1), (−1, −1, +1) and (+1, −1, −1), converge to ξ_1 = (+1, −1, +1); the states at distance 1 from ξ_2, namely (+1, +1, −1), (−1, −1, −1) and (−1, +1, +1), converge to ξ_2 = (−1, +1, −1).

The network is able to correct 1 erroneous bit.
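A sketch that checks this example exhaustively (it resolves ties h_i = 0 by keeping the neuron's current state, which anticipates the "activation with memory" rule discussed further below; names are illustrative):

```python
import numpy as np

xi1 = np.array([+1, -1, +1])
xi2 = np.array([-1, +1, -1])
W = (np.outer(xi1, xi1) + np.outer(xi2, xi2)) / 3.0
np.fill_diagonal(W, 0.0)                  # W = (1/3)(E E^T - 2 I)

def recall_keep_ties(W, s, sweeps=10):
    s = s.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            h = W[i] @ s
            if h != 0:                    # a tie (h = 0) keeps the current state
                s[i] = 1 if h > 0 else -1
    return s

for xi in (xi1, xi2):
    for i in range(3):                    # every 1-bit corruption of each key
        s = xi.copy()
        s[i] *= -1
        assert np.array_equal(recall_keep_ties(W, s), xi)
print("all 1-bit errors are corrected")
```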

Practical applications (I)

[Figure, from the UPV course "Redes Neuronales Artificiales" (2001-2002): a Hopfield network of 120 nodes (12×10) stores a set of eight digit patterns; the output sequence over iterations 0-7 recovers the digit "3" from a distorted input.]

A Hopfield network that stores and recovers several digits.

Practical applications (II)

A Hopfield network that stores fighter aircraft shapes is able to reconstruct them. The network operates on the video images supplied by a camera on board a USAF bomber.

The energy function (I)

If the weights are symmetric, there exists a so-called energy function:

    H(s) = − Σ_{i=1}^{n} Σ_{j>i} w_{ij} s_i s_j

which is a function of the state of the system. The central property of H is that it always decreases (or remains constant) as the system evolves.

Therefore the attractors (the stored memories) have to be the local minima of the energy function.
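A sketch that computes this energy and checks that it never increases along an asynchronous trajectory (ties keep the current state, as above; names are illustrative):

```python
import numpy as np

def energy(W, s):
    # H(s) = - sum_{i<j} w_ij s_i s_j   (equals -1/2 s^T W s when diag(W) = 0)
    return -0.5 * s @ W @ s

def trace_energy(W, s0, sweeps=5, seed=2):
    rng = np.random.default_rng(seed)
    s, hs = s0.copy(), []
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            h = W[i] @ s
            if h != 0:
                s[i] = 1 if h > 0 else -1
            hs.append(energy(W, s))
    assert all(b <= a + 1e-12 for a, b in zip(hs, hs[1:]))   # H never increases
    return s, hs
```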

The energy function (II)

Proposition 1: a change of state in one neuron i yields a change ΔH < 0.

Let h_i = Σ_{j=1}^{n} w_{ij} s_j. If neuron i changes, it is because s_i ≠ sgn(h_i), and its new state is −s_i. Therefore

    ΔH = −(−s_i) h_i − ( −s_i h_i ) = 2 s_i h_i < 0,

since s_i and h_i have opposite signs.

Proposition 2: H(s) ≥ − Σ_{i=1}^{n} Σ_{j=1}^{n} |w_{ij}| (the energy function is lower-bounded).

The energy function (III)

The energy function H is also useful to derive the proposed weights.

Consider the case of only one pattern ξ: we want H to be minimum when the correlation between the current state s and ξ is maximum, equal to ⟨ξ, ξ⟩ = ‖ξ‖² = n. Therefore we choose

    H(s) = K − (1/n) ⟨ξ, s⟩²,   where K > 0 is a suitable constant

For the many-pattern case, we can try to make each of the ξ_v into a local minimum of H:

    H(s) = K − (1/n) Σ_{v=1}^{l} ⟨ξ_v, s⟩²
         = K − (1/n) Σ_{v=1}^{l} ( Σ_{i=1}^{n} ξ_{vi} s_i ) ( Σ_{j=1}^{n} ξ_{vj} s_j )
         = K − Σ_{i=1}^{n} Σ_{j=1}^{n} ( (1/n) Σ_{v=1}^{l} ξ_{vi} ξ_{vj} ) s_i s_j
         = K − Σ_{i=1}^{n} Σ_{j=1}^{n} w_{ij} s_i s_j

Since W is symmetric, Σ_{i} Σ_{j} w_{ij} s_i s_j = 2 Σ_{i} Σ_{j>i} w_{ij} s_i s_j + constant, so up to constants this is the energy H(s) = − Σ_{i} Σ_{j>i} w_{ij} s_i s_j of the previous slides.

The energy function (and IV)

Exercises:
1. In the previous example,
   a) check that H(s) = (2/3) ( s_1 s_2 − s_1 s_3 + s_2 s_3 )
   b) show that H decreases as the network evolves towards the stored patterns
2. Prove Proposition 2.
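A brute-force sketch that verifies exercise 1a over all 8 states of the example network (symbols as in the example above):

```python
import numpy as np
from itertools import product

xi1 = np.array([+1, -1, +1])
W = 2 * np.outer(xi1, xi1) / 3.0        # xi_2 = -xi_1 contributes the same outer product
np.fill_diagonal(W, 0.0)

for s in product([-1, +1], repeat=3):
    s = np.array(s)
    H_def = -sum(W[i, j] * s[i] * s[j] for i in range(3) for j in range(i + 1, 3))
    H_formula = (2 / 3) * (s[0] * s[1] - s[0] * s[2] + s[1] * s[2])
    assert np.isclose(H_def, H_formula)
print("formula for H(s) verified on all 8 states")
```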

Spurious states (continued)

We have not shown that the explicitly stored memories are the only attractors of the system.

We have seen that the reversed patterns −ξ are also stored (they have the same energy as ξ).

There are also stable mixture states, linear combinations of an odd number of patterns, e.g.:

    ξ_i^(mix) = sgn( Σ_{k=1}^{α} ± ξ_{v_k, i} ),   1 ≤ i ≤ n,   α odd

Dynamics (continued)

Let us change the dynamics of a neuron by using an activation with memory:

    s_i(t + 1) = +1       if h_i(t + 1) > 0
                 s_i(t)   if h_i(t + 1) = 0
                 −1       if h_i(t + 1) < 0

where h_i(t + 1) = Σ_{j=1}^{n} w_{ij} s_j(t).
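In code this is a one-line change with respect to the earlier recall sketch (illustrative name):

```python
def update_with_memory(W, s, i):
    # activation with memory: if the field is exactly zero, keep the previous state
    h = W[i] @ s
    if h > 0:
        s[i] = 1
    elif h < 0:
        s[i] = -1
    return s                             # h == 0: s[i] is left unchanged
```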

Application to optimization problems

Problem: find the optimum (or something close to it) of a scalar function subject to a set of restrictions, e.g. the traveling-salesman problem (TSP) or graph bipartitioning (GBP); often we deal with NP-complete problems.

TSP(n): tour n cities with the minimum total distance. In general there are n!/(2n) distinct valid tours; TSP(60) has ≈ 6.93 × 10^79 of them.

GBP(n): given a graph with an even number n of nodes, partition it into two subgraphs of n/2 nodes each such that the number of crossing edges is minimum.

Idea: set up the problem as a function to be minimized, choosing the weights so that the obtained function is the energy function of a Hopfield network.

Application to graph bipartitioning

Representation: n graph nodes = n neurons, with

    S_i = +1 if node i is in the left part, and −1 if node i is in the right part

Connectivity matrix: c_ij = 1 indicates a connection between nodes i and j.

Application to graph bipartitioning

    H(S) = − Σ_{i=1}^{n} Σ_{j>i} c_{ij} S_i S_j   +   (α/2) ( Σ_{i=1}^{n} S_i )²

(the first term counts the connections between the two parts; the second one enforces a balanced bipartition)

Since ( Σ_{i} S_i )² = n + 2 Σ_{i} Σ_{j>i} S_i S_j, this is, up to a constant,

    H(S) = − Σ_{i=1}^{n} Σ_{j>i} ( c_{ij} − α ) S_i S_j ≡ − Σ_{i=1}^{n} Σ_{j>i} w_{ij} S_i S_j + Σ_{i=1}^{n} θ_i S_i   ⟹   w_{ij} = c_{ij} − α,  θ_i = 0

(general idea: read w_ij off as the coefficient of S_i S_j in the energy function, with the sign convention above)
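A minimal sketch of this mapping (the random graph, the value of α and the number of sweeps are illustrative; ties keep the current state as before):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, alpha = 20, 0.2, 0.5
C = np.triu((rng.random((n, n)) < p).astype(float), 1)
C = C + C.T                                   # random undirected graph, c_ij in {0, 1}

W = C - alpha                                 # w_ij = c_ij - alpha
np.fill_diagonal(W, 0.0)

S = rng.choice([-1.0, 1.0], size=n)           # random initial bipartition
for _ in range(50):
    for i in rng.permutation(n):
        h = W[i] @ S
        if h != 0:
            S[i] = 1.0 if h > 0 else -1.0

left, right = np.where(S > 0)[0], np.where(S < 0)[0]
cut = C[np.ix_(left, right)].sum()
print(f"parts of sizes {len(left)}/{len(right)}, crossing edges = {cut:.0f}")
```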

Application to the traveling-salesman problem

Representation: 1 city → n neurons, coding for its position in the tour; hence n cities → n² neurons. S_{n×n} = (S_ij) is the network state; D_{n×n} (distances between cities) is given; S_ij = +1 if city i is visited in position j, and −1 otherwise.

Restrictions on S:
1. A city cannot be visited more than once ⟹ every row i must have only one +1:
       R_1(S) = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k>j} S_{ij} S_{ik}
2. Two cities cannot be visited at the same time (the salesman has no gift of ubiquity) ⟹ every column j must have only one +1: R_1(S^t)
3. Every city must be visited ⟹ there must be n +1s overall:
       R_2(S) = ( Σ_{i=1}^{n} Σ_{j=1}^{n} S_{ij} − n )²

Application to the traveling-salesman problem

4. The tour has to be as short as possible ⟹ total sum of distances in the tour:
       R_3(S, D) = Σ_{k=1}^{n} Σ_{i=1}^{n} Σ_{j>i} D_{ij} S_{ik} ( S_{j,k−1} + S_{j,k+1} )

Writing the penalties as a Hopfield energy,

    H(S) = − Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k>i} Σ_{l>j} w_{ij,kl} S_{ij} S_{kl} + Σ_{i=1}^{n} Σ_{j=1}^{n} θ_{ij} S_{ij} ≡ α R_1(S) + β R_1(S^t) + γ R_2(S) + ε R_3(S, D)

    ⟹ w_{ij,kl} = −α δ_{ik} (1 − δ_{jl}) − β δ_{jl} (1 − δ_{ik}) − γ − ε D_{ik} ( δ_{j,l+1} + δ_{j,l−1} )   and   θ_{ij} = −γ n

This is a continuous Hopfield network: S_ij ∈ (−1, 1), since S_ij := tanh( χ ( h_ij − θ_ij ) )

Problem: set up good values for α, β, γ, ε > 0 and χ > 0.
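A compact sketch of this construction (the penalty values are illustrative and untuned; tour positions are taken modulo n, and θ_ij = −γn follows the reconstruction above). As the slide notes, finding good values for α, β, γ, ε and χ is the hard part; this sketch makes no attempt to do so:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
cities = rng.random((n, 2))
D = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1)   # distance matrix

alpha, beta, gamma, eps, chi = 2.0, 2.0, 1.0, 1.0, 1.0           # illustrative, untuned
d = np.eye(n)                                                    # Kronecker delta

# w[i, j, k, l]: weight between unit (city i, position j) and unit (city k, position l)
w = np.zeros((n, n, n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                w[i, j, k, l] = (-alpha * d[i, k] * (1 - d[j, l])
                                 - beta * d[j, l] * (1 - d[i, k])
                                 - gamma
                                 - eps * D[i, k] * (d[j, (l + 1) % n] + d[j, (l - 1) % n]))
theta = -gamma * n

S = rng.uniform(-0.1, 0.1, size=(n, n))       # continuous states in (-1, 1)
for _ in range(200):
    for i in rng.permutation(n):
        for j in rng.permutation(n):
            h = np.tensordot(w[i, j], S, axes=2)      # sum_{k,l} w_{ij,kl} S_{kl}
            S[i, j] = np.tanh(chi * (h - theta))

tour = np.argmax(S, axis=0)                   # city chosen at each position (may be invalid)
print("candidate tour (city per position):", tour)
```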

Application to the traveling-salesman problem

Example for n = 5: 120 valid tours (12 distinct ones), and a network of 25 neurons with 2^25 potential attractors; the actual number of attractors is 1,740 (among them, the 120 valid tours). 1,740 − 120 = 1,620 spurious attractors (invalid tours).

The valid tours correspond to the states of lowest energy (correct design). But... a random initialization will end up in an invalid tour with probability 0.966, i.e. roughly 30 times more often than in a valid one.

Moreover, we aim for the best tour! A possible workaround is to use stochastic units with simulated annealing.