Part 3A: The Hopfield Network
III. Recurrent Neural Networks
A. The Hopfield Network

Typical Artificial Neuron
- inputs pass through connection weights into a linear combination: the net input (local field), compared against a threshold
- an activation function maps the net input to the output

Equations
- Net input: $h_i = \left( \sum_{j=1}^{n} w_{ij} s_j \right) - \theta_i$, or in vector form $\mathbf{h} = \mathbf{W}\mathbf{s} - \boldsymbol{\theta}$
- New neural state: $s'_i = \sigma(h_i)$, or in vector form $\mathbf{s}' = \sigma(\mathbf{h})$

Hopfield Network
- symmetric weights: $w_{ij} = w_{ji}$
- no self-action: $w_{ii} = 0$
- zero threshold: $\theta = 0$
- bipolar states: $s_i \in \{-1, +1\}$
- discontinuous bipolar activation function:
  $\sigma(h) = \operatorname{sgn}(h) = \begin{cases} -1, & h < 0 \\ +1, & h > 0 \end{cases}$
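As a concrete illustration of these update equations, here is a minimal NumPy sketch (the function names `net_input` and `activation` are mine, not the slides'); it maps $h = 0$ to $+1$ for now, one of the conventions discussed on the next slide.

```python
import numpy as np

def net_input(W, s, theta=0.0):
    # Local field: h = W s - theta  (h_i = sum_j w_ij s_j - theta_i)
    return W @ s - theta

def activation(h):
    # Discontinuous bipolar activation sigma(h) = sgn(h);
    # h = 0 is mapped to +1 here (one convention; see next slide)
    return np.where(h >= 0, 1, -1)

# Three neurons: symmetric weights, zero diagonal, zero threshold
W = np.array([[ 0.,  1., -1.],
              [ 1.,  0.,  1.],
              [-1.,  1.,  0.]])
s = np.array([1, -1, 1])
print(activation(net_input(W, s)))   # new state s' = sigma(W s)
```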
What to Do About h = 0?
There are several options:
- $\sigma(0) = +1$
- $\sigma(0) = -1$
- $\sigma(0) = -1$ or $+1$ with equal probability
- $h_i = 0$ ⇒ no state change ($s'_i = s_i$)
Not much difference, but be consistent. The last option is slightly preferable, since it is symmetric.

Positive Coupling
- positive sense (sign)
- large strength

Negative Coupling
- negative sense (sign)
- large strength

Weak Coupling
- either sense (sign)
- little strength

State = −1 & Local Field < 0  (h < 0)

State = −1 & Local Field > 0  (h > 0)
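A sketch of the four conventions as one parameterized activation (the rule names are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(h, s_old, rule="keep"):
    """Bipolar activation with an explicit convention for h = 0."""
    if h > 0:
        return 1
    if h < 0:
        return -1
    # h == 0: apply the chosen convention
    if rule == "plus":      # sigma(0) = +1
        return 1
    if rule == "minus":     # sigma(0) = -1
        return -1
    if rule == "random":    # sigma(0) = -1 or +1 with equal probability
        return int(rng.choice([-1, 1]))
    return s_old            # "keep": no state change (the symmetric option)

print([sigma(0.0, s_old=-1, rule=r) for r in ("plus", "minus", "random", "keep")])
```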
State Reverses  (h > 0)

State = +1 & Local Field > 0  (h > 0)

State = +1 & Local Field < 0  (h < 0)

State Reverses  (h < 0)

NetLogo Demonstration of Hopfield State Updating
Run Hopfield-update.nlogo

Hopfield Net as Soft Constraint Satisfaction System
- states of neurons act as yes/no decisions
- weights represent soft constraints between decisions
  - hard constraints must be respected
  - soft constraints have degrees of importance
- decisions change to better respect the constraints (as sketched below)
- Is there an optimal set of decisions that best respects all constraints?
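A rough sketch of this decision-updating dynamic (assuming NumPy; the helper name `run_until_stable` is illustrative): each neuron is repeatedly nudged to agree with its local field until no decision changes.

```python
import numpy as np

def run_until_stable(W, s, rng):
    """Asynchronous updating: visit neurons in random order, repeatedly,
    until a full sweep makes no change (each flip better respects the
    constraints encoded in W)."""
    s = s.copy()
    while True:
        changed = False
        for i in rng.permutation(len(s)):
            h = W[i] @ s                    # local field of neuron i
            if h != 0 and np.sign(h) != s[i]:
                s[i] = int(np.sign(h))      # flip to agree with the field
                changed = True
        if not changed:
            return s

rng = np.random.default_rng(0)
W = np.array([[ 0.,  1., -1.],
              [ 1.,  0.,  1.],
              [-1.,  1.,  0.]])
print(run_until_stable(W, np.array([1, -1, 1]), rng))
```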
Convergence
- Does such a system converge to a stable state?
- Under what conditions does it converge?
- There is a sense in which each step relaxes the "tension" in the system
- But could a relaxation of one neuron lead to greater tension in other places?

Demonstration of Hopfield Net Dynamics I
Run Hopfield-dynamics.nlogo

Quantifying "Tension"
- If $w_{ij} > 0$, then $s_i$ and $s_j$ want to have the same sign ($s_i s_j = +1$)
- If $w_{ij} < 0$, then $s_i$ and $s_j$ want to have opposite signs ($s_i s_j = -1$)
- If $w_{ij} = 0$, their signs are independent
- Strength of interaction varies with $|w_{ij}|$
- Define the disharmony ("tension") between neurons $i$ and $j$:
  $D_{ij} = -s_i w_{ij} s_j$
- $D_{ij} < 0$ ⇒ they are "happy"; $D_{ij} > 0$ ⇒ they are "unhappy"

Total Energy of System
The energy of the system is the total tension (disharmony) in it:
$E\{\mathbf{s}\} = \sum_{i<j} D_{ij} = -\sum_{i<j} s_i w_{ij} s_j = -\tfrac{1}{2} \sum_i \sum_{j \ne i} s_i w_{ij} s_j = -\tfrac{1}{2}\, \mathbf{s}^{\mathrm{T}} \mathbf{W} \mathbf{s}$

Review of Some Vector Notation
- column vectors: $\mathbf{x} = (x_1, \ldots, x_n)^{\mathrm{T}}$
- inner product: $\mathbf{x}^{\mathrm{T}} \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x} \cdot \mathbf{y}$
- outer product: $\mathbf{x} \mathbf{y}^{\mathrm{T}}$, the $m \times n$ matrix with entries $x_i y_j$
- quadratic form: $\mathbf{x}^{\mathrm{T}} \mathbf{M} \mathbf{y} = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i M_{ij} y_j$

Another View of Energy
The energy measures the degree to which the neurons' states are in disharmony with their local fields (i.e., of opposite sign):
$E\{\mathbf{s}\} = -\tfrac{1}{2} \sum_i \sum_j s_i w_{ij} s_j = -\tfrac{1}{2} \sum_i s_i h_i = -\tfrac{1}{2}\, \mathbf{s}^{\mathrm{T}} \mathbf{h}$
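A small sketch of these two equivalent energy computations (assuming NumPy and a symmetric weight matrix with zero diagonal; the function names are mine):

```python
import numpy as np

def energy(W, s):
    # E{s} = -1/2 s^T W s  (equivalently -1/2 s^T h, with h = W s)
    return -0.5 * s @ W @ s

def total_disharmony(W, s):
    # Sum of D_ij = -s_i w_ij s_j over pairs i < j; equals energy(W, s)
    # when W is symmetric with zero diagonal
    D = -np.outer(s, s) * W
    return np.triu(D, k=1).sum()

W = np.array([[ 0.,  1., -1.],
              [ 1.,  0.,  1.],
              [-1.,  1.,  0.]])
s = np.array([1, -1, 1])
print(energy(W, s), total_disharmony(W, s))   # equal values
```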
Do State Changes Decrease Energy?
Suppose that neuron k changes state. The change of energy is:
$\Delta E = E\{\mathbf{s}'\} - E\{\mathbf{s}\} = -\sum_{ij} s'_i w_{ij} s'_j + \sum_{ij} s_i w_{ij} s_j = -s'_k \sum_{j \ne k} w_{kj} s_j + s_k \sum_{j \ne k} w_{kj} s_j = -(s'_k - s_k) \sum_{j \ne k} w_{kj} s_j = -\Delta s_k\, h_k < 0$

Energy Does Not Increase
- In each step in which a neuron is considered for update: $E\{\mathbf{s}(t+1)\} - E\{\mathbf{s}(t)\} \le 0$
- Energy cannot increase
- Energy decreases if any neuron changes
- Must it stop?

Proof of Convergence in Finite Time
- There is a minimum possible energy:
  - the number of possible states $\mathbf{s} \in \{-1, +1\}^n$ is finite
  - hence $E_{\min} = \min \{ E(\mathbf{s}) \mid \mathbf{s} \in \{\pm 1\}^n \}$ exists
- A minimum must be reached in a finite number of steps, because there are only a finite number of states

Conclusion
- If we do asynchronous updating, the Hopfield net must reach a stable, minimum-energy state in a finite number of updates
- This does not imply that it is a global minimum

Lyapunov Functions
- A way of showing the convergence of discrete- or continuous-time dynamical systems
- For a discrete-time system, if:
  - there is a Lyapunov function E (the "energy" of the state),
  - E is bounded below ($E\{\mathbf{s}\} > E_{\min}$), and
  - $\Delta E \le (\Delta E)_{\max} < 0$ (energy decreases by at least a certain minimum amount each step),
  then the system will converge in finite time
- Problem: finding a suitable Lyapunov function

Example Limit Cycle with Synchronous Updating
Two neurons coupled by w > 0 can oscillate forever under synchronous updating, as sketched below.
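A minimal sketch of that limit cycle (assuming NumPy; this reproduces the standard two-neuron example rather than the slide's exact figure):

```python
import numpy as np

# Two neurons, positive coupling, synchronous updating
W = np.array([[0., 1.],
              [1., 0.]])
s = np.array([1, -1])

for t in range(6):
    print(t, s)
    s = np.where(W @ s >= 0, 1, -1)   # all neurons updated at once

# The state oscillates (+1,-1) -> (-1,+1) -> (+1,-1) -> ... : a period-2
# limit cycle. Synchronous updating need not converge, even though the
# same weights do converge under asynchronous updating.
```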
The Hopfield Energy Function is Even
- A function f is odd if $f(-x) = -f(x)$, for all x
- A function f is even if $f(-x) = f(x)$, for all x
- Observe: $E\{-\mathbf{s}\} = -\tfrac{1}{2} (-\mathbf{s})^{\mathrm{T}} \mathbf{W} (-\mathbf{s}) = -\tfrac{1}{2}\, \mathbf{s}^{\mathrm{T}} \mathbf{W} \mathbf{s} = E\{\mathbf{s}\}$

Conceptual Picture of Descent on Energy Surface
[Figure from Solé & Goodwin]

Energy Surface
[Figure from Haykin, Neural Networks]

Energy Surface + Flow Lines
[Figure from Haykin, Neural Networks]

Flow Lines: Basins of Attraction
[Figure from Haykin, Neural Networks]

Bipolar State Space
[Figure]
Basins in Bipolar State Space
- energy-decreasing paths through state space
[Figure from Solé & Goodwin]

Storing Memories as Attractors
[Figure]

Demonstration of Hopfield Net Dynamics II
Run initialized Hopfield.nlogo

Pattern Restoration
[Figure sequence from Arbib 1995]
Pattern Restoration (continued)
[Figure sequence from Arbib 1995]

Pattern Completion
[Figure sequence from Arbib 1995]
Pattern Completion (continued)
[Figure sequence from Arbib 1995]

Pattern Association
[Figure sequence from Arbib 1995]
Applications of Hopfield Memory
- pattern restoration
- pattern completion
- pattern generalization
- pattern association

Hopfield Net for Optimization and for Associative Memory
- For optimization:
  - we know the weights (couplings)
  - we want to know the minima (solutions)
- For associative memory:
  - we know the minima (retrieval states)
  - we want to know the weights

Hebb's Rule
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
— Donald Hebb (The Organization of Behavior, 1949, p. 62)
"Neurons that fire together, wire together."

Hebbian Learning: Imprinted Pattern
[Figure]

Hebbian Learning: Partial Pattern Reconstruction
[Figure]

Mathematical Model of Hebbian Learning for One Pattern
Let
$W_{ij} = \begin{cases} x_i x_j, & \text{if } i \ne j \\ 0, & \text{if } i = j \end{cases}$
Since $x_i x_i = x_i^2 = 1$, $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathrm{T}} - \mathbf{I}$.
For simplicity, we will include self-coupling: $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathrm{T}}$.
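To make the rule and the "restoration" application concrete, here is a hedged NumPy sketch (the pattern size, flip fraction, and sweep count are illustrative choices of mine, not from the slides): imprint one random bipolar pattern, corrupt a tenth of its bits, and relax asynchronously.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.choice([-1, 1], size=n)          # pattern to imprint

W = np.outer(x, x).astype(float)          # Hebb: W_ij = x_i x_j
np.fill_diagonal(W, 0.0)                  # no self-coupling: W = x x^T - I

# Pattern restoration: flip ~10% of the bits, then relax asynchronously
s = x.copy()
flip = rng.choice(n, size=n // 10, replace=False)
s[flip] *= -1
for _ in range(5):                        # a few random-order sweeps
    for i in rng.permutation(n):
        h = W[i] @ s
        if h != 0:                        # h = 0 => keep current state
            s[i] = 1 if h > 0 else -1
print("restored:", np.array_equal(s, x))
```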
A Single Imprinted Pattern is a Stable State
- Suppose $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathrm{T}}$
- Then $\mathbf{h} = \mathbf{W}\mathbf{x} = \mathbf{x}\mathbf{x}^{\mathrm{T}}\mathbf{x} = n\mathbf{x}$, since $\mathbf{x}^{\mathrm{T}}\mathbf{x} = \sum_{i=1}^{n} x_i^2 = n$ (because $x_i = \pm 1$)
- Hence, if the initial state is $\mathbf{s} = \mathbf{x}$, then the new state is $\mathbf{s}' = \operatorname{sgn}(n\mathbf{x}) = \mathbf{x}$
- For this reason, scale $\mathbf{W}$ by $1/n$
- There may be other stable states (e.g., $-\mathbf{x}$)

Questions
- How big is the basin of attraction of the imprinted pattern?
- How many patterns can be imprinted?
- Are there unneeded spurious stable states?
- These issues will be addressed in the context of multiple imprinted patterns

Imprinting Multiple Patterns
- Let $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$ be the patterns to be imprinted
- Define the sum-of-outer-products matrix:
  $W_{ij} = \frac{1}{n} \sum_{k=1}^{p} x_i^k x_j^k$, i.e. $\mathbf{W} = \frac{1}{n} \sum_{k=1}^{p} \mathbf{x}^k (\mathbf{x}^k)^{\mathrm{T}}$

Definition of Covariance
Consider samples $(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N)$. Let $\bar{x} = \langle x^k \rangle$ and $\bar{y} = \langle y^k \rangle$. The covariance of the x and y values is:
$C_{xy} = \langle (x^k - \bar{x})(y^k - \bar{y}) \rangle = \langle x^k y^k - \bar{x} y^k - x^k \bar{y} + \bar{x}\bar{y} \rangle = \langle x^k y^k \rangle - \bar{x}\langle y^k \rangle - \langle x^k \rangle \bar{y} + \bar{x}\bar{y} = \langle x^k y^k \rangle - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y}$
$\therefore\; C_{xy} = \langle x^k y^k \rangle - \bar{x}\bar{y}$

Weights & the Covariance Matrix
- Sample pattern vectors: $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$
- Covariance of the i-th and j-th components:
  $C_{ij} = \langle x_i^k x_j^k \rangle - \langle x_i \rangle \langle x_j \rangle$
- If $\forall i: \langle x_i \rangle = 0$ (±1 equally likely in all positions):
  $C_{ij} = \langle x_i^k x_j^k \rangle = \frac{1}{p} \sum_{k=1}^{p} x_i^k x_j^k$, hence $n\mathbf{W} = p\mathbf{C}$

Characteristics of Hopfield Memory
- Distributed ("holographic"):
  - every pattern is stored in every location (weight)
- Robust:
  - correct retrieval in spite of noise or error in the patterns
  - correct operation in spite of considerable weight damage or noise
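A quick numerical check of the sum-of-outer-products rule and the relation $n\mathbf{W} = p\mathbf{C}$ (a sketch assuming NumPy and zero-mean random patterns; as on the slide, $\mathbf{C}$ is estimated without mean subtraction, and self-coupling is included here for simplicity):

```python
import numpy as np

def imprint(patterns):
    """Sum-of-outer-products weights: W = (1/n) sum_k x^k (x^k)^T."""
    X = np.asarray(patterns, dtype=float)   # shape (p, n)
    p, n = X.shape
    return (X.T @ X) / n                    # self-coupling included here

rng = np.random.default_rng(2)
p, n = 5, 200
X = rng.choice([-1, 1], size=(p, n))
W = imprint(X)
C = (X.T @ X) / p                           # <x_i^k x_j^k>, means taken as 0
print(np.allclose(n * W, p * C))            # n W = p C  -> True
```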
Demonstration of Hopfield Net
Run Malasri Hopfield Demo

Stability of Imprinted Memories
Suppose the state is one of the imprinted patterns, $\mathbf{x}^m$. Then:
$\mathbf{h} = \mathbf{W}\mathbf{x}^m = \frac{1}{n} \left[ \sum_k \mathbf{x}^k (\mathbf{x}^k)^{\mathrm{T}} \right] \mathbf{x}^m = \frac{1}{n} \sum_k \mathbf{x}^k (\mathbf{x}^k)^{\mathrm{T}} \mathbf{x}^m = \frac{1}{n} \mathbf{x}^m (\mathbf{x}^m)^{\mathrm{T}} \mathbf{x}^m + \frac{1}{n} \sum_{k \ne m} \mathbf{x}^k (\mathbf{x}^k)^{\mathrm{T}} \mathbf{x}^m = \mathbf{x}^m + \frac{1}{n} \sum_{k \ne m} (\mathbf{x}^k \cdot \mathbf{x}^m)\, \mathbf{x}^k$

Interpretation of Inner Products
- $\mathbf{x}^k \cdot \mathbf{x}^m = n$ if they are identical: highly correlated
- $\mathbf{x}^k \cdot \mathbf{x}^m = -n$ if they are complementary: highly correlated (reversed)
- $\mathbf{x}^k \cdot \mathbf{x}^m = 0$ if they are orthogonal: largely uncorrelated
- $\mathbf{x}^k \cdot \mathbf{x}^m$ measures the crosstalk between patterns k and m

Cosines and Inner Products
- $\mathbf{u} \cdot \mathbf{v} = \lVert \mathbf{u} \rVert\, \lVert \mathbf{v} \rVert \cos \theta_{uv}$
- If $\mathbf{u}$ is bipolar, then $\lVert \mathbf{u} \rVert^2 = \mathbf{u} \cdot \mathbf{u} = n$
- Hence $\mathbf{u} \cdot \mathbf{v} = \sqrt{n}\,\sqrt{n} \cos \theta_{uv} = n \cos \theta_{uv}$
- Hence $\mathbf{h} = \mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos \theta_{km}$

Conditions for Stability
- Stability of the entire pattern:
  $\mathbf{x}^m = \operatorname{sgn}\left( \mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos \theta_{km} \right)$
- Stability of a single bit:
  $x_i^m = \operatorname{sgn}\left( x_i^m + \sum_{k \ne m} x_i^k \cos \theta_{km} \right)$

Sufficient Conditions for Instability (Case 1)
Suppose $x_i^m = -1$. Then the bit is unstable if:
$(-1) + \sum_{k \ne m} x_i^k \cos \theta_{km} > 0$, i.e. $\sum_{k \ne m} x_i^k \cos \theta_{km} > 1$
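A small sketch of the crosstalk term (assuming NumPy; the names are mine), useful for checking which bits of a stored pattern are at risk:

```python
import numpy as np

def crosstalk(X, m):
    """(1/n) sum_{k != m} (x^k . x^m) x^k  -- the non-signal part of h
    when the state is the imprinted pattern x^m."""
    p, n = X.shape
    c = np.zeros(n)
    for k in range(p):
        if k != m:
            c += (X[k] @ X[m]) / n * X[k]   # (x^k . x^m)/n = cos(theta_km)
    return c

rng = np.random.default_rng(4)
X = rng.choice([-1, 1], size=(3, 50))       # 3 random patterns, n = 50
c = crosstalk(X, 0)
# Bit i of pattern 0 is stable iff sgn(x_i^0 + c_i) = x_i^0
stable = np.sign(X[0] + c) == X[0]
print(stable.all(), np.abs(c).max())
```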
Sufficient Conditions for Instability (Case 2)
Suppose $x_i^m = +1$. Then the bit is unstable if:
$(+1) + \sum_{k \ne m} x_i^k \cos \theta_{km} < 0$, i.e. $\sum_{k \ne m} x_i^k \cos \theta_{km} < -1$

Sufficient Conditions for Stability
$\left| \sum_{k \ne m} x_i^k \cos \theta_{km} \right| \le 1$
The crosstalk with the sought pattern must be sufficiently small.

Capacity of Hopfield Memory
- Depends on the patterns imprinted
- If orthogonal, $p_{\max} = n$
  - but then every state is stable ⇒ trivial basins
- So $p_{\max} < n$
- Let the load parameter be $\alpha = p / n$

Single Bit Stability Analysis
- For simplicity, suppose the $\mathbf{x}^k$ are random
- Then the $\mathbf{x}^k \cdot \mathbf{x}^m$ are sums of n random ±1s:
  - binomial distribution (≈ Gaussian)
  - in range $-n, \ldots, +n$
  - with mean $\mu = 0$ and variance $\sigma^2 = n$
- Probability that the sum exceeds t: $\tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \frac{t}{\sqrt{2n}} \right) \right]$
[See "Review of Gaussian (Normal) Distributions" on the course website]

Approximation of Probability
- Let the crosstalk be $C_i^m = \frac{1}{n} \sum_{k \ne m} x_i^k (\mathbf{x}^k \cdot \mathbf{x}^m)$
- We want $\Pr\{ C_i^m > 1 \} = \Pr\{ n C_i^m > n \}$
- Note: $n C_i^m = \sum_{k \ne m} \sum_{j=1}^{n} x_i^k x_j^k x_j^m$,
  a sum of $n(p-1) \approx np$ random ±1s, with variance $\sigma^2 = np$

Probability of Bit Instability
$\Pr\{ n C_i^m > n \} = \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \frac{n}{\sqrt{2np}} \right) \right] = \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \sqrt{n/2p} \right) \right] = \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( 1/\sqrt{2\alpha} \right) \right]$
[Figure from Hertz et al., Introduction to the Theory of Neural Computation]
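This formula is easy to evaluate directly; a sketch using only the Python standard library, checked against the table on the next slide:

```python
from math import erf, sqrt

def p_bit_error(alpha):
    """Pr{bit instability} ~= (1/2) [1 - erf(1 / sqrt(2 alpha))]."""
    return 0.5 * (1.0 - erf(1.0 / sqrt(2.0 * alpha)))

for alpha in (0.105, 0.138, 0.185, 0.37, 0.61):
    print(f"alpha = {alpha:5.3f}  ->  P_error ~= {p_bit_error(alpha):.4f}")
# Reproduces the tabulated values: ~0.001, 0.0036, 0.01, 0.05, 0.1
```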
Tabulated Probability of Single-Bit Instability

  P_error | alpha
  --------|------
   0.1%   | 0.105
   0.36%  | 0.138
   1%     | 0.185
   5%     | 0.37
   10%    | 0.61

(table from Hertz et al., Introduction to the Theory of Neural Computation)

Spurious Attractors
- Mixture states:
  - sums or differences of odd numbers of retrieval states
  - their number increases combinatorially with p
  - shallower, smaller basins
  - basins of mixtures swamp the basins of the retrieval states ⇒ overload
  - useful as combinatorial generalizations?
  - self-coupling generates additional spurious attractors
- Spin-glass states:
  - not correlated with any finite number of imprinted patterns
  - occur beyond overload, because the weights are then effectively random

Basins of Mixture States
$x_i^{\text{mix}} = \operatorname{sgn}\!\left( x_i^{k_1} + x_i^{k_2} + x_i^{k_3} \right)$
[Figure: mixture state $x^{\text{mix}}$ among retrieval states $x^{k_1}, x^{k_2}, x^{k_3}$]

Fraction of Unstable Imprints (n = 100)
[Figure from Bar-Yam]

Number of Stable Imprints (n = 100)
[Figure from Bar-Yam]

Number of Imprints with Basins of Indicated Size (n = 100)
[Figure from Bar-Yam]
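The Bar-Yam curves are straightforward to reproduce in spirit; a hedged sketch (assuming NumPy; the exact counts will differ from the figures, since the patterns are random):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

def fraction_unstable(p, trials=20):
    """Imprint p random patterns in an n-neuron net; return the fraction
    of imprints that are not fixed points of one full update sweep."""
    unstable = 0
    for _ in range(trials):
        X = rng.choice([-1, 1], size=(p, n))
        W = (X.T @ X).astype(float) / n
        np.fill_diagonal(W, 0.0)             # no self-coupling
        for x in X:
            s = np.where(W @ x >= 0, 1, -1)  # update every neuron once
            unstable += not np.array_equal(s, x)
    return unstable / (p * trials)

for p in (2, 5, 10, 14, 20, 30):
    print(f"p = {p:2d}  fraction unstable ~ {fraction_unstable(p):.2f}")
```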
Summary of Capacity Results
- Absolute limit: $p_{\max} < \alpha_c n = 0.138\, n$
- If a small number of errors in each pattern is permitted: $p_{\max} \propto n$
- If all or most patterns must be recalled perfectly: $p_{\max} \propto n / \log n$
- Recall: all this analysis is based on random patterns
- Unrealistic, but it can sometimes be arranged
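For a feel for the numbers, a tiny sketch (the $n/\log n$ rule is stated only up to proportionality above; the constant $1/(2 \ln n)$ below is a common textbook choice and is an assumption, not taken from this section):

```python
from math import log

def capacity_estimates(n):
    """Rough pattern-capacity estimates for an n-neuron Hopfield net."""
    return {
        "few bit errors tolerated (alpha_c = 0.138)": 0.138 * n,
        "perfect recall, ~ n / (2 ln n) (assumed constant)": n / (2 * log(n)),
    }

for n in (100, 1000):
    print(n, capacity_estimates(n))
```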