Neural Nets and Symbolic Reasoning: Hopfield Networks


Outline
1 The idea of pattern completion
2 The fast dynamics of Hopfield networks
3 Learning with Hopfield networks
4 Emerging properties of Hopfield networks
5 Conclusions

1 The idea of pattern completion

Example from visual recognition
- Noisy or underspecified input
- Mechanism of pattern completion (using stored images)
- Stored patterns are addressable by content, not by pointers (as in traditional computer memories)

Example from semantics: a fast car
a. a fast car [one that moves quickly]
b. a fast typist [a person that performs the act of typing quickly]
c. a fast book [one that can be read in a short time]
d. a fast driver [one who drives quickly]

Example from semantics: a red apple
A red apple? What color is an apple?
Q1: What color is its peel?
Q2: What color is its pulp?
a. a red apple [red peel]
b. a sweet apple [sweet pulp]
c. a reddish grapefruit [reddish pulp]
d. a white room / a white house [inside/outside]

2 The fast dynamics of Hopfield networks

Abstract description of neural networks
A neural network N can be defined as a quadruple <S, F, W, G>:
- S: the space of all possible states
- W: the set of possible configurations; w ∈ W describes for each pair i, j of "neurons" the connection w_ij between i and j
- F: the set of activation functions; for a given configuration w ∈ W, the function f_w ∈ F describes how the neuron activities spread through that network (fast dynamics)
- G: the set of learning functions (slow dynamics)

Dynamic systems
Discrete dynamic systems: s(t+1) = g(s(t)), where s(t) is a vector in the space of states S.
Continuous dynamic systems: d/dt s(t) = g(s(t)).
The function g describes the fast dynamics.
Dynamical Systems + Neural Networks = Neurodynamics. Tools from dynamical systems, statistics and statistical physics can be used; a very rich field.
The triple <S, F, W> corresponds to the fast neurodynamics.

The importance of recurrent systems
What can recurrent networks do?
- Associative memories
- Pattern completion
- Noise removal
- General networks (they can implement everything feedforward networks can do, and even emulate Turing machines)
- Spatio-temporal pattern recognition
- Dynamic reconstruction

Hopfield networks
J. J. Hopfield (1982), "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences 79, 2554-2558.
An autoassociative, fully connected network with binary neurons, asynchronous updates and a Hebbian learning rule: the classic recurrent network.
"Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons). The physical meaning of content-addressable memory is described by an appropriate phase space flow of the state of the system."

Concise description of the fast dynamics
Let the interval [-1, +1] be the working range of each neuron (+1: maximal firing rate, 0: resting, -1: minimal firing rate).
S = [-1, 1]^n
w_ij = w_ji, w_ii = 0
Asynchronous updating:
s_i(t+1) = θ(Σ_j w_ij s_j(t)), if i = rand(1, n)
s_i(t+1) = s_i(t), otherwise
[Figure: snapshots of the network state at update steps 1, 2, 3, 4 and after many thousands of asynchronous updates]
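The asynchronous update rule is easy to state in code. The following is a minimal sketch, not taken from the slides, using numpy and binary units s_i in {-1, +1}; the function names, the energy helper (which anticipates the Ljapunov function defined below) and the small example weights are illustrative assumptions.

import numpy as np

def energy(w, s):
    # E(s) = -1/2 * sum_{i,j} w_ij s_i s_j  (the Ljapunov function, see below)
    return -0.5 * s @ w @ s

def async_update(w, s, steps=1000, rng=None):
    # Asynchronous dynamics: pick one unit at random and set it to the
    # sign of its net input; all other units keep their previous state.
    rng = np.random.default_rng() if rng is None else rng
    s = s.copy()
    n = len(s)
    for _ in range(steps):
        i = rng.integers(n)              # i = rand(1, n)
        h = w[i] @ s                     # net input sum_j w_ij s_j(t)
        s[i] = 1 if h >= 0 else -1       # threshold activation θ
    return s

# Example: a 3-unit network with all off-diagonal weights equal to 1
w = np.ones((3, 3)) - np.eye(3)
print(async_update(w, np.array([1, -1, 1]), steps=100))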

Definition of resonances of a dynamical system
A state s0 in S is called a resonance of a dynamical system [S, f] iff
1. f(s0) = s0 (equilibrium)
2. for each ε > 0 there exists a δ with 0 < δ ≤ ε such that for all n, ||f^n(s) - s0|| < ε whenever ||s - s0|| < δ (stability)
3. for each ε > 0 there exists a δ with 0 < δ ≤ ε such that lim_{n→∞} f^n(s) = s0 whenever ||s - s0|| < δ (asymptotic stability)

Energy and convergence
How can we be sure that the dynamics converges to an attractor at all? Why can't it enter an endless loop s1 → s2 → s3 → s1 → ...?
Answer (and the secret of the Hopfield network's popularity): the energy function (also called Ljapunov function). An energy function always decreases monotonically (or stays the same) as the state changes, and it is bounded below. The descent to lower energy levels therefore has to end eventually at a local minimum. Energy landscapes are a popular (and somewhat dangerous) analogy.

Resonance systems
Definition. A neural network [S, W, F] is called a resonance system iff lim_{n→∞} f^n(s) exists and is a resonance, for each s ∈ S and f ∈ F.
Theorem (Cohen & Grossberg 1983). Hopfield networks are resonance systems.
(The same holds for a large class of other systems: the McCulloch-Pitts model (1943), Cohen-Grossberg models (1983), Rumelhart's Interactive Activation model (1986), Smolensky's Harmony networks (1986), etc.)

Ljapunov function
Lemma (Hopfield 1982). The function E(s) = -½ Σ_{i,j} w_ij s_i s_j is a Ljapunov function of the system in the case of asynchronous updates. That means: when the activation state of the network changes, E can either decrease or remain the same.
Consequence: the output states lim_{n→∞} f^n(s) can be characterized as the local minima of the Ljapunov function.
Remark: E(s) = -½ Σ_{i,j} w_ij s_i s_j = - Σ_{i<j} w_ij s_i s_j (by symmetry).
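One way to see the lemma at work is to track E along an asynchronous trajectory. The sketch below is illustrative (numpy; the random symmetric weights are an assumption, not data from the slides); it records the energy after every single-unit update and checks that it never increases.

import numpy as np

def hopfield_energy(w, s):
    # E(s) = -1/2 * sum_{i,j} w_ij s_i s_j
    return -0.5 * s @ w @ s

rng = np.random.default_rng(0)
n = 8
w = rng.normal(size=(n, n))
w = (w + w.T) / 2            # enforce symmetry w_ij = w_ji
np.fill_diagonal(w, 0.0)     # and w_ii = 0

s = rng.choice([-1, 1], size=n)
energies = [hopfield_energy(w, s)]
for _ in range(200):
    i = rng.integers(n)
    s[i] = 1 if w[i] @ s >= 0 else -1    # asynchronous threshold update
    energies.append(hopfield_energy(w, s))

# Ljapunov property: the energy never increases along the trajectory.
assert all(e2 <= e1 + 1e-12 for e1, e2 in zip(energies, energies[1:]))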

Proof
For the proof we assume a discrete working space (s_i = ±1). Let node 1 be selected for update: s_1(t+1) = θ(Σ_j w_1j s_j(t)); s_j(t+1) = s_j(t) for j ≠ 1.
Case 1: s_1(t+1) = s_1(t). Then E(t+1) - E(t) = 0.
Case 2: s_1(t+1) = -s_1(t) (binary threshold, working space {-1, +1}).
We have E(s(t)) = - Σ_{i<j} w_ij s_i(t) s_j(t). For the difference E(t+1) - E(t) only the terms with index i = 1, 1 < j matter. Consequently,
E(t+1) - E(t) = - Σ_{j>1} w_1j s_1(t+1) s_j(t) + Σ_{j>1} w_1j s_1(t) s_j(t)
             = - Σ_j w_1j s_1(t+1) s_j(t) - Σ_j w_1j s_1(t+1) s_j(t)
             = -2 s_1(t+1) Σ_j w_1j s_j(t) < 0,
because the two factors s_1(t+1) = θ(Σ_j w_1j s_j(t)) and Σ_j w_1j s_j(t) have the same sign.

Example
[Figure: a two-unit network with units x and y, a connection of weight -1 between them, and threshold inputs 0.2 (to x) and 0.3 (to y)]
What is the Ljapunov function E(x, y) for this system? What is the global minimum? (Assume binary activations ±1.)

Example
With E = - Σ_{i<j} w_ij s_i s_j (the thresholds entering as linear terms), this system has
E(x, y) = 0.2x + 0.3y + xy

  x     y     E
 -1    -1    0.5
 -1    +1   -0.9
 +1    -1   -1.1
 +1    +1    1.5

The global minimum is (x, y) = (+1, -1) with E = -1.1.
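The table can be reproduced by brute-force enumeration of the four states; a small illustrative check (not part of the slides):

from itertools import product

def E(x, y):
    # energy of the two-unit example: threshold terms plus the xy coupling
    return 0.2 * x + 0.3 * y + x * y

for x, y in product([-1, 1], repeat=2):
    print(f"x={x:+d}  y={y:+d}  E={E(x, y):+.1f}")
# The lowest value, E = -1.1, is reached at (x, y) = (+1, -1).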

Global minima
Theorem 2 (Hopfield 1982). The output states lim_{n→∞} f^n(s) can be characterized as the global minima of the Ljapunov function if certain stochastic update functions f are considered ("simulated annealing").
[Figure: an energy landscape with a start state and two minima A and B, reached by asynchronous updates and by asynchronous updates with faults, respectively]
What we need is a probability distribution P(s) over states for each time, and a stochastic update rule which respects that there is some stochastic disturbance during the updating of the activation vectors.

Simulated annealing
How can we escape local minima? One idea: add randomness, so that we can sometimes go uphill. Then we can escape shallow local minima and are more likely to end up in deep minima. We can use a probabilistic update rule.
Too little randomness and we end up in local minima; too much and we jump around instead of converging.
Solution, by analogy with thermodynamics: annealing through slow cooling. Start the network in a high-temperature state and slowly decrease the temperature according to an annealing schedule.

Deriving a probability distribution
What is a plausible probability distribution over activation states?
1. Assume that the probability of an activation state is a function of its energy: P(s) = f(E(s)).
2. Assume independent probability distributions for independent subnets: E(s, s') = E(s) + E(s') and P(s, s') = P(s) · P(s'), hence
(*) f(E + E') = f(E) · f(E').
If f is a continuous function that satisfies (*), it must be an exponential function. Hence P(s) = const · e^{k E(s)}, or with k = -1/T:
P(s) = const · e^{-E(s)/T}
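For a small network the distribution P(s) = const · e^{-E(s)/T} can be written out exactly by enumerating all states. A minimal illustrative sketch (the weights and temperature are arbitrary assumptions):

import numpy as np
from itertools import product

def energy(w, s):
    return -0.5 * s @ w @ s

# a tiny symmetric network with zero diagonal
w = np.array([[ 0.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.3],
              [-0.5, 0.3,  0.0]])
T = 1.0

states = [np.array(s) for s in product([-1, 1], repeat=3)]
boltzmann = np.array([np.exp(-energy(w, s) / T) for s in states])
probs = boltzmann / boltzmann.sum()      # P(s) = const * exp(-E(s)/T)

for s, p in zip(states, probs):
    print(s, round(p, 3))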

A probabilistic update rule
Assume (without restricting generality) that unit 1 is selected for updating at time t and has activity -1 at time t. Should it flip from -1 to +1 at time t+1? The energy E = - Σ_{i<j} w_ij s_i s_j is relevant.

At time t:     s = (-1, s_2, ..., s_n)   energy E
At time t+1:   s = (-1, s_2, ..., s_n)   energy E                        probability c · exp(-E/T)
        or:    s = (+1, s_2, ..., s_n)   energy E - 2 Σ_j w_1j s_j       probability c · exp((-E + 2 Σ_j w_1j s_j)/T)

If Σ_j w_1j s_j is positive, flip the activation with a probability that increases with the energy difference 2 Σ_j w_1j s_j:
P(s_1 = +1) = σ(2 (Σ_j w_1j s_j)/T), with σ the sigmoid (logistic) function.
The classical deterministic rule is recovered for T → 0.
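In code, the stochastic rule replaces the hard threshold by a biased coin flip whose bias is the sigmoid of the scaled net input. A minimal sketch (illustrative names, not code from the lecture):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stochastic_update(w, s, T, rng):
    # pick one unit and set it to +1 with probability sigma(2*h_i/T),
    # where h_i = sum_j w_ij s_j is its net input; otherwise set it to -1
    i = rng.integers(len(s))
    h = w[i] @ s
    s[i] = 1 if rng.random() < sigmoid(2.0 * h / T) else -1
    return s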

The Metropolis algorithm
At each step, select a unit and calculate the energy difference ΔE between its flipped state and its current state. If ΔE ≤ 0, flip the unit. If ΔE > 0, flip the unit with probability exp(-ΔE/T), so that uphill moves remain possible. After around n such steps, lower the temperature further and repeat the cycle.
Geman and Geman (1984): if the temperature in cycle k satisfies T_k ≥ T_0 / log(1 + k) for every k, and T_0 is large enough, then the system will converge with probability one to the minimum-energy configuration.
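Putting the pieces together, an annealing run lowers T on the logarithmic schedule while applying stochastic single-unit updates. The sketch below is illustrative (numpy; the starting temperature and cycle count are arbitrary assumptions), not code from the lecture:

import numpy as np

def anneal(w, s, T0=10.0, cycles=50, rng=None):
    # simulated annealing with Metropolis acceptance and the
    # logarithmic schedule T_k = T0 / log(1 + k)
    rng = np.random.default_rng() if rng is None else rng
    s = s.copy()
    n = len(s)
    for k in range(1, cycles + 1):
        T = T0 / np.log(1 + k)
        for _ in range(n):                    # about n flips per cycle
            i = rng.integers(n)
            dE = 2 * s[i] * (w[i] @ s)        # energy change if unit i flips
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i] = -s[i]
    return s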

3 Learning with Hopfield networks

The basic idea
Generalized Hebbian rule for a single neuron confronted with input vectors s^d, working space S = {-1, 1}:
Δw = η Σ_{d∈D} s^d r^d
Hopfield used this rule for his networks:
Δw_ij = η Σ_{d∈D} s_j^d r_i^d, or equivalently Δw_ij = η Σ_{d∈D} s_j^d s_i^d (d ∈ D and i ≠ j!).
In this case, the resulting connection matrix can be shown to be
w_ij = (1/N) Σ_{d∈D} s_i^d s_j^d for i ≠ j; w_ii = 0.
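A direct translation of the learning rule: the weight matrix is a scaled sum of outer products of the stored patterns, with the diagonal set to zero. An illustrative numpy sketch; note that any positive scale factor η gives the same behaviour under a binary threshold activation, so the choice of normalization is an assumption, not prescribed by the slides.

import numpy as np

def hebbian_weights(patterns, eta=1.0):
    # patterns: array of shape (P, N) with entries in {-1, +1}
    # w_ij = eta * sum_{d in D} s_i^d s_j^d  for i != j,  w_ii = 0
    patterns = np.asarray(patterns, dtype=float)
    w = eta * (patterns.T @ patterns)
    np.fill_diagonal(w, 0.0)
    return w

print(hebbian_weights([[1, 1, 1]]))   # the 3-neuron example from the next slide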

Example
Consider a network with 3 neurons. Teach the system the input vector (1 1 1). What is the weight matrix?
Take the same system, but now teach it the two input vectors (1 1 1) and (-1 -1 -1). Is there a behavioural difference that results from adding the second input vector? Take the activation function to be a binary threshold.

Example (answer)
For the single input vector (1 1 1), the learning rule gives

w = ( 0 1 1 )
    ( 1 0 1 )
    ( 1 1 0 )

Teaching the two input vectors (1 1 1) and (-1 -1 -1) gives the same matrix up to a positive factor, since both patterns contribute identical products s_i^d s_j^d. Under a binary threshold activation there is therefore no behavioural difference to the case before.
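A quick check of the example with the hebbian_weights sketch from above (η = 1): the two training sets produce proportional weight matrices, so the threshold dynamics are identical.

import numpy as np

def hebbian_weights(patterns, eta=1.0):
    patterns = np.asarray(patterns, dtype=float)
    w = eta * (patterns.T @ patterns)
    np.fill_diagonal(w, 0.0)
    return w

w1 = hebbian_weights([[1, 1, 1]])
w2 = hebbian_weights([[1, 1, 1], [-1, -1, -1]])
print(w1)                        # off-diagonal entries 1
print(w2)                        # off-diagonal entries 2
print(np.allclose(w2, 2 * w1))   # True: a positive factor cannot change the
                                 # sign of any net input under the threshold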

4 Emerging properties of Hopfield networks

Auto-associative memory
Store a set of patterns {s^d} in such a way that, when presented with a pattern s^x, the network responds with the stored pattern most similar to it. It maps patterns to patterns of the same type (e.g. a noisy or incomplete record is mapped to a clean record).
Mechanism of pattern completion: the stored patterns are attractors. If the system starts outside any of the attractors, it begins to move towards one of them. Stored patterns are addressable by content, not by pointers (as in traditional computer memories).
See the link "Hopfield network as associative memory" on the website.
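The whole pipeline (store patterns with the Hebbian rule, corrupt one of them, and let the asynchronous dynamics complete it) fits in a few lines. An illustrative numpy sketch; the random patterns, network size and amount of corruption are assumptions chosen for the demo:

import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian storage
w = patterns.T @ patterns / N
np.fill_diagonal(w, 0.0)

# corrupt one stored pattern by flipping 15 of its bits
probe = patterns[0].copy()
probe[rng.choice(N, size=15, replace=False)] *= -1

# asynchronous recall
s = probe.copy()
for _ in range(20 * N):
    i = rng.integers(N)
    s[i] = 1 if w[i] @ s >= 0 else -1

print("overlap before recall:", patterns[0] @ probe / N)   # 0.7
print("overlap after recall: ", patterns[0] @ s / N)       # typically 1.0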

Some properties
The inputted patterns of activation are resonances. However, they are not the only resonances of the system:
- the state (-1, -1, ..., -1) is always a resonance
- if s is a resonance, so is -s

Capacity
How many patterns can an n-unit network store? The more patterns are added, the more crosstalk and spurious states (secondary resonances) arise. The larger the network, the greater the capacity. It turns out that the capacity is roughly proportional to n:
M = α n,
where M is the number of stored patterns that can be correctly reproduced (with an error probability of p).

Phase transition
If we want p_error = 0.001, then M = 0.105 n. This is an upper bound. At M = 0.138 n patterns the network suddenly breaks down and cannot recall anything useful ("catastrophic forgetting"). The behaviour around α = 0.138 corresponds to a phase transition in physics (solid/liquid).
Physical analogy: spin glasses. Unit states correspond to spin states in a solid. Each spin is affected by the spins of the others plus thermal noise, and can flip between two states (+1 and -1).
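The breakdown around α ≈ 0.138 can be observed empirically: store αn random patterns, present one of them slightly corrupted, and measure how often it is recovered exactly. A rough illustrative experiment (small n and few trials are assumptions; expect noisy numbers):

import numpy as np

rng = np.random.default_rng(0)
n = 200

def recall_rate(alpha, trials=20):
    ok = 0
    P = max(1, int(alpha * n))
    for _ in range(trials):
        pats = rng.choice([-1, 1], size=(P, n))
        w = pats.T @ pats / n
        np.fill_diagonal(w, 0.0)
        s = pats[0].copy()
        s[rng.choice(n, size=5, replace=False)] *= -1    # corrupt 5 bits
        for _ in range(10 * n):                          # asynchronous recall
            i = rng.integers(n)
            s[i] = 1 if w[i] @ s >= 0 else -1
        ok += np.array_equal(s, pats[0])
    return ok / trials

for alpha in (0.05, 0.10, 0.14, 0.20):
    print(alpha, recall_rate(alpha))    # recall collapses past alpha ~ 0.138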

5 Conclusions
Hopfield nets are "the harmonic oscillator" of modern neurodynamics:
- the idea of resonance systems
- the idea of simulated annealing
- the idea of content-addressable memory
- a very simple learning theory based on the generalized Hebbian rule