Chapter 10 Associative Memory Networks

We are in for a surprise though! If we return to our calculated W based on the stored ξ and use x = (-1, 1, -1, -1, -1, 1)^T as the input, then we get the output y = (-1, 1, -1, -1, -1, 1)^T, i.e. the same output as input! But the x used as input was not stored, so why was it retrieved? We notice that the x used is identical to ξ, except that the sign of every element has been changed. It is generally true that when we store a vector ξ in a weight matrix W, we also store its exact opposite. You can easily check this with the general relationship on page 100. But there are even more surprises in store for us. When we store several vectors, we also store certain combinations of these vectors, as well as some other vectors whose relationship to the fundamental memories is not obvious. All of these inadvertently stored vectors are called spurious attractors. Let us look at some of them.
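A minimal sketch of this in Python/NumPy is given below. It assumes the weight matrix from earlier in the chapter was built with the Hebbian outer-product rule with a zeroed diagonal (an assumption, but one that is consistent with the energy values computed later in the chapter), and that the stored pattern was ξ = (1, -1, 1, 1, 1, -1)^T, i.e. the sign-flipped version of the x above. One retrieval pass then returns both ξ and its opposite unchanged.

```python
import numpy as np

# Assumed stored pattern (the sign-flipped version of the x in the text).
xi = np.array([1, -1, 1, 1, 1, -1])

# Hebbian (outer-product) storage with the diagonal set to zero.
W = np.outer(xi, xi)
np.fill_diagonal(W, 0)

def recall(W, x):
    """One synchronous pass: y_i = sign(sum_j w_ij * x_j)."""
    return np.where(W @ x >= 0, 1, -1)

x = np.array([-1, 1, -1, -1, -1, 1])  # the opposite of xi, never stored explicitly
print(recall(W, xi))  # [ 1 -1  1  1  1 -1]  -> the stored pattern is a fixed point
print(recall(W, x))   # [-1  1 -1 -1 -1  1]  -> so is its opposite
```

The reason is simply that W(-ξ) = -Wξ, so flipping every sign of a fixed point gives another fixed point.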

An impressive assortment of spurious attractors generated by learning! (From Haykin, p. 700.)

Then what does all this mean? It means that you cannot store memories that are similar to each other, because if you start from a slightly corrupted version of one of two similar memories, you can easily end up in the other one. It also means that even if the stored memories are not similar to each other, there will be other, spurious memories in between. And even if your corrupted initial input is closer to its own fundamental memory than to all the other fundamental memories, it is still not certain that the proper fundamental memory will be retrieved; it might well be a spurious attractor instead. The risk of retrieving the opposite of a fundamental memory is usually not great; your initial input has to be very corrupted for that to happen. All this means that associative memory networks, as we have described them, are far from ideal from a legal-witness point of view. But their shortcomings are not unheard of in human experience, so they don't rule the networks out as first-order models of human memory.

The energy landscape
There is an energy associated with the states of an associative memory network. It is called energy because Hopfield, who used the energy concept to describe the retrieval process in an associative memory network in the early 1980s, is a physicist and saw the purely formal similarity with energy functions in mechanics. Each attractor is a minimum in this energy landscape, i.e. a point lower than its immediate surroundings. If the retrieval process starts from a corrupted memory, it starts at a high energy and, like a ball in a real landscape, rolls down to a minimum, hopefully the right one. A problem is that the ball rolls in a high-dimensional landscape, which makes it difficult to illustrate on paper. The energy of the spurious attractors is generally higher than the energy of the fundamental memories, so if you could sense this energy you would have a chance to say that what you seem to remember might be wrong. The opposite attractors of the fundamental memories, however, have the same low energies as the fundamental memories themselves, so in that case you are left without assistance.

Energy
The energy of a particular state x is defined as

E(x) = -x'Wx

with the diagonal weights set to zero (wii = 0). The minus sign ensures that we have minima for the ball to roll into rather than peaks to climb. Below is an imaginary energy landscape, from Lytton.
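A small sketch (Python/NumPy, under the same assumed Hebbian weights with zero diagonal) makes the "rolling downhill" picture concrete: along an asynchronous retrieval run started from a corrupted memory, the energy E(x) = -x'Wx never increases, and the run ends in the minimum at E = -90 discussed on the following slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian(patterns):
    """Hebbian outer-product weights with a zeroed diagonal (assumed storage rule)."""
    W = sum(np.outer(p, p) for p in patterns)
    np.fill_diagonal(W, 0)
    return W

def energy(W, x):
    return -x @ W @ x

def retrieve_async(W, x, sweeps=10):
    """One-element-at-a-time (Hopfield) updates; record the energy after each update."""
    x = x.copy()
    energies = [energy(W, x)]
    for _ in range(sweeps):
        for i in rng.permutation(len(x)):
            x[i] = 1 if W[i] @ x >= 0 else -1
            energies.append(energy(W, x))
    return x, energies

xi1 = np.ones(10, dtype=int)
W = hebbian([xi1])

corrupted = xi1.copy()
corrupted[:3] = -1                       # corrupt the memory by flipping three elements

x_final, energies = retrieve_async(W, corrupted)
print(energies[0], energies[-1])         # -6 -90: starts high, ends in the minimum
print(np.all(np.diff(energies) <= 0))    # True: the energy never increases
print(x_final)                           # the fundamental memory xi1 is recovered
```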

Energy
The four possible states of a two-dimensional memory network are shown with their energies. One is a fundamental memory, one is its opposite, and the other two lie on a limit cycle, as Lytton sees it. Hopfield, who updates one element at a time, would not have a limit cycle.
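The difference is easy to reproduce in a sketch (assuming, for illustration, that the fundamental memory of the two-unit network is ξ = (1, 1)'). Updating both elements at once sends the two non-memory states chasing each other around a two-state limit cycle; updating one element at a time settles straight into the fundamental memory.

```python
import numpy as np

xi = np.array([1, 1])            # assumed fundamental memory of the two-unit network
W = np.outer(xi, xi)
np.fill_diagonal(W, 0)           # W = [[0, 1], [1, 0]]

sgn = lambda h: np.where(h >= 0, 1, -1)

# Synchronous updates (all elements at once) from x = (-1, 1)': a limit cycle.
x = np.array([-1, 1])
for _ in range(4):
    x = sgn(W @ x)
    print(x)                     # [ 1 -1], [-1  1], [ 1 -1], [-1  1]

# Asynchronous updates (one element at a time) from the same start: a fixed point.
x = np.array([-1, 1])
x[0] = sgn(W[0] @ x)             # element 0 flips to +1
x[1] = sgn(W[1] @ x)             # element 1 stays +1
print(x)                         # [1 1]: the fundamental memory
```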

A higher dimensional example
Let us study an example. We begin by storing one vector as a fundamental memory. The particular choice of vector is not important; let us choose ξ1 = [1 1 1 1 1 1 1 1 1 1]'. The energy landscape has two minima at E = -90, one for x = [1 1 1 1 1 1 1 1 1 1]' and one for x = [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]', just as was stated before. There are several more energy levels, but the levels are discrete since the elements can only assume the values 1 and -1. If we store two vectors, ξ1 = [1 1 1 1 1 1 1 1 1 1]' and ξ2 = [1 1 1 1 1 -1 -1 -1 -1 -1]', then as expected we have minima at E = -80 for ξ1 and ξ2 and their opposites.

A higher dimensional example
If we store three vectors, ξ1 = [1 1 1 1 1 1 1 1 1 1]', ξ2 = [1 1 1 1 1 -1 -1 -1 -1 -1]' and ξ3 = [1 -1 1 -1 1 -1 1 -1 1 -1]', then we have minima at E = -74, but only at ξ2 and ξ3. The energy at ξ1 is slightly higher, at -70. If we store four vectors, ξ1 = [1 1 1 1 1 1 1 1 1 1]', ξ2 = [1 1 1 1 1 -1 -1 -1 -1 -1]', ξ3 = [1 -1 1 -1 1 -1 1 -1 1 -1]' and ξ4 = [1 1 -1 -1 1 1 -1 -1 -1 1]', then we have minima at E = -68, but only at ξ2, ξ3 and ξ4. Again the energy at ξ1 is slightly higher, at -60. There are more differences between our four cases, which become clear when we study the histograms of the energies. Let us do that.
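These numbers are easy to check. The sketch below (Python/NumPy, same assumed Hebbian storage and energy definition E(x) = -x'Wx) builds the weight matrix for one, two, three and four stored vectors and prints the energy of each stored vector, reproducing -90, -80, -74/-70 and -68/-60.

```python
import numpy as np

xi1 = np.array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
xi2 = np.array([ 1,  1,  1,  1,  1, -1, -1, -1, -1, -1])
xi3 = np.array([ 1, -1,  1, -1,  1, -1,  1, -1,  1, -1])
xi4 = np.array([ 1,  1, -1, -1,  1,  1, -1, -1, -1,  1])
patterns = [xi1, xi2, xi3, xi4]

def hebbian(ps):
    W = sum(np.outer(p, p) for p in ps)
    np.fill_diagonal(W, 0)       # zero diagonal, as assumed throughout
    return W

def energy(W, x):
    return int(-x @ W @ x)

for k in range(1, 5):
    W = hebbian(patterns[:k])
    print(k, [energy(W, p) for p in patterns[:k]])
# 1 [-90]
# 2 [-80, -80]
# 3 [-70, -74, -74]
# 4 [-60, -68, -68, -68]
```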

A higher dimensional example (histograms of the energy levels for one, two, three and four stored memories)

A higher dimensional example
We see that as the number of fundamental memories increases we get many more intermediate energy levels. Also, there is much less difference between the minimum energy levels and the other low energy levels. In other words, the energy minima of the attractors are not as distinct when we have more fundamental memories. We can also expect some spurious attractors to have appeared, i.e. attractors that don't correspond to fundamental memories (other than the opposite attractors, which are always present). We might wonder whether we have lost ξ1 as a fundamental memory, but happily that is not the case. If we let the input to the network be x = ξ1, then the output will be ξ1. This is of course not a big feat, but we haven't lost the fundamental memory. If we let the input be ξ1 with any one element changed from 1 to -1, we can calculate the energy level to be -36; thus all these slightly corrupted versions of ξ1 lie on a higher energy level than ξ1 itself. We would expect to retrieve ξ1 from such corrupted versions.
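Both claims can be checked with the same machinery (again under the assumed Hebbian weights and E(x) = -x'Wx): ξ1 maps onto itself in one retrieval pass, and each of the ten one-element corruptions of ξ1 sits at E = -36, well above E(ξ1) = -60.

```python
import numpy as np

xi1 = np.array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
xi2 = np.array([ 1,  1,  1,  1,  1, -1, -1, -1, -1, -1])
xi3 = np.array([ 1, -1,  1, -1,  1, -1,  1, -1,  1, -1])
xi4 = np.array([ 1,  1, -1, -1,  1,  1, -1, -1, -1,  1])

W = sum(np.outer(p, p) for p in (xi1, xi2, xi3, xi4))
np.fill_diagonal(W, 0)

energy = lambda x: int(-x @ W @ x)

# xi1 has not been lost: one retrieval pass maps xi1 onto itself.
print(np.array_equal(np.where(W @ xi1 >= 0, 1, -1), xi1))   # True

# Every one-element corruption of xi1 lies at E = -36, above E(xi1) = -60.
print(energy(xi1))                                          # -60
print([energy(np.where(np.arange(10) == i, -1, xi1)) for i in range(10)])  # ten -36s
```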

Retrieval of images in memory
Three very different images are fundamental memories. Corrupted (noisy) versions of the first and third, and an incomplete version of the second, are keys to start the retrieval process (from Lytton).