Neural Networks. Hopfield Nets and Auto Associators Fall 2017

Neural Networks Hopfield Nets and Auto Associators Fall 2017 1

Story so far Neural networks for computation All feedforward structures But what about.. 2

Loopy network. Θ(z) = +1 if z > 0, −1 if z ≤ 0; y_i = Θ(Σ_{j≠i} w_ji y_j + b_i). The output of a neuron affects the input to the neuron. Each neuron is a perceptron with +1/−1 output. Every neuron receives input from every other neuron. Every neuron outputs signals to every other neuron. 3

Loopy network. Θ(z) = +1 if z > 0, −1 if z ≤ 0; y_i = Θ(Σ_{j≠i} w_ji y_j + b_i). A symmetric network: w_ij = w_ji. Each neuron is a perceptron with +1/−1 output. Every neuron receives input from every other neuron. Every neuron outputs signals to every other neuron. 4

Hopfield Net. Θ(z) = +1 if z > 0, −1 if z ≤ 0; y_i = Θ(Σ_{j≠i} w_ji y_j + b_i). A symmetric network: w_ij = w_ji. Each neuron is a perceptron with +1/−1 output. Every neuron receives input from every other neuron. Every neuron outputs signals to every other neuron. 5

Loopy network. Θ(z) = +1 if z > 0, −1 if z ≤ 0; y_i = Θ(Σ_{j≠i} w_ji y_j + b_i). A neuron flips if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip! Each neuron is a perceptron with a +1/−1 output. Every neuron receives input from every other neuron. Every neuron outputs signals to every other neuron. 6

Loopy network. y_i = Θ(Σ_{j≠i} w_ji y_j + b_i), Θ(z) = +1 if z > 0, −1 if z ≤ 0. At each time each neuron receives a field Σ_{j≠i} w_ji y_j + b_i. If the sign of the field matches its own sign, it does not respond. If the sign of the field opposes its own sign, it flips to match the sign of the field. 7
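
A minimal sketch of this update rule in NumPy (the helper names are hypothetical and not from the lecture; the ±1 convention and the tie-break Θ(0) = −1 follow the definition above):

```python
import numpy as np

def theta(z):
    # Theta(z) = +1 if z > 0, -1 if z <= 0
    return 1 if z > 0 else -1

def update_neuron(y, W, b, i):
    # Field at neuron i from all the other neurons (w_ii is assumed to be 0)
    field = W[i] @ y - W[i, i] * y[i] + b[i]
    # The neuron responds only if the field's sign opposes its current output
    y[i] = theta(field)
    return y
```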

Example Red edges are -1, blue edges are +1 Yellow nodes are +1, black nodes are -1 8

Example Red edges are -1, blue edges are +1 Yellow nodes are +1, black nodes are -1 9

Example Red edges are -1, blue edges are +1 Yellow nodes are +1, black nodes are -1 10

Example Red edges are -1, blue edges are +1 Yellow nodes are +1, black nodes are -1 11

Loopy network. If the sign of the field at any neuron opposes its own sign, it flips to match the field. This will change the field at other nodes, which may then flip, which may cause other neurons, including the first one, to flip. And so on. 12

20 evolutions of a loopy net. Θ(z) = +1 if z > 0, −1 if z ≤ 0; y_i = Θ(Σ_{j≠i} w_ji y_j + b_i). A neuron flips if the weighted sum of the other neurons' outputs is of the opposite sign. But this may cause other neurons to flip! All neurons which do not align with the local field flip. 13

120 evolutions of a loopy net All neurons which do not align with the local field flip 14

Loopy network. If the sign of the field at any neuron opposes its own sign, it flips to match the field, which will change the field at other nodes, which may then flip, which may cause other neurons, including the first one, to flip. Will this behavior continue forever?? 15

Loopy network. y_i = Θ(Σ_{j≠i} w_ji y_j + b_i), Θ(z) = +1 if z > 0, −1 if z ≤ 0. Let y_i^- be the output of the i-th neuron just before it responds to the current field, and y_i^+ the output just after it responds to the current field. If y_i^- = sign(Σ_{j≠i} w_ji y_j + b_i), then y_i^+ = y_i^-: if the sign of the field matches its own sign, it does not flip, and y_i^+ (Σ_{j≠i} w_ji y_j + b_i) − y_i^- (Σ_{j≠i} w_ji y_j + b_i) = 0. 16

Loopy network. y_i = Θ(Σ_{j≠i} w_ji y_j + b_i), Θ(z) = +1 if z > 0, −1 if z ≤ 0. If y_i^- ≠ sign(Σ_{j≠i} w_ji y_j + b_i), then y_i^+ = −y_i^-, and y_i^+ (Σ_{j≠i} w_ji y_j + b_i) − y_i^- (Σ_{j≠i} w_ji y_j + b_i) = 2 y_i^+ (Σ_{j≠i} w_ji y_j + b_i). This term is always positive! Every flip of a neuron is guaranteed to locally increase y_i (Σ_{j≠i} w_ji y_j + b_i). 17

Globally. Consider the following sum across all nodes: D(y_1, y_2, ..., y_N) = Σ_i y_i (Σ_{j<i} w_ji y_j + b_i) = Σ_{i,j<i} w_ij y_i y_j + Σ_i b_i y_i. The definition is the same as earlier, but avoids double counting and assumes w_ii = 0 for all i. For any unit k that flips because of the local field: ΔD(y_k) = D(y_1, ..., y_k^+, ..., y_N) − D(y_1, ..., y_k^-, ..., y_N). 18

Upon flipping a single unit: ΔD(y_k) = D(y_1, ..., y_k^+, ..., y_N) − D(y_1, ..., y_k^-, ..., y_N). Expanding: ΔD(y_k) = (y_k^+ − y_k^-) Σ_{j≠k} w_jk y_j + (y_k^+ − y_k^-) b_k. All other terms that do not include y_k cancel out. This is always positive! Every flip of a unit results in an increase in D. 19

Hopfield Net. Flipping a unit will result in an increase (non-decrease) of D. D is bounded: D = Σ_{i,j<i} w_ij y_i y_j + Σ_i b_i y_i, so D_max = Σ_{i,j<i} |w_ij| + Σ_i |b_i|. The minimum increment of D in a flip is ΔD_min = min_{i, {y_i, i=1..N}} 2 |Σ_{j≠i} w_ji y_j + b_i|. Any sequence of flips must converge in a finite number of steps. 20

The Energy of a Hopfield Net. Define the energy of the network as E = −Σ_{i,j<i} w_ij y_i y_j − Σ_i b_i y_i. Just the negative of D. The evolution of a Hopfield network constantly decreases its energy. Where did this energy concept suddenly sprout from? 21
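
The energy is easy to compute directly. A small sketch (function and variable names are my own, not from the lecture) that also checks numerically that flipping a misaligned unit never increases E:

```python
import numpy as np

def energy(y, W, b):
    # E = -sum_{i, j<i} w_ij y_i y_j - sum_i b_i y_i
    # np.triu(..., k=1) keeps each pair (i, j) exactly once (no double counting)
    return -np.sum(np.triu(W * np.outer(y, y), k=1)) - b @ y

# Tiny check with a random symmetric W (zero diagonal) and a random +/-1 state
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.standard_normal(6)
y = rng.choice([-1, 1], size=6)
for i in range(6):
    field = W[i] @ y + b[i]
    if np.sign(field) != y[i]:          # misaligned unit: flip it
        e_before = energy(y, W, b)
        y[i] = -y[i]
        assert energy(y, W, b) <= e_before
```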

Analogy: Spin Glasses. Magnetic dipoles. Each dipole tries to align itself to the local field. In doing so it may flip. This will change fields at other dipoles, which may flip, which changes the field at the current dipole. 22

Analogy: Spin Glasses. Total field at the current dipole: f(p_i) = Σ_{j≠i} r x_j / |p_i − p_j|² + b_i (intrinsic + external), where p_i is the vector position of the i-th dipole. The field at any dipole is the sum of the field contributions of all other dipoles. The contribution of a dipole to the field at any point falls off inversely with the square of distance. 23

Analogy: Spin Glasses. Total field at the current dipole: f(p_i) = Σ_{j≠i} r x_j / |p_i − p_j|² + b_i. Response of the current dipole: x_i = x_i if sign(x_i f(p_i)) = 1, −x_i otherwise. A dipole flips if it is misaligned with the field at its location. 24

Analogy: Spin Glasses. Total field at the current dipole: f(p_i) = Σ_{j≠i} r x_j / |p_i − p_j|² + b_i. Response of the current dipole: x_i = x_i if sign(x_i f(p_i)) = 1, −x_i otherwise. Dipoles will keep flipping: a flipped dipole changes the field at other dipoles, some of which will flip, which will change the field at the current dipole, which may flip. Etc. 25

Analogy: Spin Glasses. Total field at the current dipole: f(p_i) = Σ_{j≠i} r x_j / |p_i − p_j|² + b_i. Response of the current dipole: x_i = x_i if sign(x_i f(p_i)) = 1, −x_i otherwise. When will it stop??? 26

Analogy: Spin Glasses. Total field at the current dipole: f(p_i) = Σ_{j≠i} r x_j / |p_i − p_j|² + b_i. Response of the current dipole: x_i = x_i if sign(x_i f(p_i)) = 1, −x_i otherwise. The total potential energy of the system: E = C − (1/2) Σ_i x_i f(p_i) = C − Σ_i Σ_{j>i} r x_i x_j / |p_i − p_j|² − Σ_i b_i x_i. The system evolves to minimize the PE. Dipoles stop flipping if any flips result in an increase of PE. 27

Spin Glasses (figure: PE vs. state). The system stops at one of its stable configurations, where PE is a local minimum. Any small jitter from this stable configuration returns it to the stable configuration. I.e. the system remembers its stable state and returns to it. 28

Hopfield Network. y_i = Θ(Σ_{j≠i} w_ji y_j + b_i), Θ(z) = +1 if z > 0, −1 if z ≤ 0. E = −Σ_{i,j<i} w_ij y_i y_j − Σ_i b_i y_i. This is analogous to the potential energy of a spin glass. The system will evolve until the energy hits a local minimum. 29

Hopfield Network. y_i = Θ(Σ_{j≠i} w_ji y_j + b_i), Θ(z) = +1 if z > 0, −1 if z ≤ 0. E = −Σ_{i,j<i} w_ij y_i y_j − Σ_i b_i y_i. Typically we will not utilize the bias: the bias is similar to having a single extra neuron that is pegged to 1.0. Removing the bias term does not affect the rest of the discussion in any manner. This is analogous to the potential energy of a spin glass. The system will evolve until the energy hits a local minimum. But it is not RIP; we will bring it back later in the discussion. 30

Hopfield Network. y_i = Θ(Σ_{j≠i} w_ji y_j), Θ(z) = +1 if z > 0, −1 if z ≤ 0. E = −Σ_{i,j<i} w_ij y_i y_j. This is analogous to the potential energy of a spin glass. The system will evolve until the energy hits a local minimum. 31

Evolution (figure: PE vs. state). E = −Σ_{i,j<i} w_ij y_i y_j. The network will evolve until it arrives at a local minimum in the energy contour. 32

Content-addressable memory (figure: PE vs. state). Each of the minima is a stored pattern. If the network is initialized close to a stored pattern, it will inevitably evolve to the pattern. This is a content-addressable memory: recall memory content from partial or corrupt values. Also called associative memory. 33

Evolution. E = −Σ_{i,j<i} w_ij y_i y_j. (Image pilfered from unknown source.) The network will evolve until it arrives at a local minimum in the energy contour. 34

Evolution. E = −Σ_{i,j<i} w_ij y_i y_j. The network will evolve until it arrives at a local minimum in the energy contour. We proved that every change in the network will result in a decrease in energy, so the path to the energy minimum is monotonic. 35

Evolution. E = −Σ_{i,j<i} w_ij y_i y_j. For threshold activations the energy contour is only defined on a lattice: corners of a unit cube on [−1,1]^N. For tanh activations it will be a continuous function. 36

Evolution. E = −Σ_{i,j<i} w_ij y_i y_j. For threshold activations the energy contour is only defined on a lattice: corners of a unit cube on [−1,1]^N. For tanh activations it will be a continuous function. 37

Evolution. In matrix form: E = −(1/2) y^T W y. Note the 1/2. For threshold activations the energy contour is only defined on a lattice: corners of a unit cube. For tanh activations it will be a continuous function. 38

Energy contour for a 2-neuron net Two stable states (tanh activation) Symmetric, not at corners Blue arc shows a typical trajectory for sigmoid activation 39

Energy contour for a 2-neuron net. Why symmetric? Because −(1/2) y^T W y = −(1/2) (−y)^T W (−y): if y is a local minimum, so is −y. Two stable states (tanh activation), symmetric, not at corners. Blue arc shows a typical trajectory for sigmoid activation. 40

3-neuron net 8 possible states 2 stable states (hard thresholded network) 41

Examples: Content addressable memory http://staff.itee.uq.edu.au/janetw/cmc/chapters/hopfield/ 42

Hopfield net examples 43

Computational algorithm. 1. Initialize the network with the initial pattern: y_i(0) = x_i, 0 ≤ i ≤ N−1. 2. Iterate until convergence: y_i(t+1) = Θ(Σ_{j≠i} w_ji y_j), 0 ≤ i ≤ N−1. Very simple. Updates can be done sequentially, or all at once. Convergence: E = −Σ_i Σ_{j>i} w_ji y_j y_i does not change significantly any more. 44
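
A runnable sketch of this loop in NumPy (sequential updates in random order; the function name and the "no flips in a full sweep" convergence test are my choices, not part of the lecture):

```python
import numpy as np

def hopfield_recall(W, x, max_sweeps=100):
    # y(0) = x; then y_i <- Theta(sum_{j != i} w_ji y_j) until no neuron flips
    y = x.copy()
    N = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(N):       # sequential (asynchronous) updates
            field = W[i] @ y - W[i, i] * y[i]    # exclude any self-connection
            new = 1 if field > 0 else -1         # Theta, with Theta(0) = -1
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:                          # a full sweep with no flips: converged
            break
    return y
```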

Issues How do we make the network store a specific pattern or set of patterns? How many patterns can we store? 45

Issues How do we make the network store a specific pattern or set of patterns? How many patterns can we store? 46

How do we remember a specific pattern? How do we teach a network to remember this image? For an image with N pixels we need a network with N neurons. Every neuron connects to every other neuron. Weights are symmetric (not mandatory). N(N−1)/2 weights in all. 47

Storing patterns: Training a network. A network that stores pattern P also naturally stores −P. Symmetry: E(P) = E(−P), since E is a function of the products y_i y_j: E = −Σ_i Σ_{j<i} w_ji y_j y_i. 48

A network can store multiple patterns (figure: PE vs. state). Every stable point is a stored pattern, so we could design the net to store multiple patterns. Remember that every stored pattern P is actually two stored patterns, P and −P. 49

Storing a pattern. E = −Σ_i Σ_{j<i} w_ji y_j y_i. Design {w_ij} such that the energy is a local minimum at the desired pattern P = {y_i}. 50

Storing specific patterns. Storing 1 pattern: we want sign(Σ_{j≠i} w_ji y_j) = y_i for all i. This is a stationary pattern. 51

Storing specific patterns. HEBBIAN LEARNING: w_ji = y_j y_i. Storing 1 pattern: we want sign(Σ_{j≠i} w_ji y_j) = y_i for all i. This is a stationary pattern. 52

Storing specific patterns. HEBBIAN LEARNING: w_ji = y_j y_i. Then sign(Σ_{j≠i} w_ji y_j) = sign(Σ_{j≠i} y_j y_i y_j) = sign(Σ_{j≠i} y_j² y_i) = sign(y_i) = y_i. 53

Storing specific patterns. HEBBIAN LEARNING: w_ji = y_j y_i. The pattern is stationary: sign(Σ_{j≠i} w_ji y_j) = sign(Σ_{j≠i} y_j y_i y_j) = sign(Σ_{j≠i} y_j² y_i) = sign(y_i) = y_i. 54

Storing specific patterns. HEBBIAN LEARNING: w_ji = y_j y_i. E = −Σ_i Σ_{j<i} w_ji y_j y_i = −Σ_i Σ_{j<i} y_i² y_j² = −Σ_i Σ_{j<i} 1 = −0.5 N(N−1). This is the lowest possible energy value for the network. 55

Storing specific patterns. HEBBIAN LEARNING: w_ji = y_j y_i. The pattern is STABLE: E = −Σ_i Σ_{j<i} w_ji y_j y_i = −Σ_i Σ_{j<i} y_i² y_j² = −Σ_i Σ_{j<i} 1 = −0.5 N(N−1). This is the lowest possible energy value for the network. 56
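
A small numerical check of the above, using an arbitrary illustrative pattern (variable names are mine): Hebbian weights make the pattern a fixed point, and its energy equals −0.5 N(N−1).

```python
import numpy as np

y = np.array([-1, 1, -1, -1, 1])       # an arbitrary +/-1 pattern, for illustration
W = np.outer(y, y).astype(float)       # Hebbian weights: w_ji = y_j * y_i
np.fill_diagonal(W, 0)                 # no self-connections

# Stationarity: sign(sum_{j != i} w_ji y_j) = y_i for every neuron
assert np.array_equal(np.sign(W @ y), y)

# Energy of the stored pattern: -sum_{j<i} w_ij y_i y_j = -0.5 * N * (N - 1)
N = len(y)
E = -np.sum(np.triu(W * np.outer(y, y), k=1))
assert E == -0.5 * N * (N - 1)
```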

Storing multiple patterns. w_ji = Σ_{p ∈ {y^p}} y_i^p y_j^p, where {y^p} is the set of patterns to store and the superscript p represents the specific pattern. 57

Storing multiple patterns. Let y_p be the vector representing the p-th pattern, and let Y = [y_1 y_2 ...] be a matrix with all the stored patterns. Then W = Σ_p (y_p y_p^T − I) = YY^T − N_p I, where N_p is the number of patterns. 58

Storing multiple patterns. Note that the behavior of E(y) = −(1/2) y^T W y with W = YY^T − N_p I is identical to the behavior with W = YY^T, since the energy landscape only differs by an additive constant: y^T (YY^T − N_p I) y = y^T YY^T y − N N_p. But the latter, W = YY^T, is easier to analyze. Hence in the following slides we will use W = YY^T. 59

Storing multiple patterns. Let y_p be the vector representing the p-th pattern, and let Y = [y_1 y_2 ...] be a matrix with all the stored patterns. Then W = YY^T. Positive semidefinite! 60
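
A sketch of storage and recall with several random patterns (the sizes and corruption level are my choices; with few, well-separated patterns the corrupted input typically falls back to the stored pattern or its negation):

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 64, 4                              # 64-bit patterns, 4 of them (well under 0.15 N)
Y = rng.choice([-1, 1], size=(N, P))      # columns are the stored patterns y_p
W = (Y @ Y.T).astype(float)               # W = Y Y^T
np.fill_diagonal(W, 0)                    # diag(Y Y^T) = N_p, so this equals Y Y^T - N_p I

# Corrupt a few bits of one stored pattern, then let the network evolve
y = Y[:, 0].copy()
flip = rng.choice(N, size=5, replace=False)
y[flip] *= -1

for _ in range(100):                      # asynchronous recall: sweep until no neuron flips
    changed = False
    for i in rng.permutation(N):
        new = 1 if W[i] @ y > 0 else -1
        if new != y[i]:
            y[i], changed = new, True
    if not changed:
        break

print("bits still wrong:", np.sum(y != Y[:, 0]))   # typically 0 (or N, if it lands on the ghost -y)
```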

Issues How do we make the network store a specific pattern or set of patterns? How many patterns can we store? 61

Consider the energy function E = −(1/2) y^T W y − b^T y. Reinstating the bias term for completeness' sake. Remember that we don't actually use it in a Hopfield net. 62

Consider the energy function E = −(1/2) y^T W y − b^T y. This is a quadratic! W is positive semidefinite. Reinstating the bias term for completeness' sake. Remember that we don't actually use it in a Hopfield net. 63

Consider the energy function E = −(1/2) y^T W y − b^T y. This is a quadratic! For Hebbian learning W is positive semidefinite, so y^T W y is a convex quadratic form and E, its negative, is concave. Reinstating the bias term for completeness' sake. Remember that we don't actually use it in a Hopfield net. 64

The energy function E = −(1/2) y^T W y − b^T y. E is a concave quadratic (the negative of a convex quadratic form). 65

The energy function E = −(1/2) y^T W y − b^T y. E is a concave quadratic, shown from above (assuming 0 bias). But the components of y can only take values ±1, i.e. y lies on the corners of the unit hypercube. 66

The energy function E = −(1/2) y^T W y − b^T y. E is a concave quadratic, shown from above (assuming 0 bias). But the components of y can only take values ±1, i.e. y lies on the corners of the unit hypercube. 67

The energy function E = −(1/2) y^T W y − b^T y, with stored patterns marked. The stored values of y are the ones where all adjacent corners are higher on the quadratic. Hebbian learning attempts to make the quadratic steep in the vicinity of stored patterns. 68

Patterns you can store 4-bit patterns Stored patterns would be the corners where the value of the quadratic energy is lowest 69

Patterns you can store: stored patterns and their ghosts (negations). Ideally the stored patterns must be maximally separated on the hypercube. The number of patterns we can store depends on the actual distance between the patterns. 70

How many patterns can we store? Hopfield: a network of N neurons can store up to 0.15N patterns, provided the patterns are random and far apart. 71
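
The ~0.15N figure is easy to probe empirically. A rough sketch (parameters and the strict fixed-point criterion are my choices; it only checks whether stored patterns remain stable, not the size of their basins of attraction):

```python
import numpy as np

def capacity_sweep(N=100, trials=20, seed=0):
    # Fraction of stored random patterns that are exact fixed points of the net,
    # as the number of stored patterns P grows relative to N
    rng = np.random.default_rng(seed)
    for P in (5, 10, 15, 20, 25):
        frac = 0.0
        for _ in range(trials):
            Y = rng.choice([-1, 1], size=(N, P))
            W = (Y @ Y.T).astype(float)
            np.fill_diagonal(W, 0)
            stable = np.all(np.sign(W @ Y) == Y, axis=0)   # no neuron wants to flip
            frac += stable.mean()
        print(f"P={P:2d} (P/N={P/N:.2f}): stable fraction ~ {frac/trials:.2f}")

capacity_sweep()
```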

How many patterns can we store? Problem with Hebbian learning: it focuses on the patterns that must be stored. What about patterns that must not be stored? More recent work: can actually store up to N patterns with non-Hebbian learning; W loses positive semi-definiteness. 72

Storing N patterns (non-Hebbian). Requirement: given y_1, y_2, ..., y_P, design W such that sign(W y_p) = y_p for all target patterns, and there are no other binary vectors for which this holds. I.e. y_1, y_2, ..., y_P are the only binary eigenvectors of W, and the corresponding eigenvalues are positive. 73

Storing N patterns. Simple solution: design W such that y_1, y_2, ..., y_P are the eigenvectors of W. Easily achieved if y_1, y_2, ..., y_P are orthogonal to one another. Let Y = [y_1 y_2 ... y_P r_{P+1} ... r_N], where N is the number of bits, r_{P+1}, ..., r_N are synthetic non-binary vectors, and y_1, ..., y_P, r_{P+1}, ..., r_N are all orthogonal to one another. Set W = Y Λ Y^T. The eigenvalues λ in the diagonal matrix Λ determine the steepness of the energy function around the stored values. What must the eigenvalues corresponding to the y_p's be? What must the eigenvalues corresponding to the r's be? Under no condition can more than N values be stored. 74
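
A sketch of this construction for mutually orthogonal ±1 patterns (the function name, the unit eigenvalues on the stored directions, and the zero eigenvalues on the synthetic directions are my illustrative choices, answering the questions above in one plausible way):

```python
import numpy as np

def eigen_storage(patterns, seed=0):
    # patterns: P mutually orthogonal +/-1 vectors of length N, with P < N
    Yp = np.stack(patterns, axis=1).astype(float)            # N x P
    N, P = Yp.shape
    # Complete to an orthonormal basis: the first P columns of Q span the patterns,
    # the remaining N - P columns are synthetic (non-binary) orthogonal directions
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(np.hstack([Yp, rng.standard_normal((N, N - P))]))
    lam = np.concatenate([np.ones(P), np.zeros(N - P)])      # steep on stored directions, flat elsewhere
    return Q @ np.diag(lam) @ Q.T

# Each stored pattern is then a positive-eigenvalue eigendirection: sign(W y_p) = y_p
y1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
y2 = np.array([1, 1, -1, -1, 1, 1, -1, -1])                  # orthogonal to y1
W = eigen_storage([y1, y2])
assert np.array_equal(np.sign(W @ y1), y1) and np.array_equal(np.sign(W @ y2), y2)
```

With unit eigenvalues on the stored directions and zeros elsewhere, W reduces to the projector onto the span of the patterns; other eigenvalue choices simply change how steep the energy is around the stored values, as the slide notes.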

Storing N patterns. For non-orthogonal patterns the solution is less simple: y_1, y_2, ..., y_P can no longer be exact eigenvectors, but now represent quantized eigen-directions: sign(W y_p) = y_p. Note that this is not an exact eigenvalue equation. Optimization algorithms can provide Ws for many patterns. Under no condition can we store more than N patterns. 75

Alternate Approach to Estimating the Network. E = −(1/2) y^T W y − b^T y. Estimate W (and b) such that E is minimized for y_1, y_2, ..., y_P and E is maximized for all other y. We will encounter this solution again soon. Once again, we cannot store more than N patterns. 76

Storing more than N patterns. How do we even solve the problem of storing N patterns in the first place? How do we increase the capacity of the network, i.e. store more patterns? Common answer to both problems.. 77

Lookahead.. Adding capacity to a Hopfield network 78

Expanding the network: N neurons + K neurons. Add a large number of neurons whose actual values you don't care about! 79

Expanded Network: N neurons + K neurons. New capacity: ~(N+K) patterns, although we only care about the pattern of the first N neurons; we're interested in N-bit patterns. 80

Introducing the Boltzmann machine (N neurons + K neurons). Next regular class. 81