Hopfield Training Rules


To memorise a single pattern

Suppose we set the weights thus:

    w_{ij} = \frac{1}{N} p_i p_j

where,
    w_{ij} is the weight between nodes i and j,
    N is the number of nodes in the network,
    p_i is the value required for the i-th node.

What will the network do when the memorised pattern is presented? For each node, i, its activation, s_i, will be given by:

    s_i = \mathrm{sgn}\Big( \sum_{j=1}^{N} w_{ij} p_j \Big)
        = \mathrm{sgn}\Big( \sum_{j=1}^{N} \frac{1}{N} p_i p_j p_j \Big)
        = \mathrm{sgn}\Big( \frac{1}{N} p_i \sum_{j=1}^{N} (p_j)^2 \Big)
        = \mathrm{sgn}(p_i)
        = p_i

since each p_j is +1 or -1, so (p_j)^2 = 1 and the inner sum is simply N. In other words, the activation of each node will remain unchanged: the memorised pattern is a stable state of the network.

Note that any pattern presented to the network which is similar to the memorised pattern will migrate towards the memorised pattern as the activation rule is repeatedly applied. In fact, if more than half the bits of a presented pattern are the same as the memorised pattern, then the memorised pattern will eventually be recreated in its entirety. If fewer than half are the same, then the inverse of the memorised pattern (+1s instead of -1s and vice versa) will be generated. The memorised pattern and its inverse are attractors, and the network will eventually end up at one of them.
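As a concrete illustration (a minimal sketch, not part of the original notes; Python with NumPy, and the function names are ours), the single-pattern rule and the stability check above come out as:

```python
import numpy as np

def store_pattern(p):
    """Set w_ij = (1/N) * p_i * p_j for a single +/-1 pattern p."""
    N = len(p)
    return np.outer(p, p) / N

def recall_step(W, s):
    """Apply s_i = sgn(sum_j w_ij s_j) to every node (ties broken as +1)."""
    return np.where(W @ s >= 0, 1, -1)

p = np.array([1, -1, 1, 1, -1])
W = store_pattern(p)

assert np.array_equal(recall_step(W, p), p)     # the memorised pattern is stable
assert np.array_equal(recall_step(W, -p), -p)   # ... and so is its inverse

noisy = p.copy()
noisy[0] *= -1                                  # corrupt fewer than half the bits
print(recall_step(W, noisy))                    # recovers p
```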

To memorise more than one pattern

Now suppose we have n patterns which we wish to memorise. Extending the equation we used to set the weights for a single memory, we could try:

    w_{ij} = \frac{1}{N} \sum_{k=1}^{n} p_i^k p_j^k

where,
    w_{ij} is the weight between nodes i and j,
    N is the number of nodes in the network,
    n is the number of patterns to be learnt,
    p_i^k is the value required for the i-th node in pattern k.

This equation will increase the weight between two nodes, i and j, whenever they are both active together. (Note, however, that it is not uncommon to set w_{ii} to 0 for all i.) This should remind you of the Hebb Rule. In fact the equation does more than this: it also reduces the weight between any pair of nodes where one node is active and the other is inactive. For this reason it is sometimes called the Generalised Hebb Rule.

All of the above properties are sound, both computationally and biologically. However, there is one further property of the above equation which is not biologically feasible: the weight between any two nodes which are simultaneously inactive is also increased by the equation. This is why Hopfield networks always create pairs of memories (the desired ones and their inverses); the equation does not (indeed cannot) distinguish between these two situations when the weights are being set.

Summary of Hopfield Network Equations

Weight setting (training) for n memories in an N-node Hopfield network:

    w_{ij} = \frac{1}{N} \sum_{k=1}^{n} p_i^k p_j^k

Execution (iteration until convergence):

    s_i = \mathrm{sgn}\Big( \sum_{j=1}^{N} w_{ij} s_j \Big)
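The train/execute cycle these equations describe can be sketched as follows (again an illustrative Python/NumPy sketch; the asynchronous, random-order execution shown here is one common choice of update scheme, and `train`/`run` are our own names):

```python
import numpy as np

def train(patterns, zero_diagonal=True):
    """Generalised Hebb Rule: w_ij = (1/N) * sum_k p_i^k p_j^k."""
    N = patterns.shape[1]
    W = patterns.T @ patterns / N       # sum of outer products, scaled by 1/N
    if zero_diagonal:
        np.fill_diagonal(W, 0.0)        # the optional w_ii = 0 convention
    return W

def run(W, s, max_sweeps=100, seed=0):
    """Update one node at a time, in random order, until nothing changes."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            new = 1 if W[i] @ s >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                 # a full sweep with no flips: converged
            return s
    return s

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 100))   # n = 3 memories, N = 100 nodes
W = train(patterns)
probe = patterns[0].copy()
probe[:10] *= -1                                # corrupt 10 of the 100 bits
print(np.array_equal(run(W, probe), patterns[0]))   # usually True at this load
```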

Hopfield's Energy Analysis

How can we be sure that repeated applications of the activation equation will result in a stable state being achieved? Hopfield's most important contribution to the study of ANNs was his idea of calculating an energy level for his network. He defined the energy in such a way that states of the network (activations of the nodes) which represented learned memories had the lowest levels of energy. Any other states had a higher energy level, and he showed that applying the activation equation reduces the energy of the network, thus moving the network closer to a learnt memory.

Hopfield defined the energy as follows:

    H \propto -\Big( \sum_i s_i p_i \Big)^2

where,
    s_i is the activation of node i,
    p_i is the i-th bit of a memory.

Clearly, H will be at a minimum when s_i = p_i AND when s_i = -p_i, so both the memory and its inverse will be energy minima of the network, as required. Whilst the precise value of the constant of proportionality is not crucial, it is convenient to set it to 1/(2N). Thus:

    H = -\frac{1}{2N} \Big( \sum_{i=1}^{N} s_i p_i \Big)^2

When considering more than one memory, we can attempt to make them all local minima of H by summing the above over all of the memories:

    H = -\frac{1}{2N} \sum_{k=1}^{n} \Big( \sum_{i=1}^{N} s_i p_i^k \Big)^2

where,
    n is the number of memories.
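A quick numerical check (an illustrative sketch, as before) that a memory and its inverse both sit at the bottom of this energy landscape:

```python
import numpy as np

def energy(patterns, s):
    """H = -(1/2N) * sum_k (sum_i s_i p_i^k)^2, summed over all memories."""
    N = patterns.shape[1]
    overlaps = patterns @ s        # one value of sum_i s_i p_i^k per memory
    return -np.sum(overlaps ** 2) / (2 * N)

rng = np.random.default_rng(2)
patterns = rng.choice([-1, 1], size=(2, 50))    # n = 2 memories, N = 50 nodes
p = patterns[0]

print(energy(patterns, p))      # a stored memory: low energy
print(energy(patterns, -p))     # its inverse: exactly the same energy
print(energy(patterns, rng.choice([-1, 1], size=50)))   # random state: higher
```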

Multiplying this out we get:

    H = -\frac{1}{2N} \sum_{k=1}^{n} \Big( \sum_i s_i p_i^k \Big) \Big( \sum_j s_j p_j^k \Big)
      = -\frac{1}{2} \sum_{i,j} \Big( \frac{1}{N} \sum_{k=1}^{n} p_i^k p_j^k \Big) s_i s_j

and if we substitute using the Generalised Hebb Rule which is used to set the weights, we get:

    H = -\frac{1}{2} \sum_{i,j} w_{ij} s_i s_j

Now, given symmetric connections, w_{ij} = w_{ji}, we can rewrite the above equation as:

    H = C - \sum_{(ij)} w_{ij} s_i s_j

where (ij) means all distinct pairs i, j (i.e. counting 21 as the same pair as 12), and where the w_{ii} terms are included in the constant C.

Following application of the activation equation to a node, i, that node's activation will change from s_i to s_i':

    s_i' = \mathrm{sgn}\Big( \sum_j w_{ij} s_j \Big)

We note that an update to the activation of node i will only have occurred if s_i' = -s_i, and that there will be a consequent change in the energy of the network, which we can describe as H' - H and which will be given by:

    H' - H = \Big( C - \sum_{j \ne i} w_{ij} s_i' s_j \Big) - \Big( C - \sum_{j \ne i} w_{ij} s_i s_j \Big)
           = -\sum_{j \ne i} w_{ij} s_i' s_j + \sum_{j \ne i} w_{ij} s_i s_j

since only node i has changed, and the rest of the terms in the summation over distinct pairs cancel out, leaving just those terms involving node i. Given that s_i' = -s_i, the two summations above are actually the same once the difference in their signs is taken into account:

    H' - H = 2 \sum_{j \ne i} w_{ij} s_i s_j = 2 s_i \sum_j w_{ij} s_j - 2 w_{ii} s_i^2

Now, we know that s_i has a different sign to s_i' = \mathrm{sgn}( \sum_j w_{ij} s_j ), so

    2 s_i \sum_j w_{ij} s_j < 0

and

    2 w_{ii} s_i^2 = 2 w_{ii} = \frac{2}{N} \sum_{k=1}^{n} p_i^k p_i^k = \frac{2n}{N} > 0

unless we've chosen to set w_{ii} to 0 for all i, of course. So H' - H is something less than 0 minus something greater than or equal to 0, i.e.

    H' - H < 0

Therefore any application of the activation rule will result in either no change (convergence on an energy minimum has been achieved) or a decrease in the energy of the system (convergence towards a minimum continues).
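This monotone decrease is easy to check empirically. A sketch (same illustrative setting as before) that tracks the quadratic-form energy across one asynchronous sweep; with w_ii set to 0 the constant C vanishes:

```python
import numpy as np

def energy(W, s):
    """H = -(1/2) * sum_ij w_ij s_i s_j."""
    return -0.5 * s @ W @ s

rng = np.random.default_rng(3)
n, N = 3, 100
patterns = rng.choice([-1, 1], size=(n, N))
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0.0)             # w_ii = 0 for all i

s = rng.choice([-1, 1], size=N)      # start from a random state
prev = energy(W, s)
for i in rng.permutation(N):         # one asynchronous sweep over the nodes
    s[i] = 1 if W[i] @ s >= 0 else -1
    now = energy(W, s)
    assert now <= prev + 1e-12       # the energy never increases
    prev = now
print("final energy:", prev)
```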

Stability of Memories in Hopfield Networks

Let net_i^k be the total input to a node i when a pattern k is to be memorised. For this pattern to be a stable state of the network we need:

    \mathrm{sgn}( net_i^k ) = p_i^k

Now,

    net_i^k = \sum_j w_{ij} p_j^k = \frac{1}{N} \sum_j \sum_l p_i^l p_j^l p_j^k

Within the summation over l we have the special case when l = k. The summation over l yields a term p_i^k p_j^k p_j^k = p_i^k, which is then summed N times in the summation over j and then divided by N, giving a separated-out p_i^k term:

    net_i^k = p_i^k + \frac{1}{N} \sum_j \sum_{l \ne k} p_i^l p_j^l p_j^k

From this we can see that if the second term is zero we have stability: the activation of node i will be the same as p_i^k. Furthermore, we can see that if the second term (the crosstalk term) is small enough, then it will not alter the stability of the node. If the magnitude of the crosstalk term is less than 1, then the sign of net_i^k cannot be changed and stability will be preserved.
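The separation of net_i^k into p_i^k plus crosstalk can be computed directly; a sketch (illustrative, with the diagonal weights kept, as the derivation above assumes):

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 200, 10
patterns = rng.choice([-1, 1], size=(n, N))   # n random +/-1 patterns
W = patterns.T @ patterns / N                 # full Hebb weights, w_ii kept

k, i = 0, 0
net = W[i] @ patterns[k]            # net_i^k = sum_j w_ij p_j^k
crosstalk = net - patterns[k, i]    # what remains after separating out p_i^k
print(patterns[k, i], crosstalk)    # bit i is stable while |crosstalk| < 1
```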

Storage Capacity of the Hopfield Network

We cannot specify the storage capacity of a Hopfield network in a deterministic way: some memories interfere with each other more than others, so any attempt to calculate the capacity of a Hopfield network has to be statistical in nature. We shall now use our earlier stability analysis to create a mathematical artefact which will enable us to derive some statistical results about the memory capacity of a Hopfield network.

Consider the quantity:

    C_i^k = -p_i^k \cdot \frac{1}{N} \sum_j \sum_{l \ne k} p_i^l p_j^l p_j^k

i.e. -p_i^k multiplied by the crosstalk term. If C_i^k is negative, then the crosstalk term has the same sign as p_i^k, so node i is stable. If C_i^k is positive and greater than 1, then the sign of net_i^k will be changed and bit i of pattern k has become unstable. In fact, if we placed pattern k onto the network and applied the activation equation (executed the net), then a different pattern would emerge, with node i having changed value from +1 to -1 or vice versa. Note that C_i^k depends only on the patterns we are trying to store.

We shall now consider what happens to patterns chosen at random, in which each bit has an equal probability of being +1 or -1, and derive an estimate of the probability that any bit is unstable:

    P_{error} = P( C_i^k > 1 )

P_{error} will obviously increase as we increase the number of patterns to be memorised, so we need a criterion for acceptable performance. For example, if we wish the probability of a bit being unstable to be less than 0.01 (P_{error} < 0.01), then we want to be able to calculate the maximum number of patterns, n_{max}, we can store.

It turns out that C_i^k is binomially distributed with mean 0 and variance n/N. If N*n is large, we can use the Central Limit Theorem to approximate the distribution of C_i^k with a Normal (Gaussian) distribution with the same mean and variance.
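The Gaussian approximation can be tested against a direct simulation. In this sketch (illustrative; the tail probability P(C > 1) of a Normal(0, n/N) distribution is 0.5 * erfc(sqrt(N / 2n))), the simulated rate of unstable bits should come out near the Gaussian prediction:

```python
import math
import numpy as np

rng = np.random.default_rng(5)
N, n, trials = 100, 19, 200        # load n/N = 0.19, close to 0.185

unstable = 0
for _ in range(trials):
    patterns = rng.choice([-1, 1], size=(n, N))
    W = patterns.T @ patterns / N
    net = patterns @ W             # net[k, i] = net_i^k (W is symmetric)
    unstable += np.sum(np.where(net >= 0, 1, -1) != patterns)
p_mc = unstable / (trials * n * N)

p_gauss = 0.5 * math.erfc(math.sqrt(N / (2 * n)))   # P(C > 1), C ~ N(0, n/N)
print(p_mc, p_gauss)               # both should be close to 0.01
```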

Now we can draw up a table by selecting different values of P_error and determining what implications they have for n_max/N:

    P_error    n_max/N
    0.001      0.105
    0.0036     0.138
    0.01       0.185
    0.05       0.37
    0.1        0.61

So, if we choose P_error < 0.01, for instance, then n_max must not exceed 0.185*N. If we tried to memorise 0.185*N patterns, then less than 1% of the bits would be unstable initially! Unfortunately, even one unstable bit can have dramatic knock-on effects as we iteratively apply the activation equation, and it is quite possible for an initial 1% of unstable bits to evolve into a situation where the majority have become unstable. It can, in fact, be shown that this kind of avalanche occurs when the number of patterns to be memorised exceeds 0.138*N (or when P_error > 0.0036).

There is not one single correct way in which to approach the question of the capacity of a Hopfield network, and alternatives to the above analysis have been investigated. One of the more useful approaches is to consider the probability of perfect recall of bits and patterns. The outcomes of this alternative approach are merely stated here; see the Hertz, Krogh and Palmer book on the reading list for further details.

In order to recall all bits of a pattern correctly with a 99% probability:

    n_{max} \le \frac{N}{2 \ln N}

In order to recall all bits of all of the patterns perfectly with a 99% probability:

    n_{max} \le \frac{N}{4 \ln N}
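The rows of the table follow from inverting the Gaussian tail: setting P(C > 1) = P_error gives n_max/N = 1/z^2, where z is the standard normal quantile at 1 - P_error. A sketch using only the Python standard library:

```python
import math
from statistics import NormalDist

for p_error in (0.001, 0.0036, 0.01, 0.05, 0.1):
    z = NormalDist().inv_cdf(1 - p_error)    # quantile of the standard normal
    print(f"P_error = {p_error:<7}  n_max/N = {1 / z**2:.3f}")

# The perfect-recall bounds quoted above, evaluated for, say, N = 1000 nodes:
N = 1000
print(N / (2 * math.log(N)))   # ~72 patterns: one pattern fully correct, 99% prob.
print(N / (4 * math.log(N)))   # ~36 patterns: all patterns fully correct, 99% prob.
```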