The Hopfield model. 1 The Hebbian paradigm. Sebastian Seung Lecture 15: November 7, 2002


MIT Department of Brain and Cognitive Sciences
9.29J, Spring Introduction to Computational Neuroscience
Instructor: Professor Sebastian Seung

The Hopfield model
Sebastian Seung
9.641 Lecture 15: November 7, 2002

1 The Hebbian paradigm

In his 1949 book The Organization of Behavior, Donald Hebb predicted a form of synaptic plasticity with the following property:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

In 1973, the phenomenon of long-term potentiation (LTP) was discovered. The study of LTP is now a small industry within the field of neuroscience. Among the various forms of LTP, those that depend on the NMDA subtype of glutamate receptor are regarded as Hebbian. These forms depend on temporal contiguity of presynaptic and postsynaptic activity. The requirement of temporal contiguity is expressed in the following ditty:

Neurons that fire together, wire together.
Neurons that fire out of sync, fail to link.

Hebbian synaptic plasticity is remarkable for having been predicted on theoretical grounds well before it was discovered experimentally. This is not a common occurrence in biology.

How did Hebb make his remarkable prediction? He was motivated by an old tradition in Western thought known as associationism. This is the idea that the brain is nothing more than an engine for storing and retrieving associations. In the late 19th century, it was found that the brain consisted of neurons connected by an intricate web of synapses. This transformed the older associationist tradition into connectionism, the doctrine that associations are stored as synaptic connections. This is an example of the idea that structure determines function in biology. Hume and other empiricist philosophers had already expressed the idea that associations were learned from temporal contiguity. It only stood to reason that connections should also be learned from temporal contiguity.

While these ideas are very suggestive, they are admittedly rather vague and metaphorical. The challenge of connectionism is to make these ideas precise. A number of theorists have formulated neural network models with the goal of explaining how Hebbian synaptic plasticity could be used to store memories.

2 Binary model neurons

For the most part, we have studied neural network models in which the activity of each neuron is described by a single analog variable. A more drastic simplification is to use binary neurons, which are either active ($s_i = 1$) or inactive ($s_i = -1$). In the dynamics of such a network, each neuron updates itself according to the rule

$$s_i' = \mathrm{sgn}\Big(\sum_j W_{ij} s_j\Big), \tag{1}$$

where $s_i'$ is the state of neuron $i$ after the update. Note that sometimes $s_i' = s_i$, in which case the update results in no change. If the argument of the sgn function happens to be zero, there is an ambiguity in the definition, which could be resolved by a coin flip or by arbitrarily defining $\mathrm{sgn}(0) = 1$.

Equation (1) can be used to define several types of network dynamics, depending on the exact manner in which it is applied. In a sequential dynamics, the neurons are updated one at a time, typically in a random order. In a parallel dynamics, Eq. (1) is applied to all neurons simultaneously. It is often easier to prove theorems about the sequential case, but the parallel case is sometimes easier to implement (for example, in MATLAB it is more natural).

In a simulation on a digital computer, it is natural to make the updates at discrete times. However, the update times could in principle be continuous. For example, they could occur at random for each neuron according to a Poisson process with some mean rate. As we shall see, all these versions of network dynamics behave qualitatively the same in general, but there can be subtle differences.

3 Hopfield model

Suppose that synapses change according to the learning rule

$$\Delta W_{ij} = \eta s_i s_j$$

where $\eta > 0$ is a learning rate. This could be called Hebbian, although it has some weird features. The synapse is potentiated if both neurons are active, or both are inactive. The synapse is depressed if one neuron is active, and the other is inactive.

Suppose further that the network is exposed to $P$ binary patterns $\xi^1, \ldots, \xi^P$ during a training phase. More specifically, the state $s$ is set to each pattern in turn, and the synaptic weights are changed by the Hebbian update.
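The training phase just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hebbian_train`, the list-of-lists weight matrix, and the two example patterns are made up here and are not part of the lecture. The sketch starts from W = 0, sets the state to each pattern in turn, and applies the Hebbian update once per pattern.

```python
def hebbian_train(patterns, eta):
    """Accumulate Delta W_ij = eta * s_i * s_j over the training patterns.

    patterns: list of equal-length lists with entries +1 or -1
              (a hypothetical input format chosen for this sketch).
    eta:      positive learning rate.
    """
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]         # W = 0 before training
    for s in patterns:                        # set the state to each pattern in turn
        for i in range(n):
            for j in range(n):
                W[i][j] += eta * s[i] * s[j]  # Hebbian update
    return W


# Illustrative patterns (invented for this example), N = 4 neurons.
patterns = [[1, 1, -1, -1],
            [1, -1, 1, -1]]
W = hebbian_train(patterns, eta=0.25)

# Neurons 0 and 1 agree in the first pattern and disagree in the second,
# so one potentiation and one depression cancel.
print(W[0][1])   # 0.0
# Neurons 0 and 3 disagree in both patterns, so the synapse is depressed twice.
print(W[0][3])   # -0.5
```

The resulting matrix is symmetric by construction, because the update treats the two neurons of each pair identically.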
If $W_{ij} = 0$ at the beginning of the training phase, Hebbian learning will result in

$$W_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \tag{2}$$

The prefactor $1/N$ corresponds to a particular choice of learning rate $\eta = 1/N$. It is a convenient choice for the calculations we are about to perform, but is otherwise arbitrary, as $W$ can be multiplied by any positive prefactor without affecting the dynamics (1).

This synaptic weight matrix is the famous Hopfield model, along with the dynamics (1), and the assumption that the patterns are chosen at random. This last assumption makes sense if we assume that there is a data compression stage that encodes sensory data efficiently before it reaches the Hopfield network.

In his original paper, Hopfield set the diagonal terms of the weight matrix to be zero ($W_{ii} = 0$). If this is not done, the dynamics takes the form

$$s_i' = \mathrm{sgn}\Big(\frac{P}{N} s_i + \sum_{j,\, j \neq i} W_{ij} s_j\Big)$$

The network still basically works, but there are some subtle differences, which we will discuss later.

As we shall see in the following, the Hopfield model functions as an associative memory, because the patterns are stored as dynamical attractors. If the network is initialized with a corrupted or incomplete version of a pattern, convergence to an attractor can recall the correct pattern.

4 Single pattern

Let's start with a simple case, a single pattern

$$W_{ij} = \frac{1}{N} \xi_i \xi_j \tag{3}$$

The dynamics takes the form

$$s_i' = \mathrm{sgn}\Big(\xi_i \frac{1}{N} \sum_j \xi_j s_j\Big)$$

In terms of the overlap $m = \frac{1}{N} \sum_j \xi_j s_j$ between $\xi$ and $s$, we can write

$$s_i' = \mathrm{sgn}(\xi_i m)$$

Since $m = 1$ when $s = \xi$, it is a steady state of the dynamics. Similarly, $m = -1$ when $s = -\xi$, so it is also a steady state of the dynamics. If $m > 0$, updates can only cause $m$ to increase. Likewise if $m < 0$, updates can only cause $m$ to decrease. Therefore these steady states are attractors of the dynamics.
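The single-pattern argument can be checked with a small sketch. The helper names (`single_pattern_weights`, `overlap`, `parallel_update`), the example pattern, and the choice of a parallel update are assumptions made for illustration. The convention sgn(0) = +1 is one of the two options mentioned in Section 2. A cue with positive overlap should be mapped to ξ, and a cue with negative overlap to −ξ.

```python
def sgn(x):
    """Sign function, with the arbitrary convention sgn(0) = +1."""
    return 1 if x >= 0 else -1


def single_pattern_weights(xi):
    """W_ij = xi_i * xi_j / N, as in Eq. (3)."""
    n = len(xi)
    return [[xi[i] * xi[j] / n for j in range(n)] for i in range(n)]


def overlap(xi, s):
    """m = (1/N) * sum_j xi_j * s_j."""
    return sum(a * b for a, b in zip(xi, s)) / len(xi)


def parallel_update(W, s):
    """Apply Eq. (1) to all neurons simultaneously and return the new state."""
    n = len(s)
    return [sgn(sum(W[i][j] * s[j] for j in range(n))) for i in range(n)]


xi = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]   # made-up pattern, N = 10
W = single_pattern_weights(xi)

# Flip the first 3 bits: 7 bits agree and 3 disagree, so m = 0.4 > 0.
cue = [-x for x in xi[:3]] + xi[3:]
recalled = parallel_update(W, cue)

# Flip the first 7 bits: 3 bits agree and 7 disagree, so m = -0.4 < 0.
cue_neg = [-x for x in xi[:7]] + xi[7:]
recalled_neg = parallel_update(W, cue_neg)

print(recalled == xi)                      # True
print(recalled_neg == [-x for x in xi])    # True
```

With a single stored pattern, one parallel update is enough, because every neuron sees the same sign of the overlap.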

5 Many patterns

The case of many patterns is more interesting. The weight matrix (2) is just the superposition of single pattern outer products (3). The danger here is that the patterns might interfere with each other. If the patterns were exactly orthogonal to each other, there would be no interference. If they are chosen randomly, there is some possibility of crosstalk, which is quantified below.

To check whether the patterns are steady states, we calculate

$$\sum_{j=1}^{N} W_{ij} \xi_j^\nu = \frac{1}{N} \sum_{j=1}^{N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \xi_j^\nu \tag{4}$$

$$= \xi_i^\nu + \frac{1}{N} \sum_{j=1}^{N} \sum_{\mu \neq \nu} \xi_i^\mu \xi_j^\mu \xi_j^\nu \tag{5}$$

$$= \xi_i^\nu \Big(1 + \frac{1}{N} \sum_{j=1}^{N} \sum_{\mu \neq \nu} \xi_i^\nu \xi_i^\mu \xi_j^\mu \xi_j^\nu\Big) \tag{6}$$

The last term in the parentheses is called the crosstalk or interference term. If it is greater than $-1$ for all $i$, then the $\nu$th pattern is a steady state,

$$\xi_i^\nu = \mathrm{sgn}\Big(\sum_j W_{ij} \xi_j^\nu\Big)$$

If we try to store too many patterns, then the interference between them becomes large, and storage is not possible.

To study the stability of a pattern, imagine that the network is initialized at the pattern, and try to estimate how many bit flips take place. From here on, $p$ denotes the number of stored patterns (written $P$ above). We can approximate $N$ times the crosstalk term as the sum of $Np$ independent coin flips, each equal to $\pm 1$. There is an error when the sum is more negative than $-N$. The derivation in the book uses the Gaussian approximation to the binomial distribution. We will do something coarser, which is the Hoeffding bound on the deviation between frequency and probability. According to the one-sided Hoeffding bound for $m$ coin tosses,

$$\mathrm{Prob}[\hat{p} < p - \epsilon] < \exp(-2 \epsilon^2 m)$$

where, in this one formula, $\hat{p}$ is the observed frequency of heads and $p$ is its probability. Here we are interested in deviations of the frequency of heads from $1/2$ by more than $1/(2p)$, with $m = Np$ tosses. We find that

$$P_{\mathrm{error}} < \exp\big(-2 N p\, (1/2p)^2\big) = \exp\Big(-\frac{N}{2p}\Big)$$

There are several definitions of capacity.

(fraction of bits are corrupted, $P_{\mathrm{error}} < 0.01$) Suppose that we would like less than one percent of the bits corrupted. Then we need $\exp(-N/(2p)) < 0.01$, or $p < -N/(2 \log 0.01) \approx 0.11 N$. The problem with this calculation is that the flipping of the first bits could cause an avalanche. In reality, the capacity is about $p = 0.14 N$. Calculating this number requires some heavy mathematics, like the replica trick.

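(fraction of patterns are corrupted, $P_{\mathrm{error}} < 0.01/N$) We could impose a more stringent requirement, which is that less than one percent of the patterns have a bit corrupted. This gives

$$p = \frac{N}{2 \log N}$$

(fraction of samples are corrupted, $P_{\mathrm{error}} < 0.01/(pN)$) The most stringent requirement is that no pattern in the sample is corrupted, with confidence greater than 99%. Taking $\log p \approx \log N$, we obtain

$$p = \frac{N}{4 \log N}$$

Is this good? According to the Cover argument, the maximum should be $p = 2N$.

We can store patterns as attractors of the network dynamics (with some corruption). However, there can be other attractors, or spurious states. These are of three types.

There are reversed states $-\xi^\mu$ due to the $\pm$ symmetry of the network dynamics.

There are mixture states, which are a superposition of an odd number of patterns. For example,

$$\xi_i^{\mathrm{mix}} = \mathrm{sgn}(\pm \xi_i^{\mu_1} \pm \xi_i^{\mu_2} \pm \xi_i^{\mu_3})$$

An even number won't work, because the sum works out to zero on some sites.

There are spin glass states for large $p$, which are not correlated with any finite number of the patterns $\xi^\mu$.

6 Energy function

If the interactions $W_{ij}$ are symmetric, then

$$E = -\frac{1}{2} \sum_{ij} W_{ij} s_i s_j$$

is nonincreasing under the dynamics (1), assuming asynchronous update. To prove this, note that the $i = j$ terms in the sum are unchanging, since $s_i^2 = 1$. Therefore, using the symmetry of $W$, the change in $E$ due to the update (1) of neuron $i$ is given by

$$\Delta E = -(s_i' - s_i) \sum_{j,\, j \neq i} W_{ij} s_j \tag{7}$$

$$= -(s_i' - s_i) \sum_j W_{ij} s_j + (s_i' - s_i) W_{ii} s_i \tag{8}$$

If $s_i' = s_i$, then $\Delta E = 0$ and we are done. In the other case, $s_i' = -s_i$, and we can write

$$\Delta E = -2 s_i' \sum_j W_{ij} s_j - 2 W_{ii} s_i^2 \leq 0$$

The first term is nonpositive because, by (1), $s_i'$ has the same sign as $\sum_j W_{ij} s_j$. The second term is nonpositive provided $W_{ii} \geq 0$, which holds for the Hopfield weight matrix (2), where $W_{ii} = P/N$, and when the diagonal is set to zero.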
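The inequality above can be checked numerically. The following is a minimal sketch under assumptions chosen for convenience: a random symmetric weight matrix with integer entries and zero diagonal (so the arithmetic is exact), a fixed random seed, and helper names (`energy`, `random_symmetric_weights`, `asynchronous_energies`) that are invented for this example. It records E after every single-neuron update and confirms that the sequence never increases.

```python
import random


def sgn(x):
    """Sign function, with the arbitrary convention sgn(0) = +1."""
    return 1 if x >= 0 else -1


def energy(W, s):
    """E = -(1/2) * sum_ij W_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def random_symmetric_weights(n, rng):
    """Symmetric integer weights with zero diagonal (an arbitrary test case)."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.randint(-3, 3)
    return W


def asynchronous_energies(W, s, sweeps, rng):
    """Update one neuron at a time in random order; record E after each update."""
    n = len(s)
    energies = [energy(W, s)]
    for _ in range(sweeps):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            s[i] = sgn(sum(W[i][j] * s[j] for j in range(n)))
            energies.append(energy(W, s))
    return energies


rng = random.Random(0)            # fixed seed, chosen arbitrarily
n = 12
W = random_symmetric_weights(n, rng)
s = [rng.choice([-1, 1]) for _ in range(n)]
energies = asynchronous_energies(W, s, sweeps=5, rng=rng)

# Each recorded energy should be no larger than the one before it.
print(all(b <= a for a, b in zip(energies, energies[1:])))   # True
```

The same check is not expected to pass in general if all neurons are updated in parallel, since the proof relies on changing one neuron at a time.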

Therefore the dynamics can be understood as descent on an energy landscape.

Statistical physicists like binary neurons because they are analogous to magnetic spins. The synaptic interaction $W_{ij}$ can be compared to a magnetic interaction. If $W_{ij} > 0$, the spins want to line up in the same direction, as in a ferromagnet. If $W_{ij} < 0$, the interaction is called antiferromagnetic.

7 Energy function

Let's consider the case $W_{ij} = 1/N$ for all $i$ and $j$. This corresponds to a pure ferromagnet, in which all interactions try to make the spins line up in the same direction. In this case,

$$E = -\frac{1}{2N} \Big(\sum_i s_i\Big)^2$$

Obviously there are two minima, the fully magnetized states $s_i = 1$ for all $i$ and $s_i = -1$ for all $i$. Any initial condition with more up spins than down spins will converge to the all up state, and a similar statement can be made about the down state.

Suppose that we would like to store a single pattern $\xi$ as an attractor of the dynamics. This is basically the same as the previous case. If we choose

$$W_{ij} = \frac{1}{N} \xi_i \xi_j$$

then the energy function is

$$E = -\frac{1}{2N} \sum_{ij} \xi_i \xi_j s_i s_j \tag{9}$$

$$= -\frac{1}{2N} \Big(\sum_i \xi_i s_i\Big)^2 \tag{10}$$

The attractors of this dynamics are $s = \xi$ and $s = -\xi$, as required. In statistical mechanics, this is known as the Mattis model.

Note that the energy function for the network with many patterns, with the weight matrix (2), is

$$E = -\frac{1}{2N} \sum_{\mu} \Big(\sum_i s_i \xi_i^\mu\Big)^2$$

It is plausible that the patterns should be local minima, but they might interfere with each other.

8 Content-addressable memory

The input is encoded in the initial condition of the network dynamics. Convergence to an attractor corresponds to recall of the closest stored memory.
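As a small illustration of content-addressable recall, the sketch below stores three patterns with the Hopfield weight matrix (2), corrupts one bit of a stored pattern, and runs sequential updates until nothing changes. Everything specific here is an assumption made for the example: the function names `hopfield_weights` and `recall`, the stopping rule, the seed, and the three hand-picked patterns. The patterns are mutually orthogonal, the interference-free case mentioned in Section 5, so this shows the mechanism and says nothing about capacity with random patterns.

```python
import random


def sgn(x):
    """Sign function, with the arbitrary convention sgn(0) = +1."""
    return 1 if x >= 0 else -1


def hopfield_weights(patterns):
    """W_ij = (1/N) * sum_mu xi_i^mu xi_j^mu, as in Eq. (2)."""
    n = len(patterns[0])
    return [[sum(xi[i] * xi[j] for xi in patterns) / n for j in range(n)]
            for i in range(n)]


def recall(W, cue, rng, max_sweeps=20):
    """Sequential updates in random order until a full sweep changes nothing."""
    s = list(cue)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            new = sgn(sum(W[i][j] * s[j] for j in range(n)))
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


# Three mutually orthogonal patterns, chosen by hand so that crosstalk vanishes.
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
W = hopfield_weights(patterns)

cue = list(patterns[1])
cue[0] = -cue[0]                  # corrupt one bit of the second pattern
result = recall(W, cue, random.Random(0))
print(result == patterns[1])      # True
```

The initial condition plays the role of the query, and the fixed point reached by the dynamics plays the role of the retrieved memory.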
