The Variance of Covariance Rules for Associative Matrix Memories and Reinforcement Learning


NOTE Communicated by David Willshaw

The Variance of Covariance Rules for Associative Matrix Memories and Reinforcement Learning

Peter Dayan, Terrence J. Sejnowski
Computational Neurobiology Laboratory, The Salk Institute, P.O. Box 85800, San Diego, CA 92186-5800 USA

Neural Computation 5, 205-209 (1993). © 1993 Massachusetts Institute of Technology

Hebbian synapses lie at the heart of most associative matrix memories (Kohonen 1987; Hinton and Anderson 1981) and are also biologically plausible (Brown et al. 1990; Baudry and Davis 1991). Their analytical and computational tractability make these memories the best understood form of distributed information storage. A variety of Hebbian algorithms for estimating the covariance between input and output patterns has been proposed. This note points out that one class of these involves stochastic estimation of the covariance, shows that the signal-to-noise ratios of the rules are governed by the variances of their estimates, and considers some parallels in reinforcement learning.

Associations are to be stored between $R$ pairs $[a(\omega), b(\omega)]$ of patterns, where $a(\omega) \in \{0,1\}^m$ and $b(\omega) \in \{0,1\}^n$, using the real-valued elements of an $m \times n$ matrix $W$. Elements of $a(\omega)$ and $b(\omega)$ are set independently with probabilities $p$ and $r$, respectively, of being 1. A learning rule specifies how element $W_{ij}$ changes in response to the input and output values of a particular pair; the model adopted here (from Palm 1988a,b) considers local rules with additive weight changes, for which

$$W_{ij} = \sum_{\omega=1}^{R} \Delta_{ij}(\omega), \qquad \Delta_{ij}(\omega) = f[a_i(\omega), b_j(\omega)],$$

and $f$ can be represented as $[\alpha, \beta, \gamma, \delta]$ based on

$$f[0,0] = \alpha, \quad f[0,1] = \beta, \quad f[1,0] = \gamma, \quad f[1,1] = \delta,$$

where the first argument is the input $a_i(\omega)$ and the second the output $b_j(\omega)$.

One way to measure the quality of a rule is the signal-to-noise ratio (S/N) of the output of a single "line" or element of the matrix, which is a measure of how well outputs that should be 0 can be discriminated from outputs that should be 1. The larger the S/N, the better the memory will perform (see Willshaw and Dayan 1990 for a discussion).
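To make the storage model concrete, here is a minimal sketch of this accumulation (our illustration, not code from the note; the function name and random-pattern setup are assumptions). For the covariance rule defined below, each element accumulates exactly $\sum_\omega (a_i(\omega) - p)(b_j(\omega) - r)$:

```python
import numpy as np

def build_memory(A, B, rule):
    """Accumulate W_ij = sum_w f[a_i(w), b_j(w)] for a local additive rule.

    A    : R x m array of binary input patterns a(w)
    B    : R x n array of binary output patterns b(w)
    rule : (alpha, beta, gamma, delta) = (f[0,0], f[0,1], f[1,0], f[1,1])
    """
    alpha, beta, gamma, delta = rule
    A, B = A.astype(float), B.astype(float)
    # For a, b in {0,1}: f[a,b] = alpha(1-a)(1-b) + beta(1-a)b + gamma a(1-b) + delta ab,
    # so the sum over patterns is a combination of four outer-product accumulations.
    return (alpha * (1 - A).T @ (1 - B) + beta * (1 - A).T @ B
            + gamma * A.T @ (1 - B) + delta * A.T @ B)

rng = np.random.default_rng(0)
R, m, n, p, r = 500, 100, 100, 0.1, 0.1
A = (rng.random((R, m)) < p).astype(int)   # input bits: 1 with probability p
B = (rng.random((R, n)) < r).astype(int)   # output bits: 1 with probability r

f_cov = (p * r, -p * (1 - r), -(1 - p) * r, (1 - p) * (1 - r))
W = build_memory(A, B, f_cov)
```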

A wide variety of Hebbian learning rules has been proposed for hetero- and autoassociative networks (Kohonen 1987; Sejnowski 1977a; Hopfield 1982; Perez-Vincente and Amit 1989; Tsodyks and Feigel'man 1988). The covariance learning rule $f_{\text{cov}} = [pr, -p(1-r), -(1-p)r, (1-p)(1-r)]$ has the highest S/N (Willshaw and Dayan 1990; Dayan and Willshaw 1991); however, both it and a related rule, $f_{\text{prd}} = [-pr, -pr, -pr, 1-pr]$ (Sejnowski 1977a), have the drawback that $\alpha \neq 0$, that is, a weight should change even if both input and output are silent. Note the motivations behind these rules:

$$f_{\text{cov}} \sim (\text{input} - p) \times (\text{output} - r)$$
$$f_{\text{prd}} \sim \text{input} \times \text{output} - pr$$

Alternative rules have been suggested that better model the physiological phenomena of long-term potentiation (LTP) and depression (LTD) in the visual cortex and hippocampus, including the heterosynaptic rule $f_{\text{het}} = [0, -p, 0, 1-p]$ (Stent 1973; Rauschecker and Singer 1979) and the homosynaptic rule $f_{\text{hom}} = [0, 0, -r, 1-r]$ (Sejnowski 1977b; Stanton and Sejnowski 1989), motivated as

$$f_{\text{het}} \sim (\text{input} - p) \times \text{output}$$
$$f_{\text{hom}} \sim \text{input} \times (\text{output} - r)$$

These have been shown to have lower S/Ns than the covariance rule (Willshaw and Dayan 1990); however, for sparse patterns, that is, low values of $p$ and $r$, this difference becomes small. The sparse limit is interesting theoretically, because many more patterns can be stored, and empirically, because the cortex has been thought to employ it (see, for example, Abeles et al. 1990).

All of these rules are effectively stochastic approximations of the covariance between input and output, $\langle (a_i(\omega) - \bar{a}_i)(b_j(\omega) - \bar{b}_j) \rangle$, where the averages $\bar{a}_i$ and $\bar{b}_j$ are taken over the distributions generating the patterns; they all share this as their common mean.¹ If inputs and outputs are independent, as is typically the case for heteroassociative memories, or autoassociative ones without the identity terms, then their common expected value is zero. However, the rules differ in their variances as estimates of the covariance. Since it is departures of this quantity from its expected value that mark the particular patterns the matrix has learned, one would expect that the lower the variance of the estimate, the better the rule. This turns out to be true: for independent inputs and outputs, the S/Ns of the rules are inversely proportional to their variances. $f_{\text{cov}}$ is the best and $f_{\text{prd}}$ the worst, but the ordering of the other two depends on $p$ and $r$.

¹This is true for all rules $[\eta r, -\eta(1-r), -(1-\eta)r, (1-\eta)(1-r)]$ and $[p\nu, -p(1-\nu), -(1-p)\nu, (1-p)(1-\nu)]$, for any $\eta$ or $\nu$.
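The ordering can be checked numerically with a short sketch (again our illustration, not from the note). For a single independent pair, every rule has mean zero, $f_{\text{cov}}$ has the smallest variance, $f_{\text{prd}}$ the largest, and $f_{\text{het}}$ and $f_{\text{hom}}$ swap places according to whether $p > r$ or $p < r$:

```python
# Mean and variance of f[a, b] for independent a ~ Bernoulli(p), b ~ Bernoulli(r).
def rule_moments(rule, p, r):
    probs = {(0, 0): (1 - p) * (1 - r), (0, 1): (1 - p) * r,
             (1, 0): p * (1 - r), (1, 1): p * r}
    f = dict(zip([(0, 0), (0, 1), (1, 0), (1, 1)], rule))
    mean = sum(probs[ab] * f[ab] for ab in probs)
    var = sum(probs[ab] * (f[ab] - mean) ** 2 for ab in probs)
    return mean, var

p, r = 0.1, 0.05
rules = {
    "cov": (p * r, -p * (1 - r), -(1 - p) * r, (1 - p) * (1 - r)),
    "prd": (-p * r, -p * r, -p * r, 1 - p * r),
    "het": (0.0, -p, 0.0, 1 - p),
    "hom": (0.0, 0.0, -r, 1 - r),
}
for name, rule in rules.items():
    mean, var = rule_moments(rule, p, r)
    print(f"f_{name}: mean = {mean:+.5f}, variance = {var:.5f}")
```

With these values the variances come out as $p(1-p)r(1-r)$ for $f_{\text{cov}}$, $p(1-p)r$ for $f_{\text{het}}$, $pr(1-r)$ for $f_{\text{hom}}$, and $pr(1-pr)$ for $f_{\text{prd}}$, which also makes the sparse-limit observation visible: as $p, r \to 0$ the four variances converge.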

Circumstances arise under which the optimal rule differs, as for instance if patterns are presented multiple times but input lines can fail to fire on particular occasions; this would favor the homosynaptic rule.

Exactly the same effect underlies the differences in efficacy between various comparison rules for reinforcement learning. Sutton (1984) studied a variety of two-armed bandit problems, which are conventional tasks for stochastic learning automata. On trial $\omega$, a system emits action $y(\omega) \in \{0,1\}$ (i.e., pulls either the left or the right arm) and receives a probabilistic reward $r(\omega) \in \{0,1\}$ from its environment, where

$$P[r(\omega) = 1 \mid y(\omega) = a] = p_a, \qquad a \in \{0,1\}.$$

In the supervised learning case above, the goal was to calculate the covariance between the input and output. Here, however, the agent has to measure the covariance between its output and the reward in order to work out which action it is best to emit (i.e., which arm it is best to pull). Sutton evaluated $r(\omega)[y(\omega) - \langle y(\omega)\rangle]$ and an approximation to $[r(\omega) - \langle r(\omega)\rangle][y(\omega) - \langle y(\omega)\rangle]$, where $\langle y(\omega)\rangle$ averages over the stochastic process generating the outputs and $\langle r(\omega)\rangle$ is the expected reinforcement given the stochastic choice of $y(\omega)$. These are direct analogues of $f_{\text{het}}$ or $f_{\text{hom}}$ [depending on whether $y(\omega)$ is mapped to $a(\omega)$ or $b(\omega)$] and $f_{\text{cov}}$, respectively, and Sutton showed the latter significantly outperformed the former.

There is, however, an even better estimator. In the previous case, $a$ and $b$ were independent; here, by contrast, $r(\omega)$ is a stochastic function of $y(\omega)$. The learning rule that minimizes the variance of the estimate of the covariance is actually

$$[r(\omega) - \tilde{r}][y(\omega) - \langle y(\omega)\rangle], \qquad \text{where} \qquad \tilde{r} = P[y(\omega) = 0]\,p_1 + P[y(\omega) = 1]\,p_0$$

pairs the probability of emitting action 0 with the reinforcement for emitting action 1. Williams (personal communication) suggested $\tilde{r}$ on just these grounds, and simulations (Dayan 1991) confirm that it does indeed afford an improvement.

Four previously suggested Hebbian learning rules have been shown to be variants of stochastic covariance estimators. The differences between their performances, in terms of the signal-to-noise ratio they produce in an associative matrix memory, may be attributed to the differences in the variances of their estimates of the covariance. The same effect underlies the performance of reinforcement comparison learning rules, albeit suggesting a different optimum.
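A minimal simulation sketch of the bandit comparison (our illustration, not Sutton's or Dayan's original experiments; the action probability `P_y1` and reward probabilities `p0`, `p1` are assumed values) checks that all three update terms have the true covariance as their mean while differing in variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed two-armed bandit: the action probability and the reward
# probabilities p0 = P[r=1 | y=0] and p1 = P[r=1 | y=1] are illustrative.
P_y1 = 0.3
p0, p1 = 0.2, 0.8
trials = 200_000

y = (rng.random(trials) < P_y1).astype(float)
r = (rng.random(trials) < np.where(y == 1, p1, p0)).astype(float)

y_bar = P_y1                              # <y(w)>
r_bar = (1 - P_y1) * p0 + P_y1 * p1       # <r(w)>
r_tilde = (1 - P_y1) * p1 + P_y1 * p0     # cross-paired baseline r~

estimators = {
    "r (y - <y>)        ": r * (y - y_bar),                   # f_het / f_hom analogue
    "(r - <r>)(y - <y>) ": (r - r_bar) * (y - y_bar),         # f_cov analogue
    "(r - r~)(y - <y>)  ": (r - r_tilde) * (y - y_bar),       # Williams' baseline
}
print(f"true covariance = {P_y1 * (1 - P_y1) * (p1 - p0):.4f}")
for name, est in estimators.items():
    print(f"{name} mean = {est.mean():+.4f}  variance = {est.var():.4f}")
```

With these assumed numbers, all three means sit near the true covariance $0.126$, and the variances come out roughly $0.114$ with no baseline, $0.046$ with the $\langle r(\omega)\rangle$ baseline, and $0.034$ with $\tilde{r}$; indeed, minimizing the variance over constant baselines $c$ in $(r - c)(y - \langle y\rangle)$ gives $c = P[y=0]\,p_1 + P[y=1]\,p_0 = \tilde{r}$ exactly.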

Acknowledgments

We are very grateful to Steve Nowlan and David Willshaw for helpful comments. Support was from the SERC and the Howard Hughes Medical Institute.

References

Abeles, M., Vaadia, E., and Bergman, H. 1990. Firing patterns of single units in the prefrontal cortex and neural network models. Network 1, 13-25.

Anderson, J. A., and Rosenfeld, E., eds. 1988. Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA.

Baudry, M., and Davis, J. L. 1991. Long-Term Potentiation: A Debate of Current Issues. MIT Press, Cambridge, MA.

Brown, T. H., Kairiss, E. W., and Keenan, C. L. 1990. Hebbian synapses: Biophysical mechanisms and algorithms. Annu. Rev. Neurosci. 13, 475-512.

Dayan, P. 1991. Reinforcement comparison. In Connectionist Models: Proceedings of the 1990 Summer School, D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton, eds. Morgan Kaufmann, San Mateo, CA.

Dayan, P., and Willshaw, D. J. 1991. Optimal synaptic learning rules in linear associative memories. Biol. Cybernet. 65, 253-265.

Hinton, G. E., and Anderson, J. A., eds. 1981. Parallel Models of Associative Memory. Lawrence Erlbaum, Hillsdale, NJ.

Hopfield, J. J. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A. 79, 2554-2558.

Kohonen, T. 1987. Content-Addressable Memories, 2nd ed. Springer-Verlag, Berlin.

Palm, G. 1988a. On the asymptotic information storage capacity of neural networks. In Neural Computers, NATO ASI Series, Vol. F41, R. Eckmiller and C. von der Malsburg, eds., 271-280. Springer-Verlag, Berlin.

Palm, G. 1988b. Local synaptic rules with maximal information storage capacity. In Neural & Synergetic Computers, Springer Series in Synergetics, Vol. 42, H. Haken, ed., 100-110. Springer-Verlag, Berlin.

Perez-Vincente, C. J., and Amit, D. J. 1989. Optimised network for sparsely coded patterns. J. Phys. A: Math. Gen. 22, 559-569.

Rauschecker, J. P., and Singer, W. 1979. Changes in the circuitry of the kitten's visual cortex are gated by postsynaptic activity. Nature (London) 280, 58-60.

Sejnowski, T. J. 1977a. Storing covariance with nonlinearly interacting neurons. J. Math. Biol. 4, 303-321.

Sejnowski, T. J. 1977b. Statistical constraints on synaptic plasticity. J. Theoret. Biol. 69, 385-389.

Stanton, P., and Sejnowski, T. J. 1989. Associative long-term depression in the hippocampus: Induction of synaptic plasticity by Hebbian covariance. Nature (London) 339, 215-218.

Stent, G. S. 1973. A physiological mechanism for Hebb's postulate of learning. Proc. Natl. Acad. Sci. U.S.A. 70, 997-1001.

Sutton, R. S. 1984. Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis, University of Massachusetts, Amherst, MA.

Tsodyks, M. V., and Feigel'man, M. V. 1988. The enhanced storage capacity in neural networks with low activity level. Europhys. Lett. 6, 101-105.

Willshaw, D. J., and Dayan, P. 1990. Optimal plasticity in matrix memories: What goes up must come down. Neural Comp. 2, 85-93.

Received 20 January 1992; accepted 14 September 1992.