NIP: Hebbian Learning
Neural Information Processing
Amos Storkey


Overview

Types of learning, learning strategies
Neurophysiology, LTP/LTD
Basic Hebb rule, covariance rule, BCM rule
Eigenanalysis
Constraints: subtractive and multiplicative normalization
Multiple output neurons
Timing-based rules
Reading: [Dayan and Abbott, 2002] ch. 8, [Hertz et al., 1991]

Types of Learning

supervised: input u, output v, model p(v|u)
reinforcement: input u and scalar reward r; often associated with a temporal credit assignment problem
unsupervised: model p(u)

Learning Strategies

Bottom-up: rules of neural plasticity, e.g. Hebbian
Top-down: use of an objective function E(w), with updates determined by the gradient ∇_w E(w)
Focus in the course will be mostly on the objective function approach, but we start with Hebbian learning

Hebb

Hebb [Hebb, 1949] conjectured that if input from neuron A often contributes to the firing of neuron B, then the synapse from A to B should be strengthened.
Strong element of causality.
Hebbian learning: long-term modification of synapses based on pre- and post-synaptic activity (in contrast to homeostasis, excitability changes, non-local learning such as backpropagation, etc.)

LTP and LTD

Strengthening: long-term potentiation (LTP); weakening: long-term depression (LTD)
Extensive biochemical mechanisms, protein synthesis
Observed in many systems: hippocampus, neocortex, cerebellum
[Figure: LTP and LTD in Schaffer collateral inputs to CA1. From Dayan and Abbott (2001)]

Firing-rate Model

τ_r dv/dt = -v + w·u
Assume that τ_r is small compared to the timescale of weight change, so set dv/dt = 0 to give v = w·u
Then τ_w dw/dt = f(v, u, w)
Basic Hebb rule: τ_w dw/dt = u v
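As a concrete illustration of the basic Hebb rule, here is a minimal simulation sketch (not course code; the time constant, step size and toy input correlation structure are illustrative assumptions). Each input sample drives the steady-state output v = w·u, and the weights follow τ_w dw/dt = u v, which produces the unbounded growth discussed on the next slides.

```python
# Minimal sketch of the basic Hebb rule with the steady-state firing-rate model.
# All constants (tau_w, dt, the toy correlation matrix Q) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

n_inputs, tau_w, dt, n_steps = 5, 100.0, 0.1, 5000
Q = np.eye(n_inputs) + 0.5          # toy input correlation structure
L = np.linalg.cholesky(Q)           # used to draw correlated input samples

w = rng.normal(scale=0.1, size=n_inputs)
for _ in range(n_steps):
    u = L @ rng.normal(size=n_inputs)    # presynaptic activity sample
    v = w @ u                            # steady-state postsynaptic rate v = w.u
    w += (dt / tau_w) * u * v            # basic Hebb update: dw = u v dt / tau_w

print("final |w| =", np.linalg.norm(w))  # grows without bound: the instability below
```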

Basic Hebb Rule

Average over the input distribution ⟨·⟩ to give
τ_w dw/dt = ⟨u v⟩ = ⟨u uᵀ⟩ w = Q w
where Q is the input correlation matrix.
Positive feedback instability: τ_w d|w|²/dt = 2 w · (τ_w dw/dt) = 2⟨v²⟩ > 0
Discrete-time version of the Hebb rule: w → w + ε Q w

Covariance Rule

Implementing LTD at low activity:
τ_w dw/dt = ⟨(u - θ_u) v⟩  or  τ_w dw/dt = ⟨u (v - θ_v)⟩
If θ_v = ⟨v⟩ or θ_u = ⟨u⟩ then
τ_w dw/dt = C w
where C = ⟨(u - ⟨u⟩)(u - ⟨u⟩)ᵀ⟩ is the input covariance matrix.
Still unstable, as τ_w d|w|²/dt = 2⟨v(v - ⟨v⟩)⟩ = 2 var(v) > 0

Hebb Rule: Single Postsynaptic Neuron (PCA)

Basic Hebb rule τ_w dw/dt = Q w; analyse using an eigendecomposition of Q:
Q e_µ = λ_µ e_µ, with λ_1 ≥ λ_2 ≥ ...
As Q is symmetric and positive semi-definite, its eigenvalues are real and non-negative and its eigenvectors can be chosen orthogonal.
Write w(t) = Σ_{µ=1}^{N_u} (w(0) · e_µ) exp(λ_µ t / τ_w) e_µ
As e_1 has the largest eigenvalue it grows fastest, so w ∝ e_1 for large t.
A similar analysis applies for τ_w dw/dt = C w.
[Figure: Dayan and Abbott 2001]
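A short numerical check of the eigenanalysis (the toy matrix Q and the constants are assumptions, not from the lecture): integrating the averaged rule τ_w dw/dt = Q w and verifying that the weight vector ends up aligned with the principal eigenvector e_1.

```python
# Sketch: averaged Hebbian dynamics align w with the principal eigenvector of Q.
import numpy as np

Q = np.array([[2.0, 1.2, 0.4],
              [1.2, 2.0, 0.8],
              [0.4, 0.8, 2.0]])       # symmetric, positive definite toy correlation matrix

tau_w, dt = 50.0, 0.1
w = np.random.default_rng(1).normal(size=3)
for _ in range(5000):
    w += (dt / tau_w) * (Q @ w)       # averaged Hebb rule: tau_w dw/dt = Q w

eigvals, eigvecs = np.linalg.eigh(Q)
e1 = eigvecs[:, -1]                   # eigenvector with the largest eigenvalue
print("|cos(angle between w and e1)| =", abs(w @ e1) / np.linalg.norm(w))  # close to 1
```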

Energy Function Analysis

Following [Linsker, 1988], consider the energy function E = -½ wᵀ C w.
Using gradient descent dynamics: τ_w dw/dt = -∇_w E = C w

BCM Rule

With the covariance rules τ_w dw/dt = u(v - θ_v) or τ_w dw/dt = (u - θ_u)v, one gets LTD even when v = 0 or u = 0 respectively, which is odd.
Bienenstock, Cooper and Munro [Bienenstock et al., 1982] proposed
τ_w dw/dt = v u (v - θ_v),
for which there is experimental evidence; it requires both pre- and post-synaptic activity for any change.
If θ_v is fixed then the BCM rule is unstable. But if θ_v is set to a low-pass filtered version of v² (a sliding threshold) then stability is achieved.
With a sliding threshold the BCM rule implements competition between synapses.

Stability and Competition

Hebbian learning involves positive feedback. Control by:
LTD (covariance and correlation rules): not enough on its own
saturation or bounds prevent weights getting too big or too small
normalization over pre-synaptic or post-synaptic arbors:
subtractive: decrease all synapses by the same amount, whether large or small
multiplicative: decrease large synapses more than small ones

Subtractive Normalization

For non-negative w set Σ_i w_i = n·w = const, where n = (1, 1, ..., 1)ᵀ.
Dynamics: τ_w dw/dt = v u - v (n·u) n / N_u
n·w is constant, as τ_w d(n·w)/dt = v (n·u)(1 - n·n/N_u) = 0
Subtractive normalization needs a lower bound to avoid weights becoming negative. If there is no upper saturation constraint then the final outcome is that all weights but one become 0.
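A minimal sketch of the BCM rule with a sliding threshold (the time constants, the toy input patterns and the rectified output are assumptions made for illustration): θ_v tracks a low-pass filtered v², which counteracts the runaway growth that a fixed threshold would allow.

```python
# Sketch: BCM rule tau_w dw/dt = v u (v - theta_v) with a sliding threshold
# theta_v that low-pass filters v^2. Toy setup, not the course's code.
import numpy as np

rng = np.random.default_rng(2)
n_inputs, tau_w, tau_theta, dt = 4, 200.0, 50.0, 0.1

w = rng.uniform(0.1, 0.5, size=n_inputs)
theta_v = 0.0
patterns = rng.uniform(0.0, 1.0, size=(10, n_inputs))     # fixed set of input patterns

for _ in range(20000):
    u = patterns[rng.integers(len(patterns))]
    v = max(w @ u, 0.0)                                    # rectified output rate
    theta_v += (dt / tau_theta) * (v**2 - theta_v)         # sliding threshold tracks <v^2>
    w += (dt / tau_w) * v * u * (v - theta_v)              # BCM update
    w = np.clip(w, 0.0, None)                              # keep weights non-negative

print("weights:", np.round(w, 3), " sliding threshold:", round(float(theta_v), 3))
```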

Multiplicative Normalization and the Oja Rule

Ensure |w|² is constant by setting [Oja, 1982]
τ_w dw/dt = v u - α v² w
Taking the dot product with 2w gives
τ_w d|w|²/dt = 2 v² (1 - α |w|²)
which shows that |w|² relaxes to 1/α, preventing the weights from growing without bound.
The Oja rule gives w(t) → e_1 / √α

The Effect of Constraints

Saturation can change this outcome.
Subtractive normalization: if e_1 · n ≠ 0 then growth of w in the direction of n is stunted.
[Figure: Hebbian dynamics with saturation; principal eigenvector is (1, 1)ᵀ/√2. Dayan and Abbott 2001]

Example: Ocular Dominance

Consider a simplified model in which a single cortical cell receives input from two LGN afferents, with activities u_l and u_r from the left and right eyes, and output v = w_r u_r + w_l u_l. The weights are constrained to be non-negative.
Q = ⟨u uᵀ⟩ = [[q_s, q_d], [q_d, q_s]], with q_s > q_d (Same-eye and Different-eye correlations).
Eigenvectors/eigenvalues:
e_1 = (1, 1)ᵀ/√2, λ_1 = q_s + q_d
e_2 = (1, -1)ᵀ/√2, λ_2 = q_s - q_d
Ocular dominance occurs when one weight becomes 0 while the other stays positive.
The principal eigenvector does not give rise to an OD solution.
OD can be obtained by use of subtractive normalization, as e_1 ∝ n.
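A small numerical sketch of the Oja rule (the matrix C, α and the time constants are illustrative assumptions): the squared weight norm settles near 1/α while w rotates towards the principal eigenvector of the input correlations.

```python
# Sketch: Oja rule tau_w dw/dt = v u - alpha v^2 w keeps |w|^2 near 1/alpha.
import numpy as np

rng = np.random.default_rng(3)
n_inputs, tau_w, alpha, dt = 3, 100.0, 1.0, 0.1

C = np.array([[1.0, 0.7, 0.2],
              [0.7, 1.0, 0.4],
              [0.2, 0.4, 1.0]])       # toy input covariance
L = np.linalg.cholesky(C)

w = rng.normal(scale=0.1, size=n_inputs)
for _ in range(50000):
    u = L @ rng.normal(size=n_inputs)
    v = w @ u
    w += (dt / tau_w) * (v * u - alpha * v**2 * w)   # Oja update

print("|w|^2 =", round(float(w @ w), 3), " (expected about 1/alpha =", 1.0 / alpha, ")")
```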

Example: Orientation Selectivity

Given ON and OFF LGN cells, Hebbian learning can give rise to cortical orientation selectivity [Miller, 1994].
Needs constraints to avoid the uniform component.
[Figure: Dayan and Abbott (2001), after Miller (1994)]

Linsker's Model

Linsker's model [Linsker, 1988]: layered architecture and arbor-function constraints, plus Hebbian learning.
[Figure: layered architecture, layer A (retina) up through layers B, C, D, ...]
Uncorrelated noise in the retina (layer A).
Learned centre-surround receptive fields in layer C, and orientation-selective RFs (bar detectors) in layer G.

Analysis by MacKay and Miller

[MacKay and Miller, 2001]: the weight dynamics can be written as
τ_w dw/dt = (Q + k_2 J) w + k_1 n
where J is the matrix of ones and n is the vector of ones.
Depending on the choice of the parameters k_1 and k_2, uniform, centre-surround and orientation-selective RFs can be obtained.
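A rough sketch of the eigenmode view behind the MacKay and Miller analysis (the 1-D toy correlation function and the value of k_2 are assumptions, and the k_1 n term is omitted): adding k_2 J shifts the growth rate of the uniform mode, so a sufficiently negative k_2 lets a structured, sign-changing mode dominate instead of the uniform one.

```python
# Sketch: leading growth mode of Q + k2 J for a toy 1-D Gaussian correlation Q.
import numpy as np

n = 50
x = np.arange(n)
Q = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 10.0) ** 2)   # broad toy correlations
J = np.ones((n, n))

for k2 in (0.0, -0.5):
    vals, vecs = np.linalg.eigh(Q + k2 * J)
    lead = vecs[:, -1]                                        # fastest-growing mode
    zero_crossings = int(np.sum(np.diff(np.sign(lead)) != 0))
    print(f"k2 = {k2:+.1f}: |mean of leading mode| = {abs(lead.mean()):.3f}, "
          f"zero crossings = {zero_crossings}")
```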

Multiple Output Neurons

With multiple output neurons and no interaction between them, each output neuron would behave similarly. (Different initial conditions could still lead to different RFs.)
[Figure: Dayan and Abbott 2001]

Fixed Recurrent Connections M

τ_r dv/dt = -v + W u + M v, which at steady state gives
v = (I - M)⁻¹ W u ≡ K W u
Hebbian learning of W gives
τ_w dW/dt = ⟨v uᵀ⟩ = K W Q
Can consider 3 cases: (i) plastic W, fixed M; (ii) fixed W, plastic M; (iii) plastic W and M.
Dayan and Abbott (2001) show OD stripes arising from fixed Mexican-hat interactions K and plastic W.

Anti-Hebbian Modification of Lateral Connections

[Figure: Dayan and Abbott 2001, after (right) Nicholls et al. 1992]
Instead of fixed M, make it plastic. One possible goal is to decorrelate the outputs, i.e. ⟨v vᵀ⟩ = I.
Anti-Hebbian learning: synapses decrease when there is simultaneous pre- and postsynaptic activity.
Recurrent interactions can prevent different outputs from representing the same eigenvector.
τ_M dM/dt = -v vᵀ + β M   (*)
(need β > 0 to avoid the M weights decaying to 0)
For suitable β and τ_M, a combination of the Oja rule and (*) leads to the rows of W being different eigenvectors of Q, and M ultimately decaying to 0.
Other neural algorithms for multiple-output PCA are possible, e.g. [Sanger, 1989, Oja, 1989].
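A sketch of case (i), plastic W with fixed lateral weights M (toy sizes, the mutual-inhibition value and the crude row normalization are assumptions): the steady-state output is v = (I - M)⁻¹ W u = K W u, so the averaged Hebbian update is τ_w dW/dt = ⟨v uᵀ⟩ = K W Q.

```python
# Sketch: two output neurons with fixed lateral weights M and plastic W,
# integrating the averaged update tau_w dW/dt = K W Q.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, tau_w, dt = 4, 2, 100.0, 0.1

Q = np.array([[1.0, 0.6, 0.3, 0.1],
              [0.6, 1.0, 0.6, 0.3],
              [0.3, 0.6, 1.0, 0.6],
              [0.1, 0.3, 0.6, 1.0]])           # toy input correlation matrix
M = np.array([[0.0, -0.4],
              [-0.4, 0.0]])                     # fixed, mutually inhibitory lateral weights
K = np.linalg.inv(np.eye(n_out) - M)            # effective recurrent gain

W = rng.normal(scale=0.1, size=(n_out, n_in))
for _ in range(2000):
    W += (dt / tau_w) * (K @ W @ Q)                      # averaged Hebbian update
    W /= np.linalg.norm(W, axis=1, keepdims=True)        # crude normalization to keep W bounded

print(np.round(W, 3))   # rows settle into the dominant growth modes of the coupled dynamics
```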

Timing-based Rules

τ_w dw/dt = ∫_0^∞ [H(τ) v(t) u(t - τ) + H(-τ) v(t - τ) u(t)] dτ
An approximation to spiking models.
If H(τ) is positive for τ > 0 and negative for τ < 0, then the first term on the RHS is LTP and the second is LTD.
[Figures: Dayan and Abbott (2001), after (left) Markram et al. (1997) and (right) Zhang et al. (1998). Left: LTP and LTD. Right: spike-timing dependent plasticity (STDP) in Xenopus.]
Window of roughly ±50 ms; the causal order agrees with Hebb's conjecture.

Temporal Hebbian Rules and Trace Learning

Approximate solution of the timing-based rule:
w = (1/τ_w) ∫_0^T dt v(t) ∫ dτ H(τ) u(t - τ)
Temporally dependent Hebbian plasticity therefore depends on the correlation between v(t) and a filtered version of u(t); the filtered version is a (memory) trace of u(t).
This rule can be used to model the development of invariant responses, e.g. to the presence of an object in an image [Földiák, 1991, Wallis and Rolls, 1996].
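A sketch of the trace-learning idea (the exponential trace, the pattern-switching schedule and all constants are assumptions): the weight change is driven by the product of the current output v(t) and a low-pass filtered trace of the input u(t), so inputs that occur close together in time, such as different views of the same object, are bound to the same output.

```python
# Sketch: trace learning, dw proportional to v(t) times an exponential memory
# trace of u(t). No normalization is included, so the weights grow; only the
# learned direction is of interest here.
import numpy as np

rng = np.random.default_rng(5)
n_in, tau_w, tau_trace, dt, n_steps = 6, 200.0, 20.0, 1.0, 5000

w = rng.normal(scale=0.1, size=n_in)
u_trace = np.zeros(n_in)
u = rng.uniform(size=n_in)

for t in range(n_steps):
    if t % 250 == 0:
        u = rng.uniform(size=n_in)                    # new "object" every 250 steps
    u_trace += (dt / tau_trace) * (u - u_trace)       # exponential memory trace of u
    v = w @ u                                         # current output
    w += (dt / tau_w) * v * u_trace                   # temporally smeared Hebbian update

print("normalized weights:", np.round(w / np.linalg.norm(w), 3))
```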

References

Bienenstock, E. L., Cooper, L. N., and Munro, P. W. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J. Neurosci., 2:32-48.

Dayan, P. and Abbott, L. F. (2002). Theoretical Neuroscience. MIT Press, Cambridge, MA.

Földiák, P. (1991). Learning invariance from transformation sequences. Neural Comp., 3:194-200.

Hebb, D. (1949). The Organization of Behavior. Wiley, New York.

Hertz, J., Krogh, A., and Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Perseus, Reading, MA.

Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21(3):105-117.

MacKay, D. and Miller, K. (2001). Analysis of Linsker's simulations of Hebbian rules. In Self-Organizing Map Formation: Foundations of Neural Computation.

Miller, K. D. (1994). A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between ON- and OFF-center inputs. J. Neurosci., 14(1):409-441.

Oja, E. (1982). A simplified neuron model as a principal component analyzer. J. Math. Biol., 15:267-273.

Oja, E. (1989). Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1(1):61-68.

Sanger, T. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2(6):459-473.

Wallis, G. and Rolls, E. T. (1996). A model of invariant object recognition in the visual system. Prog. Neurobiol., 51:167-194.