Supporting Text

Evolution of the Average Synaptic Update Rule

In this appendix we evaluate the derivative of Eq. 9 in the main text, i.e., we need to calculate

$$\frac{\partial}{\partial w_j}\left\langle \log\frac{P(y^k \mid Y^{k-1}, X^k)}{P(y^k \mid Y^{k-1})} - \gamma \log\frac{P(y^k \mid Y^{k-1})}{\tilde P(y^k \mid Y^{k-1})} \right\rangle. \qquad (1)$$

Before we start, let us recall some notation. The average of an arbitrary function $f_w$ with arguments $x$ and $y$ is by definition

$$\langle f_w(x,y)\rangle_{x,y} = \sum_x \sum_y p_w(x,y)\, f_w(x,y), \qquad (2)$$

where $p_w(x,y)$ denotes the joint probability of the pair $(x,y)$ to occur and the sum runs over all configurations of $x$ and $y$. The subscript $w$ indicates that both the probability distribution $p_w$ and the function $f_w$ may depend on a parameter $w$. By definition, we have $p_w(x,y) = p_w(y \mid x)\, p(x)$, where $p(x)$ is a given input distribution and $p_w(y \mid x)$ the (parameter-dependent) conditional probability of generating an output $y$ given $x$. Hence Eq. 2 can be transformed into

$$\langle f_w(x,y)\rangle_{x,y} = \sum_x p(x) \sum_y p_w(y \mid x)\, f_w(x,y) = \left\langle \sum_y p_w(y \mid x)\, f_w(x,y) \right\rangle_x. \qquad (3)$$

If we now take the derivative with respect to the parameter $w$, the product rule yields two terms,

$$\frac{\partial}{\partial w}\langle f_w(x,y)\rangle_{x,y} = \left\langle \sum_y p_w(y \mid x)\, \frac{\partial}{\partial w} f_w(x,y) \right\rangle_x + \left\langle \sum_y p_w(y \mid x)\left[\frac{\partial}{\partial w}\log p_w(y \mid x)\right] f_w(x,y) \right\rangle_x. \qquad (4)$$

The first term contains the derivative of the function $f_w$, whereas the second term contains the derivative of the conditional probability $p_w$. We note that Eq. 4 can also be written in the form

$$\frac{\partial}{\partial w}\langle f_w(x,y)\rangle_{x,y} = \left\langle \frac{\partial}{\partial w} f_w(x,y) \right\rangle_{x,y} + \left\langle \left[\frac{\partial}{\partial w}\log p_w(y \mid x)\right] f_w(x,y) \right\rangle_{x,y}, \qquad (5)$$

i.e., as an average over the joint distribution of $x$ and $y$. This formulation will be useful for the problem at hand.
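Equations 4 and 5 are an instance of the familiar likelihood-ratio (score-function) identity and can be checked numerically on a toy model. The sketch below is purely illustrative and not part of the original derivation: it assumes a binary input $x$, a Bernoulli output $y$ with a sigmoidal, $w$-dependent firing probability, and an arbitrary test function $f_w$; all function names and parameter values are invented for this example.

```python
import numpy as np

# Toy check of Eq. 5:
#   d/dw <f_w(x,y)>_{x,y} = <df_w/dw>_{x,y} + <[d/dw log p_w(y|x)] f_w(x,y)>_{x,y}
# for a binary input x and a Bernoulli output y with p_w(y=1|x) = sigmoid(w*x + b).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_y_given_x(y, x, w, b=-1.0):
    rho = sigmoid(w * x + b)
    return rho if y == 1 else 1.0 - rho

def f_w(x, y, w):
    # an arbitrary w-dependent test function
    return np.log(p_y_given_x(y, x, w)) * (1.0 + 0.3 * y - 0.2 * x)

def average(fun, w, p_x=(0.5, 0.5)):
    # exact average over the joint distribution p(x) p_w(y|x), cf. Eqs. 2 and 3
    return sum(p_x[x] * p_y_given_x(y, x, w) * fun(x, y, w)
               for x in (0, 1) for y in (0, 1))

w, eps = 0.7, 1e-6

# left-hand side of Eq. 5: derivative of the average (central finite difference)
lhs = (average(f_w, w + eps) - average(f_w, w - eps)) / (2 * eps)

# right-hand side of Eq. 5: average derivative plus the score-function term
def df_dw(x, y, w):
    return (f_w(x, y, w + eps) - f_w(x, y, w - eps)) / (2 * eps)

def score_times_f(x, y, w):
    dlogp = (np.log(p_y_given_x(y, x, w + eps)) -
             np.log(p_y_given_x(y, x, w - eps))) / (2 * eps)
    return dlogp * f_w(x, y, w)

rhs = average(df_dw, w) + average(score_times_f, w)
print(lhs, rhs)  # the two numbers agree up to finite-difference error
```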

The gradient in Eq. 1 contains several terms, and for the moment we pick only one of them; the others will then be treated analogously. Let us focus on the term $\langle \log P(y^k \mid Y^{k-1}, X^k) \rangle$ and apply steps completely analogous to those leading from Eq. 2 to Eq. 5:

$$\frac{\partial}{\partial w_j}\left\langle \log P(y^k \mid Y^{k-1}, X^k) \right\rangle = \left\langle \frac{\partial}{\partial w_j} \log P(y^k \mid Y^{k-1}, X^k) \right\rangle_{Y^k, X^k} + \left\langle \left[\frac{\partial}{\partial w_j}\log P(Y^k \mid X^k)\right] \log P(y^k \mid Y^{k-1}, X^k) \right\rangle_{Y^k, X^k}. \qquad (6)$$

We now evaluate the averages using the identity $\langle \cdot \rangle_{Y^k, X^k} = \bigl\langle\, \langle \cdot \rangle_{y^k \mid Y^{k-1}, X^k} \,\bigr\rangle_{Y^{k-1}, X^k}$. We find that the first term on the right-hand side of Eq. 6 vanishes, since

$$\left\langle \frac{\partial}{\partial w_j}\log P(y^k \mid Y^{k-1}, X^k) \right\rangle_{y^k \mid Y^{k-1}, X^k} = \sum_{y^k \in \{0,1\}} P(y^k \mid Y^{k-1}, X^k)\, \frac{\partial}{\partial w_j}\log P(y^k \mid Y^{k-1}, X^k) = \frac{\partial}{\partial w_j} \sum_{y^k \in \{0,1\}} P(y^k \mid Y^{k-1}, X^k) = 0 \qquad (7)$$

because of the normalization of probabilities. The same argument can be repeated to show that $\bigl\langle \frac{\partial}{\partial w_j}\log P(y^k \mid Y^{k-1}) \bigr\rangle_{y^k \mid Y^{k-1}} = 0$. The reference distribution $\tilde P(y^k \mid Y^{k-1})$ is by definition independent of $w_j$. Hence the only term on the right-hand side of Eq. 6 that gives a non-trivial contribution is the second one. With an analogous argument for the other factors in Eq. 1 we have

$$\frac{\partial}{\partial w_j}\left\langle \log\frac{P(y^k \mid Y^{k-1}, X^k)}{P(y^k \mid Y^{k-1})} - \gamma \log\frac{P(y^k \mid Y^{k-1})}{\tilde P(y^k \mid Y^{k-1})} \right\rangle = \left\langle \left[\frac{\partial}{\partial w_j}\log P(Y^k \mid X^k)\right] \left( \log\frac{P(y^k \mid Y^{k-1}, X^k)}{P(y^k \mid Y^{k-1})} - \gamma \log\frac{P(y^k \mid Y^{k-1})}{\tilde P(y^k \mid Y^{k-1})} \right) \right\rangle. \qquad (8)$$
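The normalization argument of Eq. 7 simply says that the score function has zero mean under the model distribution. A short numerical illustration for a single Bernoulli variable, with a hypothetical sigmoidal parametrization and illustrative names only:

```python
import numpy as np

# Eq. 7 in a nutshell: sum_y P(y) d/dw log P(y) = d/dw sum_y P(y) = d/dw 1 = 0.
def rho(w):
    return 1.0 / (1.0 + np.exp(-w))      # any differentiable spike probability

def dlogP_dw(y, w, eps=1e-6):
    logP = lambda w_: np.log(rho(w_) if y == 1 else 1.0 - rho(w_))
    return (logP(w + eps) - logP(w - eps)) / (2 * eps)

w = 0.3
mean_score = rho(w) * dlogP_dw(1, w) + (1.0 - rho(w)) * dlogP_dw(0, w)
print(mean_score)  # ~0 up to finite-difference error
```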

An identification of the factors $C$, $F$, and $G$ in the main text is straightforward. From Eq. 4 in the main text we have

$$\log P(y^k \mid Y^{k-1}, X^k) = y^k \log\rho^k + (1 - y^k)\log(1 - \rho^k). \qquad (9)$$

Hence we can evaluate the factors

$$F^k = \log\frac{P(y^k \mid Y^{k-1}, X^k)}{P(y^k \mid Y^{k-1})} = y^k \log\frac{\rho^k}{\bar\rho^k} + (1 - y^k)\log\frac{1 - \rho^k}{1 - \bar\rho^k},$$

$$G^k = \log\frac{P(y^k \mid Y^{k-1})}{\tilde P(y^k \mid Y^{k-1})} = y^k \log\frac{\bar\rho^k}{\tilde\rho} + (1 - y^k)\log\frac{1 - \bar\rho^k}{1 - \tilde\rho},$$

where $\rho^k = P(y^k = 1 \mid Y^{k-1}, X^k)$, $\bar\rho^k = P(y^k = 1 \mid Y^{k-1})$, and $\tilde\rho$ denotes the firing probability of the reference distribution. Furthermore, we can calculate the derivative needed in Eq. 8 using the chain rule from Eq. 6 of the main text, i.e.,

$$P(Y^k \mid X^k) = \prod_{l=1}^{k} P(y^l \mid Y^{l-1}, X^l), \qquad (10)$$

which yields

$$\frac{\partial}{\partial w_j}\log P(Y^k \mid X^k) = \sum_{l=1}^{k} \frac{\partial}{\partial w_j}\log P(y^l \mid Y^{l-1}, X^l) \qquad (11)$$

$$= \sum_{l=1}^{k} \left[\frac{y^l}{\rho^l} - \frac{1 - y^l}{1 - \rho^l}\right] \rho'^{\,l} \sum_{n} \epsilon(t_l - t_n)\, x_j^n, \qquad (12)$$

where $\rho'^{\,l}$ denotes the derivative of $\rho^l$ with respect to the membrane potential. We note that in Eq. 8 the factor $\partial \log P(Y^k \mid X^k)/\partial w_j$ has to be multiplied with $F^k$ or with $G^k$ before taking the average. The multiplication generates terms of the form $\langle y^l y^k \rangle_{Y^k, X^k} = \bigl\langle\, \langle y^l y^k \rangle_{Y^k \mid X^k} \,\bigr\rangle_{X^k}$. For any given input $X^k$, the autocorrelation $\langle y^l y^k \rangle_{Y^k \mid X^k}$ with $l < k$ of the postsynaptic neuron will have the trivial value

$$\langle y^l y^k \rangle_{Y^k \mid X^k} = \langle y^l \rangle_{Y^k \mid X^k}\, \langle y^k \rangle_{Y^k \mid X^k} \quad \text{for } k - l > k_a, \qquad (13)$$

where $k_a\,\Delta t$ is the width of the autocorrelation. As a consequence,

$$\left\langle \left[\frac{y^l}{\rho^l} - \frac{1 - y^l}{1 - \rho^l}\right] \bigl(F^k - \gamma\, G^k\bigr) \right\rangle = 0 \quad \text{for } k - l > k_a. \qquad (14)$$

Hence, for $k > k_a$, we can truncate the sum over $l$ in Eq. 8, i.e., replace $\sum_{l=1}^{k}$ by $\sum_{l=k-k_a}^{k}$, which yields exactly the coincidence measure $C_j$ introduced in the main text and which we repeat here for convenience:

$$C_j^k = \sum_{l=k-k_a}^{k} \left[\frac{y^l}{\rho^l} - \frac{1 - y^l}{1 - \rho^l}\right] \rho'^{\,l} \sum_{n} \epsilon(t_l - t_n)\, x_j^n. \qquad (15)$$
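In a discrete-time simulation, the quantities entering Eqs. 9-15 can be evaluated directly from the binned spike trains. The sketch below is an illustration under assumptions, not the authors' code: it treats the per-bin probabilities $\rho^k$ and $\bar\rho^k$, the reference probability $\tilde\rho$, the summed EPSP kernel of each synapse, and the derivative of $\rho$ with respect to the membrane potential as given arrays, and all variable names and toy numbers are invented for this example.

```python
import numpy as np

# Discrete-time evaluation of F^k, G^k, the per-bin terms of Eq. 12, and the
# truncated coincidence measure C_j^k of Eq. 15.
#   y[k]        : output spike train, 0 or 1 per time bin
#   rho[k]      : P(y^k = 1 | Y^{k-1}, X^k) from the neuron model
#   rho_bar[k]  : P(y^k = 1 | Y^{k-1}), e.g. a running-average estimate
#   rho_tilde   : constant firing probability of the reference distribution
#   drho_du[k]  : derivative of rho^k with respect to the membrane potential
#   eps_jk[j,k] : sum_n eps(t_k - t_n) x_j^n, summed EPSP kernel of synapse j

def factors_F_G(y, rho, rho_bar, rho_tilde):
    F = y * np.log(rho / rho_bar) + (1 - y) * np.log((1 - rho) / (1 - rho_bar))
    G = y * np.log(rho_bar / rho_tilde) + (1 - y) * np.log((1 - rho_bar) / (1 - rho_tilde))
    return F, G

def coincidence_terms(y, rho, drho_du, eps_jk):
    # per-bin summands of Eq. 12, shape (n_synapses, n_bins)
    bracket = y / rho - (1 - y) / (1 - rho)
    return bracket * drho_du * eps_jk

def coincidence_measure(terms, k_a):
    # truncated sum of Eq. 15: C_j^k = sum_{l=k-k_a}^{k} terms[:, l]
    J, K = terms.shape
    return np.stack([terms[:, max(0, k - k_a):k + 1].sum(axis=1)
                     for k in range(K)], axis=1)

# toy numbers, for illustration only
rng = np.random.default_rng(0)
K, J, k_a = 1000, 5, 20
rho = rng.uniform(0.01, 0.2, K)
y = (rng.uniform(size=K) < rho).astype(float)
F, G = factors_F_G(y, rho, rho_bar=np.full(K, rho.mean()), rho_tilde=0.05)
terms = coincidence_terms(y, rho, drho_du=rho * (1 - rho),
                          eps_jk=rng.uniform(0.0, 1.0, (J, K)))
C = coincidence_measure(terms, k_a)   # C_j^k for every synapse j and bin k
```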

From Averages to an Online Rule

The coincidence measure $C_j^k$ counts coincidences in a rectangular time window. If we replace the rectangular time window by an exponential one with time constant $\tau_C$ and go to continuous time, the summation $\sum_{l=k-k_a}^{k}\dots$ in Eq. 15 turns into an integral $\int_{-\infty}^{t} dt'\, \exp[-(t-t')/\tau_C]\dots$, which can be transformed into the differential equation

$$\frac{dC_j(t)}{dt} = -\frac{C_j(t-\delta)}{\tau_C} + \sum_{f} \epsilon\bigl(t - t_j^{(f)}\bigr)\, S(t)\,\Bigl[\delta(t - \hat t - \delta) - g(u(t))\, R(t)\Bigr]; \qquad (16)$$

cf. Eq. 5 in the main text. Based on the considerations in the previous paragraph, the time constant $\tau_C$ should best be chosen of the order of $k_a\,\Delta t$ or somewhat larger. Similarly, the average firing rate $\bar\rho(t) = \bar g(t)\, R(t)$ can be estimated using the running average

$$\tau_{\bar g}\, \frac{d\bar g(t)}{dt} = -\bar g(t) + g(u(t)) \qquad (17)$$

with time constant $\tau_{\bar g}$.
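In a simulation with time step dt, the differential equations 16 and 17 can be integrated with a simple Euler scheme. The sketch below is a minimal illustration, not the authors' implementation: the factors S(t), g(u(t)), and R(t) are assumed to be supplied by the neuron model of the main text, the binned postsynaptic spike train stands in for the delta functions, the small delay $\delta$ of Eq. 16 is ignored, and all names are chosen for this example.

```python
import numpy as np

def integrate_online_traces(pre_epsp, post_spikes, S, g_u, R,
                            dt=1e-3, tau_C=1.0, tau_gbar=1.0):
    """Euler integration of the coincidence trace C_j(t) (cf. Eq. 16) and of
    the running rate estimate g_bar(t) (cf. Eq. 17).

    pre_epsp    : (n_synapses, n_steps) array, sum_f eps(t - t_j^(f)) per synapse
    post_spikes : (n_steps,) array, 1/dt in bins containing a postsynaptic
                  spike and 0 otherwise (discrete stand-in for delta functions)
    S, g_u, R   : (n_steps,) arrays with S(t), g(u(t)) and R(t) from the model
    """
    n_syn, n_steps = pre_epsp.shape
    C = np.zeros((n_syn, n_steps))
    g_bar = np.zeros(n_steps)
    for k in range(1, n_steps):
        drive = pre_epsp[:, k] * S[k] * (post_spikes[k] - g_u[k] * R[k])
        C[:, k] = C[:, k - 1] + dt * (-C[:, k - 1] / tau_C + drive)
        g_bar[k] = g_bar[k - 1] + dt * (-g_bar[k - 1] + g_u[k]) / tau_gbar
    return C, g_bar
```

The estimated average firing rate is then obtained as $\bar\rho(t) \approx \bar g(t)\,R(t)$, as in Eq. 17 and the surrounding text.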

In Fig. 6 we compare the performance of three different update schemes in numerical simulations. In particular, we show (i) that the exact value of the truncation of the sum in Eq. 15 is not relevant, as long as $k_a\,\Delta t$ is larger than the width of the autocorrelation; and (ii) that the online rule is a good approximation to the exact solution. To do so we take the scenario of Fig. 3 of the main text. For each segment of 1 s, we simulate one hundred pairs of input and output spike trains and evaluate Eq. 8 numerically by averaging over these samples. After each segment of 1 second we update the weights, first using a rule without truncation of the sum in Eq. 15. We call this the full batch update; compare Fig. 6 (Top). Second, we use the definition of $C_j^k$ with the truncated sum and repeat the above steps; Fig. 6 (Middle). The truncation is set to $k_a\,\Delta t = 2\,\mathrm{ms}$, which is well above the expected width of the autocorrelation function of the postsynaptic neuron. We call this the truncated batch rule. Third, we use the online rule discussed in the main body of the paper with $\tau_C = 1\,\mathrm{s}$; Fig. 6 (Bottom). Comparison of the top and middle graphs of Fig. 6 shows that there is no difference in the evolution of the mean synaptic efficacies, i.e., the truncation of the sum is admissible, as expected from the theoretical arguments. A further comparison with Fig. 6 (Bottom) shows that updates based on the online rule add some fluctuations to the results, but their trend nicely captures the evolution of the batch rules.

Supplement to the Pattern Detection Paradigm

In Fig. 3 of the main text we presented a pattern detection paradigm in which patterns defined by input rates were chosen randomly and applied for one second. After learning, the spike count over one second is sensitive to the index of the pattern. Fig. 7A shows the histogram of spike counts for each pattern. Optimal classification is achieved by choosing, for each spike count, the pattern which is most likely. With this criterion, 80 percent of the patterns are classified correctly.

The update of the synaptic efficacies depends on the choice of the parameter $\gamma$ in the learning rule. According to the optimality criterion in Eq. 8 of the main text, a high value of $\gamma$ implies strong homeostatic control of the firing rate of the postsynaptic neuron, whereas a low value of $\gamma$ induces only weak homeostatic control. In order to study the role of $\gamma$, we repeated the numerical experiments for the above pattern detection paradigm with a value of $\gamma$ larger than our standard choice. Fig. 7B shows that the output firing rate is still modulated by the pattern index; the modulation at the larger $\gamma$ is, however, weaker than at the standard value. As a result, pattern detection is less reliable, with only 45 percent correct classification. We note that this is still significantly higher than the chance level of 25 percent.
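The optimal read-out described above simply assigns to each observed spike count the pattern under which that count is most probable. A minimal sketch of this maximum-likelihood decision rule, assuming the normalized count histograms of Fig. 7A are available as an array; all names are illustrative.

```python
import numpy as np

def classify_by_spike_count(count_hist, counts):
    """count_hist[p, n]: probability of observing n output spikes in one second
    when pattern p is presented (the normalized histograms of Fig. 7A).
    counts: observed spike count for each trial.
    Returns the most likely pattern index for every trial."""
    return np.argmax(count_hist[:, counts], axis=0)

def fraction_correct(count_hist, counts, true_patterns):
    """Fraction of trials for which the most likely pattern is the true one."""
    return np.mean(classify_by_spike_count(count_hist, counts) == true_patterns)
```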

[Figure 6: three panels plotting w_batch (top), w_trunc (middle), and w_online (bottom) against time in minutes.]

Figure 6: Evolution of the synaptic efficacies for the pattern detection paradigm of Fig. 3 during the first minutes of simulated time. Red: mean synaptic efficacy of the 25 synapses that received pattern-dependent input rates. Blue: mean synaptic efficacy of the remaining 75 synapses. The batch update rule (top), the truncated batch rule (middle), and the online rule (bottom) yield comparable results.

[Figure 7: panel A plots n/N_p against the spike count n_sp; panel B plots ν_post [Hz] against the pattern index.]

Figure 7: Pattern detection. A Histograms of spike counts n_sp over one second (horizontal axis, bin size 2) during presentation of pattern 1 (dark blue), pattern 2 (light blue), pattern 3 (yellow), and pattern 4 (red). Vertical scale: number of trials n with a given spike count divided by the total number N_p of trials for that pattern. B Spike count during one second (mean and variance) for each of the four patterns at the two values of γ discussed in the text (light blue and dark blue); the values for the standard γ are redrawn from Fig. 3.