Bayesian synaptic plasticity makes predictions about plasticity experiments in vivo


Laurence Aitchison and Peter E. Latham

October 13, 2014

arXiv preprint, q-bio.NC, 10 Oct 2014

Abstract

Humans and other animals learn by updating synaptic weights in the brain. Rapid learning allows animals to adapt quickly to changes in their environment, giving them a large selective advantage. As brains have been evolving for several hundred million years, we might expect biological learning rules to be close to optimal, exploiting all locally available information in order to learn as rapidly as possible. However, no previously proposed learning rules are optimal in this sense. We therefore use Bayes' theorem to derive optimal learning rules for supervised, unsupervised and reinforcement learning. As expected, these rules prove to be significantly more effective than the best classical learning rules. Our learning rules make two predictions about the results of plasticity experiments in active networks. First, we predict that learning rates should vary across time, increasing when fewer inputs are active. Second, we predict that learning rates should vary across synapses, being higher for synapses whose presynaptic cells have a lower average firing rate. Finally, our methods are extremely flexible, allowing the derivation of optimal learning rules based solely on the information that is assumed, or known, to be available to the synapse. This flexibility should allow for the derivation of optimal learning rules for progressively more complex and realistic synaptic and neural models, allowing us to connect theory with complex biological reality.

1 Introduction

Synapses connect neurons, allowing an action potential in the presynaptic neuron to influence the membrane potential of the postsynaptic neuron. The size of this membrane potential change is known as the postsynaptic potential (PSP), or alternatively the synaptic weight. Synaptic weights are updated based on information available locally at the synapse, for instance pre- and postsynaptic activity. Theoretical neuroscientists have noted that there are three types of learning, supervised learning, reinforcement learning and unsupervised learning, corresponding to three essential cognitive tasks [1, 2]. First, supervised learning involves making predictions (e.g., about future events) from a set of inputs [3]. For instance, a simple supervised learning task is to predict a single scalar output from a vector of inputs. To learn how to accomplish this task, the system repeatedly receives an input, makes a prediction, and is given an error signal

saying whether the prediction was too high or too low. It is believed that the cerebellum uses supervised learning to make short-timescale predictions (e.g., predicting air puffs from aural cues [4]) and to assist in fine motor control [5]. Second, reinforcement learning involves learning behavioural responses that maximize reward (e.g., food reward) [6]. For instance, a simple reinforcement learning task is to map an input vector to an action represented by a single scalar value. To learn how to accomplish this task, the system repeatedly receives an input, chooses an action, and is told the reward for that action. It is believed that the basal ganglia use reinforcement learning to perform the critical task of action selection [2]. Finally, unsupervised learning involves finding structure in the inputs, perhaps by finding a direction in the input space with an interesting non-Gaussian distribution [7]. It is believed that the cortex performs unsupervised learning, allowing us to understand and utilize the incredible volume of sensory data entering the brain [8].

Local learning rules have been proposed for supervised [2], reinforcement [9] and unsupervised [10, 11, 12] learning. These rules work well when given a large amount of data whose statistical properties do not change over time. However, being non-Bayesian, these rules do not exploit all locally available information in order to learn as rapidly as possible, which becomes problematic when there is less data available, or when the data changes over time. We therefore derived Bayesian synaptic learning rules. These learning rules do use all locally available information (up to approximations), are more effective than optimized classical learning rules, and make predictions about the results of plasticity experiments in active networks.

2 Results

We begin by writing down probabilistic generative models for the information received by a neuron. Then we derive Bayesian learning rules using the statistical structure given by the generative model. We demonstrate, in simulation, that our learning rules are far more effective than classical rules. Finally, our learning rules give strong predictions about the results of plasticity experiments under realistic patterns of network activity.

2.1 The model

The neuron's output, $y$, is a weighted sum of its inputs, $x_i$, plus noise,

$$y = \sum_i x_i w_i + \gamma_y \eta, \tag{1}$$

where $\eta$ is standard Gaussian noise, $P(\eta) = N(\eta; 0, 1)$, and every future mention of $\eta$ is independently drawn standard Gaussian noise. In practice, PSPs, or equivalently synaptic weights, $w_i$, are noisy, in large part due to stochastic vesicle release [13]. We model these noisy PSPs using

$$w_i = m_i + \sqrt{k}\, m_i\, \eta_i, \tag{2}$$

where $m_i$ is the mean PSP and $k m_i^2$ is the variance. We chose this particular relationship between the mean and variance of PSPs in order to match experimental data [14].
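As a concrete illustration, the generative model in Equations 1 and 2 can be simulated in a few lines. The sketch below is ours rather than the authors' code, and all parameter values (number of inputs, $k$, $\gamma_y$, and the input statistics) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000         # number of inputs (illustrative)
k = 0.1          # PSP noise scale, so Var[w_i] = k * m_i^2 (assumed value)
gamma_y = 1.0    # standard deviation of the output noise (assumed value)

m = rng.lognormal(mean=-0.5, sigma=0.5, size=n)   # mean PSPs (assumed log-normal)
x = rng.poisson(lam=0.01, size=n).astype(float)   # spike counts in one 10 ms step

# Equation 2: noisy PSPs with mean m_i and variance k * m_i^2.
w = m + np.sqrt(k) * m * rng.standard_normal(n)

# Equation 1: the output is a weighted sum of inputs plus Gaussian noise.
y = x @ w + gamma_y * rng.standard_normal()
```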

As $y$ is not constrained to be positive, we can think of $y$ as either the membrane potential, or the difference between the actual firing rate and some baseline firing rate.

For supervised and reinforcement learning, the number of spikes, $x_i$, is drawn from a Poisson distribution, whose rates are chosen to mimic the distribution of rates in cortex (see Methods). These neurons receive a feedback signal, $f$, which is a deterministic function of a noisy error signal,

$$\delta = y_{\text{opt}} - y + \gamma_f \eta. \tag{3}$$

Here, $y_{\text{opt}}$ is the optimal output, which is defined in terms of the optimal weights, $w_{\text{opt},j}$,

$$y_{\text{opt}} = \sum_j w_{\text{opt},j} x_j. \tag{4}$$

Note that the neuron's eventual goal will be to set its average weights, $m_i$, as close as possible to the optimal weights, $w_{\text{opt},i}$, so that the cell's outputs, $y$, are as close as possible to the optimal outputs, $y_{\text{opt}}$.

For supervised learning with continuous feedback, the feedback, $f$, is the noisy error signal,

$$f = \delta. \tag{5}$$

For supervised learning with binary feedback, $f = 1$ if the noisy error signal is above a threshold, $\theta$, and $f = -1$ otherwise,

$$f(y, y_{\text{opt}}) = \begin{cases} 1 & \text{if } \theta < \delta \\ -1 & \text{otherwise.} \end{cases} \tag{6}$$

Finally, for reinforcement learning the feedback, or reward, reports the magnitude of the noisy error signal, but not its direction,

$$f = |\delta|. \tag{7}$$
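For concreteness, the three feedback signals (Equations 3, 5, 6 and 7) can be written as a single function of $y$ and $y_{\text{opt}}$. This is a minimal sketch under assumed values of $\gamma_f$ and $\theta$, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def feedback(y, y_opt, kind, gamma_f=0.5, theta=0.0):
    """Feedback signals of Equations 3-7; gamma_f and theta are illustrative."""
    delta = y_opt - y + gamma_f * rng.standard_normal()   # Eq. 3: noisy error
    if kind == "continuous":       # Eq. 5: supervised, continuous feedback
        return delta
    if kind == "binary":           # Eq. 6: supervised, binary feedback
        return 1 if delta > theta else -1
    if kind == "reinforcement":    # Eq. 7: magnitude only, no direction
        return abs(delta)
    raise ValueError(kind)
```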

In supervised and reinforcement learning, there was a feedback signal that depends on $w_{\text{opt}}$, so the goal was to find $w_{\text{opt}}$ from the feedback signal. In contrast, for unsupervised learning, there is no feedback signal. Instead, there is interesting statistical structure in the inputs, $x$, that depends on $w_{\text{opt}}$, so the goal is to find $w_{\text{opt}}$ solely from the statistics of the inputs. For a Bayesian algorithm to find such interesting structure, and to generate data on which to test our model neuron, we must first write down a generative model describing how to generate inputs, $x$, with the desired interesting structure. We chose to have the inputs, $x$, be Gaussian in every direction except $w_{\text{opt}}$. Along $w_{\text{opt}}$, we set the distribution over $x$ to be Laplacian. In order to enforce that structure, we generated $y_{\text{opt}}$ on every time step from

$$P(y_{\text{opt}}) = \text{Laplacian}(y_{\text{opt}}; 0, b), \tag{8}$$

where $b$ is chosen so that the variance along $w_{\text{opt}}$ is similar to the variance in other directions. In order to generate a set of inputs, $x$, that is consistent with both $y_{\text{opt}}$ and the firing rate distribution, we sample $x$ from

$$P(x \mid y_{\text{opt}}) \propto N(x; 0, \Lambda)\, \delta_D\!\Big(\sum_j w_{\text{opt},j} x_j - y_{\text{opt}}\Big), \tag{9}$$

where $\delta_D$ is the Dirac delta function, the diagonal elements of $\Lambda$ are chosen to match the variance of a Poisson process with the appropriate rate, and the off-diagonal elements are set to 0,

$$\Lambda_{ij} = \delta_{ij}\, \nu_i. \tag{10}$$

With these inputs any $x_i$ can be positive on one time step and negative on another, so there is no distinction between excitatory and inhibitory cells. However, it is possible to interpret $x_i$ not as the inputs themselves, but as the difference between the actual input and the mean input. Using this interpretation, we could set the mean input to be large and positive for excitatory cells, and large and negative for inhibitory cells, giving distinct excitatory and inhibitory populations.

Finally, we do not expect the optimal weights to remain the same over all time, but to change slowly as the surrounding neural circuits and the statistics of the environment change. While these changes may actually be quite systematic, to a single synapse deep in the brain they will appear to be random. We formalise these random changes by assuming that the logarithm of the optimal weights, $\lambda_{\text{opt},i} = \log w_{\text{opt},i}$, follows a discretised Ornstein-Uhlenbeck process,

$$\lambda_{\text{opt},i}(t+1) = \left(1 - \frac{1}{\tau}\right)\left(\lambda_{\text{opt},i}(t) - \mu_{\text{prior}}\right) + \mu_{\text{prior}} + \sqrt{\frac{2\sigma_{\text{prior}}^2}{\tau}}\, \eta_i, \tag{11}$$

where $t$, and the timescale $\tau$, are written in terms of time steps. We chose this particular noise process for three reasons. First, $w_{\text{opt},i} = e^{\lambda_{\text{opt},i}}$ takes only positive values, so excitatory weights cannot become inhibitory, and vice versa. Second, spine sizes obey this stochastic process [15], and while synaptic weights are not spine sizes, they are correlated [16]. Third, this noise process gives a log-normal stationary distribution of weights, as is observed [14].
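The drift in Equation 11 is easy to simulate, and doing so confirms that the weights stay positive and that the stationary distribution of $\lambda_{\text{opt},i}$ is Gaussian. The parameter values below are placeholders; the paper fits $\mu_{\text{prior}}$ and $\sigma_{\text{prior}}^2$ to measured weights [14].

```python
import numpy as np

rng = np.random.default_rng(2)

tau = 100_000          # drift time constant, in time steps (illustrative)
mu_prior = -0.5        # placeholder prior parameters
sigma_prior = 0.5

def drift_log_weights(lam, n_steps):
    """Equation 11: discretised Ornstein-Uhlenbeck process on log-weights."""
    for _ in range(n_steps):
        lam = ((1 - 1 / tau) * (lam - mu_prior) + mu_prior
               + np.sqrt(2 * sigma_prior**2 / tau) * rng.standard_normal(lam.shape))
    return lam

lam = rng.normal(mu_prior, sigma_prior, size=1000)   # start at stationarity
w_opt = np.exp(drift_log_weights(lam, 1000))         # weights stay positive
```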

2.2 Inference

The synapse's goal is to set its mean weight, $m_i(t+1)$, as close as possible to the optimal weight, $w_{\text{opt},i}(t+1)$. Of course, the synapse does not know the optimal weight, so the best it can do is to set $m_i(t+1)$ equal to the expected value of the optimal weight given all past data,

$$m_i(t+1) = E\left[w_{\text{opt},i}(t+1) \mid D_i(t)\right]. \tag{12}$$

Here, $D_i(t) = \{d_i(t), d_i(t-1), \ldots, d_i(1)\}$ is all past data, and $d_i(t)$ is the data at time step $t$. The data at time $t$, denoted $d_i(t)$, consists of the presynaptic activity, $x_i$, the postsynaptic activity, $y$, the actual PSP, $w_i$, a feedback signal, $f$ (if performing supervised or reinforcement learning), and information about the membrane time constant. Knowledge of the membrane time constant is not essential for learning, but proves to be useful for setting the learning rate on short timescales. In particular, the learning rate will turn out to depend on

$$s_{\text{like}} = \gamma_y^2 + \gamma_f^2 + \sum_j x_j^2\left(s_j + k m_j^2\right), \tag{13}$$

where

$$s_i = \text{Var}\left[w_{\text{opt},i} \mid D_i(t-1)\right]. \tag{14}$$

Notice that $s_{\text{like}}$, like the membrane time constant, is a measure of the overall level of input, regardless of whether those inputs are excitatory or inhibitory. So, while $s_{\text{like}}$ and the membrane time constant are not exactly the same, the synapse should be able to estimate $s_{\text{like}}$ reasonably accurately from the membrane time constant, especially if the cell has many inputs. We therefore assume that $s_{\text{like}}$ is known by the synapse, even if, in reality, the cell has some uncertainty about the value of $s_{\text{like}}$.

To compute $P(w_{\text{opt},i}(t+1) \mid D_i(t))$, and hence $E[w_{\text{opt},i}(t+1) \mid D_i(t)]$, the neuron takes the distribution at the previous time step, $P(w_{\text{opt},i}(t) \mid D_i(t-1))$, uses Bayes' theorem to incorporate new data, then takes into account the random changes in optimal weights across time. To take random changes into account, the neuron marginalises out the weights at the previous time step,

$$P(w_{\text{opt},i}(t+1) \mid D_i) = \int dw_{\text{opt},i}\, P(w_{\text{opt},i}(t+1) \mid w_{\text{opt},i})\, P(w_{\text{opt},i} \mid D_i). \tag{15}$$

Here and in future, all quantities without an explicitly specified time are evaluated at time step $t$, so for instance $w_{\text{opt},i} = w_{\text{opt},i}(t)$. To compute $P(w_{\text{opt},i} \mid D_i)$, the neuron must take its distribution at the previous time step, $P(w_{\text{opt},i} \mid D_i(t-1))$, and use Bayes' theorem to incorporate the data received at time step $t$,

$$P(w_{\text{opt},i} \mid D_i) \propto P(d_i \mid w_{\text{opt},i})\, P(w_{\text{opt},i} \mid D_i(t-1)). \tag{16}$$

The resulting inferred distributions can be far too complex to deal with computationally, and certainly too complex for a synapse to work with, so we need to specify a family of approximate distributions. We chose to approximate the distribution over the log-weight, $\lambda_{\text{opt},i} = \log w_{\text{opt},i}$, using a Gaussian, with mean $\mu_i$ and variance $\sigma_i^2$,

$$P(\lambda_{\text{opt},i} \mid D(t-1)) = N(\lambda_{\text{opt},i}; \mu_i, \sigma_i^2). \tag{17}$$

This approximation has two advantages. First, $w_{\text{opt},i} = e^{\lambda_{\text{opt},i}}$ takes only positive values, leading to learning rules that cannot, for instance, take an excitatory synapse and turn it inhibitory. Second, if the synapse is not given any data, then the dynamics (Equation 11) imply that the distribution over $\lambda_{\text{opt},i}$ will be Gaussian, which can be captured perfectly by this approximating distribution.

In order to set the parameters, $\mu_i$ and $\sigma_i^2$, of the approximate distribution, we do not need the full distribution, $P(\lambda_{\text{opt},i}(t+1) \mid D_i)$, but only its mean and variance, which substantially simplifies the learning rules. In particular, for supervised and reinforcement learning the update rules are (see Methods for a full derivation)

$$\mu_i(t+1) = E[\lambda_{\text{opt},i}(t+1) \mid D_i] = \left(1 - \frac{1}{\tau}\right)\left(\mu_i + \left(E[\delta \mid f, s_{\text{like}}, x_i, w_i] + x_i(w_i - m_i)\right)\frac{x_i m_i \sigma_i^2}{s_{\text{like},i}} - \mu_{\text{prior}}\right) + \mu_{\text{prior}}, \tag{18a}$$

$$\sigma_i^2(t+1) = \text{Var}[\lambda_{\text{opt},i}(t+1) \mid D_i] = \left(1 - \frac{1}{\tau}\right)^2\left(\sigma_i^2 + \frac{\text{Var}[\delta \mid f, s_{\text{like}}, x_i, w_i] - s_{\text{like},i}}{s_{\text{like},i}^2}\, \sigma_i^4\, x_i^2 m_i^2\right) + \frac{2\sigma_{\text{prior}}^2}{\tau}, \tag{18b}$$

where

$$s_{\text{like},i} = s_{\text{like}} - x_i^2 k m_i^2, \tag{19}$$

and the exact forms of $E[\delta \mid f, s_{\text{like}}, x_i, w_i]$ and $\text{Var}[\delta \mid f, s_{\text{like}}, x_i, w_i]$ are given in the Methods.
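A minimal sketch of the resulting update, for the simplest case of supervised learning with continuous feedback, where the posterior over $\delta$ is a point mass at the observed feedback, so $E[\delta \mid f] = f$ and $\text{Var}[\delta \mid f] = 0$ (see Methods). All parameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def bayesian_update(mu, sigma2, x, w, f, k=0.1, gamma_y=1.0, gamma_f=0.5,
                    tau=100_000, mu_prior=-0.5, sigma_prior=0.5):
    """One step of Equations 18a-b for supervised learning with continuous
    feedback (E[delta|f] = f, Var[delta|f] = 0). mu, sigma2, x, w are
    per-synapse arrays; all keyword parameters are illustrative."""
    m = np.exp(mu + sigma2 / 2)               # mean weight, log-normal posterior
    s = m**2 * sigma2                         # weight variance, s_i ~ m_i^2 sigma_i^2
    s_like = gamma_y**2 + gamma_f**2 + np.sum(x**2 * (s + k * m**2))   # Eq. 13
    s_like_i = s_like - x**2 * k * m**2                                # Eq. 19
    err = f + x * (w - m)                     # E[delta|f] + x_i (w_i - m_i)
    mu_post = mu + err * x * m * sigma2 / s_like_i                     # Eq. 38a
    sigma2_post = sigma2 - sigma2**2 * x**2 * m**2 / s_like_i          # Eq. 38b
    # Propagate through the drift (Equation 39 / Equation 11).
    mu_new = (1 - 1 / tau) * (mu_post - mu_prior) + mu_prior
    sigma2_new = (1 - 1 / tau)**2 * sigma2_post + 2 * sigma_prior**2 / tau
    return mu_new, sigma2_new
```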

[Figure 1: four panels, A to D, each plotting weight (mV) against time (min); the legend distinguishes the optimal weight, the inferred weight, and error bars.]

Figure 1: Bayesian learning rules track the true weight, and estimate uncertainty. The blue line is the true weight, the red line represents the mean of the inferred distribution, $e^{\mu_i}$, and the red area represents two standard deviation error bars, $e^{\mu_i \pm 2\sigma_i}$. The time course is 3 time constants. A: Supervised learning, continuous feedback. B: Supervised learning, binary feedback. C: Reinforcement learning. D: Unsupervised learning.

While these rules may look complex, the update rule for the mean is similar to the original delta rule [2]. In particular, the change in weight is given by the product of three terms. First, $x_i$ ensures that the synapse only gets updated if the input was on. Second, there is an error signal, $E[\delta \mid f, s_{\text{like}}, x_i, w_i] + x_i(w_i - m_i)$, stating whether this particular synaptic weight should increase or decrease. Third, there is an adaptive learning rate, $m_i \sigma_i^2 / s_{\text{like},i}$. The learning rule for $\sigma_i^2$ is similar, though more difficult to interpret.

2.3 Simulations

Simulating each of these rules gives good results: the inferred log-weight, $\mu_i$, tracks the optimal log-weight, $\lambda_{\text{opt},i}$, with $\lambda_{\text{opt},i}$ straying outside of the two standard deviation error bars about 5% of the time (Figure 1). We used 1000 inputs, and a time step of 10 ms. For supervised learning we used a time constant, $\tau$, of 1,000 s, or around 15 minutes, and for reinforcement and unsupervised learning we used a time constant of 10,000 s, or around 3 hours. The firing rate distribution was chosen to be intermediate between the relatively narrow ranges found by some [17] and the extremely broad ranges found by others [18]. In particular, we use a log-normal distribution, with median at 1 Hz, and with 95% of firing rates between 0.1 Hz and 10 Hz (a short numerical check of this parameterisation follows below). The parameters of the dynamics, $\mu_{\text{prior}}$ and $\sigma_{\text{prior}}^2$, were chosen by fitting a log-normal distribution to measured synaptic weights [14].

We found that our Bayesian learning rules are more effective than standard classical learning rules with any learning rate (Figure 2).
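The quoted firing-rate distribution pins down the log-normal parameters: a median of 1 Hz gives a log-mean of 0, and placing 95% of rates between 0.1 and 10 Hz gives a log-standard-deviation of $\ln(10)/1.96$. A short check, under the assumption that the 95% range is the central interval:

```python
import numpy as np

rng = np.random.default_rng(3)

# Median 1 Hz => mean of log-rate is log(1) = 0.
# 95% of rates in [0.1, 10] Hz => 1.96 * sigma_log = log(10).
sigma_log = np.log(10) / 1.96

rates = rng.lognormal(mean=0.0, sigma=sigma_log, size=100_000)   # Hz
print(np.median(rates))                        # ~1.0
print(np.mean((rates > 0.1) & (rates < 10)))   # ~0.95
```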

We evaluated performance using the mean square error (MSE) between $w_{\text{opt},i}$ and $m_i$. More details, including the classical learning rules used for each type of learning, are given in the Methods.

[Figure 2: four panels, A to D, each plotting the ratio of classical MSE to Bayesian MSE against the learning rate.]

Figure 2: Bayesian learning rules have a lower mean square error than classical learning rules. The black line is the mean square error for the classical learning rule with a given learning rate (x-axis), divided by the mean square error for the Bayesian learning rule, which is constant as it is parameter-free. The red line is 1. A: Supervised learning, continuous feedback. B: Supervised learning, binary feedback. C: Reinforcement learning. D: Unsupervised learning.

2.4 Predictions

Our learning rules make two predictions, both of which are intuitively sensible and experimentally testable. First, the learning rate should vary across time, falling when many inputs are active, and rising when few inputs are active; in our equations, this occurs because of the $1/s_{\text{like},i}$ term in Equation 18a. We can see that this prediction is intuitively sensible by considering a time step upon which only one input fires. In this case, the synaptic weight associated with the input that fired is entirely responsible for any error, so that synaptic weight should be updated drastically. In contrast, if many cells fire, then it is not clear which synaptic weights are responsible for an error, so it is sensible to make only small updates to the weights. In our learning rules this occurs because $s_{\text{like}}$, which appears in the denominator of the learning rate in Equation 18a, increases as the number of active inputs increases (Equation 13). We therefore predict fast learning during plasticity protocols in silent networks, when only the stimulated cells are active, and slow learning in active networks. There is some circumstantial evidence in favour of this prediction, in that weights can be changed within a few minutes during plasticity experiments in silent slices [19], whereas weight changes in vivo, in active networks, occur over much longer timescales [15]. However, a direct test of this prediction remains necessary. A simple numerical illustration of this effect is given below.
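To make the first prediction concrete, one can evaluate the adaptive learning rate $m_i \sigma_i^2 / s_{\text{like},i}$ for a stimulated synapse while varying the number of co-active inputs. The sketch below is our illustration with assumed parameter values; the learning rate falls as the number of active inputs grows.

```python
import numpy as np

k, sigma2, m = 0.1, 0.05, 1.0        # illustrative: PSP noise, posterior variance,
gamma_y, gamma_f = 1.0, 0.5          # mean weight (mV), and noise terms

for n_active in [1, 10, 100, 1000]:
    x = np.zeros(1000)
    x[:n_active] = 1.0                              # n_active inputs spike once
    s = m**2 * sigma2                               # equal uncertainty everywhere
    s_like = gamma_y**2 + gamma_f**2 + np.sum(x**2 * (s + k * m**2))   # Eq. 13
    s_like_i = s_like - 1.0 * k * m**2              # Eq. 19 for a spiking synapse
    print(n_active, m * sigma2 / s_like_i)          # learning rate falls
```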

[Figure 3: four panels, A to D, each plotting $s/m^2$ against the presynaptic firing rate $\nu$.]

Figure 3: Uncertainty, and hence learning rate, increases as firing rate decreases. A: Supervised learning, continuous feedback. B: Supervised learning, binary feedback. C: Reinforcement learning. D: Unsupervised learning.

Second, learning rates should vary across synapses, being lower for synapses whose presynaptic cells have a higher average firing rate. We can see that this is intuitively sensible by comparing a synapse whose presynaptic cell fires rarely to a synapse whose presynaptic cell fires frequently. If the presynaptic cell fires rarely, the synapse is given only a few opportunities to update the weight, so should make large updates. In contrast, if the presynaptic cell fires rapidly, then the synapse is given many opportunities to update the weight, so should update the weight only a little for each presynaptic spike. In our learning rules, this occurs because lower firing rates give the cell fewer opportunities to learn, leading to higher uncertainty and hence higher learning rates. More formally, we show in the Methods that

$$\frac{\delta m_i}{m_i} \propto \frac{s_i}{m_i^2} \propto \frac{1}{\sqrt{\nu_i}}. \tag{20}$$

The first relationship comes from manipulating the learning rules for the mean (Equation 18a), and the second comes from mean field calculations detailed in the Methods, and confirmed in simulations (Figure 3).

3 Discussion

We derived Bayes-optimal learning rules, which allow a synapse to exploit all locally available information in order to learn as rapidly as possible. These rules allow synapses to learn far more rapidly than optimized classical methods when performing supervised, unsupervised and reinforcement learning tasks. Furthermore, our Bayes-optimal learning rules suggest that learning rates should vary across time and across synapses. In particular, learning rates should be higher at times when fewer inputs into a cell are active, and learning rates should be higher for synapses whose presynaptic cells fire more infrequently. Both of these predictions can, in principle, be tested, something that we hope will occur in the next generation of plasticity experiments.

Out of necessity, we have made a variety of specific choices about the information synapses have available to them, and about the statistical properties of a neuron's inputs. However, our framework is unique in providing the machinery to derive a learning rule for any combination of locally available information and input statistics. There are therefore two important directions for future work: considering more complex and realistic locally available information, and considering more complex and realistic input statistics.

There are many directions to explore concerning the information available to a synapse. First, the synapse might have access to more information than we assumed: perhaps it might have some information about the activity of the surrounding synapses, allowing the synapse to learn something about the correlations between its weight and surrounding weights. Second, the synapse might have access to less information than we assumed: perhaps the postsynaptic activity does not backpropagate effectively to the synapse, or the synapse is not aware of a presynaptic spike because of a vesicle release failure. Finally, with a detailed characterisation of the electrical properties of dendrites, it may be possible to find a learning rule that is based only on the membrane potential and calcium signals at the synapse. In each of these cases the resulting learning rules may well make interesting and testable predictions.

Our choices about the statistics of the inputs were intended to be a first step, so there are again many interesting directions to explore. Neural networks exhibit a variety of different global behaviours, from oscillations with changing power and frequency [20] to up-down states [21]. Furthermore, there are debates about the structure of neural activity: whether it is correlated [22], or whether it exists on a low-dimensional manifold [23]. When considering learning in realistic neural networks, it will be important for neurons and synapses to take into account both the structure and state of neural activity. The flexibility of our approach provides a means to connect relatively abstract theory to complex biological reality, an approach that, we hope, will significantly increase our understanding of synaptic plasticity.

Acknowledgements

We would like to acknowledge Jesper Sjöström for valuable feedback on this manuscript. This work was supported by the Gatsby Charitable Foundation.

References

[1] K. Doya, "Complementary roles of basal ganglia and cerebellum in learning and motor control," Current Opinion in Neurobiology, vol. 10, no. 6, 2000.

[2] P. Dayan and L. F. Abbott, Theoretical Neuroscience. Cambridge, MA: MIT Press, 2001.

[3] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, "The elements of statistical learning: data mining, inference and prediction," The Mathematical Intelligencer, vol. 27, no. 2, 2005.

[4] J. J. Kim and R. F. Thompson, "Cerebellar circuits and synaptic mechanisms involved in classical eyeblink conditioning," Trends in Neurosciences, vol. 20, no. 4, 1997.

[5] E. R. Kandel, J. H. Schwartz, T. M. Jessell, et al., Principles of Neural Science. New York: McGraw-Hill, 2000.

[6] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. MIT Press, 1998.

[7] N. Intrator and L. N. Cooper, "Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions," Neural Networks, vol. 5, no. 1, pp. 3-17, 1992.

[8] K. Koch, J. McLean, R. Segev, M. A. Freed, M. J. Berry II, V. Balasubramanian, and P. Sterling, "How much the eye tells the brain," Current Biology, vol. 16, 2006.

[9] H. Sprekeler, G. Hennequin, and W. Gerstner, "Code-specific policy gradient rules for spiking neurons," in Advances in Neural Information Processing Systems, 2009.

[10] D. O. Hebb, The Organization of Behavior: A Neuropsychological Approach. John Wiley & Sons, 1949.

[11] R. P. Rao and D. H. Ballard, "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects," Nature Neuroscience, vol. 2, no. 1, 1999.

[12] J. Zylberberg, J. T. Murphy, and M. R. DeWeese, "A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields," PLoS Computational Biology, vol. 7, no. 10, p. e1002250, 2011.

[13] T. Branco and K. Staras, "The probability of neurotransmitter release: variability and feedback control at single synapses," Nature Reviews Neuroscience, vol. 10, no. 5, 2009.

[14] S. Song, P. J. Sjöström, M. Reigl, S. Nelson, and D. B. Chklovskii, "Highly nonrandom features of synaptic connectivity in local cortical circuits," PLoS Biology, vol. 3, no. 3, p. e68, 2005.

[15] Y. Loewenstein, A. Kuras, and S. Rumpel, "Multiplicative dynamics underlie the emergence of the log-normal distribution of spine sizes in the neocortex in vivo," The Journal of Neuroscience, vol. 31, June 2011.

[16] M. Matsuzaki, N. Honkura, G. C. Ellis-Davies, and H. Kasai, "Structural basis of long-term potentiation in single dendritic spines," Nature, vol. 429, no. 6993, 2004.

[17] D. H. O'Connor, S. P. Peron, D. Huber, and K. Svoboda, "Neural activity in barrel cortex underlying vibrissa-based object localization in mice," Neuron, vol. 67, no. 6, 2010.

[18] K. Mizuseki and G. Buzsáki, "Preconfigured, skewed distribution of firing rates in the hippocampus and entorhinal cortex," Cell Reports, vol. 4, no. 5, 2013.

[19] P. J. Sjöström, G. G. Turrigiano, and S. B. Nelson, "Rate, timing, and cooperativity jointly determine cortical synaptic plasticity," Neuron, vol. 32, no. 6, 2001.

[20] S. Ray and J. H. Maunsell, "Differences in gamma frequencies across visual cortex restrict their possible use in computation," Neuron, vol. 67, no. 5, 2010.

[21] A. Destexhe, S. W. Hughes, M. Rudolph, and V. Crunelli, "Are corticothalamic up states fragments of wakefulness?," Trends in Neurosciences, vol. 30, no. 7, 2007.

[22] A. S. Ecker, P. Berens, G. A. Keliris, M. Bethge, N. K. Logothetis, and A. S. Tolias, "Decorrelated neuronal firing in cortical microcircuits," Science, vol. 327, no. 5965, 2010.

[23] M. Y. Byron, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, "Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity," in Advances in Neural Information Processing Systems, 2009.

[24] S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Int. Symp. Aerospace/Defense Sensing, Simulation and Controls, vol. 3, Orlando, FL, 1997.

4 Methods

4.1 Bayesian update rules for supervised and reinforcement learning

Here we show how to update the approximate posterior distribution over $\lambda_{\text{opt},i}$ given new data. We begin by rewriting Equations 15 and 16 in terms of $\lambda_{\text{opt},i}$ rather than $w_{\text{opt},i}$,

$$P(\lambda_{\text{opt},i}(t+1) \mid D_i) = \int d\lambda_{\text{opt},i}\, P(\lambda_{\text{opt},i}(t+1) \mid \lambda_{\text{opt},i})\, P(\lambda_{\text{opt},i} \mid D_i), \tag{21}$$

$$P(\lambda_{\text{opt},i} \mid D_i) \propto P(d_i \mid \lambda_{\text{opt},i})\, P(\lambda_{\text{opt},i} \mid D_i(t-1)). \tag{22}$$

Our approach is to approximate $P(\lambda_{\text{opt},i} \mid D_i)$ by a Gaussian, in which case Equation 21 is a Gaussian integral that can be computed straightforwardly. However, computing a Gaussian approximation to $P(\lambda_{\text{opt},i} \mid D_i)$ is non-trivial. We begin by writing out the posterior in full,

$$P(\lambda_{\text{opt},i} \mid D_i) = P(\lambda_{\text{opt},i} \mid f, y, s_{\text{like}}, x_i, w_i, D_i(t-1)). \tag{23}$$

As $f$ is a deterministic function of $\delta$, we can introduce and integrate out $\delta$,

$$P(\lambda_{\text{opt},i} \mid D_i) = \int d\delta\, P(\lambda_{\text{opt},i} \mid \delta, y, s_{\text{like}}, x_i, w_i, D_i(t-1))\, P(\delta \mid f, s_{\text{like}}, x_i, w_i). \tag{24}$$

This step converts one hard inference problem, inferring $\lambda_{\text{opt},i}$ from the feedback signal, into two easier inference problems: inferring $\delta$ from the feedback signal, and inferring $\lambda_{\text{opt},i}$ from $\delta$. The process of inferring $\lambda_{\text{opt},i}$ from $\delta$ does not depend on the details of the feedback signal, so we consider this problem first. We then consider the posterior over $\delta$ induced by various types of feedback signal.

In order to find the distribution over $\lambda_{\text{opt},i}$ given $\delta$, we use Bayes' theorem,

$$P(\lambda_{\text{opt},i} \mid \delta, y, s_{\text{like}}, x_i, w_i, D_i(t-1)) \propto P(\delta \mid y, s_{\text{like}}, x_i, w_i, \lambda_{\text{opt},i})\, P(\lambda_{\text{opt},i} \mid D_i(t-1)). \tag{25}$$

The prior, $P(\lambda_{\text{opt},i} \mid D_i(t-1))$, is given by the approximating distribution (Equation 17), but the correct form for the likelihood, $P(\delta \mid y, s_{\text{like}}, x_i, w_i, \lambda_{\text{opt},i})$, is less clear. In order to find this likelihood, we note that if all the inputs were known then we could write

$$P(y \mid x, w_i, \lambda_{\text{opt},i}) = N\Big(y;\; m_{\text{cell}} + x_i(w_i - m_i),\; \sum_{j \neq i} k m_j^2 x_j^2 + \gamma_y^2\Big), \tag{26}$$

and similarly,

$$P(y_{\text{opt}} \mid x, w_i, \lambda_{\text{opt},i}) = N\Big(y_{\text{opt}};\; m_{\text{cell}} + x_i\left(e^{\lambda_{\text{opt},i}} - m_i\right),\; \sum_{j \neq i} s_j x_j^2\Big), \tag{27}$$

where

$$m_{\text{cell}} = \sum_j m_j x_j. \tag{28}$$

Therefore, $\delta$, as defined by Equation 3, does not depend on $m_{\text{cell}}$,

$$P(\delta \mid x, w_i, \lambda_{\text{opt},i}) = N\Big(\delta;\; x_i\left(e^{\lambda_{\text{opt},i}} - m_i\right) - x_i(w_i - m_i),\; s_{\text{like}} - \left(k m_i^2 + s_i\right) x_i^2\Big). \tag{29}$$

In fact, the distribution over $\delta$ depends on $x$ only through $x_i$ and $s_{\text{like}}$,

$$P(\delta \mid x, w_i, \lambda_{\text{opt},i}) = P(\delta \mid s_{\text{like}}, x_i, w_i, \lambda_{\text{opt},i}) \tag{30}$$

$$= N\Big(\delta;\; x_i\left(e^{\lambda_{\text{opt},i}} - m_i\right) - x_i(w_i - m_i),\; s_{\text{like}} - \left(k m_i^2 + s_i\right) x_i^2\Big), \tag{31}$$

where $s_{\text{like},i}$ is given by Equation 19.

Now that we have found the likelihood (Equation 31), we must think about how to deal with the nonlinearity in the mean of the likelihood. We can linearise the problematic term using

$$e^{\lambda_{\text{opt},i}} \approx m_i\left(1 + \lambda_{\text{opt},i} - \mu_i\right). \tag{32}$$

While this approximation does not have exactly the right behaviour around $\lambda_{\text{opt},i} = \mu_i$, it does have the correct expectation and expected gradient with respect to $\lambda_{\text{opt},i}$, and approximately the correct variance, and these are the properties that are important to ensure the stability and accuracy of the resulting learning rule. This approximation, in combination with Equation 31, gives

$$P(\lambda_{\text{opt},i} \mid \delta, y, s_{\text{like}}, x_i, w_i, D_i(t-1)) \tag{33}$$

$$\propto \exp\left(-\frac{\left(\delta - x_i m_i\left(\lambda_{\text{opt},i} - \mu_i\right) + x_i(w_i - m_i)\right)^2}{2\left(s_{\text{like},i} - x_i^2 s_i\right)}\right) \exp\left(-\frac{\left(\lambda_{\text{opt},i} - \mu_i\right)^2}{2\sigma_i^2}\right). \tag{34}$$

Rewriting the first term in the product as a Gaussian over $\lambda_{\text{opt},i}$, using the fact that $s_i \approx m_i^2 \sigma_i^2$, gives

$$\propto \exp\left(-\frac{\left(\lambda_{\text{opt},i} - \mu_i - \frac{\delta + x_i(w_i - m_i)}{x_i m_i}\right)^2}{2\left(\frac{s_{\text{like},i}}{x_i^2 m_i^2} - \sigma_i^2\right)}\right) \exp\left(-\frac{\left(\lambda_{\text{opt},i} - \mu_i\right)^2}{2\sigma_i^2}\right). \tag{35}$$

This distribution can be written as a Gaussian over $\lambda_{\text{opt},i}$, with mean

$$\mu_{\lambda_{\text{opt},i}}(\delta) = \mu_i + \left(\delta + x_i(w_i - m_i)\right)\frac{x_i m_i \sigma_i^2}{s_{\text{like},i}} \tag{36}$$

and variance

$$\sigma^2_{\lambda_{\text{opt},i}}(\delta) = \sigma_i^2\left(1 - \frac{\sigma_i^2 x_i^2 m_i^2}{s_{\text{like},i}}\right). \tag{37}$$
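Equations 36 and 37 translate directly into code. The sketch below is ours, with made-up values standing in for the quantities a synapse would actually track.

```python
import numpy as np

def lambda_posterior_given_delta(mu, sigma2, delta, x, w, m, s_like_i):
    """Mean and variance of the Gaussian over lambda_opt given delta
    (Equations 36 and 37)."""
    mean = mu + (delta + x * (w - m)) * x * m * sigma2 / s_like_i
    var = sigma2 * (1 - sigma2 * x**2 * m**2 / s_like_i)
    return mean, var

# Illustrative numbers: one active input, modest weight uncertainty.
mean, var = lambda_posterior_given_delta(
    mu=0.0, sigma2=0.05, delta=0.3, x=1.0, w=1.1, m=1.0, s_like_i=0.5)
```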

13 to λ opt,i, and approximately the correct variance properties that are important to ensure the stability and accuracy of the resulting learning rule. This approximation, in combination with Equation 31, gives P λ opt,i δ, y, s like, x i, w i, D i t 1 33 exp δ x i m i λ opt,i µ i w i m i exp s like,i x i s i λ opt,i µ i σi. 34 Rewriting the first term in the product as a Gaussian over λ opt,i, using the fact that s i m i σ i, gives, λ opt,i µ i + δ+xiwi mi x im i = exp s exp λ opt,i µ i like,i σi σi. 35 x i m i This distribution can be written as a Gaussian over λ opt,i, with mean, µ λopt,i δ = µ i + δ + x i w i m i x im i σ i s like,i 36 and variance, σ λ opt,i δ = σ i 1 σ i x i m i s like,i. 37 In order to find the next approximate distribution over λ opt,i Equation 17, we only need to know the mean and variance of P λ opt,i f, y, x i, w i, not the full distribution. Therefore, instead of integrating out δ, as prescribed by Equation 4, we use the law of totat expectation and variance to compute the mean and variance of λ opt,i directly, µ post,i = µ i + E [δ f, y, x i, w i ] + x i w i m i x im i σ i σ post,i = σ i + s like,i 38a s like Var [δ f, y, x i, w i ] σi x i m i s like,i s. 38b like,i Finally, computing the Gaussian integral in Equation 1, and thereby taking into account the random changes in the weights across time is straightforward. We simply propagate the posterior distribution through the dynamics Equation 11, µ i t + 1 = σ i t + 1 = 1 1 µ post,i, 39a τ 1 1 σ τ post,i + σ prior. 39b τ Substituting Equation 38 into Equation 39 gives Equation

Finally, we must find the mean and variance of the posterior distribution over $\delta$. We use Bayes' theorem,

$$P(\delta \mid f, s_{\text{like}}, x_i, w_i) \propto P(f \mid \delta)\, P(\delta \mid s_{\text{like}}, x_i, w_i). \tag{40}$$

The prior is given by integrating out $\lambda_{\text{opt},i}$ in Equation 31,

$$P(\delta \mid s_{\text{like}}, x_i, w_i) = N\left(\delta;\; -x_i(w_i - m_i),\; s_{\text{like},i}\right), \tag{41}$$

but the likelihood, $P(f \mid \delta)$, is specific to the feedback signal, and hence to the type of learning, as described below.

4.1.1 Supervised learning: continuous feedback

For supervised learning with continuous feedback, the likelihood is a delta function,

$$P(f \mid \delta) = \delta_D(\delta - f), \tag{42}$$

so the posterior over $\delta$ (Equation 40) is also a delta function located at $f$.

4.1.2 Supervised learning: binary feedback

For supervised learning with binary feedback, the likelihood is a step function,

$$P(f \mid \delta) = \begin{cases} \Theta(\delta - \theta) & \text{if } f = 1 \\ \Theta(\theta - \delta) & \text{if } f = -1, \end{cases} \tag{43}$$

so the posterior over $\delta$ (Equation 40) is a truncated Gaussian, whose mean and variance can be computed using library functions.

4.1.3 Policy search

For reinforcement learning, the likelihood is a pair of delta functions,

$$P(f \mid \delta) = \Theta(f)\left[\delta_D(\delta - f) + \delta_D(\delta + f)\right], \tag{44}$$

so the posterior over $\delta$ (Equation 40) is a pair of delta functions with different weights, whose mean and variance can again be computed straightforwardly.
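The posterior moments of $\delta$ for each feedback type can be computed with standard routines, as the text notes. The sketch below is our illustration of Equations 42 to 44, using scipy's truncated normal for the binary case and explicit mixture weights for the reinforcement case; it is not the authors' code.

```python
import numpy as np
from scipy.stats import norm, truncnorm

def delta_posterior(f, kind, mu0, s0, theta=0.0):
    """Mean and variance of P(delta | f) with prior N(mu0, s0) (Eq. 41)."""
    sd = np.sqrt(s0)
    if kind == "continuous":                  # Eq. 42: point mass at f
        return f, 0.0
    if kind == "binary":                      # Eq. 43: truncated Gaussian
        lo, hi = ((theta, np.inf) if f == 1 else (-np.inf, theta))
        a, b = (lo - mu0) / sd, (hi - mu0) / sd   # standardised bounds
        return (truncnorm.mean(a, b, loc=mu0, scale=sd),
                truncnorm.var(a, b, loc=mu0, scale=sd))
    if kind == "reinforcement":               # Eq. 44: two weighted point masses
        p_plus = norm.pdf(f, mu0, sd)         # prior density at delta = +f
        p_minus = norm.pdf(-f, mu0, sd)       # prior density at delta = -f
        z = p_plus + p_minus
        p_plus, p_minus = p_plus / z, p_minus / z
        mean = (p_plus - p_minus) * f
        var = f**2 - mean**2                  # both components sit at |delta| = f
        return mean, var
    raise ValueError(kind)
```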

4.2 Unsupervised learning

Unsupervised learning is considerably more difficult than supervised or reinforcement learning. We proceed by first describing how to generate inputs from the distribution in Equation 9, and then by describing how to perform inference.

4.2.1 Generating x

We can rewrite the distribution in Equation 9 using the fact that a Gaussian with zero variance is a delta function,

$$P(x \mid w_{\text{opt}}, y_{\text{opt}}) \propto \lim_{L \to 0} N(x; b, \Lambda)\, N\Big(y_{\text{opt}};\; \sum_j w_{\text{opt},j} x_j,\; L\Big). \tag{45}$$

This has the same form as a standard linear Gaussian inference problem, where the first term, $N(x; b, \Lambda)$, is the prior, and the second term is the likelihood. The solution to such problems is well known,

$$P(x \mid w_{\text{opt}}, y_{\text{opt}}) = N\left(x;\; \mu_{x|y_{\text{opt}}},\; \Sigma\right), \tag{46}$$

where

$$\Sigma = \lim_{L \to 0}\left(\Lambda - \frac{\Lambda w_{\text{opt}} w_{\text{opt}}^T \Lambda}{L + w_{\text{opt}}^T \Lambda w_{\text{opt}}}\right), \tag{47}$$

$$\mu_{x|y_{\text{opt}}} = \lim_{L \to 0} \Sigma\left(L^{-1} y_{\text{opt}} w_{\text{opt}} + \Lambda^{-1} b\right). \tag{48}$$

We need to further manipulate $\mu_{x|y_{\text{opt}}}$ before its limit becomes obvious,

$$\mu_{x|y_{\text{opt}}} = \lim_{L \to 0}\left(\Lambda - \frac{\Lambda w_{\text{opt}} w_{\text{opt}}^T \Lambda}{L + w_{\text{opt}}^T \Lambda w_{\text{opt}}}\right)\left(L^{-1} y_{\text{opt}} w_{\text{opt}} + \Lambda^{-1} b\right) \tag{49}$$

$$= b + \frac{y_{\text{opt}} - w_{\text{opt}}^T b}{w_{\text{opt}}^T \Lambda w_{\text{opt}}}\, \Lambda w_{\text{opt}}. \tag{50}$$

We can therefore sample $x$ given $y_{\text{opt}}$ by taking

$$x = x_{\text{prior}} + \frac{y_{\text{opt}} - w_{\text{opt}}^T x_{\text{prior}}}{w_{\text{opt}}^T \Lambda w_{\text{opt}}}\, \Lambda w_{\text{opt}}, \tag{51}$$

where

$$P(x_{\text{prior}}) = N(x_{\text{prior}}; b, \Lambda), \tag{52}$$

which can be confirmed by computing the moments of $x$ that result from this process.
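Equations 51 and 52 give a direct sampling procedure. A sketch with illustrative rate and weight distributions; the assertion checks that the sampled inputs satisfy the constraint exactly.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_x(y_opt, w_opt, nu, b):
    """Equation 51: sample inputs consistent with a given y_opt by projecting
    a draw from the Gaussian prior onto the constraint w_opt . x = y_opt."""
    Lam_w = nu * w_opt                                          # Lambda is diagonal (Eq. 10)
    x_prior = b + np.sqrt(nu) * rng.standard_normal(nu.shape)   # Eq. 52
    return x_prior + (y_opt - w_opt @ x_prior) / (w_opt @ Lam_w) * Lam_w

n = 1000
nu = rng.lognormal(0.0, 1.0, size=n)       # illustrative firing rates
w_opt = rng.lognormal(-0.5, 0.5, size=n)   # illustrative optimal weights
x = sample_x(y_opt=2.0, w_opt=w_opt, nu=nu, b=np.zeros(n))
assert np.isclose(w_opt @ x, 2.0)          # the constraint holds exactly
```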

4.2.2 Inference

Now, we wish to compute the posterior, $P(\lambda_{\text{opt},i} \mid y, s_{\text{like}}, x_i)$. While we could apply Bayes' theorem directly, it turns out to be easier to introduce and integrate out another variable, as we did for supervised and reinforcement learning. In this case, the relevant variable is $y_{\text{opt}}$,

$$P(\lambda_{\text{opt},i} \mid y, s_{\text{like}}, x_i) = \int dy_{\text{opt}}\, P(\lambda_{\text{opt},i} \mid y_{\text{opt}}, y, s_{\text{like}}, x_i)\, P(y_{\text{opt}} \mid y, s_{\text{like}}, x_i). \tag{53}$$

The two distributions in the above expression are given by Bayes' theorem,

$$P(\lambda_{\text{opt},i} \mid y_{\text{opt}}, y, s_{\text{like}}, x_i) \propto P(\lambda_{\text{opt},i})\, P(x_i \mid y_{\text{opt}}, s_{\text{like}}, \lambda_{\text{opt},i})\, P(y \mid y_{\text{opt}}, s_{\text{like}}, \lambda_{\text{opt},i}, x_i), \tag{54}$$

$$P(y_{\text{opt}} \mid y, x_i) \propto P(y_{\text{opt}})\, P(x_i \mid y_{\text{opt}}, s_{\text{like}})\, P(y \mid y_{\text{opt}}, s_{\text{like}}, x_i). \tag{55}$$

We must compute all the distributions in Equations 54 and 55. First, the distribution over $x_i$ given $y_{\text{opt}}$ and $\lambda_{\text{opt},i}$ is given by Equations 50 and 47 with $b = 0$,

$$P(x_i \mid y_{\text{opt}}, \lambda_{\text{opt},i}) \approx N\left(x_i;\; \frac{y_{\text{opt}}\, \nu_i w_{\text{opt},i}}{n\langle\nu w_{\text{opt}}^2\rangle},\; \nu_i - \frac{\nu_i^2\left(m_i^2 + s_i\right)}{n\langle\nu w_{\text{opt}}^2\rangle}\right). \tag{56}$$

The distribution over $x_i$ given $y_{\text{opt}}$ is similar, but $e^{\lambda_{\text{opt},i}}$ is replaced by its expected value,

$$P(x_i \mid y_{\text{opt}}) \approx N\left(x_i;\; \frac{y_{\text{opt}}\, \nu_i m_i}{n\langle\nu w_{\text{opt}}^2\rangle},\; \nu_i - \frac{\nu_i^2\left(m_i^2 + s_i\right)}{n\langle\nu w_{\text{opt}}^2\rangle}\right). \tag{57}$$

Note that in both of the above expressions, we have approximated $w_{\text{opt}}^T \Lambda w_{\text{opt}}$ using its expected value,

$$w_{\text{opt}}^T \Lambda w_{\text{opt}} \approx n\langle\nu w_{\text{opt}}^2\rangle = n\langle\nu\rangle\, e^{2\mu_{\text{prior}} + 2\sigma_{\text{prior}}^2}. \tag{58}$$

Second, the distribution over $y$ given $y_{\text{opt}}$ is again given by Equations 26 and 27,

$$P(y \mid y_{\text{opt}}, w_i, \lambda_{\text{opt},i}) = N\left(y_{\text{opt}};\; y + x_i\left(e^{\lambda_{\text{opt},i}} - m_i\right) - x_i(w_i - m_i),\; s_{\text{like},i}\right), \tag{59}$$

$$P(y \mid y_{\text{opt}}, \lambda_{\text{opt},i}, x_i) = N\left(y_{\text{opt}};\; y + x_i\left(e^{\lambda_{\text{opt},i}} - m_i\right),\; s_{\text{like},i}\right). \tag{60}$$

In order to actually compute the mean and variance of $\lambda_{\text{opt},i}$, we perform many of the algebraic steps in code, which both simplifies the required derivations and reduces the probability of errors. First, we compute the mean and variance of the distribution over $y_{\text{opt}}$ by combining, in code, the two Gaussians in Equations 57 and 60. We combine the resulting Gaussian with the Laplacian prior in Equation 8 using generic routines. In particular, a generic version of this problem can be written

$$P(x) \propto \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\exp\left(-\frac{|x - m|}{b}\right) \tag{61}$$

$$\propto \begin{cases} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2} + \frac{x - m}{b}\right) & \text{if } x < m \\ \exp\left(-\frac{(x - \mu)^2}{2\sigma^2} - \frac{x - m}{b}\right) & \text{otherwise.} \end{cases} \tag{62}$$

Writing this distribution as truncated Gaussians and incorporating constant factors $A$ and $B$, we get

$$P(x) = \begin{cases} A \exp\left(-\frac{\left(x - \left(\mu + \sigma^2/b\right)\right)^2}{2\sigma^2}\right) & \text{if } x < m \\ B \exp\left(-\frac{\left(x - \left(\mu - \sigma^2/b\right)\right)^2}{2\sigma^2}\right) & \text{otherwise.} \end{cases} \tag{63}$$

To solve for $A$ and $B$, we exploit the fact that at $x = m$ the probability density is continuous, and the distribution must normalize. The resulting distribution is a mixture of two truncated Gaussians, whose mean and variance can be computed using standard routines.
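Since the text performs these algebraic steps in code, it is worth seeing what such a routine looks like. The sketch below is our reconstruction of the generic Gaussian-times-Laplacian computation (Equations 61 to 63), returning the mean and variance of the resulting mixture of two truncated Gaussians; it is not the authors' code.

```python
import numpy as np
from scipy.stats import truncnorm, norm

def gaussian_times_laplacian(mu, sigma2, m, b):
    """Moments of P(x) ~ N(x; mu, sigma2) * Laplacian(x; m, b), Eqs. 61-63:
    a two-component mixture of truncated Gaussians with shifted means."""
    sd = np.sqrt(sigma2)
    mu_lo, mu_hi = mu + sigma2 / b, mu - sigma2 / b   # Eq. 63 shifted means
    # Mixture weights: each piece's prefactor (from completing the square)
    # times the mass of its truncated Gaussian.
    log_c_lo = (mu_lo**2 - mu**2) / (2 * sigma2) - m / b
    log_c_hi = (mu_hi**2 - mu**2) / (2 * sigma2) + m / b
    z_lo = np.exp(log_c_lo) * norm.cdf((m - mu_lo) / sd)
    z_hi = np.exp(log_c_hi) * norm.sf((m - mu_hi) / sd)
    p_lo = z_lo / (z_lo + z_hi)
    # Truncated-Gaussian moments for each piece (below and above x = m).
    a_lo, b_lo = -np.inf, (m - mu_lo) / sd
    a_hi, b_hi = (m - mu_hi) / sd, np.inf
    m_lo = truncnorm.mean(a_lo, b_lo, loc=mu_lo, scale=sd)
    m_hi = truncnorm.mean(a_hi, b_hi, loc=mu_hi, scale=sd)
    v_lo = truncnorm.var(a_lo, b_lo, loc=mu_lo, scale=sd)
    v_hi = truncnorm.var(a_hi, b_hi, loc=mu_hi, scale=sd)
    mean = p_lo * m_lo + (1 - p_lo) * m_hi
    var = (p_lo * (v_lo + m_lo**2) + (1 - p_lo) * (v_hi + m_hi**2)) - mean**2
    return mean, var
```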

Finally, we compute the distribution over $\lambda_{\text{opt},i}$ for a given $y_{\text{opt}}$ using a similar approach, combining Equation 56, Equation 59, and the approximation, Equation 32, in code. Finally, while we could sample multiple values of $y_{\text{opt}}$, and compute the resulting mean and variance of $\lambda_{\text{opt},i}$, this is slow and gives a stochastic learning rule. Instead, an accurate, deterministic estimate is given by a sigma-point approximation [24], which involves sampling $y_{\text{opt}}$ at only the mean plus the standard deviation, and the mean minus the standard deviation.

4.3 Classical learning rules

In order to make the comparisons in Figure 2, we need to specify classical learning rules for each type of learning. Each classical rule has a learning rate, $\alpha$, which is varied in Figure 2. For supervised learning with continuous feedback, we use the delta rule,

$$m_i(t+1) = m_i(t) + \alpha f x_i(t). \tag{64}$$

For supervised learning with binary feedback, we use something similar to the delta rule, where the updates have been calibrated to ensure that increases balance decreases,

$$m_i(t+1) = m_i(t) + \alpha\left[\mathbb{1}(f = 1)\, P(f = -1) - \mathbb{1}(f = -1)\, P(f = 1)\right], \tag{65}$$

where $\mathbb{1}$ is the indicator function, which is 1 if the condition is true, and 0 if the condition is false. The probability $P(f = 1)$ is computed by taking an average over past observations, and is necessary to ensure that there is no bias caused by having a threshold that is far from 0. For reinforcement learning, we use a standard policy gradient method,

$$m_i(t+1) = m_i(t) - \alpha\left(f - E[f]\right)\left(w_i - m_i\right), \tag{66}$$

where the expected loss, $E[f]$, is computed over past trials. For unsupervised learning, we use a standard Hebbian-like learning rule, inspired by our equations for inference,

$$m_i(t+1) = m_i(t) + \alpha\, x_i\left(\left(\frac{1}{a} - 1\right) y - \frac{\operatorname{sign}(y)}{b}\right). \tag{67}$$

Note that if $x_i$ is positive and $y$ is large and positive, then the weight is increased, as $a$ is typically smaller than 1, whereas if $y$ is small, then the weight may fall.
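Two of the classical baselines in code, as a point of comparison with the Bayesian update above. The learning rate $\alpha$ is the free parameter swept in Figure 2; the default below is only a placeholder.

```python
def delta_rule(m, f, x, alpha=0.01):
    """Equation 64: the classical delta rule, continuous feedback."""
    return m + alpha * f * x

def policy_gradient(m, f, f_bar, w, alpha=0.01):
    """Equation 66: policy gradient against a running baseline f_bar ~ E[f]."""
    return m - alpha * (f - f_bar) * (w - m)
```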

4.4 Mean field

We wish to find relationships between the measured quantities, $m_i$ and $\sigma_i^2$. To do so, we look at Equation 18b in steady state, i.e., where $\sigma_i^2(t+1) = \sigma_i^2(t)$. To simplify our derivations, we neglect the reduction in the variance caused by the slow drift towards the mean (the $(1 - 1/\tau)$ term), as this only becomes important when there is very little incoming information. Although $s_{\text{like},i}$, $\text{Var}[\delta \mid f, s_{\text{like}}, x_i, w_i]$, and $x_i$ vary rapidly over short time periods, we expect their variations to be largely uncorrelated, so we can take averages separately, which allows us to write

$$\frac{2\sigma_{\text{prior}}^2}{\tau} = \left\langle \frac{s_{\text{like},i} - \text{Var}[\delta \mid f, s_{\text{like}}, x_i, w_i]}{s_{\text{like},i}^2} \right\rangle \sigma_i^4\, \nu_i\, m_i^2. \tag{68}$$

By rearranging and taking a square root, we get

$$\frac{s_i}{m_i^2} \propto \frac{1}{\sqrt{\nu_i}}. \tag{69}$$

4.5 Predictions

To derive Equation 20, we ask how a weight should change during an LTP protocol. As $\mu_i$ is the mean of the log-weight,

$$\frac{\delta m_i}{m_i} \approx \delta\mu_i. \tag{70}$$

The learning rules (Equation 18a) give

$$\frac{\delta m_i}{m_i} \propto \sigma_i^2. \tag{71}$$

As $s_i \approx \sigma_i^2 m_i^2$, we can write

$$\frac{\delta m_i}{m_i} \propto \frac{s_i}{m_i^2}. \tag{72}$$

Finally, substituting the mean field result in Equation 69 gives Equation 20.
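As a rough numerical check of the claimed scaling, one can iterate Equation 18b to its fixed point with $x_i^2$ replaced by its average, $\nu\,\mathrm{d}t$, and with $\text{Var}[\delta \mid f]$ set to zero whenever the input spikes. This is a crude mean-field caricature with assumed constants, not the paper's simulation, but it recovers an approximately constant $\sigma_i^2\sqrt{\nu_i}$ across rates.

```python
import numpy as np

tau, sigma_prior = 10_000.0, 0.5     # illustrative drift parameters
m, s_like_i, dt = 1.0, 1.0, 0.01     # fixed mean weight, likelihood variance,
                                     # and a 10 ms time step

def steady_state_sigma2(nu, n_steps=200_000):
    """Mean-field iteration of Equation 18b: x_i^2 is replaced by its
    average nu * dt, and Var[delta | f] = 0 on spikes."""
    s2 = sigma_prior**2
    for _ in range(n_steps):
        s2 = ((1 - 1 / tau)**2 * (s2 - s2**2 * (nu * dt) * m**2 / s_like_i)
              + 2 * sigma_prior**2 / tau)
    return s2

for nu in [1.0, 10.0, 100.0]:   # presynaptic rates in Hz
    print(nu, steady_state_sigma2(nu) * np.sqrt(nu))   # approximately constant
```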


More information

Abstract. Author Summary

Abstract. Author Summary 1 Self-organization of microcircuits in networks of spiking neurons with plastic synapses Gabriel Koch Ocker 1,3, Ashok Litwin-Kumar 2,3,4, Brent Doiron 2,3 1: Department of Neuroscience, University of

More information

RESEARCH STATEMENT. Nora Youngs, University of Nebraska - Lincoln

RESEARCH STATEMENT. Nora Youngs, University of Nebraska - Lincoln RESEARCH STATEMENT Nora Youngs, University of Nebraska - Lincoln 1. Introduction Understanding how the brain encodes information is a major part of neuroscience research. In the field of neural coding,

More information

neural networks Balázs B Ujfalussy 17 october, 2016 idegrendszeri modellezés 2016 október 17.

neural networks Balázs B Ujfalussy 17 october, 2016 idegrendszeri modellezés 2016 október 17. neural networks Balázs B Ujfalussy 17 october, 2016 Hierarchy of the nervous system behaviour idegrendszeri modellezés 1m CNS 10 cm systems 1 cm maps 1 mm networks 100 μm neurons 1 μm synapses 10 nm molecules

More information

Fast and exact simulation methods applied on a broad range of neuron models

Fast and exact simulation methods applied on a broad range of neuron models Fast and exact simulation methods applied on a broad range of neuron models Michiel D Haene michiel.dhaene@ugent.be Benjamin Schrauwen benjamin.schrauwen@ugent.be Ghent University, Electronics and Information

More information

Novel VLSI Implementation for Triplet-based Spike-Timing Dependent Plasticity

Novel VLSI Implementation for Triplet-based Spike-Timing Dependent Plasticity Novel LSI Implementation for Triplet-based Spike-Timing Dependent Plasticity Mostafa Rahimi Azghadi, Omid Kavehei, Said Al-Sarawi, Nicolangelo Iannella, and Derek Abbott Centre for Biomedical Engineering,

More information

THE retina in general consists of three layers: photoreceptors

THE retina in general consists of three layers: photoreceptors CS229 MACHINE LEARNING, STANFORD UNIVERSITY, DECEMBER 2016 1 Models of Neuron Coding in Retinal Ganglion Cells and Clustering by Receptive Field Kevin Fegelis, SUID: 005996192, Claire Hebert, SUID: 006122438,

More information

The Spike Response Model: A Framework to Predict Neuronal Spike Trains

The Spike Response Model: A Framework to Predict Neuronal Spike Trains The Spike Response Model: A Framework to Predict Neuronal Spike Trains Renaud Jolivet, Timothy J. Lewis 2, and Wulfram Gerstner Laboratory of Computational Neuroscience, Swiss Federal Institute of Technology

More information

Tuning tuning curves. So far: Receptive fields Representation of stimuli Population vectors. Today: Contrast enhancment, cortical processing

Tuning tuning curves. So far: Receptive fields Representation of stimuli Population vectors. Today: Contrast enhancment, cortical processing Tuning tuning curves So far: Receptive fields Representation of stimuli Population vectors Today: Contrast enhancment, cortical processing Firing frequency N 3 s max (N 1 ) = 40 o N4 N 1 N N 5 2 s max

More information

High-dimensional geometry of cortical population activity. Marius Pachitariu University College London

High-dimensional geometry of cortical population activity. Marius Pachitariu University College London High-dimensional geometry of cortical population activity Marius Pachitariu University College London Part I: introduction to the brave new world of large-scale neuroscience Part II: large-scale data preprocessing

More information

Fundamentals of Computational Neuroscience 2e

Fundamentals of Computational Neuroscience 2e Fundamentals of Computational Neuroscience 2e Thomas Trappenberg March 21, 2009 Chapter 9: Modular networks, motor control, and reinforcement learning Mixture of experts Expert 1 Input Expert 2 Integration

More information

Statistical models for neural encoding

Statistical models for neural encoding Statistical models for neural encoding Part 1: discrete-time models Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary discussion 1: Most excitatory and suppressive stimuli for model neurons The model allows us to determine, for each model neuron, the set of most excitatory and suppresive features. First,

More information

Finding a Basis for the Neural State

Finding a Basis for the Neural State Finding a Basis for the Neural State Chris Cueva ccueva@stanford.edu I. INTRODUCTION How is information represented in the brain? For example, consider arm movement. Neurons in dorsal premotor cortex (PMd)

More information

Neural networks: Unsupervised learning

Neural networks: Unsupervised learning Neural networks: Unsupervised learning 1 Previously The supervised learning paradigm: given example inputs x and target outputs t learning the mapping between them the trained network is supposed to give

More information

Artificial Neural Network and Fuzzy Logic

Artificial Neural Network and Fuzzy Logic Artificial Neural Network and Fuzzy Logic 1 Syllabus 2 Syllabus 3 Books 1. Artificial Neural Networks by B. Yagnanarayan, PHI - (Cover Topologies part of unit 1 and All part of Unit 2) 2. Neural Networks

More information

Introduction. Previous work has shown that AER can also be used to construct largescale networks with arbitrary, configurable synaptic connectivity.

Introduction. Previous work has shown that AER can also be used to construct largescale networks with arbitrary, configurable synaptic connectivity. Introduction The goal of neuromorphic engineering is to design and implement microelectronic systems that emulate the structure and function of the brain. Address-event representation (AER) is a communication

More information

Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity

Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity Sander M. Bohte a Michael C. Mozer b a CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands b Dept. of Computer

More information

1 Balanced networks: Trading speed for noise

1 Balanced networks: Trading speed for noise Physics 178/278 - David leinfeld - Winter 2017 (Corrected yet incomplete notes) 1 Balanced networks: Trading speed for noise 1.1 Scaling of neuronal inputs An interesting observation is that the subthresold

More information

Scalable Inference for Neuronal Connectivity from Calcium Imaging

Scalable Inference for Neuronal Connectivity from Calcium Imaging Scalable Inference for Neuronal Connectivity from Calcium Imaging Alyson K. Fletcher Sundeep Rangan Abstract Fluorescent calcium imaging provides a potentially powerful tool for inferring connectivity

More information

Application of density estimation methods to quantal analysis

Application of density estimation methods to quantal analysis Application of density estimation methods to quantal analysis Koichi Yoshioka Tokyo Medical and Dental University Summary There has been controversy for the quantal nature of neurotransmission of mammalian

More information

Computer Vision Group Prof. Daniel Cremers. 3. Regression

Computer Vision Group Prof. Daniel Cremers. 3. Regression Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the

More information

Deep learning in the brain. Deep learning summer school Montreal 2017

Deep learning in the brain. Deep learning summer school Montreal 2017 Deep learning in the brain Deep learning summer school Montreal 207 . Why deep learning is not just for AI The recent success of deep learning in artificial intelligence (AI) means that most people associate

More information

, and rewards and transition matrices as shown below:

, and rewards and transition matrices as shown below: CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount

More information

6 Reinforcement Learning

6 Reinforcement Learning 6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,

More information

Statistics Random Variables

Statistics Random Variables 1 Statistics Statistics are used in a variety of ways in neuroscience. Perhaps the most familiar example is trying to decide whether some experimental results are reliable, using tests such as the t-test.

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Reducing the Variability of Neural Responses: A Computational Theory of Spike-Timing-Dependent Plasticity

Reducing the Variability of Neural Responses: A Computational Theory of Spike-Timing-Dependent Plasticity LETTER Communicated by Gal Chechik Reducing the Variability of Neural Responses: A Computational Theory of Spike-Timing-Dependent Plasticity Sander M. Bohte sbohte@cwi.nl Netherlands Centre for Mathematics

More information

Ranking Neurons for Mining Structure-Activity Relations in Biological Neural Networks: NeuronRank

Ranking Neurons for Mining Structure-Activity Relations in Biological Neural Networks: NeuronRank Ranking Neurons for Mining Structure-Activity Relations in Biological Neural Networks: NeuronRank Tayfun Gürel a,b,1, Luc De Raedt a,b, Stefan Rotter a,c a Bernstein Center for Computational Neuroscience,

More information

Dynamical Constraints on Computing with Spike Timing in the Cortex

Dynamical Constraints on Computing with Spike Timing in the Cortex Appears in Advances in Neural Information Processing Systems, 15 (NIPS 00) Dynamical Constraints on Computing with Spike Timing in the Cortex Arunava Banerjee and Alexandre Pouget Department of Brain and

More information

Autoassociative recall as probabilistic inference

Autoassociative recall as probabilistic inference Supplementary Note Autoassociative recall as probabilistic inference We first specify the task for autoassociative recall in a normative manner 1,2. This specification leads to a natural account of the

More information

Correlations strike back (again): the case of associative memory retrieval

Correlations strike back (again): the case of associative memory retrieval Correlations strike back (again): the case of associative memory retrieval Cristina Savin 1 cs664@cam.ac.uk Peter Dayan 2 dayan@gatsby.ucl.ac.uk Máté Lengyel 1 m.lengyel@eng.cam.ac.uk 1 Computational &

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Dan Cawford Manfred Opper John Shawe-Taylor May, 2006 1 Introduction Some of the most complex models routinely run

More information

Leo Kadanoff and 2d XY Models with Symmetry-Breaking Fields. renormalization group study of higher order gradients, cosines and vortices

Leo Kadanoff and 2d XY Models with Symmetry-Breaking Fields. renormalization group study of higher order gradients, cosines and vortices Leo Kadanoff and d XY Models with Symmetry-Breaking Fields renormalization group study of higher order gradients, cosines and vortices Leo Kadanoff and Random Matrix Theory Non-Hermitian Localization in

More information

Neurophysiology. Danil Hammoudi.MD

Neurophysiology. Danil Hammoudi.MD Neurophysiology Danil Hammoudi.MD ACTION POTENTIAL An action potential is a wave of electrical discharge that travels along the membrane of a cell. Action potentials are an essential feature of animal

More information

Mathematical Models of Dynamic Behavior of Individual Neural Networks of Central Nervous System

Mathematical Models of Dynamic Behavior of Individual Neural Networks of Central Nervous System Mathematical Models of Dynamic Behavior of Individual Neural Networks of Central Nervous System Dimitra Despoina Pagania 1, Adam Adamopoulos 1,2 and Spiridon D. Likothanassis 1 1 Pattern Recognition Laboratory,

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations.

Introduction. Stochastic Processes. Will Penny. Stochastic Differential Equations. Stochastic Chain Rule. Expectations. 19th May 2011 Chain Introduction We will Show the relation between stochastic differential equations, Gaussian processes and methods This gives us a formal way of deriving equations for the activity of

More information

Balance of Electric and Diffusion Forces

Balance of Electric and Diffusion Forces Balance of Electric and Diffusion Forces Ions flow into and out of the neuron under the forces of electricity and concentration gradients (diffusion). The net result is a electric potential difference

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Introduction to Neural Networks U. Minn. Psy 5038 Spring, 1999 Daniel Kersten. Lecture 2a. The Neuron - overview of structure. From Anderson (1995)

Introduction to Neural Networks U. Minn. Psy 5038 Spring, 1999 Daniel Kersten. Lecture 2a. The Neuron - overview of structure. From Anderson (1995) Introduction to Neural Networks U. Minn. Psy 5038 Spring, 1999 Daniel Kersten Lecture 2a The Neuron - overview of structure From Anderson (1995) 2 Lect_2a_Mathematica.nb Basic Structure Information flow:

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information