Sampling-based probabilistic inference through neural and synaptic dynamics


Sampling-based probabilistic inference through neural and synaptic dynamics. Wolfgang Maass (for Robert Legenstein), Institute for Theoretical Computer Science, Graz University of Technology, Austria. http://www.igi.tugraz.at/legi/

I will address two time scales of sampling in neural networks. Part I: Neural Sampling (time scale of spikes up to the short-term dynamics of synapses and neurons). Part II: Synaptic Sampling (time scale of synaptic plasticity and rewiring). Question: On which of these two time scales can (and does) the brain implement probabilistic inference through sampling?

Part I: Neural Sampling. Inspiration from data: Berkes et al. recorded the distribution of network states with 16 electrodes in area V1 of ferrets. Their interpretation: during development, this distribution of network states becomes an internal model for visual inputs. P. Berkes, G. Orban, M. Lengyel, and J. Fiser. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science, 331:83-87, 2011

Such stochastic internal models in brain networks have in fact been proposed by many researchers: K. Friston, A theory of cortical responses, 2005; E. Vul and H. Pashler, Measuring the crowd within, 2008; S. Denison, E. B. Bonawitz, A. Gopnik, T. L. Griffiths, Preschoolers sample from probability distributions, 2010. Some of this work furthermore suggests that the brain carries out probabilistic inference from a distribution p through MCMC sampling from p. Assume for example that $p(x_1, \ldots, x_K)$ is the stationary distribution over K binary random variables, each represented by one neuron of a brain network C. If concrete values e ("evidence") are plugged in for some of these variables, then the posterior marginal of each remaining variable can be estimated by observing the firing rate of the neuron that is associated with that binary random variable.
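To make the readout concrete, here is a minimal sketch (a toy example of my own, not from the talk): a small joint distribution over three binary variables is defined explicitly, one variable is clamped as evidence, and the posterior marginal of a query variable is estimated as the fraction of time the corresponding "neuron" is on along an MCMC chain, i.e. as its time-averaged firing rate. The distribution, the chain (a plain Gibbs sampler, used only for illustration), and all names are assumptions.

```python
# Minimal sketch (illustrative, not the model from the talk): estimating a posterior
# marginal by time-averaging the states of an MCMC chain, analogous to reading out
# a "firing rate". The joint distribution and variable names are toy assumptions.
import itertools
import random

random.seed(0)

# Toy joint p(x1, x2, x3) over binary variables, given as an unnormalized table.
def unnorm_p(x):
    x1, x2, x3 = x
    # arbitrary positive weights encoding some dependencies
    return 0.5 + 2.0 * x1 * x2 + 1.5 * x2 * x3 + 0.3 * x1

evidence = {2: 1}          # clamp x3 = 1 ("evidence" e)
query_var = 0              # we want the posterior marginal p(x1 = 1 | e)

# Exact answer by brute force, for comparison.
states = [s for s in itertools.product([0, 1], repeat=3) if s[2] == evidence[2]]
Z = sum(unnorm_p(s) for s in states)
exact = sum(unnorm_p(s) for s in states if s[query_var] == 1) / Z

# Gibbs-style chain over the unclamped variables.
x = [0, 0, evidence[2]]
count_on, n_samples = 0, 20000
for _ in range(n_samples):
    for k in (0, 1):                        # resample each free variable
        x_on, x_off = list(x), list(x)
        x_on[k], x_off[k] = 1, 0
        p_on = unnorm_p(x_on) / (unnorm_p(x_on) + unnorm_p(x_off))
        x[k] = 1 if random.random() < p_on else 0
    count_on += x[query_var]                # "spike count" of the query neuron

print("estimated marginal:", count_on / n_samples, " exact:", exact)
```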

These studies suggest analyzing the stationary distributions of neural networks. Stumbling block for theory: The Markov chains that are defined by networks of spiking neurons are non-reversible, both because of their inherent dynamics (a spike causes temporally extended changes in other neurons) and because of non-symmetric synaptic connections.

General theoretical result. Theorem: Virtually any network C of spiking neurons with noise has a unique stationary distribution of network states, to which it converges exponentially fast from any initial state. Two relevant notions of network state are considered: a biological neural network can only be viewed as a Markov chain if one moves to temporally extended notions of network state, and the resulting Markov chains have continuous time and continuous state spaces. This theorem is one of the few that also hold for data-based nonlinear models of neurons and synapses. Habenschuss, Jonke, Maass, PLoS Comp. Biol. 2013
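As a rough finite-state analogue of the convergence statement (illustration only; the theorem above concerns continuous-time, continuous-state Markov chains defined by spiking networks), the following sketch builds a random transition matrix with full support, computes its unique stationary distribution, and shows the total-variation distance from an arbitrary initial state shrinking geometrically.

```python
# Finite-state analogue (illustrative only): an irreducible, aperiodic Markov chain
# converges to its unique stationary distribution, and the total-variation distance
# shrinks geometrically with the number of steps.
import numpy as np

rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n)) + 0.1          # random transition matrix with full support
P /= P.sum(axis=1, keepdims=True)     # rows sum to 1

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()

mu = np.zeros(n); mu[0] = 1.0         # start from a point mass
for t in range(1, 11):
    mu = mu @ P
    tv = 0.5 * np.abs(mu - pi).sum()  # total-variation distance to stationarity
    print(f"step {t:2d}: TV distance = {tv:.2e}")
```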

What types of probability distributions can arise as the stationary distribution of a network of spiking neurons? I am considering here the same convention as Berkes et al. for relating spikes to bits: ...

We cannot use the common detailed balance criterion for verifying that a particular distribution p is the stationary distribution, but the Neural Computability Condition (NCC) is a replacement. Theorem: The NCC provides a sufficient condition for representing p through a network of spiking neurons: for each random variable $x_k$ there is some neuron whose membrane potential at time t equals the log-odds $\log \frac{p(x_k = 1 \mid x_{\setminus k}(t))}{p(x_k = 0 \mid x_{\setminus k}(t))}$. Note: This result holds rigorously only for an idealized type of spiking neuron with a stochastic firing probability (a simple function of the membrane potential) and step functions as PSPs. But numerical simulations suggest that the error is not large for biologically more realistic models. Büsing, Bill, Nessler, Maass. PLoS Comp. Biology, 2011

Neural Computability Condition (NCC). For p with only 2nd-order dependencies, i.e. Boltzmann distributions: a network of spiking neurons with symmetric weights automatically satisfies the NCC. For p with arbitrary higher-order dependencies: such p can be represented directly via networks of spiking neurons with asymmetric connections. In fact, such networks can learn to approximate any discrete distribution p from examples (drawn from p). Neural sampling is structurally different from Gibbs sampling, also in this case! (Jonke, Habenschuss, Maass, arXiv 2015). L. Büsing, J. Bill, B. Nessler, and W. Maass, PLOS Comp. Biol. 2011; D. Pecevski, L. Büsing, W. Maass, PLOS Comp. Biol. 2011; D. Pecevski, W. Maass, 2015 (under review)
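A quick numerical sanity check of the Boltzmann claim (a toy example of my own; the weights, biases, and variable names are assumptions): for a distribution with only 2nd-order dependencies and symmetric weights, the log-odds that the NCC asks the membrane potential to represent reduce exactly to the familiar linear sum b_k + Σ_j W_kj z_j.

```python
# Numerical check of the claim for Boltzmann distributions (toy example):
# for p(z) ∝ exp(sum_k b_k z_k + 1/2 sum_{i,j} W_ij z_i z_j) with symmetric W and
# W_kk = 0, the log-odds log[p(z_k=1 | z_\k) / p(z_k=0 | z_\k)] equals the linear
# "membrane potential" b_k + sum_j W_kj z_j, which is what the NCC requires.
import itertools
import math

K = 3
W = [[0.0, 1.2, -0.7],
     [1.2, 0.0, 0.8],
     [-0.7, 0.8, 0.0]]           # symmetric, zero diagonal
b = [-0.5, 0.2, 0.1]

def log_unnorm_p(z):
    return (sum(b[k] * z[k] for k in range(K))
            + 0.5 * sum(W[i][j] * z[i] * z[j] for i in range(K) for j in range(K)))

for z in itertools.product([0, 1], repeat=K):
    for k in range(K):
        z_on, z_off = list(z), list(z)
        z_on[k], z_off[k] = 1, 0
        log_odds = log_unnorm_p(z_on) - log_unnorm_p(z_off)   # normalizer cancels
        u_k = b[k] + sum(W[k][j] * z[j] for j in range(K) if j != k)
        assert math.isclose(log_odds, u_k), (z, k)
print("NCC log-odds identity holds for all states of this Boltzmann example.")
```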

Some open questions about neural sampling: What is the speed (rhythm) of neural sampling in the brain? The apparently fastest known sampling-like dynamics were reported by Jezek et al. (Nature, 2011): blue and red colors indicate strong correlation with the place map of the blue or the red maze (the "context"); hardly any theta cycles with mixed contexts were found. To what extent does the brain use results of neural sampling for decision making etc.? Can stationary distributions be learnt from examples (generated by some external distribution p*) by networks of spiking neurons? How are behaviourally relevant random variables encoded?

Part II: Synaptic Sampling. Many biological network parameters are fluctuating (more or less) all the time: A postsynaptic density consists of over 1000 different types of proteins, many present in small numbers. Since these molecules have a lifetime of only weeks or months, their number is subject to permanent stochastic fluctuations. Receptors etc. are subject to Brownian motion within the membrane. Furthermore, axons sprout and dendritic spines come and go on a time scale of days (even in adult cortex, perhaps even in the absence of neural activity). Data from Svoboda Lab.

Long-term recordings show that neural codes drift on the time scale of weeks and months. Ziv, Y., Burns, L. D., Cocker, E. D., Hamel, E. O., Ghosh, K. K., Kitch, L. J., ... & Schnitzer, M. J. Long-term dynamics of CA1 hippocampal place codes. Nature Neuroscience, 2013. See also: Rokni, U., Richardson, A. G., Bizzi, E., & Seung, H. S. Motor learning with unstable neural representations. Neuron, 2007, and forthcoming new data.

Mathematical framework for capturing these phenomena: Synaptic Sampling. We model the evolution of network parameters through stochastic differential equations (SDEs): $d\theta_i = b\, \frac{\partial}{\partial \theta_i} \log p^*(\theta)\, dt + \sqrt{2Tb}\, d\mathcal{W}_i$ (a drift term plus a diffusion term). The diffusion term $d\mathcal{W}_i$ in the SDE denotes an infinitesimal step of a random walk (Brownian motion), whose temporal evolution from time s to time t satisfies $\mathcal{W}^i_t - \mathcal{W}^i_s \sim \mathrm{NORMAL}(0,\, t-s)$.

Mathematical framework for capturing these phenomena: Synaptic Sampling. Integration of this SDE yields infinitely many different solutions for the evolution of the parameters. But the evolution of their probability density is given by a deterministic PDE, the Fokker-Planck equation: $$\frac{\partial}{\partial t} p_{FP}(\theta, t) = \sum_i \left[ -\frac{\partial}{\partial \theta_i} \Big( b\, \frac{\partial}{\partial \theta_i} \log p^*(\theta)\; p_{FP}(\theta, t) \Big) + \frac{\partial^2}{\partial \theta_i^2} \Big( Tb\; p_{FP}(\theta, t) \Big) \right].$$ By setting the left-hand side to 0, this FP equation makes the resulting stationary distribution $\frac{1}{Z} p^*(\theta)^{\frac{1}{T}}$ for the vector θ of all network parameters θ_i explicit. Implication: One can program desired target distributions $\frac{1}{Z} p^*(\theta)^{\frac{1}{T}}$ of the parameters into stochastic plasticity rules $d\theta_i = b\, \frac{\partial}{\partial \theta_i} \log p^*(\theta)\, dt + \sqrt{2Tb}\, d\mathcal{W}_i$.
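A minimal 1-D sketch of this machinery (a toy example under my own assumptions, not the network model from the talk): the SDE is integrated with Euler-Maruyama for a Gaussian target p*, and the long-run statistics of θ are compared with the tempered stationary distribution (1/Z) p*(θ)^{1/T} predicted by the Fokker-Planck argument.

```python
# Minimal 1-D sketch (toy example): simulating the synaptic-sampling SDE
#   d(theta) = b d/d(theta) log p*(theta) dt + sqrt(2Tb) dW
# with Euler-Maruyama, for a Gaussian target p*(theta) = N(mu, sigma^2). The long-run
# histogram of theta should approach p*(theta)^(1/T) (up to normalization), which for
# a Gaussian is again a Gaussian, with variance T * sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5          # toy target p*
b, T = 0.1, 0.5               # learning rate and temperature
dt, steps = 1e-2, 200_000

def grad_log_pstar(theta):
    return -(theta - mu) / sigma**2     # d/d(theta) log N(theta; mu, sigma^2)

theta = 0.0
samples = np.empty(steps)
for t in range(steps):
    noise = rng.normal(0.0, np.sqrt(dt))               # Brownian increment over dt
    theta += b * grad_log_pstar(theta) * dt + np.sqrt(2 * T * b) * noise
    samples[t] = theta

burn = steps // 10
print("empirical mean/std: ", samples[burn:].mean(), samples[burn:].std())
print("predicted by p*^(1/T):", mu, np.sqrt(T) * sigma)
```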

Hence synaptic sampling can implement sampling from a posterior distribution of network parameters (synaptic sampling with prior $p_S(\theta)$). Unsupervised learning (generative models): $p^*(\theta \mid x) \propto p_S(\theta) \cdot p_N(x \mid \theta)$, where x are repeatedly occurring network inputs and $p_N(x \mid \theta)$ is the generative model provided by a neural network N with parameters θ. Reinforcement learning: $p^*(\theta) \propto p_S(\theta) \cdot p_N(R = \max \mid \theta)$, where R signals reward. This integrates policy gradient RL with probabilistic inference. Kappel, Habenschuss, Legenstein, Maass: Network plasticity as Bayesian inference, PLoS Comp Biol, in press, and NIPS 2015. Kappel, Habenschuss, Legenstein, Maass: Reward-based network plasticity as Bayesian inference, RLDM 2015.
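The following sketch illustrates the prior-times-likelihood structure in the simplest possible setting (a 1-D conjugate-Gaussian toy problem of my own construction, not a spiking network): the drift splits into a prior term and a likelihood term, and with T = 1 the resulting samples of θ match the exact Bayesian posterior.

```python
# Toy 1-D sketch: the drift of the synaptic-sampling SDE splits as
#   d/d(theta) log p*(theta | x) = d/d(theta) log p_S(theta) + d/d(theta) log p_N(x | theta),
# so the same Euler-Maruyama loop samples from a posterior. Here p_S is a Gaussian
# prior and p_N(x | theta) a Gaussian likelihood for observed inputs x (assumptions).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=20)     # "repeatedly occurring network inputs" (toy data)
prior_mu, prior_sigma = 0.0, 1.0
lik_sigma = 1.0
b, T, dt, steps = 0.05, 1.0, 1e-2, 100_000

def drift(theta):
    d_prior = -(theta - prior_mu) / prior_sigma**2            # d/d(theta) log p_S
    d_lik = np.sum(x - theta) / lik_sigma**2                  # d/d(theta) log p_N(x|theta)
    return d_prior + d_lik

theta, trace = 0.0, np.empty(steps)
for t in range(steps):
    theta += b * drift(theta) * dt + np.sqrt(2 * T * b) * rng.normal(0.0, np.sqrt(dt))
    trace[t] = theta

# Conjugate Gaussian posterior for comparison (T = 1 recovers the exact posterior).
post_var = 1.0 / (1 / prior_sigma**2 + len(x) / lik_sigma**2)
post_mu = post_var * (prior_mu / prior_sigma**2 + x.sum() / lik_sigma**2)
print("SDE samples:     mean %.3f, std %.3f" % (trace[steps//10:].mean(), trace[steps//10:].std()))
print("exact posterior: mean %.3f, std %.3f" % (post_mu, np.sqrt(post_var)))
```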

Functional implications of synaptic sampling for learning 1. Better generalization (predicted by MacKay, 1992) 2. Structural plasticity can easily be integrated with synaptic plasticity in a principled manner 3. Automatic self-repair capabilities of the network (without requiring a clever supervisor that switches learning back on after a perturbation)

Spine dynamics and synaptic plasticity can easily be integrated into an SDE for a parameter that regulates both. Ansatz: A single parameter θ_i controls the spine volume and, once a synaptic connection has been formed, the weight of this synaptic connection. Not only STDP, but also the experimentally observed power-law survival curves for synaptic connections are reproduced by this combined rule. Experimental data from (Loewenstein, Kuras, Rumpel, J. of Neuroscience, in press).
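For concreteness, here is an illustrative read-out of such a single parameter (the exact parameterization is in the Kappel et al. papers; the rectified-exponential mapping and the offset below are assumptions made purely for illustration): one θ_i decides both whether the synapse exists and, if it does, how strong it is.

```python
# Illustrative sketch only (the mapping below is an assumption, not the published
# parameterization): a single parameter theta_i, driven by an SDE as above, is read
# out both as "does the synapse exist?" and, if so, as its weight.
import numpy as np

theta_0 = 1.0   # offset (assumed); larger theta_0 keeps more synapses weak or retracted

def synapse_state(theta: float) -> tuple[bool, float]:
    """Map one sampled parameter to (connected?, synaptic weight)."""
    if theta <= 0.0:
        return False, 0.0                          # spine retracted: no functional connection
    return True, float(np.exp(theta - theta_0))    # connected: weight grows with theta

for theta in (-0.5, 0.2, 1.0, 2.0):
    connected, w = synapse_state(theta)
    print(f"theta = {theta:+.1f} -> connected = {connected}, weight = {w:.3f}")
```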

some slides with unpublished results are deleted

Summary of Part II: Synaptic sampling. One arrives at a new understanding of network learning: The parameter vector θ permanently wanders around (with varying speed) in some very high-dimensional space of parameter values. It spends most of its time in a low-dimensional submanifold where both the prior and the performance (likelihood of inputs, or reward probability) are high. Hence changes in the input distribution or lesions are no big deal, since the parameter vector θ does not have a permanent home anyway. Priors enable the network to combine experience-dependent learning with structural rules in a theoretically optimal way (Bayesian inference).

Credits: David Kappel, Stefan Habenschuss, Dejan Pecevski, Zeno Jonke, Lars Buesing