Probabilistic Models in Theoretical Neuroscience
Neural models of probabilistic sampling: introduction
Matt Graham, 16th January 2014
Overview
1. Motivation: What is the neural sampling hypothesis? Why is it interesting? Toy example
2. Theory review: Stochastic networks; Why sigmoidal conditionals? Boltzmann machines
3. Neural dynamics as sampling: Introduction; Overview of model; Simulations
4. More recent work
Neural sampling hypothesis
A model for probabilistic perception and learning. It proposes that activity patterns across networks of neurons represent samples from the posterior distribution over interpretations given the input. Neural response variability then reflects uncertainty in the interpretation of inputs, and spontaneous network activity corresponds to samples from the prior distribution over inputs and interpretations. There is some experimental support from systematic variation in response variability, and from the high degree of structure in spontaneous network activity and its similarity to stimulus-evoked activity.
Computational advantages (I): anytime computing. Drawing more samples gives increasing accuracy; stopping early gives an answer in decreasing time. [Figure: samples from a two-dimensional distribution over (y1, y2), illustrating the accuracy/time trade-off.]
Computational advantages (II): marginalisation at no extra cost. Discarding a coordinate of joint samples directly yields samples from the marginal. [Figure: samples from a joint distribution over (y1, y2) projected onto one axis.]
Computational advantages (III): consistency of representation. The distinction between input and output is arbitrary; the same machinery supports hierarchical and recurrent models and learning from examples, and it naturally deals with incomplete input.
Toy example
[Figure: an ambiguous handwritten character with candidate interpretations 'o', 'b' and '6'.]
Toy example (with context)
[Figure: the same ambiguous character shown next to digits; the surrounding context shifts the interpretation towards '6'.]
Stochastic binary neural network models
Spiking point-neuron models in which inter-neuron communication is assumed to be entirely spike-based (binary). Neural spiking is stochastic: the network dynamics define the probability of each neuron firing given the current state of the network. These are typically discrete-time models, with time binned into small intervals and the network state defined as a set of binary variables indicating whether or not each neuron fired in the last interval.
General sigmoidal stochastic binary network (SSBN)
Network of $N$ binary neurons with states $s = [s_i]_{i \in \{1 \dots N\}} \in \{0, 1\}^N$, parametrised by a weight matrix $W = [w_{ij}]_{i,j \in \{1 \dots N\}} \in \mathbb{R}^{N \times N}$ and a bias vector $b = [b_i]_{i \in \{1 \dots N\}} \in \mathbb{R}^N$.

The local potential is a weighted sum of the states of the other units
$$u_i^{(t)} = \sum_{j=1}^{N} w_{ij} s_j^{(t)} + b_i$$

If unit $i$ is updated at $t+1$, its new state is sampled from the conditional
$$P\left(s_i^{(t+1)} = 1 \mid s^{(t)}\right) = \sigma\left(u_i^{(t)}\right) = \frac{1}{1 + e^{-u_i^{(t)}}}$$

A special case of a more general Markov random field. [Figure: the sigmoid function $\sigma(u)$.]
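As a concrete illustration (not from the slides), a minimal NumPy sketch of asynchronous SSBN updates; the network size, weights and update order are arbitrary example choices:

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def ssbn_update(s, W, b, i, rng):
        # Local potential u_i = sum_j w_ij s_j + b_i
        u_i = W[i] @ s + b[i]
        # Resample unit i from its sigmoidal conditional
        s[i] = float(rng.random() < sigmoid(u_i))

    # Example: a random 10-unit network, units updated in random order.
    rng = np.random.default_rng(0)
    N = 10
    W = rng.normal(scale=0.5, size=(N, N))   # need not be symmetric in general
    b = rng.normal(size=N)
    s = rng.integers(0, 2, size=N).astype(float)
    for _ in range(1000):
        ssbn_update(s, W, b, int(rng.integers(N)), rng)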
A brief aside Is there any biological justification for using sigmoidal conditional distributions? (Yes)
Origins of stochasticity in biological neurons
For a fixed injected current signal, neural firing tends to be highly consistent. Variability appears to arise mainly from synaptic transmission: both the number of synaptic vesicles released on arrival of a presynaptic spike and their transmitter content fluctuate. Figure source: Mainen and Sejnowski (1995).
Synaptic noise model (I)
The number of vesicles released is Poisson distributed; the transmitter content of each vesicle is Gaussian distributed. Figure source: Del Castillo and Katz (1954).
Synaptic noise model (II)
[Figure: Gaussian CDF $y = \Phi(x)$ and scaled sigmoid $y = \sigma(\sqrt{\pi}\, x)$ plotted over $x \in [-4, 4]$.]
Assuming independent distributions and a large number of synaptic connections, the central limit theorem implies that the conditional distribution of the membrane potential given the spiking state of the rest of the network is Gaussian. The conditional probability of a neuron being super-threshold, and so spiking, therefore takes the form of a Gaussian CDF. The Gaussian CDF $\Phi(x)$ is well approximated by a scaled sigmoid $\sigma(x) = [1 + \exp(-x)]^{-1}$.
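The quality of this approximation is easy to check numerically. A small sketch, assuming the √π scaling shown in the figure legend and using SciPy's norm.cdf for Φ:

    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-4.0, 4.0, 801)
    phi = norm.cdf(x)                                  # Gaussian CDF Phi(x)
    sig = 1.0 / (1.0 + np.exp(-np.sqrt(np.pi) * x))    # scaled sigmoid sigma(sqrt(pi) x)
    print(np.max(np.abs(phi - sig)))                   # max deviation, roughly 0.02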
Boltzmann machines (BM)
[Figure: connectivity graphs of a Boltzmann machine, a semi-restricted Boltzmann machine and a restricted Boltzmann machine, with visible and hidden units marked.]
An analytically tractable variant of the SSBN, also known as an Ising model in statistical physics. Connectivity is constrained to be symmetric ($w_{ij} = w_{ji}\ \forall i, j$) with zero self-connectivity ($w_{ii} = 0\ \forall i$). Visible units are fixed to known values; hidden units vary freely. Restricted and semi-restricted BMs are special cases of the general BM with restricted connectivity graphs allowing simpler updates.
Boltzmann machine dynamics
At each time step a single unit is picked to update, either in a deterministic sequence or at random (Gibbs sampling). Symmetric connectivity enforces the detailed balance condition, i.e. that transitions are reversible, guaranteeing the existence of an equilibrium distribution:
$$P\left(s^{(t)} = u\right) P\left(s^{(t+1)} = v \mid s^{(t)} = u\right) = P\left(s^{(t+1)} = v\right) P\left(s^{(t)} = u \mid s^{(t+1)} = v\right)$$

After an initial burn-in, the dynamics cause the network to sample from the Boltzmann distribution at equilibrium
$$P(s) = \frac{1}{Z} \exp\left(-E(s)\right) = \frac{1}{Z} \exp\left(\frac{1}{2} s^T W s + b^T s\right)$$
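A minimal sketch, not from the slides, of Gibbs sampling in a small Boltzmann machine; with N = 4 units the Boltzmann distribution can be computed exactly by enumeration and compared with the empirical state frequencies:

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    N = 4
    W = rng.normal(scale=0.5, size=(N, N))
    W = (W + W.T) / 2                  # symmetric connectivity
    np.fill_diagonal(W, 0.0)           # zero self-connectivity
    b = rng.normal(scale=0.5, size=N)

    def energy(s):
        return -0.5 * s @ W @ s - b @ s

    # Exact Boltzmann distribution by enumerating all 2^N states.
    states = np.array(list(itertools.product([0.0, 1.0], repeat=N)))
    p_exact = np.exp(-np.array([energy(s) for s in states]))
    p_exact /= p_exact.sum()

    # Gibbs sampling: repeatedly resample single, randomly chosen units.
    s = np.zeros(N)
    counts = np.zeros(len(states))
    weights = 2 ** np.arange(N - 1, -1, -1)   # index matching enumeration order
    for t in range(200_000):
        i = int(rng.integers(N))
        u = W[i] @ s + b[i]
        s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-u)))
        if t >= 10_000:                        # discard burn-in samples
            counts[int(s @ weights)] += 1

    print(np.max(np.abs(counts / counts.sum() - p_exact)))   # small at equilibrium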
Boltzmann machine learning
Boltzmann machines can be trained so that the equilibrium distribution tends towards any arbitrary distribution over binary vectors, given samples from that distribution¹. With $s = [s_v^T\ s_h^T]^T$, the log-likelihood derivative is
$$\frac{\partial \log P(s_v)}{\partial w_{ij}} = \mathbb{E}_{P(s_h \mid s_v)}\left[s_i s_j\right] - \mathbb{E}_{P(s)}\left[s_i s_j\right] = \langle s_i s_j \rangle_{+} - \langle s_i s_j \rangle_{-}$$
The expectations are generally analytically intractable and are approximated with MCMC-sampling-based methods. The learning rule is local and Hebbian-like, hence biologically plausible. For large networks learning is very slow due to the need to allow the network to converge to its equilibrium distribution.
¹ Ackley, Hinton and Sejnowski (1985)
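A sketch of one learning step under this gradient; the Gibbs-sweep helper, the single-sample estimates of the two expectations, and all parameter values are illustrative simplifications rather than the exact algorithm of Ackley et al.:

    import numpy as np

    def gibbs_sweep(s, W, b, rng, n_clamped=0):
        # One sweep of single-unit Gibbs updates; the first n_clamped units
        # (the visible units in the positive phase) are held fixed.
        for i in range(n_clamped, len(s)):
            u = W[i] @ s + b[i]
            s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-u)))

    def bm_learning_step(W, b, s_v, rng, n_sweeps=50, lr=0.01):
        # One MCMC-approximated gradient step on log P(s_v).
        N = len(b)
        n_v = len(s_v)
        # Positive phase: visible units clamped to the data, hiddens sampled.
        clamped = np.concatenate([s_v, rng.integers(0, 2, N - n_v)]).astype(float)
        # Negative phase: all units sampled freely.
        free = rng.integers(0, 2, N).astype(float)
        for _ in range(n_sweeps):
            gibbs_sweep(clamped, W, b, rng, n_clamped=n_v)
            gibbs_sweep(free, W, b, rng)
        # dw_ij proportional to <s_i s_j>_+ - <s_i s_j>_- (single-sample estimates)
        dW = np.outer(clamped, clamped) - np.outer(free, free)
        np.fill_diagonal(dW, 0.0)   # preserve zero self-connectivity
        W += lr * dW                # symmetric, since outer(x, x) is symmetric
        b += lr * (clamped - free)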
Parallel updates and asymmetric connectivity
Updating all units in parallel while maintaining symmetric connectivity gives a different but still tractable equilibrium distribution and learning rule². Relaxing the symmetry constraint generally means it is no longer tractable to find an analytic form for the stationary distribution, and there may be none if the Markov chain is non-ergodic. The irreversibility introduced by weight asymmetry may however improve the speed of convergence to the stationary distribution, while also being more biologically relevant. A learning rule can still be derived using the time-dependent state distribution, but this introduces the requirement to take expectations over the history of states³.
² Apolloni and de Falco (1991)
³ Apolloni, Bertoni, Campadelli and de Falco (1991)
Boltzmann machines as a model for cortical computation
+ Distributed computation
+ Binary communication between units
+ High representational power
+ Local learning rule
- Discrete time formulation
- Reversible dynamics
- Symmetric connectivity
- Slow convergence
Neural Dynamics as Sampling: A model for stochastic computation in recurrent networks of spiking neurons
L. Buesing, J. Bill, B. Nessler and W. Maass, PLOS Computational Biology (2011)
Demonstrates a network model with more biologically plausible dynamics than a BM that nonetheless samples from a Boltzmann distribution. It consists of a recurrently connected network of spiking neurons with irreversible dynamics; the irreversibility allows the inclusion of a refractory mechanism and finite-duration post-synaptic potentials. Discrete-time models with both absolute and relative refractory mechanisms are demonstrated, and a continuous-time formulation is shown as a limiting case of the discrete-time dynamics.
Relation between spike activity and network state
[Figure: example spike trains for neurons k = 1...8 over the window (t − τ, t], with the corresponding auxiliary variables ζ_k[t] and binary states z_k[t]; z_k[t] = 1 exactly when ζ_k[t] ≥ 1.]
The network state is defined by $\zeta[t] = [\zeta_1[t] \dots \zeta_N[t]]^T$ with the Markov property
$$P(\zeta[t+1] \mid \zeta[t], \zeta[t-1], \dots) = P(\zeta[t+1] \mid \zeta[t])$$
Here τ = absolute refractory period = PSP duration.
Discrete time model with absolute refractory mechanism

    for k = 1 to N:
        if ζ_k[t−1] > 1:
            ζ_k[t] = ζ_k[t−1] − 1           # still refractory: count down
        else:
            u_k = Σ_{j=1}^{N} w_kj z_j[t] + b_k   # membrane potential
            r ← rand(0, 1)
            z_k[t] = 1 if r < σ(u_k − log τ) else 0
            if z_k[t] = 1:
                ζ_k[t] = τ
            else:
                ζ_k[t] = 0

(z_k[t] = 1 whenever ζ_k[t] ≥ 1, so the state stays active throughout the refractory/PSP window.)
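A runnable NumPy version of the same loop, with z_k[t] set to 1 whenever ζ_k[t] ≥ 1 (matching the state table on the previous slide); the weights, biases and parameters are arbitrary example values:

    import numpy as np

    def simulate(W, b, tau, T, rng):
        N = len(b)
        zeta = np.zeros(N, dtype=int)   # steps remaining in refractory/PSP window
        z = np.zeros(N)                 # binary states: z_k = 1 iff zeta_k >= 1
        samples = np.zeros((T, N))
        for t in range(T):
            for k in range(N):
                if zeta[k] > 1:
                    zeta[k] -= 1        # still refractory: count down
                else:
                    u = W[k] @ z + b[k]                      # membrane potential
                    p = 1.0 / (1.0 + np.exp(-(u - np.log(tau))))
                    zeta[k] = tau if rng.random() < p else 0
                z[k] = float(zeta[k] >= 1)
            samples[t] = z
        return samples

    rng = np.random.default_rng(0)
    N, tau = 5, 10
    W = rng.normal(scale=0.3, size=(N, N))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    b = rng.normal(scale=0.5, size=N) - 1.0
    samples = simulate(W, b, tau, T=5000, rng=rng)
    print(samples.mean(axis=0))         # estimated marginals P(z_k = 1)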
Discrete time model with relative refractory mechanism
Relaxes the assumption of a hard refractory period with no firing. The probability of a neuron firing is defined as a product of a function of the membrane potential and a function of the time since it last fired:
$$P(z_k[t] = 1 \mid \zeta_k[t-1], u_k[t-1]) = f(u_k[t-1])\, g(\zeta_k[t-1])$$
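A sketch of this factorisation; the particular forms of f (sigmoidal, as in the absolute-refractory case) and g (linear recovery) are illustrative assumptions, not the functions used in the paper:

    import numpy as np

    def f(u):
        # Membrane potential factor (assumed sigmoidal for illustration).
        return 1.0 / (1.0 + np.exp(-u))

    def g(zeta, tau=10):
        # Refractory factor: ~0 just after a spike (zeta close to tau),
        # recovering linearly to 1 as zeta counts down (illustrative choice).
        return np.clip(1.0 - zeta / tau, 0.0, 1.0)

    def p_fire(u, zeta):
        # P(z_k[t] = 1 | zeta_k[t-1], u_k[t-1]) = f(u) * g(zeta)
        return f(u) * g(zeta)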
Sampling from random distributions with relative refractory mechanism
Effect of using more realistic post-synaptic potentials
Toy model of perceptual multistability
Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity
B. Nessler, M. Pfeiffer, L. Buesing and W. Maass, PLOS Computational Biology (2013)
Proposes a biologically plausible probabilistic learning rule: spike-timing-dependent plasticity updates within soft winner-take-all cortical microcircuits are shown to approximate expectation maximisation. The paper is limited to single-layer networks, but proposes that the approach could potentially be extended to deep and/or recurrent structures.
Stochastic Computations in Cortical Microcircuit Models
S. Habenschuss, Z. Jonke and W. Maass, PLOS Computational Biology (2013)
Shows that under quite general conditions the activity of a network of neurons with some degree of stochasticity in its dynamics will converge to a stationary distribution. Oscillatory input/activity is shown to lead to phase-specific stationary distributions. Simulations are performed with a cortical microcircuit model having an anatomically based laminar structure, separate inhibitory and excitatory populations, and data-based network connectivity and short-term dynamics.
Thank you - any questions?

References
Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147-169.
Apolloni, B., & de Falco, D. (1991). Learning by parallel Boltzmann machines. IEEE Transactions on Information Theory, 37(4), 1162-1165.
Apolloni, B., Bertoni, A., Campadelli, P., & de Falco, D. (1991). Asymmetric Boltzmann machines. Biological Cybernetics, 66(1), 61-70.
Buesing, L., Bill, J., Nessler, B., & Maass, W. (2011). Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7(11), e1002211.
Del Castillo, J., & Katz, B. (1954). Quantal components of the end-plate potential. The Journal of Physiology, 124(3), 560-573.
Habenschuss, S., Jonke, Z., & Maass, W. (2013). Stochastic computations in cortical microcircuit models. PLoS Computational Biology.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268(5216), 1503-1506.
Nessler, B., Pfeiffer, M., Buesing, L., & Maass, W. (2013). Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLoS Computational Biology.

Resources
Kappen, H. J. (2001). An introduction to stochastic neural networks. Handbook of Biological Physics, 4, 517-552.
Hinton, G. E. (2007). Boltzmann machine. Scholarpedia, 2(5), 1668.